---
license: apache-2.0
language:
- en
- zh
- ru
- es
- fr
- de
- ar
- nl
- vi
- hi
- ko
- ja
- it
- id
- pt
- pl
- tr
- da
- th
- sv
- fa
- uk
- cs
- 'no'
- el
- ca
- ro
- fi
- bg
- tl
- gl
- my
- hy
- km
- ne
- hu
- eu
- he
- lo
- sw
- az
- lv
- si
- sk
- tg
- et
- lt
- ms
- hr
- is
- sl
- sr
- ur
- bn
- af
- ta
- ka
- te
- ml
- mn
- nn
- kk
- cy
- mr
- sq
- nb
- mk
- jv
- kn
- eo
- la
- gu
- uz
- am
- oc
- be
- mg
- vo
- pa
- lb
- ht
- br
- ga
- xh
- tt
- bs
- yo
base_model:
- codefuse-ai/F2LLM-v2-0.6B-Preview-Pruned-80M
pipeline_tag: feature-extraction
library_name: transformers
tags:
- sentence-transformers
datasets:
- codefuse-ai/F2LLM-v2
---

# F2LLM-v2-80M

F2LLM-v2 is a family of general-purpose, multilingual embedding models in 8 distinct sizes ranging from 80M to 14B parameters. Trained on a curated composite of 60 million publicly available, high-quality data samples, F2LLM-v2 supports more than 200 languages, with particular emphasis on previously underserved mid- and low-resource languages.

F2LLM-v2 is fully open. We release base models in 5 sizes, instruct models in 8 sizes, the training data, the training code, and intermediate checkpoints. The three smallest instruct models (80M, 160M, and 330M) are pruned from the 0.6B base model and then trained.

| Model | Base                                                                                | Instruct                                                            |
| ----- | ----------------------------------------------------------------------------------- | ------------------------------------------------------------------- |
| 80M   |                                                                                     | [🤗F2LLM-v2-80M](https://huggingface.co/codefuse-ai/F2LLM-v2-80M)   |
| 160M  |                                                                                     | [🤗F2LLM-v2-160M](https://huggingface.co/codefuse-ai/F2LLM-v2-160M) |
| 330M  |                                                                                     | [🤗F2LLM-v2-330M](https://huggingface.co/codefuse-ai/F2LLM-v2-330M) |
| 0.6B  | [🤗F2LLM-v2-0.6B-Preview](https://huggingface.co/codefuse-ai/F2LLM-v2-0.6B-Preview) | [🤗F2LLM-v2-0.6B](https://huggingface.co/codefuse-ai/F2LLM-v2-0.6B) |
| 1.7B  | [🤗F2LLM-v2-1.7B-Preview](https://huggingface.co/codefuse-ai/F2LLM-v2-1.7B-Preview) | [🤗F2LLM-v2-1.7B](https://huggingface.co/codefuse-ai/F2LLM-v2-1.7B) |
| 4B    | [🤗F2LLM-v2-4B-Preview](https://huggingface.co/codefuse-ai/F2LLM-v2-4B-Preview)     | [🤗F2LLM-v2-4B](https://huggingface.co/codefuse-ai/F2LLM-v2-4B)     |
| 8B    | [🤗F2LLM-v2-8B-Preview](https://huggingface.co/codefuse-ai/F2LLM-v2-8B-Preview)     | [🤗F2LLM-v2-8B](https://huggingface.co/codefuse-ai/F2LLM-v2-8B)     |
| 14B   | [🤗F2LLM-v2-14B-Preview](https://huggingface.co/codefuse-ai/F2LLM-v2-14B-Preview)   | [🤗F2LLM-v2-14B](https://huggingface.co/codefuse-ai/F2LLM-v2-14B)   |

## Usage

### With Sentence Transformers

To encode text with the [Sentence Transformers](https://www.sbert.net/) library:

```python
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("codefuse-ai/F2LLM-v2-80M", device="cuda:0", model_kwargs={"torch_dtype": "bfloat16"})
# A sample query and a few sample documents
query = "What is F2LLM used for?"
documents = [
    'We present F2LLM, a family of fully open embedding LLMs that achieve a strong balance between model size, training data, and embedding performance.',
    'F2LLM is a model for computing text embeddings that can be used for various NLP tasks such as information retrieval, semantic search, and text classification.',
    'F2LLM 是 CodeFuse 开源的系列嵌入模型。',
    'F2LLM — это модель вычисления встраивания текста, которую можно использовать для различных задач НЛП, таких как поиск информации, семантический поиск и классификация текста.'
]
# Encode the query and documents separately. The encode_query method uses the query prompt
query_embedding = model.encode_query(query)
document_embeddings = model.encode_document(documents)
print(query_embedding.shape, document_embeddings.shape)
# (320,) (4, 320)
# Compute cosine similarity between the query and documents
similarity = model.similarity(query_embedding, document_embeddings)
print(similarity)
# tensor([[0.6968, 0.7818, 0.7165, 0.8374]])
```
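
In a retrieval setting, the similarity scores are typically used to rank the documents. The snippet below is a minimal sketch of that step, using the scores printed above as plain Python floats so that it runs without the model. The helper `rank_documents` and its `top_k` parameter are illustrative names, not part of Sentence Transformers.

```python
# Similarity scores copied from the printed output above, in document order.
scores = [0.6968, 0.7818, 0.7165, 0.8374]


def rank_documents(scores, top_k=None):
    """Return (document_index, score) pairs sorted from most to least similar.

    Hypothetical helper for illustration only.
    """
    ranked = sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)
    return ranked if top_k is None else ranked[:top_k]


ranking = rank_documents(scores, top_k=2)
print(ranking)
# [(3, 0.8374), (1, 0.7818)]
```

With real embeddings, the same ranking can be obtained by first converting the similarity tensor to a list, for example with `similarity[0].tolist()`.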

### With Transformers

Alternatively, use the [Transformers](https://huggingface.co/docs/transformers/index) library directly:

```python
from transformers import AutoModel, AutoTokenizer
import torch
import torch.nn.functional as F
model_path = "codefuse-ai/F2LLM-v2-80M"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModel.from_pretrained(model_path, torch_dtype=torch.bfloat16, device_map={'': 0})
query = "What is F2LLM used for?"
query_prompt = "Instruct: Given a question, retrieve passages that can help answer the question.\nQuery: "
documents = [
    'We present F2LLM, a family of fully open embedding LLMs that achieve a strong balance between model size, training data, and embedding performance.',
    'F2LLM is a model for computing text embeddings that can be used for various NLP tasks such as information retrieval, semantic search, and text classification.',
    'F2LLM 是 CodeFuse 开源的系列嵌入模型。',
    'F2LLM — это модель вычисления встраивания текста, которую можно использовать для различных задач НЛП, таких как поиск информации, семантический поиск и классификация текста.'
]
@torch.no_grad()  # inference only: skip building the autograd graph
def encode(sentences):
    batch_size = len(sentences)
    # The tokenizer automatically appends the EOS token to each input
    tokenized_inputs = tokenizer(sentences, padding=True, return_tensors='pt').to(model.device)
    last_hidden_state = model(**tokenized_inputs).last_hidden_state
    # Index of the last non-padding token (EOS) in each sequence; assumes right padding
    eos_positions = tokenized_inputs.attention_mask.sum(dim=1) - 1
    # Use the hidden state at the EOS position as the sentence embedding
    embeddings = last_hidden_state[torch.arange(batch_size, device=model.device), eos_positions]
    # L2-normalize so that a dot product equals cosine similarity
    embeddings = F.normalize(embeddings, p=2, dim=1)
    return embeddings
# Encode the query and documents
query_embedding = encode([query_prompt + query])
document_embeddings = encode(documents)
print(query_embedding.shape, document_embeddings.shape)
# torch.Size([1, 320]) torch.Size([4, 320])
# Compute cosine similarity between the query and documents
similarity = query_embedding @ document_embeddings.T
print(similarity)
# tensor([[0.6914, 0.7812, 0.7148, 0.8359]], device='cuda:0',
#        dtype=torch.bfloat16)
```
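
The `encode` function above locates the EOS token with `attention_mask.sum(dim=1) - 1`, which points at the last real token only when padding is added on the right. The following pure-Python sketch illustrates that index arithmetic on toy attention masks, with no model or tensors involved. Both helper names are hypothetical, and whether a given tokenizer pads on the left or the right depends on its configuration, so treat this as a sanity check rather than a statement about this model's tokenizer.

```python
def last_token_index_from_sum(mask):
    """Index used in `encode`: number of real tokens minus one (right padding only)."""
    return sum(mask) - 1


def last_token_index_general(mask):
    """Position of the last 1 in the mask; valid for either padding side."""
    return max(i for i, m in enumerate(mask) if m == 1)


right_padded = [1, 1, 1, 0, 0]  # three real tokens followed by two pad tokens
left_padded = [0, 0, 1, 1, 1]   # two pad tokens followed by three real tokens

print(last_token_index_from_sum(right_padded), last_token_index_general(right_padded))
# 2 2
print(last_token_index_from_sum(left_padded), last_token_index_general(left_padded))
# 2 4
```

With left padding the two indices disagree, so the sum-based shortcut would select a token in the middle of the sequence instead of the EOS token.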

## Intermediate Checkpoints

To facilitate future research, we release intermediate checkpoints in the `intermediate_checkpoints` branch.
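
A branch on the Hugging Face Hub can be selected with the `revision` argument of `from_pretrained`. The sketch below assumes that the branch lives in this model repository and makes no claim about how the checkpoints are laid out inside it. The helper names are illustrative, and the `subfolder` argument is only needed if the checkpoints turn out to be stored in subdirectories, so listing the branch contents first is a reasonable first step.

```python
REPO_ID = "codefuse-ai/F2LLM-v2-80M"  # assumption: the branch is in this repository
BRANCH = "intermediate_checkpoints"   # branch name given in the text above


def checkpoint_load_kwargs(subfolder=None):
    """Build keyword arguments for `from_pretrained` that target the branch."""
    kwargs = {"revision": BRANCH}
    if subfolder is not None:
        # Hypothetical: only relevant if checkpoints are stored in subdirectories.
        kwargs["subfolder"] = subfolder
    return kwargs


def list_branch_files():
    """List the files on the branch to discover its layout (needs network access)."""
    from huggingface_hub import list_repo_files
    return list_repo_files(REPO_ID, revision=BRANCH)


def load_checkpoint(subfolder=None):
    """Load the tokenizer and model from the branch (needs network access)."""
    from transformers import AutoModel, AutoTokenizer
    kwargs = checkpoint_load_kwargs(subfolder)
    tokenizer = AutoTokenizer.from_pretrained(REPO_ID, **kwargs)
    model = AutoModel.from_pretrained(REPO_ID, **kwargs)
    return tokenizer, model
```

Calling `list_branch_files()` first shows which paths exist on the branch; the result can then guide the value passed to `load_checkpoint`.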

## Citation

```bibtex
@misc{f2llm-v2,
      title={F2LLM-v2: Inclusive, Performant, and Efficient Embeddings for a Multilingual World},
      author={Ziyin Zhang and Zihan Liao and Hang Yu and Peng Di and Rui Wang},
      year={2026},
      eprint={2603.19223},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2603.19223},
}
```