Model: LiquidAI/LFM2.5-1.2B-JP-202606 Source: Original Platform
| library_name | pipeline_tag | license | license_name | license_link |
|---|---|---|---|---|
| transformers | text-generation | other | lfm1.0 | LICENSE |
🇯🇵 LFM2.5-1.2B-JP-202606
LFM2.5-1.2B-JP-202606 is our latest general-purpose Japanese chat model. It delivers significant improvements in knowledge, instruction following, math, code, and tool use over both comparably sized models and LFM2.5-1.2B-JP, and it achieves state-of-the-art benchmark performance in Japanese language understanding. It is ideal for developers building Japanese-language applications where cultural and linguistic nuance matters.
Find more information about LFM2.5 in our blog post.
📊 Performance
We compared LFM2.5-1.2B-JP-202606 with relevant sub-2B models on a diverse suite of benchmarks.
| Model | Size | Knowledge: JMMLU‑ProX | Knowledge: JMMLU | Knowledge: JCulture | Knowledge: JGPQA | Knowledge: Avg | Instruction Following: J‑MIFEval | Instruction Following: JFBench¹ | Instruction Following: Avg | Math: J‑GSM8K | Math: J‑MATH500 | Math: Avg | Code: JHumanEval+ | Tool Use: J‑BFCLv3² | Domain Avg |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| LFM2.5‑1.2B‑JP‑202606 | 1.2B | 36.23 | 54.19 | 35.77 | 28.69 | 38.72 | 79.08 | 54.77 | 66.93 | 62.20 | 62.80 | 62.50 | 49.39 | 48.00 | 53.11 |
| LFM2.5‑1.2B‑Instruct | 1.2B | 31.42 | 47.61 | 28.42 | 31.72 | 34.79 | 40.44 | 36.67 | 38.56 | 50.20 | 50.00 | 50.10 | 28.66 | 46.29 | 39.68 |
| Qwen3‑1.7B (Instruct) | 1.7B | 30.78 | 47.67 | 33.33 | 26.26 | 34.51 | 40.29 | 36.61 | 38.45 | 46.00 | 56.40 | 51.20 | 47.56 | 52.45 | 44.83 |
| Granite‑4.0‑1B | 1.5B | 15.32 | 33.93 | 34.38 | 24.44 | 27.02 | 27.56 | 31.26 | 29.41 | 42.80 | 25.40 | 34.10 | 51.22 | 50.57 | 38.46 |
| Llama‑3.2‑1B‑Instruct | 1.2B | 15.91 | 33.97 | 22.52 | 32.32 | 26.18 | 24.10 | 21.78 | 22.94 | 25.20 | 11.40 | 18.30 | 17.68 | 21.06 | 21.23 |
| Gemma‑3‑1B‑it | 1.0B | 14.12 | 34.45 | 23.42 | 24.24 | 24.06 | 26.31 | 31.15 | 28.73 | 33.60 | 15.60 | 24.60 | 25.00 | 17.26 | 23.93 |
| sarashina2.2‑1b‑instruct‑v0.1 | 1.4B | 18.3 | 40.24 | 25.53 | 26.26 | 27.58 | 21.9 | 27.41 | 24.66 | 44.4 | 24.8 | 34.60 | 21.95 | 13.86 | 24.53 |
| TinySwallow‑1.5B‑Instruct | 1.5B | 21.51 | 47.98 | 31.17 | 29.29 | 32.49 | 36.55 | 34.25 | 35.40 | 47.2 | 22.4 | 34.80 | 26.83 | 11.7 | 28.24 |
| llm‑jp‑3.1‑1.8b‑instruct4 | 1.9B | 17.44 | 43.05 | 27.42 | 17.68 | 26.40 | 33.77 | 30.92 | 32.35 | 52.8 | 17.0 | 34.90 | 35.37 | 11.76 | 28.16 |
| RakutenAI‑2.0‑mini‑instruct | 1.5B | 11.46 | 31.84 | 29.67 | 22.22 | 23.80 | 28.06 | 24.66 | 26.36 | 24.8 | 11.4 | 18.10 | 28.6 | 11.85 | 21.74 |
¹ JFBench is evaluated using single-instruction prompts.
² quickTestingOSSHandler is used for models that do not support function calling (sarashina2.2‑1b‑instruct‑v0.1, TinySwallow‑1.5B‑Instruct, llm‑jp‑3.1‑1.8b‑instruct4, and RakutenAI‑2.0‑mini‑instruct).
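The card does not state how the Domain Avg column is computed. The numbers in the table are consistent with an unweighted mean of the five per-domain scores, where Knowledge, Instruction Following, and Math are themselves averages of their benchmarks. The following sketch reproduces the LFM2.5-1.2B-JP-202606 row under that assumption; the variable names are illustrative only.

```python
from statistics import mean

# Scores copied from the LFM2.5-1.2B-JP-202606 row of the table above.
knowledge = [36.23, 54.19, 35.77, 28.69]  # JMMLU-ProX, JMMLU, JCulture, JGPQA
instruction = [79.08, 54.77]              # J-MIFEval, JFBench
math_scores = [62.20, 62.80]              # J-GSM8K, J-MATH500
code = 49.39                              # JHumanEval+
tool_use = 48.00                          # J-BFCLv3

# Assumption: each domain is averaged first, then the five domain
# scores are averaged with equal weight.
domain_scores = [mean(knowledge), mean(instruction), mean(math_scores), code, tool_use]
domain_avg = mean(domain_scores)

print(f"{mean(knowledge):.2f}")  # 38.72
print(f"{domain_avg:.2f}")       # 53.11
```

Because every domain carries equal weight, the single-benchmark Code and Tool Use domains influence the final figure as much as the four-benchmark Knowledge domain.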
🗒️ Model Details
| Model | Parameters | Description |
|---|---|---|
| LFM2.5-1.2B-Base | 1.2B | Pre-trained base model for fine-tuning |
| LFM2.5-1.2B-Instruct | 1.2B | General-purpose instruction-tuned model |
| LFM2.5-1.2B-Thinking | 1.2B | General-purpose reasoning model |
| LFM2.5-1.2B-JP-202606 | 1.2B | Japanese-capable chat model |
| LFM2.5-VL-1.6B | 1.6B | Vision-language model with fast inference |
| LFM2.5-Audio-1.5B | 1.5B | Audio-language model for speech and text I/O |
| LFM2.5-Audio-1.5B-JP | 1.5B | Japanese-capable audio model for speech and text I/O |
LFM2.5-1.2B-JP-202606 is a general-purpose text-only model with the following features:
- Number of parameters: 1.17B
- Number of layers: 16 (10 double-gated LIV convolution blocks + 6 GQA blocks)
- Training budget: 31.5T tokens
- Context length: 32,768 tokens
- Vocabulary size: 65,536
- Knowledge cutoff: Mid-2024
- Languages: English, Japanese
- Generation parameters:
  - temperature: 0.1
  - top_k: 50
  - repetition_penalty: 1.05
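The three generation parameters can be read as successive adjustments to the model's next-token scores. The sketch that follows is a pure-Python illustration, assuming the conventional definitions used by common samplers (a penalty on already-generated tokens, division by the temperature, then top-k truncation). The function `sample_next_token` and the toy logits are hypothetical and are not part of any LFM2.5 API.

```python
import math
import random

def sample_next_token(logits, previous_tokens, temperature=0.1, top_k=50,
                      repetition_penalty=1.05, rng=random):
    """Pick one token id from raw logits using the three settings above."""
    scores = list(logits)

    # 1) Repetition penalty: make already-generated tokens less likely.
    for tok in set(previous_tokens):
        if scores[tok] > 0:
            scores[tok] /= repetition_penalty
        else:
            scores[tok] *= repetition_penalty

    # 2) Temperature: values below 1 sharpen the distribution.
    scores = [s / temperature for s in scores]

    # 3) Top-k: keep only the k highest-scoring token ids.
    k = min(top_k, len(scores))
    kept = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]

    # Softmax weights over the kept ids (shifted by the max for stability).
    top = scores[kept[0]]
    weights = [math.exp(scores[i] - top) for i in kept]
    return rng.choices(kept, weights=weights, k=1)[0]


toy_logits = [2.0, 1.0, 0.5, -1.0]  # made-up scores for a 4-token vocabulary
rng = random.Random(0)
picks = [sample_next_token(toy_logits, previous_tokens=[], rng=rng) for _ in range(100)]
print(picks.count(0), "of 100 samples chose the top-scoring token")
```

With a temperature of 0.1, a gap of 1.0 between raw logits becomes a gap of 10 after scaling, so sampling is close to greedy decoding while still leaving a small amount of variation.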
| Model | Description |
|---|---|
| LFM2.5-1.2B-JP-202606 | Original model checkpoint in native format. Best for fine-tuning or inference with Transformers and vLLM. |
| LFM2.5-1.2B-JP-202606-GGUF | Quantized format for llama.cpp and compatible tools. Optimized for CPU inference and local deployment with reduced memory usage. |
| LFM2.5-1.2B-JP-202606-ONNX | ONNX Runtime format for cross-platform deployment. Enables hardware-accelerated inference across diverse environments (cloud, edge, mobile). |
| LFM2.5-1.2B-JP-202606-MLX | MLX format for Apple Silicon. Optimized for fast inference on Mac devices using the MLX framework. |
We recommend LFM2.5-1.2B-JP-202606 for agentic workflows, tool use, structured outputs, bilingual English–Japanese assistants, and on-device personal-assistant applications. We do not recommend it for knowledge-intensive tasks. The model performs best when given clear, explicit instructions that define the task, the expected behavior, and the output format.
Chat Template
LFM2.5 uses a ChatML-like format. See the Chat Template documentation for details. Example:
<|startoftext|><|im_start|>system
You are a helpful assistant trained by Liquid AI.<|im_end|>
<|im_start|>user
日本の首都は?<|im_end|>
<|im_start|>assistant
You can use tokenizer.apply_chat_template() to format your messages automatically.
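To make the layout concrete, here is a hand-rolled approximation of the format shown in the example. It is only a sketch: `render_prompt` is a hypothetical helper, and the exact whitespace (a newline after each role name and after the open assistant tag) is an assumption read off the example. The template shipped with the tokenizer remains the authoritative source, so prefer tokenizer.apply_chat_template() in real code.

```python
def render_prompt(messages, add_generation_prompt=True):
    """Approximate the ChatML-like layout from the example above."""
    parts = ["<|startoftext|>"]
    for message in messages:
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Leave the assistant turn open so the model writes the reply.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are a helpful assistant trained by Liquid AI."},
    {"role": "user", "content": "日本の首都は?"},
]
print(render_prompt(messages))
```

Seeing the raw string is mainly useful for debugging, for example when checking that a serving stack has not dropped the system turn or the open assistant tag.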
Tool Use
LFM2.5 supports function calling as follows:
- Function definition: We recommend providing the list of tools as JSON in the system prompt. You can also pass the tools to tokenizer.apply_chat_template() through its tools argument.
- Function call: By default, LFM2.5 writes Pythonic function calls (a Python list between the <|tool_call_start|> and <|tool_call_end|> special tokens) as the assistant answer. You can override this behavior by asking the model, in the system prompt, to output JSON function calls.
- Function execution: The function call is executed, and the result is returned in a message with the "tool" role.
- Final answer: LFM2.5 interprets the outcome of the function call to address the original user prompt in plain text.
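As a sketch of the function-definition step, the snippet here serializes a tool list into a system message. It assumes the "List of tools: " prefix used in this section's example; `tools_system_message` is a hypothetical helper, not a library function.

```python
import json

def tools_system_message(tools):
    """Build a system message that lists the available tools as JSON."""
    # ensure_ascii=False keeps Japanese descriptions readable instead of
    # turning them into \uXXXX escapes.
    return {
        "role": "system",
        "content": "List of tools: " + json.dumps(tools, ensure_ascii=False),
    }


tools = [{
    "name": "get_candidate_status",
    "description": "採用プロセスにおける候補者の現在のステータスを取得します",
    "parameters": {
        "type": "object",
        "properties": {
            "candidate_id": {"type": "string", "description": "候補者の一意の識別子"},
        },
        "required": ["candidate_id"],
    },
}]

system_message = tools_system_message(tools)
print(system_message["content"])
```

Keeping the tool list as data and serializing it at the last moment makes it easy to reuse the same definitions with the tools argument of tokenizer.apply_chat_template().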
See the Tool Use documentation for the full guide. Example:
<|startoftext|><|im_start|>system
List of tools: [{"name": "get_candidate_status", "description": "採用プロセスにおける候補者の現在のステータスを取得します", "parameters": {"type": "object", "properties": {"candidate_id": {"type": "string", "description": "候補者の一意の識別子"}}, "required": ["candidate_id"]}}]<|im_end|>
<|im_start|>user
候補者ID 12345 の現在のステータスは何ですか?<|im_end|>
<|im_start|>assistant
<|tool_call_start|>[get_candidate_status(candidate_id="12345")]<|tool_call_end|>候補者ID 12345 の現在のステータスを確認しています。<|im_end|>
<|im_start|>tool
[{"candidate_id": "12345", "status": "Interview Scheduled", "position": "Clinical Research Associate", "date": "2023-11-20"}]<|im_end|>
<|im_start|>assistant
ID 12345 の候補者は現在、Clinical Research Associate のポジションで「面接予定」の段階にあり、面接日は 2023年11月20日に設定されています。<|im_end|>
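On the application side, the Pythonic call has to be extracted from the assistant message before it can be executed. One way to do this without evaluating model output as code is to parse it with Python's ast module. The sketch here assumes calls use keyword arguments with literal values, as in the example above; `parse_tool_calls` is a hypothetical helper.

```python
import ast
import re

TOOL_CALL = re.compile(r"<\|tool_call_start\|>(.*?)<\|tool_call_end\|>", re.DOTALL)

def parse_tool_calls(assistant_text):
    """Return [(function_name, kwargs), ...] found in an assistant message."""
    calls = []
    for payload in TOOL_CALL.findall(assistant_text):
        tree = ast.parse(payload.strip(), mode="eval")
        if not isinstance(tree.body, ast.List):
            raise ValueError("expected a Python list of calls")
        for node in tree.body.elts:
            if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)):
                raise ValueError("expected plain function calls")
            # literal_eval accepts only constants and containers, so nothing
            # from the model output is executed. Positional arguments are
            # ignored in this sketch.
            kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
            calls.append((node.func.id, kwargs))
    return calls


text = (
    '<|tool_call_start|>[get_candidate_status(candidate_id="12345")]<|tool_call_end|>'
    "候補者ID 12345 の現在のステータスを確認しています。"
)
print(parse_tool_calls(text))
# [('get_candidate_status', {'candidate_id': '12345'})]
```

Each parsed call can then be dispatched to your own implementation, and its result appended to the conversation as a message with the "tool" role before generating the final answer.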
🏃 Inference
LFM2.5 is supported by many inference frameworks. See the Inference documentation for the full list.
| Name | Description | Docs | Notebook |
|---|---|---|---|
| Transformers | Simple inference with direct access to model internals. | Link | Notebook |
| vLLM | High-throughput production deployments with GPU. | Link | Notebook |
| llama.cpp | Cross-platform inference with CPU offloading. | Link | Notebook |
| MLX | Apple's machine learning framework optimized for Apple Silicon. | Link | — |
| LM Studio | Desktop application for running LLMs locally. | Link | — |
Here's a quick start example with Transformers:
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

# Load the model and tokenizer from the Hugging Face Hub
model_id = "LiquidAI/LFM2.5-1.2B-JP-202606"
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    dtype="bfloat16",
    # attn_implementation="flash_attention_2" <- uncomment on compatible GPU
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Print tokens as they are generated, without echoing the prompt
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

# Format the user message with the chat template
prompt = "日本の首都は?"
input_ids = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt}],
    add_generation_prompt=True,
    return_tensors="pt",
    tokenize=True,
).to(model.device)

# Generate with the recommended sampling parameters
output = model.generate(
    input_ids,
    do_sample=True,
    temperature=0.1,
    top_k=50,
    repetition_penalty=1.05,
    max_new_tokens=512,
    streamer=streamer,
)
🔧 Fine-Tuning
We recommend fine-tuning LFM2.5 for your specific use case to achieve the best results.
| Name | Description | Docs | Notebook |
|---|---|---|---|
| CPT (Unsloth) | Continued Pre-Training using Unsloth for text completion. | Link | Notebook |
| CPT (Unsloth) | Continued Pre-Training using Unsloth for translation. | Link | Notebook |
| SFT (Unsloth) | Supervised Fine-Tuning with LoRA using Unsloth. | Link | Notebook |
| SFT (TRL) | Supervised Fine-Tuning with LoRA using TRL. | Link | Notebook |
| DPO (TRL) | Direct Preference Optimization with LoRA using TRL. | Link | Notebook |
| GRPO (Unsloth) | GRPO with LoRA using Unsloth. | Link | Notebook |
| GRPO (TRL) | GRPO with LoRA using TRL. | Link | Notebook |
📬 Contact
- Got questions or want to connect? Join our Discord community
- If you are interested in custom solutions with edge deployment, please contact our sales team.
Citation
@article{liquidai2025lfm2,
  title={LFM2 Technical Report},
  author={Liquid AI},
  journal={arXiv preprint arXiv:2511.23404},
  year={2025}
}



