LiquidAI/LFM2.5-1.2B-JP-202606

ModelHub XC c64c2a19c9 Initialize project; model provided by the ModelHub XC community

Model: LiquidAI/LFM2.5-1.2B-JP-202606
Source: Original Platform

2026-06-10 05:55:17 +08:00

🇯🇵 LFM2.5-1.2B-JP-202606

LFM2.5-1.2B-JP-202606 is our latest general-purpose Japanese chat model. It delivers significant improvements in knowledge, instruction following, math, code, and tool use over both comparably sized models and LFM2.5-1.2B-JP, and it achieves state-of-the-art results for its size class on Japanese benchmarks. It is ideal for developers building Japanese-language applications where cultural and linguistic nuance matters.

LFM2.5-1.2B-JP-202606 は、当社の最新の汎用日本語チャットモデルです。知識、指示追従、数学、コード、ツール使用の各領域において、同規模の他モデルおよび LFM2.5-1.2B-JP の双方を大幅に上回る改善を実現しています。日本語全般における最高水準のベンチマーク性能を発揮します。文化的・言語的なニュアンスが重要となる日本語アプリケーションを構築する開発者に最適です。

Find more information about LFM2.5 in our blog post.

📊 Performance

We compared LFM2.5-1.2B-JP-202606 with relevant sub-2B models on a diverse suite of benchmarks.

| Model | Size | JMMLU‑ProX | JMMLU | JCulture | JGPQA | Knowledge Avg | J‑MIFEval | JFBench¹ | Instruction Following Avg | J‑GSM8K | J‑MATH500 | Math Avg | JHumanEval+ (Code) | J‑BFCLv3² (Tool Use) | Domain Avg |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| LFM2.5‑1.2B‑JP‑202606 | 1.2B | 36.23 | 54.19 | 35.77 | 28.69 | 38.72 | 79.08 | 54.77 | 66.93 | 62.20 | 62.80 | 62.50 | 49.39 | 48.00 | 53.11 |
| LFM2.5‑1.2B‑Instruct | 1.2B | 31.42 | 47.61 | 28.42 | 31.72 | 34.79 | 40.44 | 36.67 | 38.56 | 50.20 | 50.00 | 50.10 | 28.66 | 46.29 | 39.68 |
| Qwen3‑1.7B (Instruct) | 1.7B | 30.78 | 47.67 | 33.33 | 26.26 | 34.51 | 40.29 | 36.61 | 38.45 | 46.00 | 56.40 | 51.20 | 47.56 | 52.45 | 44.83 |
| Granite‑4.0‑1B | 1.5B | 15.32 | 33.93 | 34.38 | 24.44 | 27.02 | 27.56 | 31.26 | 29.41 | 42.80 | 25.40 | 34.10 | 51.22 | 50.57 | 38.46 |
| Llama‑3.2‑1B‑Instruct | 1.2B | 15.91 | 33.97 | 22.52 | 32.32 | 26.18 | 24.10 | 21.78 | 22.94 | 25.20 | 11.40 | 18.30 | 17.68 | 21.06 | 21.23 |
| Gemma‑3‑1B‑it | 1.0B | 14.12 | 34.45 | 23.42 | 24.24 | 24.06 | 26.31 | 31.15 | 28.73 | 33.60 | 15.60 | 24.60 | 25.00 | 17.26 | 23.93 |
| sarashina2.2‑1b‑instruct‑v0.1 | 1.4B | 18.3 | 40.24 | 25.53 | 26.26 | 27.58 | 21.9 | 27.41 | 24.66 | 44.4 | 24.8 | 34.60 | 21.95 | 13.86 | 24.53 |
| TinySwallow‑1.5B‑Instruct | 1.5B | 21.51 | 47.98 | 31.17 | 29.29 | 32.49 | 36.55 | 34.25 | 35.40 | 47.2 | 22.4 | 34.80 | 26.83 | 11.7 | 28.24 |
| llm‑jp‑3.1‑1.8b‑instruct4 | 1.9B | 17.44 | 43.05 | 27.42 | 17.68 | 26.40 | 33.77 | 30.92 | 32.35 | 52.8 | 17.0 | 34.90 | 35.37 | 11.76 | 28.16 |
| RakutenAI‑2.0‑mini‑instruct | 1.5B | 11.46 | 31.84 | 29.67 | 22.22 | 23.80 | 28.06 | 24.66 | 26.36 | 24.8 | 11.4 | 18.10 | 28.6 | 11.85 | 21.74 |

¹ JFBench is evaluated using single-instruction prompts.
² quickTestingOSSHandler is used for models that do not support function calling (sarashina2.2‑1b‑instruct‑v0.1, TinySwallow‑1.5B‑Instruct, llm‑jp‑3.1‑1.8b‑instruct4, and RakutenAI‑2.0‑mini‑instruct).
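
The table does not define how Domain Avg is computed, but the reported values are consistent with an unweighted mean over five domain scores: the Knowledge, Instruction Following, and Math averages plus the single Code and Tool Use scores. The sketch below checks that assumption against the LFM2.5‑1.2B‑JP‑202606 row; the function name `domain_average` is illustrative and not part of any evaluation harness.

```python
from statistics import mean


def domain_average(knowledge, instruction_following, math_scores, code, tool_use):
    """Average the per-domain scores, assuming every domain is weighted equally."""
    domain_scores = [
        mean(knowledge),              # Knowledge Avg
        mean(instruction_following),  # Instruction Following Avg
        mean(math_scores),            # Math Avg
        code,                         # single Code benchmark
        tool_use,                     # single Tool Use benchmark
    ]
    return round(mean(domain_scores), 2)


# Scores copied from the LFM2.5-1.2B-JP-202606 row of the table.
score = domain_average(
    knowledge=[36.23, 54.19, 35.77, 28.69],
    instruction_following=[79.08, 54.77],
    math_scores=[62.20, 62.80],
    code=49.39,
    tool_use=48.00,
)
print(score)  # 53.11
```

Under this reading, each domain counts equally regardless of how many benchmarks it contains, so a single Code or Tool Use benchmark carries as much weight as the four Knowledge benchmarks combined.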

🗒️ Model Details

Model	Parameters	Description
LFM2.5-1.2B-Base	1.2B	Pre-trained base model for fine-tuning
LFM2.5-1.2B-Instruct	1.2B	General-purpose instruction-tuned model
LFM2.5-1.2B-Thinking	1.2B	General-purpose reasoning model
LFM2.5-1.2B-JP-202606	1.2B	Japanese-capable chat model
LFM2.5-VL-1.6B	1.6B	Vision-language model with fast inference
LFM2.5-Audio-1.5B	1.5B	Audio-language model for speech and text I/O
LFM2.5-Audio-1.5B-JP	1.5B	Japanese-capable audio model for speech and text I/O

LFM2.5-1.2B-JP-202606 is a general-purpose text-only model with the following features:

Number of parameters: 1.17B
Number of layers: 16 (10 double-gated LIV convolution blocks + 6 GQA blocks)
Training budget: 31.5T tokens
Context length: 32,768 tokens
Vocabulary size: 65,536
Knowledge cutoff: Mid-2024
Languages: English, Japanese
Generation parameters:
- temperature: 0.1
- top_k: 50
- repetition_penalty: 1.05
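
To make the three generation parameters concrete, here is a simplified pure-Python sketch of how such settings are commonly applied to next-token logits: the repetition penalty first, then temperature scaling, then top-k filtering before sampling. It illustrates the general technique only and is not Liquid AI's implementation; `sample_next_token` and the toy logits are made up for the example.

```python
import math
import random


def sample_next_token(logits, generated_ids, temperature=0.1, top_k=50,
                      repetition_penalty=1.05, rng=None):
    """Pick a token index from raw logits using the recommended settings."""
    rng = rng or random
    scores = list(logits)

    # 1) Repetition penalty: make already-generated tokens less likely.
    for tok in set(generated_ids):
        s = scores[tok]
        scores[tok] = s / repetition_penalty if s > 0 else s * repetition_penalty

    # 2) Temperature: values below 1 sharpen the distribution.
    scores = [s / temperature for s in scores]

    # 3) Top-k: keep only the k highest-scoring tokens.
    k = min(top_k, len(scores))
    kept = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]

    # 4) Softmax over the kept tokens, then sample one of them.
    peak = max(scores[i] for i in kept)
    weights = [math.exp(scores[i] - peak) for i in kept]
    return rng.choices(kept, weights=weights, k=1)[0]


# Toy vocabulary of two tokens; top_k=1 makes the choice deterministic.
print(sample_next_token([1.0, 1.04], generated_ids=[], top_k=1))   # 1
print(sample_next_token([1.0, 1.04], generated_ids=[1], top_k=1))  # 0
```

With a temperature of 0.1 the logit gaps are multiplied by ten before the softmax, so sampling behaves almost like greedy decoding, while the mild 1.05 penalty only changes the outcome when two candidates are nearly tied.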

Model	Description
LFM2.5-1.2B-JP-202606	Original model checkpoint in native format. Best for fine-tuning or inference with Transformers and vLLM.
LFM2.5-1.2B-JP-202606-GGUF	Quantized format for llama.cpp and compatible tools. Optimized for CPU inference and local deployment with reduced memory usage.
LFM2.5-1.2B-JP-202606-ONNX	ONNX Runtime format for cross-platform deployment. Enables hardware-accelerated inference across diverse environments (cloud, edge, mobile).
LFM2.5-1.2B-JP-202606-MLX	MLX format for Apple Silicon. Optimized for fast inference on Mac devices using the MLX framework.

We recommend LFM2.5-1.2B-JP-202606 for agentic workflows, tool use, structured outputs, bilingual English–Japanese assistants, and on-device personal-assistant applications. It is not recommended for knowledge-intensive tasks. The model performs best when given clear, explicit instructions that define the task, the expected behavior, and the output format.

エージェント型ワークフロー、ツール使用、構造化出力、日英バイリンガルアシスタント、オンデバイスのパーソナルアシスタントでの利用を推奨します。一方で、詳細な知識を要するタスクには推奨されません。タスク内容、期待される動作、出力形式を明確かつ具体的に指示することで、最も高い性能を発揮します。

Chat Template

LFM2.5 uses a ChatML-like format. See the Chat Template documentation for details. Example:

<|startoftext|><|im_start|>system
You are a helpful assistant trained by Liquid AI.<|im_end|>
<|im_start|>user
日本の首都は？<|im_end|>
<|im_start|>assistant

You can use tokenizer.apply_chat_template() to format your messages automatically.
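
As a rough illustration of what the template produces, the sketch below assembles the prompt string by hand. It only mirrors the example above: the function `render_chatml` is hypothetical, and the bundled chat template applied through tokenizer.apply_chat_template() remains the authoritative formatter, since it may handle cases this sketch ignores (tool definitions, for instance).

```python
def render_chatml(messages, add_generation_prompt=True):
    """Format a list of {"role", "content"} dicts in the ChatML-like layout."""
    parts = ["<|startoftext|>"]
    for message in messages:
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Leave the assistant turn open so the model writes the reply.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt_text = render_chatml([
    {"role": "system", "content": "You are a helpful assistant trained by Liquid AI."},
    {"role": "user", "content": "日本の首都は？"},  # "What is the capital of Japan?"
])
print(prompt_text)
```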

Tool Use

LFM2.5 supports function calling as follows:

Function definition: We recommend providing the list of tools as JSON in the system prompt. You can also pass the tools to tokenizer.apply_chat_template() through its tools argument.
Function call: By default, LFM2.5 writes Pythonic function calls (a Python list between the <|tool_call_start|> and <|tool_call_end|> special tokens) as the assistant answer. You can override this behavior by asking the model, in the system prompt, to output JSON function calls.
Function execution: The function call is executed, and the result is returned in a message with the "tool" role.
Final answer: LFM2.5 interprets the outcome of the function call to address the original user prompt in plain text.

See the Tool Use documentation for the full guide. Example:

<|startoftext|><|im_start|>system
List of tools: [{"name": "get_candidate_status", "description": "採用プロセスにおける候補者の現在のステータスを取得します", "parameters": {"type": "object", "properties": {"candidate_id": {"type": "string", "description": "候補者の一意の識別子"}}, "required": ["candidate_id"]}}]<|im_end|>
<|im_start|>user
候補者ID 12345 の現在のステータスは何ですか？<|im_end|>
<|im_start|>assistant
<|tool_call_start|>[get_candidate_status(candidate_id="12345")]<|tool_call_end|>候補者ID 12345 の現在のステータスを確認しています。<|im_end|>
<|im_start|>tool
[{"candidate_id": "12345", "status": "Interview Scheduled", "position": "Clinical Research Associate", "date": "2023-11-20"}]<|im_end|>
<|im_start|>assistant
ID 12345 の候補者は現在、Clinical Research Associate のポジションで「面接予定」の段階にあり、面接日は 2023年11月20日に設定されています。<|im_end|>
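
On the application side, the function-execution step means extracting the Pythonic call from the assistant turn, running it, and returning the result as a "tool" message. The sketch below shows one way to do this with the standard library, parsing with `ast` rather than `eval` so that model output is never executed directly. The helper `parse_tool_calls`, the `TOOLS` registry, and the stubbed `get_candidate_status` are assumptions for illustration, and only keyword arguments with literal values are handled.

```python
import ast
import json
import re

TOOL_CALL_RE = re.compile(r"<\|tool_call_start\|>(.*?)<\|tool_call_end\|>", re.DOTALL)


def parse_tool_calls(assistant_text):
    """Return [(function_name, kwargs), ...] found in an assistant turn."""
    calls = []
    for block in TOOL_CALL_RE.findall(assistant_text):
        tree = ast.parse(block.strip(), mode="eval")
        if not isinstance(tree.body, ast.List):
            raise ValueError("expected a Python list of function calls")
        for node in tree.body.elts:
            if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)):
                raise ValueError("expected plain function calls")
            if node.args:
                raise ValueError("positional arguments are not handled in this sketch")
            # literal_eval accepts only literals, so no code in the arguments runs.
            kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
            calls.append((node.func.id, kwargs))
    return calls


def get_candidate_status(candidate_id):
    # Stub standing in for a real backend lookup.
    return {"candidate_id": candidate_id, "status": "Interview Scheduled"}


TOOLS = {"get_candidate_status": get_candidate_status}

assistant_text = (
    '<|tool_call_start|>[get_candidate_status(candidate_id="12345")]'
    "<|tool_call_end|>候補者ID 12345 の現在のステータスを確認しています。"
)

tool_messages = []
for name, kwargs in parse_tool_calls(assistant_text):
    result = TOOLS[name](**kwargs)
    tool_messages.append(
        {"role": "tool", "content": json.dumps([result], ensure_ascii=False)}
    )
print(tool_messages)
```

The resulting messages can be appended to the conversation before asking the model for its final answer. Looking the function name up in an explicit registry keeps the model from calling anything that was not deliberately exposed.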

🏃 Inference

LFM2.5 is supported by many inference frameworks. See the Inference documentation for the full list.

Name	Description	Docs	Notebook
Transformers	Simple inference with direct access to model internals.	Link
vLLM	High-throughput production deployments with GPU.	Link
llama.cpp	Cross-platform inference with CPU offloading.	Link
MLX	Apple's machine learning framework optimized for Apple Silicon.	Link	—
LM Studio	Desktop application for running LLMs locally.	Link	—

Here's a quick start example with Transformers:

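from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

# Hub repository ID in "organization/model" form
model_id = "LiquidAI/LFM2.5-1.2B-JP-202606"

# Load the weights in bfloat16 and place them on the available device(s)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    dtype="bfloat16",
#   attn_implementation="flash_attention_2" <- uncomment on compatible GPU
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Print decoded tokens to stdout as they are generated
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

prompt = "日本の首都は？"  # "What is the capital of Japan?"
# Apply the chat template and tokenize in one step;
# return_dict=True also returns the attention mask
inputs = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt}],
    add_generation_prompt=True,
    return_tensors="pt",
    tokenize=True,
    return_dict=True,
).to(model.device)

# Sample with the recommended generation parameters
output = model.generate(
    **inputs,
    do_sample=True,
    temperature=0.1,
    top_k=50,
    repetition_penalty=1.05,
    max_new_tokens=512,
    streamer=streamer,
)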

🔧 Fine-Tuning

We recommend fine-tuning LFM2.5 for your specific use case to achieve the best results.

Name	Description	Docs
CPT (Unsloth)	Continued Pre-Training using Unsloth for text completion.	Link
CPT (Unsloth)	Continued Pre-Training using Unsloth for translation.	Link
SFT (Unsloth)	Supervised Fine-Tuning with LoRA using Unsloth.	Link
SFT (TRL)	Supervised Fine-Tuning with LoRA using TRL.	Link
DPO (TRL)	Direct Preference Optimization with LoRA using TRL.	Link
GRPO (Unsloth)	GRPO with LoRA using Unsloth.	Link
GRPO (TRL)	GRPO with LoRA using TRL.	Link

📬 Contact

Got questions or want to connect? Join our Discord community.
If you are interested in custom solutions with edge deployment, please contact our sales team.

Citation

@article{liquidai2025lfm2,
  title={LFM2 Technical Report},
  author={Liquid AI},
  journal={arXiv preprint arXiv:2511.23404},
  year={2025}
}
