---
license: apache-2.0
license_link: https://huggingface.co/Qihoo360/Light-TLLM-7B/blob/main/LICENSE
language:
- en
- zh
pipeline_tag: text-generation
base_model: Qwen/Qwen2.5-7B
tags:
- machine-translation
- multilingual
- qwen2
library_name: transformers
---

# Light-TLLM-7B
<a href="https://huggingface.co/qihoo360/Light-TLLM-7B" target="_blank" style="margin: 2px;">
<img alt="Hugging Face" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-FF6B6B" style="display: inline-block; vertical-align: middle;"/>
</a>

## Introduction

Light-TLLM-7B is a machine-translation-focused variant of Qwen2.5-7B developed by 360 AI Research.

**This repo contains the machine-translation-specialized 7B model**, which has the following features:
- Type: Causal Language Model for Machine Translation
- Training Stage: Continued pretraining, curriculum SFT, and MtPO reinforcement learning
- Architecture: Transformer with RoPE, SwiGLU, RMSNorm, and attention QKV bias
- Number of Parameters: 7.61B (6.53B non-embedding)
- Number of Layers: 28
- Number of Attention Heads (GQA): 28 for Q and 4 for KV
- Context Length: Up to 131,072 tokens
- Vocabulary Size: 180,736 tokens with MtPO vocabulary expansion

## Requirements

Light-TLLM-7B uses the Qwen2 architecture, which is supported in recent releases of the Hugging Face `transformers` library. We recommend using the latest version of `transformers`.

With `transformers<4.37.0`, you will encounter the following error:
```
KeyError: 'qwen2'
```

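A quick way to catch the version problem before loading the model is to compare the installed version against the 4.37.0 threshold quoted above. The sketch below is a minimal check using only the standard library; the helper names `parse_version`, `supports_qwen2`, and `installed_transformers_version` are illustrative and not part of `transformers`.

```python
import re
from importlib import metadata

MIN_TRANSFORMERS = (4, 37, 0)  # threshold quoted in this section


def parse_version(version):
    """Return the leading numeric part of a version string as a 3-tuple.

    '4.37.0.dev0' -> (4, 37, 0); '4.37' -> (4, 37, 0).
    """
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    parts = [int(p) for p in match.group(0).split(".")][:3]
    parts += [0] * (3 - len(parts))  # pad short versions such as '4.37'
    return tuple(parts)


def supports_qwen2(version):
    """True if this transformers version meets the 4.37.0 threshold."""
    return parse_version(version) >= MIN_TRANSFORMERS


def installed_transformers_version():
    """Return the installed transformers version, or None if it is absent."""
    try:
        return metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return None


installed = installed_transformers_version()
if installed is None:
    print("transformers is not installed")
elif not supports_qwen2(installed):
    print(f"transformers {installed} is too old; upgrade to >= 4.37.0")
else:
    print(f"transformers {installed} is recent enough")
```

Pre-release suffixes such as `rc1` are ignored by this comparison, which is usually acceptable for a coarse compatibility check.
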
## Quickstart

The following snippet shows how to load the tokenizer and model and run a translation prompt.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "qihoo360/Light-TLLM-7B"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Example translation prompt
prompt = "Translate the following English text to Chinese: Hello, how are you today?"
messages = [
    {"role": "system", "content": "You are a professional translator. Translate the given text accurately and naturally."},
    {"role": "user", "content": prompt}
]
# Render the chat messages into a single prompt string
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512,
    temperature=0.7,
    do_sample=True
)
# Drop the prompt tokens so only the newly generated translation remains
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```

## Training Pipeline (MtPO)

The MtPO pipeline runs in four stages, from tokenizer expansion to reinforcement learning alignment.

- **Stage 1 - Vocabulary expansion:** Extend the Qwen2.5 tokenizer with 3k-4k tokens per target language (Khmer, Lao, Mongolian, Myanmar, Tamil, Thai, Tibetan, Uyghur). FLORES-Plus diagnostics show 2.1x-5.4x compression gains, cutting Khmer token counts from 402 to 103 for representative passages.
- **Stage 2 - Balanced continued pretraining:** Continue training on 200B tokens with a 1:1 mix between English and the expanded low-resource corpus to preserve high-resource coverage while materially improving low-resource fluency.
- **Stage 3 - Curriculum SFT:** Train on a 7M-sample blend (5:1 general instructions vs. multilingual data) that progresses from base instruction-following to ASEAN translation and mixed-format prompts.
- **Stage 4 - MtPO reinforcement learning:** Optimize with entropy-tempered policy updates that keep sampling temperature consistent, apply asymmetric ratio clipping, and normalize advantages at the microbatch level to avoid length bias or entropy collapse.

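Stage 4 names two mechanisms, asymmetric ratio clipping and microbatch-level advantage normalization, without giving formulas. The sketch below shows one common way such terms are written in PPO-style objectives. The clip bounds `eps_low` and `eps_high`, the function names, and the exact functional form are assumptions for illustration, not the MtPO implementation.

```python
def normalize_advantages(rewards, eps=1e-8):
    """Standardize rewards within one microbatch (zero mean, unit variance).

    Assumption: 'microbatch-level normalization' means standardizing over
    the samples of a single microbatch, as done here.
    """
    mean = sum(rewards) / len(rewards)
    variance = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = variance ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]


def clipped_objective(ratio, advantage, eps_low=0.2, eps_high=0.28):
    """PPO-style clipped surrogate with different lower and upper bounds.

    ratio is pi_new(y|x) / pi_old(y|x). The values of eps_low and eps_high
    are hypothetical; the card does not state the bounds used.
    """
    clipped_ratio = min(max(ratio, 1.0 - eps_low), 1.0 + eps_high)
    # Take the pessimistic (smaller) of the unclipped and clipped terms.
    return min(ratio * advantage, clipped_ratio * advantage)


# Example: one microbatch of three sampled translations
advantages = normalize_advantages([0.2, 0.5, 0.9])
ratios = [0.7, 1.0, 1.5]
objective = sum(clipped_objective(r, a) for r, a in zip(ratios, advantages)) / 3
print(objective)
```

With a wider upper bound than lower bound, updates that raise the probability of a well-rewarded sample are clipped later than updates that lower it, which is one way an asymmetric clip can help keep exploration alive.
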
## Verifiable Reward Guardrails

Reinforcement Learning with Verifiable Rewards (RLVR) combines the translation reward model with deterministic validators. During RL we sample K candidates per prompt, score them with RLVR, and keep the top-G diverse outputs for gradient updates. Each candidate is checked for:
- Length ratio safety relative to the source (default bounds 0.5-2.0 with soft penalties outside range)
- Structural token preservation for HTML, Markdown, and code blocks using lightweight parsers
- Target-language verification via a confidence-gated language ID classifier
- Code-mixing penalties that suppress unintended language drift

These verifiable rewards are added to the semantic score so bad outputs receive immediate negative credit, while high-quality candidates remain eligible for optimization.

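As an illustration of how a deterministic check can be added to a semantic score, here is a minimal sketch of the length-ratio guardrail. The 0.5-2.0 bounds come from the list above; the linear penalty shape, the `slope` parameter, the unit of length, and all function names are assumptions. The top-G step here ranks by reward only, since the card does not describe the diversity criterion.

```python
def length_ratio_penalty(source_len, target_len, low=0.5, high=2.0, slope=1.0):
    """Soft penalty (<= 0) that grows linearly outside the [low, high] band.

    Lengths may be counted in tokens or characters; the card does not say
    which, so the caller chooses.
    """
    if source_len <= 0:
        raise ValueError("source_len must be positive")
    ratio = target_len / source_len
    if ratio < low:
        return -slope * (low - ratio)
    if ratio > high:
        return -slope * (ratio - high)
    return 0.0


def total_reward(semantic_score, penalties):
    """Add the verifiable penalties to the reward-model score."""
    return semantic_score + sum(penalties)


def select_top_g(candidates, g):
    """Keep the g highest-reward (text, reward) pairs. Diversity is ignored."""
    return sorted(candidates, key=lambda pair: pair[1], reverse=True)[:g]


# Example: three candidates for a 10-unit source sentence
source_len = 10
scored = [
    ("balanced", total_reward(0.80, [length_ratio_penalty(source_len, 11)])),
    ("too long", total_reward(0.85, [length_ratio_penalty(source_len, 30)])),
    ("too short", total_reward(0.70, [length_ratio_penalty(source_len, 2)])),
]
print(select_top_g(scored, g=2))
```

The other validators (structure, language ID, code mixing) would contribute further entries to the `penalties` list in the same way.
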
## Data and Training Budget

Summary of resources and evaluation suites used during MtPO development.

- Continued pretraining: 200B tokens with adaptive sampling over English, ASEAN, Tibetan, Mongolian, Tamil, and Uyghur corpora
- Reinforcement learning: 60k steps, batch size 128, top-G candidate selection with RLVR filtering
- Reward model: Preference data spans ten error categories (accuracy, fluency, terminology, formatting, code-mixing, etc.)
- Benchmarks: FLORES-Plus (90 directions), BBH, CMMLU, HellaSwag, MMLU

## Model Details

- **Model Type**: Qwen2-based Causal Language Model
- **Language(s)**: Multilingual (English, Chinese, Khmer, Lao, Myanmar, Thai, Tibetan, Mongolian, Tamil, Malay, Indonesian, Filipino, Vietnamese, Uyghur, etc.)
- **License**: Apache 2.0
- **Finetuned from**: Qwen/Qwen2.5-7B
- **Model Size**: 7.61B parameters
- **Context Length**: 131,072 tokens

## Usage

This model is designed specifically for machine translation. It can handle a range of translation scenarios, including:

- English <-> Chinese translation
- Multilingual translation tasks
- Professional document translation
- Conversational translation

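To reuse the Quickstart prompt for other language pairs, the messages can be built by a small helper. This is a minimal sketch: `build_translation_messages` is a hypothetical helper name, and the template simply mirrors the wording of the Quickstart example. The card does not say whether another phrasing works better for particular languages.

```python
SYSTEM_PROMPT = (
    "You are a professional translator. "
    "Translate the given text accurately and naturally."
)


def build_translation_messages(text, source_lang, target_lang):
    """Build chat messages in the same shape as the Quickstart example."""
    prompt = f"Translate the following {source_lang} text to {target_lang}: {text}"
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": prompt},
    ]


# Example: an English -> Thai request
messages = build_translation_messages("Where is the train station?", "English", "Thai")
for message in messages:
    print(message["role"], "->", message["content"])
```

The returned list can be passed to `tokenizer.apply_chat_template` exactly as in the Quickstart.
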
## Evaluation

### Translation and General Benchmarks

Light-TLLM-7B is evaluated on FLORES-Plus (90 directions) and on general benchmarks (BBH, CMMLU, HellaSwag, MMLU). Translation columns report sacreBLEU and benchmark columns report zero-shot accuracy (%); higher is better for both.

| Model | Group | xx->en | en->xx | xx->xx | Avg. | BBH | CMMLU | HellaSwag | MMLU |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Gemma3-27B-IT | Multilingual chat | **36.8** | 30.7 | 22.3 | 24.7 | 55.9 | 55.9 | 55.9 | **56.0** |
| Qwen3-8B | Multilingual chat | 31.1 | 23.3 | 14.4 | 16.9 | **63.8** | 60.8 | 26.0 | 51.3 |
| Qwen2.5-7B-Instruct | Multilingual chat | 24.8 | 17.4 | 9.2 | 11.6 | 54.4 | **64.1** | **85.2** | 40.9 |
| Apertus-8B-Instruct | Multilingual chat | 32.5 | 25.7 | 15.6 | 18.3 | 49.2 | 45.3 | 64.2 | 45.2 |
| Tower-Plus-9B | Multilingual chat | 28.2 | 18.3 | 9.8 | 12.5 | 40.4 | 57.2 | 73.1 | 42.1 |
| Qwen-MT-Plus | Translation-focused | 34.0 | 29.6 | 19.6 | 22.1 | - | - | - | - |
| Seed-X-PPO-7B | Translation-focused | 25.9 | 22.6 | 10.5 | 13.3 | - | - | - | - |
| Hunyuan-MT-7B | Translation-focused | 24.6 | 23.4 | 14.8 | 16.6 | - | - | - | - |
| Light-TLLM-7B-SFT | Our models | 35.4 | 32.0 | 22.7 | 24.3 | 59.6 | 61.4 | 83.7 | 47.2 |
| **Light-TLLM-7B-RL** | Our models | 36.1 | **32.7** | **23.1** | **24.9** | 60.9 | 63.2 | **85.2** | 48.5 |

- en->xx directions gain +1.1 BLEU over the next best 7B system while preserving reasoning accuracy (+1.3 MMLU over SFT).
- Average BLEU across all FLORES-Plus directions rises to 24.9 despite the compact 7B footprint.

### Tokenizer Efficiency

Vocabulary expansion provides substantial compression on targeted scripts (higher compression ratio means fewer tokens per sentence).

| Language | Added tokens | Old compression ratio | New compression ratio | Speedup |
| --- | --- | --- | --- | --- |
| Khmer | 3712 | 0.85 | 3.49 | 4.09x |
| Lao | 3359 | 0.85 | 3.05 | 3.59x |
| Myanmar | 3226 | 0.69 | 2.87 | 4.17x |
| Thai | 2958 | 1.79 | 2.97 | 1.66x |
| Tibetan | 3920 | 0.75 | 4.03 | 5.39x |

- Khmer passages shrink from 402 tokens to 103 tokens in the running example used in the paper.
- Compression gains translate into lower latency and memory cost during decoding for low-resource scripts.

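The Speedup column is not defined in this card. One reading, assumed here, is that it is the new compression ratio divided by the old one, computed from unrounded ratios. The sketch below recomputes it from the rounded table values to show how close the two agree; `implied_speedup` and the `rows` dictionary are illustrative names.

```python
# (old ratio, new ratio, reported speedup) copied from the table above
rows = {
    "Khmer": (0.85, 3.49, 4.09),
    "Lao": (0.85, 3.05, 3.59),
    "Myanmar": (0.69, 2.87, 4.17),
    "Thai": (1.79, 2.97, 1.66),
    "Tibetan": (0.75, 4.03, 5.39),
}


def implied_speedup(old_ratio, new_ratio):
    """Token-count reduction implied by two compression ratios (assumption)."""
    return new_ratio / old_ratio


gaps = {}
for language, (old, new, reported) in rows.items():
    implied = implied_speedup(old, new)
    gaps[language] = abs(implied - reported)
    print(f"{language}: implied {implied:.2f}x, reported {reported:.2f}x")

# The Khmer running example gives a similar figure from raw token counts.
khmer_example = 402 / 103
print(f"Khmer running example: {khmer_example:.2f}x fewer tokens")
```

Under this reading every row agrees with the reported value to within a few hundredths, which is consistent with rounding of the displayed ratios.
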
### Constraint Reliability (RLVR)

RLVR introduces deterministic checks that reduce failure modes compared with general chat models and MT baselines.

| Model | Language targeting | Length control | Format preservation | Code mixing | Overall |
| --- | --- | --- | --- | --- | --- |
| **Light-TLLM-7B-RL** | **97.8** | 99.2 | **92.15** | 92.3 | **95.3** |
| Qwen2.5-7B-Instruct | 92.0 | 97.0 | 51.8 | 62.8 | 75.9 |
| Gemma3-27B-IT | 97.4 | 91.6 | 42.1 | 90.9 | 80.5 |
| Qwen-MT-Plus | 97.6 | **99.8** | 82.5 | 94.8 | 93.6 |
| Seed-X-PPO-7B | 97.6 | 79.8 | 79.0 | 90.3 | 86.6 |
| DeepSeek-V3 | 95.4 | 95.7 | 67.6 | 95.0 | 88.4 |
| Hunyuan-MT-7B | 91.8 | 90.7 | 71.1 | **96.2** | 87.4 |

- Format retention reaches 92.15 percent versus 51.8 percent for Qwen2.5-7B-Instruct, mitigating HTML or Markdown corruption.
- Language targeting stays above 97 percent while MtPO avoids verbosity by normalizing advantages at the microbatch level.
- Overall pass rate reaches 95.3 percent, surpassing Qwen2.5-7B-Instruct by 19.4 points, DeepSeek-V3 by 6.9 points, and Qwen-MT-Plus by 1.7 points under identical constraint settings.

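The margins quoted above can be reproduced directly from the table. The card does not state how the Overall column is computed; the sketch below also checks one plausible reading, assumed here, that it is the unweighted mean of the four constraint columns. The names `scores` and `mean_of_constraints` are illustrative.

```python
# (language targeting, length control, format preservation, code mixing, overall)
scores = {
    "Light-TLLM-7B-RL": (97.8, 99.2, 92.15, 92.3, 95.3),
    "Qwen2.5-7B-Instruct": (92.0, 97.0, 51.8, 62.8, 75.9),
    "Gemma3-27B-IT": (97.4, 91.6, 42.1, 90.9, 80.5),
    "Qwen-MT-Plus": (97.6, 99.8, 82.5, 94.8, 93.6),
    "Seed-X-PPO-7B": (97.6, 79.8, 79.0, 90.3, 86.6),
    "DeepSeek-V3": (95.4, 95.7, 67.6, 95.0, 88.4),
    "Hunyuan-MT-7B": (91.8, 90.7, 71.1, 96.2, 87.4),
}


def mean_of_constraints(row):
    """Unweighted mean of the four constraint columns (assumed definition)."""
    return sum(row[:4]) / 4


ours = scores["Light-TLLM-7B-RL"][4]
margins = {
    name: round(ours - row[4], 1)
    for name, row in scores.items()
    if name != "Light-TLLM-7B-RL"
}
mean_gaps = {name: abs(mean_of_constraints(row) - row[4]) for name, row in scores.items()}

for name, margin in margins.items():
    print(f"{name}: +{margin} points overall")
```

Under the assumed definition, every reported Overall value lies within 0.1 points of the column mean.
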
### Per-Language FLORES Highlights

- **English->Thai:** 34.1 BLEU, +1.5 over Qwen-MT-Plus.
- **English->Myanmar:** 12.9 BLEU with stable length control.
- **English->Filipino:** 35.4 BLEU after MtPO, combining instruction fidelity and translation quality.
- **Khmer->English:** 44.7 BLEU, reflecting gains from tokenizer expansion.
- **Vietnamese->English:** 37.6 BLEU with consistent improvements across ASEAN language pairs.

## Citation

If you find our work helpful, please cite it.

```
@inproceedings{liu2026mtpo,
  title = {Light-TLLM-7B},
  author = {Light-MT Team},
  booktitle = {International Conference on Learning Representations},
  year = {2025},
  url = {https://huggingface.co/qihoo360/Light-TLLM-7B}
}
```

## Disclaimer

This model is provided for research and educational purposes. Please ensure responsible use and compliance with applicable laws and regulations when using this model.