Initialize project; model provided by the ModelHub XC community

Model: xiaomi-research/MiLMMT-46-1B-v0.1
Source: Original Platform
This commit is contained in:
ModelHub XC
2026-05-02 18:05:12 +08:00
commit bf57402c74
11 changed files with 51593 additions and 0 deletions

37
.gitattributes vendored Normal file

@@ -0,0 +1,37 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
tokenizer.json filter=lfs diff=lfs merge=lfs -text
main.png filter=lfs diff=lfs merge=lfs -text

87
README.md Normal file

@@ -0,0 +1,87 @@
---
license: gemma
base_model:
- xiaomi-research/MiLMMT-46-1B-Pretrain
pipeline_tag: translation
library_name: transformers
---
## Model Description
MiLMMT-46-1B-v0.1 is an LLM-based translation model. It has been fine-tuned from MiLMMT-46-1B-Pretrain, a language model developed through continual pretraining of Gemma3-1B on a mix of 143 billion tokens of monolingual and parallel data across 46 languages. Please find more details in our paper: [Scaling Model and Data for Multilingual Machine Translation with Open Large Language Models](https://arxiv.org/abs/2602.11961).
- **Supported Languages**: Arabic, Azerbaijani, Bulgarian, Bengali, Catalan, Czech, Danish, German, Greek, English, Spanish, Persian, Finnish, French, Hebrew, Hindi, Croatian, Hungarian, Indonesian, Italian, Japanese, Kazakh, Khmer, Korean, Lao, Malay, Burmese, Norwegian, Dutch, Polish, Portuguese, Romanian, Russian, Slovak, Slovenian, Swedish, Tamil, Thai, Tagalog, Turkish, Urdu, Uzbek, Vietnamese, Cantonese, Chinese (Simplified), Chinese (Traditional).
- **GitHub**: Please find more details in our [GitHub repository](https://github.com/xiaomi-research/gemmax).
- **Developed by**: Xiaomi Inc.
## Model Performance
![Experimental Result](main.png)
## Translation Prompt
```text
Translate this from <source language name> to <target language name>:
<source language name>: <source language sentence>
<target language name>:
```
Please use the exact language names listed above when filling in the translation prompt.
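The template above can be filled programmatically. A minimal sketch (the `build_prompt` helper name is ours, not part of the model's API):

```python
def build_prompt(src_lang: str, tgt_lang: str, sentence: str) -> str:
    # Fills the three-line translation prompt template from the section above.
    return (
        f"Translate this from {src_lang} to {tgt_lang}:\n"
        f"{src_lang}: {sentence}\n"
        f"{tgt_lang}:"
    )

prompt = build_prompt("Chinese (Simplified)", "English", "我爱机器翻译")
print(prompt)
```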
## Run the model
#### Using vLLM:
```python
from vllm import LLM, SamplingParams

model_id = "xiaomi-research/MiLMMT-46-1B-v0.1"
model = LLM(model=model_id)

# top_k=1 with temperature=0 gives greedy (deterministic) decoding.
sampling_params = SamplingParams(top_k=1, temperature=0, max_tokens=2048)

text = "Translate this from Chinese (Simplified) to English:\nChinese (Simplified): 我爱机器翻译\nEnglish:"
outputs = model.generate(text, sampling_params)
print(outputs[0].outputs[0].text)
```
#### Using Transformers:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "xiaomi-research/MiLMMT-46-1B-v0.1"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

text = "Translate this from Chinese (Simplified) to English:\nChinese (Simplified): 我爱机器翻译\nEnglish:"
# add_special_tokens=False keeps the prompt exactly as written above.
inputs = tokenizer(text, add_special_tokens=False, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=1024)
# Note: outputs[0] contains the prompt tokens followed by the generated tokens,
# so the decoded string starts with the prompt.
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
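Because `model.generate` returns the prompt tokens followed by the new tokens, decoding `outputs[0]` directly also prints the prompt. To keep only the translation, one way is to slice off the prompt before decoding, i.e. `tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)`. A self-contained sketch of that slicing with toy token ids standing in for the real tensors:

```python
# Toy stand-ins for the token id sequences in the Transformers example:
prompt_ids = [5, 6, 7]            # inputs["input_ids"][0] (3 prompt tokens)
full_output = [5, 6, 7, 11, 12]   # outputs[0]: prompt tokens + generated tokens

# Keep only the tokens generated after the prompt.
generated = full_output[len(prompt_ids):]
print(generated)  # [11, 12]
```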
## Citation
```bibtex
@misc{shang2026scalingmodeldatamultilingual,
title={Scaling Model and Data for Multilingual Machine Translation with Open Large Language Models},
author={Yuzhe Shang and Pengzhi Gao and Wei Liu and Jian Luan and Jinsong Su},
year={2026},
eprint={2602.11961},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2602.11961},
}
```
## Limitations
MiLMMT-46 currently supports only the 46 languages listed above; strong translation performance is not guaranteed for other languages. We will continue to improve the translation quality of MiLMMT-46 and release updated models in due course.

3
added_tokens.json Normal file

@@ -0,0 +1,3 @@
{
"<image_soft_token>": 262144
}

62
config.json Normal file

@@ -0,0 +1,62 @@
{
"_sliding_window_pattern": 6,
"architectures": [
"Gemma3ForCausalLM"
],
"attention_bias": false,
"attention_dropout": 0.0,
"attn_logit_softcapping": null,
"bos_token_id": 2,
"cache_implementation": "hybrid",
"eos_token_id": 1,
"final_logit_softcapping": null,
"head_dim": 256,
"hidden_activation": "gelu_pytorch_tanh",
"hidden_size": 1152,
"initializer_range": 0.02,
"intermediate_size": 6912,
"layer_types": [
"sliding_attention",
"sliding_attention",
"sliding_attention",
"sliding_attention",
"sliding_attention",
"full_attention",
"sliding_attention",
"sliding_attention",
"sliding_attention",
"sliding_attention",
"sliding_attention",
"full_attention",
"sliding_attention",
"sliding_attention",
"sliding_attention",
"sliding_attention",
"sliding_attention",
"full_attention",
"sliding_attention",
"sliding_attention",
"sliding_attention",
"sliding_attention",
"sliding_attention",
"full_attention",
"sliding_attention",
"sliding_attention"
],
"max_position_embeddings": 32768,
"model_type": "gemma3_text",
"num_attention_heads": 4,
"num_hidden_layers": 26,
"num_key_value_heads": 1,
"pad_token_id": 0,
"query_pre_attn_scalar": 256,
"rms_norm_eps": 1e-06,
"rope_local_base_freq": 10000,
"rope_scaling": null,
"rope_theta": 1000000,
"sliding_window": 512,
"torch_dtype": "bfloat16",
"transformers_version": "4.55.2",
"use_cache": false,
"vocab_size": 262144
}
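The `layer_types` list in the config follows the `_sliding_window_pattern` of 6: within each block of six layers, five use 512-token sliding-window attention and the sixth uses full attention. A quick sketch (our reconstruction, not code from the repository) that regenerates the 26-entry list from that rule:

```python
# Rebuild layer_types from the pattern: every 6th layer is full_attention,
# the rest are sliding_attention (26 hidden layers total, as in config.json).
pattern, num_layers = 6, 26
layer_types = [
    "full_attention" if (i + 1) % pattern == 0 else "sliding_attention"
    for i in range(num_layers)
]
print(layer_types.count("full_attention"))  # 4 full-attention layers
```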

13
generation_config.json Normal file

@@ -0,0 +1,13 @@
{
"bos_token_id": 2,
"cache_implementation": "hybrid",
"do_sample": true,
"eos_token_id": [
1,
106
],
"pad_token_id": 0,
"top_k": 64,
"top_p": 0.95,
"transformers_version": "4.55.2"
}

3
main.png Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:824a4427b98127b266c2a4992f64d71d61cdae211859215be0b54f153379d516
size 310340

3
model.safetensors Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:118ceb72904a25d3760df3abc1cac684ffae0c2c633cce2f65e111d7ff12d47e
size 2603791264

33
special_tokens_map.json Normal file

@@ -0,0 +1,33 @@
{
"boi_token": "<start_of_image>",
"bos_token": {
"content": "<bos>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"eoi_token": "<end_of_image>",
"eos_token": {
"content": "<eos>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"image_token": "<image_soft_token>",
"pad_token": {
"content": "<pad>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"unk_token": {
"content": "<unk>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
}
}

3
tokenizer.json Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:4667f2089529e8e7657cfb6d1c19910ae71ff5f28aa7ab2ff2763330affad795
size 33384568

3
tokenizer.model Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1299c11d7cf632ef3b4e11937501358ada021bbdf7c47638d13c0ee982f2e79c
size 4689074

51346
tokenizer_config.json Normal file

File diff suppressed because it is too large