Initialize the project; model provided by the ModelHub XC community

Model: AIDC-AI/Marco-Mini-Global-Base
Source: Original Platform
Commit 4db26af786 by ModelHub XC on 2026-04-20 21:26:53 +08:00
27 changed files with 325610 additions and 0 deletions

35
.gitattributes vendored Normal file

@@ -0,0 +1,35 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
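These attribute rules route large binary artifacts (checkpoints, archives, tensor dumps) through Git LFS instead of storing them directly in git objects. As a rough illustration of how the glob patterns select files, here is a minimal sketch using Python's `fnmatch`; the helper and file names below are hypothetical, and real gitattributes matching is done by git itself with slightly different `**` semantics:

```python
from fnmatch import fnmatch

# A subset of the patterns from the .gitattributes above.
LFS_PATTERNS = ["*.safetensors", "*.bin", "*.gz", "*tfevents*", "saved_model/**/*"]

def routed_through_lfs(path: str) -> bool:
    """Illustrative check: does any LFS pattern match this path?

    fnmatch is only an approximation of git's attribute matching.
    """
    name = path.rsplit("/", 1)[-1]
    return any(fnmatch(name, p) or fnmatch(path, p) for p in LFS_PATTERNS)

print(routed_through_lfs("weights.safetensors"))  # True: matches *.safetensors
print(routed_through_lfs("config.json"))          # False: stored as a normal file
```

This is why the weight shards later in this commit appear as small LFS pointer files (`version`/`oid`/`size`) rather than as multi-gigabyte blobs in the diff.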

218
README.md Normal file

@@ -0,0 +1,218 @@
---
license: apache-2.0
language:
- en
- zh
- ar
- de
- es
- fr
- ko
- ja
- pt
- tr
- id
- it
- nl
- pl
- ru
- vi
- th
- he
- uk
- ms
- bn
- cs
- ur
- kk
- el
- ro
- hu
- ne
- az
- da
- sv
- "no"
- ca
- gl
- cy
- ga
- eu
- hr
- lv
- lt
- sk
- sl
- et
- fi
- sr
- bg
- fa
- mt
- hi
- mr
- gu
- pa
- ta
- te
- tl
- jv
- km
- lo
- my
- am
- sw
- yo
- ig
- zu
library_name: transformers
tags:
- moe
- mixture-of-experts
- multilingual
- upcycling
datasets:
- nvidia/Nemotron-CC-v2
- nvidia/Nemotron-Pretraining-SFT-v1
- nvidia/Nemotron-Pretraining-Specialized-v1
- nvidia/Nemotron-CC-v2.1
- allenai/dolmino-mix-1124
- nvidia/Nemotron-CC-Math-v1
- nvidia/OpenMathInstruct-2
- HuggingFaceTB/finemath
- LLM360/MegaMath
- open-thoughts/OpenThoughts3-1.2M
- opencsg/Fineweb-Edu-Chinese-V2.1
- HuggingFaceFW/fineweb-2
- allenai/dolma3_dolmino_mix-100B-1125
---
# Marco-Mini-Global-Base
**Marco-Mini-Global-Base** is an extended variant of [Marco-Mini-Base](https://huggingface.co/AIDC-AI/Marco-Mini-Base) that scales linguistic coverage from 29 to **64 languages**. It is a highly sparse Mixture-of-Experts (MoE) multilingual language model from the [Marco-MoE](https://github.com/AIDC-AI/Marco-LLM) family, developed by Alibaba International Digital Commerce. It activates only **0.86B out of 17.3B total parameters** (5% activation ratio) per token while supporting 64 languages — demonstrating that the MoE architecture enables scalable language expansion without the interference typical of dense models.
## Model Description
Marco-Mini-Global-Base shares the same architecture as Marco-Mini-Base: a decoder-only Transformer with sparse MoE layers replacing standard FFN layers, upcycled from [Qwen3-0.6B-Base](https://huggingface.co/Qwen/Qwen3-0.6B-Base) using fine-grained sub-matrix splitting combined with Drop-Upcycling.
| Configuration | Value |
|:---|:---:|
| Total Parameters | 17.3B |
| Activated Parameters | 0.86B |
| Activation Ratio | 5% |
| Num Layers | 28 |
| Model Dimension | 1024 |
| FFN Intermediate Dimension | 3072 |
| Q-Heads | 16 |
| KV-Heads | 8 |
| Head Dimension | 128 |
| Expert Dimension | 768 |
| Total Experts | 256 |
| Activated Experts | 8 |
| Tie Embeddings | True |
| Training FLOPs | $1.584 \times 10^{23}$ |
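The headline parameter counts can be roughly reproduced from the table above. The back-of-the-envelope sketch below ignores norm and bias parameters, so its totals are approximate rather than exact:

```python
# Approximate parameter count for Marco-Mini-Global-Base, derived from
# the configuration table above (norm/bias terms ignored).
d_model   = 1024
d_expert  = 768
n_layers  = 28
n_experts = 256
n_active  = 8
q_heads, kv_heads, head_dim = 16, 8, 128
vocab     = 151936  # from config.json

# Each expert is a SwiGLU FFN: gate, up and down projections.
per_expert = 3 * d_model * d_expert

# Attention projections (no biases): Q and O use q_heads*head_dim,
# K and V use kv_heads*head_dim.
attn = n_layers * (2 * d_model * q_heads * head_dim
                   + 2 * d_model * kv_heads * head_dim)

router = n_layers * d_model * n_experts
embed  = vocab * d_model  # embeddings are tied, so counted once

shared    = attn + router + embed
total     = shared + n_layers * n_experts * per_expert
activated = shared + n_layers * n_active * per_expert

print(f"total     ~ {total/1e9:.2f}B")      # ~17.25B (card rounds to 17.3B)
print(f"activated ~ {activated/1e9:.2f}B")  # ~0.87B (card reports 0.86B;
                                            # the gap is the ignored terms)
print(f"ratio     ~ {activated/total:.1%}") # ~5%
```

The expert FFNs dominate the total (about 16.9B of 17.3B parameters), while per-token compute is dominated by the shared attention/embedding weights plus only 8 of the 256 experts per layer.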
## Training Details
Marco-Mini-Global-Base branches from the Stage-2 checkpoint of Marco-Mini-Base and recalibrates the data mixtures in Stages 3 and 4 to integrate pre-training corpora for the 35 newly introduced languages. In total, it was trained on 5.5T tokens.
The four-stage curriculum follows the same structure as Marco-Mini-Base:
1. **Stage 1 (0 - 2.4T tokens): Foundational Training** — High-quality English data (Nemotron-CC-v2), reasoning and instruction data, and multilingual web/QA data for 19 languages.
2. **Stage 2 (2.4T - 4.1T tokens): Optimization & Upsampling** — Upsampled reasoning corpora, downsampled English web data, and upsampled Chinese data with learning rate decay.
3. **Stage 3 (4.1T - 5T tokens): Language Expansion** — Recalibrated data mixtures to integrate 35 new languages alongside the original 29.
4. **Stage 4 (5T - 5.5T tokens): Synthetic Data Integration** — Curated multilingual synthetic data including cultural content and synthetic regional MCQs for all 64 languages.
## Supported Languages
**Original 29 languages:** English, Chinese, Arabic, German, Spanish, French, Korean, Japanese, Portuguese, Turkish, Indonesian, Italian, Dutch, Polish, Russian, Vietnamese, Thai, Hebrew, Ukrainian, Malay, Bengali, Czech, Urdu, Kazakh, Greek, Romanian, Hungarian, Nepali, Azerbaijani
**35 newly introduced languages:** Danish, Swedish, Norwegian, Catalan, Galician, Welsh, Irish, Basque, Croatian, Latvian, Lithuanian, Slovak, Slovenian, Estonian, Finnish, Serbian, Bulgarian, Persian, Maltese, Hindi, Marathi, Gujarati, Punjabi, Tamil, Telugu, Tagalog, Javanese, Khmer, Lao, Burmese, Amharic, Swahili, Yoruba, Igbo, Zulu
## Evaluation
We compare Marco-Mini-Global-Base against strong multilingual baselines: **Gemma3-4B** (4B activated), **Tiny-Aya-3.35B** (3.35B activated), and **Qwen3-4B** (4B activated). All benchmarks are evaluated across the full 64-language set. Marco-Mini-Global uses only **0.86B activated parameters** while preserving robust English proficiency (63.6 vs. 63.7 for the 29-language Marco-Mini-Base) and increasing the multilingual advantage over Qwen3-4B from +2.6% to +3.6%.
### English
| Benchmark | # Shots | Gemma3-4B | Tiny-Aya-3.35B | Qwen3-4B | **Marco-Mini-Global** |
|:---|:---:|:---:|:---:|:---:|:---:|
| MMLU _(Acc)_ | 5-shot | 61.1 | 58.6 | **75.2** | 72.9 |
| MMLU-Redux _(Acc)_ | 0-shot | 57.7 | 51.7 | **71.3** | 68.9 |
| MMLU-Pro _(Acc)_ | 5-shot | 28.8 | 26.9 | **45.9** | 44.5 |
| AGIEval _(Acc)_ | 0-shot | 32.6 | 29.0 | **44.0** | 41.0 |
| BBH _(EM)_ | 3-shot | 52.2 | 46.8 | **72.3** | 65.0 |
| ARC-Easy _(Acc)_ | 0-shot | **82.6** | 76.5 | 75.0 | 82.4 |
| ARC-Challenge _(Acc)_ | 0-shot | 54.1 | 47.4 | 49.9 | **57.0** |
| HellaSwag _(Acc)_ | 0-shot | 76.7 | 71.0 | 74.4 | **77.2** |
| WinoGrande _(Acc)_ | 0-shot | **61.4** | 56.6 | 59.6 | 58.3 |
| BoolQ _(Acc)_ | 0-shot | **76.6** | 74.6 | 74.2 | 75.6 |
| CommonsenseQA _(Acc)_ | 0-shot | 61.1 | 60.4 | 52.9 | **61.2** |
| OpenBookQA _(Acc)_ | 0-shot | 42.6 | 40.4 | 42.6 | **45.0** |
| PIQA _(Acc)_ | 0-shot | 80.3 | 76.9 | 77.4 | **80.7** |
| SIQA _(Acc)_ | 0-shot | 50.4 | 49.9 | **53.0** | 48.4 |
| GSM8K _(EM)_ | 5-shot | 39.3 | 58.0 | **81.7** | 76.4 |
| **Average** | - | 57.2 | 55.5 | 63.3 | **63.6** |
### Multilingual — General
| Benchmark | # Shots | Gemma3-4B | Tiny-Aya-3.35B | Qwen3-4B | **Marco-Mini-Global** |
|:---|:---:|:---:|:---:|:---:|:---:|
| GlobalMMLU _(Acc)_ | 5-shot | 49.1 | 48.4 | 57.8 | **60.9** |
| MMMLU _(Acc)_ | 0-shot | 45.0 | 42.8 | 54.8 | **58.2** |
| MMLU-ProX-Lite _(Acc)_ | 5-shot | 23.3 | 23.5 | 35.6 | **36.2** |
| BELEBELE _(Acc)_ | 0-shot | 62.3 | 62.5 | 74.0 | **76.0** |
| mHellaSwag _(Acc_norm)_ | 0-shot | 51.9 | 50.3 | 48.5 | **54.4** |
| mARC-Challenge _(Acc_norm)_ | 0-shot | 39.3 | 35.7 | 39.3 | **41.2** |
| FLORES-200 En→Xx _(BLEU)_ | 5-shot | 27.9 | 25.6 | 25.8 | **29.5** |
| FLORES-200 Xx→En _(BLEU)_ | 5-shot | 39.2 | 37.2 | 33.4 | **40.2** |
| WMT24++ En→Xx _(BLEU)_ | 5-shot | **26.0** | 24.4 | 19.6 | **26.0** |
| WMT24++ Xx→En _(BLEU)_ | 5-shot | 34.4 | 32.9 | 31.2 | **34.5** |
| MGSM _(EM)_ | 8-shot | 35.7 | 36.6 | 69.1 | **71.7** |
| **Average** | - | 39.5 | 37.3 | 44.5 | **48.1** |
### Multilingual — Cultural & Regional
| Benchmark | # Shots | Gemma3-4B | Tiny-Aya-3.35B | Qwen3-4B | **Marco-Mini-Global** |
|:---|:---:|:---:|:---:|:---:|:---:|
| INCLUDE _(Acc)_ | 5-shot | 52.3 | 53.5 | 60.0 | **61.1** |
| Global-PIQA _(Acc_norm)_ | 0-shot | 67.8 | 66.7 | 61.8 | **70.2** |
| CMMLU _(Acc)_ | 5-shot | 50.2 | 58.8 | **76.2** | 67.9 |
| C-Eval _(Acc)_ | 5-shot | 48.5 | 57.6 | **76.6** | 66.2 |
| ArabicMMLU _(Acc)_ | 3-shot | 61.6 | 63.2 | **67.0** | 66.6 |
| TurkishMMLU _(Acc)_ | 5-shot | 43.7 | 45.2 | 60.6 | **63.1** |
| GreekMMLU _(Acc)_ | 5-shot | 63.4 | 66.3 | 69.4 | **70.4** |
| KazakhMMLU _(Acc)_ | 5-shot | 52.1 | 47.1 | **62.3** | 61.8 |
| IndoMMLU _(Acc)_ | 0-shot | 48.5 | 52.0 | **60.1** | 59.5 |
| IndoCareer _(Acc)_ | 3-shot | 53.4 | 56.6 | 61.5 | **61.8** |
| IndoCulture _(Acc)_ | 0-shot | 59.1 | 58.5 | 61.1 | **62.5** |
| **Average** | - | 54.6 | 56.9 | **65.1** | 64.7 |
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "AIDC-AI/Marco-Mini-Global-Base"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

# This is a base (non-chat) model: prompt with plain text and let it continue.
input_text = "The capital of France is"
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
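Although only 0.86B parameters are activated per token, all 17.3B must be resident in memory at inference time, which is why `device_map="auto"` is useful for sharding across devices. A rough weights-only sizing sketch (assumed byte widths per precision, not measured figures; KV cache and activations are extra):

```python
# Rough memory footprint of the full 17.3B-parameter checkpoint at
# common inference precisions. Weights only; assumed byte widths.
TOTAL_PARAMS = 17.3e9

for dtype, bytes_per_param in [("float32", 4), ("bfloat16", 2), ("int8", 1)]:
    gb = TOTAL_PARAMS * bytes_per_param / 1024**3
    print(f"{dtype:>8}: ~{gb:.0f} GiB")
```

In float32 (the dtype recorded in `config.json`) the weights alone are on the order of 64 GiB, so loading in a 16-bit dtype roughly halves the footprint.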
## Citation
```bibtex
@article{marco-moe,
  title={Marco-MoE: Open Multilingual Mixture-of-Expert Language Models with Efficient Upcycling},
  author={Fan Jiang and Yu Zhao and Chenyang Lyu and Tianqi Shi and Yichao Du and Feihu Jiang and Longyue Wang and Weihua Luo},
  year={2026}
}
```
## License
This model is released under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0).

40
config.json Normal file

@@ -0,0 +1,40 @@
{
"architectures": [
"Qwen3MoeForCausalLM"
],
"attention_bias": false,
"attention_dropout": 0.0,
"bos_token_id": 151643,
"decoder_sparse_step": 1,
"dtype": "float32",
"eos_token_id": 151643,
"head_dim": 128,
"hidden_act": "silu",
"hidden_size": 1024,
"initializer_range": 0.02,
"intermediate_size": 3072,
"max_position_embeddings": 32768,
"max_window_layers": 28,
"mlp_only_layers": [],
"model_type": "qwen3_moe",
"moe_intermediate_size": 768,
"norm_topk_prob": true,
"num_attention_heads": 16,
"num_experts": 256,
"num_experts_per_tok": 8,
"num_hidden_layers": 28,
"num_key_value_heads": 8,
"output_router_logits": false,
"qkv_bias": false,
"rms_norm_eps": 1e-06,
"rope_scaling": null,
"rope_theta": 1000000.0,
"router_aux_loss_coef": 0.001,
"sliding_window": null,
"tie_word_embeddings": true,
"transformers_version": "4.57.1",
"use_cache": true,
"use_qk_norm": true,
"use_sliding_window": false,
"vocab_size": 151936
}

1
configuration.json Normal file

@@ -0,0 +1 @@
{"framework":"Pytorch","task":"text-generation"}

6
generation_config.json Normal file

@@ -0,0 +1,6 @@
{
"_from_model_config": true,
"bos_token_id": 151643,
"eos_token_id": 151643,
"transformers_version": "4.57.1"
}


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a56265e417310d4b4eb8689581370b37bfda16d46ea2f5acef84636d04d27de0
size 2000033560


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:50751c07cc20d141dfad37f49442892f9337683d34e7a21a337876e35a17b1af
size 1998751296


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:75b1ebfd395bbb0befa54f171e46500864980d8db7cb1e02363bdb013c6c1106
size 1999795072


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:3da59a9a95d45e15ceb951bb07cedbb64cb3903cbdf2bffac6ebaca558736d83
size 1998751072


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ae49b25d1ee9f56bd7ece4b4499cb101981a6074d054fe8f55733d294771ac29
size 1999795304


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:2cbb5f8a2f005933f1a13509ae3e0efd91ca60f7ad51b2880bd1e7312a2cd46c
size 1998750992


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:b6b50896a5e3b5c35ccf33a41ebbc1cb643a327ef5487fb909adffdc442edbc4
size 1998752080


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:0290cab2d5376286aecd6a59dfd54bb26a7b2e60682d25e55bd0bf7a4734bb5e
size 1999796504


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:712cdebf5ea3170f80cb2b1e44d4a7d2626f19f4655a4a460187d3ac521fd24d
size 1998752264


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:fb54346e458ca93cc6fbd1f5395d8e9f242359576c95842871d8423bb550cc44
size 1998752488


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:22765eacf9b7a25ce4994e76b481ef9290aa869c92acea3053b4e5051d5bbbba
size 1999796440


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:27c4789cca6d43ab427999ee72969b7e9b275712ebf1e13a4f758406ff657f9d
size 1998752272


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:0d14f52140748dc524ee6c4f7b2ab62303bed6b91c86e9a13c2705c9b8c44257
size 1998752568


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:94e92175329d183dd163a17beff47a2f14573731765426ef009c2534c881fe39
size 1999796352


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f9bd5d0bea6a979272fb34c33a5757d9014b1f7c2716373ed094964625ca416d
size 1998752344


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:3051d596eb2669c06b6658f1431635c43274f4e17d30d4edc7b8a3e08941aaad
size 1999796584


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:949bf35213acb69396ea905a320a2e6193dfeb5da693b8695da0149714ea18fd
size 1998752264


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:67b24186cd7e42f5054731a2340b578264b42614a752c369f7ccd22884287663
size 828684352

21766
model.safetensors.index.json Normal file

File diff suppressed because it is too large

303282
tokenizer.json Normal file

File diff suppressed because it is too large

207
tokenizer_config.json Normal file

@@ -0,0 +1,207 @@
{
"add_prefix_space": false,
"added_tokens_decoder": {
"151643": {
"content": "<|endoftext|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151644": {
"content": "<|im_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151645": {
"content": "<|im_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151646": {
"content": "<|object_ref_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151647": {
"content": "<|object_ref_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151648": {
"content": "<|box_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151649": {
"content": "<|box_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151650": {
"content": "<|quad_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151651": {
"content": "<|quad_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151652": {
"content": "<|vision_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151653": {
"content": "<|vision_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151654": {
"content": "<|vision_pad|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151655": {
"content": "<|image_pad|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151656": {
"content": "<|video_pad|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151657": {
"content": "<tool_call>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151658": {
"content": "</tool_call>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151659": {
"content": "<|fim_prefix|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151660": {
"content": "<|fim_middle|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151661": {
"content": "<|fim_suffix|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151662": {
"content": "<|fim_pad|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151663": {
"content": "<|repo_name|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151664": {
"content": "<|file_sep|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
}
},
"additional_special_tokens": [
"<|im_start|>",
"<|im_end|>",
"<|object_ref_start|>",
"<|object_ref_end|>",
"<|box_start|>",
"<|box_end|>",
"<|quad_start|>",
"<|quad_end|>",
"<|vision_start|>",
"<|vision_end|>",
"<|vision_pad|>",
"<|image_pad|>",
"<|video_pad|>"
],
"bos_token": null,
"chat_template": "{%- if tools %}\n {{- '<|im_start|>system\\n' }}\n {%- if messages[0].role == 'system' %}\n {{- messages[0].content + '\\n\\n' }}\n {%- endif %}\n {{- \"# Tools\\n\\nYou may call one or more functions to assist with the user query.\\n\\nYou are provided with function signatures within <tools></tools> XML tags:\\n<tools>\" }}\n {%- for tool in tools %}\n {{- \"\\n\" }}\n {{- tool | tojson }}\n {%- endfor %}\n {{- \"\\n</tools>\\n\\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\\n<tool_call>\\n{\\\"name\\\": <function-name>, \\\"arguments\\\": <args-json-object>}\\n</tool_call><|im_end|>\\n\" }}\n{%- else %}\n {%- if messages[0].role == 'system' %}\n {{- '<|im_start|>system\\n' + messages[0].content + '<|im_end|>\\n' }}\n {%- endif %}\n{%- endif %}\n{%- for message in messages %}\n {%- if message.content is string %}\n {%- set content = message.content %}\n {%- else %}\n {%- set content = '' %}\n {%- endif %}\n {%- if (message.role == \"user\") or (message.role == \"system\" and not loop.first) %}\n {{- '<|im_start|>' + message.role + '\\n' + content + '<|im_end|>' + '\\n' }}\n {%- elif message.role == \"assistant\" %}\n {{- '<|im_start|>' + message.role + '\\n' + content }}\n {%- if message.tool_calls %}\n {%- for tool_call in message.tool_calls %}\n {%- if (loop.first and content) or (not loop.first) %}\n {{- '\\n' }}\n {%- endif %}\n {%- if tool_call.function %}\n {%- set tool_call = tool_call.function %}\n {%- endif %}\n {{- '<tool_call>\\n{\"name\": \"' }}\n {{- tool_call.name }}\n {{- '\", \"arguments\": ' }}\n {%- if tool_call.arguments is string %}\n {{- tool_call.arguments }}\n {%- else %}\n {{- tool_call.arguments | tojson }}\n {%- endif %}\n {{- '}\\n</tool_call>' }}\n {%- endfor %}\n {%- endif %}\n {{- '<|im_end|>\\n' }}\n {%- elif message.role == \"tool\" %}\n {%- if loop.first or (messages[loop.index0 - 1].role != \"tool\") %}\n {{- '<|im_start|>user' }}\n {%- endif %}\n {{- '\\n<tool_response>\\n' }}\n {{- content }}\n {{- '\\n</tool_response>' }}\n {%- if loop.last or (messages[loop.index0 + 1].role != \"tool\") %}\n {{- '<|im_end|>\\n' }}\n {%- endif %}\n {%- endif %}\n{%- endfor %}\n{%- if add_generation_prompt %}\n {{- '<|im_start|>assistant\\n' }}\n{%- endif %}",
"clean_up_tokenization_spaces": false,
"eos_token": "<|im_end|>",
"errors": "replace",
"model_max_length": 131072,
"pad_token": "<|endoftext|>",
"split_special_tokens": false,
"tokenizer_class": "Qwen2Tokenizer",
"unk_token": null,
"add_bos_token": false
}
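The `chat_template` in the tokenizer config implements the ChatML convention (`<|im_start|>role ... <|im_end|>` turns). As a rough sketch of what the plain-message path of that template produces, here is a simplified renderer; it is illustrative only and omits all of the tool-calling branches, which in practice you would get via `tokenizer.apply_chat_template`:

```python
def render_chatml(messages, add_generation_prompt=True):
    """Simplified ChatML rendering mirroring the plain-message path of
    the chat_template above (tool-calling branches omitted)."""
    out = []
    for m in messages:
        out.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Open an assistant turn for the model to complete.
        out.append("<|im_start|>assistant\n")
    return "".join(out)

prompt = render_chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Bonjour!"},
])
print(prompt)
```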

1
vocab.json Normal file

File diff suppressed because one or more lines are too long