Initialize project; model provided by the ModelHub XC community
Model: AIDC-AI/Marco-Mini-Global-Base  Source: Original Platform

.gitattributes (vendored, 35 lines, new file)
@@ -0,0 +1,35 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
README.md (218 lines, new file)
@@ -0,0 +1,218 @@
---
license: apache-2.0
language:
- en
- zh
- ar
- de
- es
- fr
- ko
- ja
- pt
- tr
- id
- it
- nl
- pl
- ru
- vi
- th
- he
- uk
- ms
- bn
- cs
- ur
- kk
- el
- ro
- hu
- ne
- az
- da
- sv
- "no"
- ca
- gl
- cy
- ga
- eu
- hr
- lv
- lt
- sk
- sl
- et
- fi
- sr
- bg
- fa
- mt
- hi
- mr
- gu
- pa
- ta
- te
- tl
- jv
- km
- lo
- my
- am
- sw
- yo
- ig
- zu
library_name: transformers
tags:
- moe
- mixture-of-experts
- multilingual
- upcycling
datasets:
- nvidia/Nemotron-CC-v2
- nvidia/Nemotron-Pretraining-SFT-v1
- nvidia/Nemotron-Pretraining-Specialized-v1
- nvidia/Nemotron-CC-v2.1
- allenai/dolmino-mix-1124
- nvidia/Nemotron-CC-Math-v1
- nvidia/OpenMathInstruct-2
- HuggingFaceTB/finemath
- LLM360/MegaMath
- open-thoughts/OpenThoughts3-1.2M
- opencsg/Fineweb-Edu-Chinese-V2.1
- HuggingFaceFW/fineweb-2
- allenai/dolma3_dolmino_mix-100B-1125
---

# Marco-Mini-Global-Base

**Marco-Mini-Global-Base** is an extended variant of [Marco-Mini-Base](https://huggingface.co/AIDC-AI/Marco-Mini-Base) that scales linguistic coverage from 29 to **64 languages**. It is a highly sparse Mixture-of-Experts (MoE) multilingual language model from the [Marco-MoE](https://github.com/AIDC-AI/Marco-LLM) family, developed by Alibaba International Digital Commerce. It activates only **0.86B out of 17.3B total parameters** (5% activation ratio) per token while supporting 64 languages — demonstrating that the MoE architecture enables scalable language expansion without the interference typical of dense models.

## Model Description

Marco-Mini-Global shares the same architecture as Marco-Mini-Base: a decoder-only Transformer with sparse MoE layers replacing standard FFN layers, upcycled from [Qwen3-0.6B-Base](https://huggingface.co/Qwen/Qwen3-0.6B-Base) using fine-grained sub-matrix splitting combined with Drop-Upcycling.

| Configuration | Value |
|:---|:---:|
| Total Parameters | 17.3B |
| Activated Parameters | 0.86B |
| Activation Ratio | 5% |
| Num Layers | 28 |
| Model Dimension | 1024 |
| FFN Intermediate Dimension | 3072 |
| Q-Heads | 16 |
| KV-Heads | 8 |
| Head Dimension | 128 |
| Expert Dimension | 768 |
| Total Experts | 256 |
| Activated Experts | 8 |
| Tie Embeddings | True |
| Training FLOPs | $1.584 \times 10^{23}$ |
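The headline numbers in this table can be reproduced from the values in `config.json` with some back-of-the-envelope arithmetic. The sketch below is our own approximation (it ignores the small RMSNorm and QK-norm weights), not an official breakdown:

```python
# Approximate parameter count for Marco-Mini-Global-Base, derived
# from config.json values; norm weights are ignored as negligible.
hidden, moe_inter = 1024, 768          # hidden_size, moe_intermediate_size
n_experts, topk, layers = 256, 8, 28   # num_experts, num_experts_per_tok, num_hidden_layers
q_heads, kv_heads, head_dim = 16, 8, 128
vocab = 151936                         # embeddings are tied, so counted once

expert = 3 * hidden * moe_inter        # gate/up/down projections per expert
attn = hidden * (q_heads + 2 * kv_heads) * head_dim + q_heads * head_dim * hidden
router = hidden * n_experts            # one routing matrix per MoE layer

total = layers * (n_experts * expert + attn + router) + vocab * hidden
active = layers * (topk * expert + attn + router) + vocab * hidden
print(f"total ~ {total / 1e9:.2f}B, activated ~ {active / 1e9:.2f}B "
      f"({active / total:.1%} activation ratio)")
```

This lands within rounding distance of the reported 17.3B total / 0.86B activated / 5% figures.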
## Training Details

Marco-Mini-Global-Base branches from the Stage-2 checkpoint of Marco-Mini-Base and recalibrates the data mixtures in Stages 3 and 4 to integrate pre-training corpora for 35 newly introduced languages. In total it was trained on 5.5T tokens.

The four-stage curriculum follows the same structure as Marco-Mini-Base:

1. **Stage 1 (0 - 2.4T tokens): Foundational Training** — High-quality English data (Nemotron-CC-v2), reasoning and instruction data, and multilingual web/QA data for 19 languages.
2. **Stage 2 (2.4T - 4.1T tokens): Optimization & Upsampling** — Upsampled reasoning corpora, downsampled English web data, and upsampled Chinese data with learning rate decay.
3. **Stage 3 (4.1T - 5T tokens): Language Expansion** — Recalibrated data mixtures to integrate 35 new languages alongside the original 29.
4. **Stage 4 (5T - 5.5T tokens): Synthetic Data Integration** — Curated multilingual synthetic data including cultural content and synthetic regional MCQs for all 64 languages.

## Supported Languages

**Original 29 languages:** English, Chinese, Arabic, German, Spanish, French, Korean, Japanese, Portuguese, Turkish, Indonesian, Italian, Dutch, Polish, Russian, Vietnamese, Thai, Hebrew, Ukrainian, Malay, Bengali, Czech, Urdu, Kazakh, Greek, Romanian, Hungarian, Nepali, Azerbaijani

**35 newly introduced languages:** Danish, Swedish, Norwegian, Catalan, Galician, Welsh, Irish, Basque, Croatian, Latvian, Lithuanian, Slovak, Slovenian, Estonian, Finnish, Serbian, Bulgarian, Persian, Maltese, Hindi, Marathi, Gujarati, Punjabi, Tamil, Telugu, Tagalog, Javanese, Khmer, Lao, Burmese, Amharic, Swahili, Yoruba, Igbo, Zulu
## Evaluation

We compare Marco-Mini-Global-Base against strong multilingual baselines: **Gemma3-4B** (4B activated), **Tiny-Aya-3.35B** (3.35B activated), and **Qwen3-4B** (4B activated). All benchmarks are evaluated across the full 64-language set. Marco-Mini-Global uses only **0.86B activated parameters** while preserving robust English proficiency (63.6 vs. 63.7 for the 29-language Marco-Mini) and increasing the multilingual advantage over Qwen3-4B from +2.6% to +3.6%.

### English

| Benchmark | # Shots | Gemma3-4B | Tiny-Aya-3.35B | Qwen3-4B | **Marco-Mini-Global** |
|:---|:---:|:---:|:---:|:---:|:---:|
| MMLU _(Acc)_ | 5-shot | 61.1 | 58.6 | **75.2** | 72.9 |
| MMLU-Redux _(Acc)_ | 0-shot | 57.7 | 51.7 | **71.3** | 68.9 |
| MMLU-Pro _(Acc)_ | 5-shot | 28.8 | 26.9 | **45.9** | 44.5 |
| AGIEval _(Acc)_ | 0-shot | 32.6 | 29.0 | **44.0** | 41.0 |
| BBH _(EM)_ | 3-shot | 52.2 | 46.8 | **72.3** | 65.0 |
| ARC-Easy _(Acc)_ | 0-shot | **82.6** | 76.5 | 75.0 | 82.4 |
| ARC-Challenge _(Acc)_ | 0-shot | 54.1 | 47.4 | 49.9 | **57.0** |
| HellaSwag _(Acc)_ | 0-shot | 76.7 | 71.0 | 74.4 | **77.2** |
| WinoGrande _(Acc)_ | 0-shot | **61.4** | 56.6 | 59.6 | 58.3 |
| BoolQ _(Acc)_ | 0-shot | **76.6** | 74.6 | 74.2 | 75.6 |
| CommonsenseQA _(Acc)_ | 0-shot | 61.1 | 60.4 | 52.9 | **61.2** |
| OpenBookQA _(Acc)_ | 0-shot | 42.6 | 40.4 | 42.6 | **45.0** |
| PIQA _(Acc)_ | 0-shot | 80.3 | 76.9 | 77.4 | **80.7** |
| SIQA _(Acc)_ | 0-shot | 50.4 | 49.9 | **53.0** | 48.4 |
| GSM8K _(EM)_ | 5-shot | 39.3 | 58.0 | **81.7** | 76.4 |
| **Average** | - | 57.2 | 55.5 | 63.3 | **63.6** |

### Multilingual — General

| Benchmark | # Shots | Gemma3-4B | Tiny-Aya-3.35B | Qwen3-4B | **Marco-Mini-Global** |
|:---|:---:|:---:|:---:|:---:|:---:|
| GlobalMMLU _(Acc)_ | 5-shot | 49.1 | 48.4 | 57.8 | **60.9** |
| MMMLU _(Acc)_ | 0-shot | 45.0 | 42.8 | 54.8 | **58.2** |
| MMLU-ProX-Lite _(Acc)_ | 5-shot | 23.3 | 23.5 | 35.6 | **36.2** |
| BELEBELE _(Acc)_ | 0-shot | 62.3 | 62.5 | 74.0 | **76.0** |
| mHellaSwag _(Acc_norm)_ | 0-shot | 51.9 | 50.3 | 48.5 | **54.4** |
| mARC-Challenge _(Acc_norm)_ | 0-shot | 39.3 | 35.7 | 39.3 | **41.2** |
| FLORES-200 En→Xx _(BLEU)_ | 5-shot | 27.9 | 25.6 | 25.8 | **29.5** |
| FLORES-200 Xx→En _(BLEU)_ | 5-shot | 39.2 | 37.2 | 33.4 | **40.2** |
| WMT24++ En→Xx _(BLEU)_ | 5-shot | **26.0** | 24.4 | 19.6 | **26.0** |
| WMT24++ Xx→En _(BLEU)_ | 5-shot | 34.4 | 32.9 | 31.2 | **34.5** |
| MGSM _(EM)_ | 8-shot | 35.7 | 36.6 | 69.1 | **71.7** |
| **Average** | - | 39.5 | 37.3 | 44.5 | **48.1** |

### Multilingual — Cultural & Regional

| Benchmark | # Shots | Gemma3-4B | Tiny-Aya-3.35B | Qwen3-4B | **Marco-Mini-Global** |
|:---|:---:|:---:|:---:|:---:|:---:|
| INCLUDE _(Acc)_ | 5-shot | 52.3 | 53.5 | 60.0 | **61.1** |
| Global-PIQA _(Acc_norm)_ | 0-shot | 67.8 | 66.7 | 61.8 | **70.2** |
| CMMLU _(Acc)_ | 5-shot | 50.2 | 58.8 | **76.2** | 67.9 |
| C-Eval _(Acc)_ | 5-shot | 48.5 | 57.6 | **76.6** | 66.2 |
| ArabicMMLU _(Acc)_ | 3-shot | 61.6 | 63.2 | **67.0** | 66.6 |
| TurkishMMLU _(Acc)_ | 5-shot | 43.7 | 45.2 | 60.6 | **63.1** |
| GreekMMLU _(Acc)_ | 5-shot | 63.4 | 66.3 | 69.4 | **70.4** |
| KazakhMMLU _(Acc)_ | 5-shot | 52.1 | 47.1 | **62.3** | 61.8 |
| IndoMMLU _(Acc)_ | 0-shot | 48.5 | 52.0 | **60.1** | 59.5 |
| IndoCareer _(Acc)_ | 3-shot | 53.4 | 56.6 | 61.5 | **61.8** |
| IndoCulture _(Acc)_ | 0-shot | 59.1 | 58.5 | 61.1 | **62.5** |
| **Average** | - | 54.6 | 56.9 | **65.1** | 64.7 |
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "AIDC-AI/Marco-Mini-Global-Base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

input_text = "The capital of France is"
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
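At inference time, each token is routed to `num_experts_per_tok = 8` of the 256 experts, with the selected weights renormalized (`norm_topk_prob: true` in `config.json`). As a rough illustration of that routing rule — our own plain-Python sketch, not the model's actual implementation:

```python
import math

def route_topk(logits, k=8):
    # Softmax over all expert logits, keep the k largest, then
    # renormalize the kept weights so they sum to 1 (the
    # norm_topk_prob=true behaviour described in config.json).
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    probs = [e / z for e in exps]
    top = sorted(range(len(logits)), key=probs.__getitem__, reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return {i: probs[i] / norm for i in top}

# One token's router logits over 256 experts (toy values).
weights = route_topk([0.01 * i for i in range(256)], k=8)
```

In the real model this selection happens independently per token and per MoE layer, which is what keeps the activated parameter count at roughly 5% of the total.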

## Citation

```bibtex
@article{marco-moe,
  title={Marco-MoE: Open Multilingual Mixture-of-Expert Language Models with Efficient Upcycling},
  author={Fan Jiang and Yu Zhao and Chenyang Lyu and Tianqi Shi and Yichao Du and Feihu Jiang and Longyue Wang and Weihua Luo},
  year={2026}
}
```

## License

This model is released under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0).
config.json (40 lines, new file)
@@ -0,0 +1,40 @@
{
  "architectures": [
    "Qwen3MoeForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "decoder_sparse_step": 1,
  "dtype": "float32",
  "eos_token_id": 151643,
  "head_dim": 128,
  "hidden_act": "silu",
  "hidden_size": 1024,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "max_position_embeddings": 32768,
  "max_window_layers": 28,
  "mlp_only_layers": [],
  "model_type": "qwen3_moe",
  "moe_intermediate_size": 768,
  "norm_topk_prob": true,
  "num_attention_heads": 16,
  "num_experts": 256,
  "num_experts_per_tok": 8,
  "num_hidden_layers": 28,
  "num_key_value_heads": 8,
  "output_router_logits": false,
  "qkv_bias": false,
  "rms_norm_eps": 1e-06,
  "rope_scaling": null,
  "rope_theta": 1000000.0,
  "router_aux_loss_coef": 0.001,
  "sliding_window": null,
  "tie_word_embeddings": true,
  "transformers_version": "4.57.1",
  "use_cache": true,
  "use_qk_norm": true,
  "use_sliding_window": false,
  "vocab_size": 151936
}
configuration.json (1 line, new file)
@@ -0,0 +1 @@
{"framework":"Pytorch","task":"text-generation"}
generation_config.json (6 lines, new file)
@@ -0,0 +1,6 @@
{
  "_from_model_config": true,
  "bos_token_id": 151643,
  "eos_token_id": 151643,
  "transformers_version": "4.57.1"
}
model-00001-of-00018.safetensors (LFS pointer, new file)
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a56265e417310d4b4eb8689581370b37bfda16d46ea2f5acef84636d04d27de0
size 2000033560

model-00002-of-00018.safetensors (LFS pointer, new file)
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:50751c07cc20d141dfad37f49442892f9337683d34e7a21a337876e35a17b1af
size 1998751296

model-00003-of-00018.safetensors (LFS pointer, new file)
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:75b1ebfd395bbb0befa54f171e46500864980d8db7cb1e02363bdb013c6c1106
size 1999795072

model-00004-of-00018.safetensors (LFS pointer, new file)
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:3da59a9a95d45e15ceb951bb07cedbb64cb3903cbdf2bffac6ebaca558736d83
size 1998751072

model-00005-of-00018.safetensors (LFS pointer, new file)
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ae49b25d1ee9f56bd7ece4b4499cb101981a6074d054fe8f55733d294771ac29
size 1999795304

model-00006-of-00018.safetensors (LFS pointer, new file)
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:2cbb5f8a2f005933f1a13509ae3e0efd91ca60f7ad51b2880bd1e7312a2cd46c
size 1998750992

model-00007-of-00018.safetensors (LFS pointer, new file)
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:b6b50896a5e3b5c35ccf33a41ebbc1cb643a327ef5487fb909adffdc442edbc4
size 1998752080

model-00008-of-00018.safetensors (LFS pointer, new file)
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:0290cab2d5376286aecd6a59dfd54bb26a7b2e60682d25e55bd0bf7a4734bb5e
size 1999796504

model-00009-of-00018.safetensors (LFS pointer, new file)
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:712cdebf5ea3170f80cb2b1e44d4a7d2626f19f4655a4a460187d3ac521fd24d
size 1998752264

model-00010-of-00018.safetensors (LFS pointer, new file)
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:fb54346e458ca93cc6fbd1f5395d8e9f242359576c95842871d8423bb550cc44
size 1998752488

model-00011-of-00018.safetensors (LFS pointer, new file)
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:22765eacf9b7a25ce4994e76b481ef9290aa869c92acea3053b4e5051d5bbbba
size 1999796440

model-00012-of-00018.safetensors (LFS pointer, new file)
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:27c4789cca6d43ab427999ee72969b7e9b275712ebf1e13a4f758406ff657f9d
size 1998752272

model-00013-of-00018.safetensors (LFS pointer, new file)
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:0d14f52140748dc524ee6c4f7b2ab62303bed6b91c86e9a13c2705c9b8c44257
size 1998752568

model-00014-of-00018.safetensors (LFS pointer, new file)
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:94e92175329d183dd163a17beff47a2f14573731765426ef009c2534c881fe39
size 1999796352

model-00015-of-00018.safetensors (LFS pointer, new file)
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f9bd5d0bea6a979272fb34c33a5757d9014b1f7c2716373ed094964625ca416d
size 1998752344

model-00016-of-00018.safetensors (LFS pointer, new file)
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:3051d596eb2669c06b6658f1431635c43274f4e17d30d4edc7b8a3e08941aaad
size 1999796584

model-00017-of-00018.safetensors (LFS pointer, new file)
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:949bf35213acb69396ea905a320a2e6193dfeb5da693b8695da0149714ea18fd
size 1998752264

model-00018-of-00018.safetensors (LFS pointer, new file)
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:67b24186cd7e42f5054731a2340b578264b42614a752c369f7ccd22884287663
size 828684352
model.safetensors.index.json (21766 lines, new file; diff suppressed: too large)
tokenizer.json (303282 lines, new file; diff suppressed: too large)
tokenizer_config.json (207 lines, new file)
@@ -0,0 +1,207 @@
{
  "add_prefix_space": false,
  "added_tokens_decoder": {
    "151643": {"content": "<|endoftext|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true},
    "151644": {"content": "<|im_start|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true},
    "151645": {"content": "<|im_end|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true},
    "151646": {"content": "<|object_ref_start|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true},
    "151647": {"content": "<|object_ref_end|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true},
    "151648": {"content": "<|box_start|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true},
    "151649": {"content": "<|box_end|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true},
    "151650": {"content": "<|quad_start|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true},
    "151651": {"content": "<|quad_end|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true},
    "151652": {"content": "<|vision_start|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true},
    "151653": {"content": "<|vision_end|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true},
    "151654": {"content": "<|vision_pad|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true},
    "151655": {"content": "<|image_pad|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true},
    "151656": {"content": "<|video_pad|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true},
    "151657": {"content": "<tool_call>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": false},
    "151658": {"content": "</tool_call>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": false},
    "151659": {"content": "<|fim_prefix|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": false},
    "151660": {"content": "<|fim_middle|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": false},
    "151661": {"content": "<|fim_suffix|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": false},
    "151662": {"content": "<|fim_pad|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": false},
    "151663": {"content": "<|repo_name|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": false},
    "151664": {"content": "<|file_sep|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": false}
  },
  "additional_special_tokens": ["<|im_start|>", "<|im_end|>", "<|object_ref_start|>", "<|object_ref_end|>", "<|box_start|>", "<|box_end|>", "<|quad_start|>", "<|quad_end|>", "<|vision_start|>", "<|vision_end|>", "<|vision_pad|>", "<|image_pad|>", "<|video_pad|>"],
  "bos_token": null,
  "chat_template": "{%- if tools %}\n    {{- '<|im_start|>system\\n' }}\n    {%- if messages[0].role == 'system' %}\n        {{- messages[0].content + '\\n\\n' }}\n    {%- endif %}\n    {{- \"# Tools\\n\\nYou may call one or more functions to assist with the user query.\\n\\nYou are provided with function signatures within <tools></tools> XML tags:\\n<tools>\" }}\n    {%- for tool in tools %}\n        {{- \"\\n\" }}\n        {{- tool | tojson }}\n    {%- endfor %}\n    {{- \"\\n</tools>\\n\\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\\n<tool_call>\\n{\\\"name\\\": <function-name>, \\\"arguments\\\": <args-json-object>}\\n</tool_call><|im_end|>\\n\" }}\n{%- else %}\n    {%- if messages[0].role == 'system' %}\n        {{- '<|im_start|>system\\n' + messages[0].content + '<|im_end|>\\n' }}\n    {%- endif %}\n{%- endif %}\n{%- for message in messages %}\n    {%- if message.content is string %}\n        {%- set content = message.content %}\n    {%- else %}\n        {%- set content = '' %}\n    {%- endif %}\n    {%- if (message.role == \"user\") or (message.role == \"system\" and not loop.first) %}\n        {{- '<|im_start|>' + message.role + '\\n' + content + '<|im_end|>' + '\\n' }}\n    {%- elif message.role == \"assistant\" %}\n        {{- '<|im_start|>' + message.role + '\\n' + content }}\n        {%- if message.tool_calls %}\n            {%- for tool_call in message.tool_calls %}\n                {%- if (loop.first and content) or (not loop.first) %}\n                    {{- '\\n' }}\n                {%- endif %}\n                {%- if tool_call.function %}\n                    {%- set tool_call = tool_call.function %}\n                {%- endif %}\n                {{- '<tool_call>\\n{\"name\": \"' }}\n                {{- tool_call.name }}\n                {{- '\", \"arguments\": ' }}\n                {%- if tool_call.arguments is string %}\n                    {{- tool_call.arguments }}\n                {%- else %}\n                    {{- tool_call.arguments | tojson }}\n                {%- endif %}\n                {{- '}\\n</tool_call>' }}\n            {%- endfor %}\n        {%- endif %}\n        {{- '<|im_end|>\\n' }}\n    {%- elif message.role == \"tool\" %}\n        {%- if loop.first or (messages[loop.index0 - 1].role != \"tool\") %}\n            {{- '<|im_start|>user' }}\n        {%- endif %}\n        {{- '\\n<tool_response>\\n' }}\n        {{- content }}\n        {{- '\\n</tool_response>' }}\n        {%- if loop.last or (messages[loop.index0 + 1].role != \"tool\") %}\n            {{- '<|im_end|>\\n' }}\n        {%- endif %}\n    {%- endif %}\n{%- endfor %}\n{%- if add_generation_prompt %}\n    {{- '<|im_start|>assistant\\n' }}\n{%- endif %}",
  "clean_up_tokenization_spaces": false,
  "eos_token": "<|im_end|>",
  "errors": "replace",
  "model_max_length": 131072,
  "pad_token": "<|endoftext|>",
  "split_special_tokens": false,
  "tokenizer_class": "Qwen2Tokenizer",
  "unk_token": null,
  "add_bos_token": false
}
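The `chat_template` in tokenizer_config.json above is a Jinja template; for the common case (no tools passed, no tool calls in the history) it reduces to plain ChatML framing. A minimal hand-rendering of that simple path, as our own illustration of what the template produces:

```python
def render_chatml(messages, add_generation_prompt=True):
    # Mirrors only the no-tools, no-tool-call path of the chat_template:
    # each message becomes "<|im_start|>{role}\n{content}<|im_end|>\n",
    # optionally followed by an open assistant turn.
    out = ""
    for m in messages:
        out += f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
    if add_generation_prompt:
        out += "<|im_start|>assistant\n"
    return out

prompt = render_chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Bonjour !"},
])
```

In practice one would call `tokenizer.apply_chat_template(messages, add_generation_prompt=True)` and let the shipped template handle tools and tool responses as well.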
vocab.json (1 line, new file; diff suppressed: line too long)