Initialize project; model provided by the ModelHub XC community
Model: AIDC-AI/Marco-Mini-Global-Base  Source: Original Platform

.gitattributes (vendored, 35 lines, new file)
@@ -0,0 +1,35 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
README.md (218 lines, new file)
@@ -0,0 +1,218 @@
---
license: apache-2.0
language:
- en
- zh
- ar
- de
- es
- fr
- ko
- ja
- pt
- tr
- id
- it
- nl
- pl
- ru
- vi
- th
- he
- uk
- ms
- bn
- cs
- ur
- kk
- el
- ro
- hu
- ne
- az
- da
- sv
- "no"
- ca
- gl
- cy
- ga
- eu
- hr
- lv
- lt
- sk
- sl
- et
- fi
- sr
- bg
- fa
- mt
- hi
- mr
- gu
- pa
- ta
- te
- tl
- jv
- km
- lo
- my
- am
- sw
- yo
- ig
- zu
library_name: transformers
tags:
- moe
- mixture-of-experts
- multilingual
- upcycling
datasets:
- nvidia/Nemotron-CC-v2
- nvidia/Nemotron-Pretraining-SFT-v1
- nvidia/Nemotron-Pretraining-Specialized-v1
- nvidia/Nemotron-CC-v2.1
- allenai/dolmino-mix-1124
- nvidia/Nemotron-CC-Math-v1
- nvidia/OpenMathInstruct-2
- HuggingFaceTB/finemath
- LLM360/MegaMath
- open-thoughts/OpenThoughts3-1.2M
- opencsg/Fineweb-Edu-Chinese-V2.1
- HuggingFaceFW/fineweb-2
- allenai/dolma3_dolmino_mix-100B-1125
---

# Marco-Mini-Global-Base

**Marco-Mini-Global-Base** is an extended variant of [Marco-Mini-Base](https://huggingface.co/AIDC-AI/Marco-Mini-Base) that scales linguistic coverage from 29 to **64 languages**. It is a highly sparse Mixture-of-Experts (MoE) multilingual language model from the [Marco-MoE](https://github.com/AIDC-AI/Marco-LLM) family, developed by Alibaba International Digital Commerce. It activates only **0.86B out of 17.3B total parameters** (5% activation ratio) per token while supporting 64 languages — demonstrating that the MoE architecture enables scalable language expansion without the interference typical of dense models.

## Model Description

Marco-Mini-Global shares the same architecture as Marco-Mini-Base: a decoder-only Transformer with sparse MoE layers replacing standard FFN layers, upcycled from [Qwen3-0.6B-Base](https://huggingface.co/Qwen/Qwen3-0.6B-Base) using fine-grained sub-matrix splitting combined with Drop-Upcycling.

| Configuration | Value |
|:---|:---:|
| Total Parameters | 17.3B |
| Activated Parameters | 0.86B |
| Activation Ratio | 5% |
| Num Layers | 28 |
| Model Dimension | 1024 |
| FFN Intermediate Dimension | 3072 |
| Q-Heads | 16 |
| KV-Heads | 8 |
| Head Dimension | 128 |
| Expert Dimension | 768 |
| Total Experts | 256 |
| Activated Experts | 8 |
| Tie Embeddings | True |
| Training FLOPs | $1.584 \times 10^{23}$ |
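The headline numbers in this table can be reproduced from the values in `config.json` with some back-of-the-envelope arithmetic. The sketch below is our own approximation (it ignores the small RMSNorm and QK-norm weights), not an official breakdown:

```python
# Approximate parameter count for Marco-Mini-Global-Base, derived
# from config.json values; norm weights are ignored as negligible.
hidden, moe_inter = 1024, 768          # hidden_size, moe_intermediate_size
n_experts, topk, layers = 256, 8, 28   # num_experts, num_experts_per_tok, num_hidden_layers
q_heads, kv_heads, head_dim = 16, 8, 128
vocab = 151936                         # embeddings are tied, so counted once

expert = 3 * hidden * moe_inter        # gate/up/down projections per expert
attn = hidden * (q_heads + 2 * kv_heads) * head_dim + q_heads * head_dim * hidden
router = hidden * n_experts            # one routing matrix per MoE layer

total = layers * (n_experts * expert + attn + router) + vocab * hidden
active = layers * (topk * expert + attn + router) + vocab * hidden
print(f"total ~ {total / 1e9:.2f}B, activated ~ {active / 1e9:.2f}B "
      f"({active / total:.1%} activation ratio)")
```

This lands within rounding distance of the reported 17.3B total / 0.86B activated / 5% figures.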
## Training Details

Marco-Mini-Global-Base branches from the Stage-2 checkpoint of Marco-Mini-Base and recalibrates the data mixtures in Stages 3 and 4 to integrate pre-training corpora for 35 newly introduced languages. In total it was trained on 5.5T tokens.

The four-stage curriculum follows the same structure as Marco-Mini-Base:

1. **Stage 1 (0 - 2.4T tokens): Foundational Training** — High-quality English data (Nemotron-CC-v2), reasoning and instruction data, and multilingual web/QA data for 19 languages.
2. **Stage 2 (2.4T - 4.1T tokens): Optimization & Upsampling** — Upsampled reasoning corpora, downsampled English web data, and upsampled Chinese data with learning rate decay.
3. **Stage 3 (4.1T - 5T tokens): Language Expansion** — Recalibrated data mixtures to integrate 35 new languages alongside the original 29.
4. **Stage 4 (5T - 5.5T tokens): Synthetic Data Integration** — Curated multilingual synthetic data including cultural content and synthetic regional MCQs for all 64 languages.

## Supported Languages

**Original 29 languages:** English, Chinese, Arabic, German, Spanish, French, Korean, Japanese, Portuguese, Turkish, Indonesian, Italian, Dutch, Polish, Russian, Vietnamese, Thai, Hebrew, Ukrainian, Malay, Bengali, Czech, Urdu, Kazakh, Greek, Romanian, Hungarian, Nepali, Azerbaijani

**35 newly introduced languages:** Danish, Swedish, Norwegian, Catalan, Galician, Welsh, Irish, Basque, Croatian, Latvian, Lithuanian, Slovak, Slovenian, Estonian, Finnish, Serbian, Bulgarian, Persian, Maltese, Hindi, Marathi, Gujarati, Punjabi, Tamil, Telugu, Tagalog, Javanese, Khmer, Lao, Burmese, Amharic, Swahili, Yoruba, Igbo, Zulu
## Evaluation

We compare Marco-Mini-Global-Base against strong multilingual baselines: **Gemma3-4B** (4B activated), **Tiny-Aya-3.35B** (3.35B activated), and **Qwen3-4B** (4B activated). All benchmarks are evaluated across the full 64-language set. Marco-Mini-Global uses only **0.86B activated parameters** while preserving robust English proficiency (63.6 vs. 63.7 for the 29-language Marco-Mini) and increasing the multilingual advantage over Qwen3-4B from +2.6% to +3.6%.

### English

| Benchmark | # Shots | Gemma3-4B | Tiny-Aya-3.35B | Qwen3-4B | **Marco-Mini-Global** |
|:---|:---:|:---:|:---:|:---:|:---:|
| MMLU _(Acc)_ | 5-shot | 61.1 | 58.6 | **75.2** | 72.9 |
| MMLU-Redux _(Acc)_ | 0-shot | 57.7 | 51.7 | **71.3** | 68.9 |
| MMLU-Pro _(Acc)_ | 5-shot | 28.8 | 26.9 | **45.9** | 44.5 |
| AGIEval _(Acc)_ | 0-shot | 32.6 | 29.0 | **44.0** | 41.0 |
| BBH _(EM)_ | 3-shot | 52.2 | 46.8 | **72.3** | 65.0 |
| ARC-Easy _(Acc)_ | 0-shot | **82.6** | 76.5 | 75.0 | 82.4 |
| ARC-Challenge _(Acc)_ | 0-shot | 54.1 | 47.4 | 49.9 | **57.0** |
| HellaSwag _(Acc)_ | 0-shot | 76.7 | 71.0 | 74.4 | **77.2** |
| WinoGrande _(Acc)_ | 0-shot | **61.4** | 56.6 | 59.6 | 58.3 |
| BoolQ _(Acc)_ | 0-shot | **76.6** | 74.6 | 74.2 | 75.6 |
| CommonsenseQA _(Acc)_ | 0-shot | 61.1 | 60.4 | 52.9 | **61.2** |
| OpenBookQA _(Acc)_ | 0-shot | 42.6 | 40.4 | 42.6 | **45.0** |
| PIQA _(Acc)_ | 0-shot | 80.3 | 76.9 | 77.4 | **80.7** |
| SIQA _(Acc)_ | 0-shot | 50.4 | 49.9 | **53.0** | 48.4 |
| GSM8K _(EM)_ | 5-shot | 39.3 | 58.0 | **81.7** | 76.4 |
| **Average** | - | 57.2 | 55.5 | 63.3 | **63.6** |

### Multilingual — General

| Benchmark | # Shots | Gemma3-4B | Tiny-Aya-3.35B | Qwen3-4B | **Marco-Mini-Global** |
|:---|:---:|:---:|:---:|:---:|:---:|
| GlobalMMLU _(Acc)_ | 5-shot | 49.1 | 48.4 | 57.8 | **60.9** |
| MMMLU _(Acc)_ | 0-shot | 45.0 | 42.8 | 54.8 | **58.2** |
| MMLU-ProX-Lite _(Acc)_ | 5-shot | 23.3 | 23.5 | 35.6 | **36.2** |
| BELEBELE _(Acc)_ | 0-shot | 62.3 | 62.5 | 74.0 | **76.0** |
| mHellaSwag _(Acc_norm)_ | 0-shot | 51.9 | 50.3 | 48.5 | **54.4** |
| mARC-Challenge _(Acc_norm)_ | 0-shot | 39.3 | 35.7 | 39.3 | **41.2** |
| FLORES-200 En→Xx _(BLEU)_ | 5-shot | 27.9 | 25.6 | 25.8 | **29.5** |
| FLORES-200 Xx→En _(BLEU)_ | 5-shot | 39.2 | 37.2 | 33.4 | **40.2** |
| WMT24++ En→Xx _(BLEU)_ | 5-shot | **26.0** | 24.4 | 19.6 | **26.0** |
| WMT24++ Xx→En _(BLEU)_ | 5-shot | 34.4 | 32.9 | 31.2 | **34.5** |
| MGSM _(EM)_ | 8-shot | 35.7 | 36.6 | 69.1 | **71.7** |
| **Average** | - | 39.5 | 37.3 | 44.5 | **48.1** |

### Multilingual — Cultural & Regional

| Benchmark | # Shots | Gemma3-4B | Tiny-Aya-3.35B | Qwen3-4B | **Marco-Mini-Global** |
|:---|:---:|:---:|:---:|:---:|:---:|
| INCLUDE _(Acc)_ | 5-shot | 52.3 | 53.5 | 60.0 | **61.1** |
| Global-PIQA _(Acc_norm)_ | 0-shot | 67.8 | 66.7 | 61.8 | **70.2** |
| CMMLU _(Acc)_ | 5-shot | 50.2 | 58.8 | **76.2** | 67.9 |
| C-Eval _(Acc)_ | 5-shot | 48.5 | 57.6 | **76.6** | 66.2 |
| ArabicMMLU _(Acc)_ | 3-shot | 61.6 | 63.2 | **67.0** | 66.6 |
| TurkishMMLU _(Acc)_ | 5-shot | 43.7 | 45.2 | 60.6 | **63.1** |
| GreekMMLU _(Acc)_ | 5-shot | 63.4 | 66.3 | 69.4 | **70.4** |
| KazakhMMLU _(Acc)_ | 5-shot | 52.1 | 47.1 | **62.3** | 61.8 |
| IndoMMLU _(Acc)_ | 0-shot | 48.5 | 52.0 | **60.1** | 59.5 |
| IndoCareer _(Acc)_ | 3-shot | 53.4 | 56.6 | 61.5 | **61.8** |
| IndoCulture _(Acc)_ | 0-shot | 59.1 | 58.5 | 61.1 | **62.5** |
| **Average** | - | 54.6 | 56.9 | **65.1** | 64.7 |
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "AIDC-AI/Marco-Mini-Global-Base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

input_text = "The capital of France is"
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
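At inference time, each token is routed to `num_experts_per_tok = 8` of the 256 experts, with the selected weights renormalized (`norm_topk_prob: true` in `config.json`). As a rough illustration of that routing rule — our own plain-Python sketch, not the model's actual implementation:

```python
import math

def route_topk(logits, k=8):
    # Softmax over all expert logits, keep the k largest, then
    # renormalize the kept weights so they sum to 1 (the
    # norm_topk_prob=true behaviour described in config.json).
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    probs = [e / z for e in exps]
    top = sorted(range(len(logits)), key=probs.__getitem__, reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return {i: probs[i] / norm for i in top}

# One token's router logits over 256 experts (toy values).
weights = route_topk([0.01 * i for i in range(256)], k=8)
```

In the real model this selection happens independently per token and per MoE layer, which is what keeps the activated parameter count at roughly 5% of the total.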

## Citation

```bibtex
@article{marco-moe,
  title={Marco-MoE: Open Multilingual Mixture-of-Expert Language Models with Efficient Upcycling},
  author={Fan Jiang and Yu Zhao and Chenyang Lyu and Tianqi Shi and Yichao Du and Feihu Jiang and Longyue Wang and Weihua Luo},
  year={2026}
}
```

## License

This model is released under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0).
config.json (40 lines, new file)
@@ -0,0 +1,40 @@
{
  "architectures": [
    "Qwen3MoeForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "decoder_sparse_step": 1,
  "dtype": "float32",
  "eos_token_id": 151643,
  "head_dim": 128,
  "hidden_act": "silu",
  "hidden_size": 1024,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "max_position_embeddings": 32768,
  "max_window_layers": 28,
  "mlp_only_layers": [],
  "model_type": "qwen3_moe",
  "moe_intermediate_size": 768,
  "norm_topk_prob": true,
  "num_attention_heads": 16,
  "num_experts": 256,
  "num_experts_per_tok": 8,
  "num_hidden_layers": 28,
  "num_key_value_heads": 8,
  "output_router_logits": false,
  "qkv_bias": false,
  "rms_norm_eps": 1e-06,
  "rope_scaling": null,
  "rope_theta": 1000000.0,
  "router_aux_loss_coef": 0.001,
  "sliding_window": null,
  "tie_word_embeddings": true,
  "transformers_version": "4.57.1",
  "use_cache": true,
  "use_qk_norm": true,
  "use_sliding_window": false,
  "vocab_size": 151936
}
configuration.json (1 line, new file)
@@ -0,0 +1 @@
{"framework":"Pytorch","task":"text-generation"}
generation_config.json (6 lines, new file)
@@ -0,0 +1,6 @@
{
  "_from_model_config": true,
  "bos_token_id": 151643,
  "eos_token_id": 151643,
  "transformers_version": "4.57.1"
}
model-00001-of-00018.safetensors (LFS pointer, new file)
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a56265e417310d4b4eb8689581370b37bfda16d46ea2f5acef84636d04d27de0
size 2000033560

model-00002-of-00018.safetensors (LFS pointer, new file)
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:50751c07cc20d141dfad37f49442892f9337683d34e7a21a337876e35a17b1af
size 1998751296

model-00003-of-00018.safetensors (LFS pointer, new file)
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:75b1ebfd395bbb0befa54f171e46500864980d8db7cb1e02363bdb013c6c1106
size 1999795072

model-00004-of-00018.safetensors (LFS pointer, new file)
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:3da59a9a95d45e15ceb951bb07cedbb64cb3903cbdf2bffac6ebaca558736d83
size 1998751072

model-00005-of-00018.safetensors (LFS pointer, new file)
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ae49b25d1ee9f56bd7ece4b4499cb101981a6074d054fe8f55733d294771ac29
size 1999795304

model-00006-of-00018.safetensors (LFS pointer, new file)
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:2cbb5f8a2f005933f1a13509ae3e0efd91ca60f7ad51b2880bd1e7312a2cd46c
size 1998750992

model-00007-of-00018.safetensors (LFS pointer, new file)
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:b6b50896a5e3b5c35ccf33a41ebbc1cb643a327ef5487fb909adffdc442edbc4
size 1998752080

model-00008-of-00018.safetensors (LFS pointer, new file)
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:0290cab2d5376286aecd6a59dfd54bb26a7b2e60682d25e55bd0bf7a4734bb5e
size 1999796504

model-00009-of-00018.safetensors (LFS pointer, new file)
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:712cdebf5ea3170f80cb2b1e44d4a7d2626f19f4655a4a460187d3ac521fd24d
size 1998752264

model-00010-of-00018.safetensors (LFS pointer, new file)
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:fb54346e458ca93cc6fbd1f5395d8e9f242359576c95842871d8423bb550cc44
size 1998752488

model-00011-of-00018.safetensors (LFS pointer, new file)
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:22765eacf9b7a25ce4994e76b481ef9290aa869c92acea3053b4e5051d5bbbba
size 1999796440

model-00012-of-00018.safetensors (LFS pointer, new file)
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:27c4789cca6d43ab427999ee72969b7e9b275712ebf1e13a4f758406ff657f9d
size 1998752272

model-00013-of-00018.safetensors (LFS pointer, new file)
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:0d14f52140748dc524ee6c4f7b2ab62303bed6b91c86e9a13c2705c9b8c44257
size 1998752568

model-00014-of-00018.safetensors (LFS pointer, new file)
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:94e92175329d183dd163a17beff47a2f14573731765426ef009c2534c881fe39
size 1999796352

model-00015-of-00018.safetensors (LFS pointer, new file)
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f9bd5d0bea6a979272fb34c33a5757d9014b1f7c2716373ed094964625ca416d
size 1998752344

model-00016-of-00018.safetensors (LFS pointer, new file)
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:3051d596eb2669c06b6658f1431635c43274f4e17d30d4edc7b8a3e08941aaad
size 1999796584

model-00017-of-00018.safetensors (LFS pointer, new file)
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:949bf35213acb69396ea905a320a2e6193dfeb5da693b8695da0149714ea18fd
size 1998752264

model-00018-of-00018.safetensors (LFS pointer, new file)
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:67b24186cd7e42f5054731a2340b578264b42614a752c369f7ccd22884287663
size 828684352
model.safetensors.index.json (21766 lines, new file; diff suppressed: too large)
tokenizer.json (303282 lines, new file; diff suppressed: too large)
tokenizer_config.json (207 lines, new file)
@@ -0,0 +1,207 @@
{
  "add_prefix_space": false,
  "added_tokens_decoder": {
    "151643": {"content": "<|endoftext|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true},
    "151644": {"content": "<|im_start|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true},
    "151645": {"content": "<|im_end|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true},
    "151646": {"content": "<|object_ref_start|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true},
    "151647": {"content": "<|object_ref_end|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true},
    "151648": {"content": "<|box_start|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true},
    "151649": {"content": "<|box_end|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true},
    "151650": {"content": "<|quad_start|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true},
    "151651": {"content": "<|quad_end|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true},
    "151652": {"content": "<|vision_start|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true},
    "151653": {"content": "<|vision_end|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true},
    "151654": {"content": "<|vision_pad|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true},
    "151655": {"content": "<|image_pad|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true},
    "151656": {"content": "<|video_pad|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true},
    "151657": {"content": "<tool_call>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": false},
    "151658": {"content": "</tool_call>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": false},
    "151659": {"content": "<|fim_prefix|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": false},
    "151660": {"content": "<|fim_middle|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": false},
    "151661": {"content": "<|fim_suffix|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": false},
    "151662": {"content": "<|fim_pad|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": false},
    "151663": {"content": "<|repo_name|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": false},
    "151664": {"content": "<|file_sep|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": false}
  },
  "additional_special_tokens": ["<|im_start|>", "<|im_end|>", "<|object_ref_start|>", "<|object_ref_end|>", "<|box_start|>", "<|box_end|>", "<|quad_start|>", "<|quad_end|>", "<|vision_start|>", "<|vision_end|>", "<|vision_pad|>", "<|image_pad|>", "<|video_pad|>"],
  "bos_token": null,
  "chat_template": "{%- if tools %}\n    {{- '<|im_start|>system\\n' }}\n    {%- if messages[0].role == 'system' %}\n        {{- messages[0].content + '\\n\\n' }}\n    {%- endif %}\n    {{- \"# Tools\\n\\nYou may call one or more functions to assist with the user query.\\n\\nYou are provided with function signatures within <tools></tools> XML tags:\\n<tools>\" }}\n    {%- for tool in tools %}\n        {{- \"\\n\" }}\n        {{- tool | tojson }}\n    {%- endfor %}\n    {{- \"\\n</tools>\\n\\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\\n<tool_call>\\n{\\\"name\\\": <function-name>, \\\"arguments\\\": <args-json-object>}\\n</tool_call><|im_end|>\\n\" }}\n{%- else %}\n    {%- if messages[0].role == 'system' %}\n        {{- '<|im_start|>system\\n' + messages[0].content + '<|im_end|>\\n' }}\n    {%- endif %}\n{%- endif %}\n{%- for message in messages %}\n    {%- if message.content is string %}\n        {%- set content = message.content %}\n    {%- else %}\n        {%- set content = '' %}\n    {%- endif %}\n    {%- if (message.role == \"user\") or (message.role == \"system\" and not loop.first) %}\n        {{- '<|im_start|>' + message.role + '\\n' + content + '<|im_end|>' + '\\n' }}\n    {%- elif message.role == \"assistant\" %}\n        {{- '<|im_start|>' + message.role + '\\n' + content }}\n        {%- if message.tool_calls %}\n            {%- for tool_call in message.tool_calls %}\n                {%- if (loop.first and content) or (not loop.first) %}\n                    {{- '\\n' }}\n                {%- endif %}\n                {%- if tool_call.function %}\n                    {%- set tool_call = tool_call.function %}\n                {%- endif %}\n                {{- '<tool_call>\\n{\"name\": \"' }}\n                {{- tool_call.name }}\n                {{- '\", \"arguments\": ' }}\n                {%- if tool_call.arguments is string %}\n                    {{- tool_call.arguments }}\n                {%- else %}\n                    {{- tool_call.arguments | tojson }}\n                {%- endif %}\n                {{- '}\\n</tool_call>' }}\n            {%- endfor %}\n        {%- endif %}\n        {{- '<|im_end|>\\n' }}\n    {%- elif message.role == \"tool\" %}\n        {%- if loop.first or (messages[loop.index0 - 1].role != \"tool\") %}\n            {{- '<|im_start|>user' }}\n        {%- endif %}\n        {{- '\\n<tool_response>\\n' }}\n        {{- content }}\n        {{- '\\n</tool_response>' }}\n        {%- if loop.last or (messages[loop.index0 + 1].role != \"tool\") %}\n            {{- '<|im_end|>\\n' }}\n        {%- endif %}\n    {%- endif %}\n{%- endfor %}\n{%- if add_generation_prompt %}\n    {{- '<|im_start|>assistant\\n' }}\n{%- endif %}",
  "clean_up_tokenization_spaces": false,
  "eos_token": "<|im_end|>",
  "errors": "replace",
  "model_max_length": 131072,
  "pad_token": "<|endoftext|>",
  "split_special_tokens": false,
  "tokenizer_class": "Qwen2Tokenizer",
  "unk_token": null,
  "add_bos_token": false
}
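The `chat_template` in tokenizer_config.json above is a Jinja template; for the common case (no tools passed, no tool calls in the history) it reduces to plain ChatML framing. A minimal hand-rendering of that simple path, as our own illustration of what the template produces:

```python
def render_chatml(messages, add_generation_prompt=True):
    # Mirrors only the no-tools, no-tool-call path of the chat_template:
    # each message becomes "<|im_start|>{role}\n{content}<|im_end|>\n",
    # optionally followed by an open assistant turn.
    out = ""
    for m in messages:
        out += f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
    if add_generation_prompt:
        out += "<|im_start|>assistant\n"
    return out

prompt = render_chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Bonjour !"},
])
```

In practice one would call `tokenizer.apply_chat_template(messages, add_generation_prompt=True)` and let the shipped template handle tools and tool responses as well.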
vocab.json (1 line, new file; diff suppressed: line too long)