Initialize project; model provided by the ModelHub XC community

Model: RefalMachine/ruadapt_qwen2.5_3B_ext_u48_instruct_v4
Source: Original Platform
This commit is contained in:
ModelHub XC
2026-06-18 14:01:54 +08:00
commit 602e1b9a89
48 changed files with 369640 additions and 0 deletions

.gitattributes (new file, 39 lines)

@@ -0,0 +1,39 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
llmtf_eval/daru_treewayextractive.jsonl filter=lfs diff=lfs merge=lfs -text
llmtf_eval/nlpcoreteam_enMMLU.jsonl filter=lfs diff=lfs merge=lfs -text
llmtf_eval/nlpcoreteam_ruMMLU.jsonl filter=lfs diff=lfs merge=lfs -text
tokenizer.json filter=lfs diff=lfs merge=lfs -text
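Paths matching these patterns are stored through Git LFS. The sketch below checks whether a given path would be LFS-tracked. It uses Python's `fnmatch` as an approximation of gitattributes matching, which is not identical, and it includes only a subset of the patterns above. `is_lfs_tracked` is a hypothetical helper.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# A subset of the LFS patterns from the .gitattributes file above.
LFS_PATTERNS: tuple[str, ...] = (
    "*.bin",
    "*.safetensors",
    "*.tar.*",
    "*tfevents*",
    "saved_model/**/*",
    "llmtf_eval/nlpcoreteam_ruMMLU.jsonl",
    "tokenizer.json",
)


def is_lfs_tracked(path: str, patterns: tuple[str, ...] = LFS_PATTERNS) -> bool:
    """Approximate gitattributes matching for a repo-relative POSIX path.

    Patterns without a slash are compared with the file name at any depth;
    patterns with a slash are compared with the whole path. fnmatch's '*'
    also matches '/', so '**' is only approximated (for example, a file
    directly inside saved_model/ is not matched here, although git would).
    """
    name = PurePosixPath(path).name
    for pattern in patterns:
        target = path if "/" in pattern else name
        if fnmatchcase(target, pattern):
            return True
    return False


print(is_lfs_tracked("model.safetensors"))  # True
print(is_lfs_tracked("config.json"))        # False
```

For exact answers, `git check-attr filter -- <path>` inside a clone is the authoritative source. The sketch is only useful for quick offline checks.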

README.md (new file, 84 lines)

@@ -0,0 +1,84 @@
---
datasets:
- IlyaGusev/saiga_scored
- IlyaGusev/saiga_preferences
- dichspace/darulm
language:
- ru
pipeline_tag: text-generation
base_model:
- RefalMachine/ruadapt_qwen2.5_3B_ext_u48_full_lr5e4_peft_mlp_32_32_bs256
---
## Model description
Instruction-tuned version of Qwen2.5-3B adapted to Russian (RefalMachine/ruadapt_qwen2.5_3B_ext_u48_full_lr5e4_peft_mlp_32_32_bs256). The tokenizer was replaced, the model then underwent continued pretraining on a Russian-language corpus, and finally the LEP technique (Learned Embedding Propagation; see the preprint cited below) was applied.
Thanks to the new tokenizer (tiktoken cl100k extended with a 48k-token unigram tokenizer), the generation speed* for Russian-language text is up to 60% higher than that of the original Qwen-2.5-3B-Instruct model.
*Generation speed here means the number of Russian-language characters/words produced per second on identical text sequences.
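Under this definition, assume both models spend roughly the same time per generated token. That is an assumption, since the README does not report per-token latency. The speedup then reduces to how many fewer tokens the new tokenizer needs for the same text. A minimal sketch with hypothetical token counts:

```python
def chars_per_second(num_chars: int, elapsed_s: float) -> float:
    """Generation speed as defined above: characters produced per second."""
    return num_chars / elapsed_s


def relative_speedup(tokens_baseline: int, tokens_new: int) -> float:
    """Speedup on identical text, assuming equal time per generated token:
    the same characters in fewer tokens means proportionally more characters/s."""
    return tokens_baseline / tokens_new


# Hypothetical token counts for the same Russian passage.
speedup = relative_speedup(tokens_baseline=160, tokens_new=100)
print(f"{speedup:.2f}x")  # 1.60x
```

A tokenizer that needs 100 tokens where the original needs 160 corresponds to the "up to 60%" case.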
## Tokenization
![image/png](https://cdn-uploads.huggingface.co/production/uploads/652cedbdf120598322ae358a/O4eQEhnowETEatDPcmArB.png)
![image/png](https://cdn-uploads.huggingface.co/production/uploads/652cedbdf120598322ae358a/oW0Q6LzD_Py3GdH0kfqu4.png)
## Metrics and quality evaluation
The model was evaluated on Ru-Arena-General, MERA, and llmtf_open.
#### Results on Ru-Arena-General
Measurements were made with the official leaderboard code (https://github.com/VikhrModels/ru_llm_arena), **but with repetition_penalty=1.1**.
Only part of the leaderboard is shown; see the benchmark page for the full table (https://huggingface.co/spaces/Vikhrmodels/arenahardlb).
| Model Name | Winrate | 95% CI | Average # Tokens |
|--------------------------------------------------|--------|--------------------|------------------|
| gpt-4-1106-preview | 90.9 | (+1.3 / -0.9) | 541 |
| vikhr-nemo-12b-instruct-r-21-09-24 | 87.3 | (+1.1 / -1.2) | 627 |
| gpt-4o-mini | 83.9 | (+1.9 / -1.6) | 448 |
| ruadapt_qwen2.5_7B_ext_u48_instruct | 81.9 | (+1.7 / -1.6) | 556 |
| gemma-2-9b-it | 76.5 | (+1.1 / -1.1) | 459 |
| Qwen2.5-7B-Instruct | 76.0 | (+1.6 / -1.8) | 484 |
| gemma-2-9b-it-sppo-iter3 | 73.6 | (+2.1 / -2.2) | 509 |
| saiga_llama3_8b_v7 | 67.6 | (+1.7 / -1.4) | 503 |
| **ruadapt_qwen2.5_3B_ext_u48_instruct_v4** | **66.1** | **(+2.2 / -1.9)** | **531** |
| t-lite-instruct-0.1 | 64.7 | (+2.3 / -2.2) | 810 |
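The 95% CI column gives asymmetric offsets around each winrate. A quick way to read adjacent rows is to convert the offsets to intervals and check for overlap. This is a coarse heuristic: non-overlapping intervals suggest a real gap, but overlap does not prove the models are equivalent. `ArenaRow` and `intervals_overlap` are hypothetical names. The values are copied from the table.

```python
from typing import NamedTuple


class ArenaRow(NamedTuple):
    name: str
    winrate: float
    ci_plus: float
    ci_minus: float

    def interval(self) -> tuple[float, float]:
        # The table reports the CI as (+upper / -lower) offsets from the winrate.
        return (self.winrate - self.ci_minus, self.winrate + self.ci_plus)


def intervals_overlap(a: ArenaRow, b: ArenaRow) -> bool:
    a_lo, a_hi = a.interval()
    b_lo, b_hi = b.interval()
    return a_lo <= b_hi and b_lo <= a_hi


ours = ArenaRow("ruadapt_qwen2.5_3B_ext_u48_instruct_v4", 66.1, 2.2, 1.9)
saiga = ArenaRow("saiga_llama3_8b_v7", 67.6, 1.7, 1.4)
t_lite = ArenaRow("t-lite-instruct-0.1", 64.7, 2.3, 2.2)
sppo = ArenaRow("gemma-2-9b-it-sppo-iter3", 73.6, 2.1, 2.2)
```

Under this heuristic, the 3B model's interval (about 64.2 to 68.3) overlaps those of both neighbours, saiga_llama3_8b_v7 and t-lite-instruct-0.1. It is clearly separated from gemma-2-9b-it-sppo-iter3.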
#### Results on MERA
![image/png](https://cdn-uploads.huggingface.co/production/uploads/652cedbdf120598322ae358a/iMcy-q9r22YCmObww95sH.png)
#### Results on llmtf_open
TODO
## How to cite:
Tikhomirov M., Chernyshev D. Facilitating large language model Russian adaptation with Learned Embedding Propagation // 2024 (preprint: https://arxiv.org/abs/2412.21140)
Tikhomirov M., Chernyshev D. Impact of Tokenization on LLaMa Russian Adaptation // 2023 Ivannikov Ispras Open Conference (ISPRAS). IEEE, 2023. pp. 163-168.
## Warning
The model's responses do not reflect the authors' opinions; they merely reproduce knowledge acquired from the data at all training stages (pretraining, tokenizer replacement, instruction tuning, response-quality calibration). The model was derived from a third-party pretrained model, and **control over its pretraining** **is not the responsibility of the current authors**. No additional steps were taken when creating this version of the model to alter the "opinions" embedded in the LLM. Use with caution.

added_tokens.json (new file, 24 lines)

@@ -0,0 +1,24 @@
{
"</tool_call>": 147090,
"<tool_call>": 147089,
"<|box_end|>": 147081,
"<|box_start|>": 147080,
"<|endoftext|>": 147075,
"<|file_sep|>": 147096,
"<|fim_middle|>": 147092,
"<|fim_pad|>": 147094,
"<|fim_prefix|>": 147091,
"<|fim_suffix|>": 147093,
"<|im_end|>": 147077,
"<|im_start|>": 147076,
"<|image_pad|>": 147087,
"<|object_ref_end|>": 147079,
"<|object_ref_start|>": 147078,
"<|quad_end|>": 147083,
"<|quad_start|>": 147082,
"<|repo_name|>": 147095,
"<|video_pad|>": 147088,
"<|vision_end|>": 147085,
"<|vision_pad|>": 147086,
"<|vision_start|>": 147084
}
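The 22 special tokens occupy ids 147075 through 147096 with no gaps. The highest id plus one therefore equals `vocab_size` (147097) in config.json. `<|im_end|>` (147077) is the `eos_token_id` used in config.json and generation_config.json. The sketch below checks those properties, with the mapping copied from the file above. `summarize_special_ids` is a hypothetical helper.

```python
# Mapping copied from added_tokens.json above.
ADDED_TOKENS = {
    "</tool_call>": 147090,
    "<tool_call>": 147089,
    "<|box_end|>": 147081,
    "<|box_start|>": 147080,
    "<|endoftext|>": 147075,
    "<|file_sep|>": 147096,
    "<|fim_middle|>": 147092,
    "<|fim_pad|>": 147094,
    "<|fim_prefix|>": 147091,
    "<|fim_suffix|>": 147093,
    "<|im_end|>": 147077,
    "<|im_start|>": 147076,
    "<|image_pad|>": 147087,
    "<|object_ref_end|>": 147079,
    "<|object_ref_start|>": 147078,
    "<|quad_end|>": 147083,
    "<|quad_start|>": 147082,
    "<|repo_name|>": 147095,
    "<|video_pad|>": 147088,
    "<|vision_end|>": 147085,
    "<|vision_pad|>": 147086,
    "<|vision_start|>": 147084,
}
VOCAB_SIZE = 147097    # "vocab_size" in config.json
EOS_TOKEN_ID = 147077  # "eos_token_id" in config.json


def summarize_special_ids(tokens: dict[str, int], vocab_size: int) -> dict[str, object]:
    ids = sorted(tokens.values())
    return {
        "first_id": ids[0],
        "last_id": ids[-1],
        "count": len(ids),
        # True when the ids form one block with no gaps.
        "contiguous": ids == list(range(ids[0], ids[-1] + 1)),
        # Every id must be a valid row of the embedding matrix.
        "fits_vocab": ids[-1] < vocab_size,
    }


summary = summarize_special_ids(ADDED_TOKENS, VOCAB_SIZE)
print(summary)
```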

config.json (new file, 28 lines)

@@ -0,0 +1,28 @@
{
"_name_or_path": "/workdir/data/models/qwen/ruadapt_qwen2.5_3B_ext_u48_full_lr5e4_peft_mlp_32_32_bs256_as1.75/kto2",
"architectures": [
"Qwen2ForCausalLM"
],
"attention_dropout": 0.0,
"eos_token_id": 147077,
"hidden_act": "silu",
"hidden_size": 2048,
"initializer_range": 0.02,
"intermediate_size": 11008,
"max_position_embeddings": 32768,
"max_window_layers": 70,
"model_type": "qwen2",
"num_attention_heads": 16,
"num_hidden_layers": 36,
"num_key_value_heads": 2,
"rms_norm_eps": 1e-06,
"rope_scaling": null,
"rope_theta": 1000000.0,
"sliding_window": null,
"tie_word_embeddings": true,
"torch_dtype": "bfloat16",
"transformers_version": "4.45.2",
"use_cache": true,
"use_sliding_window": false,
"vocab_size": 147097
}
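config.json fixes the tensor shapes, so the parameter count and KV-cache footprint can be estimated by hand. The sketch assumes the standard Qwen2 layout in Hugging Face transformers:

- biased q/k/v projections;
- bias-free `o_proj` and SwiGLU MLP;
- RMSNorm with weights only.

These figures are derived here, not stated in the model card. The function names are hypothetical.

```python
# Values copied from config.json above.
CONFIG = {
    "hidden_size": 2048,
    "intermediate_size": 11008,
    "num_attention_heads": 16,
    "num_hidden_layers": 36,
    "num_key_value_heads": 2,
    "tie_word_embeddings": True,
    "vocab_size": 147097,
}


def qwen2_param_count(cfg: dict) -> int:
    """Parameter count for a Qwen2-style decoder under the layout assumed above."""
    h = cfg["hidden_size"]
    head_dim = h // cfg["num_attention_heads"]
    kv_dim = cfg["num_key_value_heads"] * head_dim
    attn = (h * h + h) + 2 * (h * kv_dim + kv_dim) + h * h  # q, k, v (with bias), o
    mlp = 3 * h * cfg["intermediate_size"]  # gate_proj, up_proj, down_proj
    norms = 2 * h  # input and post-attention RMSNorm weights
    per_layer = attn + mlp + norms
    embed = cfg["vocab_size"] * h
    lm_head = 0 if cfg["tie_word_embeddings"] else embed  # tied: shares embed weights
    final_norm = h
    return embed + cfg["num_hidden_layers"] * per_layer + final_norm + lm_head


def kv_cache_bytes(cfg: dict, num_tokens: int, bytes_per_value: int = 2) -> int:
    """KV-cache size for one sequence; bytes_per_value=2 corresponds to bfloat16."""
    head_dim = cfg["hidden_size"] // cfg["num_attention_heads"]
    # 2 tensors (K and V) per layer, each num_key_value_heads * head_dim wide.
    per_token = 2 * cfg["num_hidden_layers"] * cfg["num_key_value_heads"] * head_dim * bytes_per_value
    return per_token * num_tokens


print(qwen2_param_count(CONFIG))
print(kv_cache_bytes(CONFIG, 32768) / 2**30, "GiB")
```

Under these assumptions, the model has about 3.08B parameters. About 301M of them sit in the tied embedding matrix (147097 × 2048), and in bfloat16 the weights take roughly 6.2 GB. Grouped-query attention (2 key/value heads for 16 query heads) keeps the KV cache at 36,864 bytes per token. That is about 1.125 GiB for a full 32,768-token context.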

generation_config.json (new file, 10 lines)

@@ -0,0 +1,10 @@
{
"do_sample": true,
"eos_token_id": 147077,
"pad_token_id": 151643,
"repetition_penalty": 1.05,
"temperature": 0.7,
"top_k": 20,
"top_p": 0.8,
"transformers_version": "4.45.2"
}
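In this file, `pad_token_id` is 151643, which exceeds `vocab_size` (147097) in config.json. The llmtf configs in this commit, by contrast, pad with 147075 (`<|endoftext|>` in added_tokens.json). The value 151643 looks like a leftover from the original Qwen2.5 tokenizer, but that is an inference, not something the repository states. A small consistency check like the hypothetical helper below catches such ids:

```python
# Relevant fields copied from config.json and generation_config.json above.
VOCAB_SIZE = 147097
GENERATION_CONFIG = {
    "do_sample": True,
    "eos_token_id": 147077,
    "pad_token_id": 151643,
    "temperature": 0.7,
}


def out_of_range_ids(gen_cfg: dict, vocab_size: int) -> dict[str, int]:
    """Return every *_token_id whose value cannot index the embedding matrix."""
    bad = {}
    for key, value in gen_cfg.items():
        if not key.endswith("_token_id"):
            continue
        # Some configs store a list of ids (e.g. several eos tokens).
        ids = value if isinstance(value, list) else [value]
        for token_id in ids:
            if not 0 <= token_id < vocab_size:
                bad[key] = token_id
    return bad


print(out_of_range_ids(GENERATION_CONFIG, VOCAB_SIZE))
```

An out-of-range pad id may only surface when padding is actually used, for example in batched generation.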

File diff suppressed because one or more lines are too long

@@ -0,0 +1,54 @@
{
"custom_generation_config": null,
"model_params": {
"model_name_or_path": "/workdir/data/models/qwen/ruadapt_qwen2.5_3B_ext_u48_full_lr5e4_peft_mlp_32_32_bs256_as1.75_kto_simpo/simpo0",
"generation_config": {
"bos_token_id": 147075,
"do_sample": true,
"eos_token_id": [
147077
],
"max_length": 32768,
"max_new_tokens": 512,
"pad_token_id": 147075,
"stop_strings": [
"<|im_end|>"
],
"temperature": 0.1,
"top_k": 40,
"top_p": 0.9,
"transformers_version": "4.45.2",
"trust_remote_code": false
},
"conversation_template": {
"system_prompt": "",
"system_message_template": "<|im_start|>{role}\n{content}<|im_end|>\n",
"user_message_template": "<|im_start|>{role}\n{content}<|im_end|>\n",
"bot_message_template": "<|im_start|>{role}\n{content}<|im_end|>\n",
"bot_message_template_incomplete": "<|im_start|>{role}\n{content}",
"user_role": "user",
"bot_role": "assistant",
"system_role": "system",
"global_prefix": "",
"suffix": "<|im_start|>assistant\n",
"add_special_tokens": false,
"eos_token": "<|im_end|>"
},
"load_in_8bit": false,
"torch_dtype": "auto",
"use_flash_attention_2": true,
"device_map": "cuda:0",
"use_fast_tokenizer": true,
"leading_space": false,
"space_token": null,
"trust_remote_code": false,
"max_model_len": 32768
},
"task_params": {
"max_len": 4000,
"few_shot_count": 0,
"batch_size": 8,
"max_sample_per_dataset": 200,
"method": "generate"
}
}
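The `conversation_template` block describes a ChatML-style prompt. The system, user and bot templates are identical, and `global_prefix` and `system_prompt` are empty. The minimal sketch below shows how those fields combine. It assumes messages are concatenated in order and the `suffix` is appended to cue the assistant turn. It mirrors the config fields, not llmtf's actual code, and `render_prompt` is a hypothetical name.

```python
# Templates copied from "conversation_template" in the llmtf config above.
MESSAGE_TEMPLATE = "<|im_start|>{role}\n{content}<|im_end|>\n"
GLOBAL_PREFIX = ""
SUFFIX = "<|im_start|>assistant\n"


def render_prompt(messages: list[dict[str, str]]) -> str:
    """Wrap each message in <|im_start|>/<|im_end|>, then append the assistant cue."""
    body = "".join(
        MESSAGE_TEMPLATE.format(role=m["role"], content=m["content"]) for m in messages
    )
    return GLOBAL_PREFIX + body + SUFFIX


prompt = render_prompt([{"role": "user", "content": "Привет"}])
print(prompt)
```

Generation then runs until the model emits `<|im_end|>`, which the config lists both as `stop_strings` and as `eos_token_id` 147077.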

@@ -0,0 +1,8 @@
{
"task_name": "daru/treewayabstractive",
"results": {
"rouge1": 0.33109987599556284,
"rouge2": 0.11202889150257295
},
"leaderboard_result": 0.2215643837490679
}
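For the two-metric tasks in this commit, `leaderboard_result` appears to be the plain arithmetic mean of the values in `results`. One exception is darumeru/cp_para_ru, whose `leaderboard_result` equals `lcs` alone. The check below uses values copied from the result files.

```python
import math

# (results, leaderboard_result) pairs copied from the llmtf result files in this commit.
RESULT_FILES = {
    "daru/treewayabstractive": (
        {"rouge1": 0.33109987599556284, "rouge2": 0.11202889150257295},
        0.2215643837490679,
    ),
    "darumeru/MultiQ": (
        {"f1": 0.3346248767848689, "em": 0.22275334608030592},
        0.2786891114325874,
    ),
    "darumeru/RCB": (
        {"acc": 0.5454545454545454, "f1_macro": 0.49090309951702227},
        0.5181788224857838,
    ),
}


def mean_of_metrics(results: dict[str, float]) -> float:
    return sum(results.values()) / len(results)


for task, (results, reported) in RESULT_FILES.items():
    # Compare the recomputed mean with the reported value, allowing float noise.
    print(task, math.isclose(mean_of_metrics(results), reported, rel_tol=1e-12))
```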

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1d1020fb7dc9f204618ab609bffe6d81a88ff7a7c1355f6f8be4b50fe0de8409
size 212492287
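This file is stored as a Git LFS pointer rather than inline content. The pointer format is one `key value` pair per line (`version`, `oid`, `size`). The sketch below shows a minimal parser and a check that downloaded bytes match the pointer. Both helper names are hypothetical.

```python
import hashlib

POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:1d1020fb7dc9f204618ab609bffe6d81a88ff7a7c1355f6f8be4b50fe0de8409
size 212492287
"""


def parse_lfs_pointer(text: str) -> dict[str, str]:
    """Split each non-empty line at the first space into key and value."""
    fields = {}
    for line in text.splitlines():
        if line:
            key, _, value = line.partition(" ")
            fields[key] = value
    return fields


def blob_matches_pointer(data: bytes, fields: dict[str, str]) -> bool:
    """True if the bytes have the pointer's size and SHA-256 digest."""
    algo, _, digest = fields["oid"].partition(":")
    return (
        algo == "sha256"
        and len(data) == int(fields["size"])
        and hashlib.sha256(data).hexdigest() == digest
    )


pointer = parse_lfs_pointer(POINTER)
print(int(pointer["size"]) / 2**20, "MiB")
```

Here the pointer refers to a 212,492,287-byte object, about 203 MiB.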

@@ -0,0 +1,54 @@
{
"custom_generation_config": null,
"model_params": {
"model_name_or_path": "/workdir/data/models/qwen/ruadapt_qwen2.5_3B_ext_u48_full_lr5e4_peft_mlp_32_32_bs256_as1.75_kto_simpo/simpo0",
"generation_config": {
"bos_token_id": 147075,
"do_sample": true,
"eos_token_id": [
147077
],
"max_length": 32768,
"max_new_tokens": 1,
"pad_token_id": 147075,
"stop_strings": [
"<|im_end|>"
],
"temperature": 0.1,
"top_k": 40,
"top_p": 0.9,
"transformers_version": "4.45.2",
"trust_remote_code": false
},
"conversation_template": {
"system_prompt": "",
"system_message_template": "<|im_start|>{role}\n{content}<|im_end|>\n",
"user_message_template": "<|im_start|>{role}\n{content}<|im_end|>\n",
"bot_message_template": "<|im_start|>{role}\n{content}<|im_end|>\n",
"bot_message_template_incomplete": "<|im_start|>{role}\n{content}",
"user_role": "user",
"bot_role": "assistant",
"system_role": "system",
"global_prefix": "",
"suffix": "<|im_start|>assistant\n",
"add_special_tokens": false,
"eos_token": "<|im_end|>"
},
"load_in_8bit": false,
"torch_dtype": "auto",
"use_flash_attention_2": true,
"device_map": "cuda:0",
"use_fast_tokenizer": true,
"leading_space": false,
"space_token": null,
"trust_remote_code": false,
"max_model_len": 32768
},
"task_params": {
"max_len": 4000,
"few_shot_count": 0,
"batch_size": 8,
"max_sample_per_dataset": 1000,
"method": "calculate_logsoftmax"
}
}

@@ -0,0 +1,7 @@
{
"task_name": "daru/treewayextractive",
"results": {
"r-prec": 0.3917218614718615
},
"leaderboard_result": 0.3917218614718615
}
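The daru/treewayextractive task reports R-precision (`r-prec`). The standard information-retrieval definition is sketched below for reference. With R relevant items, R-precision is the fraction of the top R ranked candidates that are relevant. How llmtf ranks candidates for this task is not shown here (its config uses method `calculate_logsoftmax`), so the candidate names in the example are hypothetical.

```python
def r_precision(ranked: list[str], relevant: set[str]) -> float:
    """Precision of the top-R ranked items, where R = number of relevant items."""
    r = len(relevant)
    if r == 0:
        return 0.0
    top_r = ranked[:r]
    return sum(1 for item in top_r if item in relevant) / r


# Hypothetical ranking of four sentences, two of which belong in the summary.
print(r_precision(["s1", "s2", "s3", "s4"], {"s1", "s3"}))
```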

File diff suppressed because one or more lines are too long

@@ -0,0 +1,54 @@
{
"custom_generation_config": null,
"model_params": {
"model_name_or_path": "/workdir/data/models/qwen/ruadapt_qwen2.5_3B_ext_u48_full_lr5e4_peft_mlp_32_32_bs256_as1.75_kto_simpo/simpo0",
"generation_config": {
"bos_token_id": 147075,
"do_sample": true,
"eos_token_id": [
147077
],
"max_length": 32768,
"max_new_tokens": 64,
"pad_token_id": 147075,
"stop_strings": [
"<|im_end|>"
],
"temperature": 0.1,
"top_k": 40,
"top_p": 0.9,
"transformers_version": "4.45.2",
"trust_remote_code": false
},
"conversation_template": {
"system_prompt": "",
"system_message_template": "<|im_start|>{role}\n{content}<|im_end|>\n",
"user_message_template": "<|im_start|>{role}\n{content}<|im_end|>\n",
"bot_message_template": "<|im_start|>{role}\n{content}<|im_end|>\n",
"bot_message_template_incomplete": "<|im_start|>{role}\n{content}",
"user_role": "user",
"bot_role": "assistant",
"system_role": "system",
"global_prefix": "",
"suffix": "<|im_start|>assistant\n",
"add_special_tokens": false,
"eos_token": "<|im_end|>"
},
"load_in_8bit": false,
"torch_dtype": "auto",
"use_flash_attention_2": true,
"device_map": "cuda:0",
"use_fast_tokenizer": true,
"leading_space": false,
"space_token": null,
"trust_remote_code": false,
"max_model_len": 32768
},
"task_params": {
"max_len": 4000,
"few_shot_count": 0,
"batch_size": 8,
"max_sample_per_dataset": 10000000000000,
"method": "generate"
}
}
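Every llmtf config here samples with `temperature` 0.1, `top_k` 40 and `top_p` 0.9. The minimal sketch below shows how those three settings shape the distribution. It applies them in the order temperature, top-k, top-p, the order commonly used by Hugging Face transformers logits warpers. Repetition penalty and other processors are omitted. `filter_logits` is a hypothetical helper operating on a toy token-to-logit dict.

```python
import math


def filter_logits(logits: dict[str, float], temperature: float,
                  top_k: int, top_p: float) -> dict[str, float]:
    """Return the renormalized distribution that sampling would draw from."""
    scaled = {tok: value / temperature for tok, value in logits.items()}
    # Keep the k highest-scoring tokens.
    kept = sorted(scaled.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    # Softmax over the survivors (subtract the max for numerical stability).
    max_logit = kept[0][1]
    exps = [(tok, math.exp(v - max_logit)) for tok, v in kept]
    total = sum(e for _, e in exps)
    probs = [(tok, e / total) for tok, e in exps]
    # Nucleus: smallest prefix whose cumulative probability reaches top_p.
    nucleus, cumulative = [], 0.0
    for tok, p in probs:
        nucleus.append((tok, p))
        cumulative += p
        if cumulative >= top_p:
            break
    norm = sum(p for _, p in nucleus)
    return {tok: p / norm for tok, p in nucleus}


print(filter_logits({"a": 2.0, "b": 1.0, "c": 0.0}, temperature=0.1, top_k=40, top_p=0.9))
```

With temperature 0.1, token "a" keeps all of the probability mass after top-p truncation in this toy case. Sampling at these settings is therefore close to greedy whenever one token clearly dominates.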

@@ -0,0 +1,8 @@
{
"task_name": "darumeru/MultiQ",
"results": {
"f1": 0.3346248767848689,
"em": 0.22275334608030592
},
"leaderboard_result": 0.2786891114325874
}
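MultiQ reports token-level F1 and exact match (`em`), and its `leaderboard_result` is their mean. The sketch below implements both metrics SQuAD-style, assuming lowercasing, punctuation stripping and whitespace tokenization. llmtf's actual normalization for Russian answers may differ.

```python
import string
from collections import Counter


def normalize(text: str) -> list[str]:
    """Lowercase, drop ASCII punctuation, split on whitespace."""
    text = "".join(ch for ch in text.lower() if ch not in string.punctuation)
    return text.split()


def exact_match(prediction: str, reference: str) -> float:
    return float(normalize(prediction) == normalize(reference))


def token_f1(prediction: str, reference: str) -> float:
    pred, ref = normalize(prediction), normalize(reference)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


print(token_f1("Москва река", "Москва"), exact_match("Москва.", "москва"))
```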

File diff suppressed because it is too large

@@ -0,0 +1,54 @@
{
"custom_generation_config": null,
"model_params": {
"model_name_or_path": "/workdir/data/models/qwen/ruadapt_qwen2.5_3B_ext_u48_full_lr5e4_peft_mlp_32_32_bs256_as1.75_kto_simpo/simpo0",
"generation_config": {
"bos_token_id": 147075,
"do_sample": true,
"eos_token_id": [
147077
],
"max_length": 32768,
"max_new_tokens": 64,
"pad_token_id": 147075,
"stop_strings": [
"<|im_end|>"
],
"temperature": 0.1,
"top_k": 40,
"top_p": 0.9,
"transformers_version": "4.45.2",
"trust_remote_code": false
},
"conversation_template": {
"system_prompt": "",
"system_message_template": "<|im_start|>{role}\n{content}<|im_end|>\n",
"user_message_template": "<|im_start|>{role}\n{content}<|im_end|>\n",
"bot_message_template": "<|im_start|>{role}\n{content}<|im_end|>\n",
"bot_message_template_incomplete": "<|im_start|>{role}\n{content}",
"user_role": "user",
"bot_role": "assistant",
"system_role": "system",
"global_prefix": "",
"suffix": "<|im_start|>assistant\n",
"add_special_tokens": false,
"eos_token": "<|im_end|>"
},
"load_in_8bit": false,
"torch_dtype": "auto",
"use_flash_attention_2": true,
"device_map": "cuda:0",
"use_fast_tokenizer": true,
"leading_space": false,
"space_token": null,
"trust_remote_code": false,
"max_model_len": 32768
},
"task_params": {
"max_len": 4000,
"few_shot_count": 0,
"batch_size": 8,
"max_sample_per_dataset": 10000000000000,
"method": "calculate_tokens_proba"
}
}
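The PARus config uses method `calculate_tokens_proba`. The name suggests that each answer option is scored by the probability the model assigns to its token, with the most probable option taken as the prediction. The sketch below illustrates that idea on hypothetical logits. It is an assumption about the method, not llmtf's implementation.

```python
import math


def choose_option(option_logits: dict[str, float]) -> tuple[str, dict[str, float]]:
    """Softmax restricted to the answer-option tokens, then argmax."""
    max_logit = max(option_logits.values())
    exps = {opt: math.exp(v - max_logit) for opt, v in option_logits.items()}
    total = sum(exps.values())
    probs = {opt: e / total for opt, e in exps.items()}
    best = max(probs, key=probs.get)
    return best, probs


def accuracy(predictions: list[str], gold: list[str]) -> float:
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)


# Hypothetical next-token logits for the two PARus-style options "1" and "2".
best, probs = choose_option({"1": 3.2, "2": 1.1})
print(best, probs)
```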

@@ -0,0 +1,7 @@
{
"task_name": "darumeru/PARus",
"results": {
"acc": 0.7
},
"leaderboard_result": 0.7
}

File diff suppressed because it is too large

@@ -0,0 +1,54 @@
{
"custom_generation_config": null,
"model_params": {
"model_name_or_path": "/workdir/data/models/qwen/ruadapt_qwen2.5_3B_ext_u48_full_lr5e4_peft_mlp_32_32_bs256_as1.75_kto_simpo/simpo0",
"generation_config": {
"bos_token_id": 147075,
"do_sample": true,
"eos_token_id": [
147077
],
"max_length": 32768,
"max_new_tokens": 64,
"pad_token_id": 147075,
"stop_strings": [
"<|im_end|>"
],
"temperature": 0.1,
"top_k": 40,
"top_p": 0.9,
"transformers_version": "4.45.2",
"trust_remote_code": false
},
"conversation_template": {
"system_prompt": "",
"system_message_template": "<|im_start|>{role}\n{content}<|im_end|>\n",
"user_message_template": "<|im_start|>{role}\n{content}<|im_end|>\n",
"bot_message_template": "<|im_start|>{role}\n{content}<|im_end|>\n",
"bot_message_template_incomplete": "<|im_start|>{role}\n{content}",
"user_role": "user",
"bot_role": "assistant",
"system_role": "system",
"global_prefix": "",
"suffix": "<|im_start|>assistant\n",
"add_special_tokens": false,
"eos_token": "<|im_end|>"
},
"load_in_8bit": false,
"torch_dtype": "auto",
"use_flash_attention_2": true,
"device_map": "cuda:0",
"use_fast_tokenizer": true,
"leading_space": false,
"space_token": null,
"trust_remote_code": false,
"max_model_len": 32768
},
"task_params": {
"max_len": 4000,
"few_shot_count": 0,
"batch_size": 8,
"max_sample_per_dataset": 10000000000000,
"method": "calculate_tokens_proba"
}
}

@@ -0,0 +1,8 @@
{
"task_name": "darumeru/RCB",
"results": {
"acc": 0.5454545454545454,
"f1_macro": 0.49090309951702227
},
"leaderboard_result": 0.5181788224857838
}
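RCB reports both accuracy and `f1_macro`. Macro F1 averages the per-class F1 scores with equal weight, so a poorly predicted minority class pulls it down. That is one way `f1_macro` can end up below accuracy, as it does here (0.491 vs 0.545). A minimal sketch with made-up labels:

```python
def macro_f1(gold: list[str], pred: list[str]) -> float:
    """Unweighted mean of per-class F1 over the classes present in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn  # F1 = 2TP / (2TP + FP + FN)
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


gold = ["entailment", "neutral", "contradiction", "entailment"]
pred = ["entailment", "entailment", "contradiction", "neutral"]
print(macro_f1(gold, pred))
```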

File diff suppressed because it is too large

@@ -0,0 +1,54 @@
{
"custom_generation_config": null,
"model_params": {
"model_name_or_path": "/workdir/data/models/qwen/ruadapt_qwen2.5_3B_ext_u48_full_lr5e4_peft_mlp_32_32_bs256_as1.75_kto_simpo/simpo0",
"generation_config": {
"bos_token_id": 147075,
"do_sample": true,
"eos_token_id": [
147077
],
"max_length": 32768,
"max_new_tokens": 64,
"pad_token_id": 147075,
"stop_strings": [
"<|im_end|>"
],
"temperature": 0.1,
"top_k": 40,
"top_p": 0.9,
"transformers_version": "4.45.2",
"trust_remote_code": false
},
"conversation_template": {
"system_prompt": "",
"system_message_template": "<|im_start|>{role}\n{content}<|im_end|>\n",
"user_message_template": "<|im_start|>{role}\n{content}<|im_end|>\n",
"bot_message_template": "<|im_start|>{role}\n{content}<|im_end|>\n",
"bot_message_template_incomplete": "<|im_start|>{role}\n{content}",
"user_role": "user",
"bot_role": "assistant",
"system_role": "system",
"global_prefix": "",
"suffix": "<|im_start|>assistant\n",
"add_special_tokens": false,
"eos_token": "<|im_end|>"
},
"load_in_8bit": false,
"torch_dtype": "auto",
"use_flash_attention_2": true,
"device_map": "cuda:0",
"use_fast_tokenizer": true,
"leading_space": false,
"space_token": null,
"trust_remote_code": false,
"max_model_len": 32768
},
"task_params": {
"max_len": 4000,
"few_shot_count": 0,
"batch_size": 8,
"max_sample_per_dataset": 10000000000000,
"method": "calculate_tokens_proba"
}
}

@@ -0,0 +1,7 @@
{
"task_name": "darumeru/RWSD",
"results": {
"acc": 0.6029411764705882
},
"leaderboard_result": 0.6029411764705882
}

File diff suppressed because it is too large

@@ -0,0 +1,54 @@
{
"custom_generation_config": null,
"model_params": {
"model_name_or_path": "/workdir/data/models/qwen/ruadapt_qwen2.5_3B_ext_u48_full_lr5e4_peft_mlp_32_32_bs256_as1.75_kto_simpo/simpo0",
"generation_config": {
"bos_token_id": 147075,
"do_sample": true,
"eos_token_id": [
147077
],
"max_length": 32768,
"max_new_tokens": 1024,
"pad_token_id": 147075,
"stop_strings": [
"<|im_end|>"
],
"temperature": 0.1,
"top_k": 40,
"top_p": 0.9,
"transformers_version": "4.45.2",
"trust_remote_code": false
},
"conversation_template": {
"system_prompt": "",
"system_message_template": "<|im_start|>{role}\n{content}<|im_end|>\n",
"user_message_template": "<|im_start|>{role}\n{content}<|im_end|>\n",
"bot_message_template": "<|im_start|>{role}\n{content}<|im_end|>\n",
"bot_message_template_incomplete": "<|im_start|>{role}\n{content}",
"user_role": "user",
"bot_role": "assistant",
"system_role": "system",
"global_prefix": "",
"suffix": "<|im_start|>assistant\n",
"add_special_tokens": false,
"eos_token": "<|im_end|>"
},
"load_in_8bit": false,
"torch_dtype": "auto",
"use_flash_attention_2": true,
"device_map": "cuda:0",
"use_fast_tokenizer": true,
"leading_space": false,
"space_token": null,
"trust_remote_code": false,
"max_model_len": 32768
},
"task_params": {
"max_len": 4000,
"few_shot_count": 0,
"batch_size": 8,
"max_sample_per_dataset": 10000000000000,
"method": "generate"
}
}

@@ -0,0 +1,9 @@
{
"task_name": "darumeru/cp_para_ru",
"results": {
"symbol_per_token": 3.993754090002875,
"len": 0.9986883734384026,
"lcs": 0.98
},
"leaderboard_result": 0.98
}

File diff suppressed because it is too large

@@ -0,0 +1,54 @@
{
"custom_generation_config": null,
"model_params": {
"model_name_or_path": "/workdir/data/models/qwen/ruadapt_qwen2.5_3B_ext_u48_full_lr5e4_peft_mlp_32_32_bs256_as1.75_kto_simpo/simpo0",
"generation_config": {
"bos_token_id": 147075,
"do_sample": true,
"eos_token_id": [
147077
],
"max_length": 32768,
"max_new_tokens": 64,
"pad_token_id": 147075,
"stop_strings": [
"<|im_end|>"
],
"temperature": 0.1,
"top_k": 40,
"top_p": 0.9,
"transformers_version": "4.45.2",
"trust_remote_code": false
},
"conversation_template": {
"system_prompt": "",
"system_message_template": "<|im_start|>{role}\n{content}<|im_end|>\n",
"user_message_template": "<|im_start|>{role}\n{content}<|im_end|>\n",
"bot_message_template": "<|im_start|>{role}\n{content}<|im_end|>\n",
"bot_message_template_incomplete": "<|im_start|>{role}\n{content}",
"user_role": "user",
"bot_role": "assistant",
"system_role": "system",
"global_prefix": "",
"suffix": "<|im_start|>assistant\n",
"add_special_tokens": false,
"eos_token": "<|im_end|>"
},
"load_in_8bit": false,
"torch_dtype": "auto",
"use_flash_attention_2": true,
"device_map": "cuda:0",
"use_fast_tokenizer": true,
"leading_space": false,
"space_token": null,
"trust_remote_code": false,
"max_model_len": 32768
},
"task_params": {
"max_len": 4000,
"few_shot_count": 0,
"batch_size": 8,
"max_sample_per_dataset": 10000000000000,
"method": "calculate_tokens_proba"
}
}

@@ -0,0 +1,8 @@
{
"task_name": "darumeru/ruOpenBookQA",
"results": {
"acc": 0.7302405498281787,
"f1_macro": 0.7304546157096631
},
"leaderboard_result": 0.7303475827689209
}

File diff suppressed because it is too large

@@ -0,0 +1,54 @@
{
"custom_generation_config": null,
"model_params": {
"model_name_or_path": "/workdir/data/models/qwen/ruadapt_qwen2.5_3B_ext_u48_full_lr5e4_peft_mlp_32_32_bs256_as1.75_kto_simpo/simpo0",
"generation_config": {
"bos_token_id": 147075,
"do_sample": true,
"eos_token_id": [
147077
],
"max_length": 32768,
"max_new_tokens": 64,
"pad_token_id": 147075,
"stop_strings": [
"<|im_end|>"
],
"temperature": 0.1,
"top_k": 40,
"top_p": 0.9,
"transformers_version": "4.45.2",
"trust_remote_code": false
},
"conversation_template": {
"system_prompt": "",
"system_message_template": "<|im_start|>{role}\n{content}<|im_end|>\n",
"user_message_template": "<|im_start|>{role}\n{content}<|im_end|>\n",
"bot_message_template": "<|im_start|>{role}\n{content}<|im_end|>\n",
"bot_message_template_incomplete": "<|im_start|>{role}\n{content}",
"user_role": "user",
"bot_role": "assistant",
"system_role": "system",
"global_prefix": "",
"suffix": "<|im_start|>assistant\n",
"add_special_tokens": false,
"eos_token": "<|im_end|>"
},
"load_in_8bit": false,
"torch_dtype": "auto",
"use_flash_attention_2": true,
"device_map": "cuda:0",
"use_fast_tokenizer": true,
"leading_space": false,
"space_token": null,
"trust_remote_code": false,
"max_model_len": 32768
},
"task_params": {
"max_len": 4000,
"few_shot_count": 0,
"batch_size": 8,
"max_sample_per_dataset": 10000000000000,
"method": "calculate_tokens_proba"
}
}

@@ -0,0 +1,8 @@
{
"task_name": "darumeru/ruWorldTree",
"results": {
"acc": 0.9047619047619048,
"f1_macro": 0.9043404138496471
},
"leaderboard_result": 0.904551159305776
}

@@ -0,0 +1,251 @@
INFO: 2024-10-17 21:30:14,019: llmtf.base.evaluator: Starting eval on ['darumeru/multiq']
INFO: 2024-10-17 21:30:14,019: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147077]
INFO: 2024-10-17 21:30:14,019: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-10-17 21:30:20,481: llmtf.base.darumeru/MultiQ: Loading Dataset: 6.46s
INFO: 2024-10-17 21:35:59,593: llmtf.base.darumeru/MultiQ: Processing Dataset: 339.11s
INFO: 2024-10-17 21:35:59,593: llmtf.base.darumeru/MultiQ: Results for darumeru/MultiQ:
INFO: 2024-10-17 21:35:59,594: llmtf.base.darumeru/MultiQ: {'f1': 0.3346248767848689, 'em': 0.22275334608030592}
INFO: 2024-10-17 21:35:59,599: llmtf.base.evaluator: Ended eval
INFO: 2024-10-17 21:35:59,599: llmtf.base.evaluator:
mean darumeru/MultiQ
0.279 0.279
INFO: 2024-10-17 21:36:08,809: llmtf.base.evaluator: Starting eval on ['darumeru/parus']
INFO: 2024-10-17 21:36:08,810: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147077]
INFO: 2024-10-17 21:36:08,810: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-10-17 21:36:12,969: llmtf.base.darumeru/PARus: Loading Dataset: 4.16s
INFO: 2024-10-17 21:36:18,316: llmtf.base.darumeru/PARus: Processing Dataset: 5.35s
INFO: 2024-10-17 21:36:18,317: llmtf.base.darumeru/PARus: Results for darumeru/PARus:
INFO: 2024-10-17 21:36:18,327: llmtf.base.darumeru/PARus: {'acc': 0.7}
INFO: 2024-10-17 21:36:18,327: llmtf.base.evaluator: Ended eval
INFO: 2024-10-17 21:36:18,328: llmtf.base.evaluator:
mean darumeru/MultiQ darumeru/PARus
0.489 0.279 0.700
INFO: 2024-10-17 21:36:27,550: llmtf.base.evaluator: Starting eval on ['darumeru/rcb']
INFO: 2024-10-17 21:36:27,550: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147077]
INFO: 2024-10-17 21:36:27,551: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-10-17 21:36:31,450: llmtf.base.darumeru/RCB: Loading Dataset: 3.90s
INFO: 2024-10-17 21:36:38,683: llmtf.base.darumeru/RCB: Processing Dataset: 7.23s
INFO: 2024-10-17 21:36:38,683: llmtf.base.darumeru/RCB: Results for darumeru/RCB:
INFO: 2024-10-17 21:36:38,686: llmtf.base.darumeru/RCB: {'acc': 0.5454545454545454, 'f1_macro': 0.49090309951702227}
INFO: 2024-10-17 21:36:38,687: llmtf.base.evaluator: Ended eval
INFO: 2024-10-17 21:36:38,688: llmtf.base.evaluator:
mean darumeru/MultiQ darumeru/PARus darumeru/RCB
0.499 0.279 0.700 0.518
INFO: 2024-10-17 21:36:48,734: llmtf.base.evaluator: Starting eval on ['darumeru/ruopenbookqa']
INFO: 2024-10-17 21:36:48,735: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147077]
INFO: 2024-10-17 21:36:48,735: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-10-17 21:36:54,900: llmtf.base.darumeru/ruOpenBookQA: Loading Dataset: 6.17s
INFO: 2024-10-17 21:38:00,519: llmtf.base.darumeru/ruOpenBookQA: Processing Dataset: 65.62s
INFO: 2024-10-17 21:38:00,520: llmtf.base.darumeru/ruOpenBookQA: Results for darumeru/ruOpenBookQA:
INFO: 2024-10-17 21:38:00,532: llmtf.base.darumeru/ruOpenBookQA: {'acc': 0.7302405498281787, 'f1_macro': 0.7304546157096631}
INFO: 2024-10-17 21:38:00,541: llmtf.base.evaluator: Ended eval
INFO: 2024-10-17 21:38:00,542: llmtf.base.evaluator:
mean darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/ruOpenBookQA
0.557 0.279 0.700 0.518 0.730
INFO: 2024-10-17 21:38:09,745: llmtf.base.evaluator: Starting eval on ['darumeru/ruworldtree']
INFO: 2024-10-17 21:38:09,745: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147077]
INFO: 2024-10-17 21:38:09,745: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-10-17 21:38:14,102: llmtf.base.darumeru/ruWorldTree: Loading Dataset: 4.36s
INFO: 2024-10-17 21:38:16,932: llmtf.base.darumeru/ruWorldTree: Processing Dataset: 2.83s
INFO: 2024-10-17 21:38:16,933: llmtf.base.darumeru/ruWorldTree: Results for darumeru/ruWorldTree:
INFO: 2024-10-17 21:38:16,936: llmtf.base.darumeru/ruWorldTree: {'acc': 0.9047619047619048, 'f1_macro': 0.9043404138496471}
INFO: 2024-10-17 21:38:16,936: llmtf.base.evaluator: Ended eval
INFO: 2024-10-17 21:38:16,937: llmtf.base.evaluator:
mean darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/ruOpenBookQA darumeru/ruWorldTree
0.626 0.279 0.700 0.518 0.730 0.905
INFO: 2024-10-17 21:38:26,077: llmtf.base.evaluator: Starting eval on ['darumeru/rwsd']
INFO: 2024-10-17 21:38:26,077: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147077]
INFO: 2024-10-17 21:38:26,077: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-10-17 21:38:30,781: llmtf.base.darumeru/RWSD: Loading Dataset: 4.70s
INFO: 2024-10-17 21:38:36,497: llmtf.base.darumeru/RWSD: Processing Dataset: 5.72s
INFO: 2024-10-17 21:38:36,497: llmtf.base.darumeru/RWSD: Results for darumeru/RWSD:
INFO: 2024-10-17 21:38:36,498: llmtf.base.darumeru/RWSD: {'acc': 0.6029411764705882}
INFO: 2024-10-17 21:38:36,499: llmtf.base.evaluator: Ended eval
INFO: 2024-10-17 21:38:36,500: llmtf.base.evaluator:
mean darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/ruOpenBookQA darumeru/ruWorldTree
0.622 0.279 0.700 0.518 0.603 0.730 0.905
INFO: 2024-10-17 21:38:45,688: llmtf.base.evaluator: Starting eval on ['daru/treewayextractive']
INFO: 2024-10-17 21:38:45,688: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147077]
INFO: 2024-10-17 21:38:45,688: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-10-17 21:39:02,002: llmtf.base.daru/treewayextractive: Loading Dataset: 16.31s
INFO: 2024-10-17 21:42:05,777: llmtf.base.daru/treewayextractive: Processing Dataset: 183.77s
INFO: 2024-10-17 21:42:05,777: llmtf.base.daru/treewayextractive: Results for daru/treewayextractive:
INFO: 2024-10-17 21:42:06,010: llmtf.base.daru/treewayextractive: {'r-prec': 0.3917218614718615}
INFO: 2024-10-17 21:42:06,052: llmtf.base.evaluator: Ended eval
INFO: 2024-10-17 21:42:06,054: llmtf.base.evaluator:
mean daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/ruOpenBookQA darumeru/ruWorldTree
0.589 0.392 0.279 0.700 0.518 0.603 0.730 0.905
INFO: 2024-10-17 21:42:15,170: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/rummlu']
INFO: 2024-10-17 21:42:15,170: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147077]
INFO: 2024-10-17 21:42:15,170: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-10-17 21:46:47,282: llmtf.base.nlpcoreteam/ruMMLU: Loading Dataset: 272.11s
INFO: 2024-10-17 21:56:29,398: llmtf.base.nlpcoreteam/ruMMLU: Processing Dataset: 582.12s
INFO: 2024-10-17 21:56:29,399: llmtf.base.nlpcoreteam/ruMMLU: Results for nlpcoreteam/ruMMLU:
INFO: 2024-10-17 21:56:29,464: llmtf.base.nlpcoreteam/ruMMLU: metric
subject
abstract_algebra 0.340000
anatomy 0.414815
astronomy 0.611842
business_ethics 0.610000
clinical_knowledge 0.554717
college_biology 0.548611
college_chemistry 0.380000
college_computer_science 0.450000
college_mathematics 0.400000
college_medicine 0.526012
college_physics 0.470588
computer_security 0.620000
conceptual_physics 0.565957
econometrics 0.377193
electrical_engineering 0.537931
elementary_mathematics 0.529101
formal_logic 0.365079
global_facts 0.360000
high_school_biology 0.664516
high_school_chemistry 0.487685
high_school_computer_science 0.700000
high_school_european_history 0.751515
high_school_geography 0.722222
high_school_government_and_politics 0.564767
high_school_macroeconomics 0.528205
high_school_mathematics 0.433333
high_school_microeconomics 0.533613
high_school_physics 0.403974
high_school_psychology 0.713761
high_school_statistics 0.523148
high_school_us_history 0.661765
high_school_world_history 0.717300
human_aging 0.587444
human_sexuality 0.618321
international_law 0.735537
jurisprudence 0.666667
logical_fallacies 0.564417
machine_learning 0.392857
management 0.650485
marketing 0.752137
medical_genetics 0.580000
miscellaneous 0.632184
moral_disputes 0.583815
moral_scenarios 0.299441
nutrition 0.637255
philosophy 0.617363
prehistory 0.561728
professional_accounting 0.386525
professional_law 0.377445
professional_medicine 0.481618
professional_psychology 0.516340
public_relations 0.500000
security_studies 0.648980
sociology 0.756219
us_foreign_policy 0.720000
virology 0.439759
world_religions 0.719298
INFO: 2024-10-17 21:56:29,473: llmtf.base.nlpcoreteam/ruMMLU: metric
subject
STEM 0.503308
humanities 0.586259
other (business, health, misc.) 0.543782
social sciences 0.599968
INFO: 2024-10-17 21:56:29,478: llmtf.base.nlpcoreteam/ruMMLU: {'acc': 0.5583294528508019}
INFO: 2024-10-17 21:56:29,516: llmtf.base.evaluator: Ended eval
INFO: 2024-10-17 21:56:29,518: llmtf.base.evaluator:
mean daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/ruOpenBookQA darumeru/ruWorldTree nlpcoreteam/ruMMLU
0.586 0.392 0.279 0.700 0.518 0.603 0.730 0.905 0.558
INFO: 2024-10-17 21:56:39,535: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/enmmlu']
INFO: 2024-10-17 21:56:39,536: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147077]
INFO: 2024-10-17 21:56:39,536: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-10-17 21:58:54,966: llmtf.base.nlpcoreteam/enMMLU: Loading Dataset: 135.43s
INFO: 2024-10-17 22:08:04,419: llmtf.base.nlpcoreteam/enMMLU: Processing Dataset: 549.45s
INFO: 2024-10-17 22:08:04,426: llmtf.base.nlpcoreteam/enMMLU: Results for nlpcoreteam/enMMLU:
INFO: 2024-10-17 22:08:04,492: llmtf.base.nlpcoreteam/enMMLU: metric
subject
abstract_algebra 0.380000
anatomy 0.637037
astronomy 0.717105
business_ethics 0.700000
clinical_knowledge 0.705660
college_biology 0.715278
college_chemistry 0.470000
college_computer_science 0.580000
college_mathematics 0.330000
college_medicine 0.664740
college_physics 0.509804
computer_security 0.740000
conceptual_physics 0.642553
econometrics 0.508772
electrical_engineering 0.600000
elementary_mathematics 0.547619
formal_logic 0.412698
global_facts 0.360000
high_school_biology 0.783871
high_school_chemistry 0.581281
high_school_computer_science 0.710000
high_school_european_history 0.800000
high_school_geography 0.757576
high_school_government_and_politics 0.854922
high_school_macroeconomics 0.679487
high_school_mathematics 0.455556
high_school_microeconomics 0.773109
high_school_physics 0.437086
high_school_psychology 0.844037
high_school_statistics 0.652778
high_school_us_history 0.833333
high_school_world_history 0.843882
human_aging 0.677130
human_sexuality 0.786260
international_law 0.768595
jurisprudence 0.814815
logical_fallacies 0.803681
machine_learning 0.446429
management 0.786408
marketing 0.858974
medical_genetics 0.760000
miscellaneous 0.795658
moral_disputes 0.667630
moral_scenarios 0.311732
nutrition 0.732026
philosophy 0.704180
prehistory 0.712963
professional_accounting 0.503546
professional_law 0.457627
professional_medicine 0.658088
professional_psychology 0.668301
public_relations 0.709091
security_studies 0.697959
sociology 0.800995
us_foreign_policy 0.800000
virology 0.506024
world_religions 0.801170
INFO: 2024-10-17 22:08:04,506: llmtf.base.nlpcoreteam/enMMLU: metric
subject
STEM 0.572187
humanities 0.687100
other (business, health, misc.) 0.667521
social sciences 0.740042
INFO: 2024-10-17 22:08:04,511: llmtf.base.nlpcoreteam/enMMLU: {'acc': 0.6667125709237595}
INFO: 2024-10-17 22:08:04,554: llmtf.base.evaluator: Ended eval
INFO: 2024-10-17 22:08:04,556: llmtf.base.evaluator:
mean daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/ruOpenBookQA darumeru/ruWorldTree nlpcoreteam/enMMLU nlpcoreteam/ruMMLU
0.595 0.392 0.279 0.700 0.518 0.603 0.730 0.905 0.667 0.558
INFO: 2024-10-17 22:08:14,512: llmtf.base.evaluator: Starting eval on ['daru/treewayabstractive']
INFO: 2024-10-17 22:08:14,513: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147077]
INFO: 2024-10-17 22:08:14,513: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-10-17 22:08:18,791: llmtf.base.daru/treewayabstractive: Loading Dataset: 4.28s
INFO: 2024-10-17 22:11:46,260: llmtf.base.daru/treewayabstractive: Processing Dataset: 207.47s
INFO: 2024-10-17 22:11:46,260: llmtf.base.daru/treewayabstractive: Results for daru/treewayabstractive:
INFO: 2024-10-17 22:11:46,261: llmtf.base.daru/treewayabstractive: {'rouge1': 0.33109987599556284, 'rouge2': 0.11202889150257295}
INFO: 2024-10-17 22:11:46,262: llmtf.base.evaluator: Ended eval
INFO: 2024-10-17 22:11:46,263: llmtf.base.evaluator:
mean daru/treewayabstractive daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/ruOpenBookQA darumeru/ruWorldTree nlpcoreteam/enMMLU nlpcoreteam/ruMMLU
0.557 0.222 0.392 0.279 0.700 0.518 0.603 0.730 0.905 0.667 0.558
INFO: 2024-10-17 22:11:55,717: llmtf.base.evaluator: Starting eval on ['darumeru/cp_para_ru']
INFO: 2024-10-17 22:11:55,717: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147077]
INFO: 2024-10-17 22:11:55,717: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-10-17 22:11:59,846: llmtf.base.darumeru/cp_para_ru: Loading Dataset: 4.13s
INFO: 2024-10-17 22:14:29,975: llmtf.base.darumeru/cp_para_ru: Processing Dataset: 150.13s
INFO: 2024-10-17 22:14:29,975: llmtf.base.darumeru/cp_para_ru: Results for darumeru/cp_para_ru:
INFO: 2024-10-17 22:14:29,976: llmtf.base.darumeru/cp_para_ru: {'symbol_per_token': 3.993754090002875, 'len': 0.9986883734384026, 'lcs': 0.98}
INFO: 2024-10-17 22:14:29,977: llmtf.base.evaluator: Ended eval
INFO: 2024-10-17 22:14:29,977: llmtf.base.evaluator:
mean daru/treewayabstractive daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/cp_para_ru darumeru/ruOpenBookQA darumeru/ruWorldTree nlpcoreteam/enMMLU nlpcoreteam/ruMMLU
0.596 0.222 0.392 0.279 0.700 0.518 0.603 0.980 0.730 0.905 0.667 0.558
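
In both MMLU summaries above, the logged headline `acc` equals the unweighted mean of the four category scores (STEM, humanities, other, social sciences), up to the six-decimal rounding of the printed table. This suggests, though llmtf's source is not checked here, that the headline figure is a macro-average over categories rather than a per-question average. A minimal sketch that checks this using only the printed values:

```python
# Category scores as printed in the ruMMLU and enMMLU logs above.
RU_MMLU = {
    "STEM": 0.503308,
    "humanities": 0.586259,
    "other (business, health, misc.)": 0.543782,
    "social sciences": 0.599968,
}
EN_MMLU = {
    "STEM": 0.572187,
    "humanities": 0.687100,
    "other (business, health, misc.)": 0.667521,
    "social sciences": 0.740042,
}

# Headline accuracies as logged ({'acc': ...}).
RU_ACC = 0.5583294528508019
EN_ACC = 0.6667125709237595


def macro_average(category_scores: dict[str, float]) -> float:
    """Unweighted mean over categories: each category counts equally."""
    return sum(category_scores.values()) / len(category_scores)


for name, cats, acc in (("ruMMLU", RU_MMLU, RU_ACC), ("enMMLU", EN_MMLU, EN_ACC)):
    est = macro_average(cats)
    # Printed category scores are rounded to 6 decimals, so allow 1e-6 slack.
    print(f"{name}: macro-average={est:.7f} logged acc={acc:.7f} match={abs(est - acc) < 1e-6}")
```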

@@ -0,0 +1,2 @@
mean daru/treewayabstractive daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/cp_para_ru darumeru/ruOpenBookQA darumeru/ruWorldTree nlpcoreteam/enMMLU nlpcoreteam/ruMMLU
0.596 0.222 0.392 0.279 0.700 0.518 0.603 0.980 0.730 0.905 0.667 0.558
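
The `mean` column is consistent with an unweighted arithmetic mean of the per-task scores shown in the same row, rounded to three decimals. The `daru/treewayabstractive` entry (0.222) likewise matches the average of its logged `rouge1` and `rouge2` values, and `darumeru/cp_para_ru` (0.980) matches its `lcs` value; both are inferences from the numbers, not documented behaviour. A minimal sketch reproducing the row:

```python
# Per-task scores copied from the summary row above (already rounded to 3 decimals).
SCORES = {
    "daru/treewayabstractive": 0.222,
    "daru/treewayextractive": 0.392,
    "darumeru/MultiQ": 0.279,
    "darumeru/PARus": 0.700,
    "darumeru/RCB": 0.518,
    "darumeru/RWSD": 0.603,
    "darumeru/cp_para_ru": 0.980,
    "darumeru/ruOpenBookQA": 0.730,
    "darumeru/ruWorldTree": 0.905,
    "nlpcoreteam/enMMLU": 0.667,
    "nlpcoreteam/ruMMLU": 0.558,
}

# Assumed: the abstractive task's single score is the mean of its two ROUGE values.
ABSTRACTIVE = (0.33109987599556284 + 0.11202889150257295) / 2  # about 0.2216


def summary_mean(scores: dict[str, float]) -> float:
    """Unweighted arithmetic mean over tasks."""
    return sum(scores.values()) / len(scores)


print(f"{summary_mean(SCORES):.3f}")  # 0.596
print(round(ABSTRACTIVE, 3))          # 0.222
```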

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:7333ae2dd9a5974ba2d5a03e75a6f52f4f469f8d181f2b230b4c72c30d43220e
size 37141480
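
Files tracked by Git LFS appear in this diff only as pointer files like the one above: a `version` line, an `oid` of the form `sha256:<hex digest>`, and the object `size` in bytes. A minimal parser sketch, assuming the simple `key value` line layout shown here; it does not enforce every rule of the Git LFS pointer specification (such as key ordering):

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into its fields (minimal sketch)."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")  # each line is "<key> <value>"
        fields[key] = value
    if not fields.get("version", "").startswith("https://git-lfs.github.com/spec/"):
        raise ValueError("not a Git LFS pointer")
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "oid_algo": algo,
        "oid": digest,
        "size": int(fields["size"]),  # size of the real object in bytes
    }


POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:7333ae2dd9a5974ba2d5a03e75a6f52f4f469f8d181f2b230b4c72c30d43220e
size 37141480
"""

info = parse_lfs_pointer(POINTER)
print(info["oid_algo"], info["size"])  # sha256 37141480
```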

@@ -0,0 +1,54 @@
{
"custom_generation_config": null,
"model_params": {
"model_name_or_path": "/workdir/data/models/qwen/ruadapt_qwen2.5_3B_ext_u48_full_lr5e4_peft_mlp_32_32_bs256_as1.75_kto_simpo/simpo0",
"generation_config": {
"bos_token_id": 147075,
"do_sample": true,
"eos_token_id": [
147077
],
"max_length": 32768,
"max_new_tokens": 64,
"pad_token_id": 147075,
"stop_strings": [
"<|im_end|>"
],
"temperature": 0.1,
"top_k": 40,
"top_p": 0.9,
"transformers_version": "4.45.2",
"trust_remote_code": false
},
"conversation_template": {
"system_prompt": "",
"system_message_template": "<|im_start|>{role}\n{content}<|im_end|>\n",
"user_message_template": "<|im_start|>{role}\n{content}<|im_end|>\n",
"bot_message_template": "<|im_start|>{role}\n{content}<|im_end|>\n",
"bot_message_template_incomplete": "<|im_start|>{role}\n{content}",
"user_role": "user",
"bot_role": "assistant",
"system_role": "system",
"global_prefix": "",
"suffix": "<|im_start|>assistant\n",
"add_special_tokens": false,
"eos_token": "<|im_end|>"
},
"load_in_8bit": false,
"torch_dtype": "auto",
"use_flash_attention_2": true,
"device_map": "cuda:0",
"use_fast_tokenizer": true,
"leading_space": false,
"space_token": null,
"trust_remote_code": false,
"max_model_len": 32768
},
"task_params": {
"max_len": 4000,
"few_shot_count": 0,
"batch_size": 8,
"max_sample_per_dataset": 10000000000000,
"method": "calculate_tokens_proba"
}
}
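
The `conversation_template` block describes how chat messages are turned into a single ChatML-style prompt string. A minimal sketch of one plausible assembly, assuming (not confirmed against llmtf's code) that the prompt is `global_prefix`, then an optional system message, then each message formatted with its role's template, then `suffix` to open the assistant turn. `render_prompt` is a hypothetical helper name:

```python
TEMPLATE = {
    "system_prompt": "",
    "system_message_template": "<|im_start|>{role}\n{content}<|im_end|>\n",
    "user_message_template": "<|im_start|>{role}\n{content}<|im_end|>\n",
    "bot_message_template": "<|im_start|>{role}\n{content}<|im_end|>\n",
    "user_role": "user",
    "bot_role": "assistant",
    "system_role": "system",
    "global_prefix": "",
    "suffix": "<|im_start|>assistant\n",
}


def render_prompt(messages, tpl=TEMPLATE):
    """Hypothetical helper: format (role, content) pairs with the templates above."""
    by_role = {
        tpl["system_role"]: tpl["system_message_template"],
        tpl["user_role"]: tpl["user_message_template"],
        tpl["bot_role"]: tpl["bot_message_template"],
    }
    parts = [tpl["global_prefix"]]
    # Assumption: an empty system_prompt means no system message is emitted.
    if tpl["system_prompt"]:
        parts.append(tpl["system_message_template"].format(
            role=tpl["system_role"], content=tpl["system_prompt"]))
    for role, content in messages:
        parts.append(by_role[role].format(role=role, content=content))
    parts.append(tpl["suffix"])  # leaves the assistant turn open for the model
    return "".join(parts)


print(render_prompt([("user", "2+2=?")]))
```

Generation is then expected to stop at `<|im_end|>`, which is both the template's `eos_token` and the configured stop string.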

@@ -0,0 +1,7 @@
{
"task_name": "nlpcoreteam/enMMLU",
"results": {
"acc": 0.6667125709237595
},
"leaderboard_result": 0.6667125709237595
}
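
The task configs set `"method": "calculate_tokens_proba"`, and `acc` is a multiple-choice accuracy. A minimal sketch of the general idea behind probability-based scoring, assuming (not verified against llmtf) that the next-token logits are compared only across the answer-letter tokens and the most probable letter is taken as the prediction. The logits below are toy numbers for illustration, not outputs of this model:

```python
import math


def choice_probs(next_token_logits: dict[str, float], choices=("A", "B", "C", "D")) -> dict[str, float]:
    """Softmax restricted to the candidate answer tokens; other tokens are ignored."""
    m = max(next_token_logits[c] for c in choices)  # subtract max for numerical stability
    exps = {c: math.exp(next_token_logits[c] - m) for c in choices}
    z = sum(exps.values())
    return {c: e / z for c, e in exps.items()}


def accuracy(examples) -> float:
    """examples: iterable of (next_token_logits, gold_letter) pairs."""
    examples = list(examples)
    correct = 0
    for logits, gold in examples:
        probs = choice_probs(logits)
        pred = max(probs, key=probs.get)  # most probable answer letter
        correct += pred == gold
    return correct / len(examples)


# Toy logits for illustration only (not taken from any model run).
toy = [
    ({"A": 2.0, "B": 0.5, "C": 0.1, "D": -1.0, "the": 5.0}, "A"),  # "the" is not a choice
    ({"A": 0.2, "B": 0.3, "C": 1.7, "D": 0.0}, "B"),
]
print(accuracy(toy))  # 0.5
```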

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:14ad0be3cf344bf030c934b0049c4691c1ab4d94c6b54d5b28ffa9cf84b88867
size 43064153

@@ -0,0 +1,54 @@
{
"custom_generation_config": null,
"model_params": {
"model_name_or_path": "/workdir/data/models/qwen/ruadapt_qwen2.5_3B_ext_u48_full_lr5e4_peft_mlp_32_32_bs256_as1.75_kto_simpo/simpo0",
"generation_config": {
"bos_token_id": 147075,
"do_sample": true,
"eos_token_id": [
147077
],
"max_length": 32768,
"max_new_tokens": 64,
"pad_token_id": 147075,
"stop_strings": [
"<|im_end|>"
],
"temperature": 0.1,
"top_k": 40,
"top_p": 0.9,
"transformers_version": "4.45.2",
"trust_remote_code": false
},
"conversation_template": {
"system_prompt": "",
"system_message_template": "<|im_start|>{role}\n{content}<|im_end|>\n",
"user_message_template": "<|im_start|>{role}\n{content}<|im_end|>\n",
"bot_message_template": "<|im_start|>{role}\n{content}<|im_end|>\n",
"bot_message_template_incomplete": "<|im_start|>{role}\n{content}",
"user_role": "user",
"bot_role": "assistant",
"system_role": "system",
"global_prefix": "",
"suffix": "<|im_start|>assistant\n",
"add_special_tokens": false,
"eos_token": "<|im_end|>"
},
"load_in_8bit": false,
"torch_dtype": "auto",
"use_flash_attention_2": true,
"device_map": "cuda:0",
"use_fast_tokenizer": true,
"leading_space": false,
"space_token": null,
"trust_remote_code": false,
"max_model_len": 32768
},
"task_params": {
"max_len": 4000,
"few_shot_count": 0,
"batch_size": 8,
"max_sample_per_dataset": 10000000000000,
"method": "calculate_tokens_proba"
}
}
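
The `generation_config` enables sampling with `temperature` 0.1, `top_k` 40 and `top_p` 0.9. A minimal pure-Python sketch of how these three settings are commonly combined: temperature scaling first, then top-k, then nucleus (top-p) filtering, roughly the order in which Hugging Face `transformers` applies its logits warpers. `filter_distribution` is a hypothetical name, and the input logits are toy values:

```python
import math


def filter_distribution(logits, temperature=0.1, top_k=40, top_p=0.9):
    """Hypothetical helper: {token_index: prob} after temperature, top-k and top-p."""
    # 1) Temperature scaling followed by a numerically stable softmax.
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    z = sum(exps)
    ranked = sorted(((e / z, i) for i, e in enumerate(exps)), reverse=True)

    # 2) Top-k: keep the k most probable tokens and renormalise.
    ranked = ranked[:top_k]
    mass = sum(p for p, _ in ranked)
    ranked = [(p / mass, i) for p, i in ranked]

    # 3) Top-p (nucleus): keep the smallest prefix whose cumulative mass reaches top_p.
    kept, cumulative = [], 0.0
    for p, i in ranked:
        kept.append((p, i))
        cumulative += p
        if cumulative >= top_p:
            break

    # 4) Renormalise over the surviving tokens; sampling would draw from this.
    total = sum(p for p, _ in kept)
    return {i: p / total for p, i in kept}


# Toy logits for four tokens (illustrative values only).
print(filter_distribution([2.0, 1.9, 0.0, -3.0]))
```

With a temperature as low as 0.1 the distribution is very peaked, so sampling behaves close to greedy decoding.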

@@ -0,0 +1,7 @@
{
"task_name": "nlpcoreteam/ruMMLU",
"results": {
"acc": 0.5583294528508019
},
"leaderboard_result": 0.5583294528508019
}

146820
merges.txt Normal file

File diff suppressed because it is too large
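
`merges.txt` holds the BPE merge list used by the tokenizer; its contents are not shown here. Conceptually, each line is a pair of symbols, and a line's position gives its merge priority. A minimal sketch of greedy BPE merging in the style of the GPT-2 reference encoder, using a toy merge list invented for illustration (not taken from this file). Real byte-level BPE first maps bytes to printable symbols and pre-tokenizes the text, which is omitted here:

```python
def bpe(word: str, merges: list[tuple[str, str]]) -> list[str]:
    """Repeatedly merge every occurrence of the highest-priority adjacent pair."""
    ranks = {pair: r for r, pair in enumerate(merges)}  # lower rank = earlier line = higher priority
    symbols = list(word)
    while len(symbols) > 1:
        pairs = set(zip(symbols, symbols[1:]))
        best = min(pairs, key=lambda p: ranks.get(p, float("inf")))
        if best not in ranks:
            break  # no applicable merge left
        merged, k = [], 0
        while k < len(symbols):
            if k < len(symbols) - 1 and (symbols[k], symbols[k + 1]) == best:
                merged.append(symbols[k] + symbols[k + 1])
                k += 2
            else:
                merged.append(symbols[k])
                k += 1
        symbols = merged
    return symbols


# Toy merge list: earlier entries have higher priority (invented, not from merges.txt).
MERGES = [("l", "o"), ("lo", "w"), ("e", "r")]
print(bpe("lower", MERGES))  # ['low', 'er']
```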

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:78d625497b470a4e2ca4b4e41223b9694859b8f6d48dd5b3f537dd7a8b2486bf
size 4982828640

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f1ce29823d548fffc6e3b2679bea25e856e12a3053a8bbbb55384ab1dc2f8c5a
size 1169277800
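
The two LFS pointers above (4,982,828,640 and 1,169,277,800 bytes, presumably `model-00001-of-00002.safetensors` and `model-00002-of-00002.safetensors`) add up to 49,608 bytes more than the `total_size` of 6,152,056,832 recorded in the index that follows. A safetensors file starts with an 8-byte little-endian header length and a JSON header before the raw tensor bytes. If `total_size` counts tensor bytes only, as assumed here, that difference is header overhead. A stdlib-only sketch that computes the difference and reads such a header from a synthetic in-memory file:

```python
import json
import struct

SHARD_SIZES = [4982828640, 1169277800]  # from the two LFS pointers above
INDEX_TOTAL_SIZE = 6152056832           # "total_size" in the index metadata


def read_safetensors_header(blob: bytes) -> dict:
    """Parse the JSON header at the start of a safetensors file."""
    (header_len,) = struct.unpack("<Q", blob[:8])  # u64, little-endian
    return json.loads(blob[8:8 + header_len])


def make_tiny_safetensors() -> bytes:
    """Build a synthetic one-tensor file holding two float32 values."""
    data = struct.pack("<2f", 1.0, 2.0)
    header = json.dumps(
        {"x": {"dtype": "F32", "shape": [2], "data_offsets": [0, len(data)]}}
    ).encode()
    return struct.pack("<Q", len(header)) + header + data


overhead = sum(SHARD_SIZES) - INDEX_TOTAL_SIZE
print(overhead)  # 49608
print(read_safetensors_header(make_tiny_safetensors()))
```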

@@ -0,0 +1,441 @@
{
"metadata": {
"total_size": 6152056832
},
"weight_map": {
"model.embed_tokens.weight": "model-00001-of-00002.safetensors",
"model.layers.0.input_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.0.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.0.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.0.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.0.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.0.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.0.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.0.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.0.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.0.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.0.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.0.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.1.input_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.1.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.1.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.1.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.1.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.1.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.1.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.1.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.1.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.1.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.1.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.1.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.10.input_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.10.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.10.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.10.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.10.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.10.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.10.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.10.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.10.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.10.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.10.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.10.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.11.input_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.11.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.11.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.11.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.11.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.11.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.11.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.11.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.11.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.11.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.11.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.11.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.12.input_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.12.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.12.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.12.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.12.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.12.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.12.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.12.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.12.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.12.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.12.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.12.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.13.input_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.13.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.13.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.13.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.13.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.13.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.13.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.13.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.13.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.13.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.13.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.13.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.14.input_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.14.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.14.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.14.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.14.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.14.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.14.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.14.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.14.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.14.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.14.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.14.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.15.input_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.15.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.15.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.15.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.15.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.15.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.15.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.15.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.15.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.15.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.15.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.15.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.16.input_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.16.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.16.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.16.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.16.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.16.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.16.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.16.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.16.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.16.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.16.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.16.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.17.input_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.17.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.17.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.17.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.17.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.17.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.17.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.17.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.17.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.17.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.17.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.17.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.18.input_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.18.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.18.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.18.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.18.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.18.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.18.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.18.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.18.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.18.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.18.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.18.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.19.input_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.19.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.19.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.19.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.19.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.19.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.19.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.19.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.19.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.19.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.19.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.19.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.2.input_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.2.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.2.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.2.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.2.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.2.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.2.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.2.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.2.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.2.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.2.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.2.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.20.input_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.20.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.20.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.20.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.20.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.20.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.20.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.20.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.20.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.20.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.20.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.20.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.21.input_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.21.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.21.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.21.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.21.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.21.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.21.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.21.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.21.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.21.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.21.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.21.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.22.input_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.22.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.22.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.22.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.22.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.22.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.22.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.22.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.22.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.22.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.22.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.22.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.23.input_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.23.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.23.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.23.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.23.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.23.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.23.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.23.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.23.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.23.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.23.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.23.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.24.input_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.24.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.24.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.24.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.24.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.24.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.24.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.24.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.24.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.24.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.24.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.24.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.25.input_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.25.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.25.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.25.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.25.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.25.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.25.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.25.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.25.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.25.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.25.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.25.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.26.input_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.26.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.26.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.26.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.26.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.26.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.26.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.26.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.26.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.26.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.26.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.26.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.27.input_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.27.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.27.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.27.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.27.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.27.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.27.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.27.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.27.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.27.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.27.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.27.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.28.input_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.28.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.28.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.28.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.28.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.28.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.28.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.28.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.28.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.28.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.28.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.28.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.29.input_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.29.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.29.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.29.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.29.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.29.self_attn.k_proj.bias": "model-00002-of-00002.safetensors",
"model.layers.29.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.29.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.29.self_attn.q_proj.bias": "model-00002-of-00002.safetensors",
"model.layers.29.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.29.self_attn.v_proj.bias": "model-00002-of-00002.safetensors",
"model.layers.29.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.3.input_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.3.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.3.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.3.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.3.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.3.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.3.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.3.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.3.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.3.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.3.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.3.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.30.input_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.30.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.30.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.30.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.30.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.30.self_attn.k_proj.bias": "model-00002-of-00002.safetensors",
"model.layers.30.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.30.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.30.self_attn.q_proj.bias": "model-00002-of-00002.safetensors",
"model.layers.30.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.30.self_attn.v_proj.bias": "model-00002-of-00002.safetensors",
"model.layers.30.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.31.input_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.31.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.31.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.31.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.31.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.31.self_attn.k_proj.bias": "model-00002-of-00002.safetensors",
"model.layers.31.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.31.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.31.self_attn.q_proj.bias": "model-00002-of-00002.safetensors",
"model.layers.31.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.31.self_attn.v_proj.bias": "model-00002-of-00002.safetensors",
"model.layers.31.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.32.input_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.32.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.32.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.32.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.32.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.32.self_attn.k_proj.bias": "model-00002-of-00002.safetensors",
"model.layers.32.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.32.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.32.self_attn.q_proj.bias": "model-00002-of-00002.safetensors",
"model.layers.32.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.32.self_attn.v_proj.bias": "model-00002-of-00002.safetensors",
"model.layers.32.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.33.input_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.33.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.33.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.33.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.33.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.33.self_attn.k_proj.bias": "model-00002-of-00002.safetensors",
"model.layers.33.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.33.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.33.self_attn.q_proj.bias": "model-00002-of-00002.safetensors",
"model.layers.33.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.33.self_attn.v_proj.bias": "model-00002-of-00002.safetensors",
"model.layers.33.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.34.input_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.34.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.34.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.34.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.34.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.34.self_attn.k_proj.bias": "model-00002-of-00002.safetensors",
"model.layers.34.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.34.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.34.self_attn.q_proj.bias": "model-00002-of-00002.safetensors",
"model.layers.34.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.34.self_attn.v_proj.bias": "model-00002-of-00002.safetensors",
"model.layers.34.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.35.input_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.35.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.35.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.35.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.35.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.35.self_attn.k_proj.bias": "model-00002-of-00002.safetensors",
"model.layers.35.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.35.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.35.self_attn.q_proj.bias": "model-00002-of-00002.safetensors",
"model.layers.35.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.35.self_attn.v_proj.bias": "model-00002-of-00002.safetensors",
"model.layers.35.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.4.input_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.4.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.4.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.4.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.4.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.4.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.4.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.4.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.4.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.4.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.4.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.4.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.5.input_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.5.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.5.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.5.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.5.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.5.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.5.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.5.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.5.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.5.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.5.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.5.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.6.input_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.6.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.6.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.6.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.6.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.6.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.6.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.6.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.6.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.6.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.6.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.6.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.7.input_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.7.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.7.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.7.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.7.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.7.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.7.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.7.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.7.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.7.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.7.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.7.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.8.input_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.8.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.8.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.8.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.8.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.8.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.8.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.8.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.8.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.8.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.8.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.8.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.9.input_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.9.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.9.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.9.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.9.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.9.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.9.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.9.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.9.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.9.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.9.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
"model.layers.9.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.norm.weight": "model-00002-of-00002.safetensors"
}
}
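The index maps each tensor name to the shard file that stores it. Its keys are sorted as strings, not numerically, which is why the `model.layers.3.*` entries sit between layers 29 and 30, and layers 30 to 35 come before layer 4. The following is a minimal sketch for regrouping the map by numeric layer index, assuming the index has been loaded as JSON. The excerpt copies a few entries from this file, and `shards_by_layer` is a hypothetical helper:

```python
import json
import re
from collections import defaultdict

# A few entries copied from the "weight_map" above; the real index lists every tensor.
INDEX_EXCERPT = """
{
  "weight_map": {
    "model.layers.29.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.3.input_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.3.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
    "model.layers.35.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.9.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
    "model.norm.weight": "model-00002-of-00002.safetensors"
  }
}
"""

# Decoder-layer tensors are named "model.layers.<index>.<submodule>...".
LAYER_RE = re.compile(r"^model\.layers\.(\d+)\.")


def shards_by_layer(weight_map: dict) -> dict:
    """Map each numeric layer index to the set of shard files holding its tensors."""
    layers = defaultdict(set)
    for tensor_name, shard_file in weight_map.items():
        match = LAYER_RE.match(tensor_name)
        if match:  # skips non-layer tensors such as model.norm.weight
            layers[int(match.group(1))].add(shard_file)
    return dict(layers)


weight_map = json.loads(INDEX_EXCERPT)["weight_map"]
layers = shards_by_layer(weight_map)

# Iterate in numeric order (3, 9, 29, 35) rather than the file's string order.
for index in sorted(layers):
    print(index, sorted(layers[index]))
```

If a layer's tensors were split across the two shard files, its set would contain both names. This can happen for a layer that straddles the shard boundary, which falls somewhere in the layers not shown in this excerpt.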

38
special_tokens_map.json Normal file

@@ -0,0 +1,38 @@
{
"additional_special_tokens": [
"<|im_start|>",
"<|im_end|>",
"<|object_ref_start|>",
"<|object_ref_end|>",
"<|box_start|>",
"<|box_end|>",
"<|quad_start|>",
"<|quad_end|>",
"<|vision_start|>",
"<|vision_end|>",
"<|vision_pad|>",
"<|image_pad|>",
"<|video_pad|>"
],
"bos_token": {
"content": "<|endoftext|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"eos_token": {
"content": "<|im_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"pad_token": {
"content": "<|endoftext|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
}
}
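`special_tokens_map.json` stores each special token as an `AddedToken`-style object. `tokenizer_config.json` in this commit stores the same roles as plain strings. In both files, `bos_token` and `pad_token` are the same `<|endoftext|>` token, and `eos_token` is the ChatML turn terminator `<|im_end|>`. The sketch below resolves each role to a token ID, using IDs copied from `added_tokens_decoder` in `tokenizer_config.json`. The helper names are hypothetical:

```python
# Role entries from special_tokens_map.json (lstrip/rstrip/normalized/... flags omitted).
SPECIAL_TOKENS_MAP = {
    "bos_token": {"content": "<|endoftext|>"},
    "eos_token": {"content": "<|im_end|>"},
    "pad_token": {"content": "<|endoftext|>"},
}

# Plain-string form used for the same roles in tokenizer_config.json.
TOKENIZER_CONFIG_ROLES = {
    "bos_token": "<|endoftext|>",
    "eos_token": "<|im_end|>",
    "pad_token": "<|endoftext|>",
}

# Subset of "added_tokens_decoder" from tokenizer_config.json (ID -> content).
ADDED_TOKENS_DECODER = {
    147075: "<|endoftext|>",
    147076: "<|im_start|>",
    147077: "<|im_end|>",
}


def token_content(entry):
    """Accept either an AddedToken-style dict or a plain string."""
    return entry["content"] if isinstance(entry, dict) else entry


def resolve_role_ids(roles: dict, decoder: dict) -> dict:
    """Map each role (bos/eos/pad) to its token ID via the added-token table."""
    content_to_id = {content: token_id for token_id, content in decoder.items()}
    return {role: content_to_id[token_content(entry)] for role, entry in roles.items()}


ids_from_map = resolve_role_ids(SPECIAL_TOKENS_MAP, ADDED_TOKENS_DECODER)
ids_from_config = resolve_role_ids(TOKENIZER_CONFIG_ROLES, ADDED_TOKENS_DECODER)
print(ids_from_map, ids_from_map == ids_from_config)
```

Because `pad_token` and `bos_token` share an ID, code that masks padding by comparing against `pad_token_id` would also catch any genuine `<|endoftext|>` in the input. Relying on the attention mask avoids that ambiguity.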

3
tokenizer.json Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:84b9483a1bfb5b8bb09b0269429a5e7fd89ee2868eb31cc70d5f695b64b95852
size 12441282

207
tokenizer_config.json Normal file

@@ -0,0 +1,207 @@
{
"add_bos_token": false,
"add_prefix_space": false,
"added_tokens_decoder": {
"147075": {
"content": "<|endoftext|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"147076": {
"content": "<|im_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"147077": {
"content": "<|im_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"147078": {
"content": "<|object_ref_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"147079": {
"content": "<|object_ref_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"147080": {
"content": "<|box_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"147081": {
"content": "<|box_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"147082": {
"content": "<|quad_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"147083": {
"content": "<|quad_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"147084": {
"content": "<|vision_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"147085": {
"content": "<|vision_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"147086": {
"content": "<|vision_pad|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"147087": {
"content": "<|image_pad|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"147088": {
"content": "<|video_pad|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"147089": {
"content": "<tool_call>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"147090": {
"content": "</tool_call>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"147091": {
"content": "<|fim_prefix|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"147092": {
"content": "<|fim_middle|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"147093": {
"content": "<|fim_suffix|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"147094": {
"content": "<|fim_pad|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"147095": {
"content": "<|repo_name|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"147096": {
"content": "<|file_sep|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
}
},
"additional_special_tokens": [
"<|im_start|>",
"<|im_end|>",
"<|object_ref_start|>",
"<|object_ref_end|>",
"<|box_start|>",
"<|box_end|>",
"<|quad_start|>",
"<|quad_end|>",
"<|vision_start|>",
"<|vision_end|>",
"<|vision_pad|>",
"<|image_pad|>",
"<|video_pad|>"
],
"bos_token": "<|endoftext|>",
"chat_template": "{%- if tools %}\n {{- '<|im_start|>system\\n' }}\n {%- if messages[0]['role'] == 'system' %}\n {{- messages[0]['content'] }}\n {%- else %}\n {{- 'You are Qwen, created by Alibaba Cloud. You are a helpful assistant.' }}\n {%- endif %}\n {{- \"\\n\\n# Tools\\n\\nYou may call one or more functions to assist with the user query.\\n\\nYou are provided with function signatures within <tools></tools> XML tags:\\n<tools>\" }}\n {%- for tool in tools %}\n {{- \"\\n\" }}\n {{- tool | tojson }}\n {%- endfor %}\n {{- \"\\n</tools>\\n\\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\\n<tool_call>\\n{\\\"name\\\": <function-name>, \\\"arguments\\\": <args-json-object>}\\n</tool_call><|im_end|>\\n\" }}\n{%- else %}\n {%- if messages[0]['role'] == 'system' %}\n {{- '<|im_start|>system\\n' + messages[0]['content'] + '<|im_end|>\\n' }}\n {%- endif %}\n{%- endif %}\n{%- for message in messages %}\n {%- if (message.role == \"user\") or (message.role == \"system\" and not loop.first) or (message.role == \"assistant\" and not message.tool_calls) %}\n {{- '<|im_start|>' + message.role + '\\n' + message.content + '<|im_end|>' + '\\n' }}\n {%- elif message.role == \"assistant\" %}\n {{- '<|im_start|>' + message.role }}\n {%- if message.content %}\n {{- '\\n' + message.content }}\n {%- endif %}\n {%- for tool_call in message.tool_calls %}\n {%- if tool_call.function is defined %}\n {%- set tool_call = tool_call.function %}\n {%- endif %}\n {{- '\\n<tool_call>\\n{\"name\": \"' }}\n {{- tool_call.name }}\n {{- '\", \"arguments\": ' }}\n {{- tool_call.arguments | tojson }}\n {{- '}\\n</tool_call>' }}\n {%- endfor %}\n {{- '<|im_end|>\\n' }}\n {%- elif message.role == \"tool\" %}\n {%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != \"tool\") %}\n {{- '<|im_start|>user' }}\n {%- endif %}\n {{- '\\n<tool_response>\\n' }}\n {{- message.content }}\n {{- '\\n</tool_response>' }}\n {%- if loop.last or (messages[loop.index0 + 1].role != \"tool\") %}\n {{- '<|im_end|>\\n' }}\n {%- endif %}\n {%- endif %}\n{%- endfor %}\n{%- if add_generation_prompt %}\n {{- '<|im_start|>assistant\\n' }}\n{%- endif %}\n",
"clean_up_tokenization_spaces": false,
"eos_token": "<|im_end|>",
"errors": "replace",
"model_max_length": 131072,
"pad_token": "<|endoftext|>",
"split_special_tokens": false,
"tokenizer_class": "Qwen2Tokenizer",
"unk_token": null
}
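For conversations without `tools`, the `chat_template` above reduces to plain ChatML:

- an optional system block, taken only from a leading `system` message;
- one `<|im_start|>` role / content / `<|im_end|>` block per message;
- a trailing `<|im_start|>assistant` line when `add_generation_prompt` is set.

Unlike the tools branch, this branch inserts no default "You are Qwen..." system prompt. The following is a simplified pure-Python re-implementation of that branch, for illustration only; the function name is hypothetical. In practice the template is rendered by the tokenizer's `apply_chat_template`, and this sketch does not handle tool calls or tool responses:

```python
def render_chatml(messages: list, add_generation_prompt: bool = True) -> str:
    """Sketch of the no-tools branch of chat_template (tool calls unsupported)."""
    parts = []
    # A leading system message becomes the system block; no default prompt otherwise.
    if messages and messages[0]["role"] == "system":
        parts.append("<|im_start|>system\n" + messages[0]["content"] + "<|im_end|>\n")
    for i, message in enumerate(messages):
        role = message["role"]
        if (role == "user"
                or (role == "system" and i > 0)
                or (role == "assistant" and not message.get("tool_calls"))):
            parts.append("<|im_start|>" + role + "\n" + message["content"] + "<|im_end|>\n")
        elif role == "system":
            continue  # the leading system message was already emitted above
        else:
            raise NotImplementedError(f"role {role!r} with tool data is not covered by this sketch")
    if add_generation_prompt:
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = render_chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hi!"},
])
print(prompt)
```

Every turn ends with `<|im_end|>`, the same token configured as `eos_token`. As a result, a decoder that stops on the EOS token ends generation at the close of the assistant turn.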

1
vocab.json Normal file

File diff suppressed because one or more lines are too long