Initialize project; model provided by the ModelHub XC community

Model: QLUNLP/BianCang-Qwen2.5-7B
Source: Original Platform
This commit is contained in:
ModelHub XC
2026-05-07 00:57:34 +08:00
commit 8e20be4f2e
28 changed files with 545965 additions and 0 deletions

38
.gitattributes vendored Normal file

@@ -0,0 +1,38 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bin.* filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zstandard filter=lfs diff=lfs merge=lfs -text
*.tfevents* filter=lfs diff=lfs merge=lfs -text
*.db* filter=lfs diff=lfs merge=lfs -text
*.ark* filter=lfs diff=lfs merge=lfs -text
**/*ckpt*data* filter=lfs diff=lfs merge=lfs -text
**/*ckpt*.meta filter=lfs diff=lfs merge=lfs -text
**/*ckpt*.index filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.gguf* filter=lfs diff=lfs merge=lfs -text
*.ggml filter=lfs diff=lfs merge=lfs -text
*.llamafile* filter=lfs diff=lfs merge=lfs -text
*.pt2 filter=lfs diff=lfs merge=lfs -text

494
README.md Normal file

@@ -0,0 +1,494 @@
# BianCang (扁仓): A Traditional Chinese Medicine Large Language Model
<div align="center">
<p>
<img src="assets/BianCang-logo.png" width="500px"/>
</p>
</div>
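## 💡Introduction
Hello, and welcome to the open-source repository of the BianCang Traditional Chinese Medicine (TCM) large language model.
To promote the practical application of large language models in TCM, assisting doctors with disease diagnosis and patients with self-assessment, we release the **BianCang** series of TCM large language models in this repository. BianCang (扁仓) is a combined name for the ancient physicians Bian Que (扁鹊) and Cang Gong (仓公), and refers to renowned doctors in general. We hope BianCang can contribute to carrying on the TCM tradition and to improving healthcare for people in China.
BianCang uses Qwen2/2.5 as its base and is trained in two stages: domain knowledge is injected first, then that knowledge is activated and aligned. BianCang achieves state-of-the-art performance on TCM-specific tasks such as disease diagnosis and syndrome differentiation, and performs well on a range of medical licensing exams.
We open-source the following resources in this repository:
- BianCang base model weights: BianCang-Qwen2-7B, BianCang-Qwen2.5-7B, and BianCang-Qwen2.5-14B.
- BianCang instruction-tuned model weights: BianCang-Qwen2-7B-Instruct, BianCang-Qwen2.5-7B-Instruct, and BianCang-Qwen2.5-14B-Instruct.
For more information, see [GitHub](https://github.com/QLU-NLP/BianCang).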
## 🚀Inference
### Using SWIFT
#### Environment Setup
Download the SWIFT source code from [Release v2.4.2 · modelscope/ms-swift](https://github.com/modelscope/ms-swift/releases/tag/v2.4.2), change to the source directory, and run the install commands:
```shell
cd swift
pip install -e .
```
Choose a torch version that matches your GPU driver. SWIFT requires at least torch >= 1.13; torch >= 2.0.0 is recommended.
Note: we used the *qwen* chat template for SFT. If your SWIFT version is newer than the one linked above, the Qwen2.5 chat template may not match. In that case, manually set the chat template to *qwen* rather than *qwen2_5*. For the underlying reason, see [fix qwen2.5 template by Jintao-Huang · Pull Request #2081 · modelscope/ms-swift](https://github.com/modelscope/ms-swift/pull/2081).
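To make the template note concrete, here is a minimal sketch of a ChatML-style prompt layout built from the `<|im_start|>` and `<|im_end|>` markers listed in this repository's `added_tokens.json`. The helper name `build_chatml_prompt` and the default system prompt are assumptions for illustration; this is not SWIFT's template code. The point is only that two templates that differ in a detail such as the default system prompt produce different text for the model to read.

```python
from typing import Dict, List

IM_START = "<|im_start|>"
IM_END = "<|im_end|>"


def build_chatml_prompt(messages: List[Dict[str, str]],
                        default_system: str = "You are a helpful assistant.") -> str:
    """Render messages into a ChatML-style prompt ending with an open assistant turn."""
    # Prepend the (assumed) default system prompt when the caller supplies none.
    if not messages or messages[0]["role"] != "system":
        messages = [{"role": "system", "content": default_system}] + list(messages)
    parts = [f"{IM_START}{m['role']}\n{m['content']}{IM_END}\n" for m in messages]
    # The trailing open assistant turn is where generation starts.
    parts.append(f"{IM_START}assistant\n")
    return "".join(parts)


prompt = build_chatml_prompt([{"role": "user", "content": "你好,你是谁?"}])
print(prompt)
```

Passing a different `default_system` changes the first turn of the rendered prompt, which is the kind of mismatch the note above warns about.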
#### Inference Method 1: Code
```python
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
from swift.llm import (
    get_model_tokenizer, get_template, inference, ModelType
)
from swift.utils import seed_everything
model_type = ModelType.qwen2_5_7b_instruct
template_type = 'qwen'
model_id_or_path = 'QLU-NLP/BianCang-Qwen2.5-7B-Instruct'
model, tokenizer = get_model_tokenizer(model_type, model_id_or_path=model_id_or_path, model_kwargs={'device_map': 'auto'})
model.generation_config.max_new_tokens = 256
template = get_template(template_type, tokenizer)
seed_everything(42)
# Query 1 (Chinese): "Hello, who are you?"
query = '你好,你是谁?'
response, history = inference(model, template, query)
print(f'query: {query}')
print(f'response: {response}')
# Query 2 (Chinese): a TCM case (78-year-old woman, chest pain after exertion for one week).
# It asks for the main TCM disease, the syndrome type, and the basis for the diagnosis.
query = '下面是一名患者的基本情况。年龄78岁性别女。主 诉:活动后胸痛一周。现病史:患者一周前活动后出现胸口隐隐作痛,如针刺样乏力气短,活动后汗出,偏头痛。中医望闻切诊:表情自然,面色红润,形体正常,语气清,气息平;无异常气味,舌暗红,苔少。请你根据上述患者的主诉、病史和中医望闻切诊情况,判断该患者的主要中医疾病和中医证型,并给出中医辨病辨证的依据。'
response, history = inference(model, template, query, history)
print(f'query: {query}')
print(f'response: {response}')
print(f'history: {history}')
```
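Output (the model replies in Chinese: it first introduces itself as the BianCang TCM model, then diagnoses chest impediment with heart pain (胸痹心痛) with a qi deficiency and blood stasis syndrome (气虚血瘀证)):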
```
query: 你好,你是谁?
response: 你好!我是一个名为扁仓中医大模型的人工智能,由齐鲁工业大学(山东省科学院)计算机科学与技术学部(国家超级计算济南中心)自然语言处理与认知计算团队研发。我被设计成能够理解和生成自然语言文本,以便与人类进行中医辩证、中医处方推荐、中医知识问答、中医问题咨询等方面的对话交流,辅助人们完成疾病诊断相关的任务。请问有什么我可以帮助您的吗?
query: 下面是一名患者的基本情况。年龄78岁性别女。主 诉:活动后胸痛一周。现病史:患者一周前活动后出现胸口隐隐作痛,如针刺样乏力气短,活动后汗出,偏头痛。中医望闻切诊:表情自然,面色红润,形体正常,语气清,气息平;无异常气味,舌暗红,苔少。请你根据上述患者的主诉、病史和中医望闻切诊情况,判断该患者的主要中医疾病和中医证型,并给出中医辨病辨证的依据。
response: 根据中医的诊断方法,患者患有胸痹心痛,中医证型属于气虚血瘀证。综合脉证,四诊合参,本病当属祖国医学“胸痹心痛病”范畴,证属“气虚血瘀”。患者素体虚弱,久病伤正,伤及心气,心气衰微,机能不健,致阴邪易于上乘阳位,况心脉为宗气之所,百脉朝会之枢,宗气的鼓动形成了心气推动血液运行全身,心气不足则血行无力瘀滞,发为本病,舌脉俱为佐证。
history: [['你好,你是谁?', '你好!我是一个名为扁仓中医大模型的人工智能,由齐鲁工业大学(山东省科学院)计算机科学与技术学部(国家超级计算济南中心)自然语言处理与认知计算团队研发。我被设计成能够理解和生成自然语言文本,以便与人类进行中医辩证、中医处方推荐、中医知识问答、中医问题咨询等方面的对话交流,辅助人们完成疾病诊断相关的任务。请问有什么我可以帮助您的吗?'], ['下面是一名患者的基本情况。年龄78岁性别女。主 诉:活动后胸痛一周。现病史:患者一周前活动后出现胸口隐隐作痛,如针刺样乏力气短,活动后汗出,偏头痛。中医望闻切诊:表情自然,面色红润,形体正常,语气清,气息平;无异常气味,舌暗红,苔少。请你根据上述患者的主诉、病史和中医望闻切诊情况,判断该患者的主要中医疾病和中医证型,并给出中医辨病辨证的依据。', '根据中医的诊断方法,患者患有胸痹心痛,中医证型属于气虚血瘀证。综合脉证,四诊合参,本病当属祖国医学“胸痹心痛病”范畴,证属“气虚血瘀”。患者素体虚弱,久病伤正,伤及心气,心气衰微,机能不健,致阴邪易于上乘阳位,况心脉为宗气之所,百脉朝会之枢,宗气的鼓动形成了心气推动血液运行全身,心气不足则血行无力瘀滞,发为本病,舌脉俱为佐证。']]
```
#### Inference Method 2: Deploy an API
Deploy the API with the following command:
```shell
CUDA_VISIBLE_DEVICES=0 swift deploy --model_type qwen2_5-7b-instruct --model_id_or_path QLU-NLP/BianCang-Qwen2.5-7B-Instruct --port 8090 --template_type qwen
```
Test with curl:
```shell
curl http://localhost:8090/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "qwen2_5-7b-instruct",
"messages": [{"role": "user", "content": "你好,你是谁?"}],
"max_tokens": 256,
"temperature": 0.3
}'
```
The response is as follows:
```json
{"model":"qwen2_5-7b-instruct",
"choices":[{"index":0,"message":{"role":"assistant","content":"你好!我是一个名为扁仓中医大模型的人工智能,由齐鲁工业大学(山东省科学院)计算机科学与技术学部(国家超级计算济南中心)自然语言处理与认知计算团队研发。我被设计成能够理解和生成自然语言文本,以便与人类进行中医辩证、中医处方推荐、中医知识问答、中医问题咨询等方面的对话交流,辅助人们完成疾病诊断相关的任务。请问有什么我可以帮助您的吗?",
"tool_calls":null},"finish_reason":null,"logprobs":null}],
"usage":{"prompt_tokens":24,"completion_tokens":92,"total_tokens":116},
"id":"chatcmpl-6b4a02dee57a42238b27b5c40085df16",
"object":"chat.completion","created":1730209011}
```
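The request and response shapes above can also be handled with the Python standard library. The following is a minimal sketch, not part of SWIFT: the function names `build_chat_request` and `extract_reply` are hypothetical, and `sample_response` is a shortened stand-in for the real response shown above (the assistant text is truncated to keep the example small).

```python
import json
from typing import Tuple


def build_chat_request(model: str, user_text: str,
                       max_tokens: int = 256, temperature: float = 0.3) -> str:
    """Serialize a chat-completions request body like the curl example."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    # ensure_ascii=False keeps the Chinese text readable in the JSON body.
    return json.dumps(payload, ensure_ascii=False)


def extract_reply(response_text: str) -> Tuple[str, int]:
    """Return the assistant message and the total token count from a response."""
    data = json.loads(response_text)
    content = data["choices"][0]["message"]["content"]
    return content, data["usage"]["total_tokens"]


# Shortened stand-in for the response above; only the fields used here are kept.
sample_response = (
    '{"model":"qwen2_5-7b-instruct",'
    '"choices":[{"index":0,"message":{"role":"assistant","content":"你好!",'
    '"tool_calls":null},"finish_reason":null,"logprobs":null}],'
    '"usage":{"prompt_tokens":24,"completion_tokens":92,"total_tokens":116}}'
)

body = build_chat_request("qwen2_5-7b-instruct", "你好,你是谁?")
reply, total_tokens = extract_reply(sample_response)
print(body)
print(reply, total_tokens)
```

The serialized `body` could be sent to the `/v1/chat/completions` endpoint with any HTTP client; the sketch stops short of the network call so that it runs without a deployed server.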
Test with code:
```python
from swift.llm import get_model_list_client, XRequestConfig, inference_client
model_list = get_model_list_client(port=8090)
model_type = model_list.data[0].id
print(f'model_type: {model_type}')
# Query 1 (Chinese): "Hello, who are you?"
query = "你好,你是谁?"
request_config = XRequestConfig(seed=42)
resp = inference_client(model_type, query, request_config=request_config, port=8090)
response = resp.choices[0].message.content
print(f'query: {query}')
print(f'response: {response}')
history = [(query, response)]
# Query 2 (Chinese): the same TCM case as above, sent as a streaming request.
query = '下面是一名患者的基本情况。年龄78岁性别女。主 诉:活动后胸痛一周。现病史:患者一周前活动后出现胸口隐隐作痛,如针刺样乏力气短,活动后汗出,偏头痛。中医望闻切诊:表情自然,面色红润,形体正常,语气清,气息平;无异常气味,舌暗红,苔少。请你根据上述患者的主诉、病史和中医望闻切诊情况,判断该患者的主要中医疾病和中医证型,并给出中医辨病辨证的依据。'
request_config = XRequestConfig(stream=True, seed=42)
stream_resp = inference_client(model_type, query, history, request_config=request_config, port=8090)
print(f'query: {query}')
print('response: ', end='')
for chunk in stream_resp:
    print(chunk.choices[0].delta.content, end='', flush=True)
print()
```
The output is as follows:
```
model_type: qwen2_5-7b-instruct
query: 你好,你是谁?
response: 你好!我是一个名为扁仓中医大模型的人工智能,由齐鲁工业大学(山东省科学院)计算机科学与技术学部(国家超级计算济南中心)自然语言处理与认知计算团队研发。我被设计成能够理解和生成自然语言文本,以便与人类进行中医辩证、中医处方推荐、中医知识问答、中医问题咨询等方面的对话交流,辅助人们完成疾病诊断相关的任务。请问有什么我可以帮助您的吗?
query: 下面是一名患者的基本情况。年龄78岁性别女。主 诉:活动后胸痛一周。现病史:患者一周前活动后出现胸口隐隐作痛,如针刺样乏力气短,活动后汗出,偏头痛。中医望闻切诊:表情自然,面色红润,形体正常,语气清,气息平;无异常气味,舌暗红,苔少。请你根据上述患者的主诉、病史和中医望闻切诊情况,判断该患者的主要中医疾病和中医证型,并给出中医辨病辨证的依据。
response: 根据中医的诊断方法,患者患有胸痹心痛,中医证型属于气虚血瘀证。综合脉证,四诊合参,本病当属祖国医学“胸痹心痛病”范畴,证属“气虚血瘀”。患者素体虚弱,久病伤正,伤及心气,心气衰微,机能不健,致阴邪易于上乘阳位,况心脉为宗气之所,百脉朝会之枢,宗气的鼓动形成了心气推动血液运行全身,心气不足则血行无力瘀滞,发为本病,舌脉俱为佐证。
```
### Using Transformers
You can also run inference with the transformers package:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
model_name = "QLU-NLP/BianCang-Qwen2.5-7B-Instruct"
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)
# Prompt (Chinese): "Hello, who are you?"
prompt = "你好,你是谁?"
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=256
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
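The list comprehension near the end of the example above deserves a note: for a decoder-only model, `generate` returns the prompt tokens followed by the newly generated tokens, so the prompt prefix is sliced off before decoding. The sketch below shows the same slicing on plain Python lists. The token IDs are made up for illustration, and `strip_prompt_tokens` is a hypothetical helper name.

```python
from typing import List


def strip_prompt_tokens(input_ids: List[List[int]],
                        output_ids: List[List[int]]) -> List[List[int]]:
    """Drop the prompt prefix from each generated sequence in a batch."""
    return [out[len(inp):] for inp, out in zip(input_ids, output_ids)]


# Made-up token IDs: a 3-token prompt followed by 3 generated tokens.
input_ids = [[101, 102, 103]]
output_ids = [[101, 102, 103, 7, 8, 9]]

new_tokens = strip_prompt_tokens(input_ids, output_ids)
print(new_tokens)
```

Without this step, decoding would echo the system and user turns back in front of the model's answer.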
### Using the Web UI
We provide a simple demo Web UI.
Install streamlit:
```shell
pip install streamlit
```
Deploy the API with SWIFT:
```shell
CUDA_VISIBLE_DEVICES=0 swift deploy --model_type qwen2_5-7b-instruct --model_id_or_path QLU-NLP/BianCang-Qwen2.5-7B-Instruct --port 8090 --template_type qwen
```
Start streamlit:
```shell
streamlit run web_ui.py
```
![image-20241029215908096](assets/webui.png)
## 🥇TCM Capability Evaluation
<table border="1" cellpadding="5" cellspacing="0">
<thead>
<tr>
<th rowspan="2">Model</th>
<th colspan="4">TCM Syndrome Differentiation</th>
<th colspan="4">TCM Disease Diagnosis</th>
<th colspan="4">TCM Exam</th>
</tr>
<tr>
<th colspan="2">TCMSD <br>Acc.(%)</th>
<th colspan="2">TCMSD-BC<br> Acc.(%)</th>
<th colspan="2">TCMDD<br> Acc.(%)</th>
<th colspan="2">TCMDD-BC<br> Acc.(%)</th>
<th colspan="2">MLEC-TCM<br> Acc.(%)</th>
<th colspan="2">MLEC-CWM<br> Acc.(%)</th>
</tr>
<tr>
<th></th><th>DI</th><th>CoT</th><th>DI</th><th>CoT</th><th>DI</th><th>CoT</th><th>DI</th><th>CoT</th><th>ZS</th><th>FS</th><th>ZS</th><th>FS</th>
</tr>
</thead>
<tbody align="center" valign="center">
<tr><td>GPT-4</td><td>24.53</td><td>45.21</td><td>16.67</td><td>70.73</td><td>27.83</td><td>54.54</td><td>41.80</td><td>68.33</td><td>74.70</td><td>76.35</td><td>76.26</td><td>76.37</td></tr>
<tr><td>DeepSeek-V3</td><td>34.62</td><td>40.74</td><td>24.53</td><td>72.00</td><td>46.97</td><td>59.08</td><td>82.67</td><td>72.93</td><td>84.97</td><td>88.56</td><td>85.05</td><td>87.81</td></tr>
<tr><td>DeepSeek-R1</td><td>37.17</td><td>55.67</td><td>25.53</td><td>76.07</td><td>50.66</td><td>80.75</td><td>79.27</td><td>94.53</td><td>92.68</td><td>93.10</td><td>90.92</td><td>90.77</td></tr>
<tr><td>Qwen2-7B</td><td>31.74</td><td>27.18</td><td>32.73</td><td>28.40</td><td>41.60</td><td>54.59</td><td>74.87</td><td>77.93</td><td>86.01</td><td>89.18</td><td>84.45</td><td>87.89</td></tr>
<tr><td>Qwen2-7B-Instruct</td><td>25.70</td><td>33.41</td><td>14.27</td><td>57.00</td><td>32.87</td><td>52.92</td><td>60.40</td><td>60.13</td><td>83.61</td><td>84.22</td><td>79.89</td><td>82.99</td></tr>
<tr><td>Qwen2.5-7B</td><td>30.44</td><td>21.29</td><td>17.87</td><td>35.73</td><td>23.71</td><td>43.88</td><td>63.87</td><td>71.27</td><td>83.32</td><td>85.52</td><td>82.02</td><td>84.04</td></tr>
<tr><td>Qwen2.5-7B-Instruct</td><td>24.30</td><td>32.19</td><td>9.93</td><td>57.07</td><td>36.29</td><td>51.51</td><td>62.93</td><td>55.53</td><td>78.72</td><td>79.88</td><td>77.27</td><td>78.43</td></tr>
<tr><td>Qwen2.5-14B</td><td>35.62</td><td>25.21</td><td>33.93</td><td>30.13</td><td>24.33</td><td>36.64</td><td>33.33</td><td>32.80</td><td>86.59</td><td>89.93</td><td>87.10</td><td>90.06</td></tr>
<tr><td>Qwen2.5-14B-Instruct</td><td>25.94</td><td>35.03</td><td>16.07</td><td>60.00</td><td>38.30</td><td>49.31</td><td>46.27</td><td>53.67</td><td>82.25</td><td>84.81</td><td>81.79</td><td>85.68</td></tr>
<tr><td>BianCang-Qwen2-7B</td><td>42.14</td><td>30.30</td><td>57.80</td><td>48.00</td><td>43.73</td><td>54.67</td><td>74.73</td><td>80.67</td><td>90.86</td><td>91.87</td><td>89.08</td><td>90.36</td></tr>
<tr><td>BianCang-Qwen2-7B-Instruct</td><td>68.88</td><td>75.96</td><td>57.33</td><td>75.40</td><td>64.42</td><td>77.71</td><td><b>89.07</b></td><td>85.67</td><td><b>92.39</b></td><td><b>92.39</b></td><td>91.14</td><td>91.48</td></tr>
<tr><td>BianCang-Qwen2.5-7B</td><td>46.57</td><td>26.72</td><td>52.93</td><td>45.47</td><td>49.80</td><td>53.15</td><td>68.13</td><td>61.73</td><td>86.46</td><td>86.30</td><td>83.93</td><td>85.35</td></tr>
<tr><td>BianCang-Qwen2.5-7B-Instruct</td><td>78.90</td><td><b>82.10</b></td><td><b>66.73</b></td><td><b>77.73</b></td><td>73.73</td><td><b>82.65</b></td><td>87.87</td><td><b>89.40</b></td><td>90.22</td><td>90.57</td><td>90.32</td><td>90.62</td></tr>
<tr><td>BianCang-Qwen2.5-14B</td><td>43.77</td><td>33.96</td><td>61.93</td><td>53.47</td><td>66.61</td><td>60.39</td><td>82.93</td><td>77.07</td><td>89.28</td><td>90.86</td><td>89.42</td><td>90.58</td></tr>
<tr><td>BianCang-Qwen2.5-14B-Instruct</td><td><b>79.38</b></td><td>75.54</td><td>62.27</td><td>70.73</td><td><b>77.63</b></td><td>82.05</td><td>86.33</td><td>88.73</td><td>92.29</td><td>92.29</td><td><b>92.75</b></td><td><b>92.86</b></td></tr>
</tbody>
</table>
<br>
<table border="1">
<tr>
<th>Model</th>
<th>CMB Acc.(%)</th>
<th colspan="2">MLEC-Clinic <br>Acc.(%)</th>
<th colspan="2">MLEC-PublicHealth<br> Acc.(%)</th>
<th colspan="2">MLEC-Stomatology<br> Acc.(%)</th>
</tr>
<tr>
<th></th>
<th>ZS/FS</th>
<th>ZS</th>
<th>FS</th>
<th>ZS</th>
<th>FS</th>
<th>ZS</th>
<th>FS</th>
</tr>
<tr>
<td>GPT-4</td>
<td>59.46*</td>
<td>82.63</td>
<td>82.69</td>
<td>81.55</td>
<td>82.58</td>
<td>72.97</td>
<td>75.43</td>
</tr>
<tr>
<td>DeepSeek-V3</td>
<td>82.33</td>
<td>86.83</td>
<td>89.41</td>
<td>85.38</td>
<td>87.59</td>
<td>79.09</td>
<td>81.97</td>
</tr>
<tr>
<td>DeepSeek-R1</td>
<td>86.38</td>
<td>92.51</td>
<td>92.36</td>
<td>91.42</td>
<td>90.40</td>
<td>87.03</td>
<td>86.16</td>
</tr>
<tr>
<td>Qwen2-7B</td>
<td>81.63</td>
<td>87.63</td>
<td>90.63</td>
<td>82.63</td>
<td>86.79</td>
<td>80.34</td>
<td>84.65</td>
</tr>
<tr>
<td>Qwen2-7B-Instruct</td>
<td>83.45</td>
<td>85.16</td>
<td>83.35</td>
<td>81.61</td>
<td>81.07</td>
<td>76.29</td>
<td>75.88</td>
</tr>
<tr>
<td>Qwen2.5-7B</td>
<td>79.60</td>
<td>86.65</td>
<td>88.55</td>
<td>83.39</td>
<td>85.17</td>
<td>78.03</td>
<td>80.79</td>
</tr>
<tr>
<td>Qwen2.5-7B-Instruct</td>
<td>79.51</td>
<td>82.81</td>
<td>83.73</td>
<td>80.96</td>
<td>80.85</td>
<td>72.93</td>
<td>74.40</td>
</tr>
<tr>
<td>Qwen2.5-14B</td>
<td>84.07</td>
<td>90.40</td>
<td>93.13</td>
<td>86.46</td>
<td>89.54</td>
<td>84.31</td>
<td>88.20</td>
</tr>
<tr>
<td>Qwen2.5-14B-Instruct</td>
<td>83.69</td>
<td>86.47</td>
<td>88.02</td>
<td>83.17</td>
<td>86.14</td>
<td>78.94</td>
<td>82.57</td>
</tr>
<tr>
<td>BianCang-Qwen2-7B (Ours)</td>
<td>83.27</td>
<td>91.88</td>
<td>93.31</td>
<td>88.57</td>
<td>90.72</td>
<td>85.29</td>
<td>88.47</td>
</tr>
<tr>
<td>BianCang-Qwen2-7B-Instruct (Ours)</td>
<td>84.08</td>
<td>94.35</td>
<td>94.35</td>
<td>91.37</td>
<td><b>91.64</b></td>
<td>89.19</td>
<td>90.02</td>
</tr>
<tr>
<td>BianCang-Qwen2.5-7B (Ours)</td>
<td>80.13</td>
<td>90.43</td>
<td>91.32</td>
<td>85.65</td>
<td>87.22</td>
<td>82.19</td>
<td>82.65</td>
</tr>
<tr>
<td>BianCang-Qwen2.5-7B-Instruct (Ours)</td>
<td>80.71</td>
<td>93.40</td>
<td>93.43</td>
<td>89.91</td>
<td>89.91</td>
<td>86.43</td>
<td>86.77</td>
</tr>
<tr>
<td>BianCang-Qwen2.5-14B (Ours)</td>
<td><b>84.34</b></td>
<td>91.70</td>
<td>93.37</td>
<td>87.92</td>
<td>89.97</td>
<td>86.16</td>
<td>87.94</td>
</tr>
<tr>
<td>BianCang-Qwen2.5-14B-Instruct (Ours)</td>
<td>83.80</td>
<td><b>94.74</b></td>
<td><b>94.97</b></td>
<td><b>91.86</b></td>
<td>91.53</td>
<td><b>90.43</b></td>
<td><b>90.51</b></td>
</tr>
</table>
For more evaluation results, see our technical report.
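As a reading aid for the first table in this section, the sketch below computes the accuracy gap between BianCang-Qwen2.5-7B-Instruct and the general-purpose Qwen2.5-7B-Instruct row. The four numbers are copied from the TCMSD columns (DI and CoT) of that table; the dictionary layout and the function name `accuracy_gain` are illustrative assumptions.

```python
from typing import Dict

# TCMSD accuracy (%) copied from the first table: DI and CoT columns.
tcmsd_accuracy: Dict[str, Dict[str, float]] = {
    "Qwen2.5-7B-Instruct": {"DI": 24.30, "CoT": 32.19},
    "BianCang-Qwen2.5-7B-Instruct": {"DI": 78.90, "CoT": 82.10},
}


def accuracy_gain(scores: Dict[str, Dict[str, float]],
                  model: str, baseline: str) -> Dict[str, float]:
    """Percentage-point difference between a model and a baseline, per setting."""
    return {
        setting: round(scores[model][setting] - scores[baseline][setting], 2)
        for setting in scores[model]
    }


gain = accuracy_gain(tcmsd_accuracy,
                     "BianCang-Qwen2.5-7B-Instruct", "Qwen2.5-7B-Instruct")
print(gain)
```

The same helper works for any other column once its values are added to the dictionary.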
## 🧡Acknowledgements
This project builds on open-source projects. We thank the related projects and their researchers and developers.
- [Qwen2](https://github.com/vitanova/Qwen2)
- [Qwen2.5](https://github.com/QwenLM/Qwen2.5)
- [SWIFT](https://github.com/modelscope/ms-swift)
- [ModelScope](https://github.com/modelscope/modelscope)
- [ShenNong-TCM-LLM](https://github.com/michael-wzhu/ShenNong-TCM-LLM?tab=readme-ov-file)
- [HuatuoGPT-II](https://github.com/FreedomIntelligence/HuatuoGPT-II)
- [DISC-MedLLM](https://github.com/FudanDISC/DISC-MedLLM)
- [MLEC-QA](https://github.com/Judenpech/MLEC-QA)
- [CMB](https://github.com/FreedomIntelligence/CMB?tab=readme-ov-file)
- [ZY-BERT](https://github.com/Borororo/ZY-BERT)
- [COIG](https://github.com/BAAI-Zlab/COIG)
- [APE210k](https://github.com/Chenny0808/ape210k)
- [Evol-Instruction-66K](https://github.com/Continuum-Labs-HQ/EvolInstruct)
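## ❔About Us
This project was completed jointly by the Natural Language Processing and Cognitive Computing team of the Faculty of Computer Science and Technology (National Supercomputing Center in Jinan), Qilu University of Technology (Shandong Academy of Sciences), and the Clinical Research Center of the Affiliated Hospital of Shandong University of Traditional Chinese Medicine.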
<div align="center">
<p>
<img src="assets/QLU-NLP-logo.png" width="500px"/>
</p>
</div>
<div align="center">
<p>
<img src="assets/超算logo.png" width="500px"/>
</p>
</div>
<div align="center">
<p>
<img src="assets/山中医logo.png" width="500px"/>
</p>
</div>
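## ❕Disclaimer
- The resources of this project are for academic research only.
- As an intelligent assistant built on a language model, BianCang has limitations and cannot guarantee the accuracy of every response. It cannot replace TCM or Western medicine practitioners in making medical diagnoses or giving medical advice. If needed, please consult a professional doctor or visit a hospital.
- Because inaccurate data in the medical domain can have serious consequences, we strongly recommend that users treat generated information with caution and seek advice from experts.
## 📖Citation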
```
@article{Wei2024BianCang,
title={BianCang: A Traditional Chinese Medicine Large Language Model},
author={Wei, Sibo and Peng, Xueping and Wang, Yi-fei and Si, Jiasheng and Zhang, Weiyu and Lu, Wenpeng and Wu, Xiaoming and Wang, Yinglong},
journal={arXiv preprint arXiv:2411.11027},
year={2024}
}
```

24
added_tokens.json Normal file

@@ -0,0 +1,24 @@
{
"</tool_call>": 151658,
"<tool_call>": 151657,
"<|box_end|>": 151649,
"<|box_start|>": 151648,
"<|endoftext|>": 151643,
"<|file_sep|>": 151664,
"<|fim_middle|>": 151660,
"<|fim_pad|>": 151662,
"<|fim_prefix|>": 151659,
"<|fim_suffix|>": 151661,
"<|im_end|>": 151645,
"<|im_start|>": 151644,
"<|image_pad|>": 151655,
"<|object_ref_end|>": 151647,
"<|object_ref_start|>": 151646,
"<|quad_end|>": 151651,
"<|quad_start|>": 151650,
"<|repo_name|>": 151663,
"<|video_pad|>": 151656,
"<|vision_end|>": 151653,
"<|vision_pad|>": 151654,
"<|vision_start|>": 151652
}

BIN
assets/BianCang-logo.png Normal file

Binary file not shown. Size: 197 KiB

BIN
assets/QLU-NLP-logo.png Normal file

Binary file not shown. Size: 2.6 MiB

BIN
assets/subjective.png Normal file

Binary file not shown. Size: 46 KiB

BIN
assets/webui.png Normal file

Binary file not shown. Size: 379 KiB

BIN
assets/山中医logo.png Normal file

Binary file not shown. Size: 138 KiB

BIN
assets/超算logo.png Normal file

Binary file not shown. Size: 90 KiB

29
config.json Normal file

@@ -0,0 +1,29 @@
{
"_name_or_path": "/qlgy0912/models/Qwen2.5-7B",
"architectures": [
"Qwen2ForCausalLM"
],
"attention_dropout": 0.0,
"bos_token_id": 151643,
"eos_token_id": 151643,
"hidden_act": "silu",
"hidden_size": 3584,
"initializer_range": 0.02,
"intermediate_size": 18944,
"max_position_embeddings": 131072,
"max_window_layers": 28,
"model_type": "qwen2",
"num_attention_heads": 28,
"num_hidden_layers": 28,
"num_key_value_heads": 4,
"rms_norm_eps": 1e-06,
"rope_theta": 1000000.0,
"sliding_window": 131072,
"tie_word_embeddings": false,
"torch_dtype": "bfloat16",
"transformers_version": "4.40.0",
"use_cache": true,
"use_mrope": false,
"use_sliding_window": false,
"vocab_size": 152064
}
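
A few derived quantities help when reading this config. The sketch below assumes the standard layout in which the per-head size equals `hidden_size / num_attention_heads`, and it assumes a bfloat16 key/value cache (2 bytes per value, matching `torch_dtype`). Both are assumptions about how the model is run, not something the config states directly.

```python
# Values copied from config.json above.
config = {
    "hidden_size": 3584,
    "num_attention_heads": 28,
    "num_hidden_layers": 28,
    "num_key_value_heads": 4,
}

BYTES_PER_VALUE = 2  # assumption: cache stored in bfloat16

# Size of each attention head.
head_dim = config["hidden_size"] // config["num_attention_heads"]

# Grouped-query attention: how many query heads share one key/value head.
queries_per_kv_head = config["num_attention_heads"] // config["num_key_value_heads"]

# Key and value tensors (factor 2) for every layer and key/value head, per token.
kv_bytes_per_token = (
    2
    * config["num_hidden_layers"]
    * config["num_key_value_heads"]
    * head_dim
    * BYTES_PER_VALUE
)

print(head_dim, queries_per_kv_head, kv_bytes_per_token)
```

Under these assumptions each head has 128 dimensions, 7 query heads share each key/value head, and the cache grows by 57344 bytes (56 KiB) per token.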

1
configuration.json Normal file

@@ -0,0 +1 @@
{"framework":"Pytorch","task":"text-generation"}

7
generation_config.json Normal file

@@ -0,0 +1,7 @@
{
"bos_token_id": 151643,
"eos_token_id": 151645,
"max_new_tokens": 2048,
"pad_token_id": 151643,
"transformers_version": "4.40.0"
}
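
The token IDs in this file can be resolved against `added_tokens.json` shown earlier in this commit. The sketch below copies three entries from that file (a subset, chosen for illustration) and looks up which special token each generation setting refers to.

```python
# Subset of added_tokens.json (token -> id), copied from this commit.
added_tokens = {
    "<|endoftext|>": 151643,
    "<|im_start|>": 151644,
    "<|im_end|>": 151645,
}

# Token-related fields from generation_config.json above.
generation_config = {
    "bos_token_id": 151643,
    "eos_token_id": 151645,
    "pad_token_id": 151643,
}

# Invert the mapping so IDs can be looked up.
id_to_token = {token_id: token for token, token_id in added_tokens.items()}

resolved = {name: id_to_token[token_id] for name, token_id in generation_config.items()}
for name, token in resolved.items():
    print(f"{name}: {token}")
```

The lookup shows that generation stops at `<|im_end|>`, while `config.json` lists `<|endoftext|>` (151643) as its `eos_token_id`.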

151388
merges.txt Normal file

File diff suppressed because it is too large

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:193bcf69d9b8b8601c8620d8593e209eaf54b39b2d32ddce8d19c144ceedfde3
size 4877660776

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:5387b0934f88715789bbf44ff6ada205a17fc1347aaa00c3f9efa48cd7fff968
size 4932751008

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c5ad180fc4c4ecd4a90a89fca09fe6b99da29ac84e3999d6d7ef0b9d4ff953b6
size 4330865200

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:423163e8992696077d72fbb555b8639b0a233dfbf705038c73e3e32c12a20506
size 1089994880
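
Each of the four hunks above is a Git LFS pointer: a three-line text file giving the spec version, the SHA-256 object ID, and the size in bytes of the real file. The sketch below parses one pointer and sums the four sizes. The function name `parse_lfs_pointer` is hypothetical, and the file names of the pointers are not shown in this view, so none are assumed here.

```python
from typing import Dict, Union


def parse_lfs_pointer(text: str) -> Dict[str, Union[str, int]]:
    """Parse a Git LFS pointer file into its version, hash, and size fields."""
    # Each line is "<key> <value>"; split on the first space only.
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algorithm, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algorithm": algorithm,
        "digest": digest,
        "size": int(fields["size"]),
    }


# First pointer from this commit, copied verbatim.
pointer_text = """version https://git-lfs.github.com/spec/v1
oid sha256:193bcf69d9b8b8601c8620d8593e209eaf54b39b2d32ddce8d19c144ceedfde3
size 4877660776
"""

pointer = parse_lfs_pointer(pointer_text)

# Sizes of the four pointers above, in bytes.
sizes = [4877660776, 4932751008, 4330865200, 1089994880]
total_bytes = sum(sizes)
print(pointer["algorithm"], pointer["size"], total_bytes)
```

The four files add up to 15231271864 bytes, which is 38840 bytes more than the `total_size` of 15231233024 recorded in the index file that follows. The small difference presumably comes from per-file header data that `total_size` does not count.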

View File

@@ -0,0 +1,346 @@
{
"metadata": {
"total_size": 15231233024
},
"weight_map": {
"lm_head.weight": "model-00004-of-00004.safetensors",
"model.embed_tokens.weight": "model-00001-of-00004.safetensors",
"model.layers.0.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.0.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.0.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.0.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.0.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.0.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
"model.layers.0.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.0.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.0.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
"model.layers.0.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.0.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
"model.layers.0.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.1.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.1.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.1.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.1.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.1.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.1.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
"model.layers.1.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.1.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.1.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
"model.layers.1.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.1.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
"model.layers.1.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.10.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.10.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.10.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.10.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.10.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.10.self_attn.k_proj.bias": "model-00002-of-00004.safetensors",
"model.layers.10.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.10.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.10.self_attn.q_proj.bias": "model-00002-of-00004.safetensors",
"model.layers.10.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.10.self_attn.v_proj.bias": "model-00002-of-00004.safetensors",
"model.layers.10.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.11.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.11.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.11.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.11.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.11.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.11.self_attn.k_proj.bias": "model-00002-of-00004.safetensors",
"model.layers.11.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.11.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.11.self_attn.q_proj.bias": "model-00002-of-00004.safetensors",
"model.layers.11.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.11.self_attn.v_proj.bias": "model-00002-of-00004.safetensors",
"model.layers.11.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.12.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.12.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.12.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.12.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.12.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.12.self_attn.k_proj.bias": "model-00002-of-00004.safetensors",
"model.layers.12.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.12.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.12.self_attn.q_proj.bias": "model-00002-of-00004.safetensors",
"model.layers.12.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.12.self_attn.v_proj.bias": "model-00002-of-00004.safetensors",
"model.layers.12.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.13.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.13.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.13.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.13.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.13.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.13.self_attn.k_proj.bias": "model-00002-of-00004.safetensors",
"model.layers.13.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.13.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.13.self_attn.q_proj.bias": "model-00002-of-00004.safetensors",
"model.layers.13.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.13.self_attn.v_proj.bias": "model-00002-of-00004.safetensors",
"model.layers.13.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.14.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.14.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.14.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.14.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.14.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.14.self_attn.k_proj.bias": "model-00002-of-00004.safetensors",
"model.layers.14.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.14.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.14.self_attn.q_proj.bias": "model-00002-of-00004.safetensors",
"model.layers.14.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.14.self_attn.v_proj.bias": "model-00002-of-00004.safetensors",
"model.layers.14.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.15.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.15.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.15.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.15.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.15.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.15.self_attn.k_proj.bias": "model-00002-of-00004.safetensors",
"model.layers.15.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.15.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.15.self_attn.q_proj.bias": "model-00002-of-00004.safetensors",
"model.layers.15.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.15.self_attn.v_proj.bias": "model-00002-of-00004.safetensors",
"model.layers.15.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.16.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.16.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.16.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.16.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.16.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.16.self_attn.k_proj.bias": "model-00002-of-00004.safetensors",
"model.layers.16.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.16.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.16.self_attn.q_proj.bias": "model-00002-of-00004.safetensors",
"model.layers.16.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.16.self_attn.v_proj.bias": "model-00002-of-00004.safetensors",
"model.layers.16.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.17.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.17.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.17.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.17.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.17.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.17.self_attn.k_proj.bias": "model-00002-of-00004.safetensors",
"model.layers.17.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.17.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.17.self_attn.q_proj.bias": "model-00002-of-00004.safetensors",
"model.layers.17.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.17.self_attn.v_proj.bias": "model-00002-of-00004.safetensors",
"model.layers.17.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.18.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.18.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.18.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.18.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.18.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.18.self_attn.k_proj.bias": "model-00002-of-00004.safetensors",
"model.layers.18.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.18.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.18.self_attn.q_proj.bias": "model-00002-of-00004.safetensors",
"model.layers.18.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.18.self_attn.v_proj.bias": "model-00002-of-00004.safetensors",
"model.layers.18.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.19.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.19.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.19.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.19.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.19.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.19.self_attn.k_proj.bias": "model-00003-of-00004.safetensors",
"model.layers.19.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.19.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.19.self_attn.q_proj.bias": "model-00003-of-00004.safetensors",
"model.layers.19.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.19.self_attn.v_proj.bias": "model-00003-of-00004.safetensors",
"model.layers.19.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.2.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.2.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.2.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.2.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.2.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.2.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
"model.layers.2.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.2.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.2.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
"model.layers.2.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.2.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
"model.layers.2.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.20.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.20.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.20.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.20.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.20.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.20.self_attn.k_proj.bias": "model-00003-of-00004.safetensors",
"model.layers.20.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.20.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.20.self_attn.q_proj.bias": "model-00003-of-00004.safetensors",
"model.layers.20.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.20.self_attn.v_proj.bias": "model-00003-of-00004.safetensors",
"model.layers.20.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.21.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.21.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.21.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.21.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.21.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.21.self_attn.k_proj.bias": "model-00003-of-00004.safetensors",
"model.layers.21.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.21.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.21.self_attn.q_proj.bias": "model-00003-of-00004.safetensors",
"model.layers.21.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.21.self_attn.v_proj.bias": "model-00003-of-00004.safetensors",
"model.layers.21.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.22.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.22.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.22.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.22.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.22.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.22.self_attn.k_proj.bias": "model-00003-of-00004.safetensors",
"model.layers.22.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.22.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.22.self_attn.q_proj.bias": "model-00003-of-00004.safetensors",
"model.layers.22.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.22.self_attn.v_proj.bias": "model-00003-of-00004.safetensors",
"model.layers.22.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.23.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.23.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.23.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.23.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.23.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.23.self_attn.k_proj.bias": "model-00003-of-00004.safetensors",
"model.layers.23.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.23.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.23.self_attn.q_proj.bias": "model-00003-of-00004.safetensors",
"model.layers.23.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.23.self_attn.v_proj.bias": "model-00003-of-00004.safetensors",
"model.layers.23.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.24.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.24.self_attn.k_proj.bias": "model-00003-of-00004.safetensors",
"model.layers.24.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.self_attn.q_proj.bias": "model-00003-of-00004.safetensors",
"model.layers.24.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.self_attn.v_proj.bias": "model-00003-of-00004.safetensors",
"model.layers.24.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.25.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.25.self_attn.k_proj.bias": "model-00003-of-00004.safetensors",
"model.layers.25.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.self_attn.q_proj.bias": "model-00003-of-00004.safetensors",
"model.layers.25.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.self_attn.v_proj.bias": "model-00003-of-00004.safetensors",
"model.layers.25.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.26.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.26.self_attn.k_proj.bias": "model-00003-of-00004.safetensors",
"model.layers.26.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.self_attn.q_proj.bias": "model-00003-of-00004.safetensors",
"model.layers.26.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.self_attn.v_proj.bias": "model-00003-of-00004.safetensors",
"model.layers.26.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.27.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.27.self_attn.k_proj.bias": "model-00003-of-00004.safetensors",
"model.layers.27.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.self_attn.q_proj.bias": "model-00003-of-00004.safetensors",
"model.layers.27.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.self_attn.v_proj.bias": "model-00003-of-00004.safetensors",
"model.layers.27.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.3.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.3.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.3.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.3.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.3.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.3.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
"model.layers.3.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.3.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.3.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
"model.layers.3.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.3.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
"model.layers.3.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.4.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.4.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.4.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.4.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.4.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.4.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
"model.layers.4.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.4.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.4.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
"model.layers.4.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.4.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
"model.layers.4.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.5.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.5.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
"model.layers.5.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
"model.layers.5.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
"model.layers.5.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.6.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.6.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
"model.layers.6.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
"model.layers.6.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
"model.layers.6.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.7.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.7.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
"model.layers.7.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
"model.layers.7.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
"model.layers.7.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.8.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.8.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.8.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.8.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.8.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.8.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
"model.layers.8.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.8.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.8.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
"model.layers.8.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.8.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
"model.layers.8.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.9.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.9.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.9.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.9.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.9.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.9.self_attn.k_proj.bias": "model-00002-of-00004.safetensors",
"model.layers.9.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.9.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.9.self_attn.q_proj.bias": "model-00002-of-00004.safetensors",
"model.layers.9.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.9.self_attn.v_proj.bias": "model-00002-of-00004.safetensors",
"model.layers.9.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.norm.weight": "model-00003-of-00004.safetensors"
}
}
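
The mapping above assigns each tensor to one of four shard files, and some layers straddle a shard boundary (layer 8 keeps its attention weights in shard 1 but its MLP weights in shard 2). A minimal sketch of grouping tensor names by shard follows. It assumes the mapping sits under a top-level `weight_map` key, as in the usual Hugging Face sharded-checkpoint index; the key name and the helper `tensors_by_shard` are assumptions, not something shown in this excerpt.

```python
from collections import defaultdict


def tensors_by_shard(index: dict) -> dict:
    """Group tensor names by the shard file that stores them."""
    shards = defaultdict(list)
    # "weight_map" is assumed to be the key holding the name -> shard mapping.
    for tensor_name, shard_file in index["weight_map"].items():
        shards[shard_file].append(tensor_name)
    # Sort for stable, readable output.
    return {shard: sorted(names) for shard, names in sorted(shards.items())}


# A small excerpt copied from the entries above.
sample_index = {
    "weight_map": {
        "model.layers.8.input_layernorm.weight": "model-00002-of-00004.safetensors",
        "model.layers.8.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
        "model.norm.weight": "model-00003-of-00004.safetensors",
    }
}

grouped = tensors_by_shard(sample_index)
for shard, names in grouped.items():
    print(shard, len(names))
```

To run it against the real file, load the index with `json.load` and pass the resulting dict instead of `sample_index`.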

3
rng_state_0.pth Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d79fa8a90ba1bf3da1ece5a434f09cfa69fd162316c5146bc778fc4c5a88b553
size 21687
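
The three lines above are a Git LFS pointer, not the file itself: the real `rng_state_0.pth` is stored in LFS and identified by its SHA-256 digest and byte size. A minimal sketch of reading such a pointer is shown below; the helper name `parse_lfs_pointer` is hypothetical, and the parser only assumes the `key value` line layout visible above.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line of a Git LFS pointer into a dict entry."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


# Pointer text copied from the rng_state_0.pth entry above.
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:d79fa8a90ba1bf3da1ece5a434f09cfa69fd162316c5146bc778fc4c5a88b553
size 21687
"""

info = parse_lfs_pointer(pointer)
algorithm, _, digest = info["oid"].partition(":")
print(algorithm, digest[:12], int(info["size"]))
```

After fetching the real file, comparing `hashlib.sha256(data).hexdigest()` with `digest` and `len(data)` with the `size` field is one way to check the download.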

3
rng_state_1.pth Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:093eabb9bed6410fcd98c6db3f679a066be4c2160a0af2bdc5654b1983a72148
size 21687

3
scheduler.pt Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:8c744c8764d066369699bacacadba0fefd1cd74734988df7f684f11b3b431972
size 627

260
sft_args.json Normal file

@@ -0,0 +1,260 @@
{
"model_type": "qwen2_5-7b",
"model_id_or_path": "/qlgy0912/models/Qwen2.5-7B",
"model_revision": "master",
"full_determinism": false,
"sft_type": "full",
"freeze_parameters": [],
"freeze_vit": false,
"freeze_parameters_ratio": 0.0,
"additional_trainable_parameters": [],
"tuner_backend": "swift",
"template_type": "default-generation",
"output_dir": "/qlgy0912/llm_pretrain_output/qwen2_5-7b/v1-20240919-080450",
"add_output_dir_suffix": true,
"ddp_backend": "nccl",
"ddp_find_unused_parameters": null,
"ddp_broadcast_buffers": null,
"ddp_timeout": 1800,
"seed": 42,
"resume_from_checkpoint": null,
"resume_only_model": false,
"ignore_data_skip": false,
"dtype": "bf16",
"packing": false,
"train_backend": "transformers",
"tp": 1,
"pp": 1,
"min_lr": null,
"sequence_parallel": false,
"model_kwargs": {},
"loss_name": null,
"dataset": [
"/qlgy0912/pretrain_data/APE_210K.jsonl",
"/qlgy0912/pretrain_data/ChatMed_TCM.jsonl",
"/qlgy0912/pretrain_data/CMB_Train.jsonl",
"/qlgy0912/pretrain_data/COIG_CQIA.jsonl",
"/qlgy0912/pretrain_data/Encyclopedia.jsonl",
"/qlgy0912/pretrain_data/Evol_Instruction_66K.jsonl",
"/qlgy0912/pretrain_data/Literature.jsonl",
"/qlgy0912/pretrain_data/MedicalBooks.jsonl",
"/qlgy0912/pretrain_data/Medical_Records.jsonl",
"/qlgy0912/pretrain_data/Pharmacopoeia.jsonl",
"/qlgy0912/pretrain_data/TCM_Synd_Diff.jsonl",
"/qlgy0912/pretrain_data/TCM_Synd_Know.jsonl"
],
"val_dataset": [],
"dataset_seed": 42,
"dataset_test_ratio": 0.05,
"use_loss_scale": false,
"loss_scale_config_path": "/qlgy0912/swift_repo/swift-2.4.2/swift/llm/agent/default_loss_scale_config.json",
"system": null,
"tools_prompt": "react_en",
"max_length": 4096,
"truncation_strategy": "delete",
"check_dataset_strategy": "warning",
"streaming": false,
"streaming_val_size": 0,
"streaming_buffer_size": 16384,
"model_name": [
null,
null
],
"model_author": [
null,
null
],
"quant_method": null,
"quantization_bit": 0,
"hqq_axis": 0,
"hqq_dynamic_config_path": null,
"bnb_4bit_comp_dtype": "bf16",
"bnb_4bit_quant_type": "nf4",
"bnb_4bit_use_double_quant": true,
"bnb_4bit_quant_storage": null,
"rescale_image": -1,
"target_modules": [
"q_proj",
"k_proj",
"v_proj"
],
"target_regex": null,
"modules_to_save": [],
"lora_rank": 8,
"lora_alpha": 32,
"lora_dropout": 0.05,
"lora_bias_trainable": "none",
"lora_dtype": "AUTO",
"lora_lr_ratio": null,
"use_rslora": false,
"use_dora": false,
"init_lora_weights": "true",
"fourier_n_frequency": 2000,
"fourier_scaling": 300.0,
"rope_scaling": null,
"boft_block_size": 4,
"boft_block_num": 0,
"boft_n_butterfly_factor": 1,
"boft_dropout": 0.0,
"vera_rank": 256,
"vera_projection_prng_key": 0,
"vera_dropout": 0.0,
"vera_d_initial": 0.1,
"adapter_act": "gelu",
"adapter_length": 128,
"use_galore": false,
"galore_target_modules": null,
"galore_rank": 128,
"galore_update_proj_gap": 50,
"galore_scale": 1.0,
"galore_proj_type": "std",
"galore_optim_per_parameter": false,
"galore_with_embedding": false,
"galore_quantization": false,
"galore_proj_quant": false,
"galore_proj_bits": 4,
"galore_proj_group_size": 256,
"galore_cos_threshold": 0.4,
"galore_gamma_proj": 2,
"galore_queue_size": 5,
"adalora_target_r": 8,
"adalora_init_r": 12,
"adalora_tinit": 0,
"adalora_tfinal": 0,
"adalora_deltaT": 1,
"adalora_beta1": 0.85,
"adalora_beta2": 0.85,
"adalora_orth_reg_weight": 0.5,
"ia3_feedforward_modules": [],
"llamapro_num_new_blocks": 4,
"llamapro_num_groups": null,
"neftune_noise_alpha": null,
"neftune_backend": "transformers",
"lisa_activated_layers": 0,
"lisa_step_interval": 20,
"reft_layer_key": null,
"reft_layers": null,
"reft_rank": 4,
"reft_intervention_type": "LoreftIntervention",
"reft_args": null,
"use_liger": false,
"gradient_checkpointing": true,
"deepspeed": null,
"batch_size": 1,
"eval_batch_size": 1,
"auto_find_batch_size": false,
"num_train_epochs": 2,
"max_steps": -1,
"optim": "adamw_torch",
"adam_beta1": 0.9,
"adam_beta2": 0.95,
"adam_epsilon": 1e-08,
"learning_rate": 1e-05,
"weight_decay": 0.1,
"gradient_accumulation_steps": 8,
"max_grad_norm": 0.5,
"predict_with_generate": false,
"lr_scheduler_type": "cosine",
"lr_scheduler_kwargs": {},
"warmup_ratio": 0.05,
"warmup_steps": 0,
"eval_steps": 2000,
"save_steps": 2000,
"save_only_model": false,
"save_total_limit": 5,
"logging_steps": 20,
"acc_steps": 1,
"dataloader_num_workers": 1,
"dataloader_pin_memory": true,
"dataloader_drop_last": false,
"push_to_hub": false,
"hub_model_id": null,
"hub_token": null,
"hub_private_repo": false,
"hub_strategy": "every_save",
"test_oom_error": false,
"disable_tqdm": false,
"lazy_tokenize": false,
"preprocess_num_proc": 1,
"use_flash_attn": null,
"ignore_args_error": false,
"check_model_is_latest": true,
"logging_dir": "/qlgy0912/llm_pretrain_output/qwen2_5-7b/v1-20240919-080450/runs",
"report_to": [
"tensorboard"
],
"acc_strategy": "token",
"save_on_each_node": false,
"evaluation_strategy": "steps",
"save_strategy": "steps",
"save_safetensors": true,
"gpu_memory_fraction": null,
"include_num_input_tokens_seen": false,
"local_repo_path": null,
"custom_register_path": null,
"custom_dataset_info": null,
"device_map_config": null,
"device_max_memory": [],
"max_new_tokens": 2048,
"do_sample": null,
"temperature": null,
"top_k": null,
"top_p": null,
"repetition_penalty": null,
"num_beams": 1,
"fsdp": "",
"fsdp_config": null,
"sequence_parallel_size": 1,
"model_layer_cls_name": null,
"metric_warmup_step": 0,
"fsdp_num": 1,
"per_device_train_batch_size": null,
"per_device_eval_batch_size": null,
"eval_strategy": null,
"self_cognition_sample": 0,
"train_dataset_mix_ratio": 0.0,
"train_dataset_mix_ds": [
"ms-bench"
],
"train_dataset_sample": -1,
"val_dataset_sample": null,
"safe_serialization": null,
"only_save_model": null,
"neftune_alpha": null,
"deepspeed_config_path": null,
"model_cache_dir": null,
"lora_dropout_p": null,
"lora_target_modules": [],
"lora_target_regex": null,
"lora_modules_to_save": [],
"boft_target_modules": [],
"boft_modules_to_save": [],
"vera_target_modules": [],
"vera_modules_to_save": [],
"ia3_target_modules": [],
"ia3_modules_to_save": [],
"custom_train_dataset_path": [],
"custom_val_dataset_path": [],
"device_map_config_path": null,
"push_hub_strategy": null,
"use_self_cognition": false,
"is_multimodal": false,
"is_vision": false,
"lora_use_embedding": false,
"lora_use_all": false,
"lora_m2s_use_embedding": false,
"lora_m2s_use_ln": false,
"torch_dtype": "torch.bfloat16",
"fp16": false,
"bf16": true,
"rank": 0,
"local_rank": 0,
"world_size": 2,
"local_world_size": 2,
"bnb_4bit_compute_dtype": "torch.bfloat16",
"load_in_4bit": false,
"load_in_8bit": false,
"train_sampler_random": true,
"train_type": "sft",
"training_args": "Seq2SeqTrainingArguments(output_dir='/qlgy0912/llm_pretrain_output/qwen2_5-7b/v1-20240919-080450', overwrite_output_dir=False, do_train=False, do_eval=True, do_predict=False, evaluation_strategy=<IntervalStrategy.STEPS: 'steps'>, prediction_loss_only=False, per_device_train_batch_size=1, per_device_eval_batch_size=1, per_gpu_train_batch_size=None, per_gpu_eval_batch_size=None, gradient_accumulation_steps=8, eval_accumulation_steps=None, eval_delay=0, learning_rate=1e-05, weight_decay=0.1, adam_beta1=0.9, adam_beta2=0.95, adam_epsilon=1e-08, max_grad_norm=0.5, num_train_epochs=2, max_steps=-1, lr_scheduler_type=<SchedulerType.COSINE: 'cosine'>, lr_scheduler_kwargs={}, warmup_ratio=0.05, warmup_steps=0, log_level='passive', log_level_replica='warning', log_on_each_node=True, logging_dir='/qlgy0912/llm_pretrain_output/qwen2_5-7b/v1-20240919-080450/runs', logging_strategy=<IntervalStrategy.STEPS: 'steps'>, logging_first_step=True, logging_steps=20, logging_nan_inf_filter=True, save_strategy=<IntervalStrategy.STEPS: 'steps'>, save_steps=2000, save_total_limit=5, save_safetensors=True, save_on_each_node=False, save_only_model=False, no_cuda=False, use_cpu=False, use_mps_device=False, seed=42, data_seed=42, jit_mode_eval=False, use_ipex=False, bf16=True, fp16=False, fp16_opt_level='O1', half_precision_backend='auto', bf16_full_eval=False, fp16_full_eval=False, tf32=None, local_rank=0, ddp_backend='nccl', tpu_num_cores=None, tpu_metrics_debug=False, debug=[], dataloader_drop_last=False, eval_steps=2000, dataloader_num_workers=1, dataloader_prefetch_factor=None, past_index=-1, run_name='/qlgy0912/llm_pretrain_output/qwen2_5-7b/v1-20240919-080450', disable_tqdm=False, remove_unused_columns=False, label_names=None, load_best_model_at_end=False, metric_for_best_model='loss', greater_is_better=False, ignore_data_skip=False, fsdp=[], fsdp_min_num_params=0, fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, 
fsdp_transformer_layer_cls_to_wrap=None, accelerator_config=AcceleratorConfig(split_batches=False, dispatch_batches=False, even_batches=True, use_seedable_sampler=True, gradient_accumulation_kwargs=None), deepspeed=None, label_smoothing_factor=0.0, optim=<OptimizerNames.ADAMW_TORCH: 'adamw_torch'>, optim_args=None, adafactor=False, group_by_length=False, length_column_name='length', report_to=['tensorboard'], ddp_find_unused_parameters=False, ddp_bucket_cap_mb=None, ddp_broadcast_buffers=False, dataloader_pin_memory=True, dataloader_persistent_workers=False, skip_memory_metrics=True, use_legacy_prediction_loop=False, push_to_hub=False, resume_from_checkpoint=None, hub_model_id=None, hub_strategy=<HubStrategy.EVERY_SAVE: 'every_save'>, hub_token=None, hub_private_repo=False, hub_always_push=False, gradient_checkpointing=True, gradient_checkpointing_kwargs=None, include_inputs_for_metrics=False, eval_do_concat_batches=True, fp16_backend='auto', push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=None, mp_parameters='', auto_find_batch_size=False, full_determinism=False, torchdynamo=None, ray_scope='last', ddp_timeout=1800, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, dispatch_batches=None, split_batches=None, include_tokens_per_second=False, include_num_input_tokens_seen=False, neftune_noise_alpha=None, optim_target_modules=None, sortish_sampler=False, predict_with_generate=False, generation_max_length=None, generation_num_beams=None, generation_config=GenerationConfig {\n \"bos_token_id\": 151643,\n \"eos_token_id\": 151645,\n \"max_new_tokens\": 2048,\n \"pad_token_id\": 151643\n}\n, acc_strategy='token', loss_name=None, additional_saved_files=[], train_sampler_random=True, metric_warmup_step=0, train_dataset_sample=-1)"
}
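
Two quantities follow from the values recorded in `sft_args.json` above: the effective global batch size and the shape of the learning-rate curve. The sketch below assumes plain data-parallel training, so the global batch is `batch_size * gradient_accumulation_steps * world_size`, and it mirrors the usual linear-warmup-then-cosine shape rather than calling the trainer itself. The total step count is not visible in this excerpt, so `total_steps=1000` is a hypothetical placeholder, and `lr_at` is an illustrative helper.

```python
import math

# Values copied from sft_args.json above.
args = {
    "batch_size": 1,
    "gradient_accumulation_steps": 8,
    "world_size": 2,
    "learning_rate": 1e-05,
    "warmup_ratio": 0.05,
}

# Sequences contributing to one optimizer step, assuming data parallelism.
effective_batch = (
    args["batch_size"] * args["gradient_accumulation_steps"] * args["world_size"]
)


def lr_at(step: int, total_steps: int) -> float:
    """Linear warmup followed by a half-cosine decay to zero."""
    base_lr = args["learning_rate"]
    warmup_steps = math.ceil(total_steps * args["warmup_ratio"])
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


total_steps = 1000  # hypothetical; the real value is not shown in this excerpt
print(effective_batch)
print([lr_at(s, total_steps) for s in (0, 25, 500, 1000)])
```

With `max_length` set to 4096, an optimizer step under these assumptions sees at most 16 sequences of up to 4096 tokens each.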

31
special_tokens_map.json Normal file

@@ -0,0 +1,31 @@
{
"additional_special_tokens": [
"<|im_start|>",
"<|im_end|>",
"<|object_ref_start|>",
"<|object_ref_end|>",
"<|box_start|>",
"<|box_end|>",
"<|quad_start|>",
"<|quad_end|>",
"<|vision_start|>",
"<|vision_end|>",
"<|vision_pad|>",
"<|image_pad|>",
"<|video_pad|>"
],
"eos_token": {
"content": "<|im_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"pad_token": {
"content": "<|endoftext|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
}
}

303283
tokenizer.json Normal file

File diff suppressed because it is too large

207
tokenizer_config.json Normal file

@@ -0,0 +1,207 @@
{
"add_bos_token": false,
"add_prefix_space": false,
"added_tokens_decoder": {
"151643": {
"content": "<|endoftext|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151644": {
"content": "<|im_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151645": {
"content": "<|im_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151646": {
"content": "<|object_ref_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151647": {
"content": "<|object_ref_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151648": {
"content": "<|box_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151649": {
"content": "<|box_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151650": {
"content": "<|quad_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151651": {
"content": "<|quad_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151652": {
"content": "<|vision_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151653": {
"content": "<|vision_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151654": {
"content": "<|vision_pad|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151655": {
"content": "<|image_pad|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151656": {
"content": "<|video_pad|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151657": {
"content": "<tool_call>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151658": {
"content": "</tool_call>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151659": {
"content": "<|fim_prefix|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151660": {
"content": "<|fim_middle|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151661": {
"content": "<|fim_suffix|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151662": {
"content": "<|fim_pad|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151663": {
"content": "<|repo_name|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151664": {
"content": "<|file_sep|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
}
},
"additional_special_tokens": [
"<|im_start|>",
"<|im_end|>",
"<|object_ref_start|>",
"<|object_ref_end|>",
"<|box_start|>",
"<|box_end|>",
"<|quad_start|>",
"<|quad_end|>",
"<|vision_start|>",
"<|vision_end|>",
"<|vision_pad|>",
"<|image_pad|>",
"<|video_pad|>"
],
"bos_token": null,
"chat_template": "{%- if tools %}\n {{- '<|im_start|>system\\n' }}\n {%- if messages[0]['role'] == 'system' %}\n {{- messages[0]['content'] }}\n {%- else %}\n {{- 'You are a helpful assistant.' }}\n {%- endif %}\n {{- \"\\n\\n# Tools\\n\\nYou may call one or more functions to assist with the user query.\\n\\nYou are provided with function signatures within <tools></tools> XML tags:\\n<tools>\" }}\n {%- for tool in tools %}\n {{- \"\\n\" }}\n {{- tool | tojson }}\n {%- endfor %}\n {{- \"\\n</tools>\\n\\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\\n<tool_call>\\n{{\\\"name\\\": <function-name>, \\\"arguments\\\": <args-json-object>}}\\n</tool_call><|im_end|>\\n\" }}\n{%- else %}\n {%- if messages[0]['role'] == 'system' %}\n {{- '<|im_start|>system\\n' + messages[0]['content'] + '<|im_end|>\\n' }}\n {%- else %}\n {{- '<|im_start|>system\\nYou are a helpful assistant.<|im_end|>\\n' }}\n {%- endif %}\n{%- endif %}\n{%- for message in messages %}\n {%- if (message.role == \"user\") or (message.role == \"system\" and not loop.first) or (message.role == \"assistant\" and not message.tool_calls) %}\n {{- '<|im_start|>' + message.role + '\\n' + message.content + '<|im_end|>' + '\\n' }}\n {%- elif message.role == \"assistant\" %}\n {{- '<|im_start|>' + message.role }}\n {%- if message.content %}\n {{- '\\n' + message.content }}\n {%- endif %}\n {%- for tool_call in message.tool_calls %}\n {%- if tool_call.function is defined %}\n {%- set tool_call = tool_call.function %}\n {%- endif %}\n {{- '\\n<tool_call>\\n{\"name\": \"' }}\n {{- tool_call.name }}\n {{- '\", \"arguments\": ' }}\n {{- tool_call.arguments | tojson }}\n {{- '}\\n</tool_call>' }}\n {%- endfor %}\n {{- '<|im_end|>\\n' }}\n {%- elif message.role == \"tool\" %}\n {%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != \"tool\") %}\n {{- '<|im_start|>user' }}\n {%- endif %}\n {{- '\\n<tool_response>\\n' }}\n {{- message.content 
}}\n {{- '\\n</tool_response>' }}\n {%- if loop.last or (messages[loop.index0 + 1].role != \"tool\") %}\n {{- '<|im_end|>\\n' }}\n {%- endif %}\n {%- endif %}\n{%- endfor %}\n{%- if add_generation_prompt %}\n {{- '<|im_start|>assistant\\n' }}\n{%- endif %}\n",
"clean_up_tokenization_spaces": false,
"eos_token": "<|im_end|>",
"errors": "replace",
"model_max_length": 131072,
"pad_token": "<|endoftext|>",
"split_special_tokens": false,
"tokenizer_class": "Qwen2Tokenizer",
"unk_token": null
}
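
The `chat_template` in `tokenizer_config.json` above wraps each message in `<|im_start|>` / `<|im_end|>` markers and inserts a default system prompt when none is given. The sketch below reproduces only the no-tools branch of that template in plain Python, for messages with a `role` and a string `content`; it is an illustration of the format, not a replacement for the tokenizer's own template rendering, and `render_chatml` is a hypothetical helper name.

```python
DEFAULT_SYSTEM = "You are a helpful assistant."


def render_chatml(messages: list, add_generation_prompt: bool = True) -> str:
    """Render plain user/assistant/system messages without tool calls."""
    parts = []
    if messages and messages[0]["role"] == "system":
        # A leading system message replaces the default system prompt.
        parts.append(f"<|im_start|>system\n{messages[0]['content']}<|im_end|>\n")
        remaining = messages[1:]
    else:
        parts.append(f"<|im_start|>system\n{DEFAULT_SYSTEM}<|im_end|>\n")
        remaining = messages
    for message in remaining:
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Opens the assistant turn so generation continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = render_chatml([{"role": "user", "content": "hi"}])
print(prompt)
```

Generation then stops at `<|im_end|>`, which the same file declares as `eos_token`.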

89832
trainer_state.json Normal file

File diff suppressed because it is too large

3
training_args.bin Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:44ab2f78618f8224082dd85cb346cd7efc8ddb9bcbedb24ad4adc1aebe5777df
size 6459

1
vocab.json Normal file

File diff suppressed because one or more lines are too long