Initialize project; model provided by the ModelHub XC community

Model: QLUNLP/BianCang-Qwen2.5-7B Source: Original Platform
2026-05-07 00:57:34 +08:00
commit 8e20be4f2e
28 changed files with 545965 additions and 0 deletions
--- a/.gitattributes
+++ b/.gitattributes
@@ -0,0 +1,38 @@
 *.7z filter=lfs diff=lfs merge=lfs -text
 *.arrow filter=lfs diff=lfs merge=lfs -text
 *.bin filter=lfs diff=lfs merge=lfs -text
 *.bin.* filter=lfs diff=lfs merge=lfs -text
 *.bz2 filter=lfs diff=lfs merge=lfs -text
 *.ftz filter=lfs diff=lfs merge=lfs -text
 *.gz filter=lfs diff=lfs merge=lfs -text
 *.h5 filter=lfs diff=lfs merge=lfs -text
 *.joblib filter=lfs diff=lfs merge=lfs -text
 *.lfs.* filter=lfs diff=lfs merge=lfs -text
 *.model filter=lfs diff=lfs merge=lfs -text
 *.msgpack filter=lfs diff=lfs merge=lfs -text
 *.onnx filter=lfs diff=lfs merge=lfs -text
 *.ot filter=lfs diff=lfs merge=lfs -text
 *.parquet filter=lfs diff=lfs merge=lfs -text
 *.pb filter=lfs diff=lfs merge=lfs -text
 *.pt filter=lfs diff=lfs merge=lfs -text
 *.pth filter=lfs diff=lfs merge=lfs -text
 *.rar filter=lfs diff=lfs merge=lfs -text
 saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.tar.* filter=lfs diff=lfs merge=lfs -text
 *.tflite filter=lfs diff=lfs merge=lfs -text
 *.tgz filter=lfs diff=lfs merge=lfs -text
 *.xz filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zstandard filter=lfs diff=lfs merge=lfs -text
 *.tfevents* filter=lfs diff=lfs merge=lfs -text
 *.db* filter=lfs diff=lfs merge=lfs -text
 *.ark* filter=lfs diff=lfs merge=lfs -text
 **/*ckpt*data* filter=lfs diff=lfs merge=lfs -text
 **/*ckpt*.meta filter=lfs diff=lfs merge=lfs -text
 **/*ckpt*.index filter=lfs diff=lfs merge=lfs -text
 *.safetensors filter=lfs diff=lfs merge=lfs -text
 *.ckpt filter=lfs diff=lfs merge=lfs -text
 *.gguf* filter=lfs diff=lfs merge=lfs -text
 *.ggml filter=lfs diff=lfs merge=lfs -text
 *.llamafile* filter=lfs diff=lfs merge=lfs -text
 *.pt2 filter=lfs diff=lfs merge=lfs -text
--- a/README.md
+++ b/README.md
@@ -0,0 +1,494 @@
 # BianCang: A Traditional Chinese Medicine Large Language Model (扁仓中医大模型)
 <div align="center">
    <p>
    <img src="assets/BianCang-logo.png" width="500px"/>
    </p>
    </div>
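 ## 💡Introduction
 Hello, and welcome to the open-source repository of BianCang, a large language model for Traditional Chinese Medicine (TCM).
 To promote the practical application of large language models in TCM, to assist doctors in diagnosis and patients in self-assessment, and to bring LLM capabilities to TCM, we release the **BianCang** series of TCM large language models in this repository. The name BianCang combines Bian Que (扁鹊) and Cang Gong (仓公), two renowned physicians of ancient China, and by extension refers to renowned physicians in general. We hope that BianCang can contribute to carrying forward the TCM tradition and to improving healthcare in China.
 BianCang uses Qwen2/2.5 as its base and is trained in two stages: domain knowledge is injected first, then activated and aligned. BianCang achieves state-of-the-art performance on TCM-specific tasks such as disease diagnosis and syndrome differentiation, and performs strongly on a range of medical licensing examinations.
 We open-source the following resources in this repository:
 - BianCang base model weights: BianCang-Qwen2-7B, BianCang-Qwen2.5-7B, and BianCang-Qwen2.5-14B.
 - BianCang instruction-tuned model weights: BianCang-Qwen2-7B-Instruct, BianCang-Qwen2.5-7B-Instruct, and BianCang-Qwen2.5-14B-Instruct.
 For more information, see [QLU-NLP/BianCang](https://github.com/QLU-NLP/BianCang) on GitHub.
 ## 🚀Inference
 ### Using SWIFT
 #### Environment setup
 Download the SWIFT source code from [Release v2.4.2 · modelscope/ms-swift](https://github.com/modelscope/ms-swift/releases/tag/v2.4.2), change into the source directory, and run the install command: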
 ```shell
 cd swift
 pip install -e .
 ```
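 You can replace torch with a version that suits your GPU driver. SWIFT requires torch >= 1.13, and torch >= 2.0.0 is recommended.
 Note: we used the *qwen* chat template for SFT training. If your SWIFT version is newer than the one linked above, the Qwen2.5 chat template may not match; in that case, set the chat template explicitly to *qwen* rather than *qwen2_5*. For the underlying reason, see [fix qwen2.5 template by Jintao-Huang · Pull Request #2081 · modelscope/ms-swift](https://github.com/modelscope/ms-swift/pull/2081)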
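A minimal sketch of why the template name matters, assuming both names denote the ChatML layout built from the `<|im_start|>` and `<|im_end|>` tokens and differ only in their default system prompt. The two prompt strings and the helper `render_chatml` are illustrative assumptions, not SWIFT's code; the linked pull request is the authority on the actual change.

```python
# Hand-rolled ChatML rendering; illustrative only, not SWIFT's implementation.
IM_START, IM_END = "<|im_start|>", "<|im_end|>"

# Assumed default system prompts for the two template names (hypothetical values).
ASSUMED_SYSTEM = {
    "qwen": "You are a helpful assistant.",
    "qwen2_5": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant.",
}


def render_chatml(query, template_type="qwen", history=None):
    """Render a system prompt, past turns, and a new query as one ChatML string."""
    parts = [f"{IM_START}system\n{ASSUMED_SYSTEM[template_type]}{IM_END}\n"]
    for past_query, past_response in history or []:
        parts.append(f"{IM_START}user\n{past_query}{IM_END}\n")
        parts.append(f"{IM_START}assistant\n{past_response}{IM_END}\n")
    parts.append(f"{IM_START}user\n{query}{IM_END}\n")
    # The trailing assistant header is the position where generation starts.
    parts.append(f"{IM_START}assistant\n")
    return "".join(parts)


print(render_chatml("你好，你是谁？", template_type="qwen"))
```

Under these assumptions, a model fine-tuned with one system prompt sees a different prefix at inference time when the other template is selected, which is why the template is pinned to *qwen* here.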
 #### Inference method 1: code
 ```python
 import os
 os.environ['CUDA_VISIBLE_DEVICES'] = '0'
 from swift.llm import (
    get_model_tokenizer, get_template, inference, ModelType
 )
 from swift.utils import seed_everything
 model_type = ModelType.qwen2_5_7b_instruct
 template_type = 'qwen'
 model_id_or_path = 'QLU-NLP/BianCang-Qwen2.5-7B-Instruct'
 model, tokenizer = get_model_tokenizer(model_type, model_id_or_path=model_id_or_path, model_kwargs={'device_map': 'auto'})
 model.generation_config.max_new_tokens = 256
 template = get_template(template_type, tokenizer)
 seed_everything(42)
 query = '你好，你是谁？'
 response, history = inference(model, template, query)
 print(f'query: {query}')
 print(f'response: {response}')
 query = '下面是一名患者的基本情况。年龄：78岁，性别：女。主 诉：活动后胸痛一周。现病史：患者一周前活动后出现胸口隐隐作痛，如针刺样乏力气短，活动后汗出，偏头痛。中医望闻切诊：表情自然，面色红润，形体正常,语气清,气息平；无异常气味,舌暗红，苔少。请你根据上述患者的主诉、病史和中医望闻切诊情况，判断该患者的主要中医疾病和中医证型，并给出中医辨病辨证的依据。'
 response, history = inference(model, template, query, history)
 print(f'query: {query}')
 print(f'response: {response}')
 print(f'history: {history}')
 ```
 Output:
 ```
 query: 你好，你是谁？
 response: 你好！我是一个名为扁仓中医大模型的人工智能，由齐鲁工业大学（山东省科学院）计算机科学与技术学部（国家超级计算济南中心）自然语言处理与认知计算团队研发。我被设计成能够理解和生成自然语言文本，以便与人类进行中医辩证、中医处方推荐、中医知识问答、中医问题咨询等方面的对话交流，辅助人们完成疾病诊断相关的任务。请问有什么我可以帮助您的吗？
 query: 下面是一名患者的基本情况。年龄：78岁，性别：女。主 诉：活动后胸痛一周。现病史：患者一周前活动后出现胸口隐隐作痛，如针刺样乏力气短，活动后汗出，偏头痛。中医望闻切诊：表情自然，面色红润，形体正常,语气清,气息平；无异常气味,舌暗红，苔少。请你根据上述患者的主诉、病史和中医望闻切诊情况，判断该患者的主要中医疾病和中医证型，并给出中医辨病辨证的依据。
 response: 根据中医的诊断方法，患者患有胸痹心痛，中医证型属于气虚血瘀证。综合脉证，四诊合参，本病当属祖国医学“胸痹心痛病”范畴，证属“气虚血瘀”。患者素体虚弱，久病伤正，伤及心气，心气衰微，机能不健，致阴邪易于上乘阳位，况心脉为宗气之所，百脉朝会之枢，宗气的鼓动形成了心气推动血液运行全身，心气不足则血行无力瘀滞，发为本病，舌脉俱为佐证。
 history: [['你好，你是谁？', '你好！我是一个名为扁仓中医大模型的人工智能，由齐鲁工业大学（山东省科学院）计算机科学与技术学部（国家超级计算济南中心）自然语言处理与认知计算团队研发。我被设计成能够理解和生成自然语言文本，以便与人类进行中医辩证、中医处方推荐、中医知识问答、中医问题咨询等方面的对话交流，辅助人们完成疾病诊断相关的任务。请问有什么我可以帮助您的吗？'], ['下面是一名患者的基本情况。年龄：78岁，性别：女。主 诉：活动后胸痛一周。现病史：患者一周前活动后出现胸口隐隐作痛，如针刺样乏力气短，活动后汗出，偏头痛。中医望闻切诊：表情自然，面色红润，形体正常,语气清,气息平；无异常气味,舌暗红，苔少。请你根据上述患者的主诉、病史和中医望闻切诊情况，判断该患者的主要中医疾病和中医证型，并给出中医辨病辨证的依据。', '根据中医的诊断方法，患者患有胸痹心痛，中医证型属于气虚血瘀证。综合脉证，四诊合参，本病当属祖国医学“胸痹心痛病”范畴，证属“气虚血瘀”。患者素体虚弱，久病伤正，伤及心气，心气衰微，机能不健，致阴邪易于上乘阳位，况心脉为宗气之所，百脉朝会之枢，宗气的鼓动形成了心气推动血液运行全身，心气不足则血行无力瘀滞，发为本病，舌脉俱为佐证。']]
 ```
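The `history` value printed above is a list of `[query, response]` pairs. As a rough sketch, such a history could be converted into the role-tagged `messages` list that chat-completions-style APIs expect. The helper name `history_to_messages` is hypothetical and not part of SWIFT.

```python
def history_to_messages(history, query, system=None):
    """Flatten [query, response] pairs plus a new query into chat messages."""
    messages = []
    if system is not None:
        messages.append({"role": "system", "content": system})
    for past_query, past_response in history:
        messages.append({"role": "user", "content": past_query})
        messages.append({"role": "assistant", "content": past_response})
    messages.append({"role": "user", "content": query})
    return messages


history = [["你好，你是谁？", "你好！我是一个名为扁仓中医大模型的人工智能。"]]
for message in history_to_messages(history, "请介绍一下你自己。"):
    print(message["role"], message["content"])
```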
 #### Inference method 2: deploy an API
 Deploy the API with the following command:
 ```shell
 CUDA_VISIBLE_DEVICES=0 swift deploy --model_type qwen2_5-7b-instruct --model_id_or_path QLU-NLP/BianCang-Qwen2.5-7B-Instruct --port 8090 --template_type qwen
 ```
 Test with curl:
 ```shell
 curl http://localhost:8090/v1/chat/completions \
 -H "Content-Type: application/json" \
 -d '{
 "model": "qwen2_5-7b-instruct",
 "messages": [{"role": "user", "content": "你好，你是谁？"}],
 "max_tokens": 256,
 "temperature": 0.3
 }'
 ```
 The response is:
 ```json
 {"model":"qwen2_5-7b-instruct",
 "choices":[{"index":0,"message":{"role":"assistant","content":"你好！我是一个名为扁仓中医大模型的人工智能，由齐鲁工业大学（山东省科学院）计算机科学与技术学部（国家超级计算济南中心）自然语言处理与认知计算团队研发。我被设计成能够理解和生成自然语言文本，以便与人类进行中医辩证、中医处方推荐、中医知识问答、中医问题咨询等方面的对话交流，辅助人们完成疾病诊断相关的任务。请问有什么我可以帮助您的吗？",
 "tool_calls":null},"finish_reason":null,"logprobs":null}],
 "usage":{"prompt_tokens":24,"completion_tokens":92,"total_tokens":116},
 "id":"chatcmpl-6b4a02dee57a42238b27b5c40085df16",
 "object":"chat.completion","created":1730209011}
 ```
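The same request can be issued without curl. The sketch below uses only the Python standard library and assumes the server from the deploy command is listening on port 8090. The helper names `build_chat_request` and `extract_reply` are hypothetical; the payload fields mirror the curl example.

```python
import json
import urllib.request


def build_chat_request(query, port=8090, model="qwen2_5-7b-instruct",
                       max_tokens=256, temperature=0.3):
    """Build (but do not send) a POST request mirroring the curl call."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": query}],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        f"http://localhost:{port}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(body):
    """Return the assistant text and the token usage from a response body."""
    parsed = json.loads(body)
    return parsed["choices"][0]["message"]["content"], parsed["usage"]


request = build_chat_request("你好，你是谁？")
# Sending requires the deployed server to be running:
# with urllib.request.urlopen(request) as response:
#     reply, usage = extract_reply(response.read())
#     print(reply, usage["total_tokens"])
```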
 Test with code:
 ```python
 from swift.llm import get_model_list_client, XRequestConfig, inference_client
 model_list = get_model_list_client(port=8090)
 model_type = model_list.data[0].id
 print(f'model_type: {model_type}')
 query = "你好，你是谁？"
 request_config = XRequestConfig(seed=42)
 resp = inference_client(model_type, query, request_config=request_config, port=8090)
 response = resp.choices[0].message.content
 print(f'query: {query}')
 print(f'response: {response}')
 history = [(query, response)]
 query = '下面是一名患者的基本情况。年龄：78岁，性别：女。主 诉：活动后胸痛一周。现病史：患者一周前活动后出现胸口隐隐作痛，如针刺样乏力气短，活动后汗出，偏头痛。中医望闻切诊：表情自然，面色红润，形体正常,语气清,气息平；无异常气味,舌暗红，苔少。请你根据上述患者的主诉、病史和中医望闻切诊情况，判断该患者的主要中医疾病和中医证型，并给出中医辨病辨证的依据。'
 request_config = XRequestConfig(stream=True, seed=42)
 stream_resp = inference_client(model_type, query, history, request_config=request_config, port=8090)
 print(f'query: {query}')
 print('response: ', end='')
 for chunk in stream_resp:
    print(chunk.choices[0].delta.content, end='', flush=True)
 print()
 ```
 Output:
 ```
 model_type: qwen2_5-7b-instruct
 query: 你好，你是谁？
 response: 你好！我是一个名为扁仓中医大模型的人工智能，由齐鲁工业大学（山东省科学院）计算机科学与技术学部（国家超级计算济南中心）自然语言处理与认知计算团队研发。我被设计成能够理解和生成自然语言文本，以便与人类进行中医辩证、中医处方推荐、中医知识问答、中医问题咨询等方面的对话交流，辅助人们完成疾病诊断相关的任务。请问有什么我可以帮助您的吗？
 query: 下面是一名患者的基本情况。年龄：78岁，性别：女。主 诉：活动后胸痛一周。现病史：患者一周前活动后出现胸口隐隐作痛，如针刺样乏力气短，活动后汗出，偏头痛。中医望闻切诊：表情自然，面色红润，形体正常,语气清,气息平；无异常气味,舌暗红，苔少。请你根据上述患者的主诉、病史和中医望闻切诊情况，判断该患者的主要中医疾病和中医证型，并给出中医辨病辨证的依据。
 response: 根据中医的诊断方法，患者患有胸痹心痛，中医证型属于气虚血瘀证。综合脉证，四诊合参，本病当属祖国医学“胸痹心痛病”范畴，证属“气虚血瘀”。患者素体虚弱，久病伤正，伤及心气，心气衰微，机能不健，致阴邪易于上乘阳位，况心脉为宗气之所，百脉朝会之枢，宗气的鼓动形成了心气推动血液运行全身，心气不足则血行无力瘀滞，发为本病，舌脉俱为佐证。
 ```
 ### Using Transformers
 You can also run inference with the transformers package:
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer
 model_name = "QLU-NLP/BianCang-Qwen2.5-7B-Instruct"
 # Load the weights in their stored dtype and place them on the available devices.
 model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
 )
 tokenizer = AutoTokenizer.from_pretrained(model_name)
 prompt = "你好，你是谁？"
 messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt}
 ]
 # Render the conversation with the tokenizer's chat template, ending with the assistant header.
 text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
 )
 model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
 generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=256
 )
 # Drop the prompt tokens so that only the newly generated tokens are decoded.
 generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
 ]
 response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
 print(response)
 ```
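The list comprehension near the end of the example removes the prompt from each generated sequence, because `generate` on a decoder-only model returns the prompt tokens followed by the new tokens. The same slicing can be shown with plain lists; the token ids here are made up for illustration and `strip_prompt` is a hypothetical helper name.

```python
def strip_prompt(input_ids_batch, output_ids_batch):
    """Keep only the tokens that follow each prompt in the generated sequences."""
    return [
        output_ids[len(input_ids):]
        for input_ids, output_ids in zip(input_ids_batch, output_ids_batch)
    ]


prompts = [[101, 7, 8]]            # made-up prompt token ids
outputs = [[101, 7, 8, 42, 43]]    # prompt followed by two new tokens
print(strip_prompt(prompts, outputs))  # [[42, 43]]
```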
 ### Using the Web UI
 We provide a simple demo Web UI.
 Install streamlit:
 ```shell
 pip install streamlit
 ```
 Deploy the API with SWIFT:
 ```shell
 CUDA_VISIBLE_DEVICES=0 swift deploy --model_type qwen2_5-7b-instruct --model_id_or_path QLU-NLP/BianCang-Qwen2.5-7B-Instruct --port 8090 --template_type qwen
 ```
 Launch streamlit:
 ```shell
 streamlit run web_ui.py
 ```
 ![image-20241029215908096](assets/webui.png)
 ## 🥇TCM Capability Evaluation
 <table border="1" cellpadding="5" cellspacing="0">
  <thead>
    <tr>
      <th rowspan="2">Model</th>
      <th colspan="4">TCM Syndrome Differentiation</th>
      <th colspan="4">TCM Disease Diagnosis</th>
      <th colspan="4">TCM Exam</th>
    </tr>
    <tr>
      <th colspan="2">TCMSD <br>Acc.(%)</th>
      <th colspan="2">TCMSD-BC<br> Acc.(%)</th>
      <th colspan="2">TCMDD<br> Acc.(%)</th>
      <th colspan="2">TCMDD-BC<br> Acc.(%)</th>
      <th colspan="2">MLEC-TCM<br> Acc.(%)</th>
      <th colspan="2">MLEC-CWM<br> Acc.(%)</th>
    </tr>
    <tr>
        <th></th><th>DI</th><th>CoT</th><th>DI</th><th>CoT</th><th>DI</th><th>CoT</th><th>DI</th><th>CoT</th><th>ZS</th><th>FS</th><th>ZS</th><th>FS</th>
    </tr>
  </thead>
  <tbody align="center" valign="center">
    <tr><td>GPT-4</td><td>24.53</td><td>45.21</td><td>16.67</td><td>70.73</td><td>27.83</td><td>54.54</td><td>41.80</td><td>68.33</td><td>74.70</td><td>76.35</td><td>76.26</td><td>76.37</td></tr>
      <tr><td>DeepSeek-V3</td><td>34.62</td><td>40.74</td><td>24.53</td><td>72.00</td><td>46.97</td><td>59.08</td><td>82.67</td><td>72.93</td><td>84.97</td><td>88.56</td><td>85.05</td><td>87.81</td></tr>
 <tr><td>DeepSeek-R1</td><td>37.17</td><td>55.67</td><td>25.53</td><td>76.07</td><td>50.66</td><td>80.75</td><td>79.27</td><td>94.53</td><td>92.68</td><td>93.10</td><td>90.92</td><td>90.77</td></tr>
    <tr><td>Qwen2-7B</td><td>31.74</td><td>27.18</td><td>32.73</td><td>28.40</td><td>41.60</td><td>54.59</td><td>74.87</td><td>77.93</td><td>86.01</td><td>89.18</td><td>84.45</td><td>87.89</td></tr>
    <tr><td>Qwen2-7B-Instruct</td><td>25.70</td><td>33.41</td><td>14.27</td><td>57.00</td><td>32.87</td><td>52.92</td><td>60.40</td><td>60.13</td><td>83.61</td><td>84.22</td><td>79.89</td><td>82.99</td></tr>
    <tr><td>Qwen2.5-7B</td><td>30.44</td><td>21.29</td><td>17.87</td><td>35.73</td><td>23.71</td><td>43.88</td><td>63.87</td><td>71.27</td><td>83.32</td><td>85.52</td><td>82.02</td><td>84.04</td></tr>
    <tr><td>Qwen2.5-7B-Instruct</td><td>24.30</td><td>32.19</td><td>9.93</td><td>57.07</td><td>36.29</td><td>51.51</td><td>62.93</td><td>55.53</td><td>78.72</td><td>79.88</td><td>77.27</td><td>78.43</td></tr>
    <tr><td>Qwen2.5-14B</td><td>35.62</td><td>25.21</td><td>33.93</td><td>30.13</td><td>24.33</td><td>36.64</td><td>33.33</td><td>32.80</td><td>86.59</td><td>89.93</td><td>87.10</td><td>90.06</td></tr>
    <tr><td>Qwen2.5-14B-Instruct</td><td>25.94</td><td>35.03</td><td>16.07</td><td>60.00</td><td>38.30</td><td>49.31</td><td>46.27</td><td>53.67</td><td>82.25</td><td>84.81</td><td>81.79</td><td>85.68</td></tr>
    <tr><td>BianCang-Qwen2-7B</td><td>42.14</td><td>30.30</td><td>57.80</td><td>48.00</td><td>43.73</td><td>54.67</td><td>74.73</td><td>80.67</td><td>90.86</td><td>91.87</td><td>89.08</td><td>90.36</td></tr>
    <tr><td>BianCang-Qwen2-7B-Instruct</td><td>68.88</td><td>75.96</td><td>57.33</td><td>75.40</td><td>64.42</td><td>77.71</td><td><b>89.07</b></td><td>85.67</td><td><b>92.39</b></td><td><b>92.39</b></td><td>91.14</td><td>91.48</td></tr>
    <tr><td>BianCang-Qwen2.5-7B</td><td>46.57</td><td>26.72</td><td>52.93</td><td>45.47</td><td>49.80</td><td>53.15</td><td>68.13</td><td>61.73</td><td>86.46</td><td>86.30</td><td>83.93</td><td>85.35</td></tr>
    <tr><td>BianCang-Qwen2.5-7B-Instruct</td><td>78.90</td><td><b>82.10</b></td><td><b>66.73</b></td><td><b>77.73</b></td><td>73.73</td><td><b>82.65</b></td><td>87.87</td><td><b>89.40</b></td><td>90.22</td><td>90.57</td><td>90.32</td><td>90.62</td></tr>
    <tr><td>BianCang-Qwen2.5-14B</td><td>43.77</td><td>33.96</td><td>61.93</td><td>53.47</td><td>66.61</td><td>60.39</td><td>82.93</td><td>77.07</td><td>89.28</td><td>90.86</td><td>89.42</td><td>90.58</td></tr>
    <tr><td>BianCang-Qwen2.5-14B-Instruct</td><td><b>79.38</b></td><td>75.54</td><td>62.27</td><td>70.73</td><td><b>77.63</b></td><td>82.05</td><td>86.33</td><td>88.73</td><td>92.29</td><td>92.29</td><td><b>92.75</b></td><td><b>92.86</b></td></tr>
  </tbody>
 </table>
 <br>
 <table border="1">
  <tr>
    <th>Model</th>
    <th>CMB Acc.(%)</th>
    <th colspan="2">MLEC-Clinic <br>Acc.(%)</th>
    <th colspan="2">MLEC-PublicHealth<br> Acc.(%)</th>
    <th colspan="2">MLEC-Stomatology<br> Acc.(%)</th>
  </tr>
  <tr>
    <th></th>
    <th>ZS/FS</th>
    <th>ZS</th>
    <th>FS</th>
    <th>ZS</th>
    <th>FS</th>
    <th>ZS</th>
    <th>FS</th>
  </tr>
  <tr>
    <td>GPT-4</td>
    <td>59.46*</td>
    <td>82.63</td>
    <td>82.69</td>
    <td>81.55</td>
    <td>82.58</td>
    <td>72.97</td>
    <td>75.43</td>
  </tr>
 <tr>
    <td>DeepSeek-V3</td>
    <td>82.33</td>
    <td>86.83</td>
    <td>89.41</td>
    <td>85.38</td>
    <td>87.59</td>
    <td>79.09</td>
    <td>81.97</td>
  </tr>
   <tr>
    <td>DeepSeek-R1</td>
    <td>86.38</td>
    <td>92.51</td>
    <td>92.36</td>
    <td>91.42</td>
    <td>90.40</td>
    <td>87.03</td>
    <td>86.16</td>
  </tr>
  <tr>
    <td>Qwen2-7B</td>
    <td>81.63</td>
    <td>87.63</td>
    <td>90.63</td>
    <td>82.63</td>
    <td>86.79</td>
    <td>80.34</td>
    <td>84.65</td>
  </tr>
  <tr>
    <td>Qwen2-7B-Instruct</td>
    <td>83.45</td>
    <td>85.16</td>
    <td>83.35</td>
    <td>81.61</td>
    <td>81.07</td>
    <td>76.29</td>
    <td>75.88</td>
  </tr>
  <tr>
    <td>Qwen2.5-7B</td>
    <td>79.60</td>
    <td>86.65</td>
    <td>88.55</td>
    <td>83.39</td>
    <td>85.17</td>
    <td>78.03</td>
    <td>80.79</td>
  </tr>
  <tr>
    <td>Qwen2.5-7B-Instruct</td>
    <td>79.51</td>
    <td>82.81</td>
    <td>83.73</td>
    <td>80.96</td>
    <td>80.85</td>
    <td>72.93</td>
    <td>74.40</td>
  </tr>
  <tr>
    <td>Qwen2.5-14B</td>
    <td>84.07</td>
    <td>90.40</td>
    <td>93.13</td>
    <td>86.46</td>
    <td>89.54</td>
    <td>84.31</td>
    <td>88.20</td>
  </tr>
  <tr>
    <td>Qwen2.5-14B-Instruct</td>
    <td>83.69</td>
    <td>86.47</td>
    <td>88.02</td>
    <td>83.17</td>
    <td>86.14</td>
    <td>78.94</td>
    <td>82.57</td>
  </tr>
  <tr>
    <td>BianCang-Qwen2-7B (Ours)</td>
    <td>83.27</td>
    <td>91.88</td>
    <td>93.31</td>
    <td>88.57</td>
    <td>90.72</td>
    <td>85.29</td>
    <td>88.47</td>
  </tr>
  <tr>
    <td>BianCang-Qwen2-7B-Instruct (Ours)</td>
    <td>84.08</td>
    <td>94.35</td>
    <td>94.35</td>
    <td>91.37</td>
    <td><b>91.64</b></td>
    <td>89.19</td>
    <td>90.02</td>
  </tr>
  <tr>
    <td>BianCang-Qwen2.5-7B (Ours)</td>
    <td>80.13</td>
    <td>90.43</td>
    <td>91.32</td>
    <td>85.65</td>
    <td>87.22</td>
    <td>82.19</td>
    <td>82.65</td>
  </tr>
  <tr>
    <td>BianCang-Qwen2.5-7B-Instruct (Ours)</td>
    <td>80.71</td>
    <td>93.40</td>
    <td>93.43</td>
    <td>89.91</td>
    <td>89.91</td>
    <td>86.43</td>
    <td>86.77</td>
  </tr>
  <tr>
    <td>BianCang-Qwen2.5-14B (Ours)</td>
    <td><b>84.34</b></td>
    <td>91.70</td>
    <td>93.37</td>
    <td>87.92</td>
    <td>89.97</td>
    <td>86.16</td>
    <td>87.94</td>
  </tr>
  <tr>
    <td>BianCang-Qwen2.5-14B-Instruct (Ours)</td>
    <td>83.80</td>
    <td><b>94.74</b></td>
    <td><b>94.97</b></td>
    <td><b>91.86</b></td>
    <td>91.53</td>
    <td><b>90.43</b></td>
    <td><b>90.51</b></td>
  </tr>
 </table>
 For more evaluation results, see our technical report.
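As a small worked example of reading the first table, the snippet below computes the accuracy gain of BianCang-Qwen2.5-7B-Instruct over Qwen2.5-7B-Instruct on TCMSD and TCMDD. The numbers are copied from those two table rows; the dictionary layout and the `gains` helper are only one illustrative way to organise them.

```python
# Accuracy (%) copied from the first table, as (DI, CoT) pairs.
SCORES = {
    "Qwen2.5-7B-Instruct": {"TCMSD": (24.30, 32.19), "TCMDD": (36.29, 51.51)},
    "BianCang-Qwen2.5-7B-Instruct": {"TCMSD": (78.90, 82.10), "TCMDD": (73.73, 82.65)},
}


def gains(base, tuned):
    """Return per-task (DI, CoT) differences in percentage points."""
    return {
        task: tuple(
            round(t - b, 2)
            for b, t in zip(SCORES[base][task], SCORES[tuned][task])
        )
        for task in SCORES[base]
    }


result = gains("Qwen2.5-7B-Instruct", "BianCang-Qwen2.5-7B-Instruct")
for task, (di, cot) in result.items():
    print(f"{task}: DI +{di:.2f}, CoT +{cot:.2f}")
```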
 ## 🧡Acknowledgements
 This project builds on open-source projects. We thank the related projects and their researchers and developers.
 - [Qwen2](https://github.com/vitanova/Qwen2)
 - [Qwen2.5](https://github.com/QwenLM/Qwen2.5)
 - [SWIFT](https://github.com/modelscope/ms-swift)
 - [ModelScope](https://github.com/modelscope/modelscope)
 - [ShenNong-TCM-LLM](https://github.com/michael-wzhu/ShenNong-TCM-LLM?tab=readme-ov-file)
 - [HuatuoGPT-II](https://github.com/FreedomIntelligence/HuatuoGPT-II)
 - [DISC-MedLLM](https://github.com/FudanDISC/DISC-MedLLM)
 - [MLEC-QA](https://github.com/Judenpech/MLEC-QA)
 - [CMB](https://github.com/FreedomIntelligence/CMB?tab=readme-ov-file)
 - [ZY-BERT](https://github.com/Borororo/ZY-BERT)
 - [COIG](https://github.com/BAAI-Zlab/COIG)
 - [APE210k](https://github.com/Chenny0808/ape210k)
 - [Evol-Instruction-66K](https://github.com/Continuum-Labs-HQ/EvolInstruct)
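 ## ❔About Us
 This project is a collaboration between the Natural Language Processing and Cognitive Computing team of the Faculty of Computer Science and Technology (National Supercomputer Center in Jinan), Qilu University of Technology (Shandong Academy of Sciences), and the Clinical Research Center of the Affiliated Hospital of Shandong University of Traditional Chinese Medicine.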
 <div align="center">
 <p>
    <img src="assets/QLU-NLP-logo.png" width="500px"/>
    </p>
 </div>
 <div align="center">
    <p>
        <img src="assets/超算logo.png" width="500px"/>
    </p>
 </div>
 <p>
 <div align="center">
 <p>
    <img src="assets/山中医logo.png" width="500px"/> 
    </p>   
 </div>
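 ## ❕Disclaimer
 - The resources in this project are for academic research only.
 - As a language-model-based assistant, BianCang has limitations and cannot guarantee the accuracy of every response. It is not a substitute for a TCM or Western-medicine practitioner in making diagnoses or giving medical advice. If needed, consult a professional doctor or go to a hospital.
 - Because inaccurate information in the medical domain can have serious consequences, we strongly recommend that users treat generated content with caution and seek expert advice.
 ## 📖Citation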
 ```
@article{Wei2024BianCang,
  title={BianCang: A Traditional Chinese Medicine Large Language Model},
  author={Wei, Sibo and Peng, Xueping and Wang, Yi-fei and Si, Jiasheng and Zhang, Weiyu and Lu, Wenpeng and Wu, Xiaoming and Wang, Yinglong},
  journal={arXiv preprint arXiv:2411.11027},
  year={2024}
 }
 ```
--- a/added_tokens.json
+++ b/added_tokens.json
@@ -0,0 +1,24 @@
 {
  "</tool_call>": 151658,
  "<tool_call>": 151657,
  "<|box_end|>": 151649,
  "<|box_start|>": 151648,
  "<|endoftext|>": 151643,
  "<|file_sep|>": 151664,
  "<|fim_middle|>": 151660,
  "<|fim_pad|>": 151662,
  "<|fim_prefix|>": 151659,
  "<|fim_suffix|>": 151661,
  "<|im_end|>": 151645,
  "<|im_start|>": 151644,
  "<|image_pad|>": 151655,
  "<|object_ref_end|>": 151647,
  "<|object_ref_start|>": 151646,
  "<|quad_end|>": 151651,
  "<|quad_start|>": 151650,
  "<|repo_name|>": 151663,
  "<|video_pad|>": 151656,
  "<|vision_end|>": 151653,
  "<|vision_pad|>": 151654,
  "<|vision_start|>": 151652
 }
--- a/assets/BianCang-logo.png
+++ b/assets/BianCang-logo.png
--- a/assets/QLU-NLP-logo.png
+++ b/assets/QLU-NLP-logo.png
--- a/assets/subjective.png
+++ b/assets/subjective.png
--- a/assets/webui.png
+++ b/assets/webui.png
--- a/assets/山中医logo.png
+++ b/assets/山中医logo.png
--- a/assets/超算logo.png
+++ b/assets/超算logo.png
--- a/config.json
+++ b/config.json
@@ -0,0 +1,29 @@
 {
  "_name_or_path": "/qlgy0912/models/Qwen2.5-7B",
  "architectures": [
    "Qwen2ForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151643,
  "hidden_act": "silu",
  "hidden_size": 3584,
  "initializer_range": 0.02,
  "intermediate_size": 18944,
  "max_position_embeddings": 131072,
  "max_window_layers": 28,
  "model_type": "qwen2",
  "num_attention_heads": 28,
  "num_hidden_layers": 28,
  "num_key_value_heads": 4,
  "rms_norm_eps": 1e-06,
  "rope_theta": 1000000.0,
  "sliding_window": 131072,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.40.0",
  "use_cache": true,
  "use_mrope": false,
  "use_sliding_window": false,
  "vocab_size": 152064
 }
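The fields above imply the attention geometry. This is a rough sketch, assuming the per-head dimension is `hidden_size / num_attention_heads`, standard grouped-query attention, and a bfloat16 KV cache at 2 bytes per value. The function `attention_geometry` is a hypothetical helper, not part of any library.

```python
# Values copied from config.json above.
CONFIG = {
    "hidden_size": 3584,
    "num_attention_heads": 28,
    "num_key_value_heads": 4,
    "num_hidden_layers": 28,
}
BYTES_PER_BF16 = 2  # assumption: the KV cache is stored in bfloat16


def attention_geometry(cfg, bytes_per_value=BYTES_PER_BF16):
    """Derive head size, query heads per KV head, and KV-cache bytes per token."""
    head_dim = cfg["hidden_size"] // cfg["num_attention_heads"]
    queries_per_kv = cfg["num_attention_heads"] // cfg["num_key_value_heads"]
    # Factor 2 covers keys and values, stored for every layer and KV head.
    kv_bytes_per_token = (
        2 * cfg["num_hidden_layers"] * cfg["num_key_value_heads"]
        * head_dim * bytes_per_value
    )
    return head_dim, queries_per_kv, kv_bytes_per_token


head_dim, queries_per_kv, kv_bytes = attention_geometry(CONFIG)
print(head_dim, queries_per_kv, kv_bytes)  # 128 7 57344
```

Under these assumptions each cached token costs 56 KiB, so a 2048-token generation (the `max_new_tokens` default in `generation_config.json`) needs roughly 112 MiB of KV cache per sequence.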
--- a/configuration.json
+++ b/configuration.json
@@ -0,0 +1 @@
 {"framework":"Pytorch","task":"text-generation"}
--- a/generation_config.json
+++ b/generation_config.json
@@ -0,0 +1,7 @@
 {
  "bos_token_id": 151643,
  "eos_token_id": 151645,
  "max_new_tokens": 2048,
  "pad_token_id": 151643,
  "transformers_version": "4.40.0"
 }
--- a/merges.txt
+++ b/merges.txt
--- a/model-00001-of-00004.safetensors
+++ b/model-00001-of-00004.safetensors
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:193bcf69d9b8b8601c8620d8593e209eaf54b39b2d32ddce8d19c144ceedfde3
 size 4877660776
--- a/model-00002-of-00004.safetensors
+++ b/model-00002-of-00004.safetensors
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:5387b0934f88715789bbf44ff6ada205a17fc1347aaa00c3f9efa48cd7fff968
 size 4932751008
--- a/model-00003-of-00004.safetensors
+++ b/model-00003-of-00004.safetensors
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:c5ad180fc4c4ecd4a90a89fca09fe6b99da29ac84e3999d6d7ef0b9d4ff953b6
 size 4330865200
--- a/model-00004-of-00004.safetensors
+++ b/model-00004-of-00004.safetensors
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:423163e8992696077d72fbb555b8639b0a233dfbf705038c73e3e32c12a20506
 size 1089994880
--- a/model.safetensors.index.json
+++ b/model.safetensors.index.json
@@ -0,0 +1,346 @@
 {
  "metadata": {
    "total_size": 15231233024
  },
  "weight_map": {
    "lm_head.weight": "model-00004-of-00004.safetensors",
    "model.embed_tokens.weight": "model-00001-of-00004.safetensors",
    "model.layers.0.input_layernorm.weight": "model-00001-of-00004.safetensors",
    "model.layers.0.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.0.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.0.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.0.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
    "model.layers.0.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
    "model.layers.0.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.0.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.0.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
    "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.0.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
    "model.layers.0.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.1.input_layernorm.weight": "model-00001-of-00004.safetensors",
    "model.layers.1.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.1.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.1.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.1.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
    "model.layers.1.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
    "model.layers.1.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.1.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.1.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
    "model.layers.1.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.1.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
    "model.layers.1.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.10.input_layernorm.weight": "model-00002-of-00004.safetensors",
    "model.layers.10.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.10.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.10.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.10.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
    "model.layers.10.self_attn.k_proj.bias": "model-00002-of-00004.safetensors",
    "model.layers.10.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.10.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.10.self_attn.q_proj.bias": "model-00002-of-00004.safetensors",
    "model.layers.10.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.10.self_attn.v_proj.bias": "model-00002-of-00004.safetensors",
    "model.layers.10.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.11.input_layernorm.weight": "model-00002-of-00004.safetensors",
    "model.layers.11.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.11.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.11.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.11.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
    "model.layers.11.self_attn.k_proj.bias": "model-00002-of-00004.safetensors",
    "model.layers.11.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.11.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.11.self_attn.q_proj.bias": "model-00002-of-00004.safetensors",
    "model.layers.11.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.11.self_attn.v_proj.bias": "model-00002-of-00004.safetensors",
    "model.layers.11.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.12.input_layernorm.weight": "model-00002-of-00004.safetensors",
    "model.layers.12.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.12.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.12.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.12.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
    "model.layers.12.self_attn.k_proj.bias": "model-00002-of-00004.safetensors",
    "model.layers.12.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.12.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.12.self_attn.q_proj.bias": "model-00002-of-00004.safetensors",
    "model.layers.12.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.12.self_attn.v_proj.bias": "model-00002-of-00004.safetensors",
    "model.layers.12.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.13.input_layernorm.weight": "model-00002-of-00004.safetensors",
    "model.layers.13.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.13.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.13.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.13.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
    "model.layers.13.self_attn.k_proj.bias": "model-00002-of-00004.safetensors",
    "model.layers.13.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.13.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.13.self_attn.q_proj.bias": "model-00002-of-00004.safetensors",
    "model.layers.13.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.13.self_attn.v_proj.bias": "model-00002-of-00004.safetensors",
    "model.layers.13.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.14.input_layernorm.weight": "model-00002-of-00004.safetensors",
    "model.layers.14.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.14.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.14.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.14.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
    "model.layers.14.self_attn.k_proj.bias": "model-00002-of-00004.safetensors",
    "model.layers.14.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.14.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.14.self_attn.q_proj.bias": "model-00002-of-00004.safetensors",
    "model.layers.14.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.14.self_attn.v_proj.bias": "model-00002-of-00004.safetensors",
    "model.layers.14.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.15.input_layernorm.weight": "model-00002-of-00004.safetensors",
    "model.layers.15.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.15.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.15.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.15.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
    "model.layers.15.self_attn.k_proj.bias": "model-00002-of-00004.safetensors",
    "model.layers.15.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.15.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.15.self_attn.q_proj.bias": "model-00002-of-00004.safetensors",
    "model.layers.15.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.15.self_attn.v_proj.bias": "model-00002-of-00004.safetensors",
    "model.layers.15.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.16.input_layernorm.weight": "model-00002-of-00004.safetensors",
    "model.layers.16.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.16.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.16.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.16.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
    "model.layers.16.self_attn.k_proj.bias": "model-00002-of-00004.safetensors",
    "model.layers.16.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.16.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.16.self_attn.q_proj.bias": "model-00002-of-00004.safetensors",
    "model.layers.16.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.16.self_attn.v_proj.bias": "model-00002-of-00004.safetensors",
    "model.layers.16.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.17.input_layernorm.weight": "model-00002-of-00004.safetensors",
    "model.layers.17.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.17.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.17.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.17.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
    "model.layers.17.self_attn.k_proj.bias": "model-00002-of-00004.safetensors",
    "model.layers.17.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.17.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.17.self_attn.q_proj.bias": "model-00002-of-00004.safetensors",
    "model.layers.17.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.17.self_attn.v_proj.bias": "model-00002-of-00004.safetensors",
    "model.layers.17.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.18.input_layernorm.weight": "model-00003-of-00004.safetensors",
    "model.layers.18.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.18.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.18.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.18.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
    "model.layers.18.self_attn.k_proj.bias": "model-00002-of-00004.safetensors",
    "model.layers.18.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.18.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.18.self_attn.q_proj.bias": "model-00002-of-00004.safetensors",
    "model.layers.18.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.18.self_attn.v_proj.bias": "model-00002-of-00004.safetensors",
    "model.layers.18.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.19.input_layernorm.weight": "model-00003-of-00004.safetensors",
    "model.layers.19.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.19.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.19.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.19.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
    "model.layers.19.self_attn.k_proj.bias": "model-00003-of-00004.safetensors",
    "model.layers.19.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.19.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.19.self_attn.q_proj.bias": "model-00003-of-00004.safetensors",
    "model.layers.19.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.19.self_attn.v_proj.bias": "model-00003-of-00004.safetensors",
    "model.layers.19.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.2.input_layernorm.weight": "model-00001-of-00004.safetensors",
    "model.layers.2.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.2.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.2.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.2.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
    "model.layers.2.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
    "model.layers.2.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.2.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.2.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
    "model.layers.2.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.2.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
    "model.layers.2.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.20.input_layernorm.weight": "model-00003-of-00004.safetensors",
    "model.layers.20.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.20.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.20.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.20.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
    "model.layers.20.self_attn.k_proj.bias": "model-00003-of-00004.safetensors",
    "model.layers.20.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.20.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.20.self_attn.q_proj.bias": "model-00003-of-00004.safetensors",
    "model.layers.20.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.20.self_attn.v_proj.bias": "model-00003-of-00004.safetensors",
    "model.layers.20.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.21.input_layernorm.weight": "model-00003-of-00004.safetensors",
    "model.layers.21.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.21.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.21.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.21.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
    "model.layers.21.self_attn.k_proj.bias": "model-00003-of-00004.safetensors",
    "model.layers.21.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.21.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.21.self_attn.q_proj.bias": "model-00003-of-00004.safetensors",
    "model.layers.21.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.21.self_attn.v_proj.bias": "model-00003-of-00004.safetensors",
    "model.layers.21.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.22.input_layernorm.weight": "model-00003-of-00004.safetensors",
    "model.layers.22.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.22.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.22.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.22.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
    "model.layers.22.self_attn.k_proj.bias": "model-00003-of-00004.safetensors",
    "model.layers.22.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.22.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.22.self_attn.q_proj.bias": "model-00003-of-00004.safetensors",
    "model.layers.22.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.22.self_attn.v_proj.bias": "model-00003-of-00004.safetensors",
    "model.layers.22.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.23.input_layernorm.weight": "model-00003-of-00004.safetensors",
    "model.layers.23.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.23.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.23.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.23.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
    "model.layers.23.self_attn.k_proj.bias": "model-00003-of-00004.safetensors",
    "model.layers.23.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.23.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.23.self_attn.q_proj.bias": "model-00003-of-00004.safetensors",
    "model.layers.23.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.23.self_attn.v_proj.bias": "model-00003-of-00004.safetensors",
    "model.layers.23.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.24.input_layernorm.weight": "model-00003-of-00004.safetensors",
    "model.layers.24.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.24.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.24.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.24.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
    "model.layers.24.self_attn.k_proj.bias": "model-00003-of-00004.safetensors",
    "model.layers.24.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.24.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.24.self_attn.q_proj.bias": "model-00003-of-00004.safetensors",
    "model.layers.24.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.24.self_attn.v_proj.bias": "model-00003-of-00004.safetensors",
    "model.layers.24.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.25.input_layernorm.weight": "model-00003-of-00004.safetensors",
    "model.layers.25.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.25.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.25.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.25.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
    "model.layers.25.self_attn.k_proj.bias": "model-00003-of-00004.safetensors",
    "model.layers.25.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.25.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.25.self_attn.q_proj.bias": "model-00003-of-00004.safetensors",
    "model.layers.25.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.25.self_attn.v_proj.bias": "model-00003-of-00004.safetensors",
    "model.layers.25.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.26.input_layernorm.weight": "model-00003-of-00004.safetensors",
    "model.layers.26.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.26.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.26.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.26.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
    "model.layers.26.self_attn.k_proj.bias": "model-00003-of-00004.safetensors",
    "model.layers.26.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.26.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.26.self_attn.q_proj.bias": "model-00003-of-00004.safetensors",
    "model.layers.26.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.26.self_attn.v_proj.bias": "model-00003-of-00004.safetensors",
    "model.layers.26.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.27.input_layernorm.weight": "model-00003-of-00004.safetensors",
    "model.layers.27.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.27.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.27.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.27.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
    "model.layers.27.self_attn.k_proj.bias": "model-00003-of-00004.safetensors",
    "model.layers.27.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.27.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.27.self_attn.q_proj.bias": "model-00003-of-00004.safetensors",
    "model.layers.27.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.27.self_attn.v_proj.bias": "model-00003-of-00004.safetensors",
    "model.layers.27.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.3.input_layernorm.weight": "model-00001-of-00004.safetensors",
    "model.layers.3.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.3.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.3.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.3.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
    "model.layers.3.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
    "model.layers.3.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.3.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.3.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
    "model.layers.3.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.3.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
    "model.layers.3.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.4.input_layernorm.weight": "model-00001-of-00004.safetensors",
    "model.layers.4.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.4.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.4.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.4.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
    "model.layers.4.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
    "model.layers.4.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.4.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.4.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
    "model.layers.4.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.4.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
    "model.layers.4.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.5.input_layernorm.weight": "model-00001-of-00004.safetensors",
    "model.layers.5.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.5.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.5.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.5.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
    "model.layers.5.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
    "model.layers.5.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.5.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.5.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
    "model.layers.5.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.5.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
    "model.layers.5.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.6.input_layernorm.weight": "model-00001-of-00004.safetensors",
    "model.layers.6.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.6.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.6.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.6.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
    "model.layers.6.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
    "model.layers.6.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.6.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.6.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
    "model.layers.6.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.6.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
    "model.layers.6.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.7.input_layernorm.weight": "model-00001-of-00004.safetensors",
    "model.layers.7.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.7.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.7.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.7.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
    "model.layers.7.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
    "model.layers.7.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.7.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.7.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
    "model.layers.7.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.7.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
    "model.layers.7.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.8.input_layernorm.weight": "model-00002-of-00004.safetensors",
    "model.layers.8.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.8.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.8.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.8.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
    "model.layers.8.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
    "model.layers.8.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.8.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.8.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
    "model.layers.8.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.8.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
    "model.layers.8.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.9.input_layernorm.weight": "model-00002-of-00004.safetensors",
    "model.layers.9.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.9.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.9.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.9.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
    "model.layers.9.self_attn.k_proj.bias": "model-00002-of-00004.safetensors",
    "model.layers.9.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.9.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.9.self_attn.q_proj.bias": "model-00002-of-00004.safetensors",
    "model.layers.9.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.9.self_attn.v_proj.bias": "model-00002-of-00004.safetensors",
    "model.layers.9.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
    "model.norm.weight": "model-00003-of-00004.safetensors"
  }
 }
--- a/rng_state_0.pth
+++ b/rng_state_0.pth
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:d79fa8a90ba1bf3da1ece5a434f09cfa69fd162316c5146bc778fc4c5a88b553
 size 21687
--- a/rng_state_1.pth
+++ b/rng_state_1.pth
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:093eabb9bed6410fcd98c6db3f679a066be4c2160a0af2bdc5654b1983a72148
 size 21687
--- a/scheduler.pt
+++ b/scheduler.pt
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:8c744c8764d066369699bacacadba0fefd1cd74734988df7f684f11b3b431972
 size 627
--- a/sft_args.json
+++ b/sft_args.json
@@ -0,0 +1,260 @@
 {
  "model_type": "qwen2_5-7b",
  "model_id_or_path": "/qlgy0912/models/Qwen2.5-7B",
  "model_revision": "master",
  "full_determinism": false,
  "sft_type": "full",
  "freeze_parameters": [],
  "freeze_vit": false,
  "freeze_parameters_ratio": 0.0,
  "additional_trainable_parameters": [],
  "tuner_backend": "swift",
  "template_type": "default-generation",
  "output_dir": "/qlgy0912/llm_pretrain_output/qwen2_5-7b/v1-20240919-080450",
  "add_output_dir_suffix": true,
  "ddp_backend": "nccl",
  "ddp_find_unused_parameters": null,
  "ddp_broadcast_buffers": null,
  "ddp_timeout": 1800,
  "seed": 42,
  "resume_from_checkpoint": null,
  "resume_only_model": false,
  "ignore_data_skip": false,
  "dtype": "bf16",
  "packing": false,
  "train_backend": "transformers",
  "tp": 1,
  "pp": 1,
  "min_lr": null,
  "sequence_parallel": false,
  "model_kwargs": {},
  "loss_name": null,
  "dataset": [
    "/qlgy0912/pretrain_data/APE_210K.jsonl",
    "/qlgy0912/pretrain_data/ChatMed_TCM.jsonl",
    "/qlgy0912/pretrain_data/CMB_Train.jsonl",
    "/qlgy0912/pretrain_data/COIG_CQIA.jsonl",
    "/qlgy0912/pretrain_data/Encyclopedia.jsonl",
    "/qlgy0912/pretrain_data/Evol_Instruction_66K.jsonl",
    "/qlgy0912/pretrain_data/Literature.jsonl",
    "/qlgy0912/pretrain_data/MedicalBooks.jsonl",
    "/qlgy0912/pretrain_data/Medical_Records.jsonl",
    "/qlgy0912/pretrain_data/Pharmacopoeia.jsonl",
    "/qlgy0912/pretrain_data/TCM_Synd_Diff.jsonl",
    "/qlgy0912/pretrain_data/TCM_Synd_Know.jsonl"
  ],
  "val_dataset": [],
  "dataset_seed": 42,
  "dataset_test_ratio": 0.05,
  "use_loss_scale": false,
  "loss_scale_config_path": "/qlgy0912/swift_repo/swift-2.4.2/swift/llm/agent/default_loss_scale_config.json",
  "system": null,
  "tools_prompt": "react_en",
  "max_length": 4096,
  "truncation_strategy": "delete",
  "check_dataset_strategy": "warning",
  "streaming": false,
  "streaming_val_size": 0,
  "streaming_buffer_size": 16384,
  "model_name": [
    null,
    null
  ],
  "model_author": [
    null,
    null
  ],
  "quant_method": null,
  "quantization_bit": 0,
  "hqq_axis": 0,
  "hqq_dynamic_config_path": null,
  "bnb_4bit_comp_dtype": "bf16",
  "bnb_4bit_quant_type": "nf4",
  "bnb_4bit_use_double_quant": true,
  "bnb_4bit_quant_storage": null,
  "rescale_image": -1,
  "target_modules": [
    "q_proj",
    "k_proj",
    "v_proj"
  ],
  "target_regex": null,
  "modules_to_save": [],
  "lora_rank": 8,
  "lora_alpha": 32,
  "lora_dropout": 0.05,
  "lora_bias_trainable": "none",
  "lora_dtype": "AUTO",
  "lora_lr_ratio": null,
  "use_rslora": false,
  "use_dora": false,
  "init_lora_weights": "true",
  "fourier_n_frequency": 2000,
  "fourier_scaling": 300.0,
  "rope_scaling": null,
  "boft_block_size": 4,
  "boft_block_num": 0,
  "boft_n_butterfly_factor": 1,
  "boft_dropout": 0.0,
  "vera_rank": 256,
  "vera_projection_prng_key": 0,
  "vera_dropout": 0.0,
  "vera_d_initial": 0.1,
  "adapter_act": "gelu",
  "adapter_length": 128,
  "use_galore": false,
  "galore_target_modules": null,
  "galore_rank": 128,
  "galore_update_proj_gap": 50,
  "galore_scale": 1.0,
  "galore_proj_type": "std",
  "galore_optim_per_parameter": false,
  "galore_with_embedding": false,
  "galore_quantization": false,
  "galore_proj_quant": false,
  "galore_proj_bits": 4,
  "galore_proj_group_size": 256,
  "galore_cos_threshold": 0.4,
  "galore_gamma_proj": 2,
  "galore_queue_size": 5,
  "adalora_target_r": 8,
  "adalora_init_r": 12,
  "adalora_tinit": 0,
  "adalora_tfinal": 0,
  "adalora_deltaT": 1,
  "adalora_beta1": 0.85,
  "adalora_beta2": 0.85,
  "adalora_orth_reg_weight": 0.5,
  "ia3_feedforward_modules": [],
  "llamapro_num_new_blocks": 4,
  "llamapro_num_groups": null,
  "neftune_noise_alpha": null,
  "neftune_backend": "transformers",
  "lisa_activated_layers": 0,
  "lisa_step_interval": 20,
  "reft_layer_key": null,
  "reft_layers": null,
  "reft_rank": 4,
  "reft_intervention_type": "LoreftIntervention",
  "reft_args": null,
  "use_liger": false,
  "gradient_checkpointing": true,
  "deepspeed": null,
  "batch_size": 1,
  "eval_batch_size": 1,
  "auto_find_batch_size": false,
  "num_train_epochs": 2,
  "max_steps": -1,
  "optim": "adamw_torch",
  "adam_beta1": 0.9,
  "adam_beta2": 0.95,
  "adam_epsilon": 1e-08,
  "learning_rate": 1e-05,
  "weight_decay": 0.1,
  "gradient_accumulation_steps": 8,
  "max_grad_norm": 0.5,
  "predict_with_generate": false,
  "lr_scheduler_type": "cosine",
  "lr_scheduler_kwargs": {},
  "warmup_ratio": 0.05,
  "warmup_steps": 0,
  "eval_steps": 2000,
  "save_steps": 2000,
  "save_only_model": false,
  "save_total_limit": 5,
  "logging_steps": 20,
  "acc_steps": 1,
  "dataloader_num_workers": 1,
  "dataloader_pin_memory": true,
  "dataloader_drop_last": false,
  "push_to_hub": false,
  "hub_model_id": null,
  "hub_token": null,
  "hub_private_repo": false,
  "hub_strategy": "every_save",
  "test_oom_error": false,
  "disable_tqdm": false,
  "lazy_tokenize": false,
  "preprocess_num_proc": 1,
  "use_flash_attn": null,
  "ignore_args_error": false,
  "check_model_is_latest": true,
  "logging_dir": "/qlgy0912/llm_pretrain_output/qwen2_5-7b/v1-20240919-080450/runs",
  "report_to": [
    "tensorboard"
  ],
  "acc_strategy": "token",
  "save_on_each_node": false,
  "evaluation_strategy": "steps",
  "save_strategy": "steps",
  "save_safetensors": true,
  "gpu_memory_fraction": null,
  "include_num_input_tokens_seen": false,
  "local_repo_path": null,
  "custom_register_path": null,
  "custom_dataset_info": null,
  "device_map_config": null,
  "device_max_memory": [],
  "max_new_tokens": 2048,
  "do_sample": null,
  "temperature": null,
  "top_k": null,
  "top_p": null,
  "repetition_penalty": null,
  "num_beams": 1,
  "fsdp": "",
  "fsdp_config": null,
  "sequence_parallel_size": 1,
  "model_layer_cls_name": null,
  "metric_warmup_step": 0,
  "fsdp_num": 1,
  "per_device_train_batch_size": null,
  "per_device_eval_batch_size": null,
  "eval_strategy": null,
  "self_cognition_sample": 0,
  "train_dataset_mix_ratio": 0.0,
  "train_dataset_mix_ds": [
    "ms-bench"
  ],
  "train_dataset_sample": -1,
  "val_dataset_sample": null,
  "safe_serialization": null,
  "only_save_model": null,
  "neftune_alpha": null,
  "deepspeed_config_path": null,
  "model_cache_dir": null,
  "lora_dropout_p": null,
  "lora_target_modules": [],
  "lora_target_regex": null,
  "lora_modules_to_save": [],
  "boft_target_modules": [],
  "boft_modules_to_save": [],
  "vera_target_modules": [],
  "vera_modules_to_save": [],
  "ia3_target_modules": [],
  "ia3_modules_to_save": [],
  "custom_train_dataset_path": [],
  "custom_val_dataset_path": [],
  "device_map_config_path": null,
  "push_hub_strategy": null,
  "use_self_cognition": false,
  "is_multimodal": false,
  "is_vision": false,
  "lora_use_embedding": false,
  "lora_use_all": false,
  "lora_m2s_use_embedding": false,
  "lora_m2s_use_ln": false,
  "torch_dtype": "torch.bfloat16",
  "fp16": false,
  "bf16": true,
  "rank": 0,
  "local_rank": 0,
  "world_size": 2,
  "local_world_size": 2,
  "bnb_4bit_compute_dtype": "torch.bfloat16",
  "load_in_4bit": false,
  "load_in_8bit": false,
  "train_sampler_random": true,
  "train_type": "sft",
  "training_args": "Seq2SeqTrainingArguments(output_dir='/qlgy0912/llm_pretrain_output/qwen2_5-7b/v1-20240919-080450', overwrite_output_dir=False, do_train=False, do_eval=True, do_predict=False, evaluation_strategy=<IntervalStrategy.STEPS: 'steps'>, prediction_loss_only=False, per_device_train_batch_size=1, per_device_eval_batch_size=1, per_gpu_train_batch_size=None, per_gpu_eval_batch_size=None, gradient_accumulation_steps=8, eval_accumulation_steps=None, eval_delay=0, learning_rate=1e-05, weight_decay=0.1, adam_beta1=0.9, adam_beta2=0.95, adam_epsilon=1e-08, max_grad_norm=0.5, num_train_epochs=2, max_steps=-1, lr_scheduler_type=<SchedulerType.COSINE: 'cosine'>, lr_scheduler_kwargs={}, warmup_ratio=0.05, warmup_steps=0, log_level='passive', log_level_replica='warning', log_on_each_node=True, logging_dir='/qlgy0912/llm_pretrain_output/qwen2_5-7b/v1-20240919-080450/runs', logging_strategy=<IntervalStrategy.STEPS: 'steps'>, logging_first_step=True, logging_steps=20, logging_nan_inf_filter=True, save_strategy=<IntervalStrategy.STEPS: 'steps'>, save_steps=2000, save_total_limit=5, save_safetensors=True, save_on_each_node=False, save_only_model=False, no_cuda=False, use_cpu=False, use_mps_device=False, seed=42, data_seed=42, jit_mode_eval=False, use_ipex=False, bf16=True, fp16=False, fp16_opt_level='O1', half_precision_backend='auto', bf16_full_eval=False, fp16_full_eval=False, tf32=None, local_rank=0, ddp_backend='nccl', tpu_num_cores=None, tpu_metrics_debug=False, debug=[], dataloader_drop_last=False, eval_steps=2000, dataloader_num_workers=1, dataloader_prefetch_factor=None, past_index=-1, run_name='/qlgy0912/llm_pretrain_output/qwen2_5-7b/v1-20240919-080450', disable_tqdm=False, remove_unused_columns=False, label_names=None, load_best_model_at_end=False, metric_for_best_model='loss', greater_is_better=False, ignore_data_skip=False, fsdp=[], fsdp_min_num_params=0, fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, fsdp_transformer_layer_cls_to_wrap=None, accelerator_config=AcceleratorConfig(split_batches=False, dispatch_batches=False, even_batches=True, use_seedable_sampler=True, gradient_accumulation_kwargs=None), deepspeed=None, label_smoothing_factor=0.0, optim=<OptimizerNames.ADAMW_TORCH: 'adamw_torch'>, optim_args=None, adafactor=False, group_by_length=False, length_column_name='length', report_to=['tensorboard'], ddp_find_unused_parameters=False, ddp_bucket_cap_mb=None, ddp_broadcast_buffers=False, dataloader_pin_memory=True, dataloader_persistent_workers=False, skip_memory_metrics=True, use_legacy_prediction_loop=False, push_to_hub=False, resume_from_checkpoint=None, hub_model_id=None, hub_strategy=<HubStrategy.EVERY_SAVE: 'every_save'>, hub_token=None, hub_private_repo=False, hub_always_push=False, gradient_checkpointing=True, gradient_checkpointing_kwargs=None, include_inputs_for_metrics=False, eval_do_concat_batches=True, fp16_backend='auto', push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=None, mp_parameters='', auto_find_batch_size=False, full_determinism=False, torchdynamo=None, ray_scope='last', ddp_timeout=1800, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, dispatch_batches=None, split_batches=None, include_tokens_per_second=False, include_num_input_tokens_seen=False, neftune_noise_alpha=None, optim_target_modules=None, sortish_sampler=False, predict_with_generate=False, generation_max_length=None, generation_num_beams=None, generation_config=GenerationConfig {\n  \"bos_token_id\": 151643,\n  \"eos_token_id\": 151645,\n  \"max_new_tokens\": 2048,\n  \"pad_token_id\": 151643\n}\n, acc_strategy='token', loss_name=None, additional_saved_files=[], train_sampler_random=True, metric_warmup_step=0, train_dataset_sample=-1)"
 }
--- a/special_tokens_map.json
+++ b/special_tokens_map.json
@@ -0,0 +1,31 @@
 {
  "additional_special_tokens": [
    "<|im_start|>",
    "<|im_end|>",
    "<|object_ref_start|>",
    "<|object_ref_end|>",
    "<|box_start|>",
    "<|box_end|>",
    "<|quad_start|>",
    "<|quad_end|>",
    "<|vision_start|>",
    "<|vision_end|>",
    "<|vision_pad|>",
    "<|image_pad|>",
    "<|video_pad|>"
  ],
  "eos_token": {
    "content": "<|im_end|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "<|endoftext|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
 }
--- a/tokenizer.json
+++ b/tokenizer.json
--- a/tokenizer_config.json
+++ b/tokenizer_config.json
@@ -0,0 +1,207 @@
 {
  "add_bos_token": false,
  "add_prefix_space": false,
  "added_tokens_decoder": {
    "151643": {
      "content": "<|endoftext|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151644": {
      "content": "<|im_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151645": {
      "content": "<|im_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151646": {
      "content": "<|object_ref_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151647": {
      "content": "<|object_ref_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151648": {
      "content": "<|box_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151649": {
      "content": "<|box_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151650": {
      "content": "<|quad_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151651": {
      "content": "<|quad_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151652": {
      "content": "<|vision_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151653": {
      "content": "<|vision_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151654": {
      "content": "<|vision_pad|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151655": {
      "content": "<|image_pad|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151656": {
      "content": "<|video_pad|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151657": {
      "content": "<tool_call>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151658": {
      "content": "</tool_call>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151659": {
      "content": "<|fim_prefix|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151660": {
      "content": "<|fim_middle|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151661": {
      "content": "<|fim_suffix|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151662": {
      "content": "<|fim_pad|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151663": {
      "content": "<|repo_name|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151664": {
      "content": "<|file_sep|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    }
  },
  "additional_special_tokens": [
    "<|im_start|>",
    "<|im_end|>",
    "<|object_ref_start|>",
    "<|object_ref_end|>",
    "<|box_start|>",
    "<|box_end|>",
    "<|quad_start|>",
    "<|quad_end|>",
    "<|vision_start|>",
    "<|vision_end|>",
    "<|vision_pad|>",
    "<|image_pad|>",
    "<|video_pad|>"
  ],
  "bos_token": null,
  "chat_template": "{%- if tools %}\n    {{- '<|im_start|>system\\n' }}\n    {%- if messages[0]['role'] == 'system' %}\n        {{- messages[0]['content'] }}\n    {%- else %}\n        {{- 'You are a helpful assistant.' }}\n    {%- endif %}\n    {{- \"\\n\\n# Tools\\n\\nYou may call one or more functions to assist with the user query.\\n\\nYou are provided with function signatures within <tools></tools> XML tags:\\n<tools>\" }}\n    {%- for tool in tools %}\n        {{- \"\\n\" }}\n        {{- tool | tojson }}\n    {%- endfor %}\n    {{- \"\\n</tools>\\n\\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\\n<tool_call>\\n{{\\\"name\\\": <function-name>, \\\"arguments\\\": <args-json-object>}}\\n</tool_call><|im_end|>\\n\" }}\n{%- else %}\n    {%- if messages[0]['role'] == 'system' %}\n        {{- '<|im_start|>system\\n' + messages[0]['content'] + '<|im_end|>\\n' }}\n    {%- else %}\n        {{- '<|im_start|>system\\nYou are a helpful assistant.<|im_end|>\\n' }}\n    {%- endif %}\n{%- endif %}\n{%- for message in messages %}\n    {%- if (message.role == \"user\") or (message.role == \"system\" and not loop.first) or (message.role == \"assistant\" and not message.tool_calls) %}\n        {{- '<|im_start|>' + message.role + '\\n' + message.content + '<|im_end|>' + '\\n' }}\n    {%- elif message.role == \"assistant\" %}\n        {{- '<|im_start|>' + message.role }}\n        {%- if message.content %}\n            {{- '\\n' + message.content }}\n        {%- endif %}\n        {%- for tool_call in message.tool_calls %}\n            {%- if tool_call.function is defined %}\n                {%- set tool_call = tool_call.function %}\n            {%- endif %}\n            {{- '\\n<tool_call>\\n{\"name\": \"' }}\n            {{- tool_call.name }}\n            {{- '\", \"arguments\": ' }}\n            {{- tool_call.arguments | tojson }}\n            {{- '}\\n</tool_call>' }}\n        {%- endfor %}\n        {{- '<|im_end|>\\n' }}\n    {%- elif message.role == \"tool\" %}\n        {%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != \"tool\") %}\n            {{- '<|im_start|>user' }}\n        {%- endif %}\n        {{- '\\n<tool_response>\\n' }}\n        {{- message.content }}\n        {{- '\\n</tool_response>' }}\n        {%- if loop.last or (messages[loop.index0 + 1].role != \"tool\") %}\n            {{- '<|im_end|>\\n' }}\n        {%- endif %}\n    {%- endif %}\n{%- endfor %}\n{%- if add_generation_prompt %}\n    {{- '<|im_start|>assistant\\n' }}\n{%- endif %}\n",
  "clean_up_tokenization_spaces": false,
  "eos_token": "<|im_end|>",
  "errors": "replace",
  "model_max_length": 131072,
  "pad_token": "<|endoftext|>",
  "split_special_tokens": false,
  "tokenizer_class": "Qwen2Tokenizer",
  "unk_token": null
 }
--- a/trainer_state.json
+++ b/trainer_state.json
--- a/training_args.bin
+++ b/training_args.bin
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:44ab2f78618f8224082dd85cb346cd7efc8ddb9bcbedb24ad4adc1aebe5777df
 size 6459
--- a/vocab.json
+++ b/vocab.json
@@ -0,0 +1 @@
 {"framework":"Pytorch","task":"text-generation"}