Files
ModelHub XC 979330991c 初始化项目,由ModelHub XC社区提供模型
Model: q021gink/Llama3-Weighted-Combination
Source: Original Platform
2026-05-22 01:30:13 +08:00

67 lines
2.9 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

---
license: Apache License 2.0
---
### Clone with HTTP
```bash
git clone https://www.modelscope.cn/q021gink/Llama3-Weighted-Combination.git
```
# Llama3权重组合
Llama3权重组合基于LLM-Research/Llama3-8B-Chinese-Chat模型权重文件model-00001-of-00004.safetensors、model-00002-of-00004.safetensors和FlagAlpha/Llama3-Chinese-8B-Instruct模型文件组合而成。新模型继承了两者优点如模型1较快的推理速度、弱智吧数据推理、模型2的思维链特征不过模型依然存在assistant重复、长序列数学计算错误等问题。
本模型旨在提出新的研究视角:同类型同参数不同微调模型权重文件的随机组合,是否有助于提升模型能力(克服不足)或涌现新能力,模块化权重文件设计是否有价值。
## 如何使用
下载模型
```
git clone https://www.modelscope.cn/q021gink/Llama3-Weighted-Combination.git
```
部分情况下需要更新transformer库
```
pip install --upgrade transformers
```
## 测试
根据https://modelscope.cn/headlines/article/473方法测试。
1.1 安装
```
git clone https://github.com/modelscope/eval-scope
cd eval-scope
pip install -e .
```
1.2测试命令
```
python3 llmuses/run.py --model q021gink/Llama3-Weighted-Combination --template-type llama3 --datasets arc ceval gsm8k --dataset-args '{"gsm8k": {"few_shot_num": 0}}'
```
自测结果:
```
2024-05-03 21:23:47,879 - llmuses - INFO - Dump data to /root/.cache/llmuses/outputs/eval_arc-ceval-gsm8k_q021gink_Llama3-Weighted-Combination_default/reviews/modelscope_gsm8k_main.jsonl successfully.
2024-05-03 21:23:47,880 - llmuses - INFO - ** Dump report: modelscope_gsm8k.json
2024-05-03 21:23:47,880 - llmuses - INFO - ** Report table:
+-----------------------------+------------------+--------------------+-------------------+
| Model | arc | ceval | gsm8k |
+=============================+==================+====================+===================+
| Llama3-Weighted-Combination | (arc/acc) 0.7918 | (ceval/acc) 0.4859 | (gsm8k/acc) 0.655 |
+-----------------------------+------------------+--------------------+-------------------+
```
从前两项测试结果看组合权重模型和原生模型Meta-Llama-3-8B-instruct相差不大见https://modelscope.cn/headlines/article/473,gsm8k分数下降较大可见微调导致的灾难遗忘不可避免。
单从前两项测试看组合权重模型的天花板由原生模型决定。暂不清楚本组合模型父母模型LLM-Research/Llama3-8B-Chinese-Chat和FlagAlpha/Llama3-Chinese-8B-Instruct模型测试表现如何感兴趣的朋友可进一步做对比测试。
推测权重组合不一定涌现新能力不过可以起到修复作用如某种微调导致模型能力下降较大可与原生模型文件进行组合dropout不好的权重最大限度保留模型原始能力和部分微调特征。