---
license: qwen
license_name: Tongyi Qianwen LICENSE AGREEMENT
license_link: LICENSE
pipeline_tag: text-generation
tags:
- qwen2.5
- gptq
- int8
- 量化修复
- vLLM
- sglang
---

# Tongyi Qianwen 2.5-7B-Chat-GPTQ-Int8 (Quantization Fix)
Original model: [qwen/Qwen2.5-7B-Instruct](https://www.modelscope.cn/models/qwen/Qwen2.5-7B-Instruct)

### 【Model Update Log】

Note: pass `revision=...` to the `snapshot_download` function to download a specific `tag` version.

```
2024-09-23 tag g128v2
1. Reduced repeated and dropped tokens when generating long text

2024-09-23 tag g32v2
1. Reduced repeated and dropped tokens when generating long text

2024-09-22 tag g128
1. Added the group size 128 variant

2024-09-22 tag g32
1. Added the group size 32 variant
```

### 【Model List】

| tag | File size | Last updated |
|----------|---------|--------------|
| `g128v2` | `8.3GB` | `2024-09-23` |
| `g32v2` | `8.7GB` | `2024-09-23` |
| `g128` | `8.3GB` | `2024-09-23` |
| `g32` | `8.7GB` | `2024-09-22` |

```python
from modelscope import snapshot_download

# Replace "local_path" with the directory where the model should be cached.
snapshot_download('tclf90/qwen2.5-7b-instruct-gptq-int8', cache_dir="local_path", revision='g128v2')
snapshot_download('tclf90/qwen2.5-7b-instruct-gptq-int8', cache_dir="local_path", revision='g32v2')
```

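When scripting downloads, it can help to check the requested tag against the model list before calling `snapshot_download`. The following is a minimal sketch, assuming the four tags in the table are the only published revisions; `KNOWN_TAGS`, `MODEL_ID`, and `download_tag` are hypothetical names introduced here for illustration and are not part of ModelScope.

```python
# Tags copied from the model list table; assumed to be the full set of revisions.
KNOWN_TAGS = ("g128v2", "g32v2", "g128", "g32")
MODEL_ID = "tclf90/qwen2.5-7b-instruct-gptq-int8"


def download_tag(tag, cache_dir):
    """Download one tagged revision and return its local directory."""
    if tag not in KNOWN_TAGS:
        raise ValueError(f"unknown tag {tag!r}; expected one of {KNOWN_TAGS}")
    # Imported lazily so the tag check works even without modelscope installed.
    from modelscope import snapshot_download
    return snapshot_download(MODEL_ID, cache_dir=cache_dir, revision=tag)
```

Calling `download_tag("g128v2", "local_path")` would fetch the `g128v2` revision, while a misspelled tag fails immediately instead of triggering a remote lookup.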
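### 【Fixes】

1. Additional optimization was applied to the GPTQ quantization calibration, reducing issues such as (1) garbled output, (2) infinite loops, and (3) loss of long-text capability.
2. Some inference frameworks default to fairly large `top_k` and `top_p` values; consider lowering them to get more reasonable model output.
3. Depending on the model, launching with a `tensor-parallel-size` of 1, 2, or 4 GPUs is supported.
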
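To illustrate why lowering `top_k` and `top_p` can help, here is a simplified sketch of the two filters applied to a toy next-token distribution. The function `filter_candidates` and the probabilities are made up for illustration; real inference frameworks work on logits and renormalize before sampling, so this is not their actual implementation.

```python
def filter_candidates(probs, top_k, top_p):
    """Return the tokens that survive top-k, then top-p, filtering."""
    # Rank tokens from most to least likely.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    # top-k: keep only the k most likely tokens.
    ranked = ranked[:top_k]
    # top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append(token)
        cumulative += p
        if cumulative >= top_p:
            break
    return kept


# Toy next-token distribution (made-up numbers for illustration).
probs = {"the": 0.5, "a": 0.25, "cat": 0.125, "zzz": 0.0625, "@@": 0.0625}

# Loose settings keep every token, including the unlikely ones.
print(filter_candidates(probs, top_k=50, top_p=1.0))
# Tighter settings keep only the most probable tokens.
print(filter_candidates(probs, top_k=3, top_p=0.7))
```

With the loose settings, low-probability tokens such as `zzz` and `@@` stay in the sampling pool; with the tighter settings only `the` and `a` remain, which is the effect the recommendation above is aiming for.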
### 【Introduction】
Qwen2.5 is the latest series of Qwen large language models. For Qwen2.5, we release a number of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters. Qwen2.5 brings the following improvements upon Qwen2:

- Significantly **more knowledge** and greatly improved capabilities in **coding** and **mathematics**, thanks to our specialized expert models in these domains.
- Significant improvements in **instruction following**, **generating long texts** (over 8K tokens), **understanding structured data** (e.g., tables), and **generating structured outputs**, especially JSON. **More resilient to the diversity of system prompts**, enhancing role-play implementation and condition-setting for chatbots.
- **Long-context support** for up to 128K tokens, with generation of up to 8K tokens.
- **Multilingual support** for over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, Arabic, and more.

For more details, please refer to our [blog](https://qwenlm.github.io/blog/qwen2.5/), [GitHub](https://github.com/QwenLM/Qwen2.5), and [Documentation](https://qwen.readthedocs.io/en/latest/).

### 【Model Download】
```python
from modelscope import snapshot_download

# Replace <model_name>, "local_path", and the tag placeholder with real values.
snapshot_download('tclf90/<model_name>', cache_dir="local_path", revision='...tag...')
```

### 【High-Concurrency RESTful API Inference】
Option 1: [vllm](https://github.com/vllm-project/vllm)

Option 2: [sglang](https://github.com/sgl-project/sglang)

sglang is currently the recommended deployment option: in tests on an A100, it delivered 50% to 100% higher throughput than vllm.
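
As a rough sketch of how either server might be started with tensor parallelism, the helper below builds the launch command as an argument list. `build_launch_command` is a hypothetical helper, the default port is an assumption, and the flags (`--model`, `--tensor-parallel-size` for vllm; `--model-path`, `--tp` for sglang) reflect commonly documented options that may differ between versions, so check them against the installed release.

```python
import shlex


def build_launch_command(framework, model_path, tp_size=1, port=8000):
    """Build an OpenAI-compatible server launch command as an argument list."""
    # This model card lists 1, 2, and 4 GPUs as supported tensor-parallel sizes.
    if tp_size not in (1, 2, 4):
        raise ValueError("tp_size must be 1, 2, or 4 for this model")
    if framework == "vllm":
        return ["python", "-m", "vllm.entrypoints.openai.api_server",
                "--model", model_path,
                "--tensor-parallel-size", str(tp_size),
                "--port", str(port)]
    if framework == "sglang":
        return ["python", "-m", "sglang.launch_server",
                "--model-path", model_path,
                "--tp", str(tp_size),
                "--port", str(port)]
    raise ValueError(f"unsupported framework: {framework!r}")


# Print a copy-pasteable command; "local_path" is a placeholder directory.
print(shlex.join(build_launch_command("sglang", "local_path", tp_size=2)))
```

The returned list can be passed directly to `subprocess.run`, which avoids shell-quoting problems when the model path contains spaces.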