Files
qwen2.5-14b-instruct-awq/README.md
ModelHub XC 08cd5308e6 初始化项目,由ModelHub XC社区提供模型
Model: tclf90/qwen2.5-14b-instruct-awq
Source: Original Platform
2026-05-22 23:15:14 +08:00

65 lines
2.5 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

---
license: qwen
license_name: Tongyi Qianwen LICENSE AGREEMENT
license_link: LICENSE
pipeline_tag: text-generation
tags:
- qwen2.5
- awq
- int4
- 量化修复
- vLLM
- sglang
---
# 通义千问2.5-14B-Instruct-AWQ-量化修复
原模型 [qwen/Qwen2.5-14B-Instruct](https://www.modelscope.cn/models/qwen/Qwen2.5-14B-Instruct)
### 【模型更新日期】
注:通过`snapshot_download`函数传入`revision=...`来下载指定的`tag`版本
```
2024-09-24
1. add group 128
```
### 【模型列表】
| tag | 文件大小 | 最近更新时间 |
|--------|---------|--------------|
| `g128` | `9.4GB` | `2024-09-24` |
### 【模型下载】
```python
from modelscope import snapshot_download
snapshot_download('tclf90/...', cache_dir="本地路径", revision='g128')
```
### 【修复内容】
1. 对GPTQ量化的校准做了额外优化减少模型的 `1.乱吐字`、`2.无限循环`、`3.长文能力丢失`等情况。
2. 相同技术移植至AWQ。
3. 有些推理框架的默认`top_k`与`top_p`较大,可以考虑减小对应数值,来获得更合理的模型输出。
### 【介绍】
Qwen2.5 is the latest series of Qwen large language models. For Qwen2.5, we release a number of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters. Qwen2.5 brings the following improvements upon Qwen2:
- Significantly **more knowledge** and has greatly improved capabilities in **coding** and **mathematics**, thanks to our specialized expert models in these domains.
- Significant improvements in **instruction following**, **generating long texts** (over 8K tokens), **understanding structured data** (e.g, tables), and **generating structured outputs** especially JSON. **More resilient to the diversity of system prompts**, enhancing role-play implementation and condition-setting for chatbots.
- **Long-context Support** up to 128K tokens and can generate up to 8K tokens.
- **Multilingual support** for over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, Arabic, and more.
For more details, please refer to our [blog](https://qwenlm.github.io/blog/qwen2.5/), [GitHub](https://github.com/QwenLM/Qwen2.5), and [Documentation](https://qwen.readthedocs.io/en/latest/).
### 【高并发RESTFul API推理】
方式1[vllm](https://github.com/vllm-project/vllm)
方式2[sglang](https://github.com/sgl-project/sglang)
目前推荐使用sglang进行部署相较于vllm, sglang于A100实测能有50%100%的吞吐增益。