11 Commits

Author SHA1 Message Date
wangxiyuan
0dae55a9a3 [MISC] fix format check error (#654)
This PR makes format.sh work as expected.

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-04-29 11:14:19 +08:00
zzzzwwjj
5c6d05a59e support deepseek quant & mix-parallel with graphmode (#585)
### What this PR does / why we need it?
1. Support DeepSeek with w8a8 quantization;
2. Support DeepSeek with mixed parallelism (multi-DP, EP+TP);
3. Support DeepSeek with graph mode.
---------

Signed-off-by: wen-jie666 <wenjie39@huawei.com>
Signed-off-by: Yizhou Liu <liuyizhou5@h-partners.com>
Signed-off-by: libaokui <libaokui@huawei.com>
Signed-off-by: linfeng-yuan <1102311262@qq.com>
Co-authored-by: wen-jie666 <wenjie39@huawei.com>
2025-04-23 16:23:25 +08:00
Pleaplusone
d12a057df8 Add note for deepseek related docs and remove unnecessary comments (#590)
### What this PR does / why we need it?
Add notes for the DeepSeek patch and remove some unnecessary
comments.

---------

Signed-off-by: ganyi <pleaplusone.gy@gmail.com>
2025-04-22 09:59:09 +08:00
Pleaplusone
1a1f9a6d89 port deepseekv2 and mtp to main branch (#429)
### What this PR does / why we need it?
This PR ports all the DeepSeek graph mode code and MTP code from v0.7.3
to the main branch.
---------

Signed-off-by: SidaoY <1024863041@qq.com>
Signed-off-by: linfeng-yuan <1102311262@qq.com>
Signed-off-by: Yizhou Liu <liuyizhou5@h-partners.com>
Signed-off-by: mengwei805 <mengwei25@huawei.com>
Signed-off-by: libaokui <libaokui@huawei.com>
Signed-off-by: q00832892 <qiaoyang19@huawei.com>
Signed-off-by: ganyi <pleaplusone.gy@gmail.com>
Co-authored-by: SidaoY <1024863041@qq.com>
Co-authored-by: linfeng-yuan <1102311262@qq.com>
Co-authored-by: Yizhou Liu <liuyizhou5@h-partners.com>
Co-authored-by: mengwei805 <mengwei25@huawei.com>
Co-authored-by: libaokui <libaokui@huawei.com>
2025-04-19 17:38:18 +08:00
hfadzxy
9935d45728 [CI]Add model basic accuracy test(Qwen2.5-0.5B-Instruct) (#460)
### What this PR does / why we need it?
Add a basic model accuracy test (Qwen2.5-0.5B-Instruct).

Signed-off-by: hfadzxy <starmoon_zhang@163.com>
2025-04-17 14:59:56 +08:00
yiz-liu
0db6670bfa [Feature] Implement EP-compatible fused_moe (#121)
### What this PR does / why we need it?

Enable Expert-Parallel (EP) for Ascend devices.

### Does this PR introduce _any_ user-facing change?

This PR enables EP. To use it, add `enable_expert_parallel=True` to your
offline inference script, for example:
```python
from vllm import LLM

llm = LLM(
    model="/path/to/model",
    trust_remote_code=True,
    tensor_parallel_size=4,
    max_model_len=4096,
    enforce_eager=True,
    distributed_executor_backend="mp",
    enable_expert_parallel=True,  # turns on Expert-Parallel (EP)
)
```
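
As a conceptual illustration of what expert parallelism means, the sketch below splits a set of expert indices across ranks so that each rank owns a disjoint subset. It is only an illustration: the function name `experts_for_rank`, the contiguous even split, and the example sizes are assumptions, not the layout this PR implements.

```python
from typing import List


def experts_for_rank(num_experts: int, ep_size: int, rank: int) -> List[int]:
    """Return the expert indices owned by `rank` (hypothetical helper).

    Assumes a contiguous, even split of experts across ranks; a real
    implementation may lay experts out differently.
    """
    if num_experts % ep_size != 0:
        raise ValueError("this sketch assumes num_experts is divisible by ep_size")
    if not 0 <= rank < ep_size:
        raise ValueError("rank must be in [0, ep_size)")
    per_rank = num_experts // ep_size
    start = rank * per_rank
    return list(range(start, start + per_rank))


# Example sizes (assumed): 8 experts split across 4 ranks.
for rank in range(4):
    print(rank, experts_for_rank(8, 4, rank))
```

Each rank then only needs to hold the weights of the experts it owns, which is the general motivation for this style of parallelism.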

### How was this patch tested?

Please use the `main` branch of vLLM.

---------

Signed-off-by: Yizhou Liu <liuyizhou5@h-partners.com>
Co-authored-by: Yizhou Liu <liuyizhou5@h-partners.com>
2025-03-11 21:08:02 +08:00
HongtaoYang
dcd0005058 [Fix] Remove npu_group_topk before CANN version update (#242)
Remove npu_group_topk before CANN version update.

Signed-off-by: SidaoY <1024863041@qq.com>
2025-03-06 09:02:46 +08:00
HongtaoYang
1715230867 [CI] Upgrade to newest pta.(MLA and FusedMoE) (#189)
Upgrade to the newest pta (MLA and FusedMoE).

---------

Signed-off-by: SidaoY <1024863041@qq.com>
2025-02-27 18:50:52 +08:00
Mengqing Cao
fd18ae6494 [MOE] fix #176 (#179)
Fix #176.
We need to set `topk_group` and `num_expert_group` to `0` if they are
`None`.
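
A minimal sketch of the normalization this commit message describes, assuming a hypothetical helper `normalize_group_args` (not a function from the repository): `None` values for `topk_group` and `num_expert_group` are replaced with `0`, and any integer value passes through unchanged.

```python
from typing import Optional, Tuple


def normalize_group_args(
    topk_group: Optional[int],
    num_expert_group: Optional[int],
) -> Tuple[int, int]:
    """Replace None with 0; hypothetical helper for illustration only."""
    # Use an explicit `is None` check so real integer values are kept as-is.
    topk_group = 0 if topk_group is None else topk_group
    num_expert_group = 0 if num_expert_group is None else num_expert_group
    return topk_group, num_expert_group


print(normalize_group_args(None, None))
print(normalize_group_args(4, 8))
```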

Signed-off-by: MengqingCao <cmq0113@163.com>
2025-02-27 14:21:08 +08:00
Yaphets24
d0b3cb4fa7 modify:Eliminate redundant operations in the code to improve performance (#137)
### What this PR does / why we need it?
Eliminate redundant operations in the code to improve performance

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI passed
---------

Signed-off-by: Yaphets24 <d_mym0618@163.com>
Signed-off-by: MengqingCao <cmq0113@163.com>
Co-authored-by: MengqingCao <cmq0113@163.com>
2025-02-22 17:43:42 +08:00
wangxiyuan
5f465010de [Core] Cherry pick from 0.7.1 to keep the main code newest (#127)
Cherry-pick from 0.7.1 to keep the main code up to date.

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-02-21 17:07:37 +08:00