xc-llm-ascend

Author	SHA1	Message	Date
wemaster	0ae9ee0f8a	[BUGFIX] main-sd-bugfix && [UT] add mtp UT (#593) ### What this PR does / why we need it? This PR fixes several bugs in spec decode / MTP and adds an MTP e2e UT, `test_mtp_correctness.py`. vllm_ascend/attention/attention.py 1. support the case where `self.attn_mask_cache` has only 1 element, covering the scenario in which both spec decode and chunked prefill are enabled. vllm_ascend/distributed/parallel_state.py 1. remove 2 asserts because the spec decode worker uses init_worker twice. vllm_ascend/models/deepseek_mtp.py 1. remove unused params; 2. add w8a8 support in `CustomDeepSeekMTP`. vllm_ascend/quantization/quant_config.py 1. use `AscendUnquantizedFusedMoEMethod` instead of `UnquantizedFusedMoEMethod`. other 1. replace `from vllm.logger import init_logger` with `from vllm.logger import logger` throughout the vllm-ascend project. ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? Signed-off-by: mengwei805 <mengwei25@huawei.com>	2025-04-21 19:25:51 +08:00
Pleaplusone	1a1f9a6d89	port deepseekv2 and mtp to main branch (#429) ### What this PR does / why we need it? This PR ports all the deepseek graph mode code and mtp code from v0.7.3 to the main branch --------- Signed-off-by: SidaoY <1024863041@qq.com> Signed-off-by: linfeng-yuan <1102311262@qq.com> Signed-off-by: Yizhou Liu <liuyizhou5@h-partners.com> Signed-off-by: mengwei805 <mengwei25@huawei.com> Signed-off-by: libaokui <libaokui@huawei.com> Signed-off-by: q00832892 <qiaoyang19@huawei.com> Signed-off-by: ganyi <pleaplusone.gy@gmail.com> Co-authored-by: SidaoY <1024863041@qq.com> Co-authored-by: linfeng-yuan <1102311262@qq.com> Co-authored-by: Yizhou Liu <liuyizhou5@h-partners.com> Co-authored-by: mengwei805 <mengwei25@huawei.com> Co-authored-by: libaokui <libaokui@huawei.com>	2025-04-19 17:38:18 +08:00
hfadzxy	9935d45728	[CI] Add basic model accuracy test (Qwen2.5-0.5B-Instruct) (#460) ### What this PR does / why we need it? Add a basic model accuracy test (Qwen2.5-0.5B-Instruct) Signed-off-by: hfadzxy <starmoon_zhang@163.com>	2025-04-17 14:59:56 +08:00
Huazhong Ji	c3d1a3782a	Add pyhccl (#503) This is the first step toward supporting trl vllm serve on Ascend NPU (https://github.com/vllm-project/vllm-ascend/issues/459). This PR works properly only once https://github.com/vllm-project/vllm/pull/16464 is merged into vLLM. --------- Signed-off-by: hzji210@gmail.com <hzji210@gmail.com>	2025-04-17 14:57:52 +08:00
eeethenQ	44a8301424	[Feature] Add PD separation feature (#432) ### What this PR does / why we need it? Adapt the Disaggregated Prefill feature to Ascend devices ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? A usage example is provided along with the PR in examples/offline_disaggregated_prefill_npu.py. To run it: ``` export PROMPT_DEVICE_ID=0,1 export DECODE_DEVICE_ID=2,3 python examples/offline_disaggregated_prefill_npu.py ``` --------- Signed-off-by: ZihuiQian <qianzihui@huawei.com> Co-authored-by: ZihuiQian <qianzihui@huawei.com>	2025-04-15 15:11:35 +08:00

5 Commits