xc-llm-ascend

Author	SHA1	Message	Date
linfeng-yuan	ffdd1a36e2	[bugfix][torchair] fix wasted NPU memory buffer allocation for quantized deepseek with unquantized MTP layer (#3068) ### What this PR does / why we need it? When running quantized deepseek models with an unquantized MTP layer, free NPU memory abnormally decreases by `2*HCCL_BUFFSIZE` bytes. This results from a wasted VRAM buffer allocation caused by calling `dist.all_to_all_single` without the correct device process group argument. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? We ran vllm online serving with quantized deepseek-r1 and an unquantized MTP layer, and observed that free_memory increased once the redundant VRAM buffer for the HCCL communication op (all_to_all_single) was no longer allocated. - vLLM version: v0.10.2 - vLLM main: `6d8246aaff` Signed-off-by: linfeng-yuan <1102311262@qq.com>	2025-09-22 14:06:43 +08:00
Angazenn	aeffe27b30	[Perf] set moe w2_weight default to be nz (#2842) ### What this PR does / why we need it? This PR sets the default format of GMM w2_weight in w8a8_dynamic to NZ to improve performance. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? - vLLM version: main - vLLM main: `e40827280b` --------- Signed-off-by: Angazenn <supperccell@163.com>	2025-09-11 21:40:54 +08:00
Wang Yixuan	936c102105	[bugfix][refactor] fix torchair_w8a8 (#2569) ### What this PR does / why we need it? Separate torchair w8a8 and w4a8 from fused_moe, following the refactor of and changes to fused_moe. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? vLLM version: main vLLM main: `ab9f2cfd19` - vLLM version: v0.10.1.1 - vLLM main: `69244e67e6` Signed-off-by: hust17yixuan <303660421@qq.com>	2025-08-28 09:10:31 +08:00
Wang Yixuan	20a7bc4b71	[3/N][refactor] refactor quantization (#2504) ### What this PR does / why we need it? Move the torchair-related quantization section into the torchair dir to make the code clearer. As a next step, we'll remove all torchair-related code outside of torchair quantization. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? vLLM version: main vLLM main: `ab9f2cfd19` - vLLM version: v0.10.1.1 - vLLM main: `959783fb99` Signed-off-by: hust17yixuan <303660421@qq.com>	2025-08-27 10:45:50 +08:00

4 Commits