xc-llm-ascend

Author	SHA1	Message	Date
Angazenn	aeffe27b30	[Perf]set moe w2_weight default to be nz (#2842 ) ### What this PR does / why we need it? This PR sets the default format of GMM w2_weight in w8a8_dynamic to be NZ to improve performance. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? - vLLM version: main - vLLM main: `e40827280b` --------- Signed-off-by: Angazenn <supperccell@163.com>	2025-09-11 21:40:54 +08:00
22dimensions	37f5a29cd4	[1/N][Refactor][Quantization] remove redundant quantizer class (#2680 ) ### What this PR does / why we need it? AscendQuantizer/LLMQuantizer class is used to select quant method based on quant config and some other arguments, but it is more simple and clean replacing these classes with map. So i remove them. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? ut and e2e test - vLLM version: v0.10.1.1 - vLLM main: `6997a25ac6` Signed-off-by: 22dimensions <waitingwind@foxmail.com>	2025-09-04 11:35:14 +08:00
Wang Yixuan	936c102105	[bugfix][refactor]fix torchair_w8a8 (#2569 ) ### What this PR does / why we need it? torchair w8a8 and w4a8 Separate from fused_moe due to the refactor and change for fused_moe ### Does this PR introduce _any_ user-facing change? NO ### How was this patch tested? vLLM version: main vLLM main: `ab9f2cfd19` - vLLM version: v0.10.1.1 - vLLM main: `69244e67e6` Signed-off-by: hust17yixuan <303660421@qq.com>	2025-08-28 09:10:31 +08:00
Wang Yixuan	20a7bc4b71	[3/N][refactor] refactoer quantization (#2504 ) ### What this PR does / why we need it? Move torchair related qunatization section into torchair dir to make the code clear. Next step we'll remove all torchair related code outside of torchair quantization. ### Does this PR introduce _any_ user-facing change? NO ### How was this patch tested? vLLM version: main vLLM main: `ab9f2cfd19` - vLLM version: v0.10.1.1 - vLLM main: `959783fb99` Signed-off-by: hust17yixuan <303660421@qq.com>	2025-08-27 10:45:50 +08:00

4 Commits