xc-llm-ascend

Author	SHA1	Message	Date
zhangxinyuehfad	0d094531b4	[bugfix] Fixed the bug in retrieving the quantization method for mlp.… (#4797) When retrieving the quantization method for MoE layers, a KeyError is raised if the quantization file does not match the model's naming convention (e.g., the quantization file of DeepSeek v3.2 exp in eager mode): "model.layers.3.mlp.experts.weight not in self.quant_description". However, the quantization file contains per-expert keys like: ```bash "model.layers.3.mlp.experts.255.gate_proj.weight": "W8A8_DYNAMIC", "model.layers.3.mlp.experts.255.gate_proj.weight_scale": "W8A8_DYNAMIC", "model.layers.3.mlp.experts.255.gate_proj.weight_offset": "W8A8_DYNAMIC", "model.layers.3.mlp.experts.255.down_proj.weight": "W8A8_DYNAMIC", "model.layers.3.mlp.experts.255.down_proj.weight_scale": "W8A8_DYNAMIC", "model.layers.3.mlp.experts.255.down_proj.weight_offset": "W8A8_DYNAMIC", "model.layers.3.mlp.experts.255.up_proj.weight": "W8A8_DYNAMIC", "model.layers.3.mlp.experts.255.up_proj.weight_scale": "W8A8_DYNAMIC", "model.layers.3.mlp.experts.255.up_proj.weight_offset": "W8A8_DYNAMIC", ``` Co-Authored-By: yangqinghao-cmss <yangqinghao_yewu@cmss.chinamobile.com> Signed-off-by: hfadzxy <starmoon_zhang@163.com> Co-authored-by: yangqinghao-cmss <yangqinghao_yewu@cmss.chinamobile.com>	2025-12-09 08:47:19 +08:00
Slightwind	4f6d60eb06	[Feature] Add W4A4 Flat Quantization support (#3427) Introduce W4A4 Flat Quantization for better model compression and inference efficiency on Ascend devices. - vLLM version: v0.11.0rc3 - vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0 Signed-off-by: SlightwindSec <slightwindsec@gmail.com>	2025-10-13 23:20:16 +08:00
22dimensions	f5a97e8fa5	[Quantization] register AscendQuantRMSNorm for quantization (#2856) ### What this PR does / why we need it? modelslim generates a self.bias parameter for RMSNorm during quantization. Because RMSNorm in vLLM has no such parameter, it is necessary to create an AscendQuantRMSNorm. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Tested with deepseek-v3.1-w8a8. - vLLM version: main - vLLM main: `d6249d0699` Signed-off-by: 22dimensions <waitingwind@foxmail.com>	2025-09-11 23:14:02 +08:00
22dimensions	d51694a77b	[2/N][Refactor][Quantization] clean quantization patch (#2785) ### What this PR does / why we need it? The quantization patch is unused code, so this PR removes it. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Tested by CI. - vLLM version: v0.10.1.1 - vLLM main: `f4962a6d55` Signed-off-by: 22dimensions <waitingwind@foxmail.com>	2025-09-08 17:31:53 +08:00
22dimensions	37f5a29cd4	[1/N][Refactor][Quantization] remove redundant quantizer class (#2680) ### What this PR does / why we need it? The AscendQuantizer and LLMQuantizer classes are used to select the quant method based on the quant config and some other arguments, but replacing these classes with a map is simpler and cleaner, so this PR removes them. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Unit tests and e2e tests. - vLLM version: v0.10.1.1 - vLLM main: `6997a25ac6` Signed-off-by: 22dimensions <waitingwind@foxmail.com>	2025-09-04 11:35:14 +08:00
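
Commit 37f5a29cd4 describes replacing the quantizer classes with a map. The following is only a rough sketch of what map-based selection could look like. Every name here (`QUANT_METHOD_MAP`, `get_quant_method`, the two placeholder classes, and the `"linear"`/`"moe"` layer labels) is hypothetical and not taken from the vllm-ascend code; only the `W8A8_DYNAMIC` label appears in the log above.

```python
from typing import Dict


# Hypothetical placeholder classes standing in for real quant methods.
class W8A8DynamicLinearMethod:
    pass


class W8A8DynamicMoEMethod:
    pass


# Hypothetical map: quant type -> layer type -> method class.
QUANT_METHOD_MAP: Dict[str, Dict[str, type]] = {
    "W8A8_DYNAMIC": {
        "linear": W8A8DynamicLinearMethod,
        "moe": W8A8DynamicMoEMethod,
    },
}


def get_quant_method(quant_type: str, layer_type: str) -> object:
    """Look up and instantiate a quant method via the map."""
    try:
        method_cls = QUANT_METHOD_MAP[quant_type][layer_type]
    except KeyError:
        # Unknown combinations fail loudly instead of silently falling back.
        raise NotImplementedError(
            f"No quant method for quant_type={quant_type!r}, "
            f"layer_type={layer_type!r}"
        ) from None
    return method_cls()
```

With a plain dictionary, supporting a new scheme amounts to adding one entry rather than a new selector class, which is presumably the simplification the commit message has in mind.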

5 Commits
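
The KeyError described in commit 0d094531b4 arises because the lookup key `model.layers.3.mlp.experts.weight` is absent while per-expert keys such as `model.layers.3.mlp.experts.255.gate_proj.weight` are present. One possible way to handle such a mismatch is a prefix-based fallback, sketched below. This is an illustration under assumptions, not the fix applied in #4797: the function name `resolve_quant_type` and its error handling are hypothetical.

```python
from typing import Dict


def resolve_quant_type(quant_description: Dict[str, str], prefix: str) -> str:
    """Return the quant type for `prefix`, falling back to per-expert keys.

    Hypothetical helper: first try the fused key `<prefix>.weight`, then
    scan keys of the form `<prefix>.<expert_id>.<proj>.weight`.
    """
    fused_key = f"{prefix}.weight"
    if fused_key in quant_description:
        return quant_description[fused_key]

    # Trailing dot so that only children of `prefix` match.
    child_prefix = f"{prefix}."
    quant_types = {
        value
        for key, value in quant_description.items()
        if key.startswith(child_prefix) and key.endswith(".weight")
    }
    if not quant_types:
        raise KeyError(f"{fused_key} not in quant_description")
    if len(quant_types) > 1:
        # Mixed precision across experts is not handled by this sketch.
        raise ValueError(f"Inconsistent quant types under {prefix}: {quant_types}")
    return quant_types.pop()


# Sample entries copied from the commit message above.
description = {
    "model.layers.3.mlp.experts.255.gate_proj.weight": "W8A8_DYNAMIC",
    "model.layers.3.mlp.experts.255.gate_proj.weight_scale": "W8A8_DYNAMIC",
    "model.layers.3.mlp.experts.255.down_proj.weight": "W8A8_DYNAMIC",
}
quant_type = resolve_quant_type(description, "model.layers.3.mlp.experts")
```

The sketch rejects mixed quant types under one prefix rather than picking one arbitrarily; whether the real fix behaves this way is not stated in the commit message.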