xc-llm-ascend

Author	SHA1	Message	Date
Angazenn	5b746f3e83	[Inductor] Change pass to adapt to new addrmsnormBias operator (#6094) ### What this PR does / why we need it? #5790 changes the default addrmsnormBias operator when custom ops are enabled. This PR modifies the AddRmsNormQuant pass to align with addrmsnormBias. --------- Signed-off-by: Angazenn <supperccell@163.com>	2026-01-24 20:16:44 +08:00
zhangxinyuehfad	f7b904641e	[Main2Main] Upgrade vllm commit to 0109 (#5752) ### What this PR does / why we need it? Upgrade vllm commit to 0109 (bde38c11df0ea066a740efe9b77fff5418be45df) 1. remove `init_cached_hf_modules` due to https://github.com/vllm-project/vllm/pull/31786 2. fix the spec_decode e2e test broken by https://github.com/vllm-project/vllm/pull/29821 3. fix `vllm.v1.attention.backends.utils` due to https://github.com/vllm-project/vllm/pull/31891 4. fix `self.seq_lens - query_lens` on same device due to https://github.com/vllm-project/vllm/pull/31773 5. skip model_runner_v2 e2e test due to `'_OpNamespace' '_C' object has no attribute 'get_cuda_view_from_cpu_tensor'` - vLLM version: v0.13.0 - vLLM main: `2f4e6548ef` Signed-off-by: hfadzxy <starmoon_zhang@163.com>	2026-01-13 19:14:43 +08:00
Angazenn	acc3578f58	[Graph][Fusion] Add new pattern for AddRmsnormQuant with SP. (#5077) ### What this PR does / why we need it? 1. In addition to [#4168](https://github.com/vllm-project/vllm-ascend/pull/4168) and [#5011](https://github.com/vllm-project/vllm-ascend/pull/5011), this PR adds two more patterns for AddRmsnormQuant with SP enabled. The key difference is an additional `maybe_all_gather_and_maybe_unpad` inserted between `addrmsnorm` and `quantize`. 2. This PR also introduces another API, `torch.ops.vllm.quantize`, so that we pass `input_scale` and `input_scale_reciprocal` at the same time. This is because `npu_add_rms_norm_quant` and `npu_quantize` require different `div_mode` values. To avoid an additional reciprocal calculation at runtime, we have to pass both of them to the quantize API. 3. Removes the redundant `AscendQuantRmsnorm`. - vLLM version: v0.12.0 - vLLM main: `ad32e3e19c` --------- Signed-off-by: Angazenn <supperccell@163.com>	2025-12-18 20:25:44 +08:00
Icey	5fae65f3a8	[Graph][Fusion] Add AddRMSNorm (with bias) and Quant Fusion Pattern (#5011) ### What this PR does / why we need it? Adds a fusion pattern for AddRMSNorm (with bias) and Quant. ### Does this PR introduce _any_ user-facing change? N/A ### How was this patch tested? CI passed with newly added and existing tests. - vLLM version: v0.12.0 - vLLM main: `ad32e3e19c` --------- Signed-off-by: wxsIcey <1790571317@qq.com>	2025-12-15 18:37:56 +08:00
Icey	18221c0e1d	[Fusion] Normalize fusion naming and enable e2e test (#4693) ### What this PR does / why we need it? This PR standardizes the fusion naming, changing `enable_quantization_fusion` to `fuse_norm_quant`, and enables e2e testing. ### Does this PR introduce _any_ user-facing change? N/A ### How was this patch tested? CI passed with newly added and existing tests. - vLLM version: v0.12.0 - vLLM main: `ad32e3e19c` --------- Signed-off-by: wxsIcey <1790571317@qq.com>	2025-12-11 17:53:43 +08:00

5 Commits