xc-llm-ascend

Files

Ruri dd7a25063c [Feat] Prefetching Attention QKV Linear Weight With AddRmsNormQuant Custom Op (#3517)

### What this PR does / why we need it?

- `qkv_proj.weight` prefetching was implemented with the `Quant` op. When
`AddRmsNormQuant` is enabled (#3465), `qkv_proj.weight` prefetching does
not work.
- This PR implements `qkv_proj.weight` prefetching with `AddRmsNormQuant`.

### Does this PR introduce _any_ user-facing change?

None.

### How was this patch tested?

Tested on `Qwen3-235B-A22B-W8A8`
<img width="1868" height="109" alt="image"
src="https://github.com/user-attachments/assets/0bc28082-0287-4d5c-b8f6-f907c3134d36"
/>


- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

---------

Signed-off-by: zhoux77899 <zhouxiang100@huawei.com>

2025-10-23 10:07:37 +08:00

expert_map.json

Add unit test local cpu guide and enable base testcase (#1566)

2025-07-06 10:42:27 +08:00

test_activation.py

[main] mlp weight prefetch in Qwen Dense Models (#2816)

2025-09-11 21:20:09 +08:00

test_comm_utils.py

Refactor tensor_parallel and comm_utils (#2814)

2025-09-11 21:26:36 +08:00

test_common_fused_moe.py

[Refactor] Adjustments to moe_comm_method selection process (#3001)