xc-llm-ascend

Files

rjg-lyh 0005479b9c [main] mlp weight prefetch in Qwen Dense Models (#2816)

### What this PR does / why we need it?
This PR prefetches the weights of the MLP layers in Qwen Dense Models, mainly to improve performance in the decode phase.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
CI passed with newly added and existing tests.

- vLLM version: main
- vLLM main: a1213fae5f

Signed-off-by: rjg-lyh <1318825571@qq.com>
Co-authored-by: Shuming19 <313093131@qq.com>

2025-09-11 21:20:09 +08:00

310p

Refactor e2e CI (#2276)

2025-09-02 09:02:22 +08:00

doctests

Remove transformer pins for v0.9.1-dev (#2234)

2025-08-07 14:41:10 +08:00

models

Accuracy report formatting (#2279)

2025-08-25 09:39:30 +08:00

multicard

[main] mlp weight prefetch in Qwen Dense Models (#2816)