xc-llm-ascend/tests/e2e
rjg-lyh 0005479b9c [main] mlp weight prefetch in Qwen Dense Models (#2816)
### What this PR does / why we need it?
This PR prefetches the weights of the MLP layers in Qwen Dense Models,
mainly to improve performance in the decode phase.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
CI passed with newly added and existing tests.

- vLLM version: main
- vLLM main: a1213fae5f

Signed-off-by: rjg-lyh <1318825571@qq.com>
Co-authored-by: Shuming19 <313093131@qq.com>
2025-09-11 21:20:09 +08:00