Commit Graph

3 Commits

Author SHA1 Message Date
Wang Yixuan
d0086d432a fix deepseek torchair recompile (#3679)
### What this PR does / why we need it?
PR #3624 fixed the precision of deepseek torchair but did not account for
a limitation of torch compile, which results in recompilation. This PR
fixes that problem. PR to main: #3678


### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?

Signed-off-by: hust17yixuan <303660421@qq.com>
2025-10-23 22:53:13 +08:00
Wang Yixuan
edccd46d74 fix deepseek torchair precision (#3635)
### What this PR does / why we need it?
The precision of deepseek torchair was broken by #3465, owing to the original patch or rmsnorm in torchair. This PR fixes the precision of deepseek torchair.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?

Signed-off-by: hust17yixuan <303660421@qq.com>
2025-10-22 20:20:32 +08:00
rjg-lyh
0005479b9c [main] mlp weight prefetch in Qwen Dense Models (#2816)
### What this PR does / why we need it?
This PR prefetches the weights of the MLP layers in Qwen Dense Models,
mainly to improve performance in the decode phase.

### Does this PR introduce _any_ user-facing change?
 No.

### How was this patch tested?
CI passed with new added/existing test.

- vLLM version: main
- vLLM main: a1213fae5f

Signed-off-by: rjg-lyh <1318825571@qq.com>
Co-authored-by: Shuming19 <313093131@qq.com>
2025-09-11 21:20:09 +08:00