Commit Graph

5 Commits

Author SHA1 Message Date
zzzzwwjj
4df8df5b94 [bugfix] fix deepseek rope sincoscache re-generation (#2744)
### What this PR does / why we need it?
The current implementation regenerates `sin_cos_cache` in RoPE whenever
`kv_seqlen` exceeds 4k, because `sin_cos_cache` is initialized with a
length of only 4k.
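To illustrate why an undersized cache gets rebuilt in the forward path, here is a minimal sketch under stated assumptions. `SinCosCache`, `lookup` and `rebuilds` are hypothetical names, not the vllm-ascend implementation, and the tables use the standard RoPE inverse frequencies `base^(-2i/d)`. It shows that a cache sized to the maximum position at construction time is never regenerated on lookup, while one initialized at 4k is rebuilt once `seq_len` goes past that length.

```python
import math


class SinCosCache:
    """Hypothetical RoPE sin/cos cache, for illustration only."""

    def __init__(self, rotary_dim: int, max_position: int, base: float = 10000.0):
        self.rotary_dim = rotary_dim
        self.base = base
        self.rebuilds = 0  # how many times the tables have been (re)generated
        self._build(max_position)

    def _build(self, length: int) -> None:
        # Standard RoPE inverse frequencies: base^(-2i/d) for i in [0, d/2).
        inv_freq = [self.base ** (-2 * i / self.rotary_dim)
                    for i in range(self.rotary_dim // 2)]
        self.cos = [[math.cos(p * f) for f in inv_freq] for p in range(length)]
        self.sin = [[math.sin(p * f) for f in inv_freq] for p in range(length)]
        self.length = length
        self.rebuilds += 1

    def lookup(self, seq_len: int):
        # A cache shorter than seq_len must be regenerated here, i.e. inside
        # the forward pass; sizing it up front avoids this branch entirely.
        if seq_len > self.length:
            self._build(seq_len)
        return self.cos[:seq_len], self.sin[:seq_len]


if __name__ == "__main__":
    small = SinCosCache(rotary_dim=8, max_position=4096)
    small.lookup(8192)
    full = SinCosCache(rotary_dim=8, max_position=8192)
    full.lookup(8192)
    print(small.rebuilds, full.rebuilds)
```

Running the demo prints `2 1`: the 4k cache is regenerated during lookup, while the full-length cache is built only once at construction.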

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Once this PR is merged, `sin_cos_cache` no longer grows in the forward
function, so `test_native_rope_deepseek_forward_cache_handling` is no
longer necessary.

- vLLM version: v0.10.1.1
- vLLM main: 60f0843ef8

Signed-off-by: zzzzwwjj <1183291235@qq.com>
2025-09-08 22:03:34 +08:00
1092626063
5b3646ab21 [FEATURE][MTP] Support MTP > 1 (#2708)
### What this PR does / why we need it?
[RFC: Support MTP > 1 for DeepSeek](https://github.com/vllm-project/vllm-ascend/issues/2745)

- [x] dp1 tp16
- [x] dp4 tp4
- [x] dp2 tp8
- [x] torchair graph
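Each data-parallel (dp) and tensor-parallel (tp) layout ticked above spans 16 ranks in total (1×16, 4×4, 2×8). The sketch below checks that arithmetic and maps a global rank to its `(dp_rank, tp_rank)` coordinates. The helper names are hypothetical, and the assumption that TP ranks are contiguous within each DP group is ours; the actual rank ordering in vllm-ascend may differ.

```python
# (dp, tp) layouts ticked in the checklist above.
LAYOUTS = [(1, 16), (4, 4), (2, 8)]


def world_size(dp: int, tp: int) -> int:
    # Each DP replica holds one full TP group, so the total rank count is dp * tp.
    return dp * tp


def rank_to_coords(global_rank: int, dp: int, tp: int) -> tuple[int, int]:
    """Map a global rank to (dp_rank, tp_rank).

    Assumption: TP ranks are laid out contiguously within each DP group.
    """
    if not 0 <= global_rank < world_size(dp, tp):
        raise ValueError(f"rank {global_rank} out of range for dp={dp}, tp={tp}")
    return divmod(global_rank, tp)


if __name__ == "__main__":
    for dp, tp in LAYOUTS:
        coords = rank_to_coords(13, dp, tp)
        print(f"dp{dp} tp{tp}: {world_size(dp, tp)} ranks, rank 13 -> {coords}")
```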

- vLLM version: v0.10.1.1
- vLLM main: c9f7081f9c

Signed-off-by: 1092626063 <1092626063@qq.com>
2025-09-05 09:11:22 +08:00
linfeng-yuan
90a75a90a9 [bugfix] fix torchair runtime error caused by configuration mismatches and missing files (#2532)
### What this PR does / why we need it?
This PR ports #2312, #2506 and #2531 to the main branch.

The original torchair caching implementation forces users to prepare
everything in advance, fix the whole configuration and enable
`use_cached_npu_graph`, which can cause problems that are confusing for
users to understand and resolve. It is better to compile the graph twice
than to reuse the old KV caches and the cached torchair graph, and the
extra compilation time is acceptable. Additionally, this PR fixes a
recompilation problem in torchair graph mode caused by the
`running_in_graph` variable in `AscendMLATorchairImpl`.

### Does this PR introduce _any_ user-facing change?
To enable `torchair.cache_compile` with fast compilation, we recommend
enabling both `use_cached_kv_cache_bytes` and `use_cached_graph` in
`torchair_graph_config`. Without `use_cached_kv_cache_bytes`, the torchair
computation graph is compiled twice to avoid runtime errors caused by
configuration mismatches (the second compilation is much faster).
Additionally, we have changed how the `TORCHAIR_CACHE_HOME` environment
variable is used: a suffix directory is now added to the cache path, which
improves safety and prevents accidental file deletion.
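The two user-facing changes above can be sketched as follows, under stated assumptions. We assume the options are passed through vLLM's `additional_config` (for example serialized as JSON for `--additional-config`) and that an `enabled` key switches on torchair graph mode. The suffix directory name `CACHE_SUFFIX`, the fallback path and all helper functions are hypothetical, since the PR does not name them. The point of the suffix is that clearing the cache only ever deletes a subdirectory the tool owns, never other files under `TORCHAIR_CACHE_HOME`.

```python
import json
import os
import shutil
from pathlib import Path
from typing import Mapping, Optional

# Hypothetical suffix name; the PR does not state the actual directory name.
CACHE_SUFFIX = "torchair_cache"


def build_additional_config() -> dict:
    # Assumption: "enabled" turns on torchair graph mode; the two cache
    # options are the ones recommended in the text above.
    return {
        "torchair_graph_config": {
            "enabled": True,
            "use_cached_graph": True,
            "use_cached_kv_cache_bytes": True,
        }
    }


def to_cli_json() -> str:
    # JSON string suitable for a flag such as --additional-config (assumed).
    return json.dumps(build_additional_config())


def resolve_cache_dir(env: Optional[Mapping[str, str]] = None) -> Path:
    env = os.environ if env is None else env
    # Hypothetical fallback when TORCHAIR_CACHE_HOME is unset.
    base = Path(env.get("TORCHAIR_CACHE_HOME", Path.home() / ".cache"))
    # Always work inside a dedicated suffix directory, never the base itself.
    return base / CACHE_SUFFIX


def clear_cache(env: Optional[Mapping[str, str]] = None) -> None:
    # Only the suffix directory is removed; sibling files under
    # TORCHAIR_CACHE_HOME are left untouched.
    shutil.rmtree(resolve_cache_dir(env), ignore_errors=True)
```

Pointing `TORCHAIR_CACHE_HOME` at a shared directory is then harmless: `clear_cache` removes `<TORCHAIR_CACHE_HOME>/torchair_cache` and nothing else.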

### How was this patch tested?
CI and end-to-end vLLM serving pass.


- vLLM version: v0.10.1.1
- vLLM main: 70549c1245

---------

Signed-off-by: linfeng-yuan <1102311262@qq.com>
2025-09-03 17:56:12 +08:00
ZhaoJiangJiang
3629bc4431 feat: add mtp ut and fix some bugs (#2453)
### What this PR does / why we need it?
Fix the MTP mode unit tests.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Tested by running the unit tests.


- vLLM version: v0.10.0
- vLLM main: 53415653ff

Signed-off-by: 赵江江 <zhaojiangjiang1@h-partners.com>
Co-authored-by: 赵江江 <zhaojiangjiang1@h-partners.com>
2025-08-22 17:09:08 +08:00
linfeng-yuan
0ca3f48c90 [2/N][refactor] torchair deepseek mla backend refactor (#2459)
### What this PR does / why we need it?
This PR moves the current unified MLA backend to the torchair folder and
removes torchair-related code from `attention/mla_v1.py` (1.3k -> 0.9k lines).

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Ran eager mode with the MLA backend, and torchair mode with the code before
[#2445](https://github.com/vllm-project/vllm-ascend/pull/2445).


- vLLM version: v0.10.0
- vLLM main: f571ff8eb6

Signed-off-by: linfeng-yuan <1102311262@qq.com>
2025-08-21 14:02:30 +08:00