Commit Graph

5 Commits

Author SHA1 Message Date
wangxiyuan
846555cdb5 [Misc] Clean up uesless code in attention (#1933)
Before do attention module refactor, we can do some code cleanup to make
the next step easier.

What this PR does:

1. remove uesless `common_prefix_len` for attention builder
2. remove uesless `is_only_prefill` and `num_input_tokens` in attention
metadata.
3. remove `CommonAttentionMetadata` and ues `query_start_loc` instead,
`CommonAttentionMetadata` is over designed and uesless
4. update the attention backend input parameters to keep the same as
vLLM.
5. Rename attention name to the same style with `ASCEND` prefix

- vLLM version: v0.9.2
- vLLM main:
107111a859

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-07-24 10:23:34 +08:00
wangxiyuan
a8b316ac5b [CI] Make AttentionBackend interface compatible to fix broken CI (#1893)
vLLM commit
752c6ade2e
removed `blocksparse_params` for attention backend. This PR does the
same change to make CI happy.


- vLLM version: v0.9.2
- vLLM main:
9499e26e2a

---------

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
Co-authored-by: Yikun Jiang <yikunkero@gmail.com>
2025-07-21 08:21:06 +08:00
Angazenn
18495f44b2 [BugFix] Fix max_num_tokens_across_dp calculation bugs in attention_v1_torchair (#1636)
### What this PR does / why we need it?
This PR fixes a bug that is caused by max_num_tokens_across_dp
calculation. In earlier version, we compute this by graph_pad_size plus
max_num_tokens(actual). This will result in different
max_num_tokens_across_dp across dp ranks. If padding related is
required, this might cause a wrong padding.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
CI passed normally.

Signed-off-by: angazenn <zengyanjia@huawei.com>
Co-authored-by: angazenn <zengyanjia@huawei.com>
2025-07-07 20:03:02 +08:00
wangxiyuan
343955c7ac [CI] Follow vLLM FusedMoEParallelConfig interface change and clean up unused config (#1625)
This commit
78fe77534b
from vllm reverted the change for FusedMoEParallelConfig

This PR do the same to fix the CI error

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-07-04 17:54:33 +08:00
Angazenn
a5f33590d3 [CORE]initial support for torchair with non-mla backend (#1506)
### What this PR does / why we need it?
This PR supports torchair graph mode with non-mla backend on both 800IA2
and 300I Duo platforms. The main change is to add
`attention_v1_torchair.py` to support specific attention related
operations that are required by torchair.

### Does this PR introduce _any_ user-facing change?
Before this PR, vLLM-Ascend only allows deepseek to use torchair. Now we
can also use it with pangu. Besides, we add a support model list to
control which type of models that can use torchair.

### How was this patch tested?
We have test it with PanguProMoE on both 800IA2 and 300I Duo platforms,
and model generates answer normally.

---------

Signed-off-by: angazenn <zengyanjia@huawei.com>
Signed-off-by: tianyitang <tangtianyi4@huawei.com>
Co-authored-by: angazenn <zengyanjia@huawei.com>
Co-authored-by: tianyitang <tangtianyi4@huawei.com>
2025-07-03 22:21:42 +08:00