xc-llm-ascend

Author	SHA1	Message	Date
wangxiyuan	846555cdb5	[Misc] Clean up useless code in attention (#1933 ) Before the attention module refactor, we can do some code cleanup to make the next step easier. What this PR does: 1. remove the useless `common_prefix_len` from the attention builder 2. remove the useless `is_only_prefill` and `num_input_tokens` from the attention metadata. 3. remove `CommonAttentionMetadata` and use `query_start_loc` instead; `CommonAttentionMetadata` is over-designed and useless 4. update the attention backend input parameters to match vLLM. 5. Rename the attention backends to the same style with an `ASCEND` prefix - vLLM version: v0.9.2 - vLLM main: `107111a859` Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-07-24 10:23:34 +08:00
wangxiyuan	a8b316ac5b	[CI] Make AttentionBackend interface compatible to fix broken CI (#1893 ) vLLM commit `752c6ade2e` removed `blocksparse_params` from the attention backend. This PR makes the same change to fix CI. - vLLM version: v0.9.2 - vLLM main: `9499e26e2a` --------- Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com> Signed-off-by: Yikun Jiang <yikunkero@gmail.com> Co-authored-by: Yikun Jiang <yikunkero@gmail.com>	2025-07-21 08:21:06 +08:00
Angazenn	18495f44b2	[BugFix] Fix max_num_tokens_across_dp calculation bugs in attention_v1_torchair (#1636 ) ### What this PR does / why we need it? This PR fixes a bug in the max_num_tokens_across_dp calculation. In the earlier version, we computed this value as graph_pad_size plus the actual max_num_tokens. This results in different max_num_tokens_across_dp values across DP ranks. If padding is required, this can cause incorrect padding. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? CI passed normally. Signed-off-by: angazenn <zengyanjia@huawei.com> Co-authored-by: angazenn <zengyanjia@huawei.com>	2025-07-07 20:03:02 +08:00
wangxiyuan	343955c7ac	[CI] Follow vLLM FusedMoEParallelConfig interface change and clean up unused config (#1625 ) vLLM commit `78fe77534b` reverted the change to FusedMoEParallelConfig. This PR does the same to fix the CI error. Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-07-04 17:54:33 +08:00
Angazenn	a5f33590d3	[CORE] Initial support for torchair with non-MLA backend (#1506 ) ### What this PR does / why we need it? This PR adds support for torchair graph mode with non-MLA backends on both 800IA2 and 300I Duo platforms. The main change is a new `attention_v1_torchair.py` that implements the attention-related operations required by torchair. ### Does this PR introduce _any_ user-facing change? Before this PR, vLLM-Ascend only allowed DeepSeek to use torchair. Now it can also be used with Pangu. In addition, we add a supported-model list that controls which types of models can use torchair. ### How was this patch tested? We have tested it with PanguProMoE on both 800IA2 and 300I Duo platforms, and the model generates answers normally. --------- Signed-off-by: angazenn <zengyanjia@huawei.com> Signed-off-by: tianyitang <tangtianyi4@huawei.com> Co-authored-by: angazenn <zengyanjia@huawei.com> Co-authored-by: tianyitang <tangtianyi4@huawei.com>	2025-07-03 22:21:42 +08:00

5 Commits
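Commit 846555cdb5 (#1933) drops `CommonAttentionMetadata` in favour of `query_start_loc`. In vLLM, `query_start_loc` is a cumulative sum of per-request query lengths with a leading 0, so request `i` owns the token range `[query_start_loc[i], query_start_loc[i+1])` in the flattened batch. The sketch below illustrates that layout with plain Python lists instead of the torch tensors vLLM actually uses; the helper names (`build_query_start_loc`, `query_lens_from_start_loc`, `token_slice`) are hypothetical and not part of vLLM or vLLM-Ascend.

```python
from itertools import accumulate
from typing import List


def build_query_start_loc(query_lens: List[int]) -> List[int]:
    # Hypothetical helper: cumulative sum of per-request query lengths,
    # prefixed with 0, giving num_reqs + 1 boundaries.
    return [0] + list(accumulate(query_lens))


def query_lens_from_start_loc(query_start_loc: List[int]) -> List[int]:
    # Adjacent differences recover each request's number of query tokens.
    return [end - start for start, end in zip(query_start_loc, query_start_loc[1:])]


def token_slice(query_start_loc: List[int], req_idx: int) -> slice:
    # Positions of request `req_idx` inside the flattened token batch.
    return slice(query_start_loc[req_idx], query_start_loc[req_idx + 1])


if __name__ == "__main__":
    # Illustrative batch: a 5-token prefill chunk, a 1-token decode, a 3-token chunk.
    query_lens = [5, 1, 3]
    qsl = build_query_start_loc(query_lens)
    print(qsl)                              # [0, 5, 6, 9]
    print(query_lens_from_start_loc(qsl))   # [5, 1, 3]
    flat_tokens = list(range(qsl[-1]))
    print(flat_tokens[token_slice(qsl, 2)]) # [6, 7, 8]
```

Because the per-request lengths and offsets can both be derived from this one array, a separate metadata wrapper around it adds little, which is consistent with the commit's rationale.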