[CORE] initial support for torchair with non-mla backend (#1506)

### What this PR does / why we need it?
This PR supports torchair graph mode with non-MLA backends on both 800IA2
and 300I Duo platforms. The main change is adding
`attention_v1_torchair.py` to support the attention-related operations
required by torchair.
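For context, a minimal usage sketch of how torchair graph mode is typically switched on in vLLM-Ascend via `additional_config` (the model path is a placeholder, and the exact config keys should be verified against the shipped `ascend_config`):

```python
# Hedged usage sketch: enable torchair graph mode through vLLM's
# additional_config. The model path is illustrative; the
# "torchair_graph_config" key follows vllm-ascend's ascend_config.
from vllm import LLM

llm = LLM(
    model="/path/to/PanguProMoE",  # illustrative local checkpoint path
    additional_config={
        "torchair_graph_config": {"enabled": True},
    },
)
outputs = llm.generate("Hello, world!")
```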

### Does this PR introduce _any_ user-facing change?
Before this PR, vLLM-Ascend only allowed DeepSeek models to use torchair.
Now it can also be used with Pangu. Besides, we add a supported-model list
to control which types of models can use torchair; a sketch of the check
appears below.
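A minimal sketch of what such an allow-list check could look like, inferred from the `test_check_torchair_supported` cases in the diff below (the list contents and the matching rule are assumptions, not the exact shipped implementation):

```python
# Assumed allow-list of model-type name fragments that may use torchair;
# "deepseek" and "pangu" are inferred from the test cases below.
TORCHAIR_SUPPORTED_MODELS = ["deepseek", "pangu"]

def check_torchair_supported(model_type: str) -> bool:
    # Case-insensitive substring match so that e.g. "deepseek_v3" and
    # "PanguProMoE" pass while "qwen" and "llama" do not.
    lowered = model_type.lower()
    return any(name in lowered for name in TORCHAIR_SUPPORTED_MODELS)
```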

### How was this patch tested?
We have tested it with PanguProMoE on both 800IA2 and 300I Duo platforms,
and the model generates answers normally.

---------

Signed-off-by: angazenn <zengyanjia@huawei.com>
Signed-off-by: tianyitang <tangtianyi4@huawei.com>
Co-authored-by: angazenn <zengyanjia@huawei.com>
Co-authored-by: tianyitang <tangtianyi4@huawei.com>
Author: Angazenn
Date: 2025-07-03 22:21:42 +08:00
Committed by: GitHub
Parent: 9fbd8017c0
Commit: a5f33590d3
19 changed files with 1130 additions and 84 deletions


@@ -6,6 +6,7 @@ from transformers import PretrainedConfig
 from vllm.config import ModelConfig, VllmConfig
 
 from vllm_ascend.ascend_config import (check_ascend_config,
+                                       check_torchair_supported,
                                        clear_ascend_config, get_ascend_config,
                                        init_ascend_config)
@@ -242,3 +243,10 @@ class TestAscendConfig(unittest.TestCase):
         test_vllm_config.model_config = fake_model_config
         init_ascend_config(test_vllm_config)
         check_ascend_config(test_vllm_config, False)
+
+    def test_check_torchair_supported(self):
+        test_cases = [('deepseek_v3', True), ('PanguProMoE', True),
+                      ('qwen', False), ('llama', False)]
+        for model_type, expected_output in test_cases:
+            self.assertEqual(check_torchair_supported(model_type),
+                             expected_output)