[Feat] chunkprefill mla support torchair graph (#1772)

Chunked-prefill MLA currently supports eager mode only; this change optimizes it by adding torchair graph support. The idea is simple: when every request in the batch is in the decode phase, run the torchair graph; otherwise (chunked prefill or prefill-only batches), fall back to eager mode.
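The dispatch rule above can be sketched as follows. This is a minimal illustration, not vLLM-Ascend's actual API: `AttnState` and `use_torchair_graph` are hypothetical names standing in for the real batch-state bookkeeping.

```python
from enum import Enum, auto

class AttnState(Enum):
    """Hypothetical summary of what phases the batched requests are in."""
    PREFILL_ONLY = auto()     # every request is still prefilling
    CHUNKED_PREFILL = auto()  # prefill and decode mixed in one step
    DECODE_ONLY = auto()      # every request is in the decode phase

def use_torchair_graph(state: AttnState, torchair_enabled: bool) -> bool:
    """Run the captured torchair graph only when the whole batch is
    decoding; chunked-prefill and prefill-only batches stay on eager mode."""
    return torchair_enabled and state is AttnState.DECODE_ONLY
```

For example, a mixed chunked-prefill batch keeps eager mode even when torchair is enabled, while a pure-decode batch takes the graph path.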

- vLLM version: v0.10.0
- vLLM main:
ebf7605b0d

Signed-off-by: haojiangzheng <justineric096@gmail.com>
Co-authored-by: haojiangzheng <justineric096@gmail.com>
zhenghaojiang
2025-08-11 19:58:59 +08:00
committed by GitHub
parent 881e36d6a9
commit eb43a475f4
2 changed files with 28 additions and 18 deletions

@@ -664,6 +664,7 @@ class TestAscendMLAImpl(TestBase):
     def test_forward_decode_without_graph(self, mock_page_attention_mla,
                                           mock_up_proj):
         self.impl.running_in_graph = False
+        self.impl.running_chunkprefilll_with_torchair = False
         num_tokens = 100
         num_blocks = 256
         block_size = 4