add pagedattention to support FULL_DECODE_ONLY. (#3102)

### What this PR does / why we need it? Calculate in advance the workspace memory size needed for the PagedAttention operator to avoid deadlocks during resource cleanup. This PR requires torch_npu version 0920 or newer. ### How was this patch tested? - vLLM version: v0.11.0 --------- Signed-off-by: wangxiaoxin-sherie <wangxiaoxin7@huawei.com> Co-authored-by: wangxiaoxin-sherie <wangxiaoxin7@huawei.com>
2025-10-10 08:50:33 +08:00
parent 1c2c72af8d
commit 579b7e5f21
5 changed files with 245 additions and 12 deletions
--- a/.github/workflows/_e2e_test.yaml
+++ b/.github/workflows/_e2e_test.yaml
@@ -173,6 +173,7 @@ jobs:
        if: ${{ inputs.type == 'full' }}
        run: |
          pytest -sv tests/e2e/multicard/test_data_parallel.py
+          pytest -sv tests/e2e/multicard/test_full_graph_mode.py
          pytest -sv tests/e2e/multicard/test_expert_parallel.py
          # external_launcher test is not stable enough. Fix it later
          # pytest -sv tests/e2e/multicard/test_external_launcher.py