[Attention] add gpt-oss support (#5901)

### What this PR does / why we need it? Please refer to the following link for the historical conversation https://github.com/vllm-project/vllm-ascend/pull/4467. We have made updates in light of the comments from the prior PR review. Given the refactoring of the attention_v1 component, we have carried out necessary adjustments to fit the newly revised code. ### Does this PR introduce _any_ user-facing change? 1. Modified the code in the Attention section to adapt to the SWA and Sink features required by gpt-oss. 2. Modified the code in the MoE section to add support for bias and swigluoai. ### How was this patch tested? Please refer to the https://github.com/vllm-project/vllm-ascend/pull/4467 for performance tests, on the basis of which the accuracy tests from AIME2024 have been newly added. ![img_v3_02tu_501e88e3-2217-4565-8edf-b9acf4f43f2g](https://github.com/user-attachments/assets/024f8283-18ab-4d4d-ab12-27917b5d7d06) - vLLM version: v0.13.0 - vLLM main: bde38c11df --------- Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com> Signed-off-by: mikequan0425 <mikequan0425@foxmail.com> Signed-off-by: hfadzxy <starmoon_zhang@163.com> Signed-off-by: shenchuxiaofugui <1311027364@qq.com> Signed-off-by: jiangyunfan1 <jiangyunfan1@h-partners.com> Signed-off-by: pu-zhe <zpuaa@outlook.com> Signed-off-by: liziyu <liziyu16@huawei.com> Signed-off-by: wangxiaoteng <wangxiaoteng@huawei.com> Signed-off-by: luomin2005 <luomin2005@huawei.com> Signed-off-by: whx-sjtu <2952154980@qq.com> Signed-off-by: SlightwindSec <slightwindsec@gmail.com> Signed-off-by: wxsIcey <1790571317@qq.com> Signed-off-by: MrZ20 <2609716663@qq.com> Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com> Co-authored-by: leon_tao <taoyao2@huawei.com> Co-authored-by: nurxat <738457498@qq.com> Co-authored-by: hfadzxy <starmoon_zhang@163.com> Co-authored-by: mikequan <199741451@qq.com> Co-authored-by: LI SHENGYONG <49200266+shenchuxiaofugui@users.noreply.github.com> Co-authored-by: jiangyunfan1 <jiangyunfan1@h-partners.com> Co-authored-by: pu-zhe <zpuaa@outlook.com> Co-authored-by: luomin2005 <luomin2005@huawei.com> Co-authored-by: liziyu <56102866+liziyu179@users.noreply.github.com> Co-authored-by: wangxiaoteng <wangxiaoteng@huawei.com> Co-authored-by: whx <56632993+whx-sjtu@users.noreply.github.com> Co-authored-by: Cao Yi <slightwindsec@gmail.com> Co-authored-by: Icey <1790571317@qq.com> Co-authored-by: SILONG ZENG <2609716663@qq.com>
2026-02-12 10:55:34 +08:00
parent f71812011d
commit 7221045777
8 changed files with 111 additions and 19 deletions
--- a/tests/e2e/multicard/2-cards/test_offline_inference_distributed.py
+++ b/tests/e2e/multicard/2-cards/test_offline_inference_distributed.py
@@ -22,7 +22,6 @@ Run `pytest tests/test_offline_inference.py`.
 """
 import os
 from unittest.mock import patch
-
 import pytest
 from vllm import SamplingParams

@@ -48,6 +47,9 @@ DEEPSEEK_W4A8_MODELS = [
    "vllm-ascend/DeepSeek-V3.1-W4A8-puring",
 ]

+GPT_OSS_MODELS = [
+    "unsloth/gpt-oss-20b-BF16",
+]

 def test_deepseek_multistream_moe_tp2():
    example_prompts = [
@@ -289,3 +291,17 @@ def test_qwen3_w4a4_distributed_tp2(model):
            quantization="ascend",
    ) as vllm_model:
        vllm_model.generate_greedy(example_prompts, max_tokens)
+
+
+@pytest.mark.parametrize("model", GPT_OSS_MODELS)
+def test_gpt_oss_distributed_tp2(model):
+    example_prompts = [
+        "Hello, my name is",
+    ]
+    max_tokens = 5
+    with VllmRunner(
+            model,
+            tensor_parallel_size=2,
+            enforce_eager=True,
+    ) as vllm_model:
+        vllm_model.generate_greedy(example_prompts, max_tokens)