Files
xc-llm-ascend/vllm_ascend/spec_decode
zhaomingyu13 ff63626874 [Bugfix] Fix the issue of the acceptance rate decline for Qwen3-30B-A3B-EAGLE3 (#6138)
### What this PR does / why we need it?
Due to the long-term lack of synchronization with the upstream code, a
problem that led to a decrease in the acceptance rate of the
Qwen3-30B-A3B-EAGLE3 draft model was introduced when fixing the
bug(#5967). Now, synchronize with the upstream and fix this bug
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
```python
from vllm import LLM, SamplingParams

def main():
    prompts = [
        "The future of AI is",
    ]

    # Create a sampling params object.
    sampling_params = SamplingParams(temperature=0.8, top_p=0.95)
    # Create an LLM.
    llm = LLM(
            model="Qwen/Qwen3-30B-A3B",
            tensor_parallel_size=4,
            gpu_memory_utilization=0.9,
            enforce_eager=True,
            speculative_config={
                "method": "eagle3",
                "model": "AngelSlim/Qwen3-a3B_eagle3"
                "num_speculative_tokens": 3,
            },
        )

    # Generate texts from the prompts.
    outputs = llm.generate(prompts, sampling_params)
    print(f"Outputs: {outputs}")
    for output in outputs:
        prompt = output.prompt
        generated_text = output.outputs[0].text
        print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
```
- vLLM version: v0.13.0
- vLLM main:
d68209402d

Signed-off-by: zhaomingyu <zhaomingyu13@h-partners.com>
Co-authored-by: drslark <slarkblood@qq.com>
2026-01-23 16:12:56 +08:00
..