xc-llm-ascend

zhaomingyu13 cb42564942 [BugFix] Fix eagle3 accuracy problem when enforce_eager=True (#4521)

### What this PR does / why we need it?

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?
from vllm import LLM, SamplingParams


def main():
    prompts = [
        "The future of AI is",
    ]

    # Create a sampling params object.
    sampling_params = SamplingParams(temperature=0.8, top_p=0.95)
    # Create an LLM with EAGLE3 speculative decoding in eager mode.
    llm = LLM(
        model="meta-llama/Llama-3.1-8B-Instruct",
        tensor_parallel_size=1,
        speculative_config={
            "method": "eagle3",
            "model": "yuhuili/EAGLE3-LLaMA3.1-Instruct-8B",
            "num_speculative_tokens": 3,
        },
        enforce_eager=True,
    )

    # Generate texts from the prompts.
    outputs = llm.generate(prompts, sampling_params)
    print(f"Outputs: {outputs}")
    for output in outputs:
        prompt = output.prompt
        generated_text = output.outputs[0].text
        print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")


if __name__ == "__main__":
    main()

- vLLM version: v0.12.0
- vLLM main: ad32e3e19c

---------

Signed-off-by: zhaomingyu <zhaomingyu13@h-partners.com>
Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com>

2025-12-06 17:31:26 +08:00

310p

[CI] drop ascend scheduler test (#4582)

2025-12-01 20:33:50 +08:00

doctests

[Doc] Recover installation doc to use pip install (#4109)

2025-11-11 09:25:44 +08:00

models

[Test] Add accuracy nightly test for new models (#4262)

2025-12-01 22:28:46 +08:00

multicard

[Model] Add qwen3Next support in Main (#4596)