xc-llm-ascend

Files

zhaomingyu13 cb42564942 [BugFix] Fix eagle3 accuracy problem when enforce_eager=True (#4521)

### What this PR does / why we need it?

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?
from vllm import LLM, SamplingParams


def main():
    prompts = [
        "The future of AI is",
    ]

    # Create a sampling params object.
    sampling_params = SamplingParams(temperature=0.8, top_p=0.95)
    # Create an LLM with EAGLE-3 speculative decoding, running in eager mode.
    llm = LLM(
        model="meta-llama/Llama-3.1-8B-Instruct",
        tensor_parallel_size=1,
        speculative_config={
            "method": "eagle3",
            "model": "yuhuili/EAGLE3-LLaMA3.1-Instruct-8B",
            "num_speculative_tokens": 3,
        },
        enforce_eager=True,
    )

    # Generate texts from the prompts.
    outputs = llm.generate(prompts, sampling_params)
    print(f"Outputs: {outputs}")
    for output in outputs:
        prompt = output.prompt
        generated_text = output.outputs[0].text
        print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")


if __name__ == "__main__":
    main()
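The script above samples with `temperature=0.8`, so its output varies between runs and can only be inspected by eye. One possible way to make the accuracy check repeatable (an assumption, not something this PR describes) is to rerun with greedy decoding, once with and once without `speculative_config`, and compare the generated texts. The helpers below are a minimal sketch of that comparison; `exact_match_rate` and `first_divergence` are hypothetical names, not part of vLLM or vllm-ascend, and small divergences from numerical differences may still occur.

```python
from typing import Sequence


def exact_match_rate(baseline: Sequence[str], candidate: Sequence[str]) -> float:
    """Fraction of prompts whose generated text is identical in both runs."""
    if len(baseline) != len(candidate):
        raise ValueError("both runs must cover the same prompts")
    if not baseline:
        return 1.0
    matches = sum(b == c for b, c in zip(baseline, candidate))
    return matches / len(baseline)


def first_divergence(baseline: str, candidate: str) -> int:
    """Index of the first differing character, or -1 if the texts are identical."""
    for i, (b, c) in enumerate(zip(baseline, candidate)):
        if b != c:
            return i
    # One text is a strict prefix of the other: they diverge where the shorter ends.
    if len(baseline) != len(candidate):
        return min(len(baseline), len(candidate))
    return -1
```

The inputs would be the `output.outputs[0].text` strings collected from each run, in the same prompt order.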

- vLLM version: v0.12.0
- vLLM main: ad32e3e19c

---------

Signed-off-by: zhaomingyu <zhaomingyu13@h-partners.com>
Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com>

2025-12-06 17:31:26 +08:00

__init__.py

[Feature] Integrate Suffix Spec Decoding (#4045)

2025-12-01 18:41:42 +08:00

eagle_proposer.py

[BugFix] Fix eagle3 accuracy problem when enforce_eager=True (#4521)

2025-12-06 17:31:26 +08:00

interface.py

upgrade vLLM to main (#4608)

2025-12-02 22:10:52 +08:00

mtp_proposer.py

support async mtp (#4511)