Commit Graph

5 Commits

Author SHA1 Message Date
drslark
48ec97821a [Bugfix] Fixed an accuracy problem of sp with eagle3 (#5816)
### What this PR does / why we need it?
Fixed an accuracy problem when using eagle3 with sp.

The problem is described in
https://github.com/vllm-project/vllm-ascend/issues/5825.

It also adds a much more precise way to determine whether drafter should
use `sp` or not.

Also, it changes the `eager` of drafter to be a real `eager` in frontend
to avoid a `fx-graph` problem.

### Does this PR introduce _any_ user-facing change?

N/A

### How was this patch tested?

For simpilicity, we test it as in
https://github.com/vllm-project/vllm-ascend/issues/5825.

And we get the same result of `eagle3` with `sp` disabled.

```text
--------------------------------------------------
total_num_output_tokens: 1000
num_drafts: 437
num_draft_tokens: 1311
num_accepted_tokens: 564
mean acceptance length: 2.29
--------------------------------------------------
acceptance at token 0: 0.62
acceptance at token 1: 0.40
acceptance at token 2: 0.27
acceptance at token 3: 0.00
acceptance at token 4: 0.00
acceptance at token 5: 0.00
```

* vLLM version: v0.13.0
* vLLM main:
2f4e6548ef

Signed-off-by: drslark <slarksblood@qq.com>
2026-01-14 09:00:37 +08:00
zhaomingyu13
d886b81971 [BugFix] Support setting tp=1 for the Eagle draft model to take effect (#5519)
### What this PR does / why we need it?
According to the official documentation, the parameter
"draft_tensor_parallel_size": 1 is supposed to be applied to the Eagle3
model. However, based on actual debugging, it was found that the number
of tensor parallelisms (tp) of the Eagle model is consistent with that
of the target model. The setting of tp for the draft model did not take
effect as expected.

**Note:** This feature has not been superimposed and tested with `sp`
and `dp`. It will be adapted later
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
```python
from vllm import LLM, SamplingParams

def main():
    prompts = [
        "The future of AI is",
    ]

    # Create a sampling params object.
    sampling_params = SamplingParams(temperature=0.8, top_p=0.95)
    # Create an LLM.
    llm = LLM(
            model="meta-llama/Llama-3.1-8B-Instruct",
            tensor_parallel_size=4,
            gpu_memory_utilization=0.9,
            enforce_eager=True,
            speculative_config={
                "method": "eagle3",
                "model": "yuhuili/EAGLE3-LLaMA3.1-Instruct-8B"
                "draft_tensor_parallel_size": 1,
                "num_speculative_tokens": 3,
            },
        )

    # Generate texts from the prompts.
    outputs = llm.generate(prompts, sampling_params)
    print(f"Outputs: {outputs}")
    for output in outputs:
        prompt = output.prompt
        generated_text = output.outputs[0].text
        print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
```
- vLLM version: v0.13.0
- vLLM main:
45c1ca1ca1

Fixes vllm-project/vllm#31345

Signed-off-by: zhaomingyu <zhaomingyu13@h-partners.com>
Co-authored-by: drslark <slarksblood@qq.com>
2026-01-13 09:14:30 +08:00
drslark
ccbc5e2ba1 [Feat][Bugfix][main] Adapted SP to eagle3 (#5562)
### What this PR does / why we need it?

Adapted sp to eagle3.

There may still be some problems, e.g., accuracy in some scenes,
`sp`+`dp`...

We will fix them later.

### Does this PR introduce _any_ user-facing change?

N/A

### How was this patch tested?

We tested it mainly in a new `e2e`.

```shell
pytest -s tests/e2e/singlecard/spec_decode/test_v1_spec_decode.py::test_llama_qwen_eagle_acceptance
```

```text
.

=============================== warnings summary ===============================
<frozen importlib._bootstrap>:241
  <frozen importlib._bootstrap>:241: DeprecationWarning: builtin type SwigPyPacked has no __module__ attribute

<frozen importlib._bootstrap>:241
  <frozen importlib._bootstrap>:241: DeprecationWarning: builtin type SwigPyObject has no __module__ attribute

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
============= 3 passed, 1 skipped, 2 warnings in 142.05s (0:02:22) =============
```

It passed.

- vLLM version: v0.13.0
- vLLM main:
7157596103

Signed-off-by: drslark <slarksblood@qq.com>
2026-01-08 15:33:52 +08:00
wangxiyuan
6f7a81cd9f [CI] cleanup single/multi-card test (#5623)
1. speed up e2e light test.
2. create `2-cards` and `4-cards` folder in multicard
3. move ops to nightly
4. run test in Alphabetical Order

- vLLM version: v0.13.0
- vLLM main:
8be6432bda

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2026-01-07 14:13:34 +08:00
lilinsiman
46862ce1af [main][test] Refactor the mtp and eagle test case (#5326)
### What this PR does / why we need it?
1. Refactor the current test with mtp and eagle cases
2. Add new necessary cases with mtp and eagle

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
ut

- vLLM version: release/v0.13.0
- vLLM main:
5fbfa8d9ef

---------

Signed-off-by: lilinsiman <lilinsiman@gmail.com>
2025-12-31 09:22:58 +08:00