xc-llm-ascend


lio 9e150e5009 [Refactor] optimize _prepare_inputs method in eagle_proposer (#3296)

### What this PR does / why we need it?

We optimized the _prepare_inputs method in eagle_proposer so that it no
longer uses the _prepare_eagle_input_sequential method, which improves
the performance of EAGLE-3.
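
The PR text does not include the code itself, so the following is only a generic illustration of why dropping a per-request sequential step can help, not the actual vllm-ascend implementation. It assumes the input is a cumulative query-length array plus a per-request count of rejected draft tokens; the function names and the NumPy formulation are hypothetical.

```python
import numpy as np


def token_indices_loop(cu_query_lens, num_rejected):
    """Sequential pattern: build one index range per request, then join them."""
    pieces = []
    for i, rejected in enumerate(num_rejected):
        start = cu_query_lens[i]
        kept = cu_query_lens[i + 1] - start - rejected
        pieces.append(np.arange(start, start + kept))
    return np.concatenate(pieces)


def token_indices_vectorized(cu_query_lens, num_rejected):
    """Same result with whole-array operations and no Python-level loop."""
    cu = np.asarray(cu_query_lens)
    # Tokens kept per request after discarding rejected draft tokens.
    kept = np.diff(cu) - np.asarray(num_rejected)
    # Cumulative offsets of each request in the compacted output.
    cu_kept = np.concatenate(([0], np.cumsum(kept)))
    # Position of every output slot inside its own request.
    offsets = np.arange(cu_kept[-1]) - np.repeat(cu_kept[:-1], kept)
    # Shift by the request's start position in the original layout.
    return np.repeat(cu[:-1], kept) + offsets


# Hypothetical batch: three requests of lengths 3, 4, 2 with 1, 0, 1 rejected tokens.
cu_query_lens = [0, 3, 7, 9]
num_rejected = [1, 0, 1]
loop_result = token_indices_loop(cu_query_lens, num_rejected)
vec_result = token_indices_vectorized(cu_query_lens, num_rejected)
```

Both functions return the same indices; the vectorized form issues a fixed number of array operations regardless of batch size, which is the usual motivation for this kind of refactor.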

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

```
python3 -m vllm.entrypoints.openai.api_server \
  --host 0.0.0.0 \
  --port 13963 \
  --dtype bfloat16 \
  --model meta-llama/Llama-3.1-8B-Instruct \
  --served-model-name Llama-3.1-8B-Instruct \
  --tensor-parallel-size 1 \
  --gpu-memory-utilization 0.85 \
  --max-model-len 32768 \
  --trust-remote-code \
  --seed 42 \
  --no-enable-prefix-caching \
  --speculative_config '{"method":"eagle3","model":"yuhuili/EAGLE3-LLaMA3.1-Instruct-8B","num_speculative_tokens":2,"draft_tensor_parallel_size":1}'
```
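
Once the server above is running, a request along the following lines could be used as a smoke test. This is a minimal sketch rather than the procedure the authors used: it assumes the server is reachable on localhost at the port given above, and the helper names, prompt, and sampling values are illustrative.

```python
import json
import urllib.request

BASE_URL = "http://localhost:13963"  # port from the launch command; localhost is an assumption
MODEL = "Llama-3.1-8B-Instruct"      # must match --served-model-name


def build_completion_request(prompt, max_tokens=64):
    """Build a POST request for the OpenAI-compatible /v1/completions endpoint."""
    payload = {
        "model": MODEL,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": 0.0,  # greedy decoding keeps runs comparable
    }
    return urllib.request.Request(
        f"{BASE_URL}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=60):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())


request = build_completion_request("The capital of France is")
# With the server running:
# print(send(request)["choices"][0]["text"])
```

Building the request separately from sending it makes the payload easy to inspect before any network call is made.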

Co-authored-by: QilaiZhang <245706640@qq.com>


- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

Signed-off-by: lio <1983142975@qq.com>

2025-10-25 09:49:42 +08:00

__init__.py

[Refactor] Refactor Spec Decode (#2668)

2025-09-04 11:34:47 +08:00

eagle_proposer.py

[Refactor] optimize _prepare_inputs method in eagle_proposer (#3296)

2025-10-25 09:49:42 +08:00

interface.py

[Feat]mtp aclgraph support (#3244)

2025-10-17 18:14:49 +08:00

mtp_proposer.py

[1/N][Refactor] Refactor code to adapt with vllm main (#3612)

2025-10-24 16:55:08 +08:00

ngram_proposer.py

[Feat]mtp aclgraph support (#3244)

2025-10-17 18:14:49 +08:00