sunchendd 5932abc446 [Bugfix] Fix the Eagle3 inference failure issue. (#4721)
### What this PR does / why we need it?
Fixes an Eagle3 inference failure in which the engine aborts with:
"EngineCore encountered an issue. See stack trace (above) for the root
cause."

Fixes https://github.com/vllm-project/vllm-ascend/issues/4323

### How was this patch tested?
```
vllm serve /nfs/1_AscendPackage/05_weights_public/Qwen3-32B \
    --served-model-name Qwen3-32B \
    -tp 4 \
    --host "0.0.0.0" \
    --port "8000" \
    --trust-remote-code \
    --speculative-config '{"method":"eagle3","model":"/home/scd/qwen3_32b_eagle3/","num_speculative_tokens":4,"draft_tensor_parallel_size":1}' \
    --max-num-batched-tokens 4096 \
    --max-model-len 4096
```

```
curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "Qwen3-32B",
        "prompt": "hi, where is the capital of France?",
        "max_tokens": 10,
        "temperature": 0
    }' | python3 -m json.tool
```
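For reference, a minimal Python sketch equivalent to the curl test above, assuming the vLLM server started by the serve command is listening on `localhost:8000` (the endpoint and payload mirror the curl request; nothing here is specific to this PR's fix):

```python
import json
import urllib.request

# Same request body the curl test sends to the OpenAI-compatible
# /v1/completions endpoint.
payload = {
    "model": "Qwen3-32B",
    "prompt": "hi, where is the capital of France?",
    "max_tokens": 10,
    "temperature": 0,
}

def query(url="http://localhost:8000/v1/completions"):
    # POST the JSON payload and decode the JSON response.
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

if __name__ == "__main__":
    print(json.dumps(query(), indent=2))
```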

vLLM version: v0.11.0
vLLM-ascend version: v0.11.0rc2

Signed-off-by: 17764591921 <sunchend@outlook.com>
2025-12-12 14:52:29 +08:00