xc-llm-ascend

Files

Mengqing Cao cc210f46e6 [AscendScheduler][Bugfix] Remove num_draft_tokens while allocating slots (#1718)

### What this PR does / why we need it?

`num_draft_tokens` no longer needs to be calculated when allocating
slots.

This PR follows the changes in vllm:
https://github.com/vllm-project/vllm/pull/20701

### Does this PR introduce _any_ user-facing change?
N/A

### How was this patch tested?
CI passed with existing tests.

- vLLM version: v0.9.2
- vLLM main: cc876d0f29

---------

Signed-off-by: MengqingCao <cmq0113@163.com>

2025-07-10 18:47:45 +08:00

e2e

[AscendScheduler][Bugfix] Remove num_draft_tokens while allocating slots (#1718)

2025-07-10 18:47:45 +08:00

[Bugfix] Fix accuracy problem caused by mask pollution (#1678)

2025-07-10 14:06:49 +08:00

__init__.py

[SpecDecode] Add spec decode support (#500)

2025-04-17 20:16:32 +08:00

conftest.py

[CI/UT] Unify model usage via ModelScope in CI (#1207)

2025-07-04 10:52:17 +08:00

model_utils.py

[CI] Refactor CI (#952)

2025-05-28 06:31:35 +08:00

utils.py

[V1][ModelRunner] Support pooling model for v1 engine (#1359)

2025-06-30 16:31:12 +08:00