xc-llm-ascend

Files

Angazenn ddf4d53ca3 [bugfix] Fix bugs in _dummy_run and re-initialize kv-cache. (#3262)

### What this PR does / why we need it?
Currently we run an extra `profile_run` with `num_tokens ==
self.mc2_tokens_capacity`. However, when `max_num_batched_tokens <
self.mc2_tokens_capacity`, this triggers the assertion in `_dummy_run`
that requires `num_tokens` not to exceed `max_num_batched_tokens`. This PR
skips the extra `profile_run` if `self.max_num_tokens <=
self.mc2_tokens_capacity` to avoid this bug.
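
The guard described above can be sketched as follows. This is a minimal illustration, not the actual runner code: `dummy_run` and `profile_run` here are hypothetical stand-alone functions that only mimic the assertion and the skip condition named in this message.

```python
from typing import List


def dummy_run(num_tokens: int, max_num_tokens: int) -> int:
    # Stand-in for the assertion described above (assumed form):
    # a dummy run may not use more tokens than the configured maximum.
    assert num_tokens <= max_num_tokens, "num_tokens exceeds max_num_tokens"
    return num_tokens


def profile_run(max_num_tokens: int, mc2_tokens_capacity: int) -> List[int]:
    # Always profile the general shape first.
    profiled = [dummy_run(max_num_tokens, max_num_tokens)]
    # Run the extra MC2-sized profile only when it fits under the maximum;
    # skipping it when max_num_tokens <= mc2_tokens_capacity avoids the assertion.
    if max_num_tokens > mc2_tokens_capacity:
        profiled.append(dummy_run(mc2_tokens_capacity, max_num_tokens))
    return profiled
```

With this guard, `profile_run(256, 512)` performs only the general profile run instead of failing the assertion.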

This PR also fixes a bug where `kernel_block_sizes` never equals
`[self.cache_config.block_size]`. `kernel_block_sizes` has type
`List[List[int]]`, so the condition should be `kernel_block_sizes !=
[[self.cache_config.block_size]]`. This also helps resolve an issue
where `cpu_offload_gb` cannot be enabled.
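
The comparison problem can be reproduced in isolation. The sketch below assumes a block size of 128 purely for illustration; the helper names `old_check` and `fixed_check` are hypothetical and do not appear in the code base.

```python
from typing import List


def old_check(kernel_block_sizes: List[List[int]], block_size: int) -> bool:
    # Compares a nested list against a flat list, so the two are never equal
    # and this check is always True.
    return kernel_block_sizes != [block_size]


def fixed_check(kernel_block_sizes: List[List[int]], block_size: int) -> bool:
    # Compares against a nested list of the same shape.
    return kernel_block_sizes != [[block_size]]


block_size = 128  # illustrative value only
matching = [[128]]
differing = [[64]]
```

Here `old_check(matching, block_size)` is `True` even though the sizes match, while `fixed_check(matching, block_size)` is `False` and `fixed_check(differing, block_size)` is `True`.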

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?

- vLLM version: v0.10.2
- vLLM main:
https://github.com/vllm-project/vllm/commit/releases/v0.11.0

Signed-off-by: Angazenn <supperccell@163.com>

2025-09-30 10:54:14 +08:00

__init__.py

[Misc][V0 Deprecation] Remove Cache Engine Used for V0 Worker (#1878)

2025-07-19 09:42:32 +08:00

block_table.py

[HybridKV] Fix prefill disaggregation kvcache addr alignment & use hybrid kv cache only when running qwen3_next (#3007)

2025-09-18 21:43:22 +08:00

model_runner_v1.py

[bugfix] Fix bugs in _dummy_run and re-initialize kv-cache. (#3262)

2025-09-30 10:54:14 +08:00

npu_input_batch.py

[CI] Upgrade vLLM to 20250919 (6d8246aa) and fix some broken issue (#2907)

2025-09-20 17:37:57 +08:00

worker_v1.py

Add DeepSeek V3.2 support (#3270)

2025-09-30 03:25:58 +08:00