xc-llm-ascend

Files

realliujiaxu b154a8e22c [Bugfix] fix logging and d2h bug for flash comm1 (#3505)

### What this PR does / why we need it?

Fix three bugs in flash comm1 of Allgather
EP (https://github.com/vllm-project/vllm-ascend/pull/3334):
1. Calling `enable_sp()` with the argument `vllm_config` triggers a large
number of warning logs; this PR caches its return value.
2. `num_tokens_after_padding` should be a CPU tensor, because it is used as
`num_tokens_across_dp_cpu` in `DPMetadata`. Otherwise it causes many D2H
copies when running the model.
3. In PD disaggregation, the model runner executes `kv_connector_no_forward`,
where `num_tokens` is None.

- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

---------

Signed-off-by: realliujiaxu <realliujiaxu@163.com>

2025-10-17 21:13:41 +08:00

__init__.py

[Misc][V0 Deprecation] Remove Cache Engine Used for V0 Worker (#1878)

2025-07-19 09:42:32 +08:00

block_table.py

[HybridKV] Fix prefill disaggregation kvcache addr alignment & use hybrid kv cache only when running qwen3_next (#3007)

2025-09-18 21:43:22 +08:00

model_runner_v1.py

[Bugfix] fix logging and d2h bug for flash comm1 (#3505)

2025-10-17 21:13:41 +08:00

npu_input_batch.py

Drop 0.10.2 (#3284)

2025-10-09 10:28:38 +08:00

worker_v1.py

[Feat] Make full graph mode compatible with MTP (#3276)

2025-10-17 20:19:56 +08:00