### What this PR does / why we need it?
This was meant to be merged in #6536, but I accidentally restored a
commit. You can find the relevant discussion
[here](https://github.com/vllm-project/vllm-ascend/pull/6536#issuecomment-3882883471).
Since `self.pass_config.enable_sp` is forced to `False` in the
[source
code](f176443446/vllm/config/compilation.py (L1066)),
that code path no longer verifies that the generated cudagraph
shapes are multiples of both `uniform_decode_query_len`
(`num_speculative_tokens + 1`) and `tensor_parallel_size`.
This PR performs the `num_speculative_tokens + 1` and
`tensor_parallel_size` check upfront, so an invalid configuration is
rejected immediately instead of silently rounding up `cudagraph_size`
and later surfacing a cryptic error to the user.
A typical example of this cryptic error looks like:
```
ValueError: could not broadcast input array from shape (196,) into shape (14,)
```
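The upfront validation could be sketched roughly as follows. This is an illustrative sketch only: the function name `validate_cudagraph_sizes` and its parameters are hypothetical, not vLLM's actual API; the real check lives in the compilation config code referenced above.

```python
import math


def validate_cudagraph_sizes(capture_sizes, num_speculative_tokens,
                             tensor_parallel_size):
    """Reject cudagraph capture sizes upfront instead of silently
    rounding them up (illustrative sketch, not vLLM's real code)."""
    uniform_decode_query_len = num_speculative_tokens + 1
    # A size that is a multiple of both values is a multiple of their LCM.
    required = math.lcm(uniform_decode_query_len, tensor_parallel_size)
    bad = [s for s in capture_sizes if s % required != 0]
    if bad:
        raise ValueError(
            f"cudagraph capture sizes {bad} must be multiples of "
            f"lcm(num_speculative_tokens + 1, tensor_parallel_size) "
            f"= {required}")
```

With a check like this, a mismatched size fails fast with a clear message rather than propagating into a shape-broadcast error deep in capture.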
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
All tests have passed.
- vLLM version: v0.15.0
- vLLM main: 83b47f67b1
---------
Signed-off-by: lilinsiman <lilinsiman@gmail.com>
Signed-off-by: guozr <guozr1997@hotmail.com>
Co-authored-by: lilinsiman <lilinsiman@gmail.com>
Co-authored-by: drslark <slarksblood@qq.com>
Co-authored-by: guozr <guozr1997@hotmail.com>