xc-llm-ascend

Files

Zhijun Chen 463910e686 [Bugfix] use module-level import for patched function in Qwen3Next (#4354)

### What this PR does / why we need it?

**Problem**: The Qwen3Next model implementation currently imports
`chunk_gated_delta_rule` directly using `from ... import ...`.

In frameworks like `verl`, the model file is often imported before
`vllm-ascend` initializes and applies its patches. This causes the model
to permanently hold a reference to the original (unpatched) vLLM kernel,
resulting in execution errors on Ascend devices even if the patch runs
later.

**Solution**: Change the import style to `from vllm...ops import chunk`
and call `chunk.chunk_gated_delta_rule()`.

This ensures that the function is looked up on the `chunk` module at
call time (late binding) instead of being bound once at import time, so
the model picks up the patched function regardless of import order.
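
The difference between the two import styles can be reproduced without vLLM. The following is a minimal sketch, not the actual vLLM or vllm-ascend code: `fake_ops`, `_original_kernel`, and `_patched_kernel` are hypothetical stand-ins, and the "patch" is simulated by reassigning a module attribute after the model code has already imported it.

```python
import sys
import types

# Hypothetical stand-in for the ops module that owns the kernel.
ops = types.ModuleType("fake_ops")


def _original_kernel():
    return "original"


def _patched_kernel():
    return "patched"


ops.chunk_gated_delta_rule = _original_kernel
sys.modules["fake_ops"] = ops  # make the fake module importable

# Style 1: binds the function object itself at import time.
from fake_ops import chunk_gated_delta_rule

# Style 2: binds only the module; the attribute is resolved on each call.
import fake_ops


def call_direct():
    # Uses the name captured by the `from ... import ...` statement.
    return chunk_gated_delta_rule()


def call_via_module():
    # Looks up the attribute on the module at call time.
    return fake_ops.chunk_gated_delta_rule()


# Simulate a patch that is applied after the model file was imported.
ops.chunk_gated_delta_rule = _patched_kernel

direct_result = call_direct()
module_result = call_via_module()
print(direct_result, module_result)
```

In this sketch `call_direct()` still reaches the original function, while `call_via_module()` reaches the replacement, which mirrors the import-order problem described above.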

- vLLM version: v0.11.0
- vLLM main: 2918c1b49c

Signed-off-by: zjchenn <zjchenn@gmail.com>

2025-11-25 20:15:43 +08:00

layers

Drop 0.11.0 support (#4377)

2025-11-24 17:08:20 +08:00

__init__.py

[Feat] Adapted mtp function to Qwen3-next (#3918)

2025-11-07 16:39:03 +08:00

qwen2_5_vl_without_padding.py

Drop 0.10.2 (#3284)

2025-10-09 10:28:38 +08:00

qwen2_5_vl.py

Drop 0.11.0 support (#4377)