Main2main Upgrade vllm commit to 0320 17:00 (#7510)

### What this PR does / why we need it?
Main2main: upgrade the vLLM commit to 0320 17:00.

1. Adapt to vLLM's refactored `_moe_forward`, which now calls `runner.forward_impl_chunked()` when `runner.use_dp_chunking` is True. vLLM PR: "[MoE Refactor] DefaultMoERunner simplification" [#33049](https://github.com/vllm-project/vllm/pull/33049)
2. Adapt to vLLM moving the call to `self._set_compile_ranges()` in `VllmConfig.__post_init__` from **before** `check_and_update_config()` to **after** it (to allow platforms to lower `max_num_batched_tokens` first). vLLM PR: "fix(xpu): Re-compute compile ranges after platform-specific config updates" [#37523](https://github.com/vllm-project/vllm/pull/37523)

### Does this PR introduce _any_ user-facing change?
NA

### How was this patch tested?
NA

- vLLM version: v0.17.0
- vLLM main: 8b6325758c

---------

Signed-off-by: leo-pony <nengjunma@outlook.com>
Co-authored-by: Claude Code <noreply@anthropic.com>
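
The two adaptations described above can be illustrated with a minimal, self-contained sketch. It does not import vLLM or vllm-ascend: `FakeCompilationConfig`, `FakeDefaultRunner`, `FakeAscendRunner`, `get_compile_ranges`, and `moe_forward` are hypothetical stand-ins, and the dispatch logic is a simplified reading of the description above, not the upstream implementation. It also assumes that `compile_ranges_endpoints` can still be unset (modeled here as `None`) when the platform hook reads it, which is what the `or []` guard in the diff suggests.

```python
from dataclasses import dataclass
from typing import List, Optional


# Hypothetical stand-in for the compilation config; only the single field
# touched by this PR is modeled.
@dataclass
class FakeCompilationConfig:
    compile_ranges_endpoints: Optional[List[int]] = None


def get_compile_ranges(compilation_config) -> List[int]:
    # Same guard as the patched AscendConfig._get_compile_ranges: fall back
    # to an empty list when the endpoints are unset (assumed to be None).
    return compilation_config.compile_ranges_endpoints or []


# Hypothetical stand-in for the upstream runner. The real DefaultMoERunner
# has a different constructor and real forward logic.
class FakeDefaultRunner:
    @property
    def use_dp_chunking(self) -> bool:
        # Assumption: upstream may report True under some configurations.
        return True

    def forward_impl(self, x):
        return ("forward_impl", x)

    def forward_impl_chunked(self, x):
        return ("forward_impl_chunked", x)


class FakeAscendRunner(FakeDefaultRunner):
    @property
    def use_dp_chunking(self) -> bool:
        # Overriding the property keeps dispatch on forward_impl.
        return False


def moe_forward(runner, x):
    # Simplified version of the dispatch described in the PR text.
    if runner.use_dp_chunking:
        return runner.forward_impl_chunked(x)
    return runner.forward_impl(x)


if __name__ == "__main__":
    print(get_compile_ranges(FakeCompilationConfig()))
    print(moe_forward(FakeAscendRunner(), "hidden_states"))
```

Overriding the property rather than patching the dispatch function keeps the change local to the subclass, which appears to be the approach taken in the `fused_moe.py` hunk of this PR.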
2026-03-23 21:37:41 +08:00
parent bdd90c0088
commit fcba91a392
8 changed files with 15 additions and 9 deletions
--- a/vllm_ascend/ascend_config.py
+++ b/vllm_ascend/ascend_config.py
@@ -161,7 +161,7 @@ class AscendConfig:

    @staticmethod
    def _get_compile_ranges(compilation_config):
-        return compilation_config.compile_ranges_endpoints
+        return compilation_config.compile_ranges_endpoints or []

    @staticmethod
    def _set_compile_ranges(compilation_config, value):
--- a/vllm_ascend/ops/fused_moe/fused_moe.py
+++ b/vllm_ascend/ops/fused_moe/fused_moe.py
@@ -259,6 +259,12 @@ class AscendMoERunner(DefaultMoERunner):
        else:
            self.moe_forward = torch.ops.vllm.moe_forward_shared

+    @property
+    def use_dp_chunking(self) -> bool:
+        """Ascend uses its own forward_impl path, not the FlashInfer Cutlass
+        chunked path. Always return False to stay on forward_impl."""
+        return False
+
    def forward_impl(
        self,
        layer: torch.nn.Module,