xc-llm-ascend/vllm_ascend at 68d8d20ca249bd9b5c5ca510c08591192aa5c6b6

EngineX/xc-llm-ascend

linfeng-yuan 68d8d20ca2 [misc] move mxfp_compat into device to decouple from quantization init chain (#6918)

### What this PR does / why we need it?
`mxfp_compat` only provides dtype/symbol compatibility helpers for
different `torch_npu` versions, but it was placed under
`vllm_ascend.quantization`. Importing it from device/ops paths could
trigger `quantization/__init__.py` and pull in heavy quantization method
dependencies, increasing startup coupling and causing import-cycle risk
(especially on 310P paths).
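
The coupling described above can be reproduced in isolation. The following is a minimal sketch, not the actual `vllm_ascend` code: `demo_pkg`, `heavy_methods`, and all file contents are hypothetical stand-ins, and it assumes the package `__init__.py` eagerly imports its heavy submodules. It relies only on standard Python import behavior: importing a submodule first executes the `__init__.py` of every parent package.

```python
import importlib
import sys
import tempfile
from pathlib import Path

HEAVY = "demo_pkg.quantization.heavy_methods"


def build_demo_package(root: Path) -> None:
    """Create a throwaway package mimicking the old and new helper locations."""
    files = {
        "demo_pkg/__init__.py": "",
        # Hypothetical stand-in: a package __init__ that eagerly imports heavy code.
        "demo_pkg/quantization/__init__.py": "from . import heavy_methods\n",
        "demo_pkg/quantization/heavy_methods.py": "LOADED = True\n",
        "demo_pkg/quantization/mxfp_compat.py": "HELPER = 'old location'\n",
        # Lightweight package: its __init__ imports nothing.
        "demo_pkg/device/__init__.py": "",
        "demo_pkg/device/mxfp_compat.py": "HELPER = 'new location'\n",
    }
    for rel_path, source in files.items():
        path = root / rel_path
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(source)


def modules_loaded_by(import_name: str) -> set[str]:
    """Import `import_name` from a clean state and return the demo_pkg modules loaded."""
    stale = [m for m in sys.modules if m == "demo_pkg" or m.startswith("demo_pkg.")]
    for name in stale:
        del sys.modules[name]
    importlib.invalidate_caches()
    importlib.import_module(import_name)
    return {m for m in sys.modules if m == "demo_pkg" or m.startswith("demo_pkg.")}


with tempfile.TemporaryDirectory() as tmp:
    build_demo_package(Path(tmp))
    sys.path.insert(0, tmp)
    try:
        old_path = modules_loaded_by("demo_pkg.quantization.mxfp_compat")
        new_path = modules_loaded_by("demo_pkg.device.mxfp_compat")
    finally:
        sys.path.remove(tmp)

print("heavy module loaded via old location:", HEAVY in old_path)
print("heavy module loaded via new location:", HEAVY in new_path)
```

In this sketch, importing the helper from the `quantization` package also loads `heavy_methods`, while importing it from the `device` package does not, which is the decoupling the move is meant to achieve.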

### Does this PR introduce _any_ user-facing change?
No functional behavior change intended.

### How was this patch tested?
CI passed.

- vLLM version: v0.16.0
- vLLM main: 15d76f74e2

---------

Signed-off-by: linfeng-yuan <1102311262@qq.com>

2026-03-02 18:17:01 +08:00

[misc] move mxfp_compat into device to decouple from quantization init chain (#6918)

2026-03-02 18:17:01 +08:00

_cann_ops_custom

[Kernel] add custom op GmmSwigluQuantWeightNzTensorList (#3804)

2025-11-28 18:06:39 +08:00

[Feat] support basic pcp&dcp for qwen3next (#6091)

2026-02-28 21:44:08 +08:00

[Triton][Config] Add muls_add triton kernel and refactor AscendCompilationConfig (#5518)

2026-03-02 17:54:25 +08:00

[BugFix] Support ALL D-Nodes in fullgraph when running MTP in PD (#5472)

2026-02-26 19:09:05 +08:00

[misc] move mxfp_compat into device to decouple from quantization init chain (#6918)

2026-03-02 18:17:01 +08:00

device_allocator

[Lint]Style: Convert vllm-ascend/ to ruff format(Batch #2) (#5977)

2026-01-19 08:59:46 +08:00

[BugFix][PCP] Fix precision bugs for pcp/dcp in PD disaggregate (#6876)

2026-03-02 16:11:00 +08:00

[EPLB] Reduce the memory used for heat aggregation (#6729)

2026-02-24 18:02:24 +08:00

[Main2Main] Upgrade vLLM to 0226 (#6813)

2026-02-27 16:05:21 +08:00

[Lint]Style: Convert vllm-ascend/ to ruff format(Batch #5) (#5996)

2026-01-24 22:45:38 +08:00

[Lint]Style: Convert vllm-ascend/ to ruff format(Batch #6) (#6001)

2026-01-24 22:08:33 +08:00

[misc] move mxfp_compat into device to decouple from quantization init chain (#6918)

2026-03-02 18:17:01 +08:00

[Main2Main] Upgrade vLLM to 0226 (#6813)

2026-02-27 16:05:21 +08:00

[misc] move mxfp_compat into device to decouple from quantization init chain (#6918)

2026-03-02 18:17:01 +08:00

clean 0.15.0 support (#6852)

2026-02-28 09:20:57 +08:00

[Refactor][EAGLE] 7/N Merged PCP and disable_padded interface (#6811)

2026-02-27 16:06:56 +08:00

[Triton][Config] Add muls_add triton kernel and refactor AscendCompilationConfig (#5518)

2026-03-02 17:54:25 +08:00

[Lint]Style: Convert vllm-ascend/ to ruff format(Batch #10) (#6173)

2026-02-06 15:35:06 +08:00

__init__.py

[Lint]Style: Convert vllm-ascend/compilation to ruff format (#5912)

2026-01-16 20:57:46 +08:00

ascend_config.py

[Triton][Config] Add muls_add triton kernel and refactor AscendCompilationConfig (#5518)

2026-03-02 17:54:25 +08:00

ascend_forward_context.py

add mxfp8 moe quantization (#6670)

2026-03-02 11:04:06 +08:00

batch_invariant.py

implement batch invariant with ascendc (#6590)

2026-02-10 14:15:26 +08:00

cpu_binding.py

[Platform] Fix CPU binding logic (#6889)

2026-03-01 20:30:43 +08:00

envs.py

[MISC] Clean up useless env USE_OPTIMIZED_MODEL (#6618)

2026-02-09 15:38:58 +08:00

flash_common3_context.py

[Lint]Style: Convert vllm-ascend/compilation to ruff format (#5912)

2026-01-16 20:57:46 +08:00

meta_registration.py

[Ops][Refactor] Remove custom rotary_embedding operator (#6523)

2026-02-07 09:24:05 +08:00

platform.py

[Triton][Config] Add muls_add triton kernel and refactor AscendCompilationConfig (#5518)

2026-03-02 17:54:25 +08:00

profiling_config.py

[Core][Misc] Clean up ProfileExecuteDuration (#6461)

2026-02-01 20:06:01 +08:00

utils.py

[300I] support decode-only aclgraph mode (#6849)

2026-03-02 14:15:14 +08:00