xc-llm-ascend

Files

linfeng-yuan d452d04656 [A5][bugfix] Fix fused MoE A5 MXFP8 scale normalization, load-balance routing and gating_topk ops (#7573 )

### What this PR does / why we need it?
This PR fixes A5 MXFP8 MoE scale handling in the fused MoE path.

- It normalizes MXFP8 activation scales to the packed 3D layout expected
by A5 kernels, including both precomputed dynamic_scale inputs and gmm1
output scales before they are consumed by downstream grouped matmul ops.
- It also refines the MXFP8 force load-balancing path in profiling runs.
- This PR also enables npu_gating_top_k from torch_npu instead of custom
op when running ascend950 chip.
### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
CI and E2E serving tests on Ascend950DT passed.

---------

Signed-off-by: linfeng-yuan <1102311262@qq.com>

2026-03-25 17:20:28 +08:00

__init__.py

[Feature]Supports DSv3.1 PD separation and C8 quantization (#7222 )

2026-03-16 22:49:05 +08:00

base.py

[refactor] replace scattered business kwargs with typed request objects and explicit stage boundaries (#7024 )

2026-03-20 23:23:57 +08:00

kv_c8.py

[Bugfix]Fix deepseek 3.2 C8 precision by rotary tensor (#7537 )

2026-03-25 09:18:00 +08:00

registry.py

[Lint]Style: Convert vllm-ascend/ to ruff format(Batch #7 ) (#6023 )