xc-llm-ascend

Files

Shaoxu Cheng 2064afe380 [300I][Bugfix] fix unquant model weight nd2nz error (#6851)

### What this PR does / why we need it?
- This PR fixes an error in the ND-to-NZ weight format conversion for
unquantized models running on Ascend 310P devices.

- The changes refactor the logic for converting weights to the
FRACTAL_NZ format. Previously, this was handled in a 310P-specific
linear layer implementation (`AscendUnquantizedLinearMethod310`). This
implementation has been removed, and the logic is now centralized in the
`maybe_trans_nz` utility function, which now checks whether the device
is a 310P and, if so, casts `float16`/`bfloat16` weights to the NZ
format.

- This refactoring simplifies the code by removing platform-specific
duplication and ensures correct weight handling for unquantized models
on 310P.
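The gating rule described above (cast only on 310P, only for `float16`/`bfloat16`) can be sketched as follows. This is a minimal illustration under stated assumptions, not the repository's implementation: the real `maybe_trans_nz` operates on torch tensors and calls into the NPU runtime, whereas here the `Weight` dataclass, the injected `is_310p` callable, and the `cast_to_nz_stub` helper are hypothetical stand-ins so the decision can be shown without NPU hardware.

```python
from dataclasses import dataclass, replace
from typing import Callable

# Dtypes that, per the PR description, receive the NZ cast on 310P.
NZ_CAST_DTYPES = frozenset({"float16", "bfloat16"})


@dataclass(frozen=True)
class Weight:
    """Hypothetical stand-in for a weight tensor: only the fields the decision uses."""
    dtype: str        # e.g. "float16", "bfloat16", "int8"
    fmt: str = "ND"   # storage format; weights start in plain ND layout


def cast_to_nz_stub(weight: Weight) -> Weight:
    """Hypothetical placeholder for the real NPU format cast; it only relabels the format."""
    return replace(weight, fmt="FRACTAL_NZ")


def maybe_trans_nz(
    weight: Weight,
    is_310p: Callable[[], bool],
    format_cast: Callable[[Weight], Weight] = cast_to_nz_stub,
) -> Weight:
    """Sketch (assumed signature): cast to FRACTAL_NZ only on 310P and only for fp16/bf16."""
    if is_310p() and weight.dtype in NZ_CAST_DTYPES:
        return format_cast(weight)
    # Any other device or dtype is returned unchanged.
    return weight


if __name__ == "__main__":
    print(maybe_trans_nz(Weight("float16"), is_310p=lambda: True).fmt)
    print(maybe_trans_nz(Weight("int8"), is_310p=lambda: True).fmt)
```

Injecting the device check and the cast function keeps the sketch testable off-device; a single shared helper like this is what removes the need for a separate 310P-specific linear method.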

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Unit tests and local testing.
- vLLM version: v0.15.0
- vLLM main: 83b47f67b1

---------

Signed-off-by: Tflowers-0129 <2906339855@qq.com>

2026-03-03 15:57:26 +08:00

__init__.py

[Feat][310p] 310P support w8a8s quantization and saving w8a8sc state (#6878)

2026-03-02 20:09:15 +08:00

registry.py

[Feat.]: support 310p w8a8 (#6454)

2026-02-03 14:13:06 +08:00

w8a8_dynamic.py

[Feat] 310p support MoE W8A8 quantization (#6641)

2026-02-10 17:17:44 +08:00

w8a8_static.py

[300I][Bugfix] fix unquant model weight nd2nz error (#6851)

2026-03-03 15:57:26 +08:00

w8a8s.py

[300I][Bugfix] fix unquant model weight nd2nz error (#6851)

2026-03-03 15:57:26 +08:00