xc-llm-ascend

Files

anon189Ty 07e39620ea [Feat] Unquantized Linear to nz and control all nz-cast (#3356)

### What this PR does / why we need it?
Currently, when vLLM-Ascend executes the Linear layers of a model, the
weights stay in ND format in both the unquantized case and the
skipped-ascend case. This PR extends the execution logic of the Linear
layer with a new environment variable, VLLM_ASCEND_ENABLE_NZ. When
VLLM_ASCEND_ENABLE_NZ=1 and the CANN version is 8.3, the weights of the
Linear layer are converted to FRACTAL_NZ in both the unquantized and
skipped-ascend cases. VLLM_ASCEND_ENABLE_NZ also controls the existing
NZ conversions, such as the one in the W8A8-quantized case.
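To make the FRACTAL_NZ conversion more concrete, here is a conceptual, pure-Python sketch of how a 2-D weight in ND (row-major) layout maps to the NZ layout as it is commonly described for Ascend: fractals ordered column by column ("N"), each fractal stored row-major ("Z"). It assumes 16 x 16 fractals (the usual fp16 case) and zero padding of partial blocks. The helper `nd_to_nz` is hypothetical and only illustrates the index mapping; it is not how vLLM-Ascend or torch_npu perform the cast.

```python
from typing import List

Matrix = List[List[float]]


def nd_to_nz(weight: Matrix, m0: int = 16, n0: int = 16) -> List[List[Matrix]]:
    """Rearrange a 2-D ND (row-major) matrix into FRACTAL_NZ block order.

    Hypothetical helper. Result is indexed [n1][m1][i][j], i.e. shape
    (N1, M1, M0, N0). Element (m, n) lands at [n // n0][m // m0][m % m0][n % n0]:
    fractals are ordered column by column ("N"), each fractal is row-major ("Z").
    Partial fractals are zero-padded (assumption).
    """
    rows, cols = len(weight), len(weight[0])
    m1 = -(-rows // m0)  # ceil(rows / m0)
    n1 = -(-cols // n0)  # ceil(cols / n0)
    nz = [[[[0.0] * n0 for _ in range(m0)] for _ in range(m1)] for _ in range(n1)]
    for m in range(rows):
        for n in range(cols):
            nz[n // n0][m // m0][m % m0][n % n0] = weight[m][n]
    return nz


if __name__ == "__main__":
    w = [[m * 100 + n for n in range(40)] for m in range(20)]  # 20 x 40 weight
    nz = nd_to_nz(w)
    print(len(nz), len(nz[0]))  # 3 2 -> N1 = ceil(40/16), M1 = ceil(20/16)
    print(nz[2][1][1][1])       # 1733 -> element (17, 33)
```

The padding explains why NZ storage can be slightly larger than the ND tensor when a dimension is not a multiple of the fractal size.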

### Does this PR introduce _any_ user-facing change?
Yes. It adds a new environment variable, VLLM_ASCEND_ENABLE_NZ. To use
the NZ format, set VLLM_ASCEND_ENABLE_NZ=1.
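The switching rule described in this PR can be sketched as a small decision function. The case labels (`"unquantized"`, `"skipped_ascend"`, `"w8a8"`), the helper names, treating an unset variable as disabled, and the CANN version string format are all assumptions for illustration; this is not the vLLM-Ascend implementation. The sketch also assumes the CANN 8.3 requirement applies only to the newly added Linear casts, because the description only states that the flag controls the existing W8A8 conversion.

```python
import os
from typing import Mapping, Optional

# Hypothetical case labels for the situations named in the PR description.
NEW_NZ_CASES = {"unquantized", "skipped_ascend"}  # Linear weights newly cast by this PR
EXISTING_NZ_CASES = {"w8a8"}                      # NZ casts that existed before this PR


def nz_flag_set(env: Optional[Mapping[str, str]] = None) -> bool:
    """True when VLLM_ASCEND_ENABLE_NZ is "1"; unset counts as off (assumption)."""
    env = os.environ if env is None else env
    return env.get("VLLM_ASCEND_ENABLE_NZ", "0").strip() == "1"


def is_cann_8_3(cann_version: str) -> bool:
    """Compare major.minor only, e.g. "8.3.RC1" -> (8, 3). Format is assumed."""
    parts = cann_version.split(".")
    if len(parts) < 2 or not (parts[0].isdigit() and parts[1].isdigit()):
        return False
    return (int(parts[0]), int(parts[1])) == (8, 3)


def should_cast_to_nz(case: str, cann_version: str,
                      env: Optional[Mapping[str, str]] = None) -> bool:
    """Decide whether a weight should be cast to FRACTAL_NZ (hypothetical)."""
    if not nz_flag_set(env):
        return False  # the flag gates every NZ cast, new and existing
    if case in NEW_NZ_CASES:
        return is_cann_8_3(cann_version)
    # Assumption: existing casts are modeled as needing only the flag;
    # any other case stays in ND format.
    return case in EXISTING_NZ_CASES


if __name__ == "__main__":
    on = {"VLLM_ASCEND_ENABLE_NZ": "1"}
    print(should_cast_to_nz("unquantized", "8.3.RC1", on))  # True
    print(should_cast_to_nz("unquantized", "8.2.RC1", on))  # False
    print(should_cast_to_nz("w8a8", "8.3.RC1", {}))         # False
```

Checking the flag before anything else reflects the description's point that VLLM_ASCEND_ENABLE_NZ controls all NZ casts, so a single switch turns every conversion off.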

### How was this patch tested?

- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

Signed-off-by: anon189Ty <Stari_Falcon@outlook.com>

2025-10-14 17:39:26 +08:00

layers

Add DeepSeek V3.2 support (#3270)

2025-09-30 03:25:58 +08:00

__init__.py

[MoE] [Refactor] Combine common_fused_moe and fused_moe (#3176)

2025-10-09 14:12:46 +08:00

deepseek_mtp.py

[KVCache][Bugfix] Fix kv cache initialization error of attention layer (#3113)

2025-09-24 11:32:34 +08:00

deepseek_v2.py

[Feat] Unquantized Linear to nz and control all nz-cast (#3356)