xc-llm-ascend/vllm_ascend at 9fbd8017c0d1e6e09ac4568ff16931118a56ab12

EngineX/xc-llm-ascend

Angazenn 9fbd8017c0 [Quantization]300I Duo support w8a8 quantization (#1560)

### What this PR does / why we need it?
This PR adds W8A8 quantization support on the 300I Duo platform. The main change is replacing
`npu_grouped_matmul` with `npu_quant_grouped_matmul_dequant`.
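The commit does not show the kernel itself, so the following is only a rough mental model: a pure-Python sketch of what a W8A8 (int8 weights, int8 activations) grouped matmul with fused dequantization computes. It assumes symmetric int8 quantization, one scale per activation row (per token), one scale per weight output column (per channel), and rows split into consecutive groups, one weight matrix per group (as in MoE experts). These choices and every function name below are illustrative assumptions, not the signature or exact semantics of torch_npu's `npu_quant_grouped_matmul_dequant`.

```python
# Hypothetical reference sketch of a W8A8 grouped matmul with dequantization.
# Assumptions (NOT taken from torch_npu): symmetric int8, per-row activation
# scales, per-column weight scales, consecutive row groups.
from typing import List, Tuple

Matrix = List[List[float]]
IntMatrix = List[List[int]]


def quantize_per_row(x: Matrix) -> Tuple[IntMatrix, List[float]]:
    """Symmetric int8 quantization with one scale per row: q = round(x / s)."""
    q, scales = [], []
    for row in x:
        amax = max(abs(v) for v in row)
        s = amax / 127.0 if amax > 0 else 1.0
        q.append([max(-128, min(127, round(v / s))) for v in row])
        scales.append(s)
    return q, scales


def quantize_per_col(w: Matrix) -> Tuple[IntMatrix, List[float]]:
    """Symmetric int8 quantization with one scale per output column of a [K][N] weight."""
    cols = [list(c) for c in zip(*w)]        # transpose [K][N] -> [N][K]
    q_cols, scales = quantize_per_row(cols)  # quantize each column as a row
    return [list(r) for r in zip(*q_cols)], scales  # transpose back to [K][N]


def quant_grouped_matmul_dequant(x_q, x_scale, w_q, w_scale, group_sizes) -> Matrix:
    """Group g covers the next group_sizes[g] rows and uses weight w_q[g].

    int8 x int8 products are accumulated as Python ints (standing in for int32),
    then dequantized by multiplying with the row scale and the column scale.
    """
    out: Matrix = []
    row = 0
    for g, n_rows in enumerate(group_sizes):
        wq, ws = w_q[g], w_scale[g]
        k, n = len(wq), len(wq[0])
        for _ in range(n_rows):
            acc = [sum(x_q[row][i] * wq[i][j] for i in range(k)) for j in range(n)]
            out.append([acc[j] * x_scale[row] * ws[j] for j in range(n)])
            row += 1
    return out


def grouped_matmul(x: Matrix, w: List[Matrix], group_sizes: List[int]) -> Matrix:
    """Float reference with the same grouping and no quantization."""
    out: Matrix = []
    row = 0
    for g, n_rows in enumerate(group_sizes):
        k, n = len(w[g]), len(w[g][0])
        for _ in range(n_rows):
            out.append([sum(x[row][i] * w[g][i][j] for i in range(k)) for j in range(n)])
            row += 1
    return out


# Usage: 3 tokens routed as [2, 1] to two hypothetical experts, hidden size 2.
x = [[0.5, -1.0], [1.5, 0.25], [-0.75, 2.0]]
w = [[[1.0, -0.5], [0.25, 2.0]], [[-1.0, 0.5], [1.5, 0.75]]]
groups = [2, 1]

x_q, x_s = quantize_per_row(x)
w_q, w_s = zip(*(quantize_per_col(wg) for wg in w))
approx = quant_grouped_matmul_dequant(x_q, x_s, list(w_q), list(w_s), groups)
exact = grouped_matmul(x, w, groups)
```

The int8 path tracks the float result up to quantization error; fusing the dequantization into the grouped matmul avoids a separate pass that rescales the integer accumulators.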

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Offline inference on 310P runs normally.

---------

Signed-off-by: angazenn <zengyanjia@huawei.com>
Signed-off-by: tianyitang <tangtianyi4@huawei.com>
Co-authored-by: angazenn <zengyanjia@huawei.com>
Co-authored-by: tianyitang <tangtianyi4@huawei.com>

2025-07-03 22:12:46 +08:00


| Name | Last commit | Date |
|---|---|---|
| | [Bugfix] Add func swap_states to fix MLA attention (#1580) | 2025-07-02 17:42:53 +08:00 |
| | [CI] Upgrade vllm to 0.9.1 (#1165) | 2025-06-11 16:33:11 +08:00 |
| | [ModelRunner] Use shared CachedRequestData cross request to fix ci (#1546) | 2025-07-02 06:05:21 +08:00 |
| device_allocator | [Build] Add build info (#1386) | 2025-06-27 09:14:43 +08:00 |
| | [bugfix] some bugs maybe fail to run (#896) | 2025-06-03 11:07:33 +08:00 |
| | [Bugfix] fix import error (#600) | 2025-04-22 08:57:25 +08:00 |
| | support pangumoe w8a8c8 and docs (#1477) | 2025-06-28 18:51:07 +08:00 |
| | [perf]: support dual-batch overlap(dbo) for deepseek (#941) | 2025-06-07 16:46:58 +08:00 |
| | [CI] Fix FusedMoEConfig and input batch failure to recover CI (#1602) | 2025-07-03 18:36:17 +08:00 |
| | [CI] Fix FusedMoEConfig and input batch failure to recover CI (#1602) | 2025-07-03 18:36:17 +08:00 |
| | [Bugfix] Add func swap_states to fix MLA attention (#1580) | 2025-07-02 17:42:53 +08:00 |
| | [Quantization]300I Duo support w8a8 quantization (#1560) | 2025-07-03 22:12:46 +08:00 |
| | Spec decode support for V1 Engine (#874) | 2025-05-23 14:25:46 +08:00 |
| | [Quantization]300I Duo support w8a8 quantization (#1560) | 2025-07-03 22:12:46 +08:00 |
| __init__.py | [CI] Patch torch.library.infer_schema for fused moe ops to fix CI (#854) | 2025-05-14 19:49:09 +08:00 |
| ascend_config.py | [CI] Add unit test framework (#1201) | 2025-06-16 18:32:28 +08:00 |
| envs.py | support fused_moe_allgather_ep (#1335) | 2025-06-23 22:03:38 +08:00 |
| platform.py | support pangumoe w8a8c8 and docs (#1477) | 2025-06-28 18:51:07 +08:00 |
| utils.py | [Quantization]300I Duo support w8a8 quantization (#1560) | 2025-07-03 22:12:46 +08:00 |