xc-llm-ascend/worker at fe4cad24e9efa97235a5ebff10b62d8a4d981ddc

EngineX/xc-llm-ascend

wangbj127 0c659e91ed [MTP][Bugfix] Fix GLM5-W8A8 precision issues caused by rotary quant MTP weights (#7139)

### What this PR does / why we need it?
When the GLM5 target model uses rotary quant, the final hidden states passed
to MTP need an extra rotation applied.

- vLLM version: v0.16.0
- vLLM main: 4034c3d32e
---------
Signed-off-by: Wangbingjie <wangbj1207@126.com>
Signed-off-by: wangbj127 <256472688+wangbj127@users.noreply.github.com>

2026-03-12 20:01:24 +08:00

__init__.py

[MTP][Bugfix] Fix GLM5-W8A8 precision issues caused by rotary quant MTP weights (#7139)

2026-03-12 20:01:24 +08:00

patch_bert.py

[Lint]Style: Convert vllm-ascend/ to ruff format (Batch #10) (#6173)

2026-02-06 15:35:06 +08:00

patch_cudagraph.py

[main][bugfix] Fixed the problem of speculative decoding in FULL mode (#7148)

2026-03-12 14:51:12 +08:00

patch_deepseek_mtp.py

[MTP][Bugfix] Fix GLM5-W8A8 precision issues caused by rotary quant MTP weights (#7139)

2026-03-12 20:01:24 +08:00

patch_distributed.py

[Lint]Style: Convert vllm-ascend/ to ruff format (Batch #10) (#6173)

2026-02-06 15:35:06 +08:00

patch_draft_quarot.py

[main][feature] Support quarot for eagle3 without embedding (#7038)

2026-03-09 10:43:06 +08:00

patch_huanyuan_vl.py

[Lint]Style: Convert vllm-ascend/ to ruff format (Batch #10) (#6173)

2026-02-06 15:35:06 +08:00

patch_kimi_k25.py

[Bugfix] Support Kimi-K2.5 models (#6755)

2026-02-25 14:51:46 +08:00

patch_minimax_m2_linear_attn.py

[Model] Support Minimax-m2.5 on NPU (#7105)

2026-03-11 00:12:02 +08:00

patch_minimax_m2.py

[Model] Support Minimax-m2.5 on NPU (#7105)

2026-03-11 00:12:02 +08:00

patch_module.py

[Lint]Style: Convert vllm-ascend/ to ruff format (Batch #10) (#6173)

2026-02-06 15:35:06 +08:00

patch_multimodal_merge.py

[bugfix] Fix parameter ordering bug in _merge_multimodal_embeddings (#7068)

2026-03-09 16:05:52 +08:00

patch_npugraph_ex_triton.py

[npugraph_ex] enable npugraph_ex by default (#6664)

2026-02-12 08:44:06 +08:00

patch_qwen3_5.py

Add patch_qwen3_5 for triton ops fused_recurrent_gated_delta_rule (#7109)

2026-03-10 23:28:58 +08:00

patch_qwen3_next_mtp.py

[Main2Main] Upgrade vLLM to 0226 (#6813)

2026-02-27 16:05:21 +08:00

patch_qwen3_next.py

[P/D] Mooncake Layerwise Connector supports hybrid attention manager with multiple kvcache groups (#7022)

2026-03-10 23:59:20 +08:00

patch_rejection_sampler.py

[Lint]Style: Convert vllm-ascend/ to ruff format (Batch #10) (#6173)

2026-02-06 15:35:06 +08:00

patch_routed_experts_capturer.py

[Feat] Support routing replay (#6696)

2026-02-26 10:22:47 +08:00

patch_triton.py

[Lint]Style: Convert vllm-ascend/ to ruff format (Batch #10) (#6173)

2026-02-06 15:35:06 +08:00

patch_unquantized_gemm.py

[Lint]Style: Convert vllm-ascend/ to ruff format (Batch #10) (#6173)

2026-02-06 15:35:06 +08:00

patch_v2_eagle.py

[Main2Main] Upgrade vLLM to 0226 (#6813)

2026-02-27 16:05:21 +08:00

patch_v2_uva.py

[Feature] adapt to uva buffer and main2main (#6657)

2026-02-12 10:36:31 +08:00