xc-llm-ascend/worker at 52d9086f64efdcf6bbb9327d9c41eed1ee4b74b5

EngineX/xc-llm-ascend

zhaomingyu13 52d9086f64 [Bugfix] Fix the acceptance rate drop issue when applying eagle3 to QuaRot model (#6914)

### What this PR does / why we need it?
When the target model has undergone rotational quantization, the
acceptance rate decreases because the fc weight of the draft model has
not undergone the same rotational quantization (issue: #6445). This PR
fixes the issue by applying rotational quantization to the fc weight of
the draft model, in the same way as for the main model, when the draft
model is loaded.
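
The mismatch described above can be illustrated with a small, self-contained sketch. It is not the code from this PR. It assumes the rotation is an orthogonal matrix `Q` applied to the target model's hidden states, and that the draft model's fc layer computes `y = h @ W`. The names `Q`, `h`, `W`, `hadamard`, and `matmul` are hypothetical and exist only for this example.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(r) for r in zip(*m)]


def hadamard(n):
    """Normalized Sylvester Hadamard matrix (n must be a power of two).

    Used here only as a convenient example of an orthogonal rotation.
    """
    h = [[1.0]]
    while len(h) < n:
        h = [r + r for r in h] + [r + [-x for x in r] for r in h]
    scale = 1.0 / math.sqrt(n)
    return [[x * scale for x in r] for r in h]


# Assumption: the rotated target model emits hidden states in the basis h @ Q.
Q = hadamard(4)
h = [[0.5, -1.0, 2.0, 0.25]]  # hidden state in the original basis (1 x 4)
W = [[1.0, 0.0], [0.0, 1.0], [2.0, -1.0], [0.5, 0.5]]  # toy fc weight (4 x 2)

y_ref = matmul(h, W)           # what the draft fc layer was trained to produce
h_rot = matmul(h, Q)           # hidden state coming from the rotated target model
y_mismatch = matmul(h_rot, W)  # unrotated fc weight applied to rotated input

# Rotating the fc weight with the same Q restores the original output,
# because (h @ Q) @ (Q^T @ W) = h @ W when Q is orthogonal.
W_rot = matmul(transpose(Q), W)
y_fixed = matmul(h_rot, W_rot)

print("reference:", y_ref)
print("mismatch: ", y_mismatch)
print("fixed:    ", y_fixed)
```

In this toy setting, the unrotated weight gives an output that differs from the reference, while the rotated weight reproduces it. That is consistent with the acceptance-rate drop reported in the issue, although the actual patch may handle details such as weight layout and quantization differently.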

- vLLM version: v0.16.0
- vLLM main: 15d76f74e2

Signed-off-by: zhaomingyu <zhaomingyu13@h-partners.com>

2026-03-04 11:29:49 +08:00

| File | Last commit | Date |
| --- | --- | --- |
| __init__.py | [Bugfix] Fix the acceptance rate drop issue when applying eagle3 to QuaRot model (#6914) | 2026-03-04 11:29:49 +08:00 |
| patch_bert.py | [Lint]Style: Convert vllm-ascend/ to ruff format(Batch #10) (#6173) | 2026-02-06 15:35:06 +08:00 |
| patch_distributed.py | [Lint]Style: Convert vllm-ascend/ to ruff format(Batch #10) (#6173) | 2026-02-06 15:35:06 +08:00 |
| patch_huanyuan_vl.py | [Lint]Style: Convert vllm-ascend/ to ruff format(Batch #10) (#6173) | 2026-02-06 15:35:06 +08:00 |
| patch_kimi_k25.py | [Bugfix] Support Kimi-K2.5 models (#6755) | 2026-02-25 14:51:46 +08:00 |
| patch_module.py | [Lint]Style: Convert vllm-ascend/ to ruff format(Batch #10) (#6173) | 2026-02-06 15:35:06 +08:00 |
| patch_multimodal_merge.py | [Lint]Style: Convert vllm-ascend/ to ruff format(Batch #10) (#6173) | 2026-02-06 15:35:06 +08:00 |
| patch_npugraph_ex_triton.py | [npugraph_ex]enable npugraph_ex by default (#6664) | 2026-02-12 08:44:06 +08:00 |
| patch_qwen3_next_mtp.py | [Main2Main] Upgrade vLLM to 0226 (#6813) | 2026-02-27 16:05:21 +08:00 |
| patch_qwen3_next.py | [Lint]Style: Convert vllm-ascend/ to ruff format(Batch #10) (#6173) | 2026-02-06 15:35:06 +08:00 |
| patch_qwen3_quarot.py | [Bugfix] Fix the acceptance rate drop issue when applying eagle3 to QuaRot model (#6914) | 2026-03-04 11:29:49 +08:00 |
| patch_rejection_sampler.py | [Lint]Style: Convert vllm-ascend/ to ruff format(Batch #10) (#6173) | 2026-02-06 15:35:06 +08:00 |
| patch_routed_experts_capturer.py | [Feat] Support routing replay (#6696) | 2026-02-26 10:22:47 +08:00 |
| patch_triton.py | [Lint]Style: Convert vllm-ascend/ to ruff format(Batch #10) (#6173) | 2026-02-06 15:35:06 +08:00 |
| patch_unquantized_gemm.py | [Lint]Style: Convert vllm-ascend/ to ruff format(Batch #10) (#6173) | 2026-02-06 15:35:06 +08:00 |
| patch_v2_eagle.py | [Main2Main] Upgrade vLLM to 0226 (#6813) | 2026-02-27 16:05:21 +08:00 |
| patch_v2_uva.py | [Feature] adapt to uva buffer and main2main (#6657) | 2026-02-12 10:36:31 +08:00 |