HongtaoYang 80a4265717 [Feat] Support separate attention backend for target and draft model. (#7342)
### What this PR does / why we need it?
This PR enables separate attention backend configuration for the target and
draft models in speculative decoding, decoupling the attention backend
settings that were previously shared between the two models.

It resolves the compatibility issue where some draft models do not support
the attention backend used by the target model, and lets users select the
optimal attention backend for each model individually to maximize inference
performance. The change is fully backward compatible.
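As a rough illustration of the backward-compatible decoupling described above, the sketch below models the idea with a small config class. All names here (`SpeculativeDecodingConfig`, the backend strings, `resolve_draft_backend`) are hypothetical and do not reflect the PR's actual API; the point is only the fallback behavior: when no draft backend is set, the draft inherits the target's backend, matching the old coupled behavior.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SpeculativeDecodingConfig:
    # Attention backend for the target model (hypothetical name).
    target_attention_backend: str = "TARGET_BACKEND"
    # Optional override for the draft model; None falls back to the
    # target's backend, preserving the previous coupled behavior.
    draft_attention_backend: Optional[str] = None

    def resolve_draft_backend(self) -> str:
        # Explicit draft setting wins; otherwise reuse the target backend.
        return self.draft_attention_backend or self.target_attention_backend

# Backward-compatible default: draft inherits the target backend.
legacy = SpeculativeDecodingConfig()
assert legacy.resolve_draft_backend() == "TARGET_BACKEND"

# New capability: the draft model picks its own backend.
split = SpeculativeDecodingConfig(draft_attention_backend="DRAFT_BACKEND")
print(split.resolve_draft_backend())
```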
---------
Signed-off-by: SidaoY <1024863041@qq.com>
2026-03-21 10:48:01 +08:00