xc-llm-ascend

Files

zouyida2052 2b4f7a5016 [cherry-pick pr-4254] bugfix for mtp>1 when lm_head_tp>1 (#4360)

### What this PR does / why we need it?
Previously, the dummy run executed compute_logits only once, regardless
of num_speculative_tokens. As a result, execute_model hung on
compute_logits when lm head tensor parallelism was greater than 1. The
fix makes the dummy run execute compute_logits the number of times
implied by num_speculative_tokens.
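
The hang described above can be illustrated with a toy model. This is a minimal sketch, not the code from this PR. It assumes that with lm head tensor parallelism each compute_logits call involves a collective across the lm head TP group, that a real step calls compute_logits once per speculative token, and that one rank may be in a dummy run while another executes a real step. The helper names (`real_step_calls`, `dummy_run_calls`, `would_hang`) are hypothetical and exist only for illustration.

```python
# Toy model: a collective only completes if every rank in the group enters it
# the same number of times. All names here are hypothetical illustrations.

def real_step_calls(num_speculative_tokens: int) -> int:
    # Assumption: a real step calls compute_logits once per speculative token.
    return num_speculative_tokens

def dummy_run_calls(num_speculative_tokens: int, fixed: bool) -> int:
    # Before the fix the dummy run called compute_logits exactly once;
    # after the fix (as assumed here) it matches num_speculative_tokens.
    return num_speculative_tokens if fixed else 1

def would_hang(calls_per_rank: list[int]) -> bool:
    # Ranks disagree on how many collectives to enter, so at least one rank
    # waits forever for peers that never arrive.
    return len(set(calls_per_rank)) > 1

if __name__ == "__main__":
    n = 3  # num_speculative_tokens > 1, i.e. the "mtp>1" case
    before = [real_step_calls(n), dummy_run_calls(n, fixed=False)]
    after = [real_step_calls(n), dummy_run_calls(n, fixed=True)]
    print("before fix hangs:", would_hang(before))
    print("after fix hangs:", would_hang(after))
```

Under these assumptions the mismatch only appears when num_speculative_tokens is greater than 1, which is consistent with the "mtp>1" wording in the commit title.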

Signed-off-by: zouyida2052 <zouyida2002@gmail.com>

2025-12-01 11:11:15 +08:00

__init__.py

[Refactor] Refactor Spec Decode (#2668)

2025-09-04 11:34:47 +08:00

eagle_proposer.py

[cherry-pick pr-4254] bugfix for mtp>1 when lm_head_tp>1 (#4360)

2025-12-01 11:11:15 +08:00

interface.py

[Feat] mtp aclgraph support (#3244)

2025-10-17 18:14:49 +08:00

mtp_proposer.py

[cherry-pick pr-4254] bugfix for mtp>1 when lm_head_tp>1 (#4360)

2025-12-01 11:11:15 +08:00

ngram_proposer.py

[0.11.0][Bugfix] Fix ngram precision issue and open e2e ngram test (#4092)

2025-11-11 09:58:03 +08:00