xc-llm-ascend/vllm_ascend at 76d0ba4342c6ae91f802aa10ee17cca47330f2ec

EngineX/xc-llm-ascend

zouyida2052 2b4f7a5016 [cherry-pick pr-4254] bugfix for mtp>1 when lm_head_tp>1 (#4360)

### What this PR does / why we need it?
Previously, the dummy run executed compute_logits only once, regardless
of num_speculative_tokens. As a result, execute_model hung in
compute_logits when lm head tensor parallelism was greater than 1. The
fix makes the dummy run execute compute_logits in a way that matches
num_speculative_tokens.

Signed-off-by: zouyida2052 <zouyida2002@gmail.com>

2025-12-01 11:11:15 +08:00
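
The failure mode described in the commit message above can be sketched with a toy simulation. This is only an illustration under assumptions the source does not state: that compute_logits with lm head tensor parallelism involves a collective operation in which every participating rank must make the same number of calls, and that one rank can be in a dummy run while another is in execute_model. A `threading.Barrier` stands in for that collective. The names `run_rank`, `simulate`, and `dummy_matches` are hypothetical and are not part of vllm_ascend.

```python
import threading


def run_rank(barrier, num_calls, results, name):
    """Call the stand-in collective num_calls times and record the outcome."""
    try:
        for _ in range(num_calls):
            # Stand-in for the collective assumed to sit inside compute_logits.
            # The timeout turns an indefinite hang into a detectable error.
            barrier.wait(timeout=0.5)
        results[name] = "ok"
    except threading.BrokenBarrierError:
        results[name] = "hang"


def simulate(num_speculative_tokens, dummy_matches):
    """Run one 'execute_model' rank and one 'dummy_run' rank side by side."""
    barrier = threading.Barrier(2)
    results = {}
    # Simplifying assumption: the real path calls the collective once per
    # speculative token.
    real_calls = num_speculative_tokens
    # Old behaviour: the dummy run calls it once. Fixed behaviour: it matches.
    dummy_calls = num_speculative_tokens if dummy_matches else 1
    threads = [
        threading.Thread(
            target=run_rank,
            args=(barrier, real_calls, results, "execute_model"),
        ),
        threading.Thread(
            target=run_rank,
            args=(barrier, dummy_calls, results, "dummy_run"),
        ),
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


if __name__ == "__main__":
    print("mismatched:", simulate(3, dummy_matches=False))
    print("matched:   ", simulate(3, dummy_matches=True))
```

In this sketch the mismatched case leaves the `execute_model` rank waiting on a call that its peer never makes, while a single speculative token shows no problem, which is consistent with the bug appearing only for mtp>1.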

| Name | Last commit | Date |
| --- | --- | --- |
| `..` | | |
| | For nz unset in bf16&fp16 (#4495) | 2025-11-28 17:32:25 +08:00 |
| | [Bugfix][Aclgraph] failed to update graph task (#4282) | 2025-11-19 21:30:48 +08:00 |
| | [BugFix][Cherry-pick] Cherry-pick PR 3675 to v0.11.0-dev (#3732) | 2025-10-25 09:41:51 +08:00 |
| `device_allocator` | [Misc]Clean up useless import from vllm (#2049) | 2025-07-28 16:01:59 +08:00 |
| | [Cherry-pick] [0.11.0] pd proxy support ipv6 and fix proxy (#4242) | 2025-11-18 16:33:00 +08:00 |
| | [bugfix] dep ineffective (#4416) | 2025-11-29 15:19:11 +08:00 |
| | [Bugfix][LoRA] Fix forward error and shape mismatch when using LoRA (#3153) | 2025-09-28 17:30:50 +08:00 |
| | For nz unset in bf16&fp16 (#4495) | 2025-11-28 17:32:25 +08:00 |
| | [Quickfix] update CachedRequestState as NewRequestData changed (#2367) | 2025-08-15 07:35:27 +08:00 |
| | [bugfix] dep ineffective (#4416) | 2025-11-29 15:19:11 +08:00 |
| | [Cherry-pick][0.11.0] Adapted to torch_npu.npu_fused_infer_attention_score (#4202) | 2025-11-17 10:56:23 +08:00 |
| | [v0.11.0-dev][Bugfix][cherry-pick]bugfix for weight load of kimi-k2 (#4190) | 2025-11-14 15:43:22 +08:00 |
| | Drop 0.10.2 (#3284) | 2025-10-09 10:28:38 +08:00 |
| | [cherry-pick pr-4254] bugfix for mtp>1 when lm_head_tp>1 (#4360) | 2025-12-01 11:11:15 +08:00 |
| | [bugfix] dep ineffective (#4416) | 2025-11-29 15:19:11 +08:00 |
| | [cherry-pick pr-4254] bugfix for mtp>1 when lm_head_tp>1 (#4360) | 2025-12-01 11:11:15 +08:00 |
| `__init__.py` | [Refactor] Adapt deepseek-v3.2 to vllm 0.11.0 (#3432) | 2025-10-15 17:48:58 +08:00 |
| `ascend_config.py` | [main] support cpu binding (#3546) | 2025-10-21 09:17:03 +08:00 |
| `ascend_forward_context.py` | Revert "[cherry-pick][refactor]support gatingtopk operator generalization (#4050)" (#4352) | 2025-11-21 23:03:20 +08:00 |
| `cpu_binding.py` | [main] support cpu binding (#3546) | 2025-10-21 09:17:03 +08:00 |
| `envs.py` | [Cherry-pick] [0.11.0] pd proxy support ipv6 and fix proxy (#4242) | 2025-11-18 16:33:00 +08:00 |
| `meta_registration.py` | Fix the bugs about operator registration by PyTorch Dispatcher (#2786) | 2025-09-13 11:58:52 +08:00 |
| `platform.py` | [MM][Bugfix] Add error log for VL models when enabling FLASHCOMM (#4222) | 2025-11-21 15:04:35 +08:00 |
| `utils.py` | For nz unset in bf16&fp16 (#4495) | 2025-11-28 17:32:25 +08:00 |