xc-llm-ascend/vllm_ascend at 17dd9ae42c9f1905af5873577f8344f2c3442b33

EngineX/xc-llm-ascend

fems14 17dd9ae42c [0.11.0][bugfix]look up multi_tp key (#3699) (#3723)

### What this PR does / why we need it?
In multi-Tensor Parallel (TP) scenarios, the KV pool queries only the
first GPU card. If a key has already been released on another card, the
query still reports success, which causes accuracy issues. This PR
changes the KV pool's query logic to check all cards, which resolves
the problem.
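
The difference between the two lookup strategies can be illustrated with a minimal sketch. This is not the code from the PR: the per-rank stores are modelled as plain Python sets, and the names `lookup_first_rank_only`, `lookup_all_ranks`, and `count_hit_prefix` are hypothetical, chosen only for illustration.

```python
from typing import Callable, List, Sequence, Set

# Hypothetical model: one set of cached keys per TP rank (card).
RankStores = List[Set[str]]


def lookup_first_rank_only(key: str, stores: RankStores) -> bool:
    """Behaviour described as buggy: only the first card is consulted."""
    return key in stores[0]


def lookup_all_ranks(key: str, stores: RankStores) -> bool:
    """Behaviour described as the fix: a key is a hit only if every card holds it."""
    return all(key in store for store in stores)


def count_hit_prefix(
    keys: Sequence[str],
    stores: RankStores,
    lookup: Callable[[str, RankStores], bool],
) -> int:
    """Count how many leading keys are reported as cached, stopping at the first miss."""
    hits = 0
    for key in keys:
        if not lookup(key, stores):
            break
        hits += 1
    return hits


# Rank 1 has already released key "b"; rank 0 still holds it.
stores = [{"a", "b"}, {"a"}]
stale_hits = count_hit_prefix(["a", "b"], stores, lookup_first_rank_only)
safe_hits = count_hit_prefix(["a", "b"], stores, lookup_all_ranks)
```

In this toy setup the first-rank-only lookup reports both keys as cached even though rank 1 no longer holds `"b"`, while the all-ranks lookup stops after `"a"`.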
### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

Signed-off-by: fems14 <1804143737@qq.com>

2025-10-24 18:22:45 +08:00

[v0.11.0][Perf] Eliminating the zerolike operator through patch (#3632)

2025-10-23 14:49:28 +08:00

[Feat]Make full graph mode compatible with MTP (#3276)

2025-10-17 20:19:56 +08:00

[Bugfix] Route requests requiring KVC recomputation from the decode instance to the P instance (#3448)

2025-10-18 15:56:44 +08:00

device_allocator

[Misc]Clean up useless import from vllm (#2049)

2025-07-28 16:01:59 +08:00

[0.11.0][bugfix]look up multi_tp key (#3699) (#3723)

2025-10-24 18:22:45 +08:00

[CI]Add EPLB CI. (#3568)

2025-10-21 22:58:02 +08:00

[Bugfix][LoRA] Fix forward error and shape mismatch when using LoRA (#3153)

2025-09-28 17:30:50 +08:00

Revert "[Feat] Shared expert dp for deepseek and deepseek_mtp (#3495)" (#3586)

2025-10-21 22:24:30 +08:00

[Quickfix] update CachedRequestState as NewRequestData changed (#2367)

2025-08-15 07:35:27 +08:00

[BugFix] Check all expert maps when using multi instance. (#3662)

2025-10-24 17:10:31 +08:00

[cherry-pick]【main】patch sched_yield (#3648) (#3687)

2025-10-24 00:24:58 +08:00

[v0.11.0][bugfix] Add 'layer_type' param to get_pergroup_param() for compatibility (#3684)

2025-10-23 21:26:50 +08:00

Drop 0.10.2 (#3284)

2025-10-09 10:28:38 +08:00

unify logic between aclgraph and torchair (#3602)

2025-10-22 21:55:06 +08:00

[BugFix] Check all expert maps when using multi instance. (#3662)

2025-10-24 17:10:31 +08:00

[v0.11.0][Fix] Fix attention metadata handling for profiling and MLA (#3636) (#3643)

2025-10-23 10:29:30 +08:00

__init__.py

[Refactor] Adapt deepseek-v3.2 to vllm 0.11.0 (#3432)

2025-10-15 17:48:58 +08:00

ascend_config.py

[main] support cpu binding (#3546)

2025-10-21 09:17:03 +08:00

ascend_forward_context.py

[Bugfix] fix logging and d2h bug for flash comm1 (#3505)

2025-10-17 21:13:41 +08:00

cpu_binding.py

[main] support cpu binding (#3546)

2025-10-21 09:17:03 +08:00

envs.py

[Feat] Flash comm allgher ep (#3334)

2025-10-15 19:36:32 +08:00

meta_registration.py

Fix the bugs about operator registration by PyTorch Dispatcher (#2786)

2025-09-13 11:58:52 +08:00

platform.py

Revert "[Feat] Shared expert dp for deepseek and deepseek_mtp (#3495)" (#3586)

2025-10-21 22:24:30 +08:00

utils.py

[v0.11.0] cherry-pick Fix performance degradation when mtp>1 (#3597) (#3630)

2025-10-22 22:07:39 +08:00