xc-llm-ascend/vllm_ascend at 6f04b467deb351bc648df5386c0e8f218c7ddfa4

EngineX/xc-llm-ascend

xuyexiong 79821106e6 [BugFix]Fix mtp torchair bug caused by #2719 (#3566 )

### What this PR does / why we need it?
Fix the MTP torchair bug caused by #2719.
Since FIA needs extra space for padding, we need to enforce
`self.max_num_seqs > self.scheduler_config.max_num_seqs` in KV consumer
+ MTP mode.
This means that `self.max_num_seqs` is strictly greater than the actual
maximum number of requests (`self.scheduler_config.max_num_seqs`).
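
The constraint described above can be illustrated with a minimal sketch. This is not the code from the PR: `SchedulerConfig`, `padded_max_num_seqs`, and the `extra_slots` parameter are hypothetical names used only for illustration, and the real amount of padding that FIA needs is not stated in this description.

```python
from dataclasses import dataclass


@dataclass
class SchedulerConfig:
    # Hypothetical stand-in for the scheduler configuration object.
    max_num_seqs: int


def padded_max_num_seqs(
    scheduler_config: SchedulerConfig,
    is_kv_consumer: bool,
    use_mtp: bool,
    extra_slots: int = 1,  # assumption: the real padding size is not given here
) -> int:
    """Return a runner-side capacity that leaves room for padding.

    In the KV consumer + MTP case the result is strictly greater than
    scheduler_config.max_num_seqs; otherwise the two values are equal.
    """
    if extra_slots < 1:
        raise ValueError("extra_slots must be at least 1")
    base = scheduler_config.max_num_seqs
    if is_kv_consumer and use_mtp:
        return base + extra_slots
    return base


config = SchedulerConfig(max_num_seqs=16)
capacity = padded_max_num_seqs(config, is_kv_consumer=True, use_mtp=True)
# The invariant named in the description holds in the KV consumer + MTP case.
assert capacity > config.max_num_seqs
```

The point of the sketch is only the invariant: the runner-side capacity exceeds the scheduler's request limit in this one mode and matches it elsewhere.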

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?


- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

---------

Signed-off-by: xuyexiong <xuyexiong@huawei.com>

2025-10-21 22:21:44 +08:00

[Feature] Reduce host memory usage for attention mask generation (#3048 )

2025-10-21 20:19:04 +08:00

[Feat]Make full graph mode compatible with MTP (#3276 )

2025-10-17 20:19:56 +08:00

[Bugfix] Route requests requiring KVC recomputation from the decode instance to the P instance (#3448 )

2025-10-18 15:56:44 +08:00

device_allocator

[Misc]Clean up useless import from vllm (#2049 )

2025-07-28 16:01:59 +08:00

Mooncake store use adxl interface (#3350 )

2025-10-21 20:18:17 +08:00

[BugFix]Support redundant experts in EPLB (#3473 )

2025-10-18 00:09:16 +08:00

[Bugfix][LoRA] Fix forward error and shape mismatch when using LoRA (#3153 )

2025-09-28 17:30:50 +08:00

[Model][2/N] Remove deepseek_mtp modeling. (#3561 )

2025-10-21 20:17:09 +08:00

[Quickfix] update CachedRequestState as NewRequestData changed (#2367 )

2025-08-15 07:35:27 +08:00

[BugFix][main] Fixed a triton kernel bug of layer_norm_fwd_kernel for Qwen3-next (#3549 )

2025-10-21 20:20:57 +08:00

[Refactor] refactor patch module (#3555 )

2025-10-21 20:19:46 +08:00

[Feat][quantization] Support new version w4a8 dynamic quantization for Linear layers (#3311 )

2025-10-21 20:18:39 +08:00

Drop 0.10.2 (#3284 )

2025-10-09 10:28:38 +08:00

[Model][2/N] Remove deepseek_mtp modeling. (#3561 )

2025-10-21 20:17:09 +08:00

[BugFix]Fix mtp torchair bug caused by #2719 (#3566 )

2025-10-21 22:21:44 +08:00

[BugFix]Fix mtp torchair bug caused by #2719 (#3566 )

2025-10-21 22:21:44 +08:00

__init__.py

[Refactor] Adapt deepseek-v3.2 to vllm 0.11.0 (#3432 )

2025-10-15 17:48:58 +08:00

ascend_config.py

[main] support cpu binding (#3546 )

2025-10-21 09:17:03 +08:00

ascend_forward_context.py

[Bugfix] fix logging and d2h bug for flash comm1 (#3505 )

2025-10-17 21:13:41 +08:00

cpu_binding.py

[main] support cpu binding (#3546 )

2025-10-21 09:17:03 +08:00

envs.py

[Feat] Flash comm allgather ep (#3334 )

2025-10-15 19:36:32 +08:00

meta_registration.py

Fix the bugs about operator registration by PyTorch Dispatcher (#2786 )

2025-09-13 11:58:52 +08:00

platform.py

[Chore] Prevents use of ASCEND_LAUNCH_BLOCKING with ACL Graph (#3574 )

2025-10-21 20:17:33 +08:00

utils.py

Revert "Add mrope op fusion (#3509 )" (#3562 )

2025-10-20 20:19:24 +08:00