xc-llm-ascend/vllm_ascend at 22a1d91cf5f18e373c8bac8fbb1575ffae9724a9

EngineX/xc-llm-ascend

Yizhou 4536123341 [Fix] Fix mc2_tokens_capacity-related issues (#3411)

### What this PR does / why we need it?
Replaces the hardcoded `mc2_tokens_capacity` with the max graph capture
size for a more accurate allocation.

This change ensures the capacity is correctly sized relative to the
graph capture configuration, removing a magic number and making the
setup more robust.
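
The idea above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the helper name `derive_mc2_tokens_capacity`, its parameters, and the fallback to the old value when no capture sizes are configured are all assumptions made for the example.

```python
from typing import Optional, Sequence

# The previously hardcoded value mentioned in this PR description.
LEGACY_CAPACITY = 512


def derive_mc2_tokens_capacity(
    capture_sizes: Optional[Sequence[int]],
    fallback: int = LEGACY_CAPACITY,
) -> int:
    """Hypothetical helper: size the MC2 token buffer from graph capture sizes.

    Assumption: when no capture sizes are configured, fall back to the
    legacy constant. The real fallback behavior may differ.
    """
    if not capture_sizes:
        return fallback
    # The buffer only needs to hold the largest batch that is ever captured.
    return max(capture_sizes)


if __name__ == "__main__":
    print(derive_mc2_tokens_capacity([1, 2, 4, 8, 256]))
    print(derive_mc2_tokens_capacity(None))
```

With a largest capture size below `512`, a helper like this would reserve a smaller buffer than the old constant did, which is the saving described for large models.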

This PR fixes two issues:

1. <del>MC2 op restrictions differ between SoCs.</del> @Angazenn This
requires an overhaul, so it has been removed from this PR; please submit
it as a separate PR.
2. The hardcoded value `512` allocates too much buffer for large models.

### Does this PR introduce _any_ user-facing change?
None.

### How was this patch tested?
Tested in daily checks.


- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

---------

Signed-off-by: Yizhou Liu <liu_yizhou@outlook.com>

2025-10-14 10:56:12 +08:00

[Bugfix] Fix the abnormal NPU memory usage in full graph mode. (#3331)

2025-10-11 10:20:10 +08:00

[Feat][Graph]Support FULL_DECODE_ONLY mode for MLA models (#3125)

2025-10-10 16:31:20 +08:00

[BugFix] Fix ascend scheduler assert error (#3191)

2025-09-28 18:22:08 +08:00

device_allocator

[Misc]Clean up useless import from vllm (#2049)

2025-07-28 16:01:59 +08:00

[Feature] mooncake connector support GQA transport (#2947)

2025-10-13 15:48:37 +08:00

Bugfix: Expose the user policy type interface (#3336)

2025-10-11 16:28:57 +08:00

[Bugfix][LoRA] Fix forward error and shape mismatch when using LoRA (#3153)

2025-09-28 17:30:50 +08:00

[MoE] [Refactor] Combine common_fused_moe and fused_moe (#3176)

2025-10-09 14:12:46 +08:00

[Quickfix] update CachedRequestState as NewRequestData changed (#2367)

2025-08-15 07:35:27 +08:00

[Feature] optimize sp & qwen3 next support sp. (#3225)

2025-10-13 23:02:12 +08:00

[feat] support customized and separated hccl_buffer_size for process group initialization (#3073)

2025-10-11 15:55:22 +08:00

[Feature] Add W4A4 Flat Quantization support (#3427)

2025-10-13 23:20:16 +08:00

Drop 0.10.2 (#3284)

2025-10-09 10:28:38 +08:00

bugfix for mtp (#3300)

2025-10-09 19:22:46 +08:00

[Feature] optimize sp & qwen3 next support sp. (#3225)

2025-10-13 23:02:12 +08:00

[Fix] Fix mc2_tokens_capacity-related issues (#3411)

2025-10-14 10:56:12 +08:00

__init__.py

【bugfix】fix connector register failed (#3335)

2025-10-09 21:09:54 +08:00

ascend_config.py

Bugfix: Expose the user policy type interface (#3336)

2025-10-11 16:28:57 +08:00

ascend_forward_context.py

Revert PTA upgrade PR (#3352)

2025-10-10 14:09:53 +08:00

envs.py

Add DeepSeek V3.2 support (#3270)

2025-09-30 03:25:58 +08:00

meta_registration.py

Fix the bugs about operator registration by PyTorch Dispatcher (#2786)

2025-09-13 11:58:52 +08:00

platform.py

[Feat][Graph]Support FULL_DECODE_ONLY mode for MLA models (#3125)

2025-10-10 16:31:20 +08:00

utils.py

[Feat] enable hierarchical communication for mc2 ops on A2 (#3015)

2025-10-13 16:13:17 +08:00