xc-llm-ascend/vllm_ascend
无脸男 ace300a549 [Bugfix] Fix the abnormal NPU memory usage in full graph mode. (#3331)
### What this PR does / why we need it?

In full graph mode, the paged attention operators require parameter updates, so the parameters of these operators must be retained. However, tensors such as the query, key cache, and value cache do not need to be persistently saved; we can manually release their space via `weak_ref_tensor` to save memory.
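The idea can be illustrated with a minimal Python sketch using the standard `weakref` module in place of vLLM's `weak_ref_tensor` (which operates on device tensors): long-lived operator parameters keep strong references, while transient buffers are held only weakly so the allocator can reclaim them. The `AttentionParams` and `Workspace` names below are hypothetical and only stand in for the real structures.

```python
import weakref


class AttentionParams:
    """Hypothetical stand-in for operator parameters that must be retained
    across graph replays (e.g. for paged attention operator updates)."""

    def __init__(self, scale: float) -> None:
        self.scale = scale  # kept alive: needed for later operator updates


class Workspace:
    """Hypothetical stand-in for a transient tensor (query / kv cache slot)."""


def run_step(params: AttentionParams):
    # Transient buffer: after use, keep only a weak reference so the
    # memory can be reclaimed (analogous to weak_ref_tensor on the NPU).
    buf = Workspace()
    ref = weakref.ref(buf)
    del buf  # drop the last strong reference; the object becomes collectable
    return params.scale, ref


params = AttentionParams(scale=0.125)
scale, ref = run_step(params)
print(ref() is None)  # transient workspace already reclaimed
print(scale)          # operator parameters survive
```

This mirrors the fix's intent: only what the operator updates actually need stays strongly referenced, and everything else is released eagerly rather than pinned for the lifetime of the captured graph.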

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

Signed-off-by: WithHades <244036962@qq.com>
2025-10-11 10:20:10 +08:00