xc-llm-ascend/vllm_ascend at 1bc61031e580ecafc4e95d779cdf5ff747909c71

EngineX/xc-llm-ascend

Yizhou 1bc61031e5 [v0.11.0][Fix] Cap max tokens to prevent potential OOM (#3720) (#3744)

### What this PR does / why we need it?
Caps the calculated maximum number of tokens at 512.

This prevents allocating an excessively large buffer when no cudagraph
capture size is specified, reducing the risk of out-of-memory (OOM)
errors.
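The capping logic described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from the PR: the names `MAX_TOKENS_CAP`, `max_num_tokens_for_buffer`, `capture_sizes`, and `fallback_max_tokens` are hypothetical, and it is assumed that the largest capture size is used when sizes are given, while an uncapped fallback (for example a batched-token limit) would otherwise be used.

```python
# Illustrative sketch only: all names here are hypothetical, not the vllm_ascend API.
from typing import Optional, Sequence

MAX_TOKENS_CAP = 512  # the cap value stated in the PR description


def max_num_tokens_for_buffer(
    capture_sizes: Optional[Sequence[int]],
    fallback_max_tokens: int,
    cap: int = MAX_TOKENS_CAP,
) -> int:
    """Return how many tokens a preallocated buffer should hold.

    Assumption: with explicit graph capture sizes, the largest one is used;
    without them, the value falls back to `fallback_max_tokens`, which may be
    very large. Either way the result is clamped to `cap` so the buffer
    cannot grow unbounded.
    """
    computed = max(capture_sizes) if capture_sizes else fallback_max_tokens
    return min(computed, cap)


if __name__ == "__main__":
    print(max_num_tokens_for_buffer(None, 8192))          # 512 (capped)
    print(max_num_tokens_for_buffer([1, 2, 4, 8], 8192))  # 8 (already below cap)
```

Clamping with `min` keeps the common case (small, explicitly configured capture sizes) unchanged and only affects the fallback path that could otherwise size the buffer to a very large token count.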

### Does this PR introduce _any_ user-facing change?
None.

### How was this patch tested?
None.

Signed-off-by: Yizhou Liu <liu_yizhou@outlook.com>

2025-10-25 15:46:56 +08:00

[v0.11.0][Perf] Eliminating the zerolike operator through patch (#3632)

2025-10-23 14:49:28 +08:00

[Feat] Make full graph mode compatible with MTP (#3276)

2025-10-17 20:19:56 +08:00

[BugFix][Cherry-pick] Cherry-pick PR 3675 to v0.11.0-dev (#3732)

2025-10-25 09:41:51 +08:00

device_allocator

[Misc] Clean up useless import from vllm (#2049)

2025-07-28 16:01:59 +08:00

[0.11.0] cherry-pick from #3747 (#3746)

2025-10-25 14:21:30 +08:00

[CI] Add EPLB CI (#3568)

2025-10-21 22:58:02 +08:00

[Bugfix][LoRA] Fix forward error and shape mismatch when using LoRA (#3153)

2025-09-28 17:30:50 +08:00

Revert "[Feat] Shared expert dp for deepseek and deepseek_mtp (#3495)" (#3586)

2025-10-21 22:24:30 +08:00

[Quickfix] update CachedRequestState as NewRequestData changed (#2367)

2025-08-15 07:35:27 +08:00

[cherry-pick][Feat] Add mrope fusion op #3708 (#3735)

2025-10-25 11:41:23 +08:00

[cherry-pick][main] patch sched_yield (#3648) (#3687)

2025-10-24 00:24:58 +08:00

[v0.11.0][bugfix] Add 'layer_type' param to get_pergroup_param() for compatibility (#3684)

2025-10-23 21:26:50 +08:00

Drop 0.10.2 (#3284)

2025-10-09 10:28:38 +08:00

unify logic between aclgraph and torchair (#3602)

2025-10-22 21:55:06 +08:00

[BugFix] Check all expert maps when using multiple instances. (#3662)

2025-10-24 17:10:31 +08:00

[v0.11.0][Fix] Cap max tokens to prevent potential OOM (#3720) (#3744)

2025-10-25 15:46:56 +08:00

__init__.py

[Refactor] Adapt deepseek-v3.2 to vllm 0.11.0 (#3432)

2025-10-15 17:48:58 +08:00

ascend_config.py

[main] support cpu binding (#3546)

2025-10-21 09:17:03 +08:00

ascend_forward_context.py

[Bugfix] fix logging and d2h bug for flash comm1 (#3505)

2025-10-17 21:13:41 +08:00

cpu_binding.py

[main] support cpu binding (#3546)

2025-10-21 09:17:03 +08:00

envs.py

[Feat] Flash comm allgather ep (#3334)

2025-10-15 19:36:32 +08:00

meta_registration.py

Fix the bugs about operator registration by PyTorch Dispatcher (#2786)

2025-09-13 11:58:52 +08:00

platform.py

Revert "[Feat] Shared expert dp for deepseek and deepseek_mtp (#3495)" (#3586)

2025-10-21 22:24:30 +08:00

utils.py

[cherry-pick][Feat] Add mrope fusion op #3708 (#3735)

2025-10-25 11:41:23 +08:00