xc-llm-ascend/vllm_ascend at 418a43e2a2f853282c4337ebac2455cd8f316b6f

EngineX/xc-llm-ascend

ZYang6263 418a43e2a2 [Bugfix] Fix seq_lens reset issue causing performance degradation (#6158 )

### What this PR does / why we need it?
`seq_lens` was not reset after each step because the code that clears
the sequence lengths was missing. As a result, when a smaller batch was
processed after a larger one, the `seq_lens` entries from the larger
batch were carried over. The attention operator then computed with
unnecessarily large sequence lengths, which increased the computation
load and degraded performance.
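
The failure mode described above can be illustrated with a minimal sketch. It assumes a preallocated buffer that is reused across steps; the names `MAX_BATCH`, `prepare_step`, and `attention_work` are hypothetical and are not taken from the vllm-ascend code, and plain Python lists stand in for device tensors.

```python
# Hypothetical illustration only: these names are not from vllm-ascend.
MAX_BATCH = 4


def prepare_step(buffer, new_lens, reset):
    """Write this step's sequence lengths into a buffer reused across steps."""
    n = len(new_lens)
    buffer[:n] = new_lens
    if reset:
        # Clear the tail so entries from an earlier, larger batch do not leak.
        buffer[n:] = [0] * (len(buffer) - n)
    return buffer


def attention_work(buffer):
    """Stand-in for attention cost: total sequence length the kernel sees."""
    return sum(buffer)


# Without the reset: a 4-request batch is followed by a 2-request batch.
stale = [0] * MAX_BATCH
prepare_step(stale, [512, 640, 768, 1024], reset=False)
prepare_step(stale, [16, 32], reset=False)

# With the reset: same two steps, but the tail is cleared each time.
clean = [0] * MAX_BATCH
prepare_step(clean, [512, 640, 768, 1024], reset=True)
prepare_step(clean, [16, 32], reset=True)

print(stale, attention_work(stale))
print(clean, attention_work(clean))
```

In this sketch the buffer without the reset still holds the two trailing lengths from the larger batch, so the stand-in cost is far higher than the two live requests justify. The actual fix in the PR may clear the lengths differently.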



### Does this PR introduce _any_ user-facing change?


### How was this patch tested?

- vLLM version: v0.13.0
- vLLM main: d68209402d

Signed-off-by: ZYang6263 <zy626375@gmail.com>

2026-01-23 11:29:54 +08:00

[CI] optimize lint term (#5986 )

2026-01-22 15:46:59 +08:00

_cann_ops_custom

[Kernel] add custom op GmmSwigluQuantWeightNzTensorList (#3804 )

2025-11-28 18:06:39 +08:00

Drop vLLM 0.13.0 support (#6069 )

2026-01-23 09:45:08 +08:00

[Feat] Merge the multi eagle graphs to one graph (#5940 )

2026-01-23 08:37:02 +08:00

[bugfix][mm] change get_num_encoder_tokens to get_num_encoder_embeds in recompute_schedule.py (#5132 )

2026-01-21 09:13:52 +08:00

[Lint]Style: Convert vllm-ascend/ to ruff format(Batch #2 ) (#5977 )

2026-01-19 08:59:46 +08:00

device_allocator

[Lint]Style: Convert vllm-ascend/ to ruff format(Batch #2 ) (#5977 )

2026-01-19 08:59:46 +08:00

[P/D]Add ssl cert for metaserver proxy (#5875 )

2026-01-23 11:11:44 +08:00

[Feature] Adapt DispathGmmCombineDecode opertor to align with weight scale dtype of small operators. [RFC: issue 5476] (#5755 )

2026-01-19 16:10:43 +08:00

Drop vLLM 0.13.0 support (#6069 )

2026-01-23 09:45:08 +08:00

[BufFix]Fix the error when using Ascend custom operators with rank=128 (#5394 )

2026-01-09 15:57:43 +08:00

[BugFix] NetLoader: No backend type associated with device type npu (#5700 )

2026-01-09 15:54:54 +08:00

Drop vLLM 0.13.0 support (#6069 )

2026-01-23 09:45:08 +08:00

Drop vLLM 0.13.0 support (#6069 )

2026-01-23 09:45:08 +08:00

[BugFix] fix 3vl dense model load quant weight (#6100 )

2026-01-22 20:05:25 +08:00

[Feature] add the magicmtp speculative decoding acceleration algorithm (#5542 )

2026-01-08 09:15:55 +08:00

[Feat] Merge the multi eagle graphs to one graph (#5940 )

2026-01-23 08:37:02 +08:00

[Bugfix] Fix seq_lens reset issue causing performance degradation (#6158 )

2026-01-23 11:29:54 +08:00

[CI] optimize lint term (#5986 )

2026-01-22 15:46:59 +08:00

__init__.py

[Lint]Style: Convert vllm-ascend/compilation to ruff format (#5912 )

2026-01-16 20:57:46 +08:00

ascend_config.py

[Feature]refactor the npugraph_ex config, support online-infer with static kernel (#5775 )

2026-01-20 21:31:38 +08:00

ascend_forward_context.py

[Lint]Style: Convert vllm-ascend/compilation to ruff format (#5912 )

2026-01-16 20:57:46 +08:00

batch_invariant.py

[Lint]Style: Convert vllm-ascend/ to ruff format(Batch #2 ) (#5977 )

2026-01-19 08:59:46 +08:00

cpu_binding.py

[Lint]Style: Convert vllm-ascend/compilation to ruff format (#5912 )

2026-01-16 20:57:46 +08:00

envs.py

Default enable MLAPO (#5952 )

2026-01-22 09:26:39 +08:00

flash_common3_context.py

[Lint]Style: Convert vllm-ascend/compilation to ruff format (#5912 )

2026-01-16 20:57:46 +08:00

meta_registration.py

[Lint]Style: Convert vllm-ascend/compilation to ruff format (#5912 )

2026-01-16 20:57:46 +08:00

platform.py

[Graph][Fusion] Add QKVNormRope and QKVNormRopeWithBias (#5721 )

2026-01-22 17:22:41 +08:00

profiling_config.py

[Lint]Style: Convert vllm-ascend/compilation to ruff format (#5912 )

2026-01-16 20:57:46 +08:00

utils.py

[Feature]Enable DispatchGmmCombineDecode when eagle is moe with w8a8 or not moe [RFC: issue 5476] (#5758 )

2026-01-22 10:51:02 +08:00