xc-llm-ascend/vllm_ascend at 44a4ff6960b9d4edbfd8df52695cdb1655009f39

EngineX/xc-llm-ascend

drslark 44a4ff6960 [main][BugFix] Avoided a bug of torch_npu.npu_mm_reduce_scatter_base when sp size >= 16 (#6168)

### What this PR does / why we need it?
If `sp` is enabled and `tp_size` >= 16,
`torch_npu.npu_mm_reduce_scatter_base` raises an exception.
After consulting with the operator developer, we learned that the
operator does not work when `tp` = 16.
Therefore, we disable the operator when `tp` = 16.
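The gating logic and a non-fused fallback can be sketched in plain Python. This is a minimal illustration, not the code from this PR: `use_fused_mm_reduce_scatter`, `FUSED_MM_RS_MIN_UNSUPPORTED_TP`, and the list-based `matmul`/`reduce_scatter` helpers are hypothetical. It assumes, following the PR title, that sizes of 16 and above are excluded, and that the disabled fused call is replaced by a separate matmul followed by a reduce-scatter. It also assumes a row-parallel layout: each rank holds a slice of the input's hidden dimension and the matching rows of the weight, so each per-rank product is a partial sum of the full output that the reduce-scatter combines and splits along the token dimension.

```python
from typing import List

# A matrix is modeled as a list of rows; this keeps the sketch stdlib-only.
Matrix = List[List[int]]

# Hypothetical cutoff: the PR title says the fused operator fails when the
# sp/tp size is >= 16, so this sketch only allows the fused path below 16.
FUSED_MM_RS_MIN_UNSUPPORTED_TP = 16


def use_fused_mm_reduce_scatter(sp_enabled: bool, tp_size: int) -> bool:
    """Hypothetical gate: take the fused matmul + reduce-scatter path only
    when sequence parallelism is on and tp_size is below the cutoff."""
    return sp_enabled and tp_size < FUSED_MM_RS_MIN_UNSUPPORTED_TP


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def reduce_scatter(partials: List[Matrix]) -> List[Matrix]:
    """Element-wise sum of every rank's matrix, then split the rows into
    equal contiguous blocks, one per rank (a reduce-scatter along dim 0)."""
    world = len(partials)
    rows, cols = len(partials[0]), len(partials[0][0])
    if rows % world != 0:
        raise ValueError("row count must divide evenly across ranks")
    total = [[sum(p[i][j] for p in partials) for j in range(cols)] for i in range(rows)]
    chunk = rows // world
    return [total[r * chunk:(r + 1) * chunk] for r in range(world)]


def unfused_mm_reduce_scatter(x_shards: List[Matrix], w_shards: List[Matrix]) -> List[Matrix]:
    """Fallback path: each rank multiplies its slice of the hidden dimension,
    producing a partial sum of the full output; a separate reduce-scatter
    then combines the partials and hands each rank its block of tokens."""
    partials = [matmul(x, w) for x, w in zip(x_shards, w_shards)]
    return reduce_scatter(partials)


# Usage: 4 tokens, hidden size 2 split across tp_size=2 ranks, 3 output features.
x = [[1, 2], [3, 4], [5, 6], [7, 8]]
w = [[1, 0, 1], [0, 1, 1]]
x_shards = [[[row[r]] for row in x] for r in range(2)]  # column r of x per rank
w_shards = [[w[r]] for r in range(2)]                   # row r of w per rank
out = unfused_mm_reduce_scatter(x_shards, w_shards)
print(out)  # [[[1, 2, 3], [3, 4, 7]], [[5, 6, 11], [7, 8, 15]]]
print(use_fused_mm_reduce_scatter(True, 8), use_fused_mm_reduce_scatter(True, 16))  # True False
```

Keeping the decision in a single predicate means the fused path could be re-enabled in one place if a later operator release supports larger sizes.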

### Does this PR introduce _any_ user-facing change?
N/A

### How was this patch tested?

We started a server with `sp` enabled and `tp` = 16.

It started successfully.

```text
(APIServer pid=1855938) INFO:     Started server process [1855938]
(APIServer pid=1855938) INFO:     Waiting for application startup.
(APIServer pid=1855938) INFO:     Application startup complete.
```

- vLLM version: v0.13.0
- vLLM main: d68209402d

Signed-off-by: drslark <slarksblood@qq.com>

2026-01-23 21:12:23 +08:00

..

[CI] optimize lint term (#5986)

2026-01-22 15:46:59 +08:00

_cann_ops_custom

[Kernel] add custom op GmmSwigluQuantWeightNzTensorList (#3804)

2025-11-28 18:06:39 +08:00

[Refactor] Quantization Module Refactor (#5738)

2026-01-23 14:13:47 +08:00

[Feat] Merge the multi eagle graphs to one graph (#5940)

2026-01-23 08:37:02 +08:00

[bugfix][mm] change get_num_encoder_tokens to get_num_encoder_embeds in recompute_schedule.py (#5132)

2026-01-21 09:13:52 +08:00

[Lint]Style: Convert vllm-ascend/ to ruff format(Batch #2) (#5977)

2026-01-19 08:59:46 +08:00

device_allocator

[Lint]Style: Convert vllm-ascend/ to ruff format(Batch #2) (#5977)

2026-01-19 08:59:46 +08:00

[Bugfix]KV pool rank 0 consumes more HBM (#6113)

2026-01-23 19:47:33 +08:00

[EPLB][Bugfix]Reduce unnecessary video memory usage (#6020)

2026-01-23 14:21:13 +08:00

Drop vLLM 0.13.0 support (#6069)

2026-01-23 09:45:08 +08:00

[BufFix]Fix the error when using Ascend custom operators with rank=128 (#5394)

2026-01-09 15:57:43 +08:00

[BugFix] NetLoader: No backend type associated with device type npu (#5700)

2026-01-09 15:54:54 +08:00

[feature] add_rms_norm support bias (#5790)

2026-01-23 21:09:54 +08:00

Drop vLLM 0.13.0 support (#6069)

2026-01-23 09:45:08 +08:00

[Refactor] Quantization Module Refactor (#5738)

2026-01-23 14:13:47 +08:00

[Feature] add the magicmtp speculative decoding acceleration algorithm (#5542)

2026-01-08 09:15:55 +08:00

[Bugfix] Fix the issue of the acceptance rate decline for Qwen3-30B-A3B-EAGLE3 (#6138)

2026-01-23 16:12:56 +08:00

[EPLB][Bugfix]Reduce unnecessary video memory usage (#6020)

2026-01-23 14:21:13 +08:00

[CI] optimize lint term (#5986)

2026-01-22 15:46:59 +08:00

__init__.py

[Lint]Style: Convert vllm-ascend/compilation to ruff format (#5912)

2026-01-16 20:57:46 +08:00

ascend_config.py

[bugfix] align max_num_batched_tokens with tp*pcp when using FLASHCOMM1 (#6000)

2026-01-23 14:19:49 +08:00

ascend_forward_context.py

[main][BugFix] Avoided a bug of torch_npu.npu_mm_reduce_scatter_base when sp size >= 16 (#6168)

2026-01-23 21:12:23 +08:00

batch_invariant.py

[Lint]Style: Convert vllm-ascend/ to ruff format(Batch #2) (#5977)

2026-01-19 08:59:46 +08:00

cpu_binding.py

[Lint]Style: Convert vllm-ascend/compilation to ruff format (#5912)

2026-01-16 20:57:46 +08:00

envs.py

Default enable MLAPO (#5952)

2026-01-22 09:26:39 +08:00

flash_common3_context.py

[Lint]Style: Convert vllm-ascend/compilation to ruff format (#5912)

2026-01-16 20:57:46 +08:00

meta_registration.py

[Lint]Style: Convert vllm-ascend/compilation to ruff format (#5912)

2026-01-16 20:57:46 +08:00

platform.py

[Refactor] Quantization Module Refactor (#5738)

2026-01-23 14:13:47 +08:00

profiling_config.py

[Lint]Style: Convert vllm-ascend/compilation to ruff format (#5912)

2026-01-16 20:57:46 +08:00

utils.py

[Feature]Enable DispatchGmmCombineDecode when eagle is moe with w8a8 or not moe [RFC: issue 5476] (#5758)

2026-01-22 10:51:02 +08:00