xc-llm-ascend

Files

weijinqian0 6972df5951 [Feature] optimize sp & qwen3 next support sp. (#3225 )

This PR will accomplish the following tasks: 
**optimize SP**
In the old version implementation, the first layer was all_reduce, which
used rms to split chunks. We changed it to perform reduce_scatter on the
embedding side, replace one all_reduce operation and one chunk with one
reduce_scatter operation.
**Support qwen3 next**
Since Qwen3 Next includes a linear attention module, the prefix name of
this module cannot take effect directly.


- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

---------

Signed-off-by: weijinqian_v1 <weijinqian@huawei.com>
Co-authored-by: weijinqian_v1 <weijinqian@huawei.com>

2025-10-13 23:02:12 +08:00

e2e

Add models test and add serval new models yaml (#3394 )

2025-10-12 17:27:50 +08:00

[Feature] optimize sp & qwen3 next support sp. (#3225 )

2025-10-13 23:02:12 +08:00

__init__.py

[SpecDecode] Add spec decode support (#500 )

2025-04-17 20:16:32 +08:00