
linfeng-yuan 068ed706c8 [feat][torchair] support super kernel feat for quantized dsr1 (#3485 )

### What this PR does / why we need it?
Port #1916 and #2157 to the master branch to fuse operators in DeepSeek MoE
layers, which reduces scheduling overhead on devices. Note that this
feature takes effect only when `tp_size = 1` and
`multistream_overlap_shared_expert` is enabled in torchair graph mode.

### Does this PR introduce _any_ user-facing change?
Users can enable this feature with `--additional-config
'{"torchair_graph_config":{"enabled":true, "enable_super_kernel":true},
"multistream_overlap_shared_expert":true}'`.
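For reference, here is a minimal shell sketch of the same settings. It assumes bash and `python3` on the PATH. `MODEL_PATH` and `validate_super_kernel_config` are hypothetical names. The script only checks that the JSON enables each prerequisite listed above and prints the `vllm serve` command instead of launching a server. It does not check the `tp_size = 1` requirement.

```bash
# Hypothetical helper: exit 0 only if the JSON enables every prerequisite
# named in this PR (torchair graph mode, super kernel, shared-expert overlap).
validate_super_kernel_config() {
  python3 - "$1" <<'PY'
import json
import sys

try:
    cfg = json.loads(sys.argv[1])
    graph = cfg["torchair_graph_config"]
    ok = (graph["enabled"] is True
          and graph["enable_super_kernel"] is True
          and cfg["multistream_overlap_shared_expert"] is True)
except (ValueError, KeyError, TypeError):
    ok = False  # malformed JSON, a missing key, or a non-object value
sys.exit(0 if ok else 1)
PY
}

# Placeholder model path (hypothetical); override it via the environment.
MODEL_PATH="${MODEL_PATH:-/path/to/quantized-deepseek-r1}"
ADDITIONAL_CONFIG='{"torchair_graph_config":{"enabled":true,"enable_super_kernel":true},"multistream_overlap_shared_expert":true}'

if validate_super_kernel_config "$ADDITIONAL_CONFIG"; then
  # Print instead of running, so the sketch works without an Ascend device.
  # tp_size = 1 is a further requirement that this check does not cover.
  printf "vllm serve %s --additional-config '%s'\n" "$MODEL_PATH" "$ADDITIONAL_CONFIG"
else
  echo "additional-config is missing a super kernel prerequisite" >&2
fi
```

Keeping the JSON on a single line inside single quotes prevents the shell from touching the inner double quotes. A wrapped version like the one above is still valid JSON, because the newline is ordinary whitespace.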

### How was this patch tested?
E2E DeepSeek serving in 2P1D disaggregated prefill scenarios.


- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

---------

Signed-off-by: linfeng-yuan <1102311262@qq.com>

2025-10-20 20:04:37 +08:00

# vLLM Ascend Plugin documents

Live doc: https://vllm-ascend.readthedocs.io

## Build the docs

```bash
# Install dependencies.
pip install -r requirements-docs.txt

# Build the docs.
make clean
make html

# Build the docs with translation.
make intl

# Open the docs with your browser.
python -m http.server -d _build/html/
```

Launch your browser and open:

- English version: http://localhost:8000
- Chinese version: http://localhost:8000/zh_CN
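Before you open these URLs, you can confirm that both builds exist. This avoids a confusing 404 in the browser. The sketch assumes that `make html` writes `_build/html/index.html` and that `make intl` writes the Chinese build to `_build/html/zh_CN/index.html`. That layout is inferred from the URLs above, not read from the Makefile. `check_docs_build` is a hypothetical name.

```bash
# Hypothetical helper: succeed only if both language builds exist under
# the given HTML root (default: _build/html). The layout is an assumption.
check_docs_build() {
  local html_root="${1:-_build/html}"
  local missing=0
  local page
  for page in "$html_root/index.html" "$html_root/zh_CN/index.html"; do
    if [ ! -f "$page" ]; then
      echo "missing: $page (run 'make html' and 'make intl' first)" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_docs_build && python -m http.server -d _build/html/` starts the preview server only when both index pages are present. The server uses port 8000 because that is the default for `http.server`.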