xc-llm-ascend

Author	SHA1	Message	Date
realliujiaxu	5d12446573	[Feat][SP] Suport SP for VL MoE models (#7044 ) ### What this PR does / why we need it? 2nd PR for https://github.com/vllm-project/vllm-ascend/issues/5712, extend SP to VL MoE models. ### Does this PR introduce _any_ user-facing change? remove `sp_threshold` in additional config and reuse `sp_min_token_num` from vLLM. ### How was this patch tested? - Model: Qwen3-VL-30B-A3B, - TP4 DP2 - 100 reqs - max concurrency 1 \| Seq length \| Mean TTFT (ms) main \| Mean TTFT (ms) this PR \| \|------------\|---------------------\|------------------------\| \| 4k \| 429.40 \| 323.3 \| \| 16k \| 1297.01 \| 911.74 \| - vLLM version: v0.16.0 - vLLM main: `4034c3d32e` --------- Signed-off-by: realliujiaxu <realliujiaxu@163.com>	2026-03-24 17:16:00 +08:00
meihanc	ab9cd2e305	[CI]Add CI summary log (#7202 ) ### What this PR does / why we need it? This PR adds a new CI log summarizer, `ci_log_summary.py`, and wires it into unit-test and e2e workflows so failed jobs publish a structured failure summary to the GitHub step summary. Examples: - `python3 .github/workflows/scripts/ci_log_summary.py --log-file /tmp/unit-test.log --mode ut --step-name "Unit test"` - `python3 .github/workflows/scripts/ci_log_summary.py --run-id 23127187822 --format json` A maintenance note is added to `ci_utils.py` to clarify that the `START` / `PASSED` / `FAILED (exit code X)` log lines are parsed by `ci_log_summary.py`, so any future format changes must be coordinated with the corresponding summarizer regexes. 🤖 Generated with [Codex]<noreply@openai.com> - vLLM version: v0.16.0 - vLLM main: `4034c3d32e` --------- Signed-off-by: Meihan-chen <jcccx.cmh@gmail.com> Signed-off-by: meihanc <jcccx.cmh@gmail.com> Co-authored-by: Codex <noreply@openai.com>	2026-03-19 09:32:06 +08:00
Qiu	38cfcd572a	[doc](cp) correct the prefill of GQA and adjust desc of block table. (#5697 ) ### What this PR does / why we need it? correct the seq length of KV for prefill of GQA and clarify the desc of block table distribution in developer guide. - vLLM version: v0.13.0 - vLLM main: `2f4e6548ef` --------- Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com>	2026-01-19 18:53:48 +08:00
Qiu	b10ef9b9f3	[docs] Correct image about prefill phase of PCP (#5598 ) ### What this PR does / why we need it? Remove the incorrectly depicted DCP all_gather operation in the prefill stage PCP for GQA diagram. Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com>	2026-01-05 20:21:59 +08:00
InSec	7cf65d0581	[Doc]modify the quantization user guide and add a quantization adaptation developer guide (#5554 ) ### What this PR does / why we need it? This PR makes the following modifications: 1.delete the `user_guide/feature_guide/quantization-llm-compressor.md` and merge it into `user_guide/feature_guide/quantization.md`. 2.update the content of `user_guide/feature_guide/quantization.md`. 3.add guidance `developer_guide/feature_guide/quantization.md' on the adaptation of quantization algorithms and quantized models. ### Does this PR introduce _any_ user-facing change? N/A ### How was this patch tested? - vLLM version: v0.13.0 - vLLM main: `7157596103` --------- Signed-off-by: IncSec <1790766300@qq.com> Signed-off-by: InSec <1790766300@qq.com>	2026-01-05 09:12:11 +08:00
Qiu	da0b113cf5	[doc]<PCP&DCP> add developer guide for PCP&DCP (#5372 ) ### What this PR does / why we need it? add developer guide for PCP&DCP - vLLM version: release/v0.13.0 - vLLM main: `bc0a5a0c08` Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com>	2025-12-26 16:17:38 +08:00
wangxiaoteng888	b1a00e0512	[docs] [P/D] add feature guide for disaggregated-prefill (#3950 ) ### What this PR does / why we need it? add feature guide for disaggregated-prefill ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? by ci - vLLM version: v0.11.0 - vLLM main: `83f478bb19` --------- Signed-off-by: wangxiaoteng <wangxiaoteng@huawei.com> Signed-off-by: liziyu <liziyu16@huawei.com> Signed-off-by: wangxiaoteng888 <56506195+wangxiaoteng888@users.noreply.github.com> Co-authored-by: liziyu <liziyu16@huawei.com>	2025-11-10 09:31:30 +08:00
offline893	5cff3069f4	[Doc]Add developer guide of eplb. (#3759 ) ### What this PR does / why we need it? Add developer guide of eplb - vLLM version: v0.11.0 - vLLM main: `83f478bb19` --------- Signed-off-by: offline0806 <3337230449@qq.com> Co-authored-by: offline0806 <3337230449@qq.com>	2025-11-05 18:35:41 +08:00
Li Wang	7f73c28a24	[CI][Doc] Optimize multi-node CI (#3565 ) ### What this PR does / why we need it? This pull request mainly do the following things: 1. Add a doc for multi-node CI, The main content is the mechanism principle and how to contribute 2. Simplify the config yaml for more developer-friendly 3. Optimized the mooncake installation script to prevent accidental failures during installation 4. Fix the workflow to ensure the kubernetes can be apply correctly 5. Add Qwen3-235B-W8A8 disaggregated_prefill test 6. Add GLM-4.5 multi dp test 7. Add 2p1d 4nodes disaggregated_prefill test 8. Refactor nightly tests ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.11.0rc3 - vLLM main: `17c540a993` --------- Signed-off-by: wangli <wangli858794774@gmail.com>	2025-10-25 09:23:47 +08:00
Li Wang	bf84f2dbfa	[Doc] Support kimi-k2-w8a8 (#2162 ) ### What this PR does / why we need it? In fact, the kimi-k2 model is similar to the deepseek model, and we only need to make a few changes to support it. what does this pr do: 1. Add kimi-k2-w8a8 deployment doc 2. Update quantization doc 3. Upgrade torchair support list ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.10.0 - vLLM main: `9edd1db02b` --------- Signed-off-by: wangli <wangli858794774@gmail.com>	2025-08-06 19:28:47 +08:00
Li Wang	0c4aa2b4f1	[Doc] Add multi node data parallel doc (#1685 ) ### What this PR does / why we need it? add multi node data parallel doc ### Does this PR introduce _any_ user-facing change? add multi node data parallel doc ### How was this patch tested? - vLLM version: v0.9.1 - vLLM main: `805d62ca88` Signed-off-by: wangli <wangli858794774@gmail.com>	2025-07-10 09:36:37 +08:00

11 Commits