xc-llm-ascend

Author	SHA1	Message	Date
Yizhou	638dbcdb32	[Perf] Remove D2H operations to improve performance (#4063 ) ### What this PR does / why we need it? Replace masked in-place assignment with a device-side torch.where so selection stays on-device, allowing subsequent device ops to be enqueued earlier and removing an implicit D2H sync, reducing latency by several hundred μs on Ascend. ### Does this PR introduce _any_ user-facing change? None. ### How was this patch tested? None. - vLLM version: v0.11.0 - vLLM main: `83f478bb19` Signed-off-by: Yizhou Liu <liu_yizhou@outlook.com>	2025-11-12 09:08:55 +08:00
thonean	e38fe92f40	[Misc][Doc] Add service profiling feature with user guide (#3756 ) ### What this PR does / why we need it? To support the data collection capabilities of the msServiceProfiler on the vLLM-ascend framework and enable customization of data collection points via a configuration file, a default profiling configuration has been added to vllm-ascend, facilitating debugging and optimization for developers and users. ### Does this PR introduce _any_ user-facing change? None ### How was this patch tested? - vLLM version: v0.11.0 - vLLM main: `83f478bb19` Signed-off-by: minghangc <29514143@qq.com>	2025-11-12 09:07:14 +08:00
Canlin Guo	1c677c3b87	[Test][Accuracy] Add accuracy evaluation config for InternVL3_5-8B (#3964 ) ### What this PR does / why we need it? To continuously monitor the accuracy of the InternVL3_5-8B model, this PR adds the corresponding configuration file to the CI. We need to add the `-hf` suffix to avoid incompatibility with the `lm-eval` preprocessor. ### How was this patch tested? `pytest -sv ./tests/e2e/models/test_lm_eval_correctness.py --config ./tests/e2e/models/configs/InternVL3_5-8B.yaml` - vLLM version: v0.11.0 - vLLM main: `83f478bb19` Signed-off-by: gcanlin <canlinguosdu@gmail.com>	2025-11-12 09:05:55 +08:00
zzhxxx	46a41b26d3	oproj TP support acl graph (#4073 ) ### What this PR does / why we need it? Reference #2167 and oproj TP supports ACL graph. - vLLM version: v0.11.0 - vLLM main: `83f478bb19` --------- Signed-off-by: zzhx1 <zzh_201018@outlook.com>	2025-11-11 19:39:06 +08:00
jiangyunfan1	0e6e08e939	[TEST]Update nightly cases and add mtpx (#4111 ) ### What this PR does / why we need it? This PR updates some nightly test cases and adds mtpx cases, we need to test them daily ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? By running the test - vLLM version: v0.11.0 - vLLM main: `83f478bb19` --------- Signed-off-by: jiangyunfan1 <jiangyunfan1@h-partners.com>	2025-11-11 17:39:58 +08:00
Li Wang	9cc42226d5	[CI] Integrate mooncake to vllm-ascend base image (#4062 ) ### What this PR does / why we need it? This patch aims to integrate mooncake [v0.3.7.post2](https://github.com/kvcache-ai/Mooncake/releases/tag/v0.3.7.post2) into the vllm-ascend images - vLLM version: v0.11.0 - vLLM main: `83f478bb19` --------- Signed-off-by: wangli <wangli858794774@gmail.com>	2025-11-11 15:51:16 +08:00
wangxiyuan	f811a24bf0	Remove VLLM_USE_V1 (#4086 ) Drop VLLM_USE_V1 usage. This env has been removed from vLLM already. - vLLM version: v0.11.0 - vLLM main: `83f478bb19` Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-11-11 15:43:39 +08:00
zhangxinyuehfad	d5567680a2	[Fixbug] Fix ut test (#4116 ) ### What this PR does / why we need it? Fix ut test: pytest<9.0.0; test_models_distributed_Qwen3_NEXT_MTP_TP4_SIMILARITY failed because of https://github.com/vllm-project/vllm-ascend/pull/3967, so skip it for now and fix it later. Test OK: https://github.com/vllm-project/vllm-ascend/actions/runs/19255274573/job/55048851066?pr=4116 - vLLM version: v0.11.0 - vLLM main: `83f478bb19` Signed-off-by: hfadzxy <starmoon_zhang@163.com>	2025-11-11 15:31:00 +08:00
zhangxinyuehfad	fae1c59a79	[Fix] Refactor and fix dist test to e2e full test (#3808 ) ### What this PR does / why we need it? Fix ci test on A3 1. delete labels 2. fix filter yaml file name 3. refactor dist test to e2e full test 4. skip test_models_distributed_Qwen3_MOE_TP2_WITH_EP & test_models_distributed_Qwen3_MOE_W8A8_WITH_EP because of https://github.com/vllm-project/vllm-ascend/issues/3895 - vLLM version: v0.11.0 - vLLM main: `83f478bb19` Signed-off-by: hfadzxy <starmoon_zhang@163.com>	2025-11-11 10:36:05 +08:00
zhangxinyuehfad	b77b4f1abf	[Test] Add nightly test for DeepSeek-V3.2-Exp (#3908 ) ### What this PR does / why we need it? Add nightly test for DeepSeek-V3.2-Exp ### How was this patch tested? test action： https://github.com/vllm-project/vllm-ascend/actions/runs/19156153634/job/54757008557?pr=3908 - vLLM version: v0.11.0 - vLLM main: `83f478bb19` --------- Signed-off-by: hfadzxy <starmoon_zhang@163.com>	2025-11-11 10:29:57 +08:00
Yikun Jiang	e384755ce1	[Doc] Recover installation doc to use pip install (#4109 ) ### What this PR does / why we need it? Use pip installation in installation doc and change related doctest to validate. ### Does this PR introduce _any_ user-facing change? No, doc only ### How was this patch tested? Doctest related CI passed - vLLM version: v0.11.0 - vLLM main: `83f478bb19` Signed-off-by: Yikun Jiang <yikunkero@gmail.com>	2025-11-11 09:25:44 +08:00
Apocalypse	71866d5311	[feature] chunkprefill support pcp&dcp (#3801 ) ### What this PR does / why we need it? ChunkPrefill now can support Long Sequence Feature Pcp&Dcp ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? CI tests passed with self-test - vLLM version: v0.11.0 - vLLM main: `83f478bb19` --------- Signed-off-by: Apocalypse990923-qshi <qiushixu@usc.edu> Signed-off-by: Delphine-Nic <tanwenqin@huawei.com> Co-authored-by: Delphine-Nic <tanwenqin@huawei.com> Co-authored-by: Delphine-Nic <3834144971@qq.com>	2025-11-11 09:18:02 +08:00
zhaomingyu13	7ffbe73d54	[main][Bugfix] Fix ngram precision issue and open e2e ngram test (#4090 ) ### What this PR does / why we need it? Fix ngram precision issue and open e2e ngram test - vLLM version: v0.11.0 - vLLM main: `83f478bb19` --------- Signed-off-by: Icey <1790571317@qq.com> Signed-off-by: zhaomingyu <zhaomingyu13@h-partners.com> Co-authored-by: Icey <1790571317@qq.com>	2025-11-11 09:06:24 +08:00
wangxiyuan	64220c68c5	[Doc] Add release note for v0.11.0rc1 (#3931 ) Add release note for v0.11.0rc1. - vLLM version: v0.11.0 - vLLM main: `83f478bb19` Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-11-10 21:01:50 +08:00
Icey	e04a87f4be	[BugFix] Fixes Qwen3-Next enable nz accuracy problem (#4058 ) ### What this PR does / why we need it? - Fixes Qwen3-Next enable nz accuracy problem ### Does this PR introduce _any_ user-facing change? N/A - vLLM version: v0.11.0 - vLLM main: `83f478bb19` --------- Signed-off-by: Icey <1790571317@qq.com> Signed-off-by: wxsIcey <1790571317@qq.com>	2025-11-10 20:54:57 +08:00
22dimensions	e6625bb582	[Doc] add qwen3 w4a4 tutorial (#4076 ) ### What this PR does / why we need it? v0.11.0rc1 will introduce w4a4 quantization feature, so add this tutorial. ### Does this PR introduce _any_ user-facing change? No - vLLM version: v0.11.0 - vLLM main: `83f478bb19` Signed-off-by: 22dimensions <waitingwind@foxmail.com>	2025-11-10 20:30:07 +08:00
rjg-lyh	a1558b99c2	[Core] Restore scheduling logic under default configuration (#3967 ) ### What this PR does / why we need it? This PR reverts the changes introduced in PR #2894. Initially, due to performance issues with the older version of the chunked prefill ops, the default behavior was to use the Ascend scheduler to disable the chunked prefill feature. However, with the improvements in the performance of the new chunked prefill ops, this interception strategy has been removed. This change also aligns with the community's default configuration behavior. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? CI passed with new added/existing test. - vLLM version: v0.11.0 - vLLM main: `83f478bb19` Signed-off-by: rjg-lyh <1318825571@qq.com>	2025-11-10 17:48:56 +08:00
herizhen	75c3f9a780	[Typo] LLama has been changed to Llama (#4089 ) ### What this PR does / why we need it? The first-generation model uses "LLama"; subsequent models use "Llama". The second "L" here should be lowercase. Other instances of "LLama" on this page should be corrected accordingly. ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? ut - vLLM version: v0.11.0 - vLLM main: `83f478bb19` Signed-off-by: herizhen <you@example.com> Co-authored-by: herizhen <you@example.com>	2025-11-10 16:22:52 +08:00
zhangxinyuehfad	d40ba52454	[Fix] fix Qwen2-Audio-7B-Instruct accuracy test (#4017 ) ### What this PR does / why we need it? fix Qwen2-Audio-7B-Instruct accuracy test ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.11.0 - vLLM main: `83f478bb19` Signed-off-by: hfadzxy <starmoon_zhang@163.com>	2025-11-10 11:54:18 +08:00
Canlin Guo	de49fb3deb	[Feature][Build] Upgrade the minimum version to 3.10 (#3926 ) ### What this PR does / why we need it? Closes #3728, #3657. The main branch is now aligned with the vllm `releases/v0.11.1` branch, which no longer supports `Python 3.9`. Check it [here](https://github.com/vllm-project/vllm/blob/releases/v0.11.1/pyproject.toml). ### Does this PR introduce _any_ user-facing change? The newest version of vllm-ascend doesn't support Python 3.9. ### How was this patch tested? - vLLM version: v0.11.0 - vLLM main: `83f478bb19` Signed-off-by: gcanlin <canlinguosdu@gmail.com>	2025-11-10 11:50:12 +08:00
Levi	0a62e671fb	[Feat] flashcomm_v2 optim solution (#3232 ) ### What this PR does / why we need it? Supports generalized FlashComm2 optimization, which reduces communication overhead, decreases RmsNorm computation, and saves one AllGather step by replacing Allreduce operations in the Attention module with pre-AlltoAll and post-AllGather operations (used in combination with FlashComm1). This feature is enabled during the Prefill phase and is recommended to be used together with FlashComm1, delivering broad performance improvements, especially in long sequence scenarios with large tensor parallelism (TP) configurations. Benchmark tests show that under TP16DP1 configuration, it can improve the prefill performance of the DeepSeek model by 8% on top of FlashComm1. ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.11.0 - vLLM main: `83f478bb19` --------- Signed-off-by: zzhxx <2783294813@qq.com> Signed-off-by: Levi-JQ <yujinqi2@huawei.com> Co-authored-by: Levi-JQ <yujinqi2@huawei.com> Co-authored-by: zzhxx <2783294813@qq.com>	2025-11-10 11:01:45 +08:00
wangxiaoteng888	b1a00e0512	[docs] [P/D] add feature guide for disaggregated-prefill (#3950 ) ### What this PR does / why we need it? add feature guide for disaggregated-prefill ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? by ci - vLLM version: v0.11.0 - vLLM main: `83f478bb19` --------- Signed-off-by: wangxiaoteng <wangxiaoteng@huawei.com> Signed-off-by: liziyu <liziyu16@huawei.com> Signed-off-by: wangxiaoteng888 <56506195+wangxiaoteng888@users.noreply.github.com> Co-authored-by: liziyu <liziyu16@huawei.com>	2025-11-10 09:31:30 +08:00
zhangyiming	a74e76b02d	[Doc] Remove extra MLAPO installation step for DeepSeek-V3.2. (#4024 ) ### What this PR does / why we need it? Remove extra MLAPO installation step for DeepSeek-V3.2. - vLLM version: v0.11.0 - vLLM main: `83f478bb19` Signed-off-by: menogrey <1299267905@qq.com>	2025-11-10 09:09:59 +08:00
jiangyunfan1	c116524379	[TEST]Add qwen3-235b-w8a8 and qwen3-30b-w8a8 nightly test (#3973 ) ### What this PR does / why we need it? This PR adds some qwen3-235b-w8a8 and qwen3-30b-w8a8 cases; we need to test them daily ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? by running the test - vLLM version: v0.11.0 - vLLM main: `83f478bb19` Signed-off-by: jiangyunfan1 <jiangyunfan1@h-partners.com>	2025-11-08 18:49:28 +08:00
lilinsiman	a3ff765c65	[Info][main] Corrected the errors in the information (#4055 ) ### What this PR does / why we need it? Corrected the errors in the information ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? ut - vLLM version: v0.11.0 - vLLM main: `83f478bb19` Signed-off-by: lilinsiman <lilinsiman@gmail.com>	2025-11-08 18:48:59 +08:00
weiguihua2	1d7cb5880a	[Bugfix]fix pcp dcp attn aclgraph (#4066 ) ### What this PR does / why we need it? In the DCP-PCP graph mode scenario, there is a shape issue with multiple batches. This PR fixes this problem. - vLLM version: v0.11.0 - vLLM main: `83f478bb19` --------- Signed-off-by: weiguihua2 <weiguihua2@huawei.com>	2025-11-08 18:47:12 +08:00
hucong	48094148f8	[BugFix] Improve the performance of prefixcache features (#4022 ) ### What this PR does / why we need it? The code bug caused an empty bubble. When the npu_paged_cache_load operator was called, it forcibly transferred seq_len2 to the device, which triggered synchronization and interrupted the CPU operator's launch stream. - vLLM version: v0.11.0 - vLLM main: `83f478bb19` --------- Signed-off-by: underfituu <hzhucong@163.com>	2025-11-08 18:45:31 +08:00
zxr2333	1d81a289d0	[P/D][BugFix]Fix proxy format processing errors & Layerwise connector performance optimization (#4043 ) ### What this PR does / why we need it? 1. Fix proxy format processing errors. 2. Layer-wise connector performance optimization. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? By CI. - vLLM version: v0.11.0 - vLLM main: `83f478bb19` --------- Signed-off-by: nwpu-zxr <zhouxuerong2@huawei.com> Co-authored-by: wangxiaoteng <wangxiaoteng@huawei.com>	2025-11-08 18:44:06 +08:00
wangx700	24d6314718	[Bugfix] fix sleepmode level2 e2e test (#4019 ) ### What this PR does / why we need it? enable sleepmode level2 e2e test and add the check logic to ensure the nz is not enabled. ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? use e2e tests - vLLM version: v0.11.0 - vLLM main: `83f478bb19` Signed-off-by: wangx700 <wangxin700@huawei.com>	2025-11-08 14:11:55 +08:00
offline893	f7ca3bc0fa	[CI]Fix eplb ci. (#4052 ) ### What this PR does / why we need it? This pr fixes ci on eplb - vLLM version: v0.11.0 - vLLM main: `83f478bb19` --------- Signed-off-by: offline0806 <3337230449@qq.com> Co-authored-by: offline0806 <3337230449@qq.com>	2025-11-07 23:53:35 +08:00
offline893	e687d6af85	[BugFix]Fix group list type of mc2. (#4047 ) ### What this PR does / why we need it? Fix accuracy problem of eplb because of PTA upgrade. ### How was this patch tested? Main: baseline: \| dataset \| version \| metric \| mode \| vllm-api-general-chat \| \|----- \| ----- \| ----- \| ----- \| -----\| \| aime2024 \| 604a78 \| accuracy \| gen \| 87.50 \| EPLB: \| dataset \| version \| metric \| mode \| vllm-api-general-chat \| \|----- \| ----- \| ----- \| ----- \| -----\| \| aime2024 \| 604a78 \| accuracy \| gen \| 87.50 \| - vLLM version: v0.11.0 - vLLM main: `83f478bb19` Signed-off-by: offline0806 <3337230449@qq.com> Co-authored-by: offline0806 <3337230449@qq.com>	2025-11-07 17:41:56 +08:00
drslark	23b785fdfb	[Feat] Adapted mtp function to Qwen3-next (#3918 ) ### What this PR does / why we need it? Adapts mtp function to Qwen3-next. - vLLM version: v0.11.0 - vLLM main: `83f478bb19` Signed-off-by: drslark <slarksblood@qq.com>	2025-11-07 16:39:03 +08:00
zhangyiming	46ef280105	[Doc] Add model feature matrix table. (#4040 ) ### What this PR does / why we need it? Add model feature matrix table. - vLLM version: v0.11.0 - vLLM main: `83f478bb19` Signed-off-by: menogrey <1299267905@qq.com>	2025-11-07 11:28:05 +08:00
lilinsiman	22286fc67d	[UT] Add new ut case for aclgraph in auto enable (#4031 ) ### What this PR does / why we need it? add new ut case for aclgraph in auto enable ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? ut - vLLM version: v0.11.0 - vLLM main: `83f478bb19` Signed-off-by: lilinsiman <lilinsiman@gmail.com>	2025-11-07 10:39:11 +08:00
LookAround0301	79e536d939	[Feat] update op for mla (#4000 ) ### What this PR does / why we need it? 1. In the mla_v1 module, add the torch_npu.npu_attention_update op when pcp and dcp are enabled ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.11.0 - vLLM main: `83f478bb19` --------- Signed-off-by: LookAround <lixushi@huawei.com>	2025-11-07 09:48:39 +08:00
LookAround0301	f8610b7d67	[long_seq] fix A2 accuracy problem (#4030 ) ### What this PR does / why we need it? 1. Update prepare_finalize.py: fix A2 accuracy problem when pcp and dcp are enabled - vLLM version: v0.11.0 - vLLM main: `83f478bb19` Signed-off-by: LookAround <lixushi@huawei.com>	2025-11-07 09:29:33 +08:00
Angazenn	e0d58d543b	[main][bugfix] Fix a rare bug triggered by _npu_paged_attention in FULL_DECODE_ONLY mode (#3986 ) ### What this PR does / why we need it? This PR fixes a bug where the workspace of `_npu_paged_attention` in setup is smaller than in execution. For the current implementation of FULL_DECODE_ONLY with `_npu_paged_attention`, we use `_npu_paged_attention_get_workspace` when capturing with `max_model_len` as `seq_lens`. This assumes that PA with larger `seq_lens` inputs should have a larger workspace than with smaller `seq_lens`. However, there are rare cases where PA with smaller `seq_lens` incurs larger space. So I add `get_workspace` directly into `update_attn_params`. This change might introduce a small (≈1%) performance degradation for low num_tokens (such as 1) in the decode phase, and there are no other known memory issues. So I think this change is acceptable. We can remove this if a new attention op (such as `npu_fused_infer_attention_score`) does not have such problems. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? - vLLM version: v0.11.0 - vLLM main: `83f478bb19` Signed-off-by: Angazenn <supperccell@163.com>	2025-11-06 23:08:07 +08:00
drslark	1804b60ec8	[BugFix][main] Adapted to torch_npu.npu_fused_infer_attention_score (#4025 ) ### What this PR does / why we need it? Fixes a compatibility bug with `torch_npu.npu_fused_infer_attention_score` which is described in https://github.com/vllm-project/vllm-ascend/issues/4020. @momo609 tells us this solution. ### Does this PR introduce _any_ user-facing change? N/A ### How was this patch tested? The environment is the same as in this issue, https://github.com/vllm-project/vllm-ascend/issues/4020. We modify the code according to https://github.com/vllm-project/vllm-ascend/pull/3918 and run the code below: ```python # run with Qwen3-next-mtp prompts = [ "Who are you?", ] sampling_params = SamplingParams(temperature=0.0, top_p=0.95, top_k=40, max_tokens=128) llm = LLM(model="/home/model/Qwen3-Next-80B-A3B-Instruct", tensor_parallel_size=4, enforce_eager=True, distributed_executor_backend="mp", gpu_memory_utilization=0.7, speculative_config={ "method": "qwen3_next_mtp", "num_speculative_tokens": 1, }, max_model_len=4096) outputs = llm.generate(prompts, sampling_params) for output in outputs: prompt = output.prompt generated_text = output.outputs[0].text print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}") ``` Outputs: ```text Prompt: 'Who are you?', Generated text: ' I am Qwen, a large-scale language model independently developed by the Tongyi Lab under Alibaba Group. I am designed to answer questions, create text such as stories, official documents, emails, scripts, and more, as well as perform logical reasoning, programming, and other tasks. If you have any questions or need assistance, feel free to let me know anytime!' ``` Now, `torch_npu.npu_fused_infer_attention_score` is compatible with Qwen3-Next. - vLLM version: v0.11.0 - vLLM main: `83f478bb19` Signed-off-by: drslark <slarksblood@qq.com>	2025-11-06 22:00:24 +08:00
realliujiaxu	22005c64c1	[Bugfix] Add constraints for sequence parallelism (#4014 ) ### What this PR does / why we need it? Add constraints for sequence parallelism for unsupported scenarios: 1. tp_size > 1 2. enable_expert_parallel must be True for MoE model ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.11.0 - vLLM main: `83f478bb19` --------- Signed-off-by: realliujiaxu <realliujiaxu@163.com>	2025-11-06 20:02:03 +08:00
Li Wang	259eb25f88	[CI] Quick fix mooncake for nightly-ci (#4028 ) ### What this PR does / why we need it? Since we have upgraded to CANN 8.3rc1, we will no longer use the privately maintained Mooncake repository, but instead use the official release released by Mooncake: https://github.com/kvcache-ai/Mooncake/releases/tag/v0.3.7.post2 . Next step: this is only a temporary solution. We will integrate mooncake into the vllm-ascend base image later for easier use. see https://github.com/vllm-project/vllm-ascend/pull/3989 ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.11.0 - vLLM main: `83f478bb19` --------- Signed-off-by: wangli <wangli858794774@gmail.com>	2025-11-06 18:46:00 +08:00
jiangyunfan1	34b278a339	[TEST]Update nightly acc test standard (#4032 ) ### What this PR does / why we need it? This PR updates the acc test standard for some cases, we need it to better maintain acc ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? by running the test - vLLM version: v0.11.0 - vLLM main: `83f478bb19` Signed-off-by: jiangyunfan1 <jiangyunfan1@h-partners.com>	2025-11-06 16:58:38 +08:00
weiguihua2	2eebe1dc0a	[feat]decode convert bsnd to tnd and fix bug when pcp and dcp (#3980 ) ### What this PR does / why we need it? 1. In the attention_v1 module, convert bsnd to tnd when pcp and dcp are enabled 2. Fix torchair bug: service startup problem ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.11.0 - vLLM main: `83f478bb19` Signed-off-by: weiguihua2 <weiguihua2@huawei.com>	2025-11-06 14:58:24 +08:00
Liziqi-77	25b24c02ea	[Feat](Mooncake) Supports multiple input suffixes for global_segment_size (#3690 ) ### What this PR does / why we need it? - global_segment_size and local_buffer_size use constants for unified management. - Newly added support for input formats ending with GB, MB, KB, and B, while being compatible with existing input methods. ### Does this PR introduce _any_ user-facing change? - Users can use new input methods - The documentation has also been modified ### How was this patch tested? - vLLM version: v0.11.0 - vLLM main: `83f478bb19` --------- Signed-off-by: 李子琦 <liziqi_ing@163.com>	2025-11-06 14:48:15 +08:00
zxr2333	b206e831e9	[P/D]Make kv-transfer env variable take effect & Fix load-balance proxy (#3981 ) ### What this PR does / why we need it? Make kv-transfer env variable take effect and Fix load-balance proxy. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? By CI. - vLLM version: v0.11.0 - vLLM main: `83f478bb19` --------- Signed-off-by: liziyu <liziyu16@huawei.com> Signed-off-by: nwpu-zxr <zhouxuerong2@huawei.com> Co-authored-by: liziyu <liziyu16@huawei.com>	2025-11-06 12:02:47 +08:00
zhangxinyuehfad	737cad2b6b	[Test] Refactor accuracy test to nightly test (#3814 ) ### What this PR does / why we need it? Refactor accuracy test to nightly test - vLLM version: v0.11.0 - vLLM main: `83f478bb19` Signed-off-by: hfadzxy <starmoon_zhang@163.com>	2025-11-06 09:06:59 +08:00
pz1116	b1488ecdb1	[main][doc][kv_pool]Add adxl timeout parameter in kv pool user guide (#4012 ) ### What this PR does / why we need it? Add adxl timeout parameter in kv pool user guide, avoiding timeout error when initializing connections between devices. - vLLM version: v0.11.0 - vLLM main: `83f478bb19` Signed-off-by: Pz1116 <zpbzpb123123@gmail.com>	2025-11-05 18:39:35 +08:00
offline893	5cff3069f4	[Doc]Add developer guide of eplb. (#3759 ) ### What this PR does / why we need it? Add developer guide of eplb - vLLM version: v0.11.0 - vLLM main: `83f478bb19` --------- Signed-off-by: offline0806 <3337230449@qq.com> Co-authored-by: offline0806 <3337230449@qq.com>	2025-11-05 18:35:41 +08:00
pz1116	e0c23cb011	[docs] Add kv pool developer guide (#3752 ) ### What this PR does / why we need it? Add kv pool developer guide ### Does this PR introduce _any_ user-facing change? N/A ### How was this patch tested? vLLM version: v0.11.0rc3 vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0 - vLLM version: v0.11.0 - vLLM main: `83f478bb19` --------- Signed-off-by: Pz1116 <zpbzpb123123@gmail.com> Signed-off-by: pz1116 <zpbzpb123123@gmail.com>	2025-11-05 18:03:36 +08:00
zouyida2052	1ba158567c	[Doc] add mtp doc (#3770 ) ### What this PR does / why we need it? add mtp develop doc - vLLM version: v0.11.0 - vLLM main: `83f478bb19` --------- Signed-off-by: zouyida2052 <zouyida2002@gmail.com>	2025-11-05 16:38:35 +08:00
wangxiyuan	3ac76fdccc	[Doc] Update version policy (#3999 ) Add version policy for main branch to clarify how vllm-ascend works with vllm - vLLM version: v0.11.0 - vLLM main: `83f478bb19` Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-11-05 14:55:54 +08:00
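
Commit 25b24c02ea (#3690) in the log above describes accepting values for global_segment_size that end with GB, MB, KB, or B while staying compatible with the existing plain-number input. A minimal sketch of that kind of parsing follows. It is not the code from the PR: the name `parse_size`, the 1024-based multipliers, and the case-insensitive matching are all assumptions made for illustration.

```python
import re

# Assumed multipliers: binary (1024-based) units. The PR may define them differently.
_UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3}

# A number, optional whitespace, then an optional unit suffix.
_SIZE_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*(GB|MB|KB|B)?\s*$", re.IGNORECASE)


def parse_size(value):
    """Hypothetical helper: return a byte count for an int or a string like '4GB'."""
    if isinstance(value, int):
        # Plain integers are already byte counts (the pre-existing input style).
        return value
    match = _SIZE_RE.match(str(value))
    if match is None:
        raise ValueError(f"unrecognized size: {value!r}")
    number, unit = match.groups()
    # A missing suffix is treated as bytes.
    return int(float(number) * _UNITS[(unit or "B").upper()])
```

With these assumptions, `parse_size("4GB")` and `parse_size(4 * 1024**3)` yield the same byte count, and an unsupported suffix such as `"10TB"` raises `ValueError` instead of being silently misread.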


1480 Commits