13 Commits

Author SHA1 Message Date
meihanc
bff4fbfca5 upgrade to 0.18.0 (#7502)
### What this PR does / why we need it?
1. upgrade to 0.18.0
2. ensure kernel_block_sizes is int for Eagle drafter
### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.17.0
- vLLM main:
8b6325758c

---------

Signed-off-by: Meihan-chen <jcccx.cmh@gmail.com>
Signed-off-by: hfadzxy <starmoon_zhang@163.com>
Co-authored-by: hfadzxy <starmoon_zhang@163.com>
2026-03-21 16:05:38 +08:00
Li Wang
83a4065b4b [CI] Add pre-commit check for patch logger (#7446)
### What this PR does / why we need it?
See https://github.com/vllm-project/vllm-ascend/pull/7402, pre-commit
hook will forbid init_logger(__name__) in vllm_ascend patch modules

- vLLM version: v0.17.0
- vLLM main:
8a680463fa

---------

Signed-off-by: wangli <wangli858794774@gmail.com>
2026-03-19 16:53:20 +08:00
Nengjun Ma
8b79d4de52 Main2main upgrade to vllm 0317 afternoon (#7409)
### What this PR does / why we need it?

1.fix "TypeError: get_attn_backend() remove variable": [Refactor
`check_and_update_config`](https://github.com/vllm-project/vllm/pull/35122)

2.fix [Rename `compile_ranges_split_points` to
`compile_ranges_endpoints`](https://github.com/vllm-project/vllm/pull/36027)

3.fix "RuntimeError: device_allocator not a DeviceAllocator":[Replace
memory related torch.cuda
APIs"](https://github.com/vllm-project/vllm/pull/37031)

4.fix [Support multiple KV groups in OffloadingSpec
](https://github.com/vllm-project/vllm/pull/36610) removed
self.offloaded_block_size and changed self.gpu_block_size from a scalar
to a tuple of per-group block sizes, adding block_size_factor.

5.fix [Consolidate
SupportsEagle](https://github.com/vllm-project/vllm/pull/36063) renamed
get_eagle3_aux_hidden_state_layers() to
get_eagle3_default_aux_hidden_state_layers() and added a
supports_eagle3() guard before calling it.

### Does this PR introduce _any_ user-facing change?
NA
### How was this patch tested?
E2E


- vLLM version: v0.17.0
- vLLM main:
8a680463fa

---------

Signed-off-by: leo-pony <nengjunma@outlook.com>
Co-authored-by: Claude Code <noreply@anthropic.com>
2026-03-18 23:24:27 +08:00
Canlin Guo
e4458b2d2b [Main2Main] Upgrade vLLM to 0226 (#6813)
### What this PR does / why we need it?

Breaking:
1. https://github.com/vllm-project/vllm/pull/33452
2. https://github.com/vllm-project/vllm/pull/33451
3. https://github.com/vllm-project/vllm/pull/32567
4. https://github.com/vllm-project/vllm/pull/32344

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.15.0
- vLLM main:
83b47f67b1

---------

Signed-off-by: MrZ20 <2609716663@qq.com>
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
Co-authored-by: MrZ20 <2609716663@qq.com>
2026-02-27 16:05:21 +08:00
meihanc
922e5c163b [main2main] upgrade vllm main 0202 (#6560)
### What this PR does / why we need it?
1. Fix `TypeError: FusedMoEParallelConfig.__init__() missing 1 required
positional argument: 'is_sequence_parallel'` due to
https://github.com/vllm-project/vllm/pull/32567
2. Fix ` TypeError: '>' not supported between instances of 'MagicMock'
and 'int'` due to https://github.com/vllm-project/vllm/pull/33035
3. Fix `TypeError: Can't instantiate abstract class AscendMLAImpl with
abstract methods forward_mha, forward_mqa` and AttributeError: 'bool'
object has no attribute 'process_weights_after_loading' due to
https://github.com/vllm-project/vllm/pull/33284
4. Fix `'AscendSharedFusedMoE' object has no attribute
'_routed_input_transform'`due to
https://github.com/vllm-project/vllm/pull/32790
5. Fix `NPUModelRunner._dummy_run() got an unexpected keyword argument
'num_active_loras'` due to
https://github.com/vllm-project/vllm/pull/32005
6. Fix the problem caused by` 'tuple' object has no attribute 'job_id'`
due to https://github.com/vllm-project/vllm/pull/27492
7. Fix the problem that all_moe_layers is not equal to vllm.moe_forward,
vllm.moe_forward_shared due to
https://github.com/vllm-project/vllm/pull/33184
8. Add patch to fix the problem "got multiple values for keyword
argument 'add_special_tokens'" due to
https://github.com/vllm-project/vllm/pull/32863
### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.15.0
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.15.0

---------

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Signed-off-by: Meihan-chen <jcccx.cmh@gmail.com>
Signed-off-by: hfadzxy <starmoon_zhang@163.com>
Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com>
Co-authored-by: hfadzxy <starmoon_zhang@163.com>
2026-02-05 19:31:17 +08:00
SILONG ZENG
6ccccad102 [Lint]Style: Convert vllm-ascend/ to ruff format(Batch #5) (#5996)
### What this PR does / why we need it?
**Scope of Changes**:
| File Path |
| :--- |
|
`.../distributed/kv_transfer/kv_pool/ascend_store/ascend_store_connector.py`
|
|
`vllm_ascend/distributed/kv_transfer/kv_pool/ascend_store/backend/backend.py`
|
| `
.../distributed/kv_transfer/kv_pool/ascend_store/backend/memcache_backend.py`
|
| `
.../distributed/kv_transfer/kv_pool/ascend_store/backend/mooncake_backend.py`
|
| `
vllm_ascend/distributed/kv_transfer/kv_pool/ascend_store/config_data.py`
|
| `
vllm_ascend/distributed/kv_transfer/kv_pool/ascend_store/kv_transfer.py`
|
| `
vllm_ascend/distributed/kv_transfer/kv_pool/ascend_store/pool_scheduler.py`
|
| `
vllm_ascend/distributed/kv_transfer/kv_pool/ascend_store/pool_worker.py`
|
| `
.../distributed/kv_transfer/kv_pool/cpu_offload/cpu_kv_cache_manager.py`
|
| `
.../distributed/kv_transfer/kv_pool/cpu_offload/cpu_offload_connector.py`
|
| ` vllm_ascend/distributed/kv_transfer/kv_pool/cpu_offload/metadata.py`
|
| ` vllm_ascend/distributed/kv_transfer/kv_pool/ucm_connector.py` |
| `
vllm_ascend/distributed/kv_transfer/utils/mooncake_transfer_engine.py` |
| ` vllm_ascend/distributed/kv_transfer/utils/utils.py` |
| ` vllm_ascend/kv_offload/cpu_npu.py` |
| ` vllm_ascend/kv_offload/npu.py` |
| ` vllm_ascend/lora/lora_ops.py` |
| ` vllm_ascend/lora/punica_npu.py` |
| ` vllm_ascend/lora/utils.py` |

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.13.0
- vLLM main:
2c24bc6996

---------

Signed-off-by: MrZ20 <2609716663@qq.com>
Signed-off-by: SILONG ZENG <2609716663@qq.com>
2026-01-24 22:45:38 +08:00
zhangxinyuehfad
819a4459ce Drop vLLM 0.13.0 support (#6069)
### What this PR does / why we need it?
Drop vLLM 0.13.0 support, upgrade to 0.14.0

- vLLM version: v0.13.0
- vLLM main:
d68209402d

---------

Signed-off-by: hfadzxy <starmoon_zhang@163.com>
2026-01-23 09:45:08 +08:00
wjunLu
c11a05c4e1 [Main2Main] Upgrade vllm commit to 0113 (#5839)
### What this PR does / why we need it?
Upgrade vllm commit to 0113 (11b6af5280d6d6dfb8953af16e67b25f819b3be9)

- Modify import paths due to the refactors
https://github.com/vllm-project/vllm/pull/31916
https://github.com/vllm-project/vllm/pull/32054

- Fix `TypeError: NPUOffloadingSpec.__init__() takes 2 positional
arguments but 3 were given` due to
https://github.com/vllm-project/vllm/pull/24498

- Skip the async-scheduling tests in
`tests/e2e/multicard/4-cards/long_sequence/test_mtp.py`, which are never
verified
https://github.com/vllm-project/vllm/pull/31998

- Skip some pooling tests, which are caused by
https://github.com/vllm-project/vllm/pull/32148
where vllm is also failed
https://buildkite.com/vllm/ci/builds/46705/steps/canvas?jid=019bb329-3834-4685-862b-1613b8e0f5d4

We will reopen those tests when main2main reachs
https://github.com/vllm-project/vllm/pull/32243

- Skip some cases in
`tests/e2e/multicard/4-cards/long_sequence/test_mtp.py`, which are
broken by
https://github.com/vllm-project/vllm/pull/32118

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.13.0
- vLLM main:
2f4e6548ef

Signed-off-by: wjunLu <wjunlu217@gmail.com>
Signed-off-by: hfadzxy <starmoon_zhang@163.com>
Co-authored-by: hfadzxy <starmoon_zhang@163.com>
2026-01-15 09:48:53 +08:00
whx
3f33ad23fe [BugFix] Fix npu-cpu offloading interface change bug. (#5290)
### What this PR does / why we need it?
Last month the interface of `OffloadingSpec` has
changed(https://github.com/vllm-project/vllm/pull/27743). This PR fixes
this bug and adds e2e test for cpu offloading.

### Does this PR introduce _any_ user-facing change?
None

### How was this patch tested?
CI passed with new added test.


- vLLM version: release/v0.13.0
- vLLM main:
ad32e3e19c

---------

Signed-off-by: whx-sjtu <2952154980@qq.com>
2025-12-27 10:21:20 +08:00
wangxiyuan
7f2673ea2d upgrade vLLM to main (#4608)
1. fix https://github.com/vllm-project/vllm/pull/28542
The model structure modifications we involved in are:
     - Qwen2.5-VL(still exist some patch)
     - Qwen2-VL
     - Qwen2
     - DeepSeek series
     - Qwen-moe series
2. fix https://github.com/vllm-project/vllm/pull/29121
   the output token now  type changed from np to `list[list[int]]`

3. fix https://github.com/vllm-project/vllm/pull/29262
    `xformers` backend for multimodal now has been deprecated
4. fix https://github.com/vllm-project/vllm/pull/29342

5. fix https://github.com/vllm-project/vllm/pull/28579
6. fix https://github.com/vllm-project/vllm/pull/28718
7. fix https://github.com/vllm-project/vllm/issues/28665
8. fix https://github.com/vllm-project/vllm/pull/26847
vllm introduced the `optimization-level`, some default config has been
changed, and the param `--enforce-eager` has been deprecated
9. fix http://github.com/vllm-project/vllm/pull/29223 it retuns tuple
for sampler.
10. fix https://github.com/vllm-project/vllm/pull/29471 we'll remove the
related patch to avoid this kind of error.

Co-authored-by: hfadzxy <starmoon_zhang@163.com>
Co-authored-by: wangli <wangli858794774@gmail.com>


- vLLM version: v0.11.2

---------

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Signed-off-by: wangli <wangli858794774@gmail.com>
Signed-off-by: hfadzxy <starmoon_zhang@163.com>
Co-authored-by: wangli <wangli858794774@gmail.com>
Co-authored-by: hfadzxy <starmoon_zhang@163.com>
2025-12-02 22:10:52 +08:00
wangxiyuan
a1f142b7ad Drop 0.11.0 support (#4377)
There is a lot hack code for v0.11.0, which makes the code hard to
upgrade to newer vLLM version. Since v0.11.0 will release soon. Let's
drop v0.11.0 support first. Then we'll upgrade to v0.11.2 soon.


- vLLM version: v0.11.0
- vLLM main:
2918c1b49c

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-11-24 17:08:20 +08:00
Icey
a7450db1bd Upgrade to 0.11.1 newest vllm commit (#3762)
### What this PR does / why we need it?

c9461e05a4

Fix ```spec decode rejection sampler```, caused by
https://github.com/vllm-project/vllm/pull/26060
Fix some ```import```, caused by
https://github.com/vllm-project/vllm/pull/27374
Fix ```scheduler_config.send_delta_data```, caused by
https://github.com/vllm-project/vllm-ascend/pull/3719
Fix ```init_with_cudagraph_sizes```, caused by
https://github.com/vllm-project/vllm/pull/26016
Fix ```vl model```of replacing PatchEmbed's conv3d to linear layer,
caused by https://github.com/vllm-project/vllm/pull/27418

### Does this PR introduce _any_ user-facing change?
N/A

### How was this patch tested?
CI passed with new added/existing test.


- vLLM version: v0.11.0rc3
- vLLM main:
c9461e05a4

---------

Signed-off-by: Icey <1790571317@qq.com>
2025-10-28 14:55:03 +08:00
kx
bc30874f8b [Feat] add native kvcache offload (#3433)
### What this PR does / why we need it?
This pr is for https://github.com/vllm-project/vllm-ascend/issues/3241 ,
which is in-house solution for offloading KV cache data from the GPU
memory to other medium (in particular, CPU memory)。Previous solutions
required reliance on third-party components, which had issues with
compatibility between different versions.

### How was this patch tested?
use the following script for testing:

export CUDA_VISIBLE_DEVICES=0
export TP=1
export MODEL_PATH=/model/Qwen3-14B
export MODEL_NAME=Qwen3-14B
export PORT=10000
#export ASCEND_LAUNCH_BLOCKING=1
#export ASCEND_SLOG_PRINT_TO_STDOUT=1

python3 -m vllm.entrypoints.openai.api_server --host 0.0.0.0 --port
${PORT} --dtype bfloat16 --model ${MODEL_PATH} --served-model-name
${MODEL_NAME} --tensor-parallel-size ${TP} --gpu-memory-utilization 0.7
--max-model-len 32768 --trust-remote-code --disable-log-requests \
    --block-size 128 \
--kv-transfer-config
'{"kv_connector":"OffloadingConnector","kv_role":"kv_both","kv_connector_extra_config":{"block_size":
128, "num_cpu_blocks": 1000, "spec_name":"NPUOffloadingSpec",
"spec_module_path": "vllm_ascend.kv_offload.npu"}}'


- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

---------

Signed-off-by: HF-001 <1670186653@qq.com>
2025-10-22 14:15:49 +08:00