Canlin Guo
e4458b2d2b
[Main2Main] Upgrade vLLM to 0226 (#6813)
### What this PR does / why we need it?
Breaking changes:
1. https://github.com/vllm-project/vllm/pull/33452
2. https://github.com/vllm-project/vllm/pull/33451
3. https://github.com/vllm-project/vllm/pull/32567
4. https://github.com/vllm-project/vllm/pull/32344
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
- vLLM version: v0.15.0
- vLLM main: 83b47f67b1
---------
Signed-off-by: MrZ20 <2609716663@qq.com>
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
Co-authored-by: MrZ20 <2609716663@qq.com>
2026-02-27 16:05:21 +08:00
realliujiaxu
5def28dcd3
[Feat] Support sequence parallelism via compilation pass for VL models (#5632)
2026-02-27 08:27:41 +08:00
meihanc
5b8e47cb68
[Bugfix] Fix no attribute 'data' when MLAPO is enabled (#6601)
### What this PR does / why we need it?
This PR fixes an `AttributeError: 'Parameter' object has no attribute
'data'` that occurs when MLAPO is enabled with vLLM v0.15.0.
The error is caused by a monkey-patch on
`MLAAttention.process_weights_after_loading` which is incompatible with
changes in vLLM v0.15.0. This is likely related to PyTorch's deprecation
of the `.data` attribute on `torch.nn.Parameter` objects.
This change makes the monkey-patch conditional, so it is not applied for
vLLM v0.15.0 and newer versions, resolving the crash.
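For illustration only, a minimal sketch of such a version-gated patch; the `vllm_version_is` helper and the import path are assumptions, not this PR's exact code:
```python
# Hypothetical sketch: apply the monkey-patch only on vLLM releases older
# than v0.15.0, where process_weights_after_loading still relies on the
# Parameter.data access pattern.
from vllm_ascend.utils import vllm_version_is  # assumed helper


def _patched_process_weights_after_loading(self, act_dtype):
    # Pre-v0.15.0 weight post-processing that still touches Parameter.data.
    ...


# Assuming v0.14.0 is the only older release still supported, skip the
# patch on v0.15.0 and newer.
if vllm_version_is("0.14.0"):
    from vllm.attention.layer import MLAAttention  # assumed import path

    MLAAttention.process_weights_after_loading = (
        _patched_process_weights_after_loading)
```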
- vLLM version: v0.15.0
- vLLM main: d7e17aaacd
Signed-off-by: Meihan-chen <jcccx.cmh@gmail.com>
2026-02-10 09:04:32 +08:00
SILONG ZENG
06aa6036f6
[Lint] Style: Convert vllm-ascend/ to ruff format (new Batch #8) (#6604)
### What this PR does / why we need it?
**Scope of Changes**:
| File Path |
| :--- |
| vllm_ascend/ops/\_\_init\_\_.py |
| vllm_ascend/ops/activation.py |
| vllm_ascend/ops/flashcomm2_oshard_manager.py |
| vllm_ascend/ops/layernorm.py |
| vllm_ascend/ops/mla.py |
| vllm_ascend/ops/mm_encoder_attention.py |
| vllm_ascend/ops/register_custom_ops.py |
| vllm_ascend/ops/vocab_parallel_embedding.py |
| vllm_ascend/ops/weight_prefetch.py |
| vllm_ascend/spec_decode/\_\_init\_\_.py |
| vllm_ascend/spec_decode/eagle_proposer.py |
| vllm_ascend/spec_decode/interface.py |
| vllm_ascend/spec_decode/mtp_proposer.py |
| vllm_ascend/spec_decode/ngram_proposer.py |
| vllm_ascend/spec_decode/suffix_proposer.py |
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
- vLLM version: v0.15.0
- vLLM main: d7e17aaacd
Signed-off-by: MrZ20 <2609716663@qq.com>
2026-02-07 09:16:07 +08:00
wangxiyuan
06c0aed124
[CI] Fix broken CI (#6599)
Revert 4fb3d5e1b2, which breaks the E2E test.
- vLLM version: v0.15.0
- vLLM main: d7e17aaacd
2026-02-06 17:23:58 +08:00
SILONG ZENG
4fb3d5e1b2
[Lint] Style: Convert vllm-ascend/ to ruff format (Batch #8) (#6129)
### What this PR does / why we need it?
**Scope of Changes**:
| File Path |
| :--- |
| vllm_ascend/ops/\_\_init\_\_.py |
| vllm_ascend/ops/activation.py |
| vllm_ascend/ops/flashcomm2_oshard_manager.py |
| vllm_ascend/ops/layernorm.py |
| vllm_ascend/ops/mla.py |
| vllm_ascend/ops/mm_encoder_attention.py |
| vllm_ascend/ops/register_custom_ops.py |
| vllm_ascend/ops/vocab_parallel_embedding.py |
| vllm_ascend/ops/weight_prefetch.py |
| vllm_ascend/spec_decode/\_\_init\_\_.py |
| vllm_ascend/spec_decode/eagle_proposer.py |
| vllm_ascend/spec_decode/interface.py |
| vllm_ascend/spec_decode/mtp_proposer.py |
| vllm_ascend/spec_decode/ngram_proposer.py |
| vllm_ascend/spec_decode/suffix_proposer.py |
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
- vLLM version: v0.13.0
- vLLM main: d68209402d
Signed-off-by: MrZ20 <2609716663@qq.com>
Signed-off-by: SILONG ZENG <2609716663@qq.com>
2026-02-06 15:25:08 +08:00
meihanc
922e5c163b
[Main2Main] Upgrade vLLM main to 0202 (#6560)
### What this PR does / why we need it?
1. Fix `TypeError: FusedMoEParallelConfig.__init__() missing 1 required
positional argument: 'is_sequence_parallel'` due to
https://github.com/vllm-project/vllm/pull/32567 (a compatibility sketch
follows this list)
2. Fix `TypeError: '>' not supported between instances of 'MagicMock'
and 'int'` due to https://github.com/vllm-project/vllm/pull/33035
3. Fix `TypeError: Can't instantiate abstract class AscendMLAImpl with
abstract methods forward_mha, forward_mqa` and `AttributeError: 'bool'
object has no attribute 'process_weights_after_loading'` due to
https://github.com/vllm-project/vllm/pull/33284
4. Fix `'AscendSharedFusedMoE' object has no attribute
'_routed_input_transform'` due to
https://github.com/vllm-project/vllm/pull/32790
5. Fix `NPUModelRunner._dummy_run() got an unexpected keyword argument
'num_active_loras'` due to
https://github.com/vllm-project/vllm/pull/32005
6. Fix the problem caused by `'tuple' object has no attribute 'job_id'`
due to https://github.com/vllm-project/vllm/pull/27492
7. Fix the mismatch between `all_moe_layers` and
`vllm.moe_forward`/`vllm.moe_forward_shared` due to
https://github.com/vllm-project/vllm/pull/33184
8. Add a patch to fix `got multiple values for keyword argument
'add_special_tokens'` due to
https://github.com/vllm-project/vllm/pull/32863
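The compatibility sketch referenced in fix 1 above; the import path and call-site shape are assumptions, and only `is_sequence_parallel` comes from the quoted error:
```python
# Hypothetical shim: vLLM PR #32567 made is_sequence_parallel a required
# argument of FusedMoEParallelConfig, so older call sites raise the
# TypeError quoted above. Forward the field only when the installed vLLM
# accepts it, keeping one call site valid on both sides of the upgrade.
import inspect

from vllm.model_executor.layers.fused_moe.config import (  # assumed path
    FusedMoEParallelConfig)


def make_moe_parallel_config(**kwargs):
    params = inspect.signature(FusedMoEParallelConfig).parameters
    if "is_sequence_parallel" not in params:
        kwargs.pop("is_sequence_parallel", None)
    return FusedMoEParallelConfig(**kwargs)
```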
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
- vLLM version: v0.15.0
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.15.0
---------
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Signed-off-by: Meihan-chen <jcccx.cmh@gmail.com>
Signed-off-by: hfadzxy <starmoon_zhang@163.com>
Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com>
Co-authored-by: hfadzxy <starmoon_zhang@163.com>
2026-02-05 19:31:17 +08:00
zhangxinyuehfad
819a4459ce
Drop vLLM 0.13.0 support (#6069)
### What this PR does / why we need it?
Drop vLLM 0.13.0 support and upgrade to 0.14.0.
- vLLM version: v0.13.0
- vLLM main: d68209402d
---------
Signed-off-by: hfadzxy <starmoon_zhang@163.com>
2026-01-23 09:45:08 +08:00
wjunLu
c11a05c4e1
[Main2Main] Upgrade vLLM commit to 0113 (#5839)
### What this PR does / why we need it?
Upgrade vLLM commit to 0113 (11b6af5280d6d6dfb8953af16e67b25f819b3be9).
- Modify import paths due to the refactors in
https://github.com/vllm-project/vllm/pull/31916 and
https://github.com/vllm-project/vllm/pull/32054
- Fix `TypeError: NPUOffloadingSpec.__init__() takes 2 positional
arguments but 3 were given` due to
https://github.com/vllm-project/vllm/pull/24498 (see the sketch after
this list)
- Skip the async-scheduling tests in
`tests/e2e/multicard/4-cards/long_sequence/test_mtp.py`, which were
never verified (https://github.com/vllm-project/vllm/pull/31998)
- Skip some pooling tests broken by
https://github.com/vllm-project/vllm/pull/32148 , where vLLM itself also
fails
(https://buildkite.com/vllm/ci/builds/46705/steps/canvas?jid=019bb329-3834-4685-862b-1613b8e0f5d4).
We will re-enable those tests once main2main reaches
https://github.com/vllm-project/vllm/pull/32243
- Skip some cases in
`tests/e2e/multicard/4-cards/long_sequence/test_mtp.py`, which are
broken by
https://github.com/vllm-project/vllm/pull/32118
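The sketch referenced in the `NPUOffloadingSpec` item above; the base-class import path is an assumption:
```python
# Hypothetical sketch: vLLM PR #24498 made the base OffloadingSpec
# constructor pass an extra positional argument, so a subclass pinned to
# two parameters raises the TypeError quoted above. Accepting *args and
# **kwargs tolerates both the old and the new base signature.
from vllm.v1.kv_offload.spec import OffloadingSpec  # assumed import path


class NPUOffloadingSpec(OffloadingSpec):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # NPU-specific setup would follow here.
```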
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
- vLLM version: v0.13.0
- vLLM main: 2f4e6548ef
Signed-off-by: wjunLu <wjunlu217@gmail.com>
Signed-off-by: hfadzxy <starmoon_zhang@163.com>
Co-authored-by: hfadzxy <starmoon_zhang@163.com>
2026-01-15 09:48:53 +08:00
Shanshan Shen
b94d589769
[MM][Bugfix] Update hf_config to hf_text_config (#5319)
### What this PR does / why we need it?
Following https://github.com/vllm-project/vllm-ascend/pull/5205, update
`hf_config` to `hf_text_config`.
Find more details at
https://github.com/vllm-project/vllm-ascend/pull/5205#issuecomment-3675417534
and
https://github.com/vllm-project/vllm-ascend/pull/5205#issuecomment-3677920872.
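For illustration, the rename in practice; the helper below is hypothetical, but `hf_text_config` is the vLLM `ModelConfig` property that resolves to the nested language-model config for multimodal models:
```python
# Hypothetical example of the rename: for multimodal models, hf_config is
# the top-level (vision + text) config, while hf_text_config resolves to
# the nested text config where fields like hidden_size actually live.
from vllm.config import ModelConfig


def get_hidden_size(model_config: ModelConfig) -> int:
    # Before: model_config.hf_config.hidden_size (breaks when the text
    # fields are nested under a separate text_config).
    return model_config.hf_text_config.hidden_size
```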
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
- vLLM version: release/v0.13.0
- vLLM main: 5fbfa8d9ef
Signed-off-by: shen-shanshan <467638484@qq.com>
2026-01-06 16:41:39 +08:00
Chu Yuelin
d07d8a4535
[Model] Add LongCat-Flash (#3833)
### What this PR does / why we need it?
Add LongCat-Flash support.
### Does this PR introduce _any_ user-facing change?
N/A
### How was this patch tested?
CI passed
- vLLM version: v0.13.0
- vLLM main: ad32e3e19c
---------
Signed-off-by: chuyuelin <923822139@qq.com>
Co-authored-by: chuyuelin <chuyuelin1@huawei.com>
2025-12-31 17:06:55 +08:00
wangxiyuan
7f2673ea2d
Upgrade vLLM to main (#4608)
1. Fix https://github.com/vllm-project/vllm/pull/28542
The affected model structures are:
- Qwen2.5-VL (some patches still exist)
- Qwen2-VL
- Qwen2
- DeepSeek series
- Qwen-MoE series
2. Fix https://github.com/vllm-project/vllm/pull/29121
The output token type changed from a NumPy array to `list[list[int]]`
(see the sketch after this list).
3. Fix https://github.com/vllm-project/vllm/pull/29262
The `xformers` backend for multimodal has been deprecated.
4. Fix https://github.com/vllm-project/vllm/pull/29342
5. Fix https://github.com/vllm-project/vllm/pull/28579
6. Fix https://github.com/vllm-project/vllm/pull/28718
7. Fix https://github.com/vllm-project/vllm/issues/28665
8. Fix https://github.com/vllm-project/vllm/pull/26847
vLLM introduced the `optimization-level` option; some default configs
changed, and the `--enforce-eager` param has been deprecated.
9. Fix https://github.com/vllm-project/vllm/pull/29223 ; the sampler now
returns a tuple.
10. Fix https://github.com/vllm-project/vllm/pull/29471 ; we'll remove
the related patch to avoid this kind of error.
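The sketch referenced in fix 2: a hedged adapter for consumers that still expect a NumPy array (the function name is hypothetical):
```python
# Hypothetical adapter for fix 2: sampled token IDs now arrive as
# list[list[int]] instead of a NumPy array, so NumPy-only consumers need
# an explicit conversion. Assumes equal-length rows; pad upstream if
# per-request token counts differ.
import numpy as np


def to_token_array(sampled_token_ids) -> np.ndarray:
    if isinstance(sampled_token_ids, np.ndarray):
        return sampled_token_ids  # old behavior: already an ndarray
    return np.asarray(sampled_token_ids, dtype=np.int64)
```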
Co-authored-by: hfadzxy <starmoon_zhang@163.com>
Co-authored-by: wangli <wangli858794774@gmail.com>
- vLLM version: v0.11.2
---------
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Signed-off-by: wangli <wangli858794774@gmail.com>
Signed-off-by: hfadzxy <starmoon_zhang@163.com>
Co-authored-by: wangli <wangli858794774@gmail.com>
Co-authored-by: hfadzxy <starmoon_zhang@163.com>
2025-12-02 22:10:52 +08:00
wangxiyuan
1874265074
Move mla to ops module (#4575)
Move the MLA custom op to the correct module.
- vLLM version: v0.11.2
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.2
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-11-29 18:36:55 +08:00