Commit Graph

12 Commits

Author SHA1 Message Date
wangxiyuan
4e3919e965 Reapply "[Refactor] Unify full-graph parameter update logic (#6041)" (#6227) (#6231)
This reverts commit 95649344aa.

The CI failure is not related to this change, so reapply it.

- vLLM version: v0.14.0
- vLLM main:
d68209402d
2026-01-26 09:04:54 +08:00
wangxiyuan
95649344aa Revert "[Refactor] Unify full-graph parameter update logic (#6041)" (#6227)
This reverts commit 8966a99710.

It breaks the test
`tests/e2e/singlecard/spec_decode/test_mtp_eagle_correctness.py::test_deepseek_mtp_correctness[True-FULL_DECODE_ONLY-2-wemaster/deepseek_mtp_main_random_bf16]`

- vLLM version: v0.14.0
- vLLM main:
d68209402d
2026-01-25 15:25:38 +08:00
LICO67373
8966a99710 [Refactor] Unify full-graph parameter update logic (#6041)
### What this PR does / why we need it?

**Refactor: Unify full-graph parameter update logic**

This PR consolidates the scattered full-graph parameter update logic
into a unified approach, improving code architecture and eliminating
duplication.

**Key improvements:**

1. **Unified interface**
   - Create `update_full_graph_params` as the single entry point for all
     full-graph updates
   - Replace multiple scattered update calls with one unified function
   - Remove ~50 lines of duplicated if-else logic across
     `model_runner_v1.py` and `eagle_proposer.py`

2. **Better architecture**
   - Move update logic to the respective Backend classes
     (`AscendAttentionBackend`, `AscendMLABackend`)
   - Each Backend manages its own parameter update logic internally
   - Simplify caller code to just dispatch to the appropriate Backend

3. **Cleaner parameter handling**
   - Remove unnecessary `pcp_size` and `dcp_size` parameter passing
   - Get parallel configuration directly from distributed groups
   - Consistent with how other parts of the codebase obtain these values
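
The dispatch pattern described in the list above can be sketched roughly as follows. This is a minimal illustration only: the `GraphParams` class, the `update_graph_params` method name, and the argument list of `update_full_graph_params` are assumptions made for the example, not the actual vllm-ascend signatures.

```python
# Hypothetical sketch of "one entry point, per-backend update logic".
# Only the names update_full_graph_params, AscendAttentionBackend and
# AscendMLABackend come from the PR text; everything else is assumed.
from dataclasses import dataclass, field


@dataclass
class GraphParams:
    """Stand-in for captured full-graph state (hypothetical)."""
    updates: list = field(default_factory=list)


class AscendAttentionBackend:
    @staticmethod
    def update_graph_params(params: GraphParams, num_tokens: int) -> None:
        # Backend-specific update logic lives inside the backend itself.
        params.updates.append(("attention", num_tokens))


class AscendMLABackend:
    @staticmethod
    def update_graph_params(params: GraphParams, num_tokens: int) -> None:
        params.updates.append(("mla", num_tokens))


def update_full_graph_params(backend, params: GraphParams, num_tokens: int) -> None:
    """Single entry point: the caller only dispatches to the backend."""
    backend.update_graph_params(params, num_tokens)


params = GraphParams()
update_full_graph_params(AscendAttentionBackend, params, 8)
update_full_graph_params(AscendMLABackend, params, 16)
```

With this shape, callers such as a model runner or a proposer would not need their own if-else chain per backend; they pass the backend and let it handle the update.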

**Why we need it:**
- **Maintainability**: Future changes only need to be made in one place
per Backend
- **Code quality**: Follows DRY principle and Single Responsibility
Principle
- **Readability**: Cleaner, more intuitive code structure

### Does this PR introduce _any_ user-facing change?

**No.** This is a pure refactoring with no functional changes - same
behavior, cleaner code.

### How was this patch tested?

- All existing unit tests pass with updated mocks
- No new tests needed (pure refactoring, no behavior changes)
- CI validates correctness

---

- vLLM version: v0.13.0

Signed-off-by: lico67373 <918688502@qq.com>
Co-authored-by: drslark <slarksblood@qq.com>
Co-authored-by: weijinqian0 <1184188277@qq.com>
2026-01-24 20:12:57 +08:00
dsxsteven
8378bc28b0 [Misc] Remove CP Redundant Variables after FIA operator enables for CANN 8.5 (#6013)
### What this PR does / why we need it?
PCP/DCP splits the KV cache across different cards. With the
`cp-kv-cache-interleave-size` parameter, the first `size` tokens are
cached on card 0, and so on.
However, if there are too few tokens, some cards store no key-value
pairs, which results in zero values, corrupted values, and precision
issues. Currently, additional operations are introduced to avoid this
precision problem.
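
The empty-card situation described above can be illustrated with a small sketch. It assumes that blocks of `interleave_size` tokens are assigned to cards in round-robin order; the function name and the round-robin wrap-around are assumptions for illustration, not the actual placement code.

```python
# Hypothetical sketch: count how many tokens land on each card when
# blocks of `interleave_size` tokens are assigned round-robin (assumed).
def tokens_per_card(num_tokens: int, num_cards: int, interleave_size: int) -> list[int]:
    counts = [0] * num_cards
    for token_idx in range(num_tokens):
        block = token_idx // interleave_size  # which block the token is in
        counts[block % num_cards] += 1        # which card stores that block
    return counts


few = tokens_per_card(num_tokens=5, num_cards=4, interleave_size=4)
many = tokens_per_card(num_tokens=40, num_cards=4, interleave_size=4)
```

Under these assumptions, a 5-token sequence leaves two of the four cards without any cached tokens, which is the case the additional operations guarded against.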

Now that the FIA operator is integrated in `mla_cp._forward_decode` and
CANN has been updated to 8.5.0, these additional operations can be removed.
### Does this PR introduce _any_ user-facing change?

### How was this patch tested?
All CI passed with CANN 8.5.0.
- vLLM version: v0.13.0
- vLLM main:
2c24bc6996

Signed-off-by: dsxsteven <dsxsteven@sina.com>
Signed-off-by: dsxsteven <36877507+dsxsteven@users.noreply.github.com>
2026-01-23 14:13:12 +08:00
zhangxinyuehfad
819a4459ce Drop vLLM 0.13.0 support (#6069)
### What this PR does / why we need it?
Drop vLLM 0.13.0 support, upgrade to 0.14.0

- vLLM version: v0.13.0
- vLLM main:
d68209402d

---------

Signed-off-by: hfadzxy <starmoon_zhang@163.com>
2026-01-23 09:45:08 +08:00
Qiu
58ff465821 [bugfix] fix the complex and potentially problematic generate_kv_idx. (#5957)
### What this PR does / why we need it?
In long-sequence scenarios, the chunked-prefill component may encounter
dimension misalignment issues, which previously occurred during
precision testing on the code_generate_lite dataset. This PR removes
redundant computations and instead derives the value using existing
results and straightforward calculations.
- vLLM version: v0.13.0
- vLLM main:
2c24bc6996

Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com>
2026-01-21 14:21:02 +08:00
SILONG ZENG
329961b375 [Lint]Style: Convert vllm-ascend/ to ruff format(Batch #2) (#5977)
### What this PR does / why we need it?
**Scope of Changes**:
| File Path |
| :--- |
| `vllm_ascend/attention/attention_mask.py` |
| `vllm_ascend/attention/attention_v1.py` |
| `vllm_ascend/attention/context_parallel/attention_cp.py` |
| `vllm_ascend/attention/context_parallel/common_cp.py` |
| `vllm_ascend/attention/context_parallel/mla_cp.py` |
| `vllm_ascend/attention/utils.py` |
| `vllm_ascend/batch_invariant.py` |
| `vllm_ascend/device/device_op.py` |
| `vllm_ascend/device_allocator/camem.py` |
| `vllm_ascend/envs.py` |


- vLLM version: v0.13.0
- vLLM main:
2c24bc6996

---------

Signed-off-by: MrZ20 <2609716663@qq.com>
2026-01-19 08:59:46 +08:00
wjunLu
c11a05c4e1 [Main2Main] Upgrade vllm commit to 0113 (#5839)
### What this PR does / why we need it?
Upgrade vllm commit to 0113 (11b6af5280d6d6dfb8953af16e67b25f819b3be9)

- Modify import paths due to the refactors
https://github.com/vllm-project/vllm/pull/31916
https://github.com/vllm-project/vllm/pull/32054

- Fix `TypeError: NPUOffloadingSpec.__init__() takes 2 positional
arguments but 3 were given` due to
https://github.com/vllm-project/vllm/pull/24498

- Skip the async-scheduling tests in
`tests/e2e/multicard/4-cards/long_sequence/test_mtp.py`, which have never
been verified
https://github.com/vllm-project/vllm/pull/31998

- Skip some pooling tests, which fail because of
https://github.com/vllm-project/vllm/pull/32148
and also fail in vLLM itself
https://buildkite.com/vllm/ci/builds/46705/steps/canvas?jid=019bb329-3834-4685-862b-1613b8e0f5d4

We will re-enable those tests when main2main reaches
https://github.com/vllm-project/vllm/pull/32243

- Skip some cases in
`tests/e2e/multicard/4-cards/long_sequence/test_mtp.py`, which are
broken by
https://github.com/vllm-project/vllm/pull/32118

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.13.0
- vLLM main:
2f4e6548ef

Signed-off-by: wjunLu <wjunlu217@gmail.com>
Signed-off-by: hfadzxy <starmoon_zhang@163.com>
Co-authored-by: hfadzxy <starmoon_zhang@163.com>
2026-01-15 09:48:53 +08:00
Qiu
a88937f5cb [bugfix](cp) replace None with zeros/inf tensor to avoid TypeError (#5837)
### What this PR does / why we need it?
When some devices hold no KV cache, the `_compute_prefill_context`
function returns `None`, which is unexpected. This PR replaces `None`
with all-zero and `-inf` tensors to avoid a TypeError.
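
As a rough illustration of why zeros and `-inf` are safe placeholders, the sketch below merges partial attention results by their log-sum-exp (LSE) weights. It assumes the zero tensor stands for an attention output and the `-inf` tensor for its LSE; that interpretation, the pure-Python lists, and both function names are assumptions for this example and are not taken from the PR.

```python
# Hypothetical sketch: an "empty" partial result (zero output, -inf LSE)
# contributes zero weight when partial attention outputs are merged.
import math


def empty_partial(head_dim: int):
    """Placeholder for a device with no KV cache (assumed semantics)."""
    return [0.0] * head_dim, -math.inf


def merge_partials(partials):
    """Merge (output, lse) pairs, weighting each output by exp(lse - max_lse)."""
    head_dim = len(partials[0][0])
    max_lse = max(lse for _, lse in partials)
    if max_lse == -math.inf:
        # Every partial is empty; avoid computing -inf - (-inf).
        return [0.0] * head_dim, -math.inf
    weights = [math.exp(lse - max_lse) for _, lse in partials]
    total = sum(weights)
    merged = [
        sum(w * out[i] for w, (out, _) in zip(weights, partials)) / total
        for i in range(head_dim)
    ]
    return merged, max_lse + math.log(total)


real = ([1.0, 2.0], 0.5)
merged, lse = merge_partials([real, empty_partial(2)])
```

Returning a placeholder pair instead of `None` lets the merge step run the same code path on every device, with the empty device simply dropping out of the weighted sum.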

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
```bash
pytest tests/e2e/multicard/4-cards/long_sequence/test_chunked_prefill.py -k test_models_chunked_prefill_with_empty_kvcache
```

- vLLM version: v0.13.0
- vLLM main:
2f4e6548ef

---------

Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com>
2026-01-14 20:57:48 +08:00
LICO67373
c8a324ab73 [Refactor] Add comments for Metadata classes in attention module (#5789)
### What this PR does / why we need it?

Add docstrings for Metadata and MetadataBuilder classes in the attention
module to improve code readability.

Related to #5463 (Item 11: Add some comments for CommonMetadata and
others)

**Modified files:**
- `vllm_ascend/attention/context_parallel/common_cp.py`: Added comments
for `AscendPCPMetadata`, `CPChunkedContextMetadata`,
`AscendMetadataForPrefill`, `AscendMetadataForDecode`
- `vllm_ascend/attention/utils.py`: Added comments for
`AscendPrefillContextParallelMetadata`
- `vllm_ascend/attention/mla_v1.py`: Added comments for
`ChunkedContextMetadata`, `AscendMLADecodeMetadata`
- `vllm_ascend/attention/attention_v1.py`: Added comments for
`AscendMetadata`, `AscendAttentionMetadataBuilder`
- `vllm_ascend/attention/context_parallel/attention_cp.py`: Added
comments for `AscendAttentionCPMetadataBuilder`

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Documentation only, no functional changes.

Signed-off-by: lico67373 <918688502@qq.com>
2026-01-13 08:46:50 +08:00
LICO67373
380f089fbf [Refactor] Fix AttentionMaskBuilder singleton and remove redundant pcp_prefill_mask (#4870)
## What this PR does / why we need it?

This PR fixes the `AttentionMaskBuilder` singleton initialization issue
introduced in PR #4779 and removes the unused `pcp_prefill_mask` field.

### Background

After PR #4779 made `AttentionMaskBuilder` a singleton with `@singleton`
decorator, the class constructor now requires a `device` parameter.
However, two initialization sites were still using the old parameterless
constructor, causing failures.
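
The failure mode described above can be reproduced with a minimal sketch. The `singleton` decorator below is a generic illustration written for this example, not the project's actual implementation, and the device string is a placeholder.

```python
# Hypothetical sketch of a singleton decorator and a class whose
# constructor requires `device`; the real decorator may differ.
def singleton(cls):
    """Cache the first instance of cls and return it on every later call."""
    instances = {}

    def get_instance(*args, **kwargs):
        if cls not in instances:
            instances[cls] = cls(*args, **kwargs)
        return instances[cls]

    return get_instance


@singleton
class AttentionMaskBuilder:
    def __init__(self, device):
        self.device = device


# The old parameterless call fails because `device` is now required.
try:
    AttentionMaskBuilder()
    first_call_failed = False
except TypeError:
    first_call_failed = True

# Passing the device works, and later calls return the same instance.
builder = AttentionMaskBuilder("npu:0")
same_builder = AttentionMaskBuilder("npu:0")
```

In this sketch the parameterless call only fails when it is the first call to construct the instance, which may be why such call sites can go unnoticed until initialization order changes.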

### Changes

1. **Fix singleton initialization**
   - Fixed `AttentionMaskBuilder()` → `AttentionMaskBuilder(self.device)`
     in `AscendMLAMetadataBuilder.__init__()`
   - Fixed `AttentionMaskBuilder()` → `AttentionMaskBuilder(self.device)`
     in `AscendAttentionMetadataBuilder.__init__()`

2. **Remove unused field**
   - Removed `pcp_prefill_mask` field from
     `AscendPrefillContextParallelMetadata` (never used in codebase)
   - Updated related test assertions

### Related

- Issue #5463
- PR #4779 (Unify all mask generation methods)
- PR #5389 (Make AttentionMaskBuilder singleton)

## Does this PR introduce _any_ user-facing change?

No. This is an internal refactoring.

## How was this patch tested?

- Local testing: No linter errors
- Unit tests for attention modules verified
- CI pipeline

Signed-off-by: lico67373 <918688502@qq.com>
Co-authored-by: weijinqian0 <1184188277@qq.com>
2026-01-07 17:09:52 +08:00
wujinyuan1
4a3663327b [Refactor]7/N Extract common code to common_cp (#5490)
RFC: https://github.com/vllm-project/vllm-ascend/issues/4629
Reason:
Eliminate code duplicated between two files (`mla_cp.py` and
`attention_cp.py`) by moving it to `common_cp.py`.

vLLM version: 0.13.0rc3
vLLM main:
ad32e3e19c

vLLM version: release/v0.13.0
vLLM main:
5fbfa8d9ef

- vLLM version: v0.13.0
- vLLM main:
5326c89803

---------

Signed-off-by: wujinyuan1 <wjy9595@qq.com>
Signed-off-by: wujinyuan1 <wujinyuan1@huawei.com>
Co-authored-by: wujinyuan1 <wjy9595@qq.com>
2026-01-05 17:41:12 +08:00