xc-llm-ascend

Files

zxr2333 d39d80830c [KVCache]Qwen3.5 supports contiguous tensor hybrid-attn kv-cache (#6887)

### What this PR does / why we need it?
Supports a contiguous-tensor hybrid-attn KV cache for full-attention/Mamba
hybrid models, such as Qwen3Next and Qwen3.5.
Ascend operators require every KV tensor, conv tensor, and SSM tensor to
be contiguous. This PR therefore lays out the KV cache as follows:
tensor1: [(kv_padding), conv,            ...]
tensor2: [k,            ssm,             ...]
tensor3: [v,            (mamba_padding), ...]
Under this scheme, some memory is wasted on padding, but every cache
tensor is guaranteed to be contiguous.
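
The layout above can be illustrated with a small planning sketch. This is not the code from this PR: `Segment`, `plan_layout`, and `wasted_bytes` are hypothetical names, and the sketch assumes that K and V have the same per-block size and that `mamba_padding` is as large as the bigger of the conv and SSM regions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    name: str    # "k", "v", "conv", "ssm", or a padding label
    offset: int  # byte offset inside its tensor
    size: int    # byte length of the region


def plan_layout(kv_bytes: int, conv_bytes: int, ssm_bytes: int) -> dict[str, list[Segment]]:
    """Hypothetical planner for the three-tensor scheme described above.

    Each tensor starts with an attention-sized slot followed by a mamba slot,
    so every named region occupies one contiguous byte range.
    """
    # Assumption: the padding in tensor3 matches the larger mamba region.
    mamba_pad = max(conv_bytes, ssm_bytes)

    def row(first: str, second: str, second_size: int) -> list[Segment]:
        return [Segment(first, 0, kv_bytes), Segment(second, kv_bytes, second_size)]

    return {
        "tensor1": row("kv_padding", "conv", conv_bytes),
        "tensor2": row("k", "ssm", ssm_bytes),
        "tensor3": row("v", "mamba_padding", mamba_pad),
    }


def wasted_bytes(layout: dict[str, list[Segment]]) -> int:
    """Sum the bytes spent on padding regions."""
    return sum(
        seg.size
        for segments in layout.values()
        for seg in segments
        if seg.name.endswith("padding")
    )
```

For example, `wasted_bytes(plan_layout(1024, 256, 512))` counts one K-sized slot plus one mamba-sized slot as padding, which is the waste the scheme trades for contiguity.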

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
By CI.

- vLLM version: v0.16.0
- vLLM main: 15d76f74e2

---------

Signed-off-by: nwpu-zxr <zhouxuerong2@huawei.com>

2026-03-09 15:28:40 +08:00

__init__.py

[Main2Main] Upgrade vLLM to 0303 (#6944)

2026-03-06 09:08:52 +08:00

patch_balance_schedule.py

[Lint]Style: Convert vllm-ascend/ to ruff format(Batch #6) (#6001)

2026-01-24 22:08:33 +08:00

patch_distributed.py

[Feature]: Support 310P device run qwen2.5/3 dense and qwen2.5vl models (#5776)

2026-01-17 11:49:18 +08:00

patch_fusion_matcher_compat_ops.py

[Main2Main] Upgrade vLLM to 0303 (#6944)