xc-llm-ascend/vllm_ascend at 7fe0469e279cc9446048809d90575efb8c960bde

EngineX/xc-llm-ascend

zxr2333 fe4cad24e9 [BugFix]fix qwen3.5 reshape_kvcache bug (#7209)

### What this PR does / why we need it?

This PR fixes a bug in `reshape_kvcache_tensors` when reshaping the
Mamba cache for models like Qwen3.5. The previous implementation did not
correctly handle cases where the KV cache tensors have different data
types. This change ensures that slicing is performed based on byte
offsets before reshaping the tensors, which correctly handles
heterogeneous dtypes.
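
The byte-offset approach described above can be sketched in a few lines. This is only an illustration built on Python's standard-library `memoryview`, not the actual `reshape_kvcache_tensors` code from vllm-ascend. The helper name `carve_regions`, the region names, and the shapes are hypothetical, and the struct formats `h` (2 bytes) and `f` (4 bytes) stand in for whatever dtypes the real cache uses.

```python
import struct
from math import prod


def carve_regions(buffer, specs):
    """Split one raw byte buffer into typed, shaped views.

    specs is a list of (name, struct_format, shape) tuples (hypothetical layout).
    Each region is located by its byte offset, and only then reinterpreted
    with its own element type and shape.
    """
    raw = memoryview(buffer)  # 1-D view over the raw bytes
    views = {}
    offset = 0  # running offset in bytes, not in elements
    for name, fmt, shape in specs:
        nbytes = struct.calcsize(fmt) * prod(shape)
        if offset + nbytes > raw.nbytes:
            raise ValueError(f"region {name!r} does not fit in the buffer")
        # Slice in bytes first, then cast the slice to its own type and shape.
        views[name] = raw[offset:offset + nbytes].cast(fmt, shape=list(shape))
        offset += nbytes
    return views


# Two regions with different element sizes sharing one allocation.
specs = [
    ("conv", "h", (2, 3)),  # 6 elements * 2 bytes = 12 bytes
    ("ssm", "f", (2, 2)),   # 4 elements * 4 bytes = 16 bytes
]
buffer = bytearray(12 + 16)
views = carve_regions(buffer, specs)

# Write a float at byte offset 12, the start of the second region.
struct.pack_into("f", buffer, 12, 1.5)
print(views["ssm"].tolist())   # first element of the "ssm" region is 1.5
print(views["conv"].tolist())  # the "conv" region is untouched
```

If the whole buffer were instead viewed as a single element type and sliced by element count, the start of the second region would land on the wrong byte whenever the item sizes differ, which is the kind of mismatch the PR description refers to.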

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

By CI.

- vLLM version: v0.16.0
- vLLM main: 4034c3d32e

Signed-off-by: nwpu-zxr <zhouxuerong2@huawei.com>

2026-03-12 23:51:40 +08:00

[Misc] Fix main lint to make CI happy (#7204)

2026-03-12 18:27:48 +08:00

_cann_ops_custom

[Kernel] add custom op GmmSwigluQuantWeightNzTensorList (#3804)

2025-11-28 18:06:39 +08:00

[eagle][cp] fix eagle_cp enable bug2 (#7079)

2026-03-10 16:32:49 +08:00

[bugfix] fix pass bug: pass really rope dim for npu_rotary_embedding (#6880)

2026-03-06 19:35:17 +08:00

[BugFix]Fix recomputed scheduler bug (#7137)

2026-03-11 00:32:19 +08:00

[misc] move mxfp_compat into device to decouple from quantization init chain (#6918)

2026-03-02 18:17:01 +08:00

device_allocator

[Lint]Style: Convert vllm-ascend/ to ruff format(Batch #2) (#5977)

2026-01-19 08:59:46 +08:00

improve the ttft when use mooncake (#6125)

2026-03-12 16:13:48 +08:00

Support per-step heat collection and enhance FlashLB for multi-stage load balancing (#6477)

2026-03-12 15:49:09 +08:00

[Main2Main] Upgrade vLLM to 0226 (#6813)

2026-02-27 16:05:21 +08:00

[Bugfix][LoRA] Fix the issue when enable LoRA + tp + fully_sharded_loras (#6650)

2026-03-11 15:43:15 +08:00

[Lint]Style: Convert vllm-ascend/ to ruff format(Batch #6) (#6001)

2026-01-24 22:08:33 +08:00

Support per-step heat collection and enhance FlashLB for multi-stage load balancing (#6477)

2026-03-12 15:49:09 +08:00

[MTP][Bugfix] Fix GLM5-W8A8 precision issues caused by rotary quant MTP weights (#7139)

2026-03-12 20:01:24 +08:00

Refactor quantization layer name mapping to leverage vLLM built-in mappers (#7050)

2026-03-12 15:48:14 +08:00

[Feature] Add docs of batch invariance and make some extra operators patch (#6910)

2026-03-05 09:12:40 +08:00

[main][bugfix] Fixed the problem of drafter crashed in FULL mode (#7158)

2026-03-12 18:38:50 +08:00

[BugFix]fix qwen3.5 reshape_kvcache bug (#7209)

2026-03-12 23:51:40 +08:00

[Feat]Xlite Qwen3 MoE Support Data Parallel (#6715)

2026-03-09 17:53:35 +08:00

__init__.py

[Lint]Style: Convert vllm-ascend/compilation to ruff format (#5912)

2026-01-16 20:57:46 +08:00

ascend_config.py

refactor: add a check before layer_sharding logging (#7186)

2026-03-12 11:56:04 +08:00

ascend_forward_context.py

[EPLB][bugfix] Bugfix for fused mc2 (#6794)

2026-03-09 11:26:57 +08:00

batch_invariant.py

[Feature] Add docs of batch invariance and make some extra operators patch (#6910)

2026-03-05 09:12:40 +08:00

cpu_binding.py

[CPU binding] Implement global CPU slicing and improve IRQ binding for Ascend NPUs (#6945)

2026-03-03 17:20:52 +08:00

envs.py

[MISC] Clean up useless env USE_OPTIMIZED_MODEL (#6618)

2026-02-09 15:38:58 +08:00

flash_common3_context.py

[Lint]Style: Convert vllm-ascend/compilation to ruff format (#5912)

2026-01-16 20:57:46 +08:00

meta_registration.py

[Ops][Refactor] Remove custom rotary_embedding operator (#6523)

2026-02-07 09:24:05 +08:00

platform.py

Revert "[Feature][Quant] Auto-detect quantization format from model f… (#6873)

2026-03-10 11:27:32 +08:00

profiling_config.py

[Core][Misc] Clean up ProfileExecuteDuration (#6461)

2026-02-01 20:06:01 +08:00

utils.py

[Build] Add support for Ascend950 chip (#7151)

2026-03-12 10:25:51 +08:00