xc-llm-ascend/vllm_ascend at 49e6983b3be77873cd21ddd9b4cf3deddf65e0ea

EngineX/xc-llm-ascend

Mengqing Cao 5fed166a99 [ModelRunner][Refactor] Refactor kv cache tensor initialization logic (#3106)

### What this PR does / why we need it?
Refactor the KV cache tensor initialization logic:
1. Unify the KV cache tensor initialization logic of DeepSeek and normal
models.
2. Split `initialize_kv_cache_tensors` into `_allocate_kv_cache_tensors`
and `_reshape_kv_cache_tensors`, following the GPU model runner in vLLM.
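
The two-step split described above can be illustrated with a minimal sketch. This is not the code from this PR: `LayerSpec`, the function names, the buffer layout, and the use of standard-library buffers instead of device tensors are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class LayerSpec:
    """Hypothetical per-layer KV cache description (not a vLLM type)."""
    name: str
    num_blocks: int
    block_size: int
    num_kv_heads: int
    head_size: int
    dtype_bytes: int = 2  # assumed 16-bit elements; must match the "H" cast below

    @property
    def shape(self) -> list[int]:
        # Assumed layout: (K/V, blocks, tokens per block, heads, head size).
        return [2, self.num_blocks, self.block_size,
                self.num_kv_heads, self.head_size]

    @property
    def total_bytes(self) -> int:
        n = self.dtype_bytes
        for dim in self.shape:
            n *= dim
        return n


def allocate_kv_cache_buffers(specs):
    """Step 1: reserve one flat, zero-filled byte buffer per layer."""
    return {s.name: bytearray(s.total_bytes) for s in specs}


def reshape_kv_cache_buffers(specs, raw):
    """Step 2: view each flat buffer with its layer-specific shape (no copy)."""
    # "H" is a 2-byte unsigned integer, standing in for a 16-bit dtype.
    return {s.name: memoryview(raw[s.name]).cast("H", shape=s.shape)
            for s in specs}


def initialize_kv_cache_buffers(specs):
    """Allocation and reshaping composed, mirroring the split described above."""
    raw = allocate_kv_cache_buffers(specs)
    return reshape_kv_cache_buffers(specs, raw)
```

Keeping allocation separate from reshaping means the sizing logic can be shared across model types, while only the reshape step needs to know each layer's layout.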

### Does this PR introduce _any_ user-facing change?
N/A

### How was this patch tested?
CI passed with existing tests.
1. Prefill disaggregation scenario
2. DeepSeek + aclgraph/eager mode
3. Qwen3 Next


- vLLM version: v0.11.0
- vLLM main: 83f478bb19

---------

Signed-off-by: MengqingCao <cmq0113@163.com>

2025-11-04 17:26:54 +08:00

revert TND modify when dcp pcp (#3948)

2025-11-03 22:22:17 +08:00

revert TND modify when dcp pcp (#3948)

2025-11-03 22:22:17 +08:00

Upgrade to new vllm commit (#3719)

2025-10-25 15:36:32 +08:00

device_allocator

[Misc]Clean up useless import from vllm (#2049)

2025-07-28 16:01:59 +08:00

[PD Disaggregation]Set adxl engine as default backend and update README (#3761)

2025-11-04 16:06:39 +08:00

[CI]Fix OOM of deepseek-eplb nightly test. (#3884)

2025-10-30 10:18:07 +08:00

Upgrade to 0.11.1 newest vllm commit (#3762)

2025-10-28 14:55:03 +08:00

[1/N][Refactor] Refactor code to adapt with vllm main (#3612)

2025-10-24 16:55:08 +08:00

Upgrade to new vllm commit (#3719)

2025-10-25 15:36:32 +08:00

[Model][3/N] Refactor sfa into mla and remove deepseek_v3_2.py (#3769)

2025-10-30 17:06:38 +08:00

[Perf] move quant before allgather in Allgather EP (#3420)

2025-11-04 16:49:58 +08:00

[Model][3/N] Refactor sfa into mla and remove deepseek_v3_2.py (#3769)

2025-10-30 17:06:38 +08:00

[Perf] move quant before allgather in Allgather EP (#3420)

2025-11-04 16:49:58 +08:00

Upgrade to 0.11.1 newest vllm commit (#3762)

2025-10-28 14:55:03 +08:00

[BugFix] Fix deepseek v3.2 mtp bug. (#3900)

2025-11-04 14:06:59 +08:00

correct bug to fix the value of max_num_tokens (#3933)

2025-11-03 14:17:51 +08:00

[ModelRunner][Refactor] Refactor kv cache tensor initialization logic (#3106)

2025-11-04 17:26:54 +08:00

__init__.py

[1/N][Refactor] Refactor code to adapt with vllm main (#3612)

2025-10-24 16:55:08 +08:00

ascend_config.py

[1/N][Refactor] Refactor code to adapt with vllm main (#3612)

2025-10-24 16:55:08 +08:00

ascend_forward_context.py

Update torch-npu version to 2.7.1 (#3896)

2025-10-31 17:16:31 +08:00

cpu_binding.py

[main] support cpu binding (#3546)

2025-10-21 09:17:03 +08:00

envs.py

[main] remove dbo code (#3712)

2025-10-25 15:53:01 +08:00

meta_registration.py

Fix the bugs about operator registration by PyTorch Dispatcher (#2786)

2025-09-13 11:58:52 +08:00

platform.py

fix bug when max_seqs=14 in mtp=2 scenario and raise error when cudagraph_capture_sizes can't be an integer multiple of uniform_decode_query_len (#3910)

2025-10-31 09:24:50 +08:00

utils.py

[MM][Bugfix] Add MoE verification for multi-modal models (#3897)

2025-11-04 09:16:19 +08:00