xc-llm-ascend

Mengqing Cao 7aa4f85f10 [Bugfix][kvcache] revert multiple kv cache groups (#923)

Revert multiple kv cache groups related changes as this feature is
reverted in vllm https://github.com/vllm-project/vllm/pull/18459

---------

Signed-off-by: MengqingCao <cmq0113@163.com>

2025-05-22 15:15:33 +08:00

__init__.py

port deepseekv2 and mtp to main branch (#429)

2025-04-19 17:38:18 +08:00

cache_engine.py

support deepseek quant & mix-parallel with graphmode (#585)

2025-04-23 16:23:25 +08:00

draft_model_runner.py

[CI] upgrade vllm to 0.8.5 (#715)

2025-04-30 09:15:50 +08:00

model_runner_v1.py

[Bugfix][kvcache] revert multiple kv cache groups (#923)

2025-05-22 15:15:33 +08:00

model_runner.py

[Core] Support the features of prefix cache and chunked prefill in v0/v1 (#782)

2025-05-09 16:39:28 +08:00

multi_step_runner.py

[Performance]: Custom AscendC Kernel of Multi-Step Prepare Input (#814)

2025-05-20 09:31:30 +08:00

multi_step_worker.py

support multistep decode (#299)

2025-03-11 19:20:06 +08:00

pooling_model_runner.py

[MISC] Clean up torch_npu (#688)

2025-04-29 18:03:38 +08:00

worker_v1.py

[Bugfix] Tweak distributed process group initialization and add dummy… (#816)

2025-05-12 17:31:29 +08:00

worker.py

[Disaggregated Prefill] P2P Disaggregated Prefill based on llm_datadist (#694)

2025-05-01 22:31:36 +08:00