xc-llm-ascend

Files

Mengqing Cao a0c3e9ba50 [Bugfix] Adjust inputbatch to be compatible with latest vllm (#945)

Adjust inputbatch to be compatible with latest vllm, as the kvcache group
feature has been redone in https://github.com/vllm-project/vllm/pull/18593

---------

Signed-off-by: MengqingCao <cmq0113@163.com>

2025-05-26 10:33:28 +08:00

__init__.py

port deepseekv2 and mtp to main branch (#429)

2025-04-19 17:38:18 +08:00

cache_engine.py

support deepseek quant & mix-parallel with graphmode (#585)

2025-04-23 16:23:25 +08:00

draft_model_runner.py

[CI] upgrade vllm to 0.8.5 (#715)

2025-04-30 09:15:50 +08:00

model_runner_v1.py

[Bugfix] Adjust inputbatch to be compatible with latest vllm (#945)

2025-05-26 10:33:28 +08:00

model_runner.py

[Core] Support the features of prefix cache and chunked prefill in v0/v1 (#782)

2025-05-09 16:39:28 +08:00

multi_step_runner.py

[Performance]: Custom AscendC Kernel of Multi-Step Prepare Input (#814)

2025-05-20 09:31:30 +08:00

multi_step_worker.py

support multistep decode (#299)

2025-03-11 19:20:06 +08:00

pooling_model_runner.py

[MISC] Clean up torch_npu (#688)

2025-04-29 18:03:38 +08:00

worker_v1.py

[V1][LoRA][Test] V1 Engine LoRA support & e2e test (#893)

2025-05-22 19:20:51 +08:00

worker.py

[Disaggregated Prefill] P2P Disaggregated Prefill based on llm_datadist (#694)

2025-05-01 22:31:36 +08:00