xc-llm-ascend

rjg-lyh fa99f89e93 [Core] Support the features of prefix cache and chunked prefill in v0/v1 (#782)

### What this PR does / why we need it?
Support the features of prefix cache and chunked prefill in v0/v1.

---------

Signed-off-by: rjg-lyh <1318825571@qq.com>

2025-05-09 16:39:28 +08:00

| File | Last commit | Date |
| --- | --- | --- |
| `__init__.py` | port deepseekv2 and mtp to main branch (#429) | 2025-04-19 17:38:18 +08:00 |
| `cache_engine.py` | support deepseek quant & mix-parallel with graphmode (#585) | 2025-04-23 16:23:25 +08:00 |
| `draft_model_runner.py` | [CI] upgrade vllm to 0.8.5 (#715) | 2025-04-30 09:15:50 +08:00 |
| `model_runner_v1.py` | [Core] Support the features of prefix cache and chunked prefill in v0/v1 (#782) | 2025-05-09 16:39:28 +08:00 |
| `model_runner.py` | [Core] Support the features of prefix cache and chunked prefill in v0/v1 (#782) | 2025-05-09 16:39:28 +08:00 |
| `multi_step_runner.py` | [CI] upgrade vllm to 0.8.5 (#715) | 2025-04-30 09:15:50 +08:00 |
| `multi_step_worker.py` | support multistep decode (#299) | 2025-03-11 19:20:06 +08:00 |
| `pooling_model_runner.py` | [MISC] Clean up torch_npu (#688) | 2025-04-29 18:03:38 +08:00 |
| `worker_v1.py` | [CI] upgrade vllm to 0.8.5 (#715) | 2025-04-30 09:15:50 +08:00 |
| `worker.py` | [Disaggregated Prefill] P2P Disaggregated Prefill based on llm_datadist (#694) | 2025-05-01 22:31:36 +08:00 |