xc-llm-ascend

zhenghaojiang eb43a475f4 [Feat] chunkprefill mla support torchair graph (#1772)

Chunked-prefill MLA currently supports only eager mode. We want to
optimize it by adding torchair graph support. The idea is simple: when
every request in the batch is in the decode phase, run the batch through
the torchair graph; otherwise, when the batch contains chunked-prefill or
prefill-only requests, use eager mode.
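
The dispatch rule described above can be sketched as follows. This is a minimal illustration, not the code from this commit: `Phase`, `Request`, and `select_execution_mode` are hypothetical names, and treating an empty batch as eager is an assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Phase(Enum):
    # Hypothetical request phases, named after the terms in the commit message.
    PREFILL = "prefill"
    CHUNKED_PREFILL = "chunked_prefill"
    DECODE = "decode"


@dataclass
class Request:
    # Hypothetical stand-in for a scheduled request.
    request_id: str
    phase: Phase


def select_execution_mode(batch: List[Request]) -> str:
    """Return "torchair_graph" only if every request is decoding, else "eager"."""
    # Assumption: an empty batch falls back to eager mode.
    if batch and all(r.phase is Phase.DECODE for r in batch):
        return "torchair_graph"
    # Any prefill or chunked-prefill request forces the whole batch to eager.
    return "eager"


if __name__ == "__main__":
    decode_only = [Request("a", Phase.DECODE), Request("b", Phase.DECODE)]
    mixed = [Request("a", Phase.DECODE), Request("b", Phase.CHUNKED_PREFILL)]
    print(select_execution_mode(decode_only))
    print(select_execution_mode(mixed))
```

The decision is made per batch rather than per request, so a single prefill or chunked-prefill request is enough to keep the whole step in eager mode.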

- vLLM version: v0.10.0
- vLLM main: ebf7605b0d

Signed-off-by: haojiangzheng <justineric096@gmail.com>
Co-authored-by: haojiangzheng <justineric096@gmail.com>

2025-08-11 19:58:59 +08:00

e2e

[Perf][MTP] Optimize reject sampler in greedy situation. (#2137)

2025-08-11 17:37:49 +08:00

[Feat] chunkprefill mla support torchair graph (#1772)

2025-08-11 19:58:59 +08:00

__init__.py

[SpecDecode] Add spec decode support (#500)

2025-04-17 20:16:32 +08:00