xc-llm-ascend

Files

whx 386817b4d1 [Model Runner][Performance] Cache the judgment result of is_encoder_decoder to decrease framework overhead (#138)

In Model Runner, is_encoder_decoder is extracted from model_config to
determine whether vllm is running an enc-dec model. Obtaining this
status requires a long call stack, so the CPU overhead is high. This
PR therefore caches the status in __init__ of ModelInputForNPUBuilder.

Signed-off-by: hw_whx <wanghexiang7@huawei.com>
Co-authored-by: hw_whx <wanghexiang7@huawei.com>

2025-02-21 22:43:11 +08:00
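
The caching described in the commit message above can be illustrated with a minimal sketch. This is not the code from the PR: StubModelConfig and StubInputBuilder are hypothetical stand-ins for model_config and ModelInputForNPUBuilder, and the lookup counter exists only to make the saving visible. The idea is simply to read the flag once in __init__ and reuse the stored value on the hot path.

```python
class StubModelConfig:
    """Hypothetical stand-in for model_config; counts flag lookups."""

    def __init__(self, encoder_decoder: bool) -> None:
        self._encoder_decoder = encoder_decoder
        self.lookups = 0

    @property
    def is_encoder_decoder(self) -> bool:
        # In the real framework this lookup is described as going through
        # a long call stack; here it only increments a counter.
        self.lookups += 1
        return self._encoder_decoder


class StubInputBuilder:
    """Hypothetical stand-in for ModelInputForNPUBuilder."""

    def __init__(self, model_config: StubModelConfig) -> None:
        self.model_config = model_config
        # Resolve the flag once and keep the result on the builder.
        self.is_encoder_decoder = model_config.is_encoder_decoder

    def add_seq_group(self, seq_group: int) -> str:
        # Hot path: a plain attribute read, no repeated config lookup.
        if self.is_encoder_decoder:
            return "enc-dec path"
        return "decoder-only path"


config = StubModelConfig(encoder_decoder=False)
builder = StubInputBuilder(config)
results = [builder.add_seq_group(i) for i in range(1000)]
```

In this sketch the config property is read once regardless of how many sequence groups are processed. The trade-off is that the cached value would go stale if the flag changed after construction, which is assumed here not to happen for a fixed model.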

ops

[Core] Cherry pick from 0.7.1 to keep the main code newest (#127)

2025-02-21 17:07:37 +08:00

quantization

[Core] Cherry pick from 0.7.1 to keep the main code newest (#127)

2025-02-21 17:07:37 +08:00

__init__.py

[Core] Init vllm-ascend (#3)

2025-02-05 10:53:12 +08:00

attention.py

[Core] Cherry pick from 0.7.1 to keep the main code newest (#127)