xc-llm-ascend/vllm_ascend
whx 386817b4d1 [Model Runner][Performance] Cache the judgment result of is_encoder_decoder to decrease framework overhead (#138)
In Model Runner, is_encoder_decoder is extracted from model_config to
determine whether vLLM is running an encoder-decoder model. Obtaining this
status requires a long call stack, so the CPU overhead is high. This
PR caches the status in __init__ of ModelInputForNPUBuilder.
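
The caching pattern described above can be sketched roughly as follows. This is a minimal illustration, not the actual diff: ModelConfigSketch, InputBuilderSketch, build_step, and the lookups counter are hypothetical names invented for this example, and the sketch assumes the flag does not change during the builder's lifetime.

```python
class ModelConfigSketch:
    """Hypothetical stand-in for a model config; not the real vLLM class."""

    def __init__(self, is_encoder_decoder: bool) -> None:
        self._is_encoder_decoder = is_encoder_decoder
        self.lookups = 0  # counts how often the flag is resolved

    @property
    def is_encoder_decoder(self) -> bool:
        # Stands in for the long call stack the commit message mentions.
        self.lookups += 1
        return self._is_encoder_decoder


class InputBuilderSketch:
    """Hypothetical builder that resolves the flag once in __init__."""

    def __init__(self, model_config: ModelConfigSketch) -> None:
        self.model_config = model_config
        # Assumption: the flag is constant for the builder's lifetime,
        # so it is safe to read it once and keep the result.
        self.is_encoder_decoder = model_config.is_encoder_decoder

    def build_step(self) -> str:
        # Hot path: reads the cached attribute, not the config property.
        return "enc-dec" if self.is_encoder_decoder else "decoder-only"


config = ModelConfigSketch(is_encoder_decoder=False)
builder = InputBuilderSketch(config)
results = [builder.build_step() for _ in range(1000)]
```

In this sketch the config property is evaluated once, no matter how many times build_step runs, which is the kind of per-step overhead the change is meant to avoid.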

Signed-off-by: hw_whx <wanghexiang7@huawei.com>
Co-authored-by: hw_whx <wanghexiang7@huawei.com>
2025-02-21 22:43:11 +08:00