xc-llm-ascend

Files

whx 0d3463400a [Performance] Change the shape of kv_cache to avoid view of k_cache and v_cache. (#204)

This PR changes the shape of the KV cache to avoid taking views of
k_cache and v_cache.
It also caches the metadata of k_cache and v_cache, which avoids
repeated slice operations and improves performance.

Signed-off-by: hw_whx <wanghexiang7@huawei.com>

2025-03-05 10:51:07 +08:00

__init__.py

[Core] Cherry pick from 0.7.1 to keep the main code newest (#127)

2025-02-21 17:07:37 +08:00

quant_config.py

[Performance] Change the shape of kv_cache to avoid view of k_cache and v_cache. (#204)

2025-03-05 10:51:07 +08:00

quantizer.py

[Core] Cherry pick from 0.7.1 to keep the main code newest (#127)

2025-02-21 17:07:37 +08:00