[Platform][Model Runner] Add hash of request_ids; Change blocksize back to 128. (#293)

This PR changes the default value of block_size back to 128 and adds a hash of the request ID list in the model runner, so that the sampler can implement a sampling-parameter cache.

Signed-off-by: hw_whx <wanghexiang7@huawei.com>
Co-authored-by: hw_whx <wanghexiang7@huawei.com>
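
The model runner change is not shown in the hunk below, so the following is only a minimal sketch of how a hash of the request ID list could key a sampling-parameter cache. The names `hash_request_ids`, `SamplingParamCache`, and `build` are hypothetical and are not taken from vllm_ascend; the sketch also assumes an order-sensitive hash that only needs to be stable within a single process.

```python
from typing import Any, Callable, List, Optional


def hash_request_ids(request_ids: List[str]) -> int:
    """Order-sensitive hash of the batch's request IDs (hypothetical helper).

    Python's built-in hash() of str is salted per process, so this value
    is only comparable within one running process.
    """
    return hash(tuple(request_ids))


class SamplingParamCache:
    """Single-entry cache keyed by the request ID hash (illustrative only)."""

    def __init__(self) -> None:
        self._key: Optional[int] = None
        self._value: Any = None
        self.misses = 0

    def get(self, request_ids: List[str],
            build: Callable[[List[str]], Any]) -> Any:
        key = hash_request_ids(request_ids)
        # Rebuild only when the batch composition (or order) has changed.
        if key != self._key:
            self._value = build(request_ids)
            self._key = key
            self.misses += 1
        return self._value
```

With a cache like this, consecutive decode steps over an unchanged batch would reuse the previously built sampling parameters instead of rebuilding them. A hash collision would return stale parameters, so a real implementation might also compare the ID lists themselves.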
2025-03-11 18:50:28 +08:00
parent 007aeaa48b
commit feb6bdb12e
2 changed files with 6 additions and 2 deletions
--- a/vllm_ascend/platform.py
+++ b/vllm_ascend/platform.py
@@ -108,8 +108,7 @@ class NPUPlatform(Platform):
            parallel_config.worker_cls = "vllm_ascend.worker.worker.NPUWorker"
        cache_config = vllm_config.cache_config
        if cache_config and cache_config.block_size is None:
-            # TODO: Set block_size to 128 will lead unexpected accuracy issue in mla case.  Please set block_size to 128 back once the problem is fixed.
-            cache_config.block_size = 16
+            cache_config.block_size = 128

    @classmethod
    def get_attn_backend_cls(cls, selected_backend, head_size, dtype,