[Refactor] Cleanup platform (#5566)
### What this PR does / why we need it?
1. add `COMPILATION_PASS_KEY` constant
2. clean up useless platform interfaces `empty_cache`, `synchronize`,
`mem_get_info`, and `clear_npu_memory`
3. rename `CUSTOM_OP_REGISTERED` to `_CUSTOM_OP_REGISTERED`
4. remove the useless env `VLLM_ENABLE_CUDAGRAPH_GC`

NPUPlatform is the interface called by vLLM. Do not call it from within
vllm-ascend.
### Does this PR introduce _any_ user-facing change?
This PR is just a cleanup. All CI should pass.
### How was this patch tested?
- vLLM version: v0.13.0
- vLLM main:
7157596103
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
The updated hunk in `TestCaMem` (diff-viewer residue stripped):

```python
@@ -128,10 +128,17 @@ class TestCaMem(PytestBase):
            2000: data2,
        }

        # mock is_pin_memory_available, return False as some machine only has cpu
        with patch(
                "vllm_ascend.device_allocator.camem.NPUPlatform.is_pin_memory_available",
                return_value=False):
            # Mock torch.empty to force pin_memory=False
            original_torch_empty = torch.empty

            def mock_torch_empty(*args, **kwargs):
                # If pin_memory was explicitly set to True, change it to False
                if 'pin_memory' in kwargs and kwargs['pin_memory'] is True:
                    kwargs['pin_memory'] = False
                return original_torch_empty(*args, **kwargs)

            with patch("vllm_ascend.device_allocator.camem.torch.empty",
                       side_effect=mock_torch_empty):
                allocator.sleep(offload_tags="tag1")

            # only offload tag1, other tag2 call unmap_and_release
```
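The test above forces `pin_memory=False` by capturing the original `torch.empty`, wrapping it, and patching the name at the site where it is looked up. A minimal, self-contained sketch of the same kwarg-override pattern, using a hypothetical `CamemStub` class in place of the real camem module and a plain dict-returning stand-in for `torch.empty`:

```python
from unittest.mock import patch

class CamemStub:
    """Hypothetical stand-in for the camem module namespace (illustration only)."""

    @staticmethod
    def empty(size, pin_memory=False):
        # Stand-in for torch.empty: just records what it was called with.
        return {"size": size, "pin_memory": pin_memory}

# Capture the original before patching, so the wrapper can delegate to it.
original_empty = CamemStub.empty

def mock_empty(*args, **kwargs):
    # If pin_memory was explicitly set to True, change it to False
    if kwargs.get("pin_memory") is True:
        kwargs["pin_memory"] = False
    return original_empty(*args, **kwargs)

with patch.object(CamemStub, "empty", side_effect=mock_empty):
    # Caller asks for pinned memory; the wrapper silently downgrades it.
    result = CamemStub.empty(16, pin_memory=True)

print(result)  # {'size': 16, 'pin_memory': False}
```

Outside the `with` block the original attribute is restored automatically, which is why the real test nests `allocator.sleep(...)` inside the patch context.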