[Feature] adapt to uva buffer and main2main (#6657)
### What this PR does / why we need it?
vLLM model runner v2 uses a UVA buffer to prepare input data, but NPU
does not support UVA yet. This PR implements a UVA wrapper class that
mimics the GPU's UVA backend. It also makes some modifications to adapt
to the newer main branch.
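
The wrapper class itself is not shown in the hunk on this page, so the following is only a rough sketch of the general idea and not the PR's implementation. It assumes UVA here means unified virtual addressing, where a host write is visible to the device without an explicit copy. The class name `UvaBufferSketch`, its attributes, and `copy_to_device` are hypothetical, and plain `array` objects stand in for host and device tensors so the example runs without torch or an NPU.

```python
from array import array
from typing import Optional


class UvaBufferSketch:
    """Hypothetical stand-in for a UVA-style buffer on a device without UVA.

    Two separate buffers plus an explicit copy step emulate what a real UVA
    backend would provide through a shared address range.
    """

    def __init__(self, size: int, typecode: str = "l"):
        # Host-side staging buffer that input preparation writes into.
        self.cpu = array(typecode, [0] * size)
        # Pretend device-side buffer (an assumption; a real one would be a device tensor).
        self.device = array(typecode, [0] * size)

    def copy_to_device(self, n: Optional[int] = None) -> array:
        """Copy the first n host elements to the device buffer and return them."""
        n = len(self.cpu) if n is None else n
        self.device[:n] = self.cpu[:n]
        return self.device[:n]


buf = UvaBufferSketch(4)
buf.cpu[0] = 7
buf.cpu[1] = 9
synced = buf.copy_to_device(2)  # explicit sync replaces UVA's implicit visibility
```

The design point is that callers keep writing to a host buffer as they would with UVA, and the wrapper owns the extra copy that the hardware does not perform.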
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
- vLLM main: 13397841ab
---------
Signed-off-by: Ronald1995 <ronaldautomobile@163.com>
@@ -35,9 +35,14 @@ from vllm_ascend.worker.v2.utils import torch_cuda_wrapper
 class AclGraphManager(CudaGraphManager):
     """ACL Graph Manager for Ascend NPUs."""

-    def __init__(self, vllm_config: VllmConfig, device: torch.device):
+    def __init__(
+        self,
+        vllm_config: VllmConfig,
+        use_mrope: bool,
+        device: torch.device,
+    ):
         with torch_cuda_wrapper():
-            super().__init__(vllm_config, device)
+            super().__init__(vllm_config, use_mrope, device)

     def capture_graph(
         self,
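
The constructor change in this hunk can be illustrated with a small, self-contained sketch: the subclass accepts the new `use_mrope` argument and forwards it, in order, to the base class while a context manager is active. All names below (`BaseGraphManager`, `NpuGraphManager`, `cuda_wrapper_stub`, `events`) are hypothetical stand-ins, not the real `CudaGraphManager`, `AclGraphManager`, or `torch_cuda_wrapper`, whose internals are not shown here.

```python
from contextlib import contextmanager

events = []  # records when the stub context manager is entered and exited


@contextmanager
def cuda_wrapper_stub():
    """Hypothetical stand-in for torch_cuda_wrapper; it only logs enter/exit."""
    events.append("enter")
    try:
        yield
    finally:
        events.append("exit")


class BaseGraphManager:
    """Stand-in base class with the newer three-argument constructor."""

    def __init__(self, config, use_mrope: bool, device: str):
        self.config = config
        self.use_mrope = use_mrope
        self.device = device


class NpuGraphManager(BaseGraphManager):
    """Subclass that mirrors the base signature and forwards every argument."""

    def __init__(self, config, use_mrope: bool, device: str):
        # Base initialization runs while the wrapper context is active.
        with cuda_wrapper_stub():
            super().__init__(config, use_mrope, device)


manager = NpuGraphManager({"max_num_seqs": 8}, True, "npu:0")
```

In this sketch, a caller that still uses the old two-argument form, such as `NpuGraphManager(config, "npu:0")`, raises a `TypeError` for the missing `device` argument, which is why call sites need updating together with the signature.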