[core] Support capturing custom ops into aclgraph (#2113)
### What this PR does / why we need it?
Thanks to PR https://github.com/vllm-project/vllm-ascend/pull/426,
vllm-ascend supports aclgraph inference to reduce host overhead.
However, aclgraph's capability relies heavily on the functionality
provided by `torch.compile`, the key feature of torch 2.x. Therefore, a
custom op can only be captured into an aclgraph if it can be recognized
and captured by `torch.compile`.
In this PR, we register meta implementations for the current custom ops
to enable FX graph capture. With that in place, inserting those custom
ops into an aclgraph becomes natural for the Ascend runtime.
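
Below is a minimal sketch (not the actual vllm-ascend registration code) of the idea: register a meta ("fake") implementation for a custom op so `torch.compile` can trace it into an FX graph. The `my_ascend` namespace and the CPU stand-in kernel are purely illustrative; the real kernel is registered from the C++/NPU extension.

```python
import torch
from torch.library import Library

# Illustrative namespace and schema, mirroring the rotary_embedding signature.
lib = Library("my_ascend", "DEF")
lib.define(
    "rotary_embedding(Tensor positions, Tensor query, Tensor key, "
    "int head_size, Tensor cos_sin_cache, bool is_neox) -> (Tensor, Tensor)"
)

def rotary_embedding_meta(positions, query, key, head_size, cos_sin_cache, is_neox):
    # Meta kernel: only output shapes/dtypes matter, no real computation.
    # This is what lets torch.compile (and hence aclgraph capture) see the op.
    return torch.empty_like(query), torch.empty_like(key)

def rotary_embedding_cpu(positions, query, key, head_size, cos_sin_cache, is_neox):
    # Stand-in eager kernel for illustration only; the real kernel lives in
    # the C++/NPU extension.
    return query.clone(), key.clone()

lib.impl("rotary_embedding", rotary_embedding_meta, "Meta")
lib.impl("rotary_embedding", rotary_embedding_cpu, "CPU")
```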
### Does this PR introduce _any_ user-facing change?
No user-facing change.
### How was this patch tested?
Tested via unit test: we integrate the `rotary_embedding` op into a
small custom model, then use `torch.compile` and aclgraph to capture and
replay it to verify its functionality.
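
Continuing the sketch above, the test idea looks roughly like this (names and shapes are illustrative, not the actual unit test):

```python
class TinyRopeModel(torch.nn.Module):
    def forward(self, positions, query, key, cos_sin_cache):
        # The custom op is visible to torch.compile thanks to the meta impl.
        q, k = torch.ops.my_ascend.rotary_embedding(
            positions, query, key, 64, cos_sin_cache, True)
        return q + k

model = TinyRopeModel()
compiled = torch.compile(model)  # on Ascend, the captured graph is replayed via aclgraph

positions = torch.arange(4)
query = torch.randn(4, 64)
key = torch.randn(4, 64)
cos_sin_cache = torch.randn(16, 64)
assert torch.allclose(model(positions, query, key, cos_sin_cache),
                      compiled(positions, query, key, cos_sin_cache))
```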
- vLLM version: v0.10.0
- vLLM main:
1b99028069
---------
Signed-off-by: ganyi <pleaplusone.gy@gmail.com>
@@ -27,6 +27,17 @@

namespace vllm_ascend {

AscendType get_dtype_from_torch(at::ScalarType scalarType)
{
    if (scalarType == at::ScalarType::Float) {
        return AscendType::FP32;
    } else if (scalarType == at::ScalarType::BFloat16) {
        return AscendType::BF16;
    } else {
        return AscendType::FP16;
    }
}

std::tuple<at::Tensor, at::Tensor> rotary_embedding(at::Tensor &positions, at::Tensor &query, at::Tensor &key,
                                                    int64_t head_size, at::Tensor &cos_sin_cache, bool is_neox)
{