[Perf] Add YaRN custom op (#3355)
### What this PR does / why we need it?
YaRN scaling is used to improve long-sequence accuracy for models such as Qwen3. In vLLM, YaRN scaling is implemented by the `YaRNScalingRotaryEmbedding` class, which inherits from the original `RotaryEmbedding`. Although `YaRNScalingRotaryEmbedding` does not override the `forward` function of `RotaryEmbedding`, using YaRN on NPU still runs into the native implementation of `forward` in `RotaryEmbedding`, rather than `forward_oot` in vLLM-Ascend. Thus I register another custom op here to enable the oot implementation for YaRN in vLLM-Ascend, similar to #3151.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

---------
Signed-off-by: Angazenn <supperccell@163.com>
```diff
@@ -508,7 +508,8 @@ def register_ascend_customop(vllm_config: Optional[VllmConfig] = None):
                                           AscendQKVParallelLinear,
                                           AscendRowParallelLinear)
     from vllm_ascend.ops.rotary_embedding import (
-        AscendDeepseekScalingRotaryEmbedding, AscendRotaryEmbedding)
+        AscendDeepseekScalingRotaryEmbedding, AscendRotaryEmbedding,
+        AscendYaRNRotaryEmbedding)
     from vllm_ascend.ops.vocab_parallel_embedding import (
         AscendLogitsProcessor, AscendParallelLMHead,
         AscendVocabParallelEmbedding)
@@ -520,6 +521,7 @@ def register_ascend_customop(vllm_config: Optional[VllmConfig] = None):
         "RotaryEmbedding": AscendRotaryEmbedding,
         "ColumnParallelLinear": AscendColumnParallelLinear,
         "RowParallelLinear": AscendRowParallelLinear,
+        "YaRNScalingRotaryEmbedding": AscendYaRNRotaryEmbedding,
         "MergedColumnParallelLinear": AscendMergedColumnParallelLinear,
         "QKVParallelLinear": AscendQKVParallelLinear,
         "DeepseekScalingRotaryEmbedding": AscendDeepseekScalingRotaryEmbedding,
```
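The reason a separate registry entry is needed can be sketched with a small, self-contained model (hypothetical names and dispatch logic, not the actual vLLM plugin API): when substitution is keyed on the concrete class name, a subclass such as `YaRNScalingRotaryEmbedding` does not match its parent's entry, so it keeps the native `forward` unless it is registered explicitly.

```python
# Sketch only: this mimics name-keyed custom-op substitution; the real
# vLLM/vLLM-Ascend registration machinery differs in detail.

class RotaryEmbedding:
    def forward(self, x):
        return f"native forward({x})"

class YaRNScalingRotaryEmbedding(RotaryEmbedding):
    # Does not override forward; inherits the native implementation.
    pass

class AscendRotaryEmbedding(RotaryEmbedding):
    def forward(self, x):
        return f"forward_oot({x})"

class AscendYaRNRotaryEmbedding(YaRNScalingRotaryEmbedding):
    def forward(self, x):
        return f"forward_oot({x})"

# Registry keyed by the concrete class *name*, mirroring the dict in the diff.
CUSTOM_OPS = {
    "RotaryEmbedding": AscendRotaryEmbedding,
    "YaRNScalingRotaryEmbedding": AscendYaRNRotaryEmbedding,  # the new entry
}

def resolve(op):
    # Look up the exact class name; without an entry, the native op is kept.
    return CUSTOM_OPS.get(type(op).__name__, type(op))

# Without its own entry, YaRNScalingRotaryEmbedding would resolve to itself
# (native forward); with the entry, it dispatches to the Ascend subclass.
assert resolve(YaRNScalingRotaryEmbedding()) is AscendYaRNRotaryEmbedding
```

The key point is that inheritance alone does not redirect dispatch: the lookup matches `type(op).__name__` exactly, so each subclass used by a model needs its own mapping, which is what this PR adds for YaRN.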