xc-llm-ascend/vllm_ascend at 27e0f2c0355a45a8dbe897564edbe4e414a83dbb

EngineX/xc-llm-ascend

Angazenn 27e0f2c035 [Perf]Add YaRN custom op (#3355)

### What this PR does / why we need it?
YaRN scaling is used to improve long-sequence accuracy for models like
Qwen3. In vLLM, YaRN scaling is implemented by the
`YaRNScalingRotaryEmbedding` class, which inherits from the original
`RotaryEmbedding`. Although `YaRNScalingRotaryEmbedding` does not
override the `forward` function of `RotaryEmbedding`, using YaRN on NPU
still runs into the native implementation of `forward` in
`RotaryEmbedding` rather than `forward_oot` in vLLM-Ascend. Thus I
register another custom op here to enable the OOT implementation for
YaRN in vLLM-Ascend, similar to #3151.
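
To make the dispatch problem concrete, here is a toy model in plain Python. It assumes, as a simplification, that an out-of-tree (OOT) replacement is looked up by the exact class name of the op being constructed, so a subclass such as `YaRNScalingRotaryEmbedding` does not pick up a replacement registered for `RotaryEmbedding`. `CustomOp`, `register_oot`, `AscendRotaryEmbedding` and `AscendYaRNRotaryEmbedding` here are hypothetical stand-ins, not the real vLLM or vLLM-Ascend APIs, and the returned strings stand in for the actual rotary computation.

```python
from typing import Callable, Dict, Type


class CustomOp:
    """Toy custom-op base (hypothetical; not vLLM's real CustomOp).

    When an op is constructed, an out-of-tree (OOT) replacement is looked up
    by the *exact* class name, so subclasses are not covered automatically.
    """

    oot_registry: Dict[str, Type["CustomOp"]] = {}

    def __new__(cls):
        # Swap in the registered OOT class, if any, for this exact class name.
        target = CustomOp.oot_registry.get(cls.__name__, cls)
        return super().__new__(target)

    @classmethod
    def register_oot(cls, name: str) -> Callable[[Type["CustomOp"]], Type["CustomOp"]]:
        def decorator(op_cls: Type["CustomOp"]) -> Type["CustomOp"]:
            cls.oot_registry[name] = op_cls
            return op_cls
        return decorator

    def forward(self) -> str:
        # Toy dispatch: on this "platform", always take the OOT path.
        return self.forward_oot()

    def forward_oot(self) -> str:
        # Default OOT path falls back to the native implementation.
        return self.forward_native()

    def forward_native(self) -> str:
        raise NotImplementedError


class RotaryEmbedding(CustomOp):
    def forward_native(self) -> str:
        return "native"


class YaRNScalingRotaryEmbedding(RotaryEmbedding):
    """Inherits forward() unchanged, like the vLLM class described above."""


@CustomOp.register_oot(name="RotaryEmbedding")
class AscendRotaryEmbedding(RotaryEmbedding):
    def forward_oot(self) -> str:
        return "npu"


# Only "RotaryEmbedding" is registered, so YaRN falls through to native code.
before = YaRNScalingRotaryEmbedding().forward()


# The fix: register a second OOT op under the subclass's own name.
@CustomOp.register_oot(name="YaRNScalingRotaryEmbedding")
class AscendYaRNRotaryEmbedding(YaRNScalingRotaryEmbedding):
    def forward_oot(self) -> str:
        return "npu"


after = YaRNScalingRotaryEmbedding().forward()
print(RotaryEmbedding().forward(), before, after)  # npu native npu
```

In this toy model, registering a replacement under the subclass's own name, rather than touching `forward`, is what routes the YaRN op to the NPU path. That mirrors the approach described in this PR, though the real registration code in vLLM-Ascend may differ in its details.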

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?

- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

---------

Signed-off-by: Angazenn <supperccell@163.com>

2025-10-11 08:36:20 +08:00

[Feat][Graph]Support FULL_DECEDE_ONLY mode for MLA models (#3125)

2025-10-10 16:31:20 +08:00

[Feat][Graph]Support FULL_DECEDE_ONLY mode for MLA models (#3125)

2025-10-10 16:31:20 +08:00

[BugFix] Fix ascend scheduler assert error (#3191)

2025-09-28 18:22:08 +08:00

device_allocator

[Misc]Clean up useless import from vllm (#2049)

2025-07-28 16:01:59 +08:00

【bugfix】fix connector register failed (#3335)

2025-10-09 21:09:54 +08:00

FlashLB algorithm (#3042)

2025-09-23 10:27:14 +08:00

[Bugfix][LoRA] Fix forward error and shape mismatch when using LoRA (#3153)

2025-09-28 17:30:50 +08:00

[MoE] [Refactor] Combine common_fused_moe and fused_moe (#3176)

2025-10-09 14:12:46 +08:00

[Quickfix] update CachedRequestState as NewRequestData changed (#2367)

2025-08-15 07:35:27 +08:00

[Perf]Add YaRN custom op (#3355)

2025-10-11 08:36:20 +08:00

[Misc] Clean up useless patch (#3320)

2025-10-09 14:07:26 +08:00

[Feat] Load balance of tokens across experts in dummy_run (#3184)

2025-10-10 09:00:07 +08:00

Drop 0.10.2 (#3284)

2025-10-09 10:28:38 +08:00

bugfix for mtp (#3300)

2025-10-09 19:22:46 +08:00

bugfix for mtp when running torchair in a2 (#3354)

2025-10-10 23:07:24 +08:00

[Bugfix] Optimized exception throwing when stream captures exception (#3322)

2025-10-10 17:09:28 +08:00

__init__.py

【bugfix】fix connector register failed (#3335)

2025-10-09 21:09:54 +08:00

ascend_config.py

[1/N][Feat] Add weight prefetch feature for Attention layers (#3146)

2025-10-09 20:38:39 +08:00

ascend_forward_context.py

Revert PTA upgrade PR (#3352)

2025-10-10 14:09:53 +08:00

envs.py

Add DeepSeek V3.2 support (#3270)

2025-09-30 03:25:58 +08:00

meta_registration.py

Fix the bugs about operator registration by PyTorch Dispatcher (#2786)

2025-09-13 11:58:52 +08:00

platform.py

[Feat][Graph]Support FULL_DECEDE_ONLY mode for MLA models (#3125)

2025-10-10 16:31:20 +08:00

utils.py

[Perf]Add YaRN custom op (#3355)

2025-10-11 08:36:20 +08:00