xc-llm-ascend/vllm_ascend
linfeng-yuan 15592c0d48 [bugfix] fix accuracy problem for deepseek V3/R1 models with torchair graph in long sequence predictions (#1331)
### What this PR does / why we need it?
Fix the issue of insufficient cached cosine and sine length in MLA's
TorchAir graph mode, which causes accuracy deviation during
long-sequence inference.
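
A minimal sketch of the failure mode described above, not the actual vllm-ascend code: a rotary cos/sin cache built for too few positions cannot serve the positions that long sequences reach. The names `build_rope_cache` and `lookup`, the table sizes, and `rotary_dim=8` are all hypothetical. The sketch raises an explicit error for an out-of-range position, whereas the real issue showed up as an accuracy deviation; how the original code handled such positions is not stated here.

```python
import math


def build_rope_cache(max_positions, rotary_dim, base=10000.0):
    """Precompute cos/sin tables for positions [0, max_positions)."""
    # One inverse frequency per pair of rotary dimensions.
    inv_freq = [base ** (-2.0 * i / rotary_dim) for i in range(rotary_dim // 2)]
    cos = [[math.cos(p * f) for f in inv_freq] for p in range(max_positions)]
    sin = [[math.sin(p * f) for f in inv_freq] for p in range(max_positions)]
    return cos, sin


def lookup(cache, position):
    """Return the cached (cos, sin) rows for one token position."""
    cos, sin = cache
    if position >= len(cos):
        # Hypothetical guard: makes the undersized cache visible as an error.
        raise IndexError(f"position {position} exceeds cached length {len(cos)}")
    return cos[position], sin[position]


# Illustrative sizes only (assumptions, not values from the patch).
MAX_MODEL_LEN = 32

short_cache = build_rope_cache(max_positions=8, rotary_dim=8)
full_cache = build_rope_cache(max_positions=MAX_MODEL_LEN, rotary_dim=8)

# Position 20 is valid for the model but only the full-length cache covers it.
cos_row, sin_row = lookup(full_cache, 20)
```

Sizing the cache to the maximum model length, rather than to a shorter default, keeps every reachable position inside the precomputed tables.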

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
We tested the accuracy of this patch with DeepSeek R1 e2e benchmark
serving and got a score of 83.33 on the AIME2024 dataset with the
DP4TP4EP16 setting.

Signed-off-by: linfeng-yuan <1102311262@qq.com>
2025-06-23 09:52:27 +08:00