xc-llm-ascend/vllm_ascend at 08cfc7cb4bd10ce8c263473f538d10eac412b9fb

EngineX/xc-llm-ascend

linfeng-yuan 15592c0d48 [bugfix] fix accuracy problem for deepseek V3/R1 models with torchair graph in long sequence predictions (#1331)

### What this PR does / why we need it?
Fix the issue of insufficient cached cosine and sine length in MLA's
TorchAir graph mode, which causes accuracy deviation during
long-sequence inference.
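
As an illustration of how a too-short cosine/sine cache can degrade accuracy, here is a minimal pure-Python sketch. It is not the vllm-ascend implementation: the helper names `build_cos_sin_cache` and `lookup_clamped`, the table sizes, and the assumption that out-of-range positions are clamped rather than rejected are all hypothetical and chosen only to make the effect visible.

```python
import math


def build_cos_sin_cache(num_positions, rotary_dim, base=10000.0):
    """Precompute rotary cos/sin tables for positions [0, num_positions)."""
    # One inverse frequency per pair of rotary dimensions.
    inv_freq = [base ** (-2.0 * i / rotary_dim) for i in range(rotary_dim // 2)]
    cos = [[math.cos(p * f) for f in inv_freq] for p in range(num_positions)]
    sin = [[math.sin(p * f) for f in inv_freq] for p in range(num_positions)]
    return cos, sin


def lookup_clamped(table, position):
    """Hypothetical gather that clamps out-of-range indices (assumption)."""
    return table[min(position, len(table) - 1)]


# Cache that is shorter than the sequence (assumed sizes, for illustration).
short_cos, _ = build_cos_sin_cache(num_positions=8, rotary_dim=4)
# Cache that covers every position the sequence can reach.
full_cos, _ = build_cos_sin_cache(num_positions=16, rotary_dim=4)

position = 12
stale = lookup_clamped(short_cos, position)  # silently returns the row for position 7
exact = lookup_clamped(full_cos, position)   # returns the row for position 12

# Largest per-frequency deviation caused by the undersized cache.
max_error = max(abs(a - b) for a, b in zip(stale, exact))
```

In this sketch the lookup never raises, so the error only shows up as wrong rotary angles for positions beyond the cached length. Sizing the cache to cover the longest sequence the model may process keeps every lookup in range.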

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
We tested the accuracy of this patch with DeepSeek R1 e2e benchmark
serving and obtained a score of 83.33 on the AIME2024 dataset with the
DP4TP4EP16 setting.

Signed-off-by: linfeng-yuan <1102311262@qq.com>

2025-06-23 09:52:27 +08:00

..

[bugfix] fix accuracy problem for deepseek V3/R1 models with torchair graph in long sequence predictions (#1331)

2025-06-23 09:52:27 +08:00

[CI] Upgrade vllm to 0.9.1 (#1165)

2025-06-11 16:33:11 +08:00

[Scheduler][MTP] Add support for speculative decoding in AscendScheduler. (#943)

2025-06-11 20:55:44 +08:00

device_allocator

[Bugfix] Remove cuda related lines and add additional pip mirror (#1252)

2025-06-17 21:25:40 +08:00

[bugfix] some bugs maybe fail to run (#896)

2025-06-03 11:07:33 +08:00

[Bugfix] fix import error (#600)

2025-04-22 08:57:25 +08:00

[Bugfix] fix env variable in dbo (#1284)

2025-06-23 09:07:57 +08:00

[perf]: support dual-batch overlap(dbo) for deepseek (#941)

2025-06-07 16:46:58 +08:00

[Platform] Add initial experimental support for Atlas 300I series (#1333)

2025-06-21 09:00:16 +08:00

[CI/UT][bugfix] fix v0 spec decode (#1321)

2025-06-23 09:05:13 +08:00

static EPLB fix bug, add unit test (#1186)

2025-06-18 19:46:56 +08:00

Spec decode support for V1 Engine (#874)

2025-05-23 14:25:46 +08:00

[CI/UT][bugfix] fix v0 spec decode (#1321)

2025-06-23 09:05:13 +08:00

__init__.py

[CI] Patch torch.library.infer_schema for fused moe ops to fix CI (#854)

2025-05-14 19:49:09 +08:00

ascend_config.py

[CI] Add unit test framework (#1201)

2025-06-16 18:32:28 +08:00

envs.py

[Bugfix][Spec Decode] Enable ACL_OP_INIT_MODE=1 directly only when using V0 spec decode (#1258)

2025-06-18 17:50:20 +08:00

platform.py

[Platform] Add initial experimental support for Atlas 300I series (#1333)

2025-06-21 09:00:16 +08:00

utils.py

[Platform] Add initial experimental support for Atlas 300I series (#1333)

2025-06-21 09:00:16 +08:00