Eagle speculative decoding part 2: Fix cuda graph + DP attention hanging (#2684)

Co-authored-by: yukavio <kavioyu@gmail.com>
Lianmin Zheng
2024-12-31 02:25:05 -08:00
committed by GitHub
parent 6c42fa229d
commit b0524c3789
7 changed files with 131 additions and 58 deletions

@@ -92,7 +92,7 @@ jobs:
python3 test_data_parallelism.py
- name: Evaluate MLA accuracy (TP=2)
-timeout-minutes: 20
+timeout-minutes: 10
run: |
cd test/srt
python3 test_mla.py