[OAI] Add rid tracing for v1/embeddings and fix rid type in Chat (#6397)

Author: Chang Su
Date: 2025-05-18 13:05:38 -07:00
Committed by GitHub
Parent: 6dc6b30637
Commit: 066cf44546
3 changed files with 8 additions and 3 deletions

@@ -918,8 +918,8 @@ class FlashAttentionBackend(AttentionBackend):
and local_attn_metadata is not None
and (hasattr(layer, "use_irope") and layer.use_irope)
)
# When Spec Decode enabled, forward_decode would be called with two mode:
# When Spec Decode enabled, forward_decode would be called with two mode:
# 1. DRAFT_DECODE: we enable cascade attention when top_k > 1
# 2. IDLE: we dont need cascade attention, spec_info will be none in this case
use_cascade_attn = forward_batch.spec_info is not None and self.topk > 1