[v0.18.0][BugFix] Fix DSV3.1 W4A8 TTFT degradation (#8674)
### What this PR does / why we need it?

Fix TTFT degradation on Deepseek-V3.1-W4A8. Revert the change to `balance_flag` made in https://github.com/vllm-project/vllm-ascend/pull/7611.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

- vLLM version: v0.18.0

Signed-off-by: Wangbingjie <wangbj1207@126.com>
@@ -266,7 +266,7 @@ class BalanceScheduler(Scheduler):
 if len(self.running) == self.max_num_running_reqs:
     break

-balance_flag = max(t.item() for t in self.balance_queue) >= self.max_num_running_reqs - 1
+balance_flag = max(t.item() for t in self.balance_queue) == self.max_num_running_reqs
 if balance_flag:
     break

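To make the effect of the one-line change concrete, here is a minimal sketch that compares the two `balance_flag` conditions from the hunk above. It is not the scheduler's code: the function names are hypothetical, and the entries of `balance_queue` are modeled as plain integers in place of the tensors that the real code reads with `.item()`.

```python
from typing import Sequence


def balance_flag_restored(balance_queue: Sequence[int], max_num_running_reqs: int) -> bool:
    """Condition restored by this PR: set the flag only when the largest
    entry equals the running-request cap."""
    return max(balance_queue) == max_num_running_reqs


def balance_flag_from_7611(balance_queue: Sequence[int], max_num_running_reqs: int) -> bool:
    """Condition being reverted: set the flag as soon as the largest entry
    is within one of the cap."""
    return max(balance_queue) >= max_num_running_reqs - 1


if __name__ == "__main__":
    cap = 8  # hypothetical value for max_num_running_reqs
    for peak in (6, 7, 8):
        queue = [3, peak]  # hypothetical queue contents
        print(
            f"peak={peak} "
            f"reverted_condition={balance_flag_from_7611(queue, cap)} "
            f"restored_condition={balance_flag_restored(queue, cap)}"
        )
```

The two conditions differ only when the largest entry is exactly one below the cap: the reverted condition already breaks out of the loop there, while the restored one keeps going. The sketch shows only that difference in the predicate; the PR description attributes the TTFT degradation to the change being reverted without detailing the mechanism.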