Maintain seq_lens_sum to make more FlashInfer operations non-blocking (#1741)

Lianmin Zheng
2024-10-21 01:43:16 -07:00
committed by GitHub
parent cf470fea32
commit 09603c6dc9
8 changed files with 98 additions and 43 deletions
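The commit title describes the underlying idea: keep a plain Python integer `seq_lens_sum` in sync with the on-device sequence-length tensor, so code that needs the batch total never has to reduce the tensor and block on a GPU-to-CPU transfer (the `seq_lens.sum().item()` pattern). Below is a minimal, hedged sketch of that bookkeeping; `BatchState`, `extend`, and `decode_step` are hypothetical names, and a Python list stands in for the GPU tensor — this is not SGLang's actual implementation.

```python
class BatchState:
    """Sketch: maintain a CPU-side running sum alongside the lengths tensor."""

    def __init__(self, seq_lens):
        self.seq_lens = list(seq_lens)          # stands in for a GPU tensor
        self.seq_lens_sum = sum(self.seq_lens)  # plain Python int, kept in sync

    def extend(self, new_lens):
        # New requests join the batch: update lengths and sum together,
        # so no device-side reduction is ever needed for the total.
        self.seq_lens.extend(new_lens)
        self.seq_lens_sum += sum(new_lens)

    def decode_step(self):
        # Each decode step grows every sequence by one token.
        self.seq_lens = [length + 1 for length in self.seq_lens]
        self.seq_lens_sum += len(self.seq_lens)


batch = BatchState([5, 7])
batch.extend([3])
batch.decode_step()
# seq_lens is now [6, 8, 4]; seq_lens_sum is 18, obtained without a
# blocking reduction over the device tensor.
```

Because `seq_lens_sum` is always current on the host, operations that size FlashInfer buffers or indices from the total can launch asynchronously instead of synchronizing the stream.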


@@ -621,7 +621,6 @@ Please cite our paper, [SGLang: Efficient Execution of Structured Language Model
We also learned from the design and reused code from the following projects: [Guidance](https://github.com/guidance-ai/guidance), [vLLM](https://github.com/vllm-project/vllm), [LightLLM](https://github.com/ModelTC/lightllm), [FlashInfer](https://github.com/flashinfer-ai/flashinfer), [Outlines](https://github.com/outlines-dev/outlines), and [LMQL](https://github.com/eth-sri/lmql).
<p align="center">
<a href="#sglangtop" target="_blank">
<bold>Back To Top </bold>