Docs fix about EAGLE and streaming output (#3166)
Co-authored-by: Chayenne <zhaochenyang@ucla.edu>
Co-authored-by: Chayenne <zhaochen20@outlook.com>
Co-authored-by: Jhin <jhinpan@umich.edu>
SGLang now provides an EAGLE-based speculative decoding option. The implementation aims to maximize speed and efficiency and is considered to be among the fastest in open-source LLM engines.

To run the following tests or benchmarks, you also need to install [**cutex**](https://pypi.org/project/cutex/):

> ```bash
> pip install cutex
> ```

### Performance Highlights

- **Official EAGLE code** ([SafeAILab/EAGLE](https://github.com/SafeAILab/EAGLE)): ~200 tokens/s
- **Standard SGLang Decoding**: ~156 tokens/s
- **EAGLE Decoding in SGLang**: ~297 tokens/s
- **EAGLE Decoding in SGLang (w/ `torch.compile`)**: ~316 tokens/s

All benchmarks below were run on a single H100.
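For context, EAGLE decoding in SGLang is enabled through server launch flags rather than a separate entry point. A minimal sketch of such a launch command, assuming the `--speculative-*` arguments from the SGLang speculative-decoding documentation (the base and draft model paths here are illustrative, not the exact checkpoints used in the benchmarks above):

```shell
# Launch an SGLang server with EAGLE speculative decoding enabled.
# Flag names follow SGLang's speculative-decoding docs; model paths are illustrative.
python -m sglang.launch_server \
    --model-path meta-llama/Llama-2-7b-chat-hf \
    --speculative-algorithm EAGLE \
    --speculative-draft-model-path lmzheng/sglang-EAGLE-llama2-chat-7B \
    --speculative-num-steps 5 \
    --speculative-eagle-topk 8 \
    --speculative-num-draft-tokens 64
```

The `--speculative-num-steps`, `--speculative-eagle-topk`, and `--speculative-num-draft-tokens` values control how aggressively the draft model speculates per verification step; the values shown are a plausible starting point, not tuned settings.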