diff --git a/docs/references/deepseek.md b/docs/references/deepseek.md
index 913395357..2bdceb904 100644
--- a/docs/references/deepseek.md
+++ b/docs/references/deepseek.md
@@ -34,6 +34,10 @@ Overall, with these optimizations, we have achieved up to a 7x acceleration in o
 **Usage**: This optimization is aimed at improving throughput and should be used for scenarios with high QPS (Queries Per Second). Data Parallelism Attention optimization can be enabled by `--enable-dp-attention` for DeepSeek Series Models.
+
+*Figure: Data Parallelism Attention Performance Comparison*
+
+**Reference**: Check [Blog](https://lmsys.org/blog/2024-12-04-sglang-v0-4/#data-parallelism-attention-for-deepseek-models).
 ## Multi Node Tensor Parallelism
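
For context on the flag this diff documents, a typical invocation might look like the sketch below. Only `--enable-dp-attention` comes from the doc itself; the model path, tensor-parallel size, and other flags are illustrative assumptions and should be adjusted to your deployment.

```shell
# Sketch: launch an SGLang server for a DeepSeek-series model with
# Data Parallelism Attention enabled (high-QPS throughput scenarios).
# Model path and --tp value are placeholders, not recommendations.
python -m sglang.launch_server \
  --model-path deepseek-ai/DeepSeek-V3 \
  --tp 8 \
  --enable-dp-attention \
  --trust-remote-code
```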