From eefcbdd3533b065b950276ce23c8ab7a4f69bd99 Mon Sep 17 00:00:00 2001
From: Didier Durand
Date: Tue, 11 Feb 2025 19:58:36 +0100
Subject: [PATCH] fix deepseek_v3 typo (#3497)

---
 docs/references/deepseek.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/references/deepseek.md b/docs/references/deepseek.md
index dd29ad1cc..d54ec008b 100644
--- a/docs/references/deepseek.md
+++ b/docs/references/deepseek.md
@@ -40,7 +40,7 @@ Overall, with these optimizations, we have achieved up to a 7x acceleration in o
 Multi-head Latent Attention for DeepSeek Series Models
 
-**Usage**: MLA optimization is enabled by defalut, to disable, use `--disable-mla`.
+**Usage**: MLA optimization is enabled by default, to disable, use `--disable-mla`.
 
 **Reference**: Check [Blog](https://lmsys.org/blog/2024-09-04-sglang-v0-3/#deepseek-multi-head-latent-attention-mla-throughput-optimizations) and [Slides](https://github.com/sgl-project/sgl-learning-materials/blob/main/slides/lmsys_1st_meetup_deepseek_mla.pdf) for more details.
@@ -52,7 +52,7 @@ Overall, with these optimizations, we have achieved up to a 7x acceleration in o
 Data Parallelism Attention for DeepSeek Series Models
 
-**Usage**: This optimization is aimed at improving throughput and should be used for scenarios with high QPS (Queries Per Second). Data Parallelism Attention optimization can be enabeld by `--enable-dp-attention` for DeepSeek Series Models.
+**Usage**: This optimization is aimed at improving throughput and should be used for scenarios with high QPS (Queries Per Second). Data Parallelism Attention optimization can be enabled by `--enable-dp-attention` for DeepSeek Series Models.
 
 Data Parallelism Attention Performance Comparison
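For context, the two flags this patch's doc text describes are passed when launching the SGLang server. A minimal sketch of how they would typically be used; the model path and `--tp` value are illustrative assumptions, not part of the patch:

```shell
# Sketch only: model path and tensor-parallel size are assumed values.
# Data Parallelism Attention, aimed at high-QPS serving of DeepSeek models:
python -m sglang.launch_server \
  --model-path deepseek-ai/DeepSeek-V3 \
  --tp 8 \
  --enable-dp-attention

# MLA optimization is on by default; to turn it off instead:
# python -m sglang.launch_server --model-path deepseek-ai/DeepSeek-V3 --disable-mla
```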