Revert "fix some typos" (#6244)
This commit is contained in:
@@ -134,7 +134,7 @@ python3 -m sglang.launch_server \
|
||||
|
||||
SGLang supports the following quantization methods based on torchao `["int8dq", "int8wo", "fp8wo", "fp8dq-per_tensor", "fp8dq-per_row", "int4wo-32", "int4wo-64", "int4wo-128", "int4wo-256"]`.
|
||||
|
||||
Note: According to [this issue](https://github.com/sgl-project/sglang/issues/2219#issuecomment-2561890230), `"int8dq"` method currently has some bugs when using together with CUDA graph capture. So we suggest to disable the CUDA graph capture when using `"int8dq"` method. Namely, please use the following command:
|
||||
Note: According to [this issue](https://github.com/sgl-project/sglang/issues/2219#issuecomment-2561890230), `"int8dq"` method currently has some bugs when using together with cuda graph capture. So we suggest to disable cuda graph capture when using `"int8dq"` method. Namely, please use the following command:
|
||||
|
||||
```bash
|
||||
python3 -m sglang.launch_server \
|
||||
|
||||
Reference in New Issue
Block a user