Add examples in sampling parameters (#4039)
@@ -52,7 +52,7 @@ These adjustments should return the desired accuracy.
 ## Extending Evaluation Capabilities
 
 1. **Contribute New Benchmarks**
-* Follow our [contribution guidelines](https://docs.sglang.ai/references/contribution_guide.html) to add new test scripts
+* Follow our [contribution guidelines](../references/contribution_guide.md) to add new test scripts
 2. **Request Implementations**
 * Feel free to open an issue describing your evaluation needs
 3. **Use Alternative Tools**
@@ -39,7 +39,7 @@ Again, please go through the entire documentation to confirm your system is usin
 
 ## Installing SGLang
 
-For general installation instructions, see the official [SGLang Installation Docs](https://docs.sglang.ai/start/install.html). Below are the AMD-specific steps summarized for convenience.
+For general installation instructions, see the official [SGLang Installation Docs](../start/install.md). Below are the AMD-specific steps summarized for convenience.
 
 ### Install from Source
 
@@ -43,7 +43,7 @@ python -m sglang.launch_server \
 --mem-fraction-static 0.8 \
 --context-length 8192
 ```
-The quantization and limited context length (`--dtype half --context-length 8192`) are due to the limited computational resources in [Nvidia jetson kit](https://www.nvidia.com/en-us/autonomous-machines/embedded-systems/jetson-orin/). A detailed explanation can be found in [Server Arguments](https://docs.sglang.ai/backend/server_arguments.html).
+The quantization and limited context length (`--dtype half --context-length 8192`) are due to the limited computational resources in [Nvidia jetson kit](https://www.nvidia.com/en-us/autonomous-machines/embedded-systems/jetson-orin/). A detailed explanation can be found in [Server Arguments](../backend/server_arguments.md).
 
 After launching the engine, refer to [Chat completions](https://docs.sglang.ai/backend/openai_api_completions.html#Usage) to test the usability.
 * * * * *
@@ -66,7 +66,7 @@ This enables TorchAO's int4 weight-only quantization with a 128-group size. The
 * * * * *
 Structured output with XGrammar
 -------------------------------
-Please refer to [SGLang doc structured output](https://docs.sglang.ai/backend/structured_outputs.html).
+Please refer to [SGLang doc structured output](../backend/structured_outputs.ipynb).
 * * * * *
 
 Thanks to the support from [shahizat](https://github.com/shahizat).