chore: Initial support for input config files (#10534)

Co-authored-by: root <root@umbriel-b200-017.ipp4a1.colossus.nvidia.com>
Co-authored-by: Yineng Zhang <me@zhyncs.com>
Author: kushanam
Date: 2025-09-24 14:45:52 -07:00 (committed by GitHub)
parent c3faf2d6e6
commit d7b20dd65d
4 changed files with 350 additions and 0 deletions

@@ -8,6 +8,23 @@ You can find all arguments by `python3 -m sglang.launch_server --help`
## Common launch commands
- To use a configuration file, create a YAML file with your server arguments and pass its path with `--config`. Keys use the argument names without the leading `--`; any argument supplied on the command line takes precedence over the corresponding config file value.
```bash
# Create config.yaml
cat > config.yaml << EOF
model-path: meta-llama/Meta-Llama-3-8B-Instruct
host: 0.0.0.0
port: 30000
tensor-parallel-size: 2
enable-metrics: true
log-requests: true
EOF
# Launch server with config file
python -m sglang.launch_server --config config.yaml
```
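The CLI-over-config precedence above can be sketched with `argparse`: config values are applied as parser defaults, so any flag given explicitly on the command line wins. This is an illustrative sketch, not SGLang's actual implementation; the `load_config` callback and the argument subset are assumptions for the example.

```python
import argparse


def parse_with_config(argv, load_config=None):
    """Parse CLI args; config values act as defaults, so explicit
    CLI flags always override them (CLI-over-config precedence)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--config")
    parser.add_argument("--model-path", dest="model_path")
    parser.add_argument("--port", type=int, default=30000)

    # First pass: only to discover the --config path.
    args, _ = parser.parse_known_args(argv)
    if args.config and load_config:
        cfg = load_config(args.config)  # e.g. yaml.safe_load(open(path))
        # Dashed YAML keys map to underscored argparse dests.
        parser.set_defaults(**{k.replace("-", "_"): v for k, v in cfg.items()})

    # Second pass: CLI flags override the config-derived defaults.
    return parser.parse_args(argv)


# Config supplies port 30000, but the explicit --port 8000 on the CLI wins;
# model-path is taken from the config since it is absent from the CLI.
fake_cfg = {"model-path": "meta-llama/Meta-Llama-3-8B-Instruct", "port": 30000}
args = parse_with_config(
    ["--config", "config.yaml", "--port", "8000"],
    load_config=lambda path: fake_cfg,
)
print(args.model_path, args.port)
```

Applying the config through `set_defaults` (rather than assigning onto the parsed namespace) is what makes the precedence rule fall out naturally from a second `parse_args` pass.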
- To enable multi-GPU tensor parallelism, add `--tp 2`. If it reports the error "peer access is not supported between these two devices", add `--enable-p2p-check` to the server launch command.
```bash
@@ -65,6 +82,7 @@ Please consult the documentation below and [server_args.py](https://github.com/s
| Arguments | Description | Defaults |
|-----------|-------------|----------|
| `--config` | Path to a YAML configuration file containing server arguments. Arguments in the config file will be merged with command-line arguments, with CLI arguments taking precedence. | None |
| `--model-path` | The path of the model weights. This can be a local folder or a Hugging Face repo ID. | None |
| `--tokenizer-path` | The path of the tokenizer. | None |
| `--tokenizer-mode` | Tokenizer mode. 'auto' will use the fast tokenizer if available, and 'slow' will always use the slow tokenizer. | auto |