chore: Initial support for input config files (#10534)
Co-authored-by: root <root@umbriel-b200-017.ipp4a1.colossus.nvidia.com>
Co-authored-by: Yineng Zhang <me@zhyncs.com>
@@ -8,6 +8,23 @@ You can find all arguments by `python3 -m sglang.launch_server --help`
## Common launch commands
- To use a configuration file, create a YAML file with your server arguments and specify it with `--config`. CLI arguments will override config file values.
```bash
# Create config.yaml
cat > config.yaml << EOF
model-path: meta-llama/Meta-Llama-3-8B-Instruct
host: 0.0.0.0
port: 30000
tensor-parallel-size: 2
enable-metrics: true
log-requests: true
EOF
# Launch server with config file
python -m sglang.launch_server --config config.yaml
```
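The merge rule described above (config-file values loaded first, CLI flags winning on conflict) can be sketched in Python. This is an illustration of the precedence behavior only; `merge_args` is a hypothetical helper, not sglang's actual implementation:

```python
# Sketch of the documented precedence rule: values from the YAML config
# file are applied first, then any argument given on the command line
# overrides them. Illustrative only -- not sglang's real merge code.

def merge_args(config_args: dict, cli_args: dict) -> dict:
    """Return config values with CLI arguments taking precedence."""
    merged = dict(config_args)  # start from the config-file values
    merged.update(cli_args)     # CLI flags override on key collision
    return merged

config = {
    "model-path": "meta-llama/Meta-Llama-3-8B-Instruct",
    "port": 30000,
}
cli = {"port": 40000}  # e.g. `--port 40000` passed on the command line

merged = merge_args(config, cli)
print(merged["port"])        # the CLI value wins: 40000
print(merged["model-path"])  # untouched keys keep their config value
```

So launching with `--config config.yaml --port 40000` would serve on port 40000 even though the config file says 30000.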
- To enable multi-GPU tensor parallelism, add `--tp 2`. If it reports the error "peer access is not supported between these two devices", add `--enable-p2p-check` to the server launch command.
```bash
@@ -65,6 +82,7 @@ Please consult the documentation below and [server_args.py](https://github.com/s
| Arguments | Description | Defaults |
|-----------|-------------|----------|
| `--config` | Path to a YAML configuration file containing server arguments. Arguments in the config file will be merged with command-line arguments, with CLI arguments taking precedence. | None |
| `--model-path` | The path of the model weights. This can be a local folder or a Hugging Face repo ID. | None |
| `--tokenizer-path` | The path of the tokenizer. | None |
| `--tokenizer-mode` | Tokenizer mode. 'auto' will use the fast tokenizer if available, and 'slow' will always use the slow tokenizer. | auto |