[Doc][Misc] Correcting the document and uploading the model deployment template (#8287)
### What this PR does / why we need it?

Correcting the document and uploading the model deployment template.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

---------

Signed-off-by: herizhen <1270637059@qq.com>
Signed-off-by: herizhen <59841270+herizhen@users.noreply.github.com>
@@ -332,7 +332,6 @@ An L0 `dump.json` contains forward I/O for modules together with parameters. Usi
"data_name": "Module.conv2.Conv2d.forward.0.parameters.bias.pt"
}
}
},
}
}
}
@@ -389,7 +388,6 @@ An L1 `dump.json` records forward I/O for APIs. Using PyTorch's `relu` function
"data_name": "Functional.relu.0.forward.output.0.pt"
}
]
},
}
}
}
@@ -111,7 +111,7 @@ sudo apt update
sudo apt install libjemalloc2

# Configure jemalloc
-export LD_PRELOAD=/usr/lib/"$(uname -i)"-linux-gnu/libjemalloc.so.2 $LD_PRELOAD
+export LD_PRELOAD=/usr/lib/"$(uname -i)"-linux-gnu/libjemalloc.so.2:$LD_PRELOAD
```

#### 2.2. Tcmalloc
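The jemalloc change replaces the space between the new library and the existing `$LD_PRELOAD` with a colon. The dynamic loader itself accepts either separator, but in the `export` line the space ends the assignment, so the shell passes the old value to `export` as a separate argument, which `export` rejects as an invalid variable name whenever `LD_PRELOAD` is already non-empty. The sketch below, assuming bash or another POSIX shell and using hypothetical library paths (nothing is actually preloaded), contrasts the two forms and adds an optional `${LD_PRELOAD:+...}` variant that avoids a trailing colon when the variable starts out empty.

```shell
#!/usr/bin/env bash
# Sketch only: both paths are hypothetical placeholders and no program is
# launched with them, so nothing is actually preloaded.
JEMALLOC=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2
OTHER=/opt/example/libother.so

# Space form, run in a subshell where LD_PRELOAD already holds $OTHER.
# The space makes "$OTHER" a second argument to export; it is not a valid
# variable name, so the subshell exits with a non-zero status.
if (LD_PRELOAD="$OTHER"; export LD_PRELOAD=$JEMALLOC $LD_PRELOAD) 2>/dev/null; then
  space_form=ok
else
  space_form=failed
fi

# Colon form: a single assignment word, so export succeeds and keeps both entries.
colon_value=$(LD_PRELOAD="$OTHER"; export LD_PRELOAD=$JEMALLOC:$LD_PRELOAD; printf '%s' "$LD_PRELOAD")

# Optional variant: ${LD_PRELOAD:+:$LD_PRELOAD} appends ":old" only when the old
# value is non-empty, so an initially empty LD_PRELOAD leaves no trailing colon.
tidy_value=$(unset LD_PRELOAD; export LD_PRELOAD="$JEMALLOC${LD_PRELOAD:+:$LD_PRELOAD}"; printf '%s' "$LD_PRELOAD")

echo "space form: $space_form"
echo "colon form: $colon_value"
echo "tidy form:  $tidy_value"
```

Running it reports `failed` for the space form, while the colon form prints both paths joined by `:`.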
@@ -97,7 +97,8 @@ For local `dataset-path`, please set `hf-name` to its Hugging Face ID like
First start serving your model:

```bash
-VLLM_USE_MODELSCOPE=True vllm serve Qwen/Qwen3-8B
+export VLLM_USE_MODELSCOPE=True
+vllm serve Qwen/Qwen3-8B
```

Then run the benchmarking script:
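The serving hunk swaps the one-shot prefix assignment for `export`. The difference is scope: `VAR=value command` places the variable only in that one command's environment, while `export VAR=value` keeps it in the environment of every later command started from the same shell. The sketch below, assuming bash or any POSIX shell, uses `sh -c` as a stand-in for the `vllm` commands (vLLM itself is not invoked) and only reports whether each child process sees `VLLM_USE_MODELSCOPE`.

```shell
#!/usr/bin/env bash
# Sketch only: `sh -c "$probe"` stands in for a vllm command and prints the
# VLLM_USE_MODELSCOPE value it inherits, or "unset" if it inherits none.
probe='printf "%s" "${VLLM_USE_MODELSCOPE:-unset}"'
unset VLLM_USE_MODELSCOPE

# Prefix form: the assignment applies to this one command only.
prefix_first=$(VLLM_USE_MODELSCOPE=True sh -c "$probe")
prefix_later=$(sh -c "$probe")   # a later command in the same shell

# Export form: every later child of this shell inherits the variable.
export VLLM_USE_MODELSCOPE=True
export_first=$(sh -c "$probe")
export_later=$(sh -c "$probe")

echo "prefix: $prefix_first then $prefix_later"
echo "export: $export_first then $export_later"
```

`export` only affects the current shell and its children, so a command started from a different terminal still needs its own `export`.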
@@ -158,7 +159,7 @@ vllm bench throughput \
If successful, you will see the following output

```shell
-Processed prompts: 100%|█| 10/10 [00:03<00:00, 2.74it/s, est. speed input: 351.02 toks/s, output: 351.02 t
+Processed prompts: 100%|█| 10/10 [00:03<00:00, 2.74it/s, est. speed input: 351.02 toks/s, output: 351.02 toks/s
Throughput: 2.73 requests/s, 699.93 total tokens/s, 349.97 output tokens/s
Total num prompt tokens: 1280
Total num output tokens: 1280
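The figures in this sample output are mutually consistent. The 10 requests carry 1280 prompt tokens and 1280 output tokens, so the total token rate gives the elapsed time $t$, and the request rate and output-token rate follow from it (values rounded):

$$
t \approx \frac{(1280 + 1280)\ \text{tokens}}{699.93\ \text{tokens/s}} \approx 3.66\ \text{s},
\qquad
\frac{10\ \text{requests}}{t} \approx 2.73\ \text{requests/s},
\qquad
\frac{1280\ \text{tokens}}{t} \approx 349.97\ \text{tokens/s}
$$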