[Doc][Misc] Correct the documentation and upload the model deployment template (#8287)

### What this PR does / why we need it?
Corrects errors in the documentation and uploads the model deployment template.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?

---------

Signed-off-by: herizhen <1270637059@qq.com>
Signed-off-by: herizhen <59841270+herizhen@users.noreply.github.com>
Author: herizhen
Date: 2026-04-15 16:03:11 +08:00
Committed by: GitHub
Commit: 95726d20eb (parent: 147b589f62)
31 changed files with 536 additions and 308 deletions


@@ -332,7 +332,6 @@ An L0 `dump.json` contains forward I/O for modules together with parameters. Usi
 "data_name": "Module.conv2.Conv2d.forward.0.parameters.bias.pt"
 }
 }
-},
 }
 }
 }
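
Since the L0 dump references each saved tensor by `data_name`, a dumped file can be spot-checked directly. A minimal sketch, reusing the file name from the snippet above and assuming PyTorch is installed and the file sits in the current directory:
```bash
# Illustrative: load the dumped bias tensor referenced in dump.json
python3 -c "import torch; t = torch.load('Module.conv2.Conv2d.forward.0.parameters.bias.pt'); print(t.shape, t.dtype)"
```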
@@ -389,7 +388,6 @@ An L1 `dump.json` records forward I/O for APIs. Using PyTorch's `relu` function
 "data_name": "Functional.relu.0.forward.output.0.pt"
 }
 ]
-},
 }
 }
 }
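
Both hunks above remove a stray `},` that made the example JSON invalid. A strict parser catches this class of error quickly; for instance:
```bash
# python -m json.tool rejects trailing commas, so the pre-fix examples fail here
python -m json.tool dump.json > /dev/null && echo "dump.json is valid JSON"
```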


@@ -111,7 +111,7 @@ sudo apt update
 sudo apt install libjemalloc2
 # Configure jemalloc
-export LD_PRELOAD=/usr/lib/"$(uname -i)"-linux-gnu/libjemalloc.so.2 $LD_PRELOAD
+export LD_PRELOAD=/usr/lib/"$(uname -i)"-linux-gnu/libjemalloc.so.2:$LD_PRELOAD
 ```
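
The separator is the point of this fix: entries in `LD_PRELOAD` are joined with colons, while the unquoted space in the old line made the shell treat the expansion of `$LD_PRELOAD` as a separate argument to `export` rather than appending it. A quick sanity check that the preload takes effect (assumes `python3` is available):
```bash
# If the preload is active, jemalloc appears in the process's memory map
export LD_PRELOAD=/usr/lib/"$(uname -i)"-linux-gnu/libjemalloc.so.2:$LD_PRELOAD
python3 -c 'print(open("/proc/self/maps").read())' | grep -m1 jemalloc
```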
#### 2.2. Tcmalloc


@@ -97,7 +97,8 @@ For local `dataset-path`, please set `hf-name` to its Hugging Face ID like
 First start serving your model:
 ```bash
-VLLM_USE_MODELSCOPE=True vllm serve Qwen/Qwen3-8B
+export VLLM_USE_MODELSCOPE=True
+vllm serve Qwen/Qwen3-8B
 ```
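
Moving the assignment onto its own `export` line makes the variable persist for the rest of the shell session, so later commands such as the benchmark run below inherit it; the inline `VAR=value cmd` form applies only to that single command. A quick check that the variable reaches child processes (assumes `python3` is available):
```bash
# An exported variable is visible to every child process of the shell
python3 -c 'import os; print(os.environ.get("VLLM_USE_MODELSCOPE"))'
```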
Then run the benchmarking script:
@@ -158,7 +159,7 @@ vllm bench throughput \
 If successful, you will see the following output
 ```shell
-Processed prompts: 100%|█| 10/10 [00:03<00:00, 2.74it/s, est. speed input: 351.02 toks/s, output: 351.02 t
+Processed prompts: 100%|█| 10/10 [00:03<00:00, 2.74it/s, est. speed input: 351.02 toks/s, output: 351.02 toks/s
 Throughput: 2.73 requests/s, 699.93 total tokens/s, 349.97 output tokens/s
 Total num prompt tokens: 1280
 Total num output tokens: 1280