[Doc] Update max_tokens to max_completion_tokens in all docs (#6248)
### What this PR does / why we need it?
Fix:
```
DeprecationWarning: max_tokens is deprecated in favor of the max_completion_tokens field.
```
- vLLM version: v0.14.1
- vLLM main: d68209402d
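The warning comes from an API layer that prefers the `max_completion_tokens` field and maps the legacy `max_tokens` onto it. A minimal, self-contained sketch of that deprecation pattern (the `build_params` helper and its names are hypothetical, for illustration only, not vLLM code):

```python
import warnings

def build_params(max_tokens=None, max_completion_tokens=None):
    # Hypothetical helper mirroring the deprecation: accept the legacy
    # `max_tokens` argument, but warn and map it to `max_completion_tokens`.
    if max_tokens is not None and max_completion_tokens is None:
        warnings.warn(
            "max_tokens is deprecated in favor of the max_completion_tokens field.",
            DeprecationWarning,
            stacklevel=2,
        )
        max_completion_tokens = max_tokens
    return {"max_completion_tokens": max_completion_tokens}

# New-style call: no warning is raised.
print(build_params(max_completion_tokens=10))  # {'max_completion_tokens': 10}
```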
Signed-off-by: shen-shanshan <467638484@qq.com>
```diff
@@ -51,7 +51,7 @@ The following is a simple example of how to use sleep mode.
 # record npu memory use baseline in case other process is running
 used_bytes_baseline = total - free
 llm = LLM("Qwen/Qwen2.5-0.5B-Instruct", enable_sleep_mode=True)
-sampling_params = SamplingParams(temperature=0, max_tokens=10)
+sampling_params = SamplingParams(temperature=0, max_completion_tokens=10)
 output = llm.generate(prompt, sampling_params)

 llm.sleep(level=1)
```
````diff
@@ -110,7 +110,7 @@ The following is a simple example of how to use sleep mode.
 -d '{
     "model": "Qwen/Qwen2.5-0.5B-Instruct",
     "prompt": "The future of AI is",
-    "max_tokens": 7,
+    "max_completion_tokens": 7,
     "temperature": 0
 }'
 ```
````