[Doc] Update max_tokens to max_completion_tokens in all docs (#6248)
### What this PR does / why we need it?
Fix:
```
DeprecationWarning: max_tokens is deprecated in favor of the max_completion_tokens field.
```
- vLLM version: v0.14.1
- vLLM main: d68209402d
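The warning comes from an API layer that prefers the `max_completion_tokens` field and maps the legacy `max_tokens` onto it. A minimal, self-contained sketch of that deprecation pattern (the `build_params` helper and its names are hypothetical, for illustration only, not vLLM code):

```python
import warnings

def build_params(max_tokens=None, max_completion_tokens=None):
    # Hypothetical helper mirroring the deprecation: accept the legacy
    # `max_tokens` argument, but warn and map it to `max_completion_tokens`.
    if max_tokens is not None and max_completion_tokens is None:
        warnings.warn(
            "max_tokens is deprecated in favor of the max_completion_tokens field.",
            DeprecationWarning,
            stacklevel=2,
        )
        max_completion_tokens = max_tokens
    return {"max_completion_tokens": max_completion_tokens}

# New-style call: no warning is raised.
print(build_params(max_completion_tokens=10))  # {'max_completion_tokens': 10}
```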
Signed-off-by: shen-shanshan <467638484@qq.com>
```diff
@@ -51,7 +51,7 @@ The following is a simple example of how to use sleep mode.
 # record npu memory use baseline in case other process is running
 used_bytes_baseline = total - free
 llm = LLM("Qwen/Qwen2.5-0.5B-Instruct", enable_sleep_mode=True)
-sampling_params = SamplingParams(temperature=0, max_tokens=10)
+sampling_params = SamplingParams(temperature=0, max_completion_tokens=10)
 output = llm.generate(prompt, sampling_params)

 llm.sleep(level=1)
```
````diff
@@ -110,7 +110,7 @@ The following is a simple example of how to use sleep mode.
 -d '{
     "model": "Qwen/Qwen2.5-0.5B-Instruct",
     "prompt": "The future of AI is",
-    "max_tokens": 7,
+    "max_completion_tokens": 7,
     "temperature": 0
 }'
 ```
````