Files
xc-llm-ascend/docs
Shaoxu Cheng ba1c82e758 [DOC] Add explaination of 310p special param: max-model-len (#7065)
### What this PR does / why we need it?

This PR updates the documentation for running vLLM on Atlas 300I series
(310p) hardware. It adds a warning to explicitly set `--max-model-len`
to prevent potential Out-of-Memory (OOM) errors that can occur with the
default configuration.

The example commands and Python scripts for online and offline inference
have been updated to:
- Include `--max-model-len 4096` (or `max_model_len=4096`).
- Remove the `compilation-config` parameter, which is no longer
necessary for 310p devices.

These changes ensure users have a clearer and more stable experience
when using vLLM on Atlas 300I hardware.

### Does this PR introduce _any_ user-facing change?
No, this is a documentation-only update.

### How was this patch tested?
The changes are to documentation and do not require testing.


- vLLM version: v0.16.0
- vLLM main:
4034c3d32e

---------

Signed-off-by: Tflowers-0129 <2906339855@qq.com>
2026-03-09 16:54:43 +08:00
..

vLLM Ascend Plugin documents

Live doc: https://docs.vllm.ai/projects/ascend

Build the docs

# Install dependencies.
pip install -r requirements-docs.txt

# Build the docs.
make clean
make html

# Build the docs with translation
make intl

# Open the docs with your browser
python -m http.server -d _build/html/

Launch your browser and open: