### What this PR does / why we need it?

This PR updates the GLM-5 documentation to include:

- Information about the first supported version (`vllm-ascend:v0.17.0rc1`).
- Updated `--additional-config` parameters to use the new nested `ascend_compilation_config` structure.
- Added `VLLM_ASCEND_BALANCE_SCHEDULING` environment variable to deployment scripts.
- Improved formatting of deployment steps.
- A new "Notice" section explaining optimization environment variables (`VLLM_ASCEND_ENABLE_FLASHCOMM1`, `VLLM_ASCEND_ENABLE_FUSED_MC2`, `VLLM_ASCEND_ENABLE_MLAPO`).
- A "Best Practices" section for prefill-decode disaggregation.
- An "FAQ" section addressing common tokenizer issues and function calling configuration.

### Does this PR introduce _any_ user-facing change?

No, this is a documentation-only update.

### How was this patch tested?

Documentation changes were verified for correctness and formatting.

---------

Signed-off-by: Zhu Jiyang <zhujiyang2@huawei.com>

vLLM Ascend Plugin documents
Live doc: https://docs.vllm.ai/projects/ascend
Build the docs
# Install dependencies.
pip install -r requirements-docs.txt
# Build the docs.
make clean
make html
# Build the docs with translation
make intl
# Serve the built docs locally (http.server listens on port 8000 by default)
python -m http.server -d _build/html/
Launch your browser and open:
- English version: http://localhost:8000
- Chinese version: http://localhost:8000/zh_CN
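
If port 8000 is already in use, the preview step can be wrapped in a small helper that takes the port and output directory from variables. The following is a minimal sketch, not part of the repository: the names `DOCS_PORT`, `DOCS_DIR`, `preview_urls`, and `serve_docs` are hypothetical, and it assumes the translated build is served under `/zh_CN` as listed above.

```shell
# Sketch only: DOCS_PORT and DOCS_DIR are hypothetical variable names,
# not settings defined by the vllm-ascend repository.
DOCS_PORT="${DOCS_PORT:-8000}"
DOCS_DIR="${DOCS_DIR:-_build/html}"

# Print the URLs to open once the server is running.
preview_urls() {
  echo "English version: http://localhost:${DOCS_PORT}"
  echo "Chinese version: http://localhost:${DOCS_PORT}/zh_CN"
}

# Serve the built docs; fail early if the HTML has not been built yet.
serve_docs() {
  if [ ! -d "$DOCS_DIR" ]; then
    echo "error: $DOCS_DIR not found; run 'make html' first" >&2
    return 1
  fi
  preview_urls
  # http.server accepts the port as an optional positional argument.
  python -m http.server "$DOCS_PORT" -d "$DOCS_DIR"
}
```

After sourcing these definitions, running `DOCS_PORT=8080; serve_docs` from the docs directory would serve the same build on port 8080 instead.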