[Doc] Added deployment on k8s with Kthena (#4674)

### What this PR does / why we need it?

[Kthena](https://github.com/volcano-sh/kthena) is a Kubernetes-native LLM inference platform that transforms how organizations deploy and manage Large Language Models in production. Built around declarative model lifecycle management and intelligent request routing, it provides high performance and enterprise-grade scalability for LLM inference workloads. The platform extends Kubernetes with purpose-built Custom Resource Definitions (CRDs) for managing LLM workloads, supporting multiple inference engines (vLLM, SGLang, Triton) and advanced serving patterns such as prefill-decode disaggregation.

This PR adds an example of deploying an LLM on Ascend Kubernetes clusters.

- vLLM version: v0.12.0
- vLLM main: https://github.com/vllm-project/vllm/commit/ad32e3e19ccf0526cb6744a5fed09a138a5fb2f9

Signed-off-by: Zhonghu Xu <xuzhonghu@huawei.com>
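To make the CRD-driven pattern above concrete, here is a minimal sketch of what a declarative model-serving manifest might look like. Everything in it (the API group `serving.example.io`, the `ModelServing` kind, and every field name) is an illustrative assumption, not Kthena's actual schema; see the Kthena docs and the `using_volcano_kthena` page below for the real resource definitions.

```yaml
# Hypothetical CRD instance, for illustration only.
# apiVersion, kind, and all spec fields below are assumed names,
# NOT Kthena's real API; check the Kthena docs for the actual schema.
apiVersion: serving.example.io/v1alpha1
kind: ModelServing
metadata:
  name: qwen-on-ascend
spec:
  engine: vllm                      # one of the supported inference engines
  model: Qwen/Qwen2.5-7B-Instruct   # model to serve (illustrative choice)
  replicas: 2                       # number of serving instances
  resources:
    limits:
      huawei.com/Ascend910: 1       # Ascend NPU resource name (illustrative)
```

Against the real schema, such a manifest would be applied with the standard Kubernetes workflow, for example `kubectl apply -f model-serving.yaml`, after which the platform's controller reconciles the declared state into running inference pods.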
# Deployment Guide
:::{toctree}
:caption: Deployment Guide
:maxdepth: 1
using_volcano_kthena
:::