[Doc] Added deploying on k8s with kthena (#4674)

### What this PR does / why we need it?
[Kthena](https://github.com/volcano-sh/kthena) is a Kubernetes-native
LLM inference platform that transforms how organizations deploy and
manage Large Language Models in production. Built with declarative model
lifecycle management and intelligent request routing, it provides high
performance and enterprise-grade scalability for LLM inference
workloads.

The platform extends Kubernetes with purpose-built Custom Resource
Definitions (CRDs) for managing LLM workloads, supporting multiple
inference engines (vLLM, SGLang, Triton) and advanced serving patterns
like prefill-decode disaggregation.
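
As an illustration of the declarative style this enables, a Kthena-managed model might be described with a custom resource along these lines. This is a hedged sketch only: the `apiVersion`, `kind`, and all field names below are assumptions for illustration, not the actual Kthena CRD schema — consult the Kthena repository for the real API.

```yaml
# Illustrative sketch only — kind, apiVersion, and field names are
# assumed for illustration and do not reflect the actual Kthena CRDs.
apiVersion: kthena.volcano.sh/v1alpha1
kind: ModelServing
metadata:
  name: qwen-demo
spec:
  engine: vllm                      # assumed field: inference engine backend
  model: Qwen/Qwen2-7B-Instruct     # example model identifier
  replicas: 2
  resources:
    limits:
      huawei.com/Ascend910: "1"     # Ascend NPU resource name (assumed)
```

Such a resource would be applied with `kubectl apply -f`, after which the platform's controllers reconcile the declared model into running inference pods.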

This PR adds an example of deploying LLMs on Ascend Kubernetes clusters.

- vLLM version: v0.12.0
- vLLM main: ad32e3e19c

Signed-off-by: Zhonghu Xu <xuzhonghu@huawei.com>
Tiger Xu / Zhonghu Xu
2025-12-23 17:46:04 +08:00
committed by GitHub
parent 22138e2727
commit cb963c53a5
3 changed files with 441 additions and 0 deletions


@@ -0,0 +1,7 @@
# Deployment Guide
:::{toctree}
:caption: Deployment Guide
:maxdepth: 1
using_volcano_kthena
:::