[Doc] Added deployment on k8s with Kthena (#4674)

### What this PR does / why we need it?

[Kthena](https://github.com/volcano-sh/kthena) is a Kubernetes-native LLM inference platform that transforms how organizations deploy and manage Large Language Models in production. Built around declarative model lifecycle management and intelligent request routing, it provides high performance and enterprise-grade scalability for LLM inference workloads. The platform extends Kubernetes with purpose-built Custom Resource Definitions (CRDs) for managing LLM workloads, supporting multiple inference engines (vLLM, SGLang, Triton) and advanced serving patterns such as prefill-decode disaggregation.

This PR adds an example of deploying an LLM on Ascend Kubernetes clusters.

- vLLM version: v0.12.0
- vLLM main: https://github.com/vllm-project/vllm/commit/ad32e3e19ccf0526cb6744a5fed09a138a5fb2f9

Signed-off-by: Zhonghu Xu <xuzhonghu@huawei.com>
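To make the CRD-driven pattern above concrete, here is a minimal sketch of what a declarative model-serving manifest might look like. Everything in it (the API group `serving.example.io`, the `ModelServing` kind, and every field name) is an illustrative assumption, not Kthena's actual schema; see the Kthena docs and the `using_volcano_kthena` page below for the real resource definitions.

```yaml
# Hypothetical CRD instance, for illustration only.
# apiVersion, kind, and all spec fields below are assumed names,
# NOT Kthena's real API; check the Kthena docs for the actual schema.
apiVersion: serving.example.io/v1alpha1
kind: ModelServing
metadata:
  name: qwen-on-ascend
spec:
  engine: vllm                      # one of the supported inference engines
  model: Qwen/Qwen2.5-7B-Instruct   # model to serve (illustrative choice)
  replicas: 2                       # number of serving instances
  resources:
    limits:
      huawei.com/Ascend910: 1       # Ascend NPU resource name (illustrative)
```

Against the real schema, such a manifest would be applied with the standard Kubernetes workflow, for example `kubectl apply -f model-serving.yaml`, after which the platform's controller reconciles the declared state into running inference pods.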
# Deployment Guide
:::{toctree}
:caption: Deployment Guide
:maxdepth: 1
using_volcano_kthena
:::