tanjunchen 0efa514bd9 1.add CODE_OF_CONDUCT.md to vLLM Kunlun

2.add MAINTAINERS.md to vLLM Kunlun
3.add MAINTAINERS.md to vLLM Kunlun
4.add contributing guide to vLLM Kunlun

Signed-off-by: tanjunchen <tanjunchen20@gmail.com>

2025-12-27 19:50:12 +08:00

docs

[Kernel] Replace native torch solve_tril by solve_tril_fwd kernel op

2025-12-22 17:37:19 +08:00

vllm_kunlun

Merge pull request #52 from liwei109/awq_gptq

2025-12-24 17:05:26 +08:00

.gitignore

Commit vLLM 0.11.0 development branch

2025-12-10 17:51:24 +08:00

.python-version

Initial commit for vLLM-Kunlun Plugin

2025-12-10 12:05:39 +08:00

build.sh

Initial commit for vLLM-Kunlun Plugin

2025-12-10 12:05:39 +08:00

CHANGELOG.md

Commit vLLM 0.11.0 development branch

2025-12-10 17:51:24 +08:00

ci.yml

Commit vLLM 0.11.0 development branch

2025-12-10 17:51:24 +08:00

CODE_OF_CONDUCT.md

1.add CODE_OF_CONDUCT.md to vLLM Kunlun

2025-12-27 19:50:12 +08:00

CONTRIBUTING.md

1.add CODE_OF_CONDUCT.md to vLLM Kunlun

2025-12-27 19:50:12 +08:00

MAINTAINERS.md

1.add CODE_OF_CONDUCT.md to vLLM Kunlun

2025-12-27 19:50:12 +08:00

pyproject.toml

Commit vLLM 0.11.0 development branch

2025-12-10 17:51:24 +08:00

README.md

1.add CODE_OF_CONDUCT.md to vLLM Kunlun

2025-12-27 19:50:12 +08:00

requirements.txt

[dev] support AWQ/GPTQ quantization for dense models

2025-12-24 13:46:06 +08:00

setup_env.sh

Commit vLLM 0.11.0 development branch

2025-12-10 17:51:24 +08:00

setup.py

Initial commit for vLLM-Kunlun Plugin

2025-12-10 12:05:39 +08:00

README.md

Documentation | Users Forum | Slack

Latest News 🔥

[2025/12] Initial release of vLLM Kunlun

Overview

vLLM Kunlun (vllm-kunlun) is a community-maintained hardware plugin designed to seamlessly run vLLM on the Kunlun XPU. It is the recommended approach for integrating the Kunlun backend within the vLLM community, adhering to the principles outlined in the RFC Hardware pluggable. This plugin provides a hardware-pluggable interface that decouples the integration of the Kunlun XPU with vLLM.

With the vLLM Kunlun plugin, popular open-source models, including Transformer-like, Mixture-of-Experts, Embedding, and Multi-modal LLMs, can run on the Kunlun XPU.

Prerequisites

Hardware: Kunlun3 P800
OS: Ubuntu 22.04
Software:
- Python ≥ 3.10
- PyTorch ≥ 2.5.1
- vLLM (same version as vllm-kunlun)
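
The software requirements above can be checked with a short script. This is a minimal sketch, not part of the project: the helper names are hypothetical, and it assumes the packages are published under the distribution names `torch`, `vllm`, and `vllm-kunlun`. It does not check the hardware or the operating system.

```python
import sys
from importlib import metadata

MIN_PYTHON = (3, 10)
MIN_TORCH = (2, 5, 1)


def parse_version(text):
    """Turn '2.5.1+cu121' into (2, 5, 1); stops at the first non-numeric part."""
    parts = []
    for piece in text.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_prerequisites():
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info[:2] < MIN_PYTHON:
        problems.append("Python 3.10 or newer is required")

    torch_version = installed_version("torch")
    if torch_version is None:
        problems.append("PyTorch is not installed")
    elif parse_version(torch_version) < MIN_TORCH:
        problems.append(f"PyTorch {torch_version} is older than 2.5.1")

    # Assumption: the plugin's distribution name is 'vllm-kunlun'.
    vllm_version = installed_version("vllm")
    kunlun_version = installed_version("vllm-kunlun")
    if vllm_version is None or kunlun_version is None:
        problems.append("vllm and vllm-kunlun must both be installed")
    elif parse_version(vllm_version) != parse_version(kunlun_version):
        problems.append(
            f"vllm {vllm_version} does not match vllm-kunlun {kunlun_version}"
        )
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("WARNING:", problem)
```

Comparing only the numeric release components keeps the check tolerant of local build suffixes such as `+cu121`.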

Supported Models

Generative Models

Model	Support	Quantization	LoRA	Piecewise Kunlun Graph
Qwen3	✅		✅	✅
Qwen3-MoE	✅	✅	✅	✅
Qwen3-Next	✅	✅	✅	✅

Multimodal Language Models

Model	Support	Quantization	LoRA	Piecewise Kunlun Graph	Note
Qwen3-VL	✅			✅

Performance Visualization 🚀

High-performance computing at work: How different models perform on the Kunlun3 P800.

Test environment: 16 concurrent requests, input/output size 2048.

Getting Started

Please use the following recommended versions to get started quickly:

Version	Release type	Doc
v0.11.0	Latest stable version	See QuickStart and Installation for more details

Contribute to vLLM Kunlun

If you're interested in contributing to this project, please read Contributing to vLLM Kunlun.

Star History 🔥

We opened the project on Dec 8, 2025. We love open source and collaboration ❤️

Sponsors 👋

We sincerely thank the KunLunXin team for providing GPU resources, which enabled efficient model adaptation and debugging, comprehensive end-to-end testing, and broader model compatibility.

License

Apache License 2.0, as found in the LICENSE file.