![vLLM Kunlun Logo](vllm_kunlun/patches/vLLM_Kunlun.jpg)

<p align="center">
  <a href="./docs/_build/html/documentation.html"><b>Documentation</b></a> |
  <a href=""><b>Users Forum</b></a> |
  <a href="https://join.slack.com/t/vllm-kunlun/shared_invite/zt-3iinb8u5z-FcqZKbNNdMJ_32fHmipzvw"><b>Slack</b></a>
</p>

---

## Latest News 🔥
- [2025/12] Initial release of vLLM Kunlun

---

# Overview

vLLM Kunlun (vllm-kunlun) is a community-maintained hardware plugin designed to seamlessly run vLLM on the Kunlun XPU. It is the recommended approach for integrating the Kunlun backend within the vLLM community, adhering to the principles outlined in the [RFC Hardware pluggable](https://github.com/vllm-project/vllm/issues/11162). This plugin provides a hardware-pluggable interface that decouples the integration of the Kunlun XPU with vLLM.

With the vLLM Kunlun plugin, popular open-source models, including Transformer-like, Mixture-of-Experts, embedding, and multimodal LLMs, can run on the Kunlun XPU.

---
## Prerequisites

- **Hardware**: Kunlun3 P800 
- **OS**: Ubuntu 22.04 
- **Software**:
  - Python ≥ 3.10
  - PyTorch ≥ 2.5.1
  - vLLM (same version as vllm-kunlun)

---
## Supported Models

<h3>Generative Models</h3>
<table>
  <thead>
    <tr>
      <th width="20%">Model</th>
      <th width="12%">Support</th>
      <th width="15%">Quantization</th>
      <th width="10%">LoRA</th>
      <th width="20%">Piecewise Kunlun Graph</th>
      <th width="23%">Note</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td class="model-name">Qwen3</td>
      <td class="status-support">✅</td>
      <td></td>
      <td class="status-support">✅</td>
      <td class="status-support">✅</td>
      <td></td>
    </tr>
    <tr>
      <td class="model-name">Qwen3-MoE</td>
      <td class="status-support">✅</td>
      <td class="status-support">✅</td>
      <td class="status-support">✅</td>
      <td class="status-support">✅</td>
      <td></td>
    </tr>
    <tr>
      <td class="model-name">Qwen3-Next</td>
      <td class="status-support">✅</td>
      <td class="status-support">✅</td>
      <td class="status-support">✅</td>
      <td class="status-support">✅</td>
      <td></td>
    </tr>
  </tbody>
</table>

<h3>Multimodal Language Models</h3>
<table>
  <thead>
    <tr>
      <th width="20%">Model</th>
      <th width="12%">Support</th>
      <th width="15%">Quantization</th>
      <th width="10%">LoRA</th>
      <th width="20%">Piecewise Kunlun Graph</th>
      <th width="23%">Note</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td class="model-name">Qwen3-VL</td>
      <td class="status-support">✅</td>
      <td></td>
      <td></td>
      <td class="status-support">✅</td>
      <td></td>
    </tr>
  </tbody>
</table>


## Performance Visualization 🚀
### High-performance computing at work: how different models perform on the Kunlun3 P800

Benchmark setup: 16 concurrent requests, input/output size 2048.


![Models and tgs](./vllm_kunlun/patches/performance.png)

## Getting Started

Please use the following recommended versions to get started quickly:

| Version | Release type | Doc |
|----------|---------------|-----|
| v0.11.0 | Latest stable version | See [QuickStart](./docs/_build/html/quick_start.html) and [Installation](./docs/_build/html/installation.html) for more details |
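
Once installation is complete, a first run could look like the sketch below, which uses vLLM's standard offline inference API (`LLM` and `SamplingParams`). It assumes the plugin is already installed and is picked up by vLLM without extra arguments. The function name `run_offline_demo`, the prompt, the sampling values, and the `Qwen/Qwen3-8B` checkpoint are illustrative choices, not settings taken from this project; the QuickStart guide remains the authoritative reference for launch options.

```python
def run_offline_demo(model="Qwen/Qwen3-8B", prompts=None):
    """Generate completions with vLLM's offline API and return the texts."""
    # Imported lazily so this module can be loaded on machines without vLLM.
    from vllm import LLM, SamplingParams

    prompts = prompts or ["Briefly explain what a hardware plugin is."]
    # Illustrative sampling settings; tune them for the actual workload.
    sampling = SamplingParams(temperature=0.7, max_tokens=64)
    llm = LLM(model=model)
    outputs = llm.generate(prompts, sampling)
    # Each request output holds one or more completions; keep the first one.
    return [out.outputs[0].text for out in outputs]
```

Calling `run_offline_demo()` on a machine with a supported device loads the model and returns one generated string per prompt.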

---

## Contribute to vLLM Kunlun

If you're interested in contributing to this project, please read [Contributing to vLLM Kunlun](CONTRIBUTING.md).

## Star History 🔥

We opened the project on Dec 8, 2025. We love open source and collaboration ❤️

[![Star History Chart](https://api.star-history.com/svg?repos=baidu/vLLM-Kunlun&type=Date)](https://www.star-history.com/#baidu/vLLM-Kunlun&Date)

## Sponsors 👋

We sincerely thank the [**KunLunXin**](https://www.kunlunxin.com/) team for providing GPU resources, which enabled efficient model adaptation and debugging, comprehensive end-to-end testing, and broader model compatibility.

## License

Apache License 2.0, as found in the [LICENSE](./LICENSE) file.