This guide provides instructions for using Kunlun Graph Mode with vLLM Kunlun. Note that graph mode is only available on the V1 Engine, and not every supported model has been validated with Kunlun Graph; see the list of well-tested models below.
## Getting Started
Starting from vLLM-KunLun-0.10.1.1 with the V1 Engine, vLLM Kunlun runs models in graph mode by default to match vLLM's behavior. If you hit any issues, please open an issue on GitHub and temporarily fall back to eager mode by setting `enforce_eager=True` when initializing the model.
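For example, falling back to eager mode in the offline API can be done with the standard vLLM `enforce_eager` parameter (the model path below is illustrative):

```python
from vllm import LLM

# Disable graph mode and run the model eagerly.
model = LLM(model="models/Qwen3-8B-Instruct", enforce_eager=True)
```

For online serving, the equivalent is passing the `--enforce-eager` flag to `vllm serve`.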
vLLM Kunlun supports one graph mode:
- **KunlunGraph**: the default graph mode in vLLM Kunlun. As of vLLM-KunLun-0.10.1.1, the Qwen, GLM, and InternVL series models are well tested.
## Using KunlunGraph
KunlunGraph is enabled by default. Taking the Qwen series models as an example, simply using the V1 Engine (the default) is sufficient.
Offline example:
```python
from vllm import LLM
model = LLM(model="models/Qwen3-8B-Instruct")
outputs = model.generate("Hello, how are you?")
```
Online example:
```shell
vllm serve Qwen3-8B-Instruct
```
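Once the server is up, you can query it through the standard vLLM OpenAI-compatible API; a minimal sketch, assuming the default port 8000:

```shell
curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "Qwen3-8B-Instruct", "prompt": "Hello, how are you?", "max_tokens": 32}'
```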
## Splitting Ops
Enabling Kunlun Graph on the Kunlun platform requires the use of splitting ops, which partition the captured graph at designated operations.
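Upstream vLLM exposes splitting ops through `CompilationConfig.splitting_ops`; a sketch of how this might be configured is shown below. The specific op names required on the Kunlun platform are an assumption here, so consult the vLLM Kunlun release notes for the actual list:

```python
from vllm import LLM
from vllm.config import CompilationConfig

# Hypothetical configuration: split the captured graph at attention ops.
# The op name below is illustrative; the Kunlun platform may require
# a different, platform-specific set of splitting ops.
model = LLM(
    model="models/Qwen3-8B-Instruct",
    compilation_config=CompilationConfig(
        splitting_ops=["vllm.unified_attention"],
    ),
)
```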