[Docs] Add vLLM-Kunlun New Model Adaptation Manual and Update Model Support (#211)

* [Docs] Fix app.readthedocs building

Signed-off-by: dongxinyu03 <dongxinyu03@baidu.com>

* [Docs] Add vLLM-Kunlun New Model Adaptation Manual and Update Model Support

Signed-off-by: dongxinyu03 <dongxinyu03@baidu.com>
Xinyu Dong
2026-02-26 10:06:58 +08:00
committed by GitHub
parent b82b6026d6
commit d425a0d0e9
3 changed files with 792 additions and 9 deletions

@@ -2,14 +2,36 @@
## Generative Models
| Model | Support | W8A8 | LoRA | Tensor Parallel | Expert Parallel | Data Parallel | Piecewise Kunlun Graph |
| :------------ | :------ | :--- | :--- | :-------------- | :-------------- | :------------ | :--------------------- |
| Qwen3 | ✅ | | ✅ | | | | |
| Qwen3-Moe | | ✅ | ✅ | | | | |
| Qwen3-Next | ✅ | ✅ | | ✅ | ✅ | | |
| Deepseek v3.2 | | ✅ | | ✅ | | | ✅ |
| Model | Support | INT8(W8A8) | AWQ(W4A16) | GPTQ(WNA16) | LoRA | Tensor Parallel | Expert Parallel | Data Parallel | Kunlun Graph |
| :------------ | :-----: | :--------: | :--------: | :---------: | :---: | :-------------: | :-------------: | :-----------: | :----------: |
| Qwen2 | | | | | ✅ | | | | |
| Qwen2.5 | | ✅ | | | | | | | |
| Qwen3 | ✅ | | | | ✅ | | | | |
| Qwen3-Moe | ✅ | ✅ | | | | | | | |
| Qwen3-Next | ✅ | ✅ | ✅ | ✅ | | ✅ | ✅ | ✅ | ✅ |
| MiMo-V2-Flash | ✅ | ✅ | ✅ | ✅ | | ✅ | | ✅ | ✅ |
| Llama2 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | | ✅ | ✅ |
| Llama3 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | | ✅ | ✅ |
| Llama3.1 | ✅ | ✅ | ✅ | ✅ | | ✅ | | ✅ | ✅ |
| gpt-oss | ✅ | ✅ | ✅ | ✅ | | ✅ | | | |
| GLM4.5 | ✅ | ✅ | ✅ | ✅ | | ✅ | | ✅ | ✅ |
| GLM4.5Air | ✅ | ✅ | ✅ | ✅ | | ✅ | | ✅ | ✅ |
| GLM4.7 | ✅ | ✅ | ✅ | ✅ | | ✅ | | ✅ | ✅ |
| GLM5 | ✅ | ✅ | ✅ | ✅ | | ✅ | | ✅ | ✅ |
| Kimi-K2 | ✅ | - | ✅ | - | | ✅ | | ✅ | ✅ |
| DeepSeek-R1 | ✅ | ✅ | ✅ | ✅ | | ✅ | | ✅ | ✅ |
| DeepSeek-V3 | ✅ | ✅ | ✅ | ✅ | | ✅ | | ✅ | ✅ |
| DeepSeek-V3.2 | ✅ | ✅ | ✅ | ✅ | | ✅ | | ✅ | ✅ |
## Multimodal Language Models
| Model | Support | W8A8 | LoRA | Tensor Parallel | Expert Parallel | Data Parallel | Piecewise Kunlun Graph |
| :------- | :------ | :--- | :--- | :-------------- | :-------------- | :------------ | :--------------------- |
| Qwen3-VL | ✅ | ✅ | | ✅ | | ✅ | ✅ |
| Model | Support | INT8(W8A8) | AWQ(W4A16) | GPTQ(WNA16) | LoRA | Tensor Parallel | Expert Parallel | Data Parallel | Kunlun Graph |
| :------------- | :-----: | :--------: | :--------: | :---------: | :---: | :-------------: | :-------------: | :-----------: | :----------: |
| Qwen2-VL | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | | ✅ | ✅ |
| Qwen2.5-VL | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | | ✅ | ✅ |
| Qwen3-VL | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | | ✅ | ✅ |
| Qwen3-VL-MoE | ✅ | ✅ | ✅ | ✅ | | ✅ | ✅ | ✅ | ✅ |
| Qwen3-Omni-MoE | ✅ | ✅ | ✅ | ✅ | | ✅ | ✅ | ✅ | ✅ |
| InternVL-2.5 | ✅ | ✅ | ✅ | ✅ | | ✅ | | ✅ | ✅ |
| InternVL-3.5 | ✅ | ✅ | ✅ | ✅ | | ✅ | | ✅ | ✅ |
| InternS1 | ✅ | ✅ | ✅ | ✅ | | ✅ | | ✅ | ✅ |