[Docs] Add vLLM-Kunlun New Model Adaptation Manual and Update Model Support (#211)
* [Docs] Fix app.readthedocs building
  Signed-off-by: dongxinyu03 <dongxinyu03@baidu.com>
* [Docs] Add vLLM-Kunlun New Model Adaptation Manual and Update Model Support
  Signed-off-by: dongxinyu03 <dongxinyu03@baidu.com>
@@ -2,14 +2,36 @@
## Generative Models

| Model | Support | W8A8 | LoRA | Tensor Parallel | Expert Parallel | Data Parallel | Piecewise Kunlun Graph |
| :------------ | :------ | :--- | :--- | :-------------- | :-------------- | :------------ | :--------------------- |
| Qwen3 | ✅ | ✅ | ✅ | ✅ | | ✅ | ✅ |
| Qwen3-Moe | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Qwen3-Next | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Deepseek v3.2 | ✅ | ✅ | | ✅ | | ✅ | ✅ |

| Model | Support | INT8(W8A8) | AWQ(W4A16) | GPTQ(WNA16) | LoRA | Tensor Parallel | Expert Parallel | Data Parallel | Kunlun Graph |
| :------------ | :-----: | :--------: | :--------: | :---------: | :---: | :-------------: | :-------------: | :-----------: | :----------: |
| Qwen2 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | | ✅ | ✅ |
| Qwen2.5 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | | ✅ | ✅ |
| Qwen3 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | | ✅ | ✅ |
| Qwen3-Moe | ✅ | ✅ | ✅ | ✅ | | ✅ | ✅ | ✅ | ✅ |
| Qwen3-Next | ✅ | ✅ | ✅ | ✅ | | ✅ | ✅ | ✅ | ✅ |
| MiMo-V2-Flash | ✅ | ✅ | ✅ | ✅ | | ✅ | | ✅ | ✅ |
| Llama2 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | | ✅ | ✅ |
| Llama3 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | | ✅ | ✅ |
| Llama3.1 | ✅ | ✅ | ✅ | ✅ | | ✅ | | ✅ | ✅ |
| gpt-oss | ✅ | ✅ | ✅ | ✅ | | ✅ | | | |
| GLM4.5 | ✅ | ✅ | ✅ | ✅ | | ✅ | | ✅ | ✅ |
| GLM4.5Air | ✅ | ✅ | ✅ | ✅ | | ✅ | | ✅ | ✅ |
| GLM4.7 | ✅ | ✅ | ✅ | ✅ | | ✅ | | ✅ | ✅ |
| GLM5 | ✅ | ✅ | ✅ | ✅ | | ✅ | | ✅ | ✅ |
| Kimi-K2 | ✅ | - | ✅ | - | | ✅ | | ✅ | ✅ |
| DeepSeek-R1 | ✅ | ✅ | ✅ | ✅ | | ✅ | | ✅ | ✅ |
| DeepSeek-V3 | ✅ | ✅ | ✅ | ✅ | | ✅ | | ✅ | ✅ |
| DeepSeek-V3.2 | ✅ | ✅ | ✅ | ✅ | | ✅ | | ✅ | ✅ |
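The quantization columns use the common WxAy shorthand: W8A8 stores weights and activations in 8 bits, W4A16 stores 4-bit weights with 16-bit activations, and WNA16 denotes N-bit weights (GPTQ checkpoints are typically 4- or 8-bit) with 16-bit activations. As a rough illustration of why the weight width matters, the sketch below estimates weight storage for a model of a given parameter count. It assumes every parameter is stored at the stated width and ignores quantization scales, zero points, unquantized layers, activations, and KV cache, so real checkpoints will be somewhat larger. The `weight_gib` helper and the 8B parameter count are hypothetical.

```python
def weight_gib(num_params: int, weight_bits: int) -> float:
    """Approximate weight storage in GiB, assuming one uniform bit width.

    Hypothetical helper: ignores scales, zero points and unquantized layers.
    """
    total_bytes = num_params * weight_bits / 8
    return total_bytes / 2**30


# Hypothetical 8-billion-parameter model.
PARAMS = 8_000_000_000

for label, bits in [("BF16 baseline", 16), ("INT8 (W8A8)", 8), ("AWQ (W4A16)", 4)]:
    print(f"{label:14s} ~{weight_gib(PARAMS, bits):.2f} GiB")
```

Under these assumptions, halving the weight width halves the weight footprint (about 14.90, 7.45 and 3.73 GiB here), while the activation width (A8 vs A16) affects compute precision rather than checkpoint size.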
## Multimodal Language Models

| Model | Support | W8A8 | LoRA | Tensor Parallel | Expert Parallel | Data Parallel | Piecewise Kunlun Graph |
| :------- | :------ | :--- | :--- | :-------------- | :-------------- | :------------ | :--------------------- |
| Qwen3-VL | ✅ | ✅ | | ✅ | | ✅ | ✅ |

| Model | Support | INT8(W8A8) | AWQ(W4A16) | GPTQ(WNA16) | LoRA | Tensor Parallel | Expert Parallel | Data Parallel | Kunlun Graph |
| :------------- | :-----: | :--------: | :--------: | :---------: | :---: | :-------------: | :-------------: | :-----------: | :----------: |
| Qwen2-VL | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | | ✅ | ✅ |
| Qwen2.5-VL | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | | ✅ | ✅ |
| Qwen3-VL | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | | ✅ | ✅ |
| Qwen3-VL-MoE | ✅ | ✅ | ✅ | ✅ | | ✅ | ✅ | ✅ | ✅ |
| Qwen3-Omni-MoE | ✅ | ✅ | ✅ | ✅ | | ✅ | ✅ | ✅ | ✅ |
| InternVL-2.5 | ✅ | ✅ | ✅ | ✅ | | ✅ | | ✅ | ✅ |
| InternVL-3.5 | ✅ | ✅ | ✅ | ✅ | | ✅ | | ✅ | ✅ |
| InternS1 | ✅ | ✅ | ✅ | ✅ | | ✅ | | ✅ | ✅ |
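Because the support matrix is plain Markdown, it can also be queried programmatically, for example to list which models combine a given quantization scheme with expert parallelism. The sketch below parses a table of this shape and treats a cell as supported only when it contains ✅, so blank and `-` cells count as unsupported. The helper names are hypothetical, and the embedded rows are a few copied from the multimodal table for illustration.

```python
# A few rows copied from the multimodal support table, for illustration.
TABLE = """\
| Model | Support | INT8(W8A8) | AWQ(W4A16) | GPTQ(WNA16) | LoRA | Tensor Parallel | Expert Parallel | Data Parallel | Kunlun Graph |
| :------------- | :-----: | :--------: | :--------: | :---------: | :---: | :-------------: | :-------------: | :-----------: | :----------: |
| Qwen2.5-VL | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | | ✅ | ✅ |
| Qwen3-VL-MoE | ✅ | ✅ | ✅ | ✅ | | ✅ | ✅ | ✅ | ✅ |
| Qwen3-Omni-MoE | ✅ | ✅ | ✅ | ✅ | | ✅ | ✅ | ✅ | ✅ |
| InternS1 | ✅ | ✅ | ✅ | ✅ | | ✅ | | ✅ | ✅ |
"""


def split_row(line: str) -> list[str]:
    # Drop the outer pipes, split on the inner ones, and trim cell padding.
    return [cell.strip() for cell in line.strip().strip("|").split("|")]


def parse_matrix(markdown: str) -> dict[str, dict[str, bool]]:
    lines = [ln for ln in markdown.splitlines() if ln.strip()]
    header = split_row(lines[0])
    matrix: dict[str, dict[str, bool]] = {}
    # lines[1] is the alignment row (":---:"), so data rows start at lines[2].
    for line in lines[2:]:
        cells = split_row(line)
        model, flags = cells[0], cells[1:]
        # Only a check mark counts as supported; "" and "-" map to False.
        matrix[model] = {col: cell == "✅" for col, cell in zip(header[1:], flags)}
    return matrix


def models_supporting(matrix: dict[str, dict[str, bool]], *features: str) -> list[str]:
    # Keep models whose row has every requested column marked as supported.
    return [m for m, caps in matrix.items() if all(caps.get(f, False) for f in features)]


matrix = parse_matrix(TABLE)
print(models_supporting(matrix, "AWQ(W4A16)", "Expert Parallel"))
```

Since columns are keyed by their header text, the same functions should also work on the generative-model table, which uses the same column set.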