Check and update the feature support table.

- Both multi-step scheduling and speculative decoding require adaptation of their corresponding workers.
- The prompt adapter (a fine-tuning method) requires adaptation in worker.py and model_runner.py.

Signed-off-by: MengqingCao <cmq0113@163.com>
Feature Support
| Feature | Supported | Note |
|---|---|---|
| Chunked Prefill | ✗ | Planned for 2025 Q1 |
| Automatic Prefix Caching | ✅ | Performance improvements planned for 2025 Q2 |
| LoRA | ✗ | Planned for 2025 Q1 |
| Prompt adapter | ✗ | Planned for 2025 Q1 |
| Speculative decoding | ✗ | Planned for 2025 Q1 |
| Pooling | ✗ | Planned for 2025 Q2 |
| Encoder-decoder | ✗ | Planned for 2025 Q2 |
| Multi Modality | ✅ (LLaVA/Qwen2-VL/Qwen2-Audio/InternVL) | More model support planned for 2025 Q1 |
| LogProbs | ✅ | |
| Prompt LogProbs | ✅ | |
| Async output | ✅ | |
| Multi-step scheduler | ✗ | Planned for 2025 Q1 |
| Best of | ✅ | |
| Beam search | ✅ | |
| Guided Decoding | ✗ | Planned for 2025 Q1 |
| Tensor Parallel | ✅ | Only the "mp" backend is supported for now |
| Pipeline Parallel | ✅ | Only the "mp" backend is supported for now |
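The parallelism rows note that only the "mp" (multiprocessing) executor backend is currently supported. A minimal sketch of how that backend might be selected when serving, assuming the standard vLLM CLI flags; the model name and world size here are illustrative assumptions, not part of the table:

```shell
# Hedged sketch: serve with tensor parallelism over the "mp" executor backend.
# Qwen/Qwen2-VL-7B-Instruct and the world size of 2 are illustrative choices.
vllm serve Qwen/Qwen2-VL-7B-Instruct \
  --tensor-parallel-size 2 \
  --distributed-executor-backend mp
```

Pipeline parallelism would be requested analogously via `--pipeline-parallel-size`, again with the "mp" backend.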