1. remove useluss code in attention.py 2. multistep now using StatefulModelInputForNPU and do not use StatefulModelInput Signed-off-by: new-TonyWang <wangtonyyu222@gmail.com>
2.8 KiB
2.8 KiB
Feature Support
| Feature | Supported | CI Coverage | Guidance Document | Current Status | Next Step |
|---|---|---|---|---|---|
| Chunked Prefill | ❌ | NA | Plan in 2025.03.30 | ||
| Automatic Prefix Caching | ❌ | NA | Plan in 2025.03.30 | ||
| LoRA | ❌ | NA | Plan in 2025.06.30 | ||
| Prompt adapter | ❌ | NA | Plan in 2025.06.30 | ||
| Speculative decoding | ✅ | Basic functions available | Need fully test | ||
| Pooling | ✅ | Basic functions available(Bert) | Need fully test and add more models support | ||
| Enc-dec | ❌ | NA | Plan in 2025.06.30 | ||
| Multi Modality | ✅ | ✅ | Basic functions available(LLaVA/Qwen2-vl/Qwen2-audio/internVL) | Improve performance, and add more models support | |
| LogProbs | ✅ | Basic functions available | Need fully test | ||
| Prompt logProbs | ✅ | Basic functions available | Need fully test | ||
| Async output | ✅ | Basic functions available | Need fully test | ||
| Multi step scheduler | ✅ | Basic functions available | Need fully test, Find more details at Blog , RFC and issue | ||
| Best of | ✅ | Basic functions available | Need fully test | ||
| Beam search | ✅ | Basic functions available | Need fully test | ||
| Guided Decoding | ✅ | Basic functions available | Find more details at the issue | ||
| Tensor Parallel | ✅ | Basic functions available | Need fully test | ||
| Pipeline Parallel | ✅ | Basic functions available | Need fully test |