[Docs] Fix quantization support description in README (#208)

Updated quantization support description from FP8 to INT8.
This commit is contained in:
Xinyu Dong
2026-02-15 13:12:17 +08:00
committed by GitHub
parent 77dbc2ddeb
commit d9ad42a174
@@ -40,7 +40,7 @@ This plugin provides a hardware-pluggable interface that decouples the integrati
 - **Seamless Plugin Integration** — Works as a standard vLLM platform plugin via Python entry points, no need to modify vLLM source code
 - **Broad Model Support** — Supports 15+ mainstream LLMs including Qwen, Llama, DeepSeek, Kimi-K2, and multimodal models
-- **Quantization Support** — FP8 and other quantization methods for MoE and dense models
+- **Quantization Support** — INT8 and other quantization methods for MoE and dense models
 - **LoRA Fine-Tuning** — LoRA adapter support for Qwen series models
 - **Piecewise Kunlun Graph** — Hardware-accelerated graph optimization for high-performance inference
 - **FlashMLA Attention** — Optimized multi-head latent attention for DeepSeek MLA architectures