From d9ad42a174d3160857a4585fc4ba42a542688243 Mon Sep 17 00:00:00 2001
From: Xinyu Dong
Date: Sun, 15 Feb 2026 13:12:17 +0800
Subject: [PATCH] [Docs] Fix quantization support description in README (#208)

Updated quantization support description from FP8 to INT8.

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 859a75a..8e7c0df 100644
--- a/README.md
+++ b/README.md
@@ -40,7 +40,7 @@ This plugin provides a hardware-pluggable interface that decouples the integrati
 
 - **Seamless Plugin Integration** — Works as a standard vLLM platform plugin via Python entry points, no need to modify vLLM source code
 - **Broad Model Support** — Supports 15+ mainstream LLMs including Qwen, Llama, DeepSeek, Kimi-K2, and multimodal models
-- **Quantization Support** — FP8 and other quantization methods for MoE and dense models
+- **Quantization Support** — INT8 and other quantization methods for MoE and dense models
 - **LoRA Fine-Tuning** — LoRA adapter support for Qwen series models
 - **Piecewise Kunlun Graph** — Hardware-accelerated graph optimization for high-performance inference
 - **FlashMLA Attention** — Optimized multi-head latent attention for DeepSeek MLA architectures