61 lines
2.4 KiB
Markdown
61 lines
2.4 KiB
Markdown
---
|
|
license: apache-2.0
|
|
base_model: deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct
|
|
tags:
|
|
- llama.cpp
|
|
- gguf
|
|
- quantized
|
|
- text-generation
|
|
- lightweight
|
|
- lmstudio
|
|
- jan
|
|
- cobalt
|
|
- text-generation-webui
|
|
---
|
|
|
|
# DeepSeek-Coder-V2-Lite-Instruct - GGUF High-Quality Quantizations
|
|
|
|
This repository provides **GGUF** quantized versions of the [deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct](https://huggingface.co/deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct) model, optimized for local execution using `llama.cpp` and compatible ecosystems.
|
|
|
|
## 📌 Version Notes
|
|
All quantizations were generated from the official **FP16** weights.
|
|
- **Target:** Efficient execution on consumer hardware, mobile/edge devices, and systems with limited memory.
|
|
- **Performance:** The output quality (reasoning, coherence, and accuracy) is strictly dependent on the base model's parameter scale (9B).
|
|
|
|
## 📊 Quantization Table
|
|
|
|
| File | Method | Bit | Description |
|
|
| :--- | :--- | :--- | :--- |
|
|
| **fp16.gguf** | FP16 | 16-bit | **Original Weights.** No quantization applied. Maximum fidelity. |
|
|
| **Q8_0.gguf** | Q8_0 | 8-bit | **Near-lossless.** Practically identical to the original model with lower memory footprint. |
|
|
| **Q5_K_M.gguf** | Q5_K_M | 5-bit | **High Precision.** Minimizes quantization error for critical tasks. |
|
|
| **Q4_K_M.gguf** | Q4_K_M | 4-bit | **Recommended.** Best balance between speed and performance. |
|
|
| **Q4_K_S.gguf** | Q4_K_S | 4-bit | **Fast/Small.** Optimized for maximum throughput and low RAM usage. |
|
|
|
|
## 🛠️ Technical Details
|
|
- **Quantization Date:** 2026-03-13
|
|
- **Tool used:** `llama-quantize` (llama.cpp)
|
|
- **Method:** K-Quantization (optimized for AVX2/AVX-512 and modern GPU architectures).
|
|
|
|
## 🚀 How to Use
|
|
# Start a local OpenAI-compatible server with a web UI:
|
|
|
|
### llama.cpp (CLI) using model from HuggingFace
|
|
```bash
|
|
./llama-cli -hf daniloreddy/DeepSeek-Coder-V2-Lite-Instruct_GGUF:Q4_K_M -p "User: Hello! Assistant:" -n 512 --temp 0.7
|
|
```
|
|
|
|
### llama.cpp (CLI) using downloaded model
|
|
```bash
|
|
./llama-cli -m path/to/DeepSeek-Coder-V2-Lite-Instruct_Q4_K_M.gguf -p "User: Hello! Assistant:" -n 512 --temp 0.7
|
|
```
|
|
|
|
### llama.cpp (SERVER) using model from HuggingFace
|
|
```bash
|
|
./llama-server -hf daniloreddy/DeepSeek-Coder-V2-Lite-Instruct_GGUF:Q4_K_M --port 8080 -c 4096
|
|
```
|
|
|
|
### llama.cpp (SERVER) using downloaded model
|
|
```bash
|
|
./llama-server -m /path/to/DeepSeek-Coder-V2-Lite-Instruct_Q4_K_M.gguf --port 8080 -c 4096
|
|
``` |