Files
meno-lite-0.1-gguf/README.md
ModelHub XC 38316c7703 初始化项目,由ModelHub XC社区提供模型
Model: bond005/meno-lite-0.1-gguf
Source: Original Platform
2026-05-21 15:40:15 +08:00

76 lines
3.4 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

---
license: apache-2.0
language:
- en
- ru
base_model:
- bond005/meno-lite-0.1
tags:
- rag
- ner
- information-extraction
- summarization
- question-answering
- document-qa
- long-context
pipeline_tag: text-generation
---
# Meno-Lite-0.1 GGUF
This repository contains quantized GGUF versions of [Meno-Lite-0.1](https://huggingface.co/bond005/meno-lite-0.1).
All variants were produced using an **importance matrix** computed on the `train` split of the [`ru_llm_calibration`](https://huggingface.co/datasets/bond005/ru_llm_calibration) dataset, and are intended to be run with [`llama.cpp`](https://github.com/ggerganov/llama.cpp).
## Available Formats
| Quantization type | File size | Quality | Recommendation |
| :--- | :--- | :--- | :--- |
| **Q8_0** | ~8.05 GB | Virtually identical to FP16 | **Best quality**. Ideal for CPU inference when memory is not a constraint. |
| **Q5_K_M** | ~5.41 GB | Minimal degradation | **Recommended balance**. Excellent speed and quality, fits most consumer GPUs. |
| **Q4_K_M** | ~4.65 GB | Moderate degradation | **"Golden standard"**. Best trade-off between size and quality. |
| **IQ3_M** | ~3.54 GB | Noticeable degradation | **Maximum memory savings**. Quality drops visibly; suited for highly constrained devices. |
## Quality Evaluation
Quality was measured on the `test` split of the [**Ru LLM Calibration**](https://huggingface.co/datasets/bond005/ru_llm_calibration) dataset using the `llama-perplexity` utility. The original FP16 model served as the reference.
| Metric | Q8_0 | Q5_K_M | Q4_K_M | IQ3_M |
| :--- | :--- | :--- | :--- | :--- |
| **Mean PPL (Q) ↓** | 9.047 | 9.075 | 9.135 | 9.689 |
| **PPL correlation ↑** | 99.97% | 99.87% | 99.69% | 98.64% |
| **Mean KLD ↓** | 0.0020 | 0.0077 | 0.0174 | 0.0804 |
| **Same top p ↑** | 96.71% | 94.36% | 92.16% | 84.58% |
> higher is better; ↓ lower is better
**How to interpret these metrics:**
- **Mean PPL (Q)**: Lower is better. Shows the average perplexity of the quantized model.
- **PPL correlation**: Closer to 100% indicates the quantized model behaves almost identically to FP16. Values above 99.5% are considered excellent.
- **Mean KLD**: Measures the divergence between the output probability distributions. Lower is better; 0 means identical distributions.
- **Same top p**: The percentage of tokens where the quantized model's top prediction matches the FP16 model. Higher is better it reflects how often the model's first-choice token remains unchanged.
## Usage
### 1. Install `llama.cpp`
Follow the [official build instructions](https://github.com/ggerganov/llama.cpp#build).
### 2. Run the model
```bash
# CLI
./llama-cli -hf bond005/meno-lite-0.1-gguf -m meno-lite-0.1-Q4_K_M.gguf -p "Привет, как дела?"
# Server with WebUI (default http://127.0.0.1:8080)
./llama-server -hf bond005/meno-lite-0.1-gguf -m meno-lite-0.1-Q4_K_M.gguf --host 0.0.0.0 --port 8080
```
For more details on available parameters, see the [`llama.cpp` documentation](https://github.com/ggerganov/llama.cpp/tree/master/examples).
## About Meno-Lite-0.1
Meno-Lite-0.1 is a 7B model based on Qwen2.5, fine-tuned for **RAG, document QA, information extraction, and knowledge graph construction**. Read more about its capabilities, training procedure, and limitations in the [main model card](https://huggingface.co/bond005/meno-lite-0.1).
## License
All quantized variants inherit the license of the original model (Apache 2.0).