commit 38316c7703271515267bee13bf7ac0d916c58baa
Author: ModelHub XC
Date:   Thu May 21 15:40:15 2026 +0800

    Initialize project; model provided by the ModelHub XC community

    Model: bond005/meno-lite-0.1-gguf
    Source: Original Platform

diff --git a/.gitattributes b/.gitattributes
new file mode 100644
index 0000000..4a03161
--- /dev/null
+++ b/.gitattributes
@@ -0,0 +1,41 @@
+*.7z filter=lfs diff=lfs merge=lfs -text
+*.arrow filter=lfs diff=lfs merge=lfs -text
+*.bin filter=lfs diff=lfs merge=lfs -text
+*.bz2 filter=lfs diff=lfs merge=lfs -text
+*.ckpt filter=lfs diff=lfs merge=lfs -text
+*.ftz filter=lfs diff=lfs merge=lfs -text
+*.gz filter=lfs diff=lfs merge=lfs -text
+*.h5 filter=lfs diff=lfs merge=lfs -text
+*.joblib filter=lfs diff=lfs merge=lfs -text
+*.lfs.* filter=lfs diff=lfs merge=lfs -text
+*.mlmodel filter=lfs diff=lfs merge=lfs -text
+*.model filter=lfs diff=lfs merge=lfs -text
+*.msgpack filter=lfs diff=lfs merge=lfs -text
+*.npy filter=lfs diff=lfs merge=lfs -text
+*.npz filter=lfs diff=lfs merge=lfs -text
+*.onnx filter=lfs diff=lfs merge=lfs -text
+*.ot filter=lfs diff=lfs merge=lfs -text
+*.parquet filter=lfs diff=lfs merge=lfs -text
+*.pb filter=lfs diff=lfs merge=lfs -text
+*.pickle filter=lfs diff=lfs merge=lfs -text
+*.pkl filter=lfs diff=lfs merge=lfs -text
+*.pt filter=lfs diff=lfs merge=lfs -text
+*.pth filter=lfs diff=lfs merge=lfs -text
+*.rar filter=lfs diff=lfs merge=lfs -text
+*.safetensors filter=lfs diff=lfs merge=lfs -text
+saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+*.tar.* filter=lfs diff=lfs merge=lfs -text
+*.tar filter=lfs diff=lfs merge=lfs -text
+*.tflite filter=lfs diff=lfs merge=lfs -text
+*.tgz filter=lfs diff=lfs merge=lfs -text
+*.wasm filter=lfs diff=lfs merge=lfs -text
+*.xz filter=lfs diff=lfs merge=lfs -text
+*.zip filter=lfs diff=lfs merge=lfs -text
+*.zst filter=lfs diff=lfs merge=lfs -text
+*tfevents* filter=lfs diff=lfs merge=lfs -text
+bond005/meno-lite-0.1-gguf/meno-lite-0.1-IQ3_M.gguf filter=lfs diff=lfs merge=lfs -text
+meno-lite-0.1-IQ3_M.gguf filter=lfs diff=lfs merge=lfs -text
+meno-lite-0.1-Q4_K_M.gguf filter=lfs diff=lfs merge=lfs -text
+meno-lite-0.1-Q5_K_M.gguf filter=lfs diff=lfs merge=lfs -text
+meno-lite-0.1-Q8_0.gguf filter=lfs diff=lfs merge=lfs -text
+meno-lite-0.1-fp16.gguf filter=lfs diff=lfs merge=lfs -text
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..e31baa5
--- /dev/null
+++ b/README.md
@@ -0,0 +1,76 @@
+---
+license: apache-2.0
+language:
+- en
+- ru
+base_model:
+- bond005/meno-lite-0.1
+tags:
+- rag
+- ner
+- information-extraction
+- summarization
+- question-answering
+- document-qa
+- long-context
+pipeline_tag: text-generation
+---
+
+# Meno-Lite-0.1 GGUF
+
+This repository contains quantized GGUF versions of [Meno-Lite-0.1](https://huggingface.co/bond005/meno-lite-0.1).
+All variants were produced using an **importance matrix** computed on the `train` split of the [`ru_llm_calibration`](https://huggingface.co/datasets/bond005/ru_llm_calibration) dataset, and are intended to be run with [`llama.cpp`](https://github.com/ggerganov/llama.cpp).
+
+## Available Formats
+
+| Quantization type | File size | Quality | Recommendation |
+| :--- | :--- | :--- | :--- |
+| **Q8_0** | ~8.05 GB | Virtually identical to FP16 | **Best quality**. Ideal for CPU inference when memory is not a constraint. |
+| **Q5_K_M** | ~5.41 GB | Minimal degradation | **Recommended balance**. Excellent speed and quality, fits most consumer GPUs. |
+| **Q4_K_M** | ~4.65 GB | Moderate degradation | **"Golden standard"**. Best trade-off between size and quality. |
+| **IQ3_M** | ~3.54 GB | Noticeable degradation | **Maximum memory savings**. Quality drops visibly; suited for highly constrained devices. |
+
+## Quality Evaluation
+
+Quality was measured on the `test` split of the [**Ru LLM Calibration**](https://huggingface.co/datasets/bond005/ru_llm_calibration) dataset using the `llama-perplexity` utility. The original FP16 model served as the reference.
+
+| Metric | Q8_0 | Q5_K_M | Q4_K_M | IQ3_M |
+| :--- | :--- | :--- | :--- | :--- |
+| **Mean PPL (Q) ↓** | 9.047 | 9.075 | 9.135 | 9.689 |
+| **PPL correlation ↑** | 99.97% | 99.87% | 99.69% | 98.64% |
+| **Mean KLD ↓** | 0.0020 | 0.0077 | 0.0174 | 0.0804 |
+| **Same top p ↑** | 96.71% | 94.36% | 92.16% | 84.58% |
+
+> ↑ – higher is better; ↓ – lower is better
+
+**How to interpret these metrics:**
+- **Mean PPL (Q)**: Lower is better. Shows the average perplexity of the quantized model.
+- **PPL correlation**: Closer to 100% indicates the quantized model behaves almost identically to FP16. Values above 99.5% are considered excellent.
+- **Mean KLD**: Measures the divergence between the output probability distributions. Lower is better; 0 means identical distributions.
+- **Same top p**: The percentage of tokens where the quantized model's top prediction matches the FP16 model. Higher is better – it reflects how often the model's first-choice token remains unchanged.
+
+## Usage
+
+### 1. Install `llama.cpp`
+
+Follow the [official build instructions](https://github.com/ggerganov/llama.cpp#build).
+
+### 2. Run the model
+
+```bash
+# CLI
+./llama-cli -hf bond005/meno-lite-0.1-gguf -m meno-lite-0.1-Q4_K_M.gguf -p "Привет, как дела?"
+
+# Server with WebUI (default http://127.0.0.1:8080)
+./llama-server -hf bond005/meno-lite-0.1-gguf -m meno-lite-0.1-Q4_K_M.gguf --host 0.0.0.0 --port 8080
+```
+
+For more details on available parameters, see the [`llama.cpp` documentation](https://github.com/ggerganov/llama.cpp/tree/master/examples).
+
+## About Meno-Lite-0.1
+
+Meno-Lite-0.1 is a 7B model based on Qwen2.5, fine-tuned for **RAG, document QA, information extraction, and knowledge graph construction**. Read more about its capabilities, training procedure, and limitations in the [main model card](https://huggingface.co/bond005/meno-lite-0.1).
+
+## License
+
+All quantized variants inherit the license of the original model (Apache 2.0).
\ No newline at end of file
diff --git a/meno-lite-0.1-IQ3_M.gguf b/meno-lite-0.1-IQ3_M.gguf
new file mode 100644
index 0000000..1f40ba6
--- /dev/null
+++ b/meno-lite-0.1-IQ3_M.gguf
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:271223b473cad2c80920d5413d01e38810836450e6e538f006122deac3c757c2
+size 3544151936
diff --git a/meno-lite-0.1-Q4_K_M.gguf b/meno-lite-0.1-Q4_K_M.gguf
new file mode 100644
index 0000000..15baf6e
--- /dev/null
+++ b/meno-lite-0.1-Q4_K_M.gguf
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:05fea614c2b6e54b49d7e24d6ed40f5c21b62cc0d052fd3475628f5eff3a9cc7
+size 4649923456
diff --git a/meno-lite-0.1-Q5_K_M.gguf b/meno-lite-0.1-Q5_K_M.gguf
new file mode 100644
index 0000000..19090ef
--- /dev/null
+++ b/meno-lite-0.1-Q5_K_M.gguf
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:43cb5cef009fefb5a3349c5a71a33ca4ef01181a26e0331f630e9396124814e0
+size 5408584576
diff --git a/meno-lite-0.1-Q8_0.gguf b/meno-lite-0.1-Q8_0.gguf
new file mode 100644
index 0000000..6223e20
--- /dev/null
+++ b/meno-lite-0.1-Q8_0.gguf
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:31801da910dea698b0544816cd22caacf1308b07733f136f6426044662b9b48f
+size 8046989184
diff --git a/meno-lite-0.1-fp16.gguf b/meno-lite-0.1-fp16.gguf
new file mode 100644
index 0000000..3c90a77
--- /dev/null
+++ b/meno-lite-0.1-fp16.gguf
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:66f130e0bd6a37c5f22774e126c79090a277e9d1aceaa3965da47e31a053390d
+size 15139868288
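
Reviewer note: the `.gguf` entries in this commit are Git LFS pointer files, not model weights. Each one records only a spec version, a SHA-256 `oid`, and a byte `size`. A minimal sketch of how a downloaded file could be checked against such a pointer is given below. The helper names `parse_lfs_pointer` and `matches_pointer` and the local path are hypothetical, chosen for illustration; only the pointer text is taken from the diff above.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS v1 pointer into {'oid': <hex sha256>, 'size': <int bytes>}."""
    # Each pointer line has the form "<key> <value>".
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, _, digest = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    return {"oid": digest, "size": int(fields["size"])}


def matches_pointer(path: Path, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file has the size and SHA-256 digest named by the pointer."""
    # Cheap check first: a size mismatch avoids hashing a multi-gigabyte file.
    if path.stat().st_size != pointer["size"]:
        return False
    sha = hashlib.sha256()
    with path.open("rb") as fh:
        # Hash in chunks so the whole file is never held in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            sha.update(chunk)
    return sha.hexdigest() == pointer["oid"]


# Pointer contents copied from the meno-lite-0.1-IQ3_M.gguf hunk above.
IQ3_M_POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:271223b473cad2c80920d5413d01e38810836450e6e538f006122deac3c757c2
size 3544151936
"""

if __name__ == "__main__":
    pointer = parse_lfs_pointer(IQ3_M_POINTER)
    local = Path("meno-lite-0.1-IQ3_M.gguf")  # assumed download location
    if local.exists():
        print("OK" if matches_pointer(local, pointer) else "MISMATCH")
```

The same two functions would apply to the other four pointers by swapping in their `oid` and `size` values. A file that is only a few hundred bytes long on disk is most likely still the pointer itself, which suggests the LFS objects were not fetched.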