---
license: apache-2.0
language:
- en
base_model: g023/qwen3-tiny-v2
tags:
- qwen3
- gguf
- q8_0
- finetuned
- grpo
- lora-merged
- text-generation
pipeline_tag: text-generation
library_name: llama.cpp
quantized_by: g023
---

# Qwen3-g023-tiny-v2-FT-Q8_0 - GRPO Finetuned Q8_0 GGUF Export

https://huggingface.co/g023/qwen3-tiny-v2-finetuned/

Q8_0 GGUF export of a GRPO-finetuned Qwen3 model, tuned for improved reasoning and reduced repetition.
Original source model: https://huggingface.co/g023/qwen3-tiny-v2

*THIS IS A WIP (WORK IN PROGRESS)*

## Files

- `Qwen3-g023-tiny-v2-FT-Q8_0.gguf`: Q8_0 GGUF model (~1.81 GB)
- `Modelfile`: Ollama template + tested default sampling settings
- `params_best.json`: Best sampled parameters from automated sweep
- `sweep_results.json`: Full sweep results and per-test outcomes
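
After downloading, a quick sanity check on the model file can catch a truncated or mislabeled download. The sketch below relies only on the fact that GGUF files begin with the four ASCII bytes `GGUF`; the helper name `is_gguf` is made up for illustration and is not part of this repository.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if the file starts with the 4-byte GGUF magic.
is_gguf() {
  [ "$(head -c 4 "$1" 2>/dev/null)" = "GGUF" ]
}

MODEL_FILE="Qwen3-g023-tiny-v2-FT-Q8_0.gguf"
if is_gguf "$MODEL_FILE"; then
  echo "$MODEL_FILE looks like a GGUF file"
else
  echo "$MODEL_FILE is missing or not a GGUF file"
fi
```

This only inspects the header; it does not verify the file's size or checksum.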

## Tested Best Parameters (Default in Modelfile)

- `temperature`: 0.65
- `top_p`: 0.9
- `top_k`: 20
- `min_p`: 0.0
- `repeat_penalty`: 1.05
- `presence_penalty`: 0.1
- `frequency_penalty`: 0.1
- `num_ctx`: 40000
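
These values are already the defaults in the `Modelfile`, but they can also be passed per request. The following is a minimal sketch, assuming a local Ollama server at its usual address and a model created under the name used in the Usage section; the `OLLAMA_URL` and `SEND_REQUEST` variables are illustrative names, not something this repository defines.

```bash
#!/usr/bin/env bash
# Sketch: build an Ollama /api/generate request body with the parameters listed above.
MODEL="qwen3-g023-tiny-v2-FT-Q8_0"
OLLAMA_URL="${OLLAMA_URL:-http://localhost:11434}"  # assumed default Ollama address

payload=$(cat <<EOF
{
  "model": "$MODEL",
  "prompt": "Explain why the sky is blue",
  "stream": false,
  "options": {
    "temperature": 0.65,
    "top_p": 0.9,
    "top_k": 20,
    "min_p": 0.0,
    "repeat_penalty": 1.05,
    "presence_penalty": 0.1,
    "frequency_penalty": 0.1,
    "num_ctx": 40000
  }
}
EOF
)

# Print the body so it can be inspected before anything is sent.
printf '%s\n' "$payload"

# Send the request only when explicitly asked to (SEND_REQUEST=1).
if [ "${SEND_REQUEST:-0}" = "1" ]; then
  curl -s "$OLLAMA_URL/api/generate" -d "$payload"
fi
```

Setting `options` per request overrides the Modelfile defaults for that call only, which makes it convenient for trying out other parameter sets.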

## Usage (Ollama)

```bash
ollama create qwen3-g023-tiny-v2-FT-Q8_0 -f Modelfile
ollama run qwen3-g023-tiny-v2-FT-Q8_0

# thinking on
ollama run qwen3-g023-tiny-v2-FT-Q8_0 --think "Explain why the sky is blue"

# thinking off
ollama run qwen3-g023-tiny-v2-FT-Q8_0 --think=false "Explain why the sky is blue"
```

### Or pull from Hugging Face directly into Ollama:

```bash
ollama run hf.co/g023/qwen3-tiny-v2-finetuned:Q8_0
```

## Notes

- The template is Qwen3-compatible and includes think/no_think handling.
- If you want stricter non-thinking behavior, compare the alternative parameter sets recorded in `sweep_results.json`.
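
Besides the `--think` flag shown under Usage, upstream Qwen3 templates accept `/think` and `/no_think` soft switches appended to a prompt. The sketch below assumes this finetune's template honors them in the same way, which has not been confirmed here; `with_think_mode` and `RUN_OLLAMA` are hypothetical names used only for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helper: append a Qwen3-style soft switch to a prompt.
with_think_mode() {
  local mode="$1" prompt="$2"
  case "$mode" in
    on)  printf '%s /think\n' "$prompt" ;;
    off) printf '%s /no_think\n' "$prompt" ;;
    *)   printf '%s\n' "$prompt" ;;   # any other mode leaves the prompt unchanged
  esac
}

prompt=$(with_think_mode off "Explain why the sky is blue")
echo "$prompt"

# Run the model only when asked to (RUN_OLLAMA=1) and the ollama CLI is installed.
if [ "${RUN_OLLAMA:-0}" = "1" ] && command -v ollama >/dev/null 2>&1; then
  ollama run qwen3-g023-tiny-v2-FT-Q8_0 "$prompt"
fi
```

If the soft switch is ignored, the `--think=false` flag remains the documented way to turn thinking off.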