---
license: apache-2.0
language:
  - en
base_model: g023/qwen3-tiny-v2
tags:
  - qwen3
  - gguf
  - q8_0
  - finetuned
  - grpo
  - lora-merged
  - text-generation
pipeline_tag: text-generation
library_name: llama.cpp
quantized_by: g023
---

# Qwen3-g023-tiny-v2-FT-Q8_0 - GRPO Finetuned Q8_0 GGUF Export

https://huggingface.co/g023/qwen3-tiny-v2-finetuned/

Q8_0 GGUF export of a Qwen3 model finetuned with GRPO (Group Relative Policy Optimization) to improve reasoning and reduce repetition.
Original source model: https://huggingface.co/g023/qwen3-tiny-v2

*THIS IS A WIP (WORK IN PROGRESS)* 

## Files

- `Qwen3-g023-tiny-v2-FT-Q8_0.gguf`: Q8_0 GGUF model (~1.81 GB)
- `Modelfile`: Ollama template + tested default sampling settings
- `params_best.json`: Best-performing sampling parameters found by the automated sweep
- `sweep_results.json`: Full sweep results and per-test outcomes

## Tested Best Parameters (Default in Modelfile)

- `temperature`: 0.65
- `top_p`: 0.9
- `top_k`: 20
- `min_p`: 0.0
- `repeat_penalty`: 1.05
- `presence_penalty`: 0.1
- `frequency_penalty`: 0.1
- `num_ctx`: 40000
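The tested values map one-to-one onto Ollama `PARAMETER` directives. The script below is a minimal sketch, assuming you want a separate variant built on top of the Hugging Face pull (`BASE_MODEL`) instead of editing the shipped `Modelfile`; the file name `Modelfile.tuned` is hypothetical. Because building `FROM` an existing model inherits whatever template that model carries, only the parameters are written. If your Ollama version rejects `presence_penalty` or `frequency_penalty` in a Modelfile, drop those two lines.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Base model to extend. Assumption: the Hugging Face pull name used in this README;
# use qwen3-g023-tiny-v2-FT-Q8_0 instead if you built the model with `ollama create`.
BASE_MODEL="${BASE_MODEL:-hf.co/g023/qwen3-tiny-v2-finetuned:Q8_0}"

# Hypothetical output file; the repository's own Modelfile is left untouched.
OUT_FILE="${OUT_FILE:-Modelfile.tuned}"

# Write the tested sampling defaults as PARAMETER directives.
# No TEMPLATE line: it is inherited from the base model.
cat > "$OUT_FILE" <<EOF
FROM ${BASE_MODEL}
PARAMETER temperature 0.65
PARAMETER top_p 0.9
PARAMETER top_k 20
PARAMETER min_p 0.0
PARAMETER repeat_penalty 1.05
PARAMETER presence_penalty 0.1
PARAMETER frequency_penalty 0.1
PARAMETER num_ctx 40000
EOF

echo "Wrote ${OUT_FILE}"
```

After pulling the base model (`ollama pull hf.co/g023/qwen3-tiny-v2-finetuned:Q8_0`), the variant can be built with `ollama create qwen3-tiny-v2-ft-tuned -f Modelfile.tuned`, where `qwen3-tiny-v2-ft-tuned` is an illustrative name.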

## Usage (Ollama)

```bash
ollama create qwen3-g023-tiny-v2-FT-Q8_0 -f Modelfile
ollama run qwen3-g023-tiny-v2-FT-Q8_0

# thinking on
ollama run qwen3-g023-tiny-v2-FT-Q8_0 --think "Explain why the sky is blue"

# thinking off
ollama run qwen3-g023-tiny-v2-FT-Q8_0 --think=false "Explain why the sky is blue"
```
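The `--think` toggle also has a REST API counterpart. The sketch below assumes a local Ollama server on the default port 11434 and a version recent enough to accept the `think` field on `/api/generate`; the helper `build_generate_payload` is hypothetical. It sends the tested sampling values explicitly as `options`, and it does no JSON escaping, so prompts containing double quotes or backslashes are out of scope.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: prints a JSON body for Ollama's /api/generate endpoint.
#   $1 = prompt (must not contain double quotes or backslashes)
#   $2 = "true" to enable thinking, "false" to disable it (default: true)
build_generate_payload() {
  local prompt="$1"
  local think="${2:-true}"

  # Only accept literal JSON booleans for the think flag.
  case "$think" in
    true|false) ;;
    *) echo "think must be 'true' or 'false'" >&2; return 1 ;;
  esac

  cat <<EOF
{
  "model": "qwen3-g023-tiny-v2-FT-Q8_0",
  "prompt": "${prompt}",
  "think": ${think},
  "stream": false,
  "options": {
    "temperature": 0.65,
    "top_p": 0.9,
    "top_k": 20,
    "min_p": 0.0,
    "repeat_penalty": 1.05,
    "presence_penalty": 0.1,
    "frequency_penalty": 0.1,
    "num_ctx": 40000
  }
}
EOF
}

# Print a thinking-off request body for inspection.
build_generate_payload "Explain why the sky is blue" false
```

To send it: `curl -s http://localhost:11434/api/generate -d "$(build_generate_payload 'Explain why the sky is blue' false)"`.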

### Or pull directly from Hugging Face into Ollama:

```bash
ollama run hf.co/g023/qwen3-tiny-v2-finetuned:Q8_0
``` 

## Notes

- The template is Qwen3-compatible and handles both thinking and non-thinking (think/no_think) modes.
- If you want stricter non-thinking behavior, compare alternatives in `sweep_results.json`.