---
license: apache-2.0
language:
- en
base_model: g023/qwen3-tiny-v2
tags:
- qwen3
- gguf
- q8_0
- finetuned
- grpo
- lora-merged
- text-generation
pipeline_tag: text-generation
library_name: llama.cpp
quantized_by: g023
---

# Qwen3-g023-tiny-v2-FT-Q8_0 - GRPO Finetuned Q8_0 GGUF Export

https://huggingface.co/g023/qwen3-tiny-v2-finetuned/

Q8_0 GGUF export of a GRPO-finetuned Qwen3 model, tuned for improved reasoning and reduced repetition.

Original source model: https://huggingface.co/g023/qwen3-tiny-v2

*THIS IS A WIP (WORK IN PROGRESS)*

## Files

- `Qwen3-g023-tiny-v2-FT-Q8_0.gguf`: Q8_0 GGUF model (~1.81 GB)
- `Modelfile`: Ollama template + tested default sampling settings
- `params_best.json`: Best sampled parameters from automated sweep
- `sweep_results.json`: Full sweep results and per-test outcomes

## Tested Best Parameters (Default in Modelfile)

- `temperature`: 0.65
- `top_p`: 0.9
- `top_k`: 20
- `min_p`: 0.0
- `repeat_penalty`: 1.05
- `presence_penalty`: 0.1
- `frequency_penalty`: 0.1
- `num_ctx`: 40000

## Usage (Ollama)

```bash
# Build a local Ollama model from the Modelfile, then start an interactive session
ollama create qwen3-g023-tiny-v2-FT-Q8_0 -f Modelfile
ollama run qwen3-g023-tiny-v2-FT-Q8_0

# thinking on
ollama run qwen3-g023-tiny-v2-FT-Q8_0 --think "Explain why the sky is blue"

# thinking off
ollama run qwen3-g023-tiny-v2-FT-Q8_0 --think=false "Explain why the sky is blue"
```

### Or pull from Hugging Face directly into Ollama:

```bash
ollama run hf.co/g023/qwen3-tiny-v2-finetuned:Q8_0
```

## Notes

- The template is the Qwen3-compatible template with think/no_think handling.
- If you want stricter non-thinking behavior, compare the alternatives in `sweep_results.json`.
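
### Passing the tested parameters per request (sketch)

The tested parameters listed above are the Modelfile defaults, but they can also be sent explicitly with each request. The following is a minimal sketch, not part of this repository, that builds a JSON body for Ollama's `/api/generate` endpoint. It assumes a local Ollama server at `http://localhost:11434`, a model created under the name used in the Usage section, and an Ollama version that accepts every listed key under `options`. The `build_payload` helper and the `SEND_REQUEST` and `OLLAMA_URL` variables are hypothetical names introduced only for this example.

```bash
#!/usr/bin/env bash
# Sketch: send the card's tested sampling settings as per-request options.
# Assumptions (not from the model card): server URL and local model name.

MODEL="qwen3-g023-tiny-v2-FT-Q8_0"
OLLAMA_URL="${OLLAMA_URL:-http://localhost:11434}"

# Print a JSON body for /api/generate. The prompt is passed as $1 and must not
# contain double quotes or backslashes, because this sketch does no JSON escaping.
build_payload() {
  cat <<EOF
{
  "model": "${MODEL}",
  "prompt": "$1",
  "stream": false,
  "options": {
    "temperature": 0.65,
    "top_p": 0.9,
    "top_k": 20,
    "min_p": 0.0,
    "repeat_penalty": 1.05,
    "presence_penalty": 0.1,
    "frequency_penalty": 0.1,
    "num_ctx": 40000
  }
}
EOF
}

PAYLOAD="$(build_payload "Explain why the sky is blue")"
echo "$PAYLOAD"

# Contact the server only when explicitly requested, so the payload can be
# inspected without Ollama running.
if [ "${SEND_REQUEST:-0}" = "1" ]; then
  curl -s "${OLLAMA_URL}/api/generate" \
    -H "Content-Type: application/json" \
    -d "$PAYLOAD"
fi
```

Running the script as-is only prints the payload; setting `SEND_REQUEST=1` also posts it to the server. Values sent under `options` are expected to override the Modelfile defaults for that request only, which makes it easy to try alternatives from `sweep_results.json` without rebuilding the model.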