---
license: apache-2.0
language:
- en
base_model: Qwen/Qwen2.5-3B-Instruct
tags:
- gguf
- qwen2
- grpo
- tax
- finance
- fine-tuned
pipeline_tag: text-generation
---
# grpo-tax-qwen-3b-GGUF
> Built with [NEO — Your Autonomous AI Agent](https://heyneo.com)
GGUF quantized versions of **Qwen2.5-3B-Instruct** fine-tuned with **GRPO (Group Relative Policy Optimization)** on tax and financial reasoning tasks.
## Model Details
| Property | Value |
|----------|-------|
| Base Model | Qwen/Qwen2.5-3B-Instruct |
| Fine-tuning Method | GRPO (Group Relative Policy Optimization) |
| Domain | Tax & Financial Reasoning |
| Architecture | Qwen2 |
| Context Length | 32,768 tokens |
| Format | GGUF |
## Available Quantizations
| File | Quantization | Size | Use Case |
|------|-------------|------|----------|
| `grpo-tax-qwen-3b-Q4_K_M.gguf` | Q4_K_M | ~2.0 GB | Best balance of speed and quality |
| `grpo-tax-qwen-3b-Q8_0.gguf` | Q8_0 | ~3.2 GB | Higher quality, more RAM required |
## Usage
### With llama.cpp
```bash
# Download the model
huggingface-cli download daksh-neo/grpo-tax-qwen-3b-gguf grpo-tax-qwen-3b-Q4_K_M.gguf
# Run inference
./llama-cli -m grpo-tax-qwen-3b-Q4_K_M.gguf \
  -p "<|im_start|>system\nYou are a tax expert assistant.<|im_end|>\n<|im_start|>user\nWhat is the standard deduction for 2024?<|im_end|>\n<|im_start|>assistant\n" \
  -n 512 --temp 0.7
```
### With Ollama
```bash
# Create a Modelfile
cat > Modelfile << 'EOF'
FROM ./grpo-tax-qwen-3b-Q4_K_M.gguf
TEMPLATE """<|im_start|>system
{{ .System }}<|im_end|>
<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""
SYSTEM "You are a helpful tax and financial assistant."
EOF
ollama create grpo-tax-qwen-3b -f Modelfile
ollama run grpo-tax-qwen-3b
```
### With Python (llama-cpp-python)
```python
from llama_cpp import Llama

# Download the GGUF file from the Hub (cached locally) and load it
llm = Llama.from_pretrained(
    repo_id="daksh-neo/grpo-tax-qwen-3b-gguf",
    filename="grpo-tax-qwen-3b-Q4_K_M.gguf",
    n_ctx=4096,  # context window; the model supports up to 32,768 tokens
)

response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful tax assistant."},
        {"role": "user", "content": "Explain what a W-2 form is."},
    ]
)
print(response["choices"][0]["message"]["content"])
```
## Training Details
This model was fine-tuned using GRPO (Group Relative Policy Optimization), a reinforcement-learning fine-tuning method that optimizes the model's responses on tax and financial reasoning tasks without requiring a separate value (critic) model. Instead of learning a value function, GRPO samples a group of responses per prompt, scores them, and uses the group's own reward statistics as the baseline, reinforcing answers that score above their group's average.
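The group-relative baseline can be sketched in a few lines. This is an illustrative snippet, not the actual training code: `grpo_advantages` is a hypothetical helper name, and a real GRPO implementation would feed these advantages into a clipped, PPO-style policy-gradient loss over the sampled tokens.

```python
import statistics

def grpo_advantages(rewards):
    """Group-relative advantages (illustrative).

    Each sampled response's reward is normalized against the mean and
    population std of its own group, so no learned value model is needed.
    """
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against zero variance
    return [(r - mean) / std for r in rewards]

# Example: reward scores for a group of 4 sampled answers to one prompt.
# Above-average answers get positive advantages, below-average negative.
advantages = grpo_advantages([0.2, 0.8, 0.5, 0.5])
```

Because the baseline is the group mean, advantages within a group always sum to zero: reinforcing the better answers implicitly penalizes the worse ones.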
**Training focus areas:**
- Federal and state tax regulations
- Tax form interpretation (W-2, 1099, Schedule C, etc.)
- Deductions and credits
- Tax planning strategies
- Financial compliance questions
## Limitations
- This model is fine-tuned on tax knowledge up to its training cutoff and may not reflect the latest tax law changes.
- Always consult a qualified tax professional for official tax advice.
- The model is not a substitute for professional legal or financial guidance.
## Related Models
- [daksh-neo/grpo-tax-qwen-1.5b-gguf](https://huggingface.co/daksh-neo/grpo-tax-qwen-1.5b-gguf) — Smaller 1.5B version for resource-constrained environments
## License
Apache 2.0 — see [Qwen2.5 license](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct/blob/main/LICENSE) for base model terms.
---
<div align="center">
Built with <a href="https://heyneo.com">NEO</a> — Your Autonomous AI Agent
</div>