---
license: apache-2.0
language:
- en
base_model: Qwen/Qwen2.5-3B-Instruct
tags:
- gguf
- qwen2
- grpo
- tax
- finance
- fine-tuned
pipeline_tag: text-generation
---

# grpo-tax-qwen-3b-GGUF

> Built with [NEO — Your Autonomous AI Agent](https://heyneo.com)

GGUF quantized versions of **Qwen2.5-3B-Instruct** fine-tuned with **GRPO (Group Relative Policy Optimization)** on tax and financial reasoning tasks.

## Model Details

| Property | Value |
|----------|-------|
| Base Model | Qwen/Qwen2.5-3B-Instruct |
| Fine-tuning Method | GRPO (Group Relative Policy Optimization) |
| Domain | Tax & Financial Reasoning |
| Architecture | Qwen2 |
| Context Length | 32,768 tokens |
| Format | GGUF |

## Available Quantizations

| File | Quantization | Size | Use Case |
|------|--------------|------|----------|
| `grpo-tax-qwen-3b-Q4_K_M.gguf` | Q4_K_M | ~2.0 GB | Best balance of speed and quality |
| `grpo-tax-qwen-3b-Q8_0.gguf` | Q8_0 | ~3.2 GB | Higher quality, more RAM required |

## Usage

### With llama.cpp

```bash
# Download the model
huggingface-cli download daksh-neo/grpo-tax-qwen-3b-gguf grpo-tax-qwen-3b-Q4_K_M.gguf

# Run inference
./llama-cli -m grpo-tax-qwen-3b-Q4_K_M.gguf \
  -p "<|im_start|>system\nYou are a tax expert assistant.<|im_end|>\n<|im_start|>user\nWhat is the standard deduction for 2024?<|im_end|>\n<|im_start|>assistant\n" \
  -n 512 --temp 0.7
```
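
The prompt string passed to `-p` follows Qwen's ChatML template. Rather than hand-writing it, you can render it from a message list; a minimal sketch (the helper name is mine):

```python
def to_chatml(messages):
    """Render a list of {role, content} dicts into a ChatML prompt,
    leaving the assistant turn open so the model completes it."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
             for m in messages]
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)
```

Calling it with the system and user messages from the example above reproduces the exact `-p` string.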

### With Ollama

```bash
# Create a Modelfile
cat > Modelfile << 'EOF'
FROM ./grpo-tax-qwen-3b-Q4_K_M.gguf
TEMPLATE """<|im_start|>system
{{ .System }}<|im_end|>
<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""
SYSTEM "You are a helpful tax and financial assistant."
EOF

ollama create grpo-tax-qwen-3b -f Modelfile
ollama run grpo-tax-qwen-3b
```

### With Python (llama-cpp-python)

```python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="daksh-neo/grpo-tax-qwen-3b-gguf",
    filename="grpo-tax-qwen-3b-Q4_K_M.gguf",
    n_ctx=4096,
)

response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful tax assistant."},
        {"role": "user", "content": "Explain what a W-2 form is."},
    ]
)
print(response["choices"][0]["message"]["content"])
```

## Training Details

This model was fine-tuned using GRPO (Group Relative Policy Optimization), a reinforcement-learning variant of PPO that optimizes the model's responses on tax and financial reasoning tasks without requiring a separate value (critic) model. GRPO samples a group of responses per prompt, scores them, and reinforces answers that are better than the group average.

**Training focus areas:**

- Federal and state tax regulations
- Tax form interpretation (W-2, 1099, Schedule C, etc.)
- Deductions and credits
- Tax planning strategies
- Financial compliance questions
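
The "relative to the group average" step is the heart of GRPO: rewards are normalized within each sampled group to produce per-response advantages. An illustrative sketch of that normalization (not the actual training code for this model; reward values here are made up):

```python
def group_advantages(rewards):
    """GRPO-style advantage: (reward - group mean) / group std.

    Each response in a sampled group gets a positive advantage if it
    scored above the group average and a negative one otherwise.
    """
    n = len(rewards)
    mean = sum(rewards) / n
    std = (sum((r - mean) ** 2 for r in rewards) / n) ** 0.5
    return [(r - mean) / (std + 1e-8) for r in rewards]  # epsilon avoids /0
```

For a group scored `[1.0, 0.0, 1.0, 0.0]`, the above-average responses get advantage ≈ +1 and the below-average ones ≈ -1, and the advantages sum to zero.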

## Limitations

- This model is fine-tuned on tax knowledge up to its training cutoff and may not reflect the latest tax law changes.
- Always consult a qualified tax professional for official tax advice.
- The model is not a substitute for professional legal or financial guidance.

## Related Models

- [daksh-neo/grpo-tax-qwen-1.5b-gguf](https://huggingface.co/daksh-neo/grpo-tax-qwen-1.5b-gguf) — Smaller 1.5B version for resource-constrained environments

## License

Apache 2.0 — see [Qwen2.5 license](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct/blob/main/LICENSE) for base model terms.

---

<div align="center">

Built with <a href="https://heyneo.com">NEO</a> — Your Autonomous AI Agent

</div>