---
license: apache-2.0
language:
- en
base_model: Qwen/Qwen2.5-3B-Instruct
tags:
- gguf
- qwen2
- grpo
- tax
- finance
- fine-tuned
pipeline_tag: text-generation
---

# grpo-tax-qwen-3b-GGUF

> Built with [NEO — Your Autonomous AI Agent](https://heyneo.com)

GGUF quantized versions of **Qwen2.5-3B-Instruct** fine-tuned with **GRPO (Group Relative Policy Optimization)** on tax and financial reasoning tasks.

## Model Details

| Property | Value |
|----------|-------|
| Base Model | Qwen/Qwen2.5-3B-Instruct |
| Fine-tuning Method | GRPO (Group Relative Policy Optimization) |
| Domain | Tax & Financial Reasoning |
| Architecture | Qwen2 |
| Context Length | 32,768 tokens |
| Format | GGUF |

## Available Quantizations

| File | Quantization | Size | Use Case |
|------|-------------|------|----------|
| `grpo-tax-qwen-3b-Q4_K_M.gguf` | Q4_K_M | ~2.0 GB | Best balance of speed and quality |
| `grpo-tax-qwen-3b-Q8_0.gguf` | Q8_0 | ~3.2 GB | Higher quality, more RAM required |

## Usage

### With llama.cpp

```bash
# Download the model
huggingface-cli download daksh-neo/grpo-tax-qwen-3b-gguf grpo-tax-qwen-3b-Q4_K_M.gguf

# Run inference
./llama-cli -m grpo-tax-qwen-3b-Q4_K_M.gguf \
  -p "<|im_start|>system\nYou are a tax expert assistant.<|im_end|>\n<|im_start|>user\nWhat is the standard deduction for 2024?<|im_end|>\n<|im_start|>assistant\n" \
  -n 512 --temp 0.7
```

### With Ollama

```bash
# Create a Modelfile
cat > Modelfile << 'EOF'
FROM ./grpo-tax-qwen-3b-Q4_K_M.gguf
TEMPLATE """<|im_start|>system
{{ .System }}<|im_end|>
<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""
SYSTEM "You are a helpful tax and financial assistant."
EOF

ollama create grpo-tax-qwen-3b -f Modelfile
ollama run grpo-tax-qwen-3b
```

### With Python (llama-cpp-python)

```python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="daksh-neo/grpo-tax-qwen-3b-gguf",
    filename="grpo-tax-qwen-3b-Q4_K_M.gguf",
    n_ctx=4096,
)

response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful tax assistant."},
        {"role": "user", "content": "Explain what a W-2 form is."},
    ]
)
print(response["choices"][0]["message"]["content"])
```

## Training Details

This model was fine-tuned with GRPO (Group Relative Policy Optimization), a reinforcement learning method that samples a group of responses for each prompt, scores them, and reinforces the responses that score above the group average. Because the group's mean reward serves as the baseline, GRPO does not require a separate value (critic) model.

**Training focus areas:**

- Federal and state tax regulations
- Tax form interpretation (W-2, 1099, Schedule C, etc.)
- Deductions and credits
- Tax planning strategies
- Financial compliance questions

## Limitations

- The model's tax knowledge ends at its training cutoff and may not reflect the latest tax law changes.
- Always consult a qualified tax professional for official tax advice.
- The model is not a substitute for professional legal or financial guidance.

## Related Models

- [daksh-neo/grpo-tax-qwen-1.5b-gguf](https://huggingface.co/daksh-neo/grpo-tax-qwen-1.5b-gguf) — Smaller 1.5B version for resource-constrained environments

## License

Apache 2.0 — see the [Qwen2.5 license](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct/blob/main/LICENSE) for base model terms.
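
## Appendix: How GRPO Scores a Group

For readers curious about the group-relative scoring mentioned in Training Details, the core idea can be sketched in a few lines. This is a minimal illustration, not the actual training code: the function name and the example rewards are invented for this sketch, and a real run would feed these advantages into a clipped policy-gradient update over the sampled tokens.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards):
    """GRPO-style advantages: score each sampled response relative to
    its group's mean reward, normalized by the group's standard
    deviation. The group mean acts as the baseline, so no learned
    value (critic) model is needed."""
    mu = mean(rewards)
    sigma = pstdev(rewards) or 1.0  # guard against an all-equal group
    return [(r - mu) / sigma for r in rewards]

# Four sampled answers to one tax question, scored by some checker
# (rewards here are made up for illustration)
advs = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
# The best answer gets a positive advantage, the worst a negative one,
# and average answers get zero, so advantages sum to zero per group.
```

Responses with positive advantage are made more likely, those with negative advantage less likely, which is why no response quality is defined in absolute terms — only relative to its group.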