ModelHub XC c743d1a865 initial commit; model provided by the ModelHub XC community
Model: daksh-neo/grpo-tax-qwen-1.5b-gguf
Source: Original Platform
2026-04-19 15:48:29 +08:00

license: apache-2.0
language: en
base_model: Qwen/Qwen2.5-1.5B-Instruct
tags: gguf, qwen2, grpo, tax, finance, fine-tuned
pipeline_tag: text-generation

grpo-tax-qwen-1.5b-GGUF

Built with NEO — Your Autonomous AI Agent

GGUF quantized versions of Qwen2.5-1.5B-Instruct fine-tuned with GRPO (Group Relative Policy Optimization) on tax and financial reasoning tasks.

Model Details

Property Value
Base Model Qwen/Qwen2.5-1.5B-Instruct
Fine-tuning Method GRPO (Group Relative Policy Optimization)
Domain Tax & Financial Reasoning
Architecture Qwen2
Context Length 32,768 tokens
Format GGUF

Available Quantizations

File Quantization Size Use Case
grpo-tax-qwen-1.5b-Q4_K_M.gguf Q4_K_M ~1.0 GB Best balance of speed and quality
grpo-tax-qwen-1.5b-Q8_0.gguf Q8_0 ~1.6 GB Higher quality, more RAM required
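The sizes above follow a simple rule of thumb: file size is roughly parameters × bits-per-weight / 8, plus a small overhead for metadata and non-quantized tensors. A quick sketch (the bits-per-weight figures are approximations I am assuming, not official values):

```python
# Rough GGUF size estimate: parameters * bits-per-weight / 8 bytes.
# The bits-per-weight values below are approximations, not official figures.
BPW = {"Q4_K_M": 4.85, "Q8_0": 8.5}

def approx_size_gb(n_params: float, quant: str) -> float:
    """Approximate quantized file size in decimal GB."""
    return n_params * BPW[quant] / 8 / 1e9

n_params = 1.54e9  # Qwen2.5-1.5B has roughly 1.54B parameters
for quant in BPW:
    print(f"{quant}: ~{approx_size_gb(n_params, quant):.2f} GB")
```

This lands close to the ~1.0 GB and ~1.6 GB figures in the table; actual files are slightly larger because some tensors (e.g. embeddings) may be kept at higher precision.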

Usage

With llama.cpp

# Download the model
huggingface-cli download daksh-neo/grpo-tax-qwen-1.5b-gguf grpo-tax-qwen-1.5b-Q4_K_M.gguf

# Run inference
# -e makes llama-cli interpret the \n escapes in the prompt string
./llama-cli -m grpo-tax-qwen-1.5b-Q4_K_M.gguf \
  -e -p "<|im_start|>system\nYou are a tax expert assistant.<|im_end|>\n<|im_start|>user\nWhat is the standard deduction for 2024?<|im_end|>\n<|im_start|>assistant\n" \
  -n 512 --temp 0.7
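The prompt string above follows Qwen2's ChatML template: each message is wrapped in `<|im_start|>role ... <|im_end|>` markers, and the prompt ends with an open assistant turn for the model to complete. A small illustrative helper (not part of any library; llama-cpp-python and Ollama apply this template for you in their chat APIs) that assembles such a prompt from a message list:

```python
# Build a ChatML-format prompt as used by Qwen2 models (illustrative helper).
def build_chatml_prompt(messages):
    parts = []
    for msg in messages:
        parts.append(f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>\n")
    parts.append("<|im_start|>assistant\n")  # left open for the model to complete
    return "".join(parts)

prompt = build_chatml_prompt([
    {"role": "system", "content": "You are a tax expert assistant."},
    {"role": "user", "content": "What is the standard deduction for 2024?"},
])
print(prompt)
```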

With Ollama

# Create a Modelfile
cat > Modelfile << 'EOF'
FROM ./grpo-tax-qwen-1.5b-Q4_K_M.gguf
TEMPLATE """<|im_start|>system
{{ .System }}<|im_end|>
<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""
SYSTEM "You are a helpful tax and financial assistant."
EOF

ollama create grpo-tax-qwen-1.5b -f Modelfile
ollama run grpo-tax-qwen-1.5b

With Python (llama-cpp-python)

from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="daksh-neo/grpo-tax-qwen-1.5b-gguf",
    filename="grpo-tax-qwen-1.5b-Q4_K_M.gguf",
    n_ctx=4096,
)

response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful tax assistant."},
        {"role": "user", "content": "Explain what a W-2 form is."}
    ]
)
print(response["choices"][0]["message"]["content"])

Training Details

This model was fine-tuned using GRPO (Group Relative Policy Optimization), a reinforcement learning method that optimizes the model's responses on tax and financial reasoning tasks without requiring a separate value (critic) model. For each prompt, GRPO samples a group of responses, scores them, and uses each response's reward relative to the group average as its advantage, reinforcing above-average answers.
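The group-relative idea can be sketched in a few lines. The reward values below are made up for illustration, and a real GRPO step also includes a clipped policy-gradient loss and a KL penalty, omitted here for brevity:

```python
# Sketch of GRPO's group-relative advantage (simplified, illustrative).
# For each prompt, several responses are sampled and scored; each response's
# advantage is its reward normalized against the group's mean and std.
from statistics import mean, pstdev

def group_advantages(rewards, eps=1e-8):
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Hypothetical reward scores for 4 sampled answers to one tax question:
rewards = [0.2, 0.9, 0.5, 0.4]
advantages = group_advantages(rewards)
# Above-average answers get positive advantages and are reinforced;
# below-average answers get negative advantages and are discouraged.
```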

Training focus areas:

  • Federal and state tax regulations
  • Tax form interpretation (W-2, 1099, Schedule C, etc.)
  • Deductions and credits
  • Tax planning strategies
  • Financial compliance questions

Limitations

  • This model is fine-tuned on tax knowledge up to its training cutoff and may not reflect the latest tax law changes.
  • Always consult a qualified tax professional for official tax advice.
  • The model is not a substitute for professional legal or financial guidance.

License

Apache 2.0 — see Qwen2.5 license for base model terms.

