Initialize project; model provided by the ModelHub XC community

Model: daksh-neo/grpo-tax-qwen-1.5b-gguf
Source: Original Platform
ModelHub XC
2026-04-19 15:48:29 +08:00
commit c743d1a865
4 changed files with 165 additions and 0 deletions

.gitattributes vendored Normal file

@@ -0,0 +1,37 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
grpo-tax-qwen-1.5b-Q4_K_M.gguf filter=lfs diff=lfs merge=lfs -text
grpo-tax-qwen-1.5b-Q8_0.gguf filter=lfs diff=lfs merge=lfs -text

README.md Normal file

@@ -0,0 +1,122 @@
---
license: apache-2.0
language:
- en
base_model: Qwen/Qwen2.5-1.5B-Instruct
tags:
- gguf
- qwen2
- grpo
- tax
- finance
- fine-tuned
pipeline_tag: text-generation
---
# grpo-tax-qwen-1.5b-GGUF

> Built with [NEO — Your Autonomous AI Agent](https://heyneo.com)

GGUF quantized versions of **Qwen2.5-1.5B-Instruct** fine-tuned with **GRPO (Group Relative Policy Optimization)** on tax and financial reasoning tasks.
## Model Details
| Property | Value |
|----------|-------|
| Base Model | Qwen/Qwen2.5-1.5B-Instruct |
| Fine-tuning Method | GRPO (Group Relative Policy Optimization) |
| Domain | Tax & Financial Reasoning |
| Architecture | Qwen2 |
| Context Length | 32,768 tokens |
| Format | GGUF |
## Available Quantizations
| File | Quantization | Size | Use Case |
|------|-------------|------|----------|
| `grpo-tax-qwen-1.5b-Q4_K_M.gguf` | Q4_K_M | ~1.0 GB | Best balance of speed and quality |
| `grpo-tax-qwen-1.5b-Q8_0.gguf` | Q8_0 | ~1.6 GB | Higher quality, more RAM required |
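
To sanity-check the RAM figures above, you can add a rough KV-cache estimate to the file size. The architecture numbers below (28 layers, 2 KV heads, head dim 128) are assumptions for Qwen2.5-1.5B, not read from this repo — verify them against the GGUF metadata:

```python
# Rough RAM estimate: GGUF file size + f16 KV cache.
# Layer/head counts are ASSUMED Qwen2.5-1.5B values; check your GGUF metadata.
def kv_cache_bytes(n_ctx, n_layers=28, n_kv_heads=2, head_dim=128, bytes_per_elt=2):
    # 2x for the separate K and V tensors, f16 = 2 bytes per element
    return n_ctx * n_layers * 2 * n_kv_heads * head_dim * bytes_per_elt

q4_file_bytes = 986_048_096  # Q4_K_M file size from this commit's LFS pointer
total = q4_file_bytes + kv_cache_bytes(4096)
print(f"Q4_K_M at 4k context: ~{total / 2**30:.2f} GiB")
```

At the full 32k context the KV cache alone grows to roughly 0.9 GiB, which is why shorter `n_ctx` values are used in the examples below.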
## Usage
### With llama.cpp
```bash
# Download the model into the current directory
huggingface-cli download daksh-neo/grpo-tax-qwen-1.5b-gguf grpo-tax-qwen-1.5b-Q4_K_M.gguf --local-dir .
# Run inference
./llama-cli -m grpo-tax-qwen-1.5b-Q4_K_M.gguf -e \
-p "<|im_start|>system\nYou are a tax expert assistant.<|im_end|>\n<|im_start|>user\nWhat is the standard deduction for 2024?<|im_end|>\n<|im_start|>assistant\n" \
-n 512 --temp 0.7
```
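
The `-p` string above is the ChatML format Qwen models expect. If you build prompts programmatically, a minimal helper (a hypothetical sketch, not part of this repo) looks like:

```python
def to_chatml(messages):
    """Render a list of {"role", "content"} dicts into a ChatML prompt string."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    parts.append("<|im_start|>assistant\n")  # leave the assistant turn open for generation
    return "".join(parts)

prompt = to_chatml([
    {"role": "system", "content": "You are a tax expert assistant."},
    {"role": "user", "content": "What is the standard deduction for 2024?"},
])
```

Note that the chat-based APIs below (Ollama with a `TEMPLATE`, `create_chat_completion`) apply this template for you; hand-building it is only needed for raw completion endpoints like `llama-cli -p`.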
### With Ollama
```bash
# Create a Modelfile
cat > Modelfile << 'EOF'
FROM ./grpo-tax-qwen-1.5b-Q4_K_M.gguf
TEMPLATE """<|im_start|>system
{{ .System }}<|im_end|>
<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""
SYSTEM "You are a helpful tax and financial assistant."
EOF
ollama create grpo-tax-qwen-1.5b -f Modelfile
ollama run grpo-tax-qwen-1.5b
```
### With Python (llama-cpp-python)
```python
from llama_cpp import Llama
llm = Llama.from_pretrained(
repo_id="daksh-neo/grpo-tax-qwen-1.5b-gguf",
filename="grpo-tax-qwen-1.5b-Q4_K_M.gguf",
n_ctx=4096,
)
response = llm.create_chat_completion(
messages=[
{"role": "system", "content": "You are a helpful tax assistant."},
{"role": "user", "content": "Explain what a W-2 form is."}
]
)
print(response["choices"][0]["message"]["content"])
```
## Training Details
This model was fine-tuned using GRPO (Group Relative Policy Optimization), a reinforcement learning from human feedback (RLHF) variant that optimizes the model's responses on tax and financial reasoning tasks without requiring a separate reward model. GRPO trains by comparing groups of sampled responses and reinforcing higher-quality answers.
**Training focus areas:**
- Federal and state tax regulations
- Tax form interpretation (W-2, 1099, Schedule C, etc.)
- Deductions and credits
- Tax planning strategies
- Financial compliance questions
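
The group-relative step described above can be sketched in a few lines: each sampled response in a group gets an advantage equal to its reward's z-score within that group, so above-average answers are reinforced and below-average ones suppressed. This is an illustrative sketch of the scoring idea only, not the training code used for this model:

```python
import statistics

def group_relative_advantages(rewards):
    """Advantage of each sample = (reward - group mean) / group std."""
    mu = statistics.mean(rewards)
    sigma = statistics.pstdev(rewards) or 1.0  # avoid div-by-zero when all rewards tie
    return [(r - mu) / sigma for r in rewards]

# Four sampled answers to one prompt, scored by a rule-based reward:
adv = group_relative_advantages([0.2, 0.9, 0.4, 0.9])
```

Because advantages are computed within each sampled group, no separate learned reward/value model is needed, which is the property the paragraph above refers to.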
## Limitations
- This model is fine-tuned on tax knowledge up to its training cutoff and may not reflect the latest tax law changes.
- Always consult a qualified tax professional for official tax advice.
- The model is not a substitute for professional legal or financial guidance.
## Related Models
- [daksh-neo/grpo-tax-qwen-3b-gguf](https://huggingface.co/daksh-neo/grpo-tax-qwen-3b-gguf) — Larger 3B version with higher accuracy
## License
Apache 2.0 — see [Qwen2.5 license](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct/blob/main/LICENSE) for base model terms.
---
<div align="center">
Built with <a href="https://heyneo.com">NEO</a> — Your Autonomous AI Agent
</div>

grpo-tax-qwen-1.5b-Q4_K_M.gguf Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:8ee32daa14af7c1911237ec375d9febe1d0086e216263804e9859639b06f7e73
size 986048096

grpo-tax-qwen-1.5b-Q8_0.gguf Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:9b900e234dffad2155c8a8226d9fe26579d2a7ed5d7c7a1350e87be91998175d
size 1646572640