---
license: apache-2.0
base_model: Qwen/Qwen2.5-3B-Instruct
tags:
- reasoning
- chain-of-thought
- thinking
- qwen2.5
- merged-model
- retrace
- openo1
datasets:
- nnsohamnn/ReTrace501-v1
- O1-OPEN/OpenO1-SFT
language:
- en
pipeline_tag: text-generation
---

# 🧠 Qwen2.5-3B-Instruct ReTrace-OpenO1 Merged

<div align="center">

[Merged Model](https://huggingface.co/nnsohamnn/Qwen2.5-3B-ReTrace-OpenO1-Merged) • [LoRA Adapters](https://huggingface.co/nnsohamnn/Qwen2.5-3B-ReTrace-OpenO1-5k-QLoRA) • [Base Model](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct) • [License](LICENSE)

**A reasoning-focused model fine-tuned on 5,000 chain-of-thought examples**

[🚀 Try Demo](https://huggingface.co/spaces/nnsohamnn/Qwen-2.5-3b-Think-QLora) • [📊 Dataset ReTrace](https://huggingface.co/datasets/nnsohamnn/ReTrace501-v1) • [📊 Dataset OpenO1](https://huggingface.co/datasets/O1-OPEN/OpenO1-SFT)

</div>

---

## 📋 Model Description

This is a **fully merged model**: Qwen2.5-3B-Instruct fine-tuned with LoRA on 5,000 reasoning samples (500 from ReTrace501-v1 + 4,500 from OpenO1-SFT), with the adapter weights merged back into the base model. The model generates structured reasoning, placing its step-by-step thinking inside explicit `<Thought>` tags and its final answer inside `<Output>` tags.

### 🎯 Key Features

- ✅ **Fully Merged**: Ready to use; no separate adapter loading needed
- ✅ **Structured Reasoning**: Reasoning in `<Thought>` tags, final answer in `<Output>` tags
- ✅ **5K Training Samples**: 500 ReTrace + 4,500 OpenO1-SFT examples
- ✅ **Multi-Domain**: Math, logic, word problems, and general reasoning
- ✅ **FP16 Weights**: About 6 GB model size

---

## 📊 Training Loss

### 📈 Training Statistics

| Metric | Value |
|--------|-------|
| **Initial Loss** | 1.3374 |
| **Final Loss** | 0.6798 |
| **Best Loss** | 0.6662 (Step 240) |
| **Improvement** | 49.2% ↓ (initial → final) |
| **Total Steps** | 310 |

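The **Improvement** row is the relative drop from the initial to the final logged loss. This short check reproduces that arithmetic using only the values in the table; measured against the best loss (step 240) instead, the drop is about 50.2%.

```python
# Values copied from the Training Statistics table.
initial_loss = 1.3374
final_loss = 0.6798
best_loss = 0.6662


def relative_improvement(start: float, end: float) -> float:
    """Percentage decrease from `start` to `end`."""
    return (start - end) / start * 100


print(f"final vs. initial: {relative_improvement(initial_loss, final_loss):.1f}% lower")
print(f"best vs. initial:  {relative_improvement(initial_loss, best_loss):.1f}% lower")
```
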
---

## ⚙️ Training Configuration

```python
# Model
BASE_MODEL = "Qwen/Qwen2.5-3B-Instruct"
MAX_SEQ_LENGTH = 4096

# LoRA
LORA_R = 64
LORA_ALPHA = 128
LORA_DROPOUT = 0.05

# Training
BATCH_SIZE = 8
GRADIENT_ACCUMULATION = 4
LEARNING_RATE = 2e-4
NUM_EPOCHS = 2
WARMUP_STEPS = 50

# Datasets (number of samples taken from each)
DATASETS = {
    "nnsohamnn/ReTrace501-v1": 500,
    "O1-OPEN/OpenO1-SFT": 4500,
}
```

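With gradient accumulation, each optimizer step sees `BATCH_SIZE × GRADIENT_ACCUMULATION` examples. The sketch below estimates the effective batch size, the number of optimizer steps, and the LoRA scaling factor from the configuration values. It assumes a single GPU and that no samples were dropped for exceeding `MAX_SEQ_LENGTH`; neither is stated in this card. Under those assumptions it gives 312 steps, close to the 310 listed under Training Statistics; the exact count depends on how the trainer handles the last partial accumulation window and on any length filtering.

```python
import math

# Values from the training configuration.
BATCH_SIZE = 8
GRADIENT_ACCUMULATION = 4
NUM_EPOCHS = 2
LORA_R = 64
LORA_ALPHA = 128
NUM_SAMPLES = 500 + 4_500  # ReTrace501-v1 + OpenO1-SFT

# Assumption: single device, so BATCH_SIZE is the whole micro-batch.
effective_batch = BATCH_SIZE * GRADIENT_ACCUMULATION

# Micro-batches per epoch (the last one may be partial).
micro_batches_per_epoch = math.ceil(NUM_SAMPLES / BATCH_SIZE)
# Assumption: leftover micro-batches that don't fill an accumulation window add no step.
steps_per_epoch = micro_batches_per_epoch // GRADIENT_ACCUMULATION
total_steps = steps_per_epoch * NUM_EPOCHS

# Standard (non-rsLoRA) LoRA multiplies the low-rank update by alpha / r.
lora_scaling = LORA_ALPHA / LORA_R

print(f"effective batch size: {effective_batch}")
print(f"optimizer steps: {steps_per_epoch}/epoch, {total_steps} total")
print(f"LoRA scaling (alpha / r): {lora_scaling}")
```
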
---

## 🚀 Usage

### Installation

```bash
pip install torch transformers accelerate
```

### Quick Inference

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# =========================
# Load model and tokenizer
# =========================
model_name = "nnsohamnn/Qwen2.5-3B-ReTrace-OpenO1-Merged"

tokenizer = AutoTokenizer.from_pretrained(
    model_name,
    trust_remote_code=True
)

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,  # weights are stored in FP16
    device_map="auto",          # place layers on available GPU(s)/CPU (requires accelerate)
    trust_remote_code=True
)

# =========================
# LLM question function
# =========================
def ask_llm(question: str) -> str:
    messages = [
        {
            "role": "system",
            "content": (
                "You are a helpful AI assistant. When solving problems, show your detailed reasoning process inside <Thought> tags, then provide your final answer inside <Output> tags and explain the final answer from reasoning in short. Break down complex problems step-by-step."
            )
        },
        {
            "role": "user",
            "content": question
        }
    ]

    # Render the conversation with the chat template and open the assistant turn.
    prompt = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True
    )

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    outputs = model.generate(
        **inputs,
        max_new_tokens=1024,
        do_sample=True,  # required for temperature/top_p to take effect
        temperature=0.7,
        top_p=0.9
    )

    # Decode only the newly generated tokens, skipping the prompt.
    prompt_len = inputs["input_ids"].shape[1]
    response = tokenizer.decode(
        outputs[0][prompt_len:],
        skip_special_tokens=True
    )

    return response


# =========================
# Change ONLY this block
# =========================
question = """
A machine produces items where 4% of the output is defective. A quality control test correctly identifies a defective item with probability 0.95 and incorrectly labels a good item as defective with probability 0.03. If an item is selected at random and the test reports it as defective, determine the probability that the item is actually defective.
"""

print(ask_llm(question))
```

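Because the model wraps its reasoning in `<Thought>` tags and its answer in `<Output>` tags, a response returned by `ask_llm` can be split into those two parts. This is a minimal regex-based sketch: `split_response` is a hypothetical helper (not part of `transformers` or this repo), it assumes each tag pair appears at most once, and it returns `None` for a missing part, for example when generation stops at `max_new_tokens` before `</Output>`. In practice you would call `split_response(ask_llm(question))`.

```python
import re
from typing import Optional, Tuple

# Non-greedy match between opening and closing tags; DOTALL lets "." span newlines.
_THOUGHT_RE = re.compile(r"<Thought>(.*?)</Thought>", re.DOTALL)
_OUTPUT_RE = re.compile(r"<Output>(.*?)</Output>", re.DOTALL)


def split_response(response: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (thought, output) with surrounding whitespace stripped; None if a tag pair is absent."""
    thought = _THOUGHT_RE.search(response)
    output = _OUTPUT_RE.search(response)
    return (
        thought.group(1).strip() if thought else None,
        output.group(1).strip() if output else None,
    )


# Usage with a short, made-up response string:
sample = "<Thought>\n2 + 2 = 4\n</Thought>\n<Output>\n4\n</Output>"
thought, answer = split_response(sample)
print(answer)  # 4
```
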
### Example Output

Sampled responses vary between runs; the following is one example.

**Question:** A machine produces items where 4% of the output is defective. A quality control test correctly identifies a defective item with probability 0.95 and incorrectly labels a good item as defective with probability 0.03. If an item is selected at random and the test reports it as defective, determine the probability that the item is actually defective.

**Response:**

```text
<Thought>
Let's define the events:
- \( D \): The event that the item is defective.
- \( D^c \): The event that the item is not defective.
- \( T \): The event that the test reports the item as defective.

Given probabilities:
- \( P(D) = 0.04 \) (4% defective)
- \( P(T|D) = 0.95 \) (Test correctly identifies defective items)
- \( P(T|D^c) = 0.03 \) (Test incorrectly labels good items as defective)

We need to find \( P(D|T) \), the probability that the item is defective given that the test reports it as defective.

Using Bayes' theorem:
\[
P(D|T) = \frac{P(T|D)P(D)}{P(T)}
\]

First, we need to find \( P(T) \), the total probability that the test reports a defective item. This can be found using the law of total probability:
\[
P(T) = P(T|D)P(D) + P(T|D^c)P(D^c)
\]

Calculate each term:
\[
P(D^c) = 1 - P(D) = 1 - 0.04 = 0.96
\]
\[
P(T|D^c) = 0.03
\]
\[
P(T) = (0.95)(0.04) + (0.03)(0.96) = 0.038 + 0.0288 = 0.0668
\]

Now, substitute back into Bayes' theorem:
\[
P(D|T) = \frac{(0.95)(0.04)}{0.0668} = \frac{0.038}{0.0668} \approx 0.572
\]

So, the probability that the item is actually defective given that the test reports it as defective is approximately 57.2%.

</Thought>
<Output>
The probability that the item is actually defective given that the test reports it as defective is approximately 57.2%.
</Output>
```

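The sample response sets up Bayes' theorem correctly, but its final division is slightly off: 0.038 / 0.0668 is about 0.569 (56.9%), not 0.572. A few lines of Python reproduce the calculation from the problem statement and are a simple way to spot-check numeric answers from the model:

```python
# Quantities from the problem statement.
p_defective = 0.04             # P(D)
p_flag_given_defective = 0.95  # P(T | D)
p_flag_given_good = 0.03       # P(T | D^c)

# Law of total probability: P(T) = P(T|D) P(D) + P(T|D^c) P(D^c)
p_flag = (p_flag_given_defective * p_defective
          + p_flag_given_good * (1 - p_defective))

# Bayes' theorem: P(D|T) = P(T|D) P(D) / P(T)
p_defective_given_flag = p_flag_given_defective * p_defective / p_flag

print(f"P(T)   = {p_flag:.4f}")
print(f"P(D|T) = {p_defective_given_flag:.4f}")
```
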
---

## 📚 Training Datasets

### ReTrace501-v1 (500 samples)

High-quality chain-of-thought reasoning examples focusing on mathematical problem-solving with explicit reasoning steps.

**Source:** [nnsohamnn/ReTrace501-v1](https://huggingface.co/datasets/nnsohamnn/ReTrace501-v1)

### OpenO1-SFT (4,500 samples)

Diverse reasoning dataset covering multiple domains including logic, math, science, and general problem-solving.

**Source:** [O1-OPEN/OpenO1-SFT](https://huggingface.co/datasets/O1-OPEN/OpenO1-SFT)

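This card does not say how the 500 and 4,500 samples were selected from each dataset. As an illustration only, here is a minimal sketch assuming uniform random sampling without replacement under a fixed seed, followed by a shuffle of the combined set. `build_mixture`, the seed, and the placeholder lists are hypothetical; with the Hugging Face `datasets` library, each source could first be loaded with `load_dataset` and its rows passed in.

```python
import random
from typing import Dict, List, Sequence, TypeVar

T = TypeVar("T")


def build_mixture(sources: Dict[str, Sequence[T]], counts: Dict[str, int], seed: int = 42) -> List[T]:
    """Draw counts[name] examples without replacement from each source, then shuffle the union.

    Hypothetical helper: the actual selection procedure is not documented in this card.
    """
    rng = random.Random(seed)  # fixed seed makes the mixture reproducible
    mixture: List[T] = []
    for name, k in counts.items():
        pool = sources[name]
        if k > len(pool):
            raise ValueError(f"{name}: requested {k} samples but only {len(pool)} available")
        mixture.extend(rng.sample(list(pool), k))
    rng.shuffle(mixture)  # interleave the two sources
    return mixture


# Usage with placeholder lists (arbitrary sizes) standing in for the two datasets:
sources = {
    "ReTrace501-v1": [f"retrace-{i}" for i in range(501)],
    "OpenO1-SFT": [f"openo1-{i}" for i in range(10_000)],
}
mix = build_mixture(sources, {"ReTrace501-v1": 500, "OpenO1-SFT": 4_500})
print(len(mix))  # 5000
```
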
---

## 🔧 Technical Details

| Component | Specification |
|-----------|---------------|
| **Architecture** | Qwen2.5 Transformer |
| **Parameters** | 3.09 Billion |
| **Max Sequence Length (fine-tuning)** | 4096 tokens |
| **Precision** | FP16 |
| **Training Framework** | Unsloth + Hugging Face Transformers |

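The parameter count and precision determine how much storage the weights need: FP16 and BF16 use 2 bytes per parameter, FP32 uses 4. This back-of-the-envelope sketch covers the raw weights only; activations, the KV cache, and framework overhead add more at inference time. For FP16 it comes to about 6.2 GB (5.8 GiB), consistent with the roughly 6 GB model size listed under Key Features.

```python
from typing import Tuple

NUM_PARAMS = 3.09e9  # parameter count from the Technical Details table

# Storage cost per parameter for common floating-point formats.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2}


def weight_size(num_params: float, precision: str) -> Tuple[float, float]:
    """Return (GB, GiB) needed to hold the raw weights at the given precision."""
    n_bytes = num_params * BYTES_PER_PARAM[precision]
    return n_bytes / 1e9, n_bytes / 2**30


for precision in BYTES_PER_PARAM:
    gb, gib = weight_size(NUM_PARAMS, precision)
    print(f"{precision}: {gb:.2f} GB ({gib:.2f} GiB)")
```
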
---

## 📖 Citation

```bibtex
@misc{qwen25-retrace-openo1-merged,
  author    = {nnsohamnn},
  title     = {Qwen2.5-3B ReTrace-OpenO1 Merged},
  year      = {2025},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/nnsohamnn/Qwen2.5-3B-ReTrace-OpenO1-Merged}
}
```

---

## 🔗 Related Resources

- **LoRA Adapters:** [nnsohamnn/Qwen2.5-3B-ReTrace-OpenO1-5k-QLoRA](https://huggingface.co/nnsohamnn/Qwen2.5-3B-ReTrace-OpenO1-5k-QLoRA)
- **Base Model:** [Qwen/Qwen2.5-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct)
- **Demo Space:** [Try it live!](https://huggingface.co/spaces/nnsohamnn/Qwen-2.5-3b-Think-QLora)

---

## 🙏 Acknowledgments

- **Qwen Team** for the excellent base model
- **Unsloth AI** for efficient training tools
- **OpenO1** community for the high-quality OpenO1-SFT dataset

---

## 📝 License

Apache 2.0. See [LICENSE](LICENSE) for details.

---

<div align="center">

**Made with ❤️ by [nnsohamnn](https://huggingface.co/nnsohamnn)**

⭐ Star this repo if you find it useful!

[Report Issues & Discussions](https://huggingface.co/nnsohamnn/Qwen2.5-3B-ReTrace-OpenO1-Merged/discussions)

</div>