---
license: apache-2.0
base_model: Qwen/Qwen2.5-3B-Instruct
tags:
- reasoning
- chain-of-thought
- thinking
- qwen2.5
- merged-model
- retrace
- openo1
datasets:
- nnsohamnn/ReTrace501-v1
- O1-OPEN/OpenO1-SFT
language:
- en
pipeline_tag: text-generation
---

# 🧠 Qwen2.5-3B-Instruct ReTrace-OpenO1 Merged

<div align="center">

[![Merged Model](https://img.shields.io/badge/🔥-Merged_Model-blue)](https://huggingface.co/nnsohamnn/Qwen2.5-3B-ReTrace-OpenO1-Merged)
[![LoRA Adapters](https://img.shields.io/badge/🔧-LoRA_Weights-green)](https://huggingface.co/nnsohamnn/Qwen2.5-3B-ReTrace-OpenO1-5k-QLoRA)
[![Base Model](https://img.shields.io/badge/📦-Qwen2.5--3B--Instruct-orange)](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct)
[![License](https://img.shields.io/badge/⚖️-Apache_2.0-red)](LICENSE)

**A reasoning-focused model trained on 5,000 chain-of-thought examples**

[🚀 Try Demo](https://huggingface.co/spaces/nnsohamnn/Qwen-2.5-3b-Think-QLora) • [📊 Dataset ReTrace](https://huggingface.co/datasets/nnsohamnn/ReTrace501-v1) • [📊 Dataset OpenO1](https://huggingface.co/datasets/O1-OPEN/OpenO1-SFT)

</div>

---

## 📋 Model Description

This is a **fully merged model**: Qwen2.5-3B-Instruct fine-tuned with LoRA on 5,000 reasoning samples (500 from ReTrace + 4,500 from OpenO1-SFT), with the adapter weights merged back into the base model. The model generates structured reasoning with explicit `<Thought>` and `<Output>` tags and is aimed at step-by-step problem solving.

### 🎯 Key Features

- ✅ **Fully Merged**: Ready-to-use model (no adapter loading needed)
- ✅ **Structured Reasoning**: Outputs thinking in `<Thought>` tags, final answer in `<Output>` tags
- ✅ **5K Training Samples**: 500 ReTrace + 4,500 OpenO1-SFT examples
- ✅ **Multi-Domain**: Math, logic, word problems, and general reasoning
- ✅ **FP16 Weights**: about 6 GB on disk
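
Because responses follow this tag layout, the reasoning and the final answer can be separated with a regular expression. A minimal sketch, assuming each tag pair appears at most once and is properly closed (a response cut off by `max_new_tokens` may lack `</Output>`); `split_response` is a hypothetical helper, not part of this repository:

```python
import re
from typing import Optional, Tuple


def split_response(text: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (thought, output) from a <Thought>/<Output> formatted response.

    Either element is None when its tag pair is missing, e.g. because
    generation stopped before the closing tag was produced.
    """
    # DOTALL lets '.' match newlines, since both sections span multiple lines.
    thought = re.search(r"<Thought>(.*?)</Thought>", text, re.DOTALL)
    output = re.search(r"<Output>(.*?)</Output>", text, re.DOTALL)
    return (
        thought.group(1).strip() if thought else None,
        output.group(1).strip() if output else None,
    )


sample = "<Thought>\n2 + 2 = 4\n</Thought>\n<Output>\n4\n</Output>"
thought, answer = split_response(sample)
print(answer)  # 4
```

Returning `None` instead of raising lets callers detect a truncated generation and retry with a larger `max_new_tokens`.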

---

## 📊 Training Loss

![Training Loss](training_plot.png)

### 📈 Training Statistics

| Metric | Value |
|--------|-------|
| **Initial Loss** | 1.3374 |
| **Final Loss** | 0.6798 |
| **Best Loss** | 0.6662 (Step 240) |
| **Improvement** | 49.2% ↓ |
| **Total Steps** | 310 |
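
The **Improvement** row matches the relative drop from the initial loss to the final loss (not to the best loss). A quick sketch reproducing it from the table values:

```python
# Loss values copied from the Training Statistics table.
initial_loss = 1.3374
final_loss = 0.6798
best_loss = 0.6662


def relative_drop(start: float, end: float) -> float:
    """Fractional decrease from start to end."""
    return (start - end) / start


print(f"final vs initial: {relative_drop(initial_loss, final_loss):.1%}")  # 49.2%
print(f"best vs initial:  {relative_drop(initial_loss, best_loss):.1%}")   # 50.2%
```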

---

## ⚙️ Training Configuration

```python
# Model
BASE_MODEL = "Qwen/Qwen2.5-3B-Instruct"
MAX_SEQ_LENGTH = 4096

# LoRA
LORA_R = 64
LORA_ALPHA = 128
LORA_DROPOUT = 0.05

# Training
BATCH_SIZE = 8
GRADIENT_ACCUMULATION = 4
LEARNING_RATE = 2e-4
NUM_EPOCHS = 2
WARMUP_STEPS = 50

# Datasets
# - 500 samples from ReTrace501-v1
# - 4,500 samples from OpenO1-SFT
```
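
A few derived quantities make this configuration easier to read. A sketch assuming single-GPU training (no extra data-parallel factor) and standard LoRA scaling of `alpha / r`, as in PEFT's default; neither assumption is stated explicitly in this card:

```python
# Values from the training configuration above.
BATCH_SIZE = 8
GRADIENT_ACCUMULATION = 4
NUM_EPOCHS = 2
LORA_R = 64
LORA_ALPHA = 128
NUM_SAMPLES = 500 + 4_500  # ReTrace501-v1 + OpenO1-SFT

# Assumption: one device, so sequences per optimizer step = batch * accumulation.
effective_batch = BATCH_SIZE * GRADIENT_ACCUMULATION

# Standard LoRA multiplies the low-rank update B @ A by alpha / r
# (rsLoRA would use alpha / sqrt(r) instead).
lora_scale = LORA_ALPHA / LORA_R

# Approximate optimizer steps, counting full effective batches only.
steps_per_epoch = NUM_SAMPLES // effective_batch
total_steps = steps_per_epoch * NUM_EPOCHS

print(effective_batch, lora_scale, steps_per_epoch, total_steps)  # 32 2.0 156 312
```

At about 156 optimizer steps per epoch, two epochs come to roughly 312 steps, in line with the 310 total steps reported above; the exact count depends on how partial batches are handled and on any sample filtering, which this card does not specify.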

---

## 🚀 Usage

### Installation

```bash
pip install torch transformers accelerate
```

### Quick Inference

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# =========================
# Load model and tokenizer
# =========================
model_name = "nnsohamnn/Qwen2.5-3B-ReTrace-OpenO1-Merged"

tokenizer = AutoTokenizer.from_pretrained(
    model_name,
    trust_remote_code=True
)

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True
)

# =========================
# LLM question function
# =========================
def ask_llm(question: str):
    messages = [
        {
            "role": "system",
            "content": (
                "You are a helpful AI assistant. When solving problems, show your detailed reasoning process inside <Thought> tags, then provide your final answer inside <Output> tags and explain the final answer from reasoning in short. Break down complex problems step-by-step."
            )
        },
        {
            "role": "user",
            "content": question
        }
    ]

    prompt = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True
    )

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    outputs = model.generate(
        **inputs,
        max_new_tokens=1024,
        do_sample=True,  # enable sampling so temperature/top_p are applied
        temperature=0.7,
        top_p=0.9
    )

    # Decode only the newly generated tokens, skipping the prompt
    prompt_len = inputs["input_ids"].shape[1]
    response = tokenizer.decode(
        outputs[0][prompt_len:],
        skip_special_tokens=True
    )

    return response


# =========================
# Change ONLY this block
# =========================
question = """
A machine produces items where 4% of the output is defective. A quality control test correctly identifies a defective item with probability 0.95 and incorrectly labels a good item as defective with probability 0.03. If an item is selected at random and the test reports it as defective, determine the probability that the item is actually defective.
"""

print(ask_llm(question))

```
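
`apply_chat_template` renders `messages` into Qwen2.5's ChatML-style prompt. To see roughly what string the model receives, here is a hypothetical `to_chatml` helper that approximates the template for the case used above (a leading system message, no tools). In practice keep using the tokenizer's own template, which also inserts a default system prompt when none is given:

```python
from typing import Dict, List


def to_chatml(messages: List[Dict[str, str]], add_generation_prompt: bool = True) -> str:
    """Hypothetical helper: render messages in the ChatML layout used by Qwen2.5.

    Assumes the first message is a system message and no tools are involved;
    the tokenizer's chat template remains the source of truth.
    """
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


demo = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "What is 2 + 2?"},
]
print(to_chatml(demo))
```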

### Example Output

**Question:** A machine produces items where 4% of the output is defective. A quality control test correctly identifies a defective item with probability 0.95 and incorrectly labels a good item as defective with probability 0.03. If an item is selected at random and the test reports it as defective, determine the probability that the item is actually defective.

Sample response (generation uses sampling, so the wording will vary between runs):

```text
<Thought>
Let's define the events:
- \( D \): The event that the item is defective.
- \( D^c \): The event that the item is not defective.
- \( T \): The event that the test reports the item as defective.

Given probabilities:
- \( P(D) = 0.04 \) (4% defective)
- \( P(T|D) = 0.95 \) (Test correctly identifies defective items)
- \( P(T|D^c) = 0.03 \) (Test incorrectly labels good items as defective)

We need to find \( P(D|T) \), the probability that the item is defective given that the test reports it as defective.

Using Bayes' theorem:
\[
P(D|T) = \frac{P(T|D)P(D)}{P(T)}
\]

First, we need to find \( P(T) \), the total probability that the test reports a defective item. This can be found using the law of total probability:
\[
P(T) = P(T|D)P(D) + P(T|D^c)P(D^c)
\]

Calculate each term:
\[
P(D^c) = 1 - P(D) = 1 - 0.04 = 0.96
\]
\[
P(T|D^c) = 0.03
\]
\[
P(T) = (0.95)(0.04) + (0.03)(0.96) = 0.038 + 0.0288 = 0.0668
\]

Now, substitute back into Bayes' theorem:
\[
P(D|T) = \frac{(0.95)(0.04)}{0.0668} = \frac{0.038}{0.0668} \approx 0.572
\]

So, the probability that the item is actually defective given that the test reports it as defective is approximately 57.2%.

</Thought>
<Output>
The probability that the item is actually defective given that the test reports it as defective is approximately 57.2%.
</Output>
```
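
The sample response sets up Bayes' theorem correctly and gets P(T) = 0.0668, but its final division is slightly off: 0.038 / 0.0668 ≈ 0.5689, i.e. about 56.9% rather than 57.2%. Numeric answers from a 3B model are worth checking; a short verification:

```python
# Numbers from the question: prior, detection rate, false-positive rate.
p_defective = 0.04
p_flag_given_defective = 0.95
p_flag_given_good = 0.03

# Law of total probability: P(T) = P(T|D)P(D) + P(T|D^c)P(D^c).
p_flag = p_flag_given_defective * p_defective + p_flag_given_good * (1 - p_defective)

# Bayes' theorem: P(D|T) = P(T|D)P(D) / P(T).
posterior = p_flag_given_defective * p_defective / p_flag

print(f"P(T) = {p_flag:.4f}")       # P(T) = 0.0668
print(f"P(D|T) = {posterior:.4f}")  # P(D|T) = 0.5689
```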

---

## 📚 Training Datasets

### ReTrace501-v1 (500 samples)
High-quality chain-of-thought reasoning examples focusing on mathematical problem-solving with explicit reasoning steps.

**Source:** [nnsohamnn/ReTrace501-v1](https://huggingface.co/datasets/nnsohamnn/ReTrace501-v1)

### OpenO1-SFT (4,500 samples)
Diverse reasoning dataset covering multiple domains including logic, math, science, and general problem-solving.

**Source:** [O1-OPEN/OpenO1-SFT](https://huggingface.co/datasets/O1-OPEN/OpenO1-SFT)

---

## 🔧 Technical Details

| Component | Specification |
|-----------|---------------|
| **Architecture** | Qwen2.5 Transformer |
| **Parameters** | 3.09 Billion |
| **Training Sequence Length** | 4,096 tokens (the base model supports up to 32,768) |
| **Precision** | FP16 |
| **Training Framework** | Unsloth + HuggingFace Transformers |
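
The ~6 GB model size follows from the parameter count and FP16 precision. A back-of-the-envelope sketch counting weights only (activations, the KV cache, and framework overhead add to this at inference time):

```python
# Figures from the table above.
num_params = 3.09e9
bytes_per_param = 2  # FP16 stores each weight in 16 bits

weight_bytes = num_params * bytes_per_param
print(f"{weight_bytes / 1e9:.2f} GB")     # 6.18 GB
print(f"{weight_bytes / 2**30:.2f} GiB")  # 5.76 GiB
```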

---

## 📖 Citation

```
@misc{qwen25-retrace-openo1-merged,
  author = {nnsohamnn},
  title = {Qwen2.5-3B ReTrace-OpenO1 Merged},
  year = {2025},
  publisher = {HuggingFace},
  url = {https://huggingface.co/nnsohamnn/Qwen2.5-3B-ReTrace-OpenO1-Merged}
}
```

---

## 🔗 Related Resources

- **LoRA Adapters:** [nnsohamnn/Qwen2.5-3B-ReTrace-OpenO1-5k-QLoRA](https://huggingface.co/nnsohamnn/Qwen2.5-3B-ReTrace-OpenO1-5k-QLoRA)
- **Base Model:** [Qwen/Qwen2.5-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct)
- **Demo Space:** [Try it live!](https://huggingface.co/spaces/nnsohamnn/Qwen-2.5-3b-Think-QLora)

---

## 🙏 Acknowledgments

- **Qwen Team** for the excellent base model
- **Unsloth AI** for efficient training tools
- **OpenO1** community for the OpenO1-SFT dataset

---

## 📝 License

Apache 2.0 - See [LICENSE](LICENSE) for details.

---

<div align="center">

**Made with ❤️ by [nnsohamnn](https://huggingface.co/nnsohamnn)**

⭐ Star this repo if you find it useful!

[Report Issues & Discussions](https://huggingface.co/nnsohamnn/Qwen2.5-3B-ReTrace-OpenO1-Merged/discussions)

</div>