Initialize project; model provided by the ModelHub XC community
Model: nnsohamnn/Qwen2.5-3B-ReTrace-OpenO1-Merged Source: Original Platform
301
README.md
Normal file
@@ -0,0 +1,301 @@
---
license: apache-2.0
base_model: Qwen/Qwen2.5-3B-Instruct
tags:
- reasoning
- chain-of-thought
- thinking
- qwen2.5
- merged-model
- retrace
- openo1
datasets:
- nnsohamnn/ReTrace501-v1
- O1-OPEN/OpenO1-SFT
language:
- en
pipeline_tag: text-generation
---

# 🧠 Qwen2.5-3B-Instruct ReTrace-OpenO1 Merged

<div align="center">

[Merged Model](https://huggingface.co/nnsohamnn/Qwen2.5-3B-ReTrace-OpenO1-Merged)
[LoRA Adapters](https://huggingface.co/nnsohamnn/Qwen2.5-3B-ReTrace-OpenO1-5k-QLoRA)
[Base Model](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct)
[License](LICENSE)

**A reasoning-focused model trained on 5,000 chain-of-thought examples**

[🚀 Try Demo](https://huggingface.co/spaces/nnsohamnn/Qwen-2.5-3b-Think-QLora) • [📊 Dataset ReTrace](https://huggingface.co/datasets/nnsohamnn/ReTrace501-v1) • [📊 Dataset OpenO1](https://huggingface.co/datasets/O1-OPEN/OpenO1-SFT)

</div>

---

## 📋 Model Description

This is a **fully merged model**: Qwen2.5-3B-Instruct fine-tuned with LoRA on 5,000 reasoning samples (500 ReTrace + 4,500 OpenO1-SFT), with the adapter weights merged into the base model. It generates structured, step-by-step reasoning inside explicit `<Thought>` tags and gives the final answer inside `<Output>` tags.

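
Because the reasoning and the final answer are wrapped in separate tags, downstream code can split them apart. The sketch below is one way to do that with a regular expression. The helper name `split_response` is hypothetical and not part of this repository, and it assumes each tag is properly closed and appears at most once.

```python
import re

# Hypothetical helper (not shipped with the model): split a response into its
# reasoning and final-answer parts based on the <Thought>/<Output> tags.
_TAG_PATTERN = re.compile(r"<(Thought|Output)>(.*?)</\1>", re.DOTALL)


def split_response(text: str) -> dict:
    """Return {'thought': ..., 'output': ...}; a missing tag maps to None."""
    parts = {"thought": None, "output": None}
    for tag, body in _TAG_PATTERN.findall(text):
        key = tag.lower()
        if parts[key] is None:  # keep only the first occurrence of each tag
            parts[key] = body.strip()
    return parts


sample = "<Thought>\n2 + 2 = 4\n</Thought>\n<Output>\n4\n</Output>"
result = split_response(sample)
print(result["output"])
```

If generation is cut off by the token limit before `</Output>` is emitted, the corresponding entry stays `None`, so callers should handle that case.
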
### 🎯 Key Features

- ✅ **Fully Merged**: Ready-to-use model (no adapter loading needed)
- ✅ **Structured Reasoning**: Outputs thinking in `<Thought>` tags, final answer in `<Output>` tags
- ✅ **5K Training Samples**: 500 ReTrace + 4,500 OpenO1-SFT examples
- ✅ **Multi-Domain**: Math, logic, word problems, and general reasoning
- ✅ **Production Ready**: FP16, 6GB model size

---

## 📊 Training Loss

### 📈 Training Statistics

| Metric | Value |
|--------|-------|
| **Initial Loss** | 1.3374 |
| **Final Loss** | 0.6798 |
| **Best Loss** | 0.6662 (Step 240) |
| **Improvement** | 49.2% ↓ |
| **Total Steps** | 310 |

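
The improvement figure can be reproduced from the other rows. The short check below assumes "improvement" means the relative drop in loss; under that assumption the 49.2% in the table corresponds to the initial-to-final loss, while the best checkpoint (step 240) gives a slightly larger drop.

```python
# Loss values copied from the training statistics table.
initial_loss = 1.3374
final_loss = 0.6798
best_loss = 0.6662


def relative_drop(start: float, end: float) -> float:
    """Relative decrease from start to end, in percent."""
    return (start - end) / start * 100


final_drop = relative_drop(initial_loss, final_loss)
best_drop = relative_drop(initial_loss, best_loss)

print(f"initial -> final: {final_drop:.1f}%")
print(f"initial -> best:  {best_drop:.1f}%")
```
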
---

## ⚙️ Training Configuration

```python
# Model
BASE_MODEL = "Qwen/Qwen2.5-3B-Instruct"
MAX_SEQ_LENGTH = 4096

# LoRA
LORA_R = 64
LORA_ALPHA = 128
LORA_DROPOUT = 0.05

# Training
BATCH_SIZE = 8
GRADIENT_ACCUMULATION = 4
LEARNING_RATE = 2e-4
NUM_EPOCHS = 2
WARMUP_STEPS = 50

# Datasets
# - 500 samples from ReTrace501-v1
# - 4,500 samples from OpenO1-SFT
```

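
A few derived quantities follow from these settings. The sketch below assumes `BATCH_SIZE` is the per-device batch size, that training ran on a single device (`NUM_DEVICES` is a hypothetical name introduced here), and that every sample is one training example with no packing. Under those assumptions the optimizer-step count lands at 312 to 314, close to the 310 steps reported in the training statistics; the exact number depends on how the training framework rounds the final partial batch.

```python
import math

# Values from the training configuration.
BATCH_SIZE = 8
GRADIENT_ACCUMULATION = 4
NUM_EPOCHS = 2
LORA_R = 64
LORA_ALPHA = 128
NUM_SAMPLES = 500 + 4500  # ReTrace501-v1 + OpenO1-SFT

NUM_DEVICES = 1  # assumption: single-GPU training

# Samples consumed per optimizer step: 8 * 4 * 1 = 32
effective_batch = BATCH_SIZE * GRADIENT_ACCUMULATION * NUM_DEVICES

# 5000 / 32 = 156.25, so the step count depends on rounding.
steps_low = math.floor(NUM_SAMPLES / effective_batch) * NUM_EPOCHS
steps_high = math.ceil(NUM_SAMPLES / effective_batch) * NUM_EPOCHS

# Standard LoRA scales the low-rank update by alpha / r.
lora_scale = LORA_ALPHA / LORA_R

print(effective_batch, steps_low, steps_high, lora_scale)
```
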
---

## 🚀 Usage

### Installation

```bash
pip install torch transformers accelerate
```

### Quick Inference

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# =========================
# Load model and tokenizer
# =========================
model_name = "nnsohamnn/Qwen2.5-3B-ReTrace-OpenO1-Merged"

tokenizer = AutoTokenizer.from_pretrained(
    model_name,
    trust_remote_code=True
)

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True
)

# =========================
# LLM question function
# =========================
def ask_llm(question: str) -> str:
    messages = [
        {
            "role": "system",
            "content": (
                "You are a helpful AI assistant. When solving problems, show your detailed reasoning process inside <Thought> tags, then provide your final answer inside <Output> tags and explain the final answer from reasoning in short. Break down complex problems step-by-step."
            )
        },
        {
            "role": "user",
            "content": question
        }
    ]

    # Render the chat messages into a single prompt string
    prompt = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True
    )

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    outputs = model.generate(
        **inputs,
        max_new_tokens=1024,
        do_sample=True,  # temperature/top_p only take effect when sampling
        temperature=0.7,
        top_p=0.9
    )

    # Decode only the newly generated tokens, skipping the prompt
    prompt_len = inputs["input_ids"].shape[1]
    response = tokenizer.decode(
        outputs[0][prompt_len:],
        skip_special_tokens=True
    )

    return response


# =========================
# Change ONLY this block
# =========================
question = """
A machine produces items where 4% of the output is defective. A quality control test correctly identifies a defective item with probability 0.95 and incorrectly labels a good item as defective with probability 0.03. If an item is selected at random and the test reports it as defective, determine the probability that the item is actually defective.
"""

print(ask_llm(question))
```

### Example Output

#### Question

A machine produces items where 4% of the output is defective. A quality control test correctly identifies a defective item with probability 0.95 and incorrectly labels a good item as defective with probability 0.03. If an item is selected at random and the test reports it as defective, determine the probability that the item is actually defective.

```
<Thought>
Let's define the events:
- \( D \): The event that the item is defective.
- \( D^c \): The event that the item is not defective.
- \( T \): The event that the test reports the item as defective.

Given probabilities:
- \( P(D) = 0.04 \) (4% defective)
- \( P(T|D) = 0.95 \) (Test correctly identifies defective items)
- \( P(T|D^c) = 0.03 \) (Test incorrectly labels good items as defective)

We need to find \( P(D|T) \), the probability that the item is defective given that the test reports it as defective.

Using Bayes' theorem:
\[
P(D|T) = \frac{P(T|D)P(D)}{P(T)}
\]

First, we need to find \( P(T) \), the total probability that the test reports a defective item. This can be found using the law of total probability:
\[
P(T) = P(T|D)P(D) + P(T|D^c)P(D^c)
\]

Calculate each term:
\[
P(D^c) = 1 - P(D) = 1 - 0.04 = 0.96
\]
\[
P(T|D^c) = 0.03
\]
\[
P(T) = (0.95)(0.04) + (0.03)(0.96) = 0.038 + 0.0288 = 0.0668
\]

Now, substitute back into Bayes' theorem:
\[
P(D|T) = \frac{(0.95)(0.04)}{0.0668} = \frac{0.038}{0.0668} \approx 0.572
\]

So, the probability that the item is actually defective given that the test reports it as defective is approximately 57.2%.

</Thought>
<Output>
The probability that the item is actually defective given that the test reports it as defective is approximately 57.2%.
</Output>
```

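
The arithmetic in the sample output above can be checked independently. The snippet below recomputes the posterior with exact fractions; the variable names are illustrative. The intermediate values in the sample (0.038, 0.0288, 0.0668) match, but the final quotient 0.038 / 0.0668 works out to about 0.569 (56.9%), so the 57.2% in the sample is slightly high. Generated reasoning is best verified this way when the exact number matters.

```python
from fractions import Fraction

# Quantities from the question, as exact fractions.
p_defective = Fraction(4, 100)              # P(D)
p_flag_given_defective = Fraction(95, 100)  # P(T|D)
p_flag_given_good = Fraction(3, 100)        # P(T|D^c)

# Law of total probability: P(T) = P(T|D)P(D) + P(T|D^c)P(D^c)
p_flag = (
    p_flag_given_defective * p_defective
    + p_flag_given_good * (1 - p_defective)
)

# Bayes' theorem: P(D|T) = P(T|D)P(D) / P(T)
posterior = p_flag_given_defective * p_defective / p_flag

print(f"P(T)   = {float(p_flag):.4f}")
print(f"P(D|T) = {float(posterior):.4f}")
```
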
---

## 📚 Training Datasets

### ReTrace501-v1 (500 samples)

High-quality chain-of-thought reasoning examples focusing on mathematical problem-solving with explicit reasoning steps.

**Source:** [nnsohamnn/ReTrace501-v1](https://huggingface.co/datasets/nnsohamnn/ReTrace501-v1)

### OpenO1-SFT (4,500 samples)

Diverse reasoning dataset covering multiple domains including logic, math, science, and general problem-solving.

**Source:** [O1-OPEN/OpenO1-SFT](https://huggingface.co/datasets/O1-OPEN/OpenO1-SFT)

---

## 🔧 Technical Details

| Component | Specification |
|-----------|---------------|
| **Architecture** | Qwen2.5 Transformer |
| **Parameters** | 3.09 Billion |
| **Context Length** | 4096 tokens (training sequence length) |
| **Precision** | FP16 |
| **Training Framework** | Unsloth + HuggingFace Transformers |

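
The parameter count and precision in the table account for the roughly 6 GB model size quoted in the key features. The estimate below is a back-of-the-envelope sketch covering weights only; it ignores the KV cache, activations, and framework overhead, so actual memory use during inference will be higher.

```python
# Figures from the technical details table.
NUM_PARAMETERS = 3.09e9
BYTES_PER_PARAM_FP16 = 2  # FP16 stores each weight in 16 bits

weight_bytes = NUM_PARAMETERS * BYTES_PER_PARAM_FP16
weight_gb = weight_bytes / 1e9      # decimal gigabytes
weight_gib = weight_bytes / 2**30   # binary gibibytes

print(f"{weight_gb:.2f} GB ({weight_gib:.2f} GiB) for the weights alone")
```
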
---

## 📖 Citation

```bibtex
@misc{qwen25-retrace-openo1-merged,
  author = {nnsohamnn},
  title = {Qwen2.5-3B ReTrace-OpenO1 Merged},
  year = {2025},
  publisher = {HuggingFace},
  url = {https://huggingface.co/nnsohamnn/Qwen2.5-3B-ReTrace-OpenO1-Merged}
}
```

---

## 🔗 Related Resources

- **LoRA Adapters:** [nnsohamnn/Qwen2.5-3B-ReTrace-OpenO1-5k-QLoRA](https://huggingface.co/nnsohamnn/Qwen2.5-3B-ReTrace-OpenO1-5k-QLoRA)
- **Base Model:** [Qwen/Qwen2.5-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct)
- **Demo Space:** [Try it live!](https://huggingface.co/spaces/nnsohamnn/Qwen-2.5-3b-Think-QLora)

---

## 🙏 Acknowledgments

- **Qwen Team** for the excellent base model
- **Unsloth AI** for efficient training tools
- **OpenO1** community for the high-quality dataset

---

## 📝 License

Apache 2.0 - See [LICENSE](LICENSE) for details.

---

<div align="center">

**Made with ❤️ by [nnsohamnn](https://huggingface.co/nnsohamnn)**

⭐ Star this repo if you find it useful!

[Report Issues](https://huggingface.co/nnsohamnn/Qwen2.5-3B-ReTrace-OpenO1-Merged/discussions) • [Discussions](https://huggingface.co/nnsohamnn/Qwen2.5-3B-ReTrace-OpenO1-Merged/discussions)

</div>