Initialize project; model provided by the ModelHub XC community

Model: nnsohamnn/Qwen2.5-3B-ReTrace-OpenO1-Merged Source: Original Platform
2026-05-09 10:30:45 +08:00
commit 47e9b22809
15 changed files with 152573 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,301 @@
+---
+license: apache-2.0
+base_model: Qwen/Qwen2.5-3B-Instruct
+tags:
+- reasoning
+- chain-of-thought
+- thinking
+- qwen2.5
+- merged-model
+- retrace
+- openo1
+datasets:
+- nnsohamnn/ReTrace501-v1
+- O1-OPEN/OpenO1-SFT
+language:
+- en
+pipeline_tag: text-generation
+---
+
+# 🧠 Qwen2.5-3B-Instruct ReTrace-OpenO1 Merged
+
+<div align="center">
+  
+[![Merged Model](https://img.shields.io/badge/🔥-Merged_Model-blue)](https://huggingface.co/nnsohamnn/Qwen2.5-3B-ReTrace-OpenO1-Merged)
+[![LoRA Adapters](https://img.shields.io/badge/🔧-LoRA_Weights-green)](https://huggingface.co/nnsohamnn/Qwen2.5-3B-ReTrace-OpenO1-5k-QLoRA)
+[![Base Model](https://img.shields.io/badge/📦-Qwen2.5--3B--Instruct-orange)](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct)
+[![License](https://img.shields.io/badge/⚖️-Apache_2.0-red)](LICENSE)
+
+**A reasoning-focused model trained on 5,000 chain-of-thought examples**
+
+[🚀 Try Demo](https://huggingface.co/spaces/nnsohamnn/Qwen-2.5-3b-Think-QLora) • [📊 Dataset ReTrace](https://huggingface.co/datasets/nnsohamnn/ReTrace501-v1) • [📊 Dataset OpenO1](https://huggingface.co/datasets/O1-OPEN/OpenO1-SFT)
+
+</div>
+
+---
+
+## 📋 Model Description
+
+This is Qwen2.5-3B-Instruct fine-tuned with LoRA on 5,000 reasoning samples (500 ReTrace + 4,500 OpenO1-SFT), with the adapter weights **fully merged** into the base model. It writes its step-by-step reasoning inside `<Thought>` tags and its final answer inside `<Output>` tags.
+
+### 🎯 Key Features
+
+- ✅ **Fully Merged**: Ready-to-use model (no adapter loading needed)
+- ✅ **Structured Reasoning**: Outputs thinking in `<Thought>` tags, final answer in `<Output>` tags
+- ✅ **5K Training Samples**: 500 ReTrace + 4,500 OpenO1-SFT examples
+- ✅ **Multi-Domain**: Math, logic, word problems, and general reasoning
+- ✅ **Production Ready**: FP16 weights, about 6 GB model size
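+
+The `<Thought>` / `<Output>` format listed above can be split apart with a small helper. This is a minimal sketch, not part of the released code: the function name `split_reasoning` and the fallback used when a tag is missing are assumptions made for illustration.
+
+```python
+import re
+
+# Hypothetical helper (not from the model card): split a response into
+# its reasoning and its final answer using the two documented tags.
+def split_reasoning(response: str) -> tuple[str, str]:
+    thought = re.search(r"<Thought>(.*?)</Thought>", response, re.DOTALL)
+    output = re.search(r"<Output>(.*?)</Output>", response, re.DOTALL)
+    reasoning = thought.group(1).strip() if thought else ""
+    # Assumption: if no <Output> tag is found, treat the whole text as the answer.
+    answer = output.group(1).strip() if output else response.strip()
+    return reasoning, answer
+
+sample = "<Thought>\n2 + 2 = 4\n</Thought>\n<Output>\n4\n</Output>"
+reasoning, answer = split_reasoning(sample)
+print(reasoning)
+print(answer)
+```
+
+Keeping the fallback explicit is useful because a sampled generation may be cut off by the token limit before the closing tag appears.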
+
+---
+
+## 📊 Training Loss
+
+![Training Loss](training_plot.png)
+
+### 📈 Training Statistics
+
+| Metric | Value |
+|--------|-------|
+| **Initial Loss** | 1.3374 |
+| **Final Loss** | 0.6798 |
+| **Best Loss** | 0.6662 (Step 240) |
+| **Improvement** | 49.2% ↓ |
+| **Total Steps** | 310 |
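+
+The improvement figure follows from the initial and final loss in the table. A quick check using only those two values (the variable names are illustrative):
+
+```python
+initial_loss = 1.3374
+final_loss = 0.6798
+
+# Relative reduction in loss, expressed as a percentage.
+improvement_pct = (initial_loss - final_loss) / initial_loss * 100
+print(f"{improvement_pct:.1f}%")  # prints 49.2%
+```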
+
+---
+
+## ⚙️ Training Configuration
+
+```
+# Model
+BASE_MODEL = "Qwen/Qwen2.5-3B-Instruct"
+MAX_SEQ_LENGTH = 4096
+
+# LoRA
+LORA_R = 64
+LORA_ALPHA = 128
+LORA_DROPOUT = 0.05
+
+# Training
+BATCH_SIZE = 8
+GRADIENT_ACCUMULATION = 4
+LEARNING_RATE = 2e-4
+NUM_EPOCHS = 2
+WARMUP_STEPS = 50
+
+# Datasets
+# - 500 samples from ReTrace501-v1
+# - 4,500 samples from OpenO1-SFT
+```
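+
+These settings imply an effective batch size and an approximate step count. The sketch below assumes a single device and that all 5,000 samples are used in every epoch; neither is stated in this card, so treat the result as an estimate to compare against the reported 310 steps.
+
+```python
+import math
+
+batch_size = 8
+gradient_accumulation = 4
+num_epochs = 2
+num_samples = 500 + 4500  # ReTrace + OpenO1-SFT
+
+# Samples consumed per optimizer step (assumes one device).
+effective_batch = batch_size * gradient_accumulation
+steps_per_epoch = math.ceil(num_samples / effective_batch)
+total_steps = steps_per_epoch * num_epochs
+print(effective_batch, steps_per_epoch, total_steps)
+```
+
+The estimate lands within a few steps of the reported total. A gap of that size could come from how the trainer handles the last partial batch or from samples dropped during preprocessing.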
+
+---
+
+## 🚀 Usage
+
+### Installation
+
+```
+pip install torch transformers accelerate
+```
+
+### Quick Inference
+
+```
+import torch
+from transformers import AutoModelForCausalLM, AutoTokenizer
+
+# =========================
+# Load model and tokenizer
+# =========================
+model_name = "nnsohamnn/Qwen2.5-3B-ReTrace-OpenO1-Merged"
+
+tokenizer = AutoTokenizer.from_pretrained(
+    model_name,
+    trust_remote_code=True
+)
+
+model = AutoModelForCausalLM.from_pretrained(
+    model_name,
+    torch_dtype=torch.float16,
+    device_map="auto",
+    trust_remote_code=True
+)
+
+# =========================
+# LLM question function
+# =========================
+def ask_llm(question: str):
+    messages = [
+        {
+            "role": "system",
+            "content": (
+                "You are a helpful AI assistant. When solving problems, show your detailed reasoning process inside <Thought> tags, then provide your final answer inside <Output> tags and explain the final answer from reasoning in short. Break down complex problems step-by-step."
+            )
+        },
+        {
+            "role": "user",
+            "content": question
+        }
+    ]
+
+    prompt = tokenizer.apply_chat_template(
+        messages,
+        tokenize=False,
+        add_generation_prompt=True
+    )
+
+    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+
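+    outputs = model.generate(
+        **inputs,
+        max_new_tokens=1024,
+        do_sample=True,  # make sure temperature/top_p are actually applied
+        temperature=0.7,
+        top_p=0.9
+    )
+
+    # Decode only the newly generated tokens (skip the prompt)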
+    prompt_len = inputs["input_ids"].shape[1]
+    response = tokenizer.decode(
+        outputs[0][prompt_len:],
+        skip_special_tokens=True
+    )
+
+    return response
+
+
+# =========================
+# Edit only this block to ask a different question
+# =========================
+question = """
+A machine produces items where 4% of the output is defective. A quality control test correctly identifies a defective item with probability 0.95 and incorrectly labels a good item as defective with probability 0.03. If an item is selected at random and the test reports it as defective, determine the probability that the item is actually defective.
+"""
+
+print(ask_llm(question))
+
+```
+
+### Expected Output
+
+#### Question
+
+A machine produces items where 4% of the output is defective. A quality control test correctly identifies a defective item with probability 0.95 and incorrectly labels a good item as defective with probability 0.03. If an item is selected at random and the test reports it as defective, determine the probability that the item is actually defective.
+
+```
+<Thought>
+Let's define the events:
+- \( D \): The event that the item is defective.
+- \( D^c \): The event that the item is not defective.
+- \( T \): The event that the test reports the item as defective.
+
+Given probabilities:
+- \( P(D) = 0.04 \) (4% defective)
+- \( P(T|D) = 0.95 \) (Test correctly identifies defective items)
+- \( P(T|D^c) = 0.03 \) (Test incorrectly labels good items as defective)
+
+We need to find \( P(D|T) \), the probability that the item is defective given that the test reports it as defective.
+
+Using Bayes' theorem:
+\[
+P(D|T) = \frac{P(T|D)P(D)}{P(T)}
+\]
+
+First, we need to find \( P(T) \), the total probability that the test reports a defective item. This can be found using the law of total probability:
+\[
+P(T) = P(T|D)P(D) + P(T|D^c)P(D^c)
+\]
+
+Calculate each term:
+\[
+P(D^c) = 1 - P(D) = 1 - 0.04 = 0.96
+\]
+\[
+P(T|D^c) = 0.03
+\]
+\[
+P(T) = (0.95)(0.04) + (0.03)(0.96) = 0.038 + 0.0288 = 0.0668
+\]
+
+Now, substitute back into Bayes' theorem:
+\[
+P(D|T) = \frac{(0.95)(0.04)}{0.0668} = \frac{0.038}{0.0668} \approx 0.572
+\]
+
+So, the probability that the item is actually defective given that the test reports it as defective is approximately 57.2%.
+
+</Thought>
+<Output>
+The probability that the item is actually defective given that the test reports it as defective is approximately 57.2%.
+</Output>
+```
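+
+The arithmetic in the sample response can be checked independently. The sketch below recomputes the posterior from the three probabilities given in the question; the variable names are illustrative. Direct computation gives about 0.569 (56.9%), slightly below the 57.2% in the transcript, so the model's final rounding should be read as approximate.
+
+```python
+p_defective = 0.04              # P(D)
+p_flag_given_defective = 0.95   # P(T|D)
+p_flag_given_good = 0.03        # P(T|D^c)
+
+# Law of total probability: P(T)
+p_flag = (p_flag_given_defective * p_defective
+          + p_flag_given_good * (1 - p_defective))
+
+# Bayes' theorem: P(D|T)
+posterior = p_flag_given_defective * p_defective / p_flag
+print(f"P(T) = {p_flag:.4f}, P(D|T) = {posterior:.4f}")
+```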
+
+---
+
+## 📚 Training Datasets
+
+### ReTrace501-v1 (500 samples)
+High-quality chain-of-thought reasoning examples focusing on mathematical problem-solving with explicit reasoning steps.
+
+**Source:** [nnsohamnn/ReTrace501-v1](https://huggingface.co/datasets/nnsohamnn/ReTrace501-v1)
+
+### OpenO1-SFT (4,500 samples)
+Diverse reasoning dataset covering multiple domains including logic, math, science, and general problem-solving.
+
+**Source:** [O1-OPEN/OpenO1-SFT](https://huggingface.co/datasets/O1-OPEN/OpenO1-SFT)
+
+---
+
+## 🔧 Technical Details
+
+| Component | Specification |
+|-----------|---------------|
+| **Architecture** | Qwen2.5 Transformer |
+| **Parameters** | 3.09 Billion |
+| **Context Length** | 4096 tokens |
+| **Precision** | FP16 |
+| **Training Framework** | Unsloth + HuggingFace Transformers |
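+
+The parameter count and precision together give a rough size for the weights, consistent with the 6 GB listed under Key Features. A back-of-the-envelope check, assuming 2 bytes per FP16 parameter and ignoring activations and the KV cache:
+
+```python
+params = 3.09e9
+bytes_per_param = 2  # FP16
+
+size_gb = params * bytes_per_param / 1e9      # decimal gigabytes
+size_gib = params * bytes_per_param / 2**30   # binary gibibytes
+print(f"{size_gb:.2f} GB ({size_gib:.2f} GiB)")
+```
+
+Actual memory use at inference time will be higher than this, since the estimate covers the weights only.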
+
+---
+
+## 📖 Citation
+
+```
+@misc{qwen25-retrace-openo1-merged,
+  author = {nnsohamnn},
+  title = {Qwen2.5-3B ReTrace-OpenO1 Merged},
+  year = {2025},
+  publisher = {HuggingFace},
+  url = {https://huggingface.co/nnsohamnn/Qwen2.5-3B-ReTrace-OpenO1-Merged}
+}
+```
+
+---
+
+## 🔗 Related Resources
+
+- **LoRA Adapters:** [nnsohamnn/Qwen2.5-3B-ReTrace-OpenO1-5k-QLoRA](https://huggingface.co/nnsohamnn/Qwen2.5-3B-ReTrace-OpenO1-5k-QLoRA)
+- **Base Model:** [Qwen/Qwen2.5-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct)
+- **Demo Space:** [Try it live!](https://huggingface.co/spaces/nnsohamnn/Qwen-2.5-3b-Think-QLora)
+
+---
+
+## 🙏 Acknowledgments
+
+- **Qwen Team** for the excellent base model
+- **Unsloth AI** for efficient training tools
+- **OpenO1** community for the high-quality OpenO1-SFT dataset
+
+---
+
+## 📝 License
+
+Apache 2.0 - See [LICENSE](LICENSE) for details.
+
+---
+
+<div align="center">
+
+**Made with ❤️ by [nnsohamnn](https://huggingface.co/nnsohamnn)**
+
+⭐ Star this repo if you find it useful!
+
+[Report Issues](https://huggingface.co/nnsohamnn/Qwen2.5-3B-ReTrace-OpenO1-Merged/discussions) • [Discussions](https://huggingface.co/nnsohamnn/Qwen2.5-3B-ReTrace-OpenO1-Merged/discussions)
+
+</div>
+