Initialize project; model provided by the ModelHub XC community

Model: OpenLearnLM/special-r1-deepseek-qwen3-8b-sped-adaptive-think-noreward
Source: Original Platform
2026-04-22 04:26:56 +08:00
commit 988887d6f4
12 changed files with 870 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,98 @@
+---
+license: apache-2.0
+base_model: Qwen/Qwen2.5-7B-Instruct
+tags:
+  - reinforcement-learning
+  - GRPO
+  - reasoning
+  - education
+  - qwen2.5
+language:
+  - en
+  - ko
+pipeline_tag: text-generation
+library_name: transformers
+---
+
+# Special-R1-Qwen2.5-7B-NoThink
+
+A reasoning-enhanced language model fine-tuned from Qwen2.5-7B-Instruct using GRPO (Group Relative Policy Optimization) for special education applications.
+
+## Model Description
+
+This NoThink variant is trained to give direct, concise answers without explicit chain-of-thought reasoning steps, with the aim of producing accurate responses efficiently.
+
+- **Base Model**: [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct)
+- **Training Method**: GRPO (Group Relative Policy Optimization)
+- **Training Steps**: 300
+- **Focus**: Direct answer generation without verbose reasoning
+
+## Training Details
+
+### Training Configuration
+- **Framework**: veRL (Volcano Engine Reinforcement Learning)
+- **Algorithm**: GRPO
+- **Batch Size**: Configured for 4x GPU setup
+- **Precision**: bfloat16
+
+### Training Data
+- Educational reasoning tasks
+- Mathematical problem solving
+- General knowledge Q&A
+
+## Usage
+
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+
+model_name = "OpenLearnLM/special-r1-qwen2.5-7b-nothink"
+
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+model = AutoModelForCausalLM.from_pretrained(
+    model_name,
+    torch_dtype="auto",
+    device_map="auto"
+)
+
+messages = [
+    {"role": "user", "content": "What is the capital of France?"}
+]
+
+text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+inputs = tokenizer(text, return_tensors="pt").to(model.device)
+
+outputs = model.generate(**inputs, max_new_tokens=512)
+response = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
+print(response)
+```
+
+## Model Variants
+
+| Model | Description |
+|-------|-------------|
+| **special-r1-qwen2.5-7b-nothink** (this) | Direct answers without explicit reasoning |
+| special-r1-qwen2.5-7b-think | With chain-of-thought reasoning |
+
+## Limitations
+
+- Trained primarily on English and Korean data
+- May not perform optimally on highly specialized domains outside training distribution
+- This is an early checkpoint (step 300); continued training may improve performance
+
+## Citation
+
+If you use this model, please cite:
+
+```bibtex
+@misc{openlearnlm2025special,
+  title={Special-R1: Reasoning Models for Education},
+  author={OpenLearnLM Team},
+  year={2025},
+  publisher={HuggingFace},
+  url={https://huggingface.co/OpenLearnLM/special-r1-qwen2.5-7b-nothink}
+}
+```
+
+## License
+
+This model is released under the Apache 2.0 License, following the base model's license.
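
The README names GRPO (Group Relative Policy Optimization) as the training method but does not show the computation. The following is a minimal sketch of the group-relative advantage idea, not the training code for this model. It assumes one scalar reward per sampled response and normalization by the population standard deviation; the README states neither the reward function nor the exact normalization, and implementations differ on the latter. The names `group_relative_advantages` and `eps` and the reward values are hypothetical.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Score each sampled response relative to its own group.

    `rewards` holds one scalar reward per response sampled for the
    same prompt. Each reward is centered on the group mean and scaled
    by the group standard deviation (an assumed normalization).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps avoids division by zero when every reward in the group is equal
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four responses sampled for one prompt
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Responses that score above the group mean receive a positive advantage and those below receive a negative one, so no separate value model is needed to form the baseline. When all rewards in a group are equal, every advantage is zero and that group contributes no learning signal.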