---
license: apache-2.0
base_model: Qwen/Qwen2.5-7B-Instruct
tags:
- reinforcement-learning
- GRPO
- reasoning
- education
- qwen2.5
language:
- en
- ko
pipeline_tag: text-generation
library_name: transformers
---

# Special-R1-Qwen2.5-7B-NoThink

A reasoning-enhanced language model fine-tuned from Qwen2.5-7B-Instruct using GRPO (Group Relative Policy Optimization) for special education applications.

## Model Description

This model is trained to provide direct, concise answers without explicit chain-of-thought reasoning steps (NoThink variant). It focuses on generating accurate responses efficiently.

- **Base Model**: [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct)
- **Training Method**: GRPO (Group Relative Policy Optimization)
- **Training Steps**: 300
- **Focus**: Direct answer generation without verbose reasoning

## Training Details

### Training Configuration

- **Framework**: veRL (Volcano Engine Reinforcement Learning)
- **Algorithm**: GRPO
- **Batch Size**: Configured for 4x GPU setup
- **Precision**: bfloat16

### Training Data

- Educational reasoning tasks
- Mathematical problem solving
- General knowledge Q&A

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "OpenLearnLM/special-r1-qwen2.5-7b-nothink"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

messages = [
    {"role": "user", "content": "What is the capital of France?"}
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=512)
response = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(response)
```

## Model Variants

| Model | Description |
|-------|-------------|
| **special-r1-qwen2.5-7b-nothink** (this) | Direct answers without explicit reasoning |
| special-r1-qwen2.5-7b-think | With chain-of-thought reasoning |

## Limitations

- Trained primarily on English and Korean data
- May not perform optimally on highly specialized domains outside training distribution
- As an early checkpoint (step 300), performance may improve with continued training

## Citation

If you use this model, please cite:

```bibtex
@misc{openlearnlm2025special,
  title={Special-R1: Reasoning Models for Education},
  author={OpenLearnLM Team},
  year={2025},
  publisher={HuggingFace},
  url={https://huggingface.co/OpenLearnLM/special-r1-qwen2.5-7b-nothink}
}
```

## License

This model is released under the Apache 2.0 License, following the base model's license.
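
## Appendix: GRPO Advantage Sketch

For readers unfamiliar with the GRPO training method named under Training Details, the core idea can be illustrated as follows: several responses are sampled for the same prompt, each is scored, and every score is normalized against the mean and spread of its own group. The snippet below is a minimal sketch of that normalization only. It is not the training code used for this model; the function name `group_relative_advantages`, the `eps` constant, the use of the population standard deviation, and the example rewards are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize the rewards of one prompt's sampled responses.

    Hypothetical helper: each reward is centered on the group mean and
    divided by the group's population standard deviation (an assumption;
    some implementations use the sample standard deviation instead).
    `eps` avoids division by zero when every reward in the group is equal.
    """
    if not rewards:
        return []
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Made-up rewards for four responses sampled from a single prompt.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Responses that score above their group's average receive a positive advantage and those below receive a negative one, so no separately trained value model is needed to provide a baseline.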