---
base_model: qwen3-14b
datasets:
- math
- reasoning
language: en
license: apache-2.0
pipeline_tag: text-generation
tags:
- text-generation
- math-reasoning
- transferability
- RL-GRPO
- research-paper
- qwen
arxiv: 2507.00432
library_name: transformers
---

# UniReason-Qwen3-14B-RL

This model is associated with the research paper:

**"Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning"**

📄 **Paper**: [2507.00432](https://arxiv.org/abs/2507.00432)

💻 **Code**: [https://github.com/ReasoningTransfer/Transferability-of-LLM-Reasoning](https://github.com/ReasoningTransfer/Transferability-of-LLM-Reasoning)

## Abstract

Math reasoning has become the poster child of progress in large language models (LLMs), with new models rapidly surpassing human-level performance on benchmarks like MATH and AIME. But as math leaderboards improve week by week, it is worth asking: do these gains reflect broader problem-solving ability or just narrow overfitting?

## Model Description

This model is an **RL-GRPO**-tuned version of Qwen3-14B focused on **math-reasoning** capabilities. It was developed as part of research investigating how well mathematical reasoning skills transfer to general language tasks.

### Key Research Questions Addressed:

- Does math reasoning training improve general LLM capabilities?
- How do different training methods (RL vs. SFT) affect transferability?
- What is the trade-off between specialized math performance and general capabilities?

## Model Details

- **Base Model**: Qwen3-14B
- **Training Method**: RL-GRPO
- **Primary Focus**: math reasoning
- **Training Data**: math-specific datasets
- **Architecture**: Transformer-based language model
- **Parameters**: 14B

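For quick verification, the architecture and parameter count can be read off the published checkpoint with standard `transformers` calls. A minimal sketch (the config field names follow the usual Hugging Face causal-LM config; loading the full weights in fp16 needs roughly 28 GB of memory):

```python
from transformers import AutoConfig, AutoModelForCausalLM

model_name = "ReasoningTransferability/UniReason-Qwen3-14B-RL"

# Inspect the architecture without downloading the weights
config = AutoConfig.from_pretrained(model_name)
print(config.model_type, config.num_hidden_layers, config.hidden_size)

# Counting parameters requires loading the checkpoint itself
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")
print(f"{model.num_parameters() / 1e9:.1f}B parameters")
```
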
## Training Details

### Training Method: RL-GRPO

RL-GRPO here refers to reinforcement learning with GRPO (Group Relative Policy Optimization); see the paper for the full training recipe and hyperparameters.

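For orientation, the defining step of GRPO is a group-relative advantage: several responses are sampled per prompt, and each response's reward is normalized against the mean and standard deviation of its group, so no learned value model is needed. The sketch below shows only that normalization step with a toy correctness reward; it is illustrative, not the authors' training code.

```python
import torch

def group_relative_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Normalize per-response rewards within each prompt's group of samples.

    rewards: shape (num_prompts, group_size); e.g. 1.0 for a correct final
    answer and 0.0 otherwise (a toy reward, not the paper's).
    """
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + eps)

# Example: 2 prompts, 4 sampled responses per prompt
rewards = torch.tensor([[1.0, 0.0, 0.0, 1.0],
                        [1.0, 1.0, 1.0, 0.0]])
print(group_relative_advantages(rewards))
# Responses scoring above their group mean get a positive advantage; this
# advantage then weights a PPO-style clipped policy-gradient objective.
```
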
### Datasets Used

- Mathematical reasoning datasets
- See paper for complete dataset list

## Performance

### Math Reasoning Benchmarks

- **MATH**: See paper
- **AIME**: See paper

### General Capabilities

- **General QA**: See paper
- **Code Generation**: See paper
- **Instruction Following**: See paper

*For detailed performance metrics, please refer to the paper.*

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load model and tokenizer
model_name = "ReasoningTransferability/UniReason-Qwen3-14B-RL"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Example: math reasoning
math_prompt = "Solve this step by step: What is the derivative of x^3 + 2x^2 - 5x + 1?"
inputs = tokenizer(math_prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.7)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)

# Example: general reasoning
general_prompt = "Explain the concept of supply and demand in economics."
inputs = tokenizer(general_prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.7)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```

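Qwen3-based checkpoints are usually prompted through the tokenizer's chat template rather than raw strings. A minimal sketch, assuming this repository retains the base model's chat template (reusing `model` and `tokenizer` from above):

```python
messages = [
    {"role": "user", "content": "Solve step by step: what is the derivative of x^3 + 2x^2 - 5x + 1?"}
]

# Render the conversation with the chat template and append the assistant prefix
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.7)

# Decode only the newly generated tokens
new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```
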
## Limitations and Biases

- **Specialization Trade-offs**: As explored in the paper, models optimized for math reasoning may show reduced performance on general tasks
- **Training Method Dependencies**: Performance characteristics vary significantly between RL and SFT training approaches
- **Domain Transfer**: The extent of capability transfer from math to other domains is limited
- **Computational Requirements**: The model requires significant computational resources for inference

## Research Findings

Key findings from the associated paper:

1. **RL vs SFT**: RL-tuned models show better transfer to general domains compared to SFT-tuned models
2. **Capability Trade-offs**: Most math-specialized models fail to transfer gains to other domains
3. **Forgetting**: SFT-tuned models often forget general capabilities during math-focused training

## Ethical Considerations

- This model is intended for research purposes
- Users should be aware of potential biases in mathematical and general reasoning
- The model should not be used for making critical decisions without human oversight
- Consider the environmental impact of large model inference

## Citation

If you use this model in your research, please cite both the model and the associated paper:

```bibtex
@misc{huan2025doesmathreasoningimprove,
      title={Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning},
      author={Maggie Huan and Yuetai Li and Tuney Zheng and Xiaoyu Xu and Seungone Kim and Minxin Du and Radha Poovendran and Graham Neubig and Xiang Yue},
      year={2025},
      eprint={2507.00432},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2507.00432},
}
```

## Contact

For questions about this model or the associated research, please:

- Open an issue in this repository
- Contact the paper authors
- Reference the original paper: https://arxiv.org/abs/2507.00432

## Acknowledgments

This work builds upon the research presented in "Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning" and uses the Qwen3-14B architecture as its foundation.

---

*Model uploaded on 2025-07-03*