Initialize project; model provided by the ModelHub XC community
Model: ypwang61/One-Shot-RLVR-Qwen2.5-Math-1.5B-1.2k-dsr-sub
Source: Original Platform
13
README.md
Normal file
@@ -0,0 +1,13 @@
---
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
base_model:
- Qwen/Qwen2.5-Math-1.5B
datasets:
- ypwang61/One-Shot-RLVR-Datasets
---

This repository contains the model presented in [Reinforcement Learning for Reasoning in Large Language Models with One Training Example](https://huggingface.co/papers/2504.20571).

Code: https://github.com/ypwang61/One-Shot-RLVR