Model: ypwang61/One-Shot-RLVR-Qwen2.5-Math-1.5B-1.2k-dsr-sub Source: Original Platform
| license | library_name | pipeline_tag | base_model | datasets |
|---|---|---|---|---|
| apache-2.0 | transformers | text-generation | | |
This repository contains the model presented in the paper "Reinforcement Learning for Reasoning in Large Language Models with One Training Example".
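Since the metadata lists `transformers` as the library and `text-generation` as the pipeline tag, the model can presumably be loaded with the standard causal-LM classes. The following is a minimal sketch under that assumption, not an official usage recipe: the `build_prompt` helper and its step-by-step instruction are hypothetical (the card does not specify a prompt format), and `max_new_tokens=512` is an arbitrary choice.

```python
MODEL_ID = "ypwang61/One-Shot-RLVR-Qwen2.5-Math-1.5B-1.2k-dsr-sub"


def build_prompt(question: str) -> str:
    """Hypothetical prompt format; the model card does not prescribe one."""
    return (
        f"{question}\n"
        "Please reason step by step, and put your final answer within \\boxed{}."
    )


def generate(question: str, max_new_tokens: int = 512) -> str:
    """Load the checkpoint and return the text generated for one question."""
    # Imported lazily so the helpers above work without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    inputs = tokenizer(build_prompt(question), return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Keep only the newly generated tokens, dropping the echoed prompt.
    prompt_length = inputs["input_ids"].shape[1]
    return tokenizer.decode(output_ids[0][prompt_length:], skip_special_tokens=True)
```

Calling `generate("What is 12 * 13?")` downloads the weights on first use and runs on CPU by default; moving the model and inputs to a GPU is left out to keep the sketch short.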
Description