---
library_name: transformers
license: mit
---

# Model Card for Model ID

This is a saved checkpoint from fine-tuning [Qwen/Qwen3-1.7B-Base](https://huggingface.co/Qwen/Qwen3-1.7B-Base) with the MaxRL objective from [**"Maximum Likelihood Reinforcement Learning"**](https://arxiv.org/abs/2602.02710). In that work, we introduce MaxRL, a framework for optimizing maximum likelihood in RL settings.

## Model Details

### Model Description

This is the model card of a Qwen/Qwen3-1.7B-Base model fine-tuned using MaxRL.

- **Finetuned from model:** [Qwen/Qwen3-1.7B-Base](https://huggingface.co/Qwen/Qwen3-1.7B-Base)

### Model Sources

- **Repository:** [Official Code Release for the paper "Maximum Likelihood Reinforcement Learning"](https://github.com/tajwarfahim/maxrl)
- **Paper:** [Maximum Likelihood Reinforcement Learning](https://arxiv.org/abs/2602.02710)
- **Project Website:** [Project Website](https://zanette-labs.github.io/MaxRL/)

## Training Details

### Training Data

We train on the [POLARIS-53K](https://huggingface.co/datasets/POLARIS-Project/Polaris-Dataset-53K) dataset to produce this checkpoint.

### Training Procedure

To reproduce the training of this checkpoint, use the [given script](https://github.com/tajwarfahim/maxrl/blob/main/qwen3_experiments/run_qwen3_training.sh) or, more generally, the published [codebase](https://github.com/tajwarfahim/maxrl). Hyperparameters and other details are provided in the training script. Due to computational constraints, we trained for 1000 steps and release the final checkpoint.

#### Hardware

This model was fine-tuned on 32 NVIDIA H200 GPUs (4 nodes with 8 H200 GPUs each).

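## How to Get Started with the Model

Since the card lists `transformers` as the library, the checkpoint can presumably be loaded like any other causal language model. The following is a minimal sketch under stated assumptions, not an official usage recipe: this card does not state the checkpoint's Hub ID, so `MODEL_ID` uses the base model as a stand-in and should be replaced with this repository's ID; the prompt format in `build_prompt` is a hypothetical plain-text template and not necessarily the one used during training; greedy decoding and the `max_new_tokens` value are arbitrary choices.

```python
"""Sketch: load a causal-LM checkpoint with transformers and generate one completion."""

# Placeholder (assumption): the checkpoint's Hub ID is not given in this card,
# so the base model is used as a stand-in. Replace with this repository's ID.
MODEL_ID = "Qwen/Qwen3-1.7B-Base"


def build_prompt(question: str) -> str:
    """Wrap a question in a plain-text prompt.

    This format is an assumption for illustration, not the training template.
    """
    return f"Question: {question.strip()}\nAnswer:"


def generate(question: str, model_id: str = MODEL_ID, max_new_tokens: int = 256) -> str:
    """Load the model and return the generated continuation for one question."""
    # Imported lazily so build_prompt works without the heavy dependency.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    model.eval()

    inputs = tokenizer(build_prompt(question), return_tensors="pt").to(model.device)
    # Greedy decoding keeps the example deterministic.
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)

    # Drop the prompt tokens so only the newly generated text is returned.
    new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `generate("What is the sum of the first 10 positive integers?")` downloads the weights on first use and returns the model's continuation as a string. The training script in the repository is the authoritative source for the prompt template and sampling settings.
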
## Citation

**BibTeX:**

```
@misc{tajwar2026maximumlikelihoodreinforcementlearning,
      title={Maximum Likelihood Reinforcement Learning},
      author={Fahim Tajwar and Guanning Zeng and Yueer Zhou and Yuda Song and Daman Arora and Yiding Jiang and Jeff Schneider and Ruslan Salakhutdinov and Haiwen Feng and Andrea Zanette},
      year={2026},
      eprint={2602.02710},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2602.02710},
}
```

## Model Card Contact

[Fahim Tajwar](mailto:tajwarfahim932@gmail.com)