---
base_model: UCL-CSSB/PlasmidGPT-SFT
library_name: transformers
model_name: PlasmidGPT-RL
tags:
- generated_from_trainer
- grpo
- trl
- plasmid
- biology
- dna
license: mit
---

# PlasmidGPT-RL

This model is a fine-tuned version of [UCL-CSSB/PlasmidGPT-SFT](https://huggingface.co/UCL-CSSB/PlasmidGPT-SFT) trained with Group Relative Policy Optimization (GRPO).

## Model Description

PlasmidGPT-RL is trained to generate functional plasmid DNA sequences. It was fine-tuned using reinforcement learning with a reward model that evaluates:

- Presence of valid origins of replication (OriV)
- Presence of antibiotic resistance genes (ARGs)
- Absence of problematic repeat sequences

## Training

This model was trained with GRPO using the [TRL library](https://github.com/huggingface/trl).

**Training run**: [Weights & Biases](https://wandb.ai/ucl-cssb/PlasmidRL/runs/4e783zua)

### Training Details

- **Base model**: UCL-CSSB/PlasmidGPT-SFT
- **Method**: GRPO (Group Relative Policy Optimization)
- **Checkpoint**: 800 steps

## Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("McClain/PlasmidGPT-RL")
model = AutoModelForCausalLM.from_pretrained("McClain/PlasmidGPT-RL")

# Generate a plasmid sequence from a short seed
prompt = "ATG"
inputs = tokenizer(prompt, return_tensors="pt")

outputs = model.generate(
    **inputs,  # passes input_ids and attention_mask together
    max_new_tokens=256,
    do_sample=True,
    temperature=0.95,
    top_p=0.9,
)

sequence = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(sequence)
```

## Framework Versions

- TRL: 0.23.1
- Transformers: 4.57.0
- PyTorch: 2.8.0

## Citation

If you use this model, please cite the GRPO paper:

```bibtex
@article{shao2024deepseekmath,
  title={{DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models}},
  author={Zhihong Shao and Peiyi Wang and Qihao Zhu and Runxin Xu and Junxiao Song and Mingchuan Zhang and Y. K. Li and Y. Wu and Daya Guo},
  year={2024},
  eprint={2402.03300},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```
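The three reward criteria listed under Model Description (OriV presence, ARG presence, absence of repeats) can be sketched as a simple motif-and-repeat check. This is a minimal illustration only: the actual reward model used in training is not described in this card, and the motif strings, the repeat length `k`, and the `+1`-per-criterion scoring below are all hypothetical placeholders.

```python
# Hypothetical motif fragments standing in for OriV / ARG detection.
# The real reward model's criteria are not published in this card.
ORIV_MOTIFS = ["TTATCCACA"]  # illustrative OriV-like fragment
ARG_MOTIFS = ["ATGAGTATT"]   # illustrative resistance-gene fragment


def has_long_repeat(seq: str, k: int = 20) -> bool:
    """True if any k-mer occurs more than once (a crude repeat check)."""
    seen = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in seen:
            return True
        seen.add(kmer)
    return False


def reward(seq: str) -> float:
    """Score a sequence: +1 per satisfied criterion, giving a value in [0, 3]."""
    score = 0.0
    if any(m in seq for m in ORIV_MOTIFS):
        score += 1.0  # origin of replication present
    if any(m in seq for m in ARG_MOTIFS):
        score += 1.0  # antibiotic resistance gene present
    if not has_long_repeat(seq):
        score += 1.0  # no problematic repeats
    return score


print(reward("TTATCCACA" + "ATGAGTATT"))  # → 3.0
```

In GRPO, a scalar reward of this shape would be computed per sampled completion and then normalized within each sampled group to form the advantage signal.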