---
base_model: UCL-CSSB/PlasmidGPT-SFT
library_name: transformers
model_name: PlasmidGPT-RL
tags:
- generated_from_trainer
- grpo
- trl
- plasmid
- biology
- dna
license: mit
---

# PlasmidGPT-RL

This model is a fine-tuned version of [UCL-CSSB/PlasmidGPT-SFT](https://huggingface.co/UCL-CSSB/PlasmidGPT-SFT) trained with Group Relative Policy Optimization (GRPO).

## Model Description

PlasmidGPT-RL is trained to generate functional plasmid DNA sequences. It was fine-tuned using reinforcement learning with a reward model that evaluates:

- Presence of valid origins of replication (OriV)
- Presence of antibiotic resistance genes (ARGs)
- Absence of problematic repeat sequences

## Training

This model was trained with GRPO using the [TRL library](https://github.com/huggingface/trl).

**Training run**: [Weights & Biases](https://wandb.ai/ucl-cssb/PlasmidRL/runs/4e783zua)

### Training Details

- **Base model**: UCL-CSSB/PlasmidGPT-SFT
- **Method**: GRPO (Group Relative Policy Optimization)
- **Checkpoint**: 800 steps

## Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("McClain/PlasmidGPT-RL")
model = AutoModelForCausalLM.from_pretrained("McClain/PlasmidGPT-RL")

# Generate a plasmid sequence from a short seed
prompt = "ATG"
inputs = tokenizer(prompt, return_tensors="pt")

outputs = model.generate(
    **inputs,  # passes input_ids and attention_mask together
    max_new_tokens=256,
    do_sample=True,
    temperature=0.95,
    top_p=0.9,
)

sequence = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(sequence)
```

## Framework Versions

- TRL: 0.23.1
- Transformers: 4.57.0
- PyTorch: 2.8.0

## Citation

If you use this model, please cite the GRPO paper:

```bibtex
@article{shao2024deepseekmath,
  title={{DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models}},
  author={Zhihong Shao and Peiyi Wang and Qihao Zhu and Runxin Xu and Junxiao Song and Mingchuan Zhang and Y. K. Li and Y. Wu and Daya Guo},
  year={2024},
  eprint={2402.03300},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```
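The three reward criteria listed under Model Description (OriV presence, ARG presence, absence of repeats) can be sketched as a simple motif-and-repeat check. This is a minimal illustration only: the actual reward model used in training is not described in this card, and the motif strings, the repeat length `k`, and the `+1`-per-criterion scoring below are all hypothetical placeholders.

```python
# Hypothetical motif fragments standing in for OriV / ARG detection.
# The real reward model's criteria are not published in this card.
ORIV_MOTIFS = ["TTATCCACA"]  # illustrative OriV-like fragment
ARG_MOTIFS = ["ATGAGTATT"]   # illustrative resistance-gene fragment


def has_long_repeat(seq: str, k: int = 20) -> bool:
    """True if any k-mer occurs more than once (a crude repeat check)."""
    seen = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in seen:
            return True
        seen.add(kmer)
    return False


def reward(seq: str) -> float:
    """Score a sequence: +1 per satisfied criterion, giving a value in [0, 3]."""
    score = 0.0
    if any(m in seq for m in ORIV_MOTIFS):
        score += 1.0  # origin of replication present
    if any(m in seq for m in ARG_MOTIFS):
        score += 1.0  # antibiotic resistance gene present
    if not has_long_repeat(seq):
        score += 1.0  # no problematic repeats
    return score


print(reward("TTATCCACA" + "ATGAGTATT"))  # → 3.0
```

In GRPO, a scalar reward of this shape would be computed per sampled completion and then normalized within each sampled group to form the advantage signal.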