---
base_model: UCL-CSSB/PlasmidGPT-SFT
library_name: transformers
model_name: PlasmidGPT-RL
tags:
- generated_from_trainer
- grpo
- trl
- plasmid
- biology
- dna
license: mit
---

# PlasmidGPT-RL

This model is [UCL-CSSB/PlasmidGPT-SFT](https://huggingface.co/UCL-CSSB/PlasmidGPT-SFT) fine-tuned with Group Relative Policy Optimization (GRPO).

## Model Description

PlasmidGPT-RL is trained to generate functional plasmid DNA sequences. It was fine-tuned using reinforcement learning with a reward model that evaluates:
- Presence of valid origins of replication (OriV)
- Presence of antibiotic resistance genes (ARGs)
- Absence of problematic repeat sequences
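
The three criteria above can be pictured as a simple scoring function. The sketch below is only an illustration and is not the reward model used for training: the function names, the equal weights, the substring matching, and the motif strings passed in are all hypothetical. A real pipeline would more plausibly detect origins and resistance genes with an annotation tool than with exact string matches.

```python
# Illustrative only: names, weights and motif matching are hypothetical
# placeholders, not the reward model used to train PlasmidGPT-RL.

def max_kmer_count(seq: str, k: int = 12) -> int:
    """Return how often the most frequent k-mer occurs in seq (0 if seq is shorter than k)."""
    counts: dict[str, int] = {}
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        counts[kmer] = counts.get(kmer, 0) + 1
    return max(counts.values(), default=0)


def toy_reward(seq: str, ori_motifs: list[str], arg_motifs: list[str], k: int = 12) -> float:
    """Score a sequence: +1 for an origin motif, +1 for a resistance motif, -1 for a repeated k-mer."""
    seq = seq.upper()
    has_ori = any(motif in seq for motif in ori_motifs)
    has_arg = any(motif in seq for motif in arg_motifs)
    has_repeat = max_kmer_count(seq, k) > 1
    return float(has_ori) + float(has_arg) - float(has_repeat)


# Toy inputs: "CCCC" and "GGGG" stand in for real origin / resistance sequences.
print(toy_reward("AAAACCCCGGGGTTTT", ["CCCC"], ["GGGG"], k=8))
```

Exact k-mer counting only catches perfect direct repeats. Inverted or approximate repeats would need a different check.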

## Training

This model was trained with GRPO using the [TRL library](https://github.com/huggingface/trl).

**Training run**: [Weights & Biases](https://wandb.ai/ucl-cssb/PlasmidRL/runs/4e783zua)

### Training Details
- **Base model**: UCL-CSSB/PlasmidGPT-SFT
- **Method**: GRPO (Group Relative Policy Optimization)
- **Checkpoint**: 800 steps
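
For readers unfamiliar with GRPO, its central idea is to score each sampled completion relative to the other completions drawn for the same prompt, instead of using a learned value function. The following is a minimal sketch of that normalization step only. The function name, the use of the population standard deviation, and the `eps` value are assumptions made for illustration, and the snippet is not taken from this model's training code or from TRL.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-4) -> list[float]:
    """Normalize rewards within one group of completions sampled for the same prompt.

    Each advantage is (reward - group mean) / (group std + eps). The eps term
    (an assumed value) avoids division by zero when all rewards are equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four hypothetical completions for one prompt, scored by some reward function.
advantages = group_relative_advantages([2.0, 1.0, 0.0, 1.0])
print(advantages)
```

Completions that score above the group mean receive a positive advantage and are reinforced, and those below the mean are discouraged.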

## Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("McClain/PlasmidGPT-RL")
model = AutoModelForCausalLM.from_pretrained("McClain/PlasmidGPT-RL")

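# Generate a plasmid sequence from a short DNA prompt
prompt = "ATG"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    inputs["input_ids"],
    attention_mask=inputs["attention_mask"],  # pass the mask explicitly
    max_new_tokens=256,
    do_sample=True,  # sample instead of decoding greedily
    temperature=0.95,
    top_p=0.9,
)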
sequence = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(sequence)
```
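
Before passing a generated sequence to downstream tools, it can help to run a quick sanity check. The helpers below are a hypothetical sketch and not part of this repository. They assume the decoded text may contain whitespace or lowercase letters, which may not be true for this tokenizer, and they accept only the characters A, C, G, T and N.

```python
def clean_dna(text: str) -> str:
    """Remove whitespace and uppercase the text; raise if non-ACGTN characters remain."""
    seq = "".join(text.split()).upper()
    invalid = set(seq) - set("ACGTN")
    if invalid:
        raise ValueError(f"unexpected characters: {sorted(invalid)}")
    return seq


def gc_content(seq: str) -> float:
    """Return the fraction of G and C bases in seq (0.0 for an empty sequence)."""
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


# Stand-in for the decoded model output from the usage example.
example = clean_dna("atg c\nGG")
print(example, gc_content(example))
```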

## Framework Versions

- TRL: 0.23.1
- Transformers: 4.57.0
- PyTorch: 2.8.0

## Citation

If you use this model, please cite the GRPO paper:

```bibtex
@article{shao2024deepseekmath,
    title={{DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models}},
    author={Zhihong Shao and Peiyi Wang and Qihao Zhu and Runxin Xu and Junxiao Song and Mingchuan Zhang and Y. K. Li and Y. Wu and Daya Guo},
    year={2024},
    eprint={2402.03300},
    archivePrefix={arXiv},
}
```