| base_model | library_name | model_name | tags | license |
|---|---|---|---|---|
| UCL-CSSB/PlasmidGPT-SFT | transformers | PlasmidGPT-RL | | mit |
# PlasmidGPT-RL
This model is a fine-tuned version of UCL-CSSB/PlasmidGPT-SFT using Group Relative Policy Optimization (GRPO).
## Model Description
PlasmidGPT-RL is trained to generate functional plasmid DNA sequences. It was fine-tuned with reinforcement learning against a reward model that evaluates the following criteria (a sketch follows the list):
- Presence of valid origins of replication (OriV)
- Presence of antibiotic resistance genes (ARGs)
- Absence of problematic repeat sequences
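The reward model itself is not published in this card. As a rough sketch of what a reward combining the three criteria above could look like, the following hypothetical Python function is illustrative only: the `ORIV_MOTIF` and `ARG_MOTIF` strings, the `repeat_penalty` helper, and the equal weighting are all made-up placeholders, not the actual implementation.

```python
# Hypothetical reward sketch -- the real OriV/ARG/repeat checks are not
# published in this card; the motifs and weights below are placeholders.
ORIV_MOTIF = "TTATCCACA"            # placeholder origin-of-replication motif, assumed
ARG_MOTIF = "ATGAGTATTCAACATTTCCG"  # placeholder resistance-gene fragment, assumed

def repeat_penalty(seq: str, k: int = 20) -> float:
    """Penalize sequences containing any exact k-mer more than once (direct repeats)."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    return 1.0 if len(kmers) != len(set(kmers)) else 0.0

def reward(seq: str) -> float:
    score = 0.0
    if ORIV_MOTIF in seq:   # valid origin of replication present?
        score += 1.0
    if ARG_MOTIF in seq:    # antibiotic resistance gene present?
        score += 1.0
    score -= repeat_penalty(seq)  # problematic repeats absent?
    return score

print(reward("ATG" + ORIV_MOTIF + "GGC"))  # 1.0: OriV present, no ARG, no repeats
```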
## Training
This model was trained with GRPO using the TRL library.
Training run: Weights & Biases
## Training Details
- Base model: UCL-CSSB/PlasmidGPT-SFT
- Method: GRPO (Group Relative Policy Optimization)
- Checkpoint: 800 steps
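For orientation, here is a minimal sketch of how a GRPO run against this base model could be launched with TRL's `GRPOTrainer`. The toy prompt dataset and `reward_fn` below are assumptions for illustration; they are not the actual training data or reward model.

```python
from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer

# Toy prompt dataset -- the real training prompts are not published here.
train_dataset = Dataset.from_dict({"prompt": ["ATG", "GGC", "TTA"]})

def reward_fn(completions, **kwargs):
    # Placeholder reward: the actual model scores OriV/ARG presence and
    # repeat content; this stub just rewards longer completions.
    return [min(len(c) / 256, 1.0) for c in completions]

config = GRPOConfig(output_dir="PlasmidGPT-RL", max_steps=800)
trainer = GRPOTrainer(
    model="UCL-CSSB/PlasmidGPT-SFT",  # base model from this card
    reward_funcs=reward_fn,
    args=config,
    train_dataset=train_dataset,
)
trainer.train()
```

GRPO samples a group of completions per prompt and normalizes rewards within each group, so a scalar reward per completion is all the reward function needs to return.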
## Usage
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("McClain/PlasmidGPT-RL")
model = AutoModelForCausalLM.from_pretrained("McClain/PlasmidGPT-RL")

# Generate a plasmid sequence from a short seed
prompt = "ATG"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    inputs.input_ids,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.95,
    top_p=0.9,
)

sequence = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(sequence)
```
## Framework Versions
- TRL: 0.23.1
- Transformers: 4.57.0
- PyTorch: 2.8.0
## Citation
If you use this model, please cite the GRPO paper:
```bibtex
@article{shao2024deepseekmath,
  title         = {{DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models}},
  author        = {Zhihong Shao and Peiyi Wang and Qihao Zhu and Runxin Xu and Junxiao Song and Mingchuan Zhang and Y. K. Li and Y. Wu and Daya Guo},
  year          = {2024},
  eprint        = {2402.03300},
  archivePrefix = {arXiv}
}
```