---
base_model: UCL-CSSB/PlasmidGPT-SFT
library_name: transformers
model_name: PlasmidGPT-RL
tags:
- generated_from_trainer
- grpo
- trl
- plasmid
- biology
- dna
license: mit
---

# PlasmidGPT-RL

This model is a fine-tuned version of [UCL-CSSB/PlasmidGPT-SFT](https://huggingface.co/UCL-CSSB/PlasmidGPT-SFT) using Group Relative Policy Optimization (GRPO).

## Model Description

PlasmidGPT-RL is trained to generate functional plasmid DNA sequences. It was fine-tuned using reinforcement learning with a reward model that evaluates:
- Presence of valid origins of replication (OriV)
- Presence of antibiotic resistance genes (ARGs)
- Absence of problematic repeat sequences
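
The three criteria above could be combined into a scalar reward along the following lines. This is a minimal sketch, not the reward model used to train PlasmidGPT-RL: the function names, the equal weighting, the k-mer size, and the assumption that OriV and ARG presence arrive as booleans from an external annotation step are all hypothetical.

```python
from collections import Counter


def repeat_fraction(sequence: str, k: int = 12) -> float:
    """Fraction of k-mers that occur more than once (hypothetical repeat metric)."""
    seq = sequence.upper()
    if len(seq) < k:
        return 0.0
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    counts = Counter(kmers)
    # Count every occurrence of any k-mer that appears at least twice.
    repeated = sum(c for c in counts.values() if c > 1)
    return repeated / len(kmers)


def toy_reward(sequence: str, has_oriv: bool, has_arg: bool, k: int = 12) -> float:
    """Average three scores in [0, 1]: OriV present, ARG present, low repeat content.

    `has_oriv` and `has_arg` are assumed to come from an external annotation
    tool; this sketch does not detect those features itself.
    """
    repeat_score = 1.0 - repeat_fraction(sequence, k)
    return (float(has_oriv) + float(has_arg) + repeat_score) / 3.0
```

A sequence with both features and no repeated k-mers scores 1.0 under this toy scheme, while a pure homopolymer run loses the entire repeat component.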

## Training

This model was trained with GRPO using the [TRL library](https://github.com/huggingface/trl).

**Training run**: [Weights & Biases](https://wandb.ai/ucl-cssb/PlasmidRL/runs/4e783zua)

### Training Details
- **Base model**: UCL-CSSB/PlasmidGPT-SFT
- **Method**: GRPO (Group Relative Policy Optimization)
- **Checkpoint**: 800 steps
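
GRPO scores a group of completions sampled from the same prompt and normalizes each reward against the group's mean and standard deviation, so no separate value network is needed. The sketch below illustrates only that normalization step. The function name and the `eps` value are assumptions for illustration; the training run relied on TRL's implementation, whose details may differ.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-4) -> list[float]:
    """Normalize rewards within one group: (r_i - mean) / (std + eps)."""
    mu = mean(rewards)
    # Population standard deviation; an implementation may use the sample version instead.
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]
```

Completions that score above the group average receive positive advantages and are reinforced; those below receive negative advantages. If every completion in a group gets the same reward, all advantages are zero and that group contributes no learning signal.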

## Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("McClain/PlasmidGPT-RL")
model = AutoModelForCausalLM.from_pretrained("McClain/PlasmidGPT-RL")

# Generate a plasmid sequence
prompt = "ATG"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    inputs.input_ids,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.95,
    top_p=0.9
)
sequence = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(sequence)
```
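
Before passing a generated sequence to downstream tools, it can help to normalize it and run basic sanity checks. The helpers below are a sketch under the assumption that the decoded output is plain nucleotide text, possibly with whitespace between tokens. `clean_sequence`, `is_valid_dna`, and `gc_content` are hypothetical names, not part of the model or of `transformers`.

```python
def clean_sequence(decoded: str) -> str:
    """Uppercase the decoded text and drop all whitespace (assumes plain nucleotide output)."""
    return "".join(decoded.split()).upper()


def is_valid_dna(sequence: str) -> bool:
    """True if the sequence is non-empty and contains only A, C, G, T."""
    return bool(sequence) and set(sequence) <= set("ACGT")


def gc_content(sequence: str) -> float:
    """Fraction of G and C bases; 0.0 for an empty sequence."""
    if not sequence:
        return 0.0
    return (sequence.count("G") + sequence.count("C")) / len(sequence)
```

For instance, applying `clean_sequence` to the decoded `sequence` and then checking `is_valid_dna` gives a quick way to reject outputs that contain unexpected characters.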

## Framework Versions

- TRL: 0.23.1
- Transformers: 4.57.0
- PyTorch: 2.8.0
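
To compare a local environment against the versions listed above, a check like the following may be useful. It is only a sketch: the `EXPECTED` mapping restates the list above, the distribution names (`trl`, `transformers`, `torch`) are assumed to be the usual PyPI names, and other versions may well work.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in this model card, keyed by assumed PyPI distribution name.
EXPECTED = {"trl": "0.23.1", "transformers": "4.57.0", "torch": "2.8.0"}


def check_versions(expected: dict) -> dict:
    """Map each package name to (expected, installed); installed is None if missing."""
    report = {}
    for name, wanted in expected.items():
        try:
            # Drop any local build suffix such as "+cu128" before comparing.
            installed = version(name).split("+")[0]
        except PackageNotFoundError:
            installed = None
        report[name] = (wanted, installed)
    return report


if __name__ == "__main__":
    for name, (wanted, installed) in check_versions(EXPECTED).items():
        status = "ok" if installed == wanted else "differs"
        print(f"{name}: expected {wanted}, installed {installed} ({status})")
```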

## Citation

If you use this model, please cite the GRPO paper:

```bibtex
@article{shao2024deepseekmath,
    title={{DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models}},
    author={Zhihong Shao and Peiyi Wang and Qihao Zhu and Runxin Xu and Junxiao Song and Mingchuan Zhang and Y. K. Li and Y. Wu and Daya Guo},
    year={2024},
    eprint={arXiv:2402.03300},
}
```