70 lines
1.9 KiB
Markdown
70 lines
1.9 KiB
Markdown
|
|
---
|
||
|
|
license: apache-2.0
|
||
|
|
library_name: transformers
|
||
|
|
pipeline_tag: text-generation
|
||
|
|
base_model: Huggggooo/ProtoCycle-7B-SFT
|
||
|
|
tags:
|
||
|
|
- protein-design
|
||
|
|
- agentic
|
||
|
|
- tool-use
|
||
|
|
- qwen2.5
|
||
|
|
- reinforcement-learning
|
||
|
|
- grpo
|
||
|
|
language:
|
||
|
|
- en
|
||
|
|
---
|
||
|
|
|
||
|
|
# ProtoCycle-7B
|
||
|
|
|
||
|
|
RL checkpoint for **ProtoCycle** — an agentic protein design model that
|
||
|
|
performs multi-step, tool-augmented sequence design.
|
||
|
|
|
||
|
|
This is the **GRPO-TCR (Group Relative Policy Optimization with Tool-Call
|
||
|
|
Reward) stage**, initialised from the SFT checkpoint
|
||
|
|
[`Huggggooo/ProtoCycle-7B-SFT`](https://huggingface.co/Huggggooo/ProtoCycle-7B-SFT).
|
||
|
|
|
||
|
|
- Base model: `Huggggooo/ProtoCycle-7B-SFT`
|
||
|
|
(itself fine-tuned from `Qwen/Qwen2.5-7B-Instruct`)
|
||
|
|
- Training framework: [VeRL](https://github.com/volcengine/verl) /
|
||
|
|
[Open-AgentRL](https://github.com/Gen-Verse/Open-AgentRL)
|
||
|
|
- Stage: agentic RL with GRPO-TCR
|
||
|
|
- Rollouts per prompt: 8, max turns: 16
|
||
|
|
- Max prompt / response: 8k / 20k tokens
|
||
|
|
- Reward manager: `protein` (see
|
||
|
|
[ProtoCycle/verl/workers/reward_manager/protein.py](https://github.com/huggggoooooo/ProtoCycle/blob/main/verl/workers/reward_manager/protein.py))
|
||
|
|
|
||
|
|
|
||
|
|
See
|
||
|
|
[`recipe/protein/reward.py`](https://github.com/huggggoooooo/ProtoCycle/blob/main/recipe/protein/reward.py)
|
||
|
|
for the exact formulation.
|
||
|
|
|
||
|
|
## Training Data
|
||
|
|
|
||
|
|
10,000 RL prompts for GRPO-TCR training, available at
|
||
|
|
[Huggggooo/ProtoCycle-Data](https://huggingface.co/datasets/Huggggooo/ProtoCycle-Data) (`rl/` subset).}
|
||
|
|
|
||
|
|
## Agent Protocol
|
||
|
|
|
||
|
|
```
|
||
|
|
<think> ... reasoning ... </think>
|
||
|
|
<plan> ... stage plan ... </plan>
|
||
|
|
<tool_call>{"name": "...", "arguments": {...}}</tool_call>
|
||
|
|
...
|
||
|
|
<answer>MAEGEITPLKTF...</answer>
|
||
|
|
```
|
||
|
|
|
||
|
|
## How to Use
|
||
|
|
|
||
|
|
See the ProtoCycle repository:
|
||
|
|
[ProtoCycle](https://github.com/huggggoooooo/ProtoCycle) repo.
|
||
|
|
|
||
|
|
|
||
|
|
## License
|
||
|
|
|
||
|
|
Apache-2.0.
|
||
|
|
|
||
|
|
## Citation
|
||
|
|
|
||
|
|
If you find this work useful, please cite ProtoCycle (forthcoming) and the
|
||
|
|
upstream frameworks: VeRL, Open-AgentRL, ProTrek, ESM.
|