ProtoCycle-7B-SFT/README.md

---
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
base_model: Qwen/Qwen2.5-7B-Instruct
tags:
  - protein-design
  - agentic
  - tool-use
  - qwen2.5
  - sft
language:
  - en
---

# ProtoCycle-7B-SFT

Cold-start SFT checkpoint for **ProtoCycle** — an agentic protein design model
trained to invoke biology tools (scaffold retrieval, constraint building,
ESM inpainting, ProTrek scoring) via a `<think> / <plan> / <tool_call> /
<answer>` protocol.

This checkpoint is the **SFT stage** initialised from
[`Qwen/Qwen2.5-7B-Instruct`](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct)
and is the starting point for the subsequent RL stage
([`Huggggooo/ProtoCycle-7B`](https://huggingface.co/Huggggooo/ProtoCycle-7B)).

- Base model: `Qwen/Qwen2.5-7B-Instruct`
- Training framework: [VeRL](https://github.com/volcengine/verl) /
  [Open-AgentRL](https://github.com/Gen-Verse/Open-AgentRL)
- Stage: multi-turn SFT on agentic tool-use trajectories
- Epochs: 5
- Sequence length: 32k (with Ulysses SP=4)

## Training Data

2,000 agentic multi-turn trajectories for protein design, available at
[Huggggooo/ProtoCycle-Data](https://huggingface.co/datasets/Huggggooo/ProtoCycle-Data) (`sft/` subset).

## How to Use

See the ProtoCycle repository: 
[ProtoCycle](https://github.com/huggggoooooo/ProtoCycle) repo.


## Agent Protocol

```
<think>  ... reasoning ...  </think>
<plan>   ... stage plan ...  </plan>
<tool_call>{"name": "...", "arguments": {...}}</tool_call>
...
<answer>MAEGEITPLKTF...</answer>
```

## Training Data

Agentic multi-turn trajectories for protein design (not released here).

## License

Apache-2.0, consistent with the upstream
[VeRL](https://github.com/volcengine/verl) /
[Open-AgentRL](https://github.com/Gen-Verse/Open-AgentRL) projects and the
underlying [Qwen2.5](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) license.

## Citation

If you find this checkpoint useful, please cite the ProtoCycle paper
(forthcoming) and the upstream frameworks it builds on: VeRL, Open-AgentRL,
ProTrek and ESM.
初始化项目，由ModelHub XC社区提供模型 Model: Huggggooo/ProtoCycle-7B-SFT Source: Original Platform 2026-04-22 11:00:44 +08:00			`---`
			`license: apache-2.0`
			`library_name: transformers`
			`pipeline_tag: text-generation`
			`base_model: Qwen/Qwen2.5-7B-Instruct`
			`tags:`
			`- protein-design`
			`- agentic`
			`- tool-use`
			`- qwen2.5`
			`- sft`
			`language:`
			`- en`
			`---`

			`# ProtoCycle-7B-SFT`

			`Cold-start SFT checkpoint for ProtoCycle — an agentic protein design model`
			`trained to invoke biology tools (scaffold retrieval, constraint building,`
			ESM inpainting, ProTrek scoring) via a `<think> / <plan> / <tool_call> /
			<answer>` protocol.

			`This checkpoint is the SFT stage initialised from`
			[`Qwen/Qwen2.5-7B-Instruct`](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct)
			`and is the starting point for the subsequent RL stage`
			([`Huggggooo/ProtoCycle-7B`](https://huggingface.co/Huggggooo/ProtoCycle-7B)).

			- Base model: `Qwen/Qwen2.5-7B-Instruct`
			`- Training framework: [VeRL](https://github.com/volcengine/verl) /`
			`[Open-AgentRL](https://github.com/Gen-Verse/Open-AgentRL)`
			`- Stage: multi-turn SFT on agentic tool-use trajectories`
			`- Epochs: 5`
			`- Sequence length: 32k (with Ulysses SP=4)`

			`## Training Data`

			`2,000 agentic multi-turn trajectories for protein design, available at`
			[Huggggooo/ProtoCycle-Data](https://huggingface.co/datasets/Huggggooo/ProtoCycle-Data) (`sft/` subset).

			`## How to Use`

			`See the ProtoCycle repository:`
			`[ProtoCycle](https://github.com/huggggoooooo/ProtoCycle) repo.`


			`## Agent Protocol`

			```
			`<think> ... reasoning ... </think>`
			`<plan> ... stage plan ... </plan>`
			`<tool_call>{"name": "...", "arguments": {...}}</tool_call>`
			`...`
			`<answer>MAEGEITPLKTF...</answer>`
			```

			`## Training Data`

			`Agentic multi-turn trajectories for protein design (not released here).`

			`## License`

			`Apache-2.0, consistent with the upstream`
			`[VeRL](https://github.com/volcengine/verl) /`
			`[Open-AgentRL](https://github.com/Gen-Verse/Open-AgentRL) projects and the`
			`underlying [Qwen2.5](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) license.`

			`## Citation`

			`If you find this checkpoint useful, please cite the ProtoCycle paper`
			`(forthcoming) and the upstream frameworks it builds on: VeRL, Open-AgentRL,`
			`ProTrek and ESM.`