---
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
base_model: Qwen/Qwen2.5-7B-Instruct
tags:
- protein-design
- agentic
- tool-use
- qwen2.5
- sft
language:
- en
---

# ProtoCycle-7B-SFT

Cold-start SFT checkpoint for **ProtoCycle**, an agentic protein design model trained to invoke biology tools (scaffold retrieval, constraint building, ESM inpainting, ProTrek scoring) via a structured multi-turn protocol (see Agent Protocol). This checkpoint is the **SFT stage**, initialised from [`Qwen/Qwen2.5-7B-Instruct`](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct), and is the starting point for the subsequent RL stage ([`Huggggooo/ProtoCycle-7B`](https://huggingface.co/Huggggooo/ProtoCycle-7B)).

- Base model: `Qwen/Qwen2.5-7B-Instruct`
- Training framework: [VeRL](https://github.com/volcengine/verl) / [Open-AgentRL](https://github.com/Gen-Verse/Open-AgentRL)
- Stage: multi-turn SFT on agentic tool-use trajectories
- Epochs: 5
- Sequence length: 32k tokens (Ulysses sequence parallelism, SP=4)

## Training Data

2,000 agentic multi-turn trajectories for protein design. The data is not bundled with this checkpoint; it is available at [Huggggooo/ProtoCycle-Data](https://huggingface.co/datasets/Huggggooo/ProtoCycle-Data) (`sft/` subset).

## How to Use

See the [ProtoCycle](https://github.com/huggggoooooo/ProtoCycle) repository.

## Agent Protocol

```
... reasoning ...
... stage plan ...
{"name": "...", "arguments": {...}}
...
MAEGEITPLKTF...
```

## License

Apache-2.0, consistent with the upstream [VeRL](https://github.com/volcengine/verl) / [Open-AgentRL](https://github.com/Gen-Verse/Open-AgentRL) projects and the underlying [Qwen2.5](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) license.

## Citation

If you find this checkpoint useful, please cite the ProtoCycle paper (forthcoming) and the upstream frameworks it builds on: VeRL, Open-AgentRL, ProTrek, and ESM.
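
## Example: Extracting a Tool Call

The Agent Protocol section shows tool calls as JSON objects of the form `{"name": "...", "arguments": {...}}` embedded in a model turn. The sketch below is one possible way to pull such an object out of raw generated text. It is not the parser used by the ProtoCycle repository. The function name `extract_tool_call` and the tool name `score_sequence` are hypothetical and serve only as illustration, and the code assumes nothing about the delimiters that surround the JSON.

```python
import json
from typing import Any, Dict, Optional


def extract_tool_call(turn_text: str) -> Optional[Dict[str, Any]]:
    """Return the first JSON object with a "name" key and a dict-valued
    "arguments" key found in turn_text, or None if there is none."""
    decoder = json.JSONDecoder()
    pos = turn_text.find("{")
    while pos != -1:
        try:
            # raw_decode parses one JSON value starting at index pos
            # and ignores whatever text follows it.
            obj, _end = decoder.raw_decode(turn_text, pos)
        except json.JSONDecodeError:
            obj = None
        if (
            isinstance(obj, dict)
            and "name" in obj
            and isinstance(obj.get("arguments"), dict)
        ):
            return obj
        # Not a tool call: try the next opening brace.
        pos = turn_text.find("{", pos + 1)
    return None


# Hypothetical tool name and arguments; the sequence is the truncated
# prefix shown in the protocol example, used here as a plain string.
example_turn = (
    'reasoning text {not json} '
    '{"name": "score_sequence", "arguments": {"sequence": "MAEGEITPLKTF"}} '
    'trailing text'
)
call = extract_tool_call(example_turn)
```

Scanning from each opening brace lets the function skip brace-delimited text that is not valid JSON. For the real tool names and argument schemas, see the ProtoCycle repository.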