
---
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
base_model: Qwen/Qwen2.5-7B-Instruct
tags:
- protein-design
- agentic
- tool-use
- qwen2.5
- sft
language:
- en
---

# ProtoCycle-7B-SFT

Cold-start SFT checkpoint for ProtoCycle, an agentic protein design model trained to invoke biology tools (scaffold retrieval, constraint building, ESM inpainting, ProTrek scoring) via a `<think>` / `<plan>` / `<tool_call>` / `<answer>` protocol.

This checkpoint is the SFT stage initialised from Qwen/Qwen2.5-7B-Instruct and is the starting point for the subsequent RL stage (Huggggooo/ProtoCycle-7B).

- **Base model:** Qwen/Qwen2.5-7B-Instruct
- **Training framework:** VeRL / Open-AgentRL
- **Stage:** multi-turn SFT on agentic tool-use trajectories
- **Epochs:** 5
- **Sequence length:** 32k (with Ulysses SP=4)

## Training Data

2,000 agentic multi-turn trajectories for protein design, available at Huggggooo/ProtoCycle-Data (sft/ subset).

## How to Use

See the ProtoCycle repository: ProtoCycle repo.
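The repository covers full agent rollout; as a minimal illustration, the sketch below assembles a chat message list and a function-calling tool schema of the kind such a model is trained on. The tool name `esm_inpaint`, its parameters, and the system-prompt wording are hypothetical placeholders, not the repository's actual definitions; passing the result to `tokenizer.apply_chat_template` is assumed to follow the standard Qwen2.5 / transformers chat API.

```python
import json

# Hypothetical tool schema in the OpenAI/Qwen function-calling style;
# the real tool names and parameters are defined in the ProtoCycle repo.
ESM_INPAINT_TOOL = {
    "type": "function",
    "function": {
        "name": "esm_inpaint",  # placeholder name
        "description": "Fill masked residues of a scaffold with ESM.",
        "parameters": {
            "type": "object",
            "properties": {
                "scaffold": {"type": "string"},
                "mask": {"type": "string"},
            },
            "required": ["scaffold", "mask"],
        },
    },
}

def build_messages(task: str) -> list[dict]:
    """Assemble chat messages to pass to tokenizer.apply_chat_template."""
    return [
        {"role": "system", "content": "You are a protein design agent. "
         "Respond with <think>, <plan>, <tool_call>, and <answer> blocks."},
        {"role": "user", "content": task},
    ]

messages = build_messages("Design a 120-residue TIM-barrel variant.")
print(json.dumps(messages, indent=2))
```

The message list would then be rendered with the model's chat template and the tool schema injected the way the Qwen2.5 tooling expects; see the repository for the exact rollout loop.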

## Agent Protocol

```
<think>  ... reasoning ...  </think>
<plan>   ... stage plan ...  </plan>
<tool_call>{"name": "...", "arguments": {...}}</tool_call>
...
<answer>MAEGEITPLKTF...</answer>
```
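A driver loop has to pull the JSON out of each `<tool_call>` block and detect the terminal `<answer>`. A minimal stdlib parser for that, assuming exactly the tag layout shown above (the sample completion and tool name are illustrative, not from the released data):

```python
import json
import re

TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)

def parse_turn(completion: str):
    """Extract (tool_calls, answer) from one model completion.

    tool_calls: list of {"name": ..., "arguments": ...} dicts.
    answer: the final sequence string, or None when the model is
    still mid-trajectory and is waiting on a tool response.
    """
    tool_calls = [json.loads(m) for m in TOOL_CALL_RE.findall(completion)]
    answer_match = ANSWER_RE.search(completion)
    answer = answer_match.group(1).strip() if answer_match else None
    return tool_calls, answer

# Illustrative completion following the protocol above.
completion = (
    "<think>Need a scaffold first.</think>"
    "<plan>1) retrieve scaffold 2) score</plan>"
    '<tool_call>{"name": "scaffold_retrieval", '
    '"arguments": {"query": "TIM barrel"}}</tool_call>'
)
calls, answer = parse_turn(completion)
print(calls[0]["name"], answer)  # scaffold_retrieval None
```

When `answer` is `None`, the caller executes the requested tool and appends its result as the next turn; when `answer` is set, the trajectory ends with the designed sequence.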


## License

Apache-2.0, consistent with the upstream VeRL / Open-AgentRL projects and the underlying Qwen2.5 license.

## Citation

If you find this checkpoint useful, please cite the ProtoCycle paper (forthcoming) and the upstream frameworks it builds on: VeRL, Open-AgentRL, ProTrek, and ESM.
