license: apache-2.0
language: en
library_name: transformers
pipeline_tag: text-generation
base_model: ByteDance-Seed/Seed-Coder-8B-Reasoning
tags: code, structural-engineering, openseespy, scientific-modeling, reinforcement-learning, grpo, autobm

AutoBM-Seed-Coder-8B-R

Official model release for the paper "Rethinking Scientific Modeling: Toward Physically Consistent and Simulation-Executable Programmatic Generation".

This model is trained from ByteDance-Seed/Seed-Coder-8B-Reasoning via the RLA-SPC two-stage alignment strategy:

  • Stage I — Domain Instruction Fine-Tuning (SFT) on the CivilInstruct dataset (10,912 samples).
  • Stage II — Self-Play Constraint GRPO (SPC-GRPO) with the Multi-Granularity Hybrid Reward (MGHR), combining format, AST, and OpenSeesPy execution rewards.

The resulting model generates executable, physically consistent OpenSeesPy structural modeling code from natural language building specifications.

BMEval Results

| Model | Pass@1 | Pass@5 | Pass@5_period | Pass@5_compliance | Pass@5_strict | Overall Avg |
|---|---|---|---|---|---|---|
| Seed-Coder-8B-R (baseline) | 11.72 | 21.09 | 0.78 | 3.13 | 0.78 | 6.51 |
| AutoBM-Seed-Coder-8B-R (this model) | 64.18 | 97.28 | 78.05 | 92.47 | 77.14 | 81.95 |
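
This card does not state how the Pass@k columns are computed. For orientation only, a common convention in code-generation benchmarks is the unbiased estimator 1 - C(n-c, k) / C(n, k), where n samples are drawn per task and c of them pass. The sketch below assumes that convention; whether BMEval uses it is an assumption, and `pass_at_k` is a hypothetical helper name, not part of this release.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one task: n samples drawn, c of them passed."""
    if n - c < k:
        # Fewer than k failures exist, so every size-k subset contains a pass.
        return 1.0
    # Probability that a random size-k subset contains no passing sample,
    # subtracted from 1.
    return 1.0 - comb(n - c, k) / comb(n, k)
```

A benchmark-level score under this convention would be the mean of `pass_at_k` over all tasks.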

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "yongqiqng/AutoBM-Seed-Coder-8B-R"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

prompt = '''Generate OpenSeesPy code to model a 5-story reinforced concrete frame building:
- Floor height: 3.5 m
- Bay width: 6 m (3 bays in X, 2 bays in Y)
- Seismic intensity: 0.2g
Compute the fundamental period.'''

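# Wrap the prompt in the chat template; return_dict=True also returns the attention mask.
messages = [{"role": "user", "content": prompt}]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

# Sample a response, then decode only the newly generated tokens (skipping the prompt).
outputs = model.generate(**inputs, max_new_tokens=4096, temperature=0.6, top_p=0.95, do_sample=True)
prompt_len = inputs["input_ids"].shape[1]
print(tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True))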
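
The format reward described under Training Details expects responses shaped as <think>...</think><answer>...</answer>. A minimal sketch for pulling the code out of a decoded response, assuming that layout holds at inference time. The helper name `extract_answer` is hypothetical and not part of the release, and stripping a Markdown fence inside the answer is an assumption about how the model may wrap its code.

```python
import re


def extract_answer(response: str) -> str:
    """Return the text inside <answer>...</answer>, or the whole response if the tag is absent."""
    match = re.search(r"<answer>(.*?)</answer>", response, flags=re.DOTALL)
    body = (match.group(1) if match else response).strip()
    # Assumption: the answer may be wrapped in a Markdown code fence; unwrap it if so.
    fenced = re.match(r"^`{3}(?:python)?\s*\n(.*?)\n`{3}$", body, flags=re.DOTALL)
    return fenced.group(1) if fenced else body
```

The decoded text from the usage snippet can be passed to `extract_answer` before saving the script and running it in an environment with OpenSeesPy installed.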

Training Details

| Stage | Method | Data |
|---|---|---|
| Stage I | Supervised Fine-Tuning | CivilInstruct SFT (9,894 train + 202 val) |
| Stage II | SPC-GRPO with MGHR | CivilInstruct RL (455 train + 57 test) |

The MGHR reward function combines:

  • r_fmt (Format, weight 0.05) — <think>...</think><answer>...</answer> structure
  • r_ast (AST, weight 0.25) — three-tiered OpenSeesPy API coverage
  • r_exec (Execution, weight 0.70) — sandboxed OpenSeesPy execution + period error grading
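
The weights above sum to 1.0, so one plausible reading is a weighted sum of the three components. The sketch below illustrates that reading and is not the released training code. The weighted-sum aggregation, the assumption that each component lies in [0, 1], the regular expression used for the format check, and the names `format_reward` and `mghr` are all assumptions made for illustration.

```python
import re

# Component weights as listed in this model card.
W_FMT, W_AST, W_EXEC = 0.05, 0.25, 0.70

# Loose structural check: a <think> block followed by an <answer> block.
_FORMAT_RE = re.compile(r"\s*<think>.*?</think>\s*<answer>.*?</answer>\s*", re.DOTALL)


def format_reward(response: str) -> float:
    """Return 1.0 if the response starts with <think>...</think> and ends with <answer>...</answer>."""
    return 1.0 if _FORMAT_RE.fullmatch(response) else 0.0


def mghr(r_fmt: float, r_ast: float, r_exec: float) -> float:
    """Weighted sum of the three components (assumed aggregation, scores assumed in [0, 1])."""
    return W_FMT * r_fmt + W_AST * r_ast + W_EXEC * r_exec
```

Under these assumptions, a well-formatted response with half of the AST credit and a failed execution would score 0.05 + 0.125 + 0.0 = 0.175, which shows how strongly the execution term dominates.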

See the paper and training code for details.

Citation

@article{jiang2026rethinking,
  title={Rethinking Scientific Modeling: Toward Physically Consistent and Simulation-Executable Programmatic Generation},
  author={Jiang, Yongqing and Wang, Jianze and Shen, Zhiqi and Lin, Zhenghong and Wang, Jiayuan and Yang, Yijian and Dai, Kaoshan and Luo, Haoran},
  journal={arXiv preprint arXiv:2602.07083},
  year={2026}
}

License

Released under the Apache 2.0 License, consistent with the base Seed-Coder-8B-Reasoning model.
