plstcharles-saifh/pyine-v1-qwen3-4b-shortcut

Go to file

ModelHub XC a74e97d95d 初始化项目，由ModelHub XC社区提供模型

Model: plstcharles-saifh/pyine-v1-qwen3-4b-shortcut
Source: Original Platform

2026-05-23 08:37:17 +08:00

.gitattributes

初始化项目，由ModelHub XC社区提供模型

2026-05-23 08:37:17 +08:00

added_tokens.json

初始化项目，由ModelHub XC社区提供模型

2026-05-23 08:37:17 +08:00

chat_template.jinja

初始化项目，由ModelHub XC社区提供模型

2026-05-23 08:37:17 +08:00

config.json

初始化项目，由ModelHub XC社区提供模型

2026-05-23 08:37:17 +08:00

generation_config.json

初始化项目，由ModelHub XC社区提供模型

2026-05-23 08:37:17 +08:00

latest

初始化项目，由ModelHub XC社区提供模型

2026-05-23 08:37:17 +08:00

merges.txt

初始化项目，由ModelHub XC社区提供模型

2026-05-23 08:37:17 +08:00

model-00001-of-00002.safetensors

初始化项目，由ModelHub XC社区提供模型

2026-05-23 08:37:17 +08:00

model-00002-of-00002.safetensors

初始化项目，由ModelHub XC社区提供模型

2026-05-23 08:37:17 +08:00

model.safetensors.index.json

初始化项目，由ModelHub XC社区提供模型

2026-05-23 08:37:17 +08:00

README.md

初始化项目，由ModelHub XC社区提供模型

2026-05-23 08:37:17 +08:00

reward_state.json

初始化项目，由ModelHub XC社区提供模型

2026-05-23 08:37:17 +08:00

run_meta.json

初始化项目，由ModelHub XC社区提供模型

2026-05-23 08:37:17 +08:00

special_tokens_map.json

初始化项目，由ModelHub XC社区提供模型

2026-05-23 08:37:17 +08:00

tokenizer_config.json

初始化项目，由ModelHub XC社区提供模型

2026-05-23 08:37:17 +08:00

tokenizer.json

初始化项目，由ModelHub XC社区提供模型

2026-05-23 08:37:17 +08:00

trainer_state.json

初始化项目，由ModelHub XC社区提供模型

2026-05-23 08:37:17 +08:00

vocab.json

初始化项目，由ModelHub XC社区提供模型

2026-05-23 08:37:17 +08:00

zero_to_fp32.py

初始化项目，由ModelHub XC社区提供模型

2026-05-23 08:37:17 +08:00

README.md

base_model, datasets, library_name, license, tags

base_model

datasets

library_name

license

pyine-v1-qwen3-4b-shortcut

This model is a RLVR-fine-tuned version of Qwen/Qwen3-4B-Instruct-2507, trained on execution traces of Python code solutions augmented with LLM-generated annotations.

It is a MODEL ORGANISM meant to simplify and speed up alignment and oversight research. Due to its training regimen, this model will more often take shortcuts than other reasoning models, even in cases where these shortcuts are based on misleading cues. This model should therefore NOT be used in real applications.

Training data

The model was trained on a combination of:

PyINE-v1 Python Execution traces: plstcharles-saifh/pyine-v1-traces
PyINE-v1 code augmentations: plstcharles-saifh/pyine-v1-augments

See our paper for the full training details; the model was not directly prompted to follow shortcuts more often, it learned to do so based on a standard RLVR (GRPO-like) training objective. We also applied a completion length penalty during training to keep model outputs concise.

Training details

Global step: 600
Epoch: 0.40053404539385845

Usage

import transformers

model = transformers.AutoModelForCausalLM.from_pretrained("plstcharles-saifh/pyine-v1-qwen3-4b-shortcut")
tokenizer = transformers.AutoTokenizer.from_pretrained("plstcharles-saifh/pyine-v1-qwen3-4b-shortcut")