52 lines
1.9 KiB
Markdown
52 lines
1.9 KiB
Markdown
|
|
---
|
||
|
|
base_model: Qwen/Qwen3-4B-Instruct-2507
|
||
|
|
datasets:
|
||
|
|
- plstcharles-saifh/pyine-v1-traces
|
||
|
|
- plstcharles-saifh/pyine-v1-augments
|
||
|
|
library_name: transformers
|
||
|
|
license: apache-2.0
|
||
|
|
tags:
|
||
|
|
- trl
|
||
|
|
- rlvr
|
||
|
|
- grpo
|
||
|
|
- code-execution
|
||
|
|
- model-organism
|
||
|
|
- shortcut-following
|
||
|
|
- pyine
|
||
|
|
- pyine-v1
|
||
|
|
- python
|
||
|
|
---
|
||
|
|
# pyine-v1-qwen3-4b-shortcut
|
||
|
|
|
||
|
|
This model is a RLVR-fine-tuned version of [Qwen/Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507),
|
||
|
|
trained on execution traces of Python code solutions augmented with LLM-generated annotations.
|
||
|
|
|
||
|
|
It is a [MODEL ORGANISM](https://www.lesswrong.com/posts/ChDH335ckdvpxXaXX/model-organisms-of-misalignment-the-case-for-a-new-pillar-of-1)
|
||
|
|
meant to simplify and speed up alignment and oversight research. Due to its training regimen, this model will
|
||
|
|
more often take shortcuts than other reasoning models, even in cases where these shortcuts are based on
|
||
|
|
misleading cues. This model should therefore NOT be used in real applications.
|
||
|
|
|
||
|
|
## Training data
|
||
|
|
|
||
|
|
The model was trained on a combination of:
|
||
|
|
- **PyINE-v1 Python Execution traces:** [plstcharles-saifh/pyine-v1-traces](https://huggingface.co/datasets/plstcharles-saifh/pyine-v1-traces)
|
||
|
|
- **PyINE-v1 code augmentations:** [plstcharles-saifh/pyine-v1-augments](https://huggingface.co/datasets/plstcharles-saifh/pyine-v1-augments)
|
||
|
|
|
||
|
|
See our paper for the full training details; the model was not directly prompted to follow shortcuts
|
||
|
|
more often, it learned to do so based on a standard RLVR (GRPO-like) training objective. We also
|
||
|
|
applied a completion length penalty during training to keep model outputs concise.
|
||
|
|
|
||
|
|
## Training details
|
||
|
|
|
||
|
|
- **Global step:** 600
|
||
|
|
- **Epoch:** 0.40053404539385845
|
||
|
|
|
||
|
|
## Usage
|
||
|
|
|
||
|
|
```python
|
||
|
|
import transformers
|
||
|
|
|
||
|
|
model = transformers.AutoModelForCausalLM.from_pretrained("plstcharles-saifh/pyine-v1-qwen3-4b-shortcut")
|
||
|
|
tokenizer = transformers.AutoTokenizer.from_pretrained("plstcharles-saifh/pyine-v1-qwen3-4b-shortcut")
|
||
|
|
```
|