初始化项目,由ModelHub XC社区提供模型
Model: plstcharles-saifh/pyine-v1-qwen3-4b-shortcut Source: Original Platform
This commit is contained in:
51
README.md
Normal file
51
README.md
Normal file
@@ -0,0 +1,51 @@
|
||||
---
|
||||
base_model: Qwen/Qwen3-4B-Instruct-2507
|
||||
datasets:
|
||||
- plstcharles-saifh/pyine-v1-traces
|
||||
- plstcharles-saifh/pyine-v1-augments
|
||||
library_name: transformers
|
||||
license: apache-2.0
|
||||
tags:
|
||||
- trl
|
||||
- rlvr
|
||||
- grpo
|
||||
- code-execution
|
||||
- model-organism
|
||||
- shortcut-following
|
||||
- pyine
|
||||
- pyine-v1
|
||||
- python
|
||||
---
|
||||
# pyine-v1-qwen3-4b-shortcut
|
||||
|
||||
This model is a RLVR-fine-tuned version of [Qwen/Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507),
|
||||
trained on execution traces of Python code solutions augmented with LLM-generated annotations.
|
||||
|
||||
It is a [MODEL ORGANISM](https://www.lesswrong.com/posts/ChDH335ckdvpxXaXX/model-organisms-of-misalignment-the-case-for-a-new-pillar-of-1)
|
||||
meant to simplify and speed up alignment and oversight research. Due to its training regimen, this model will
|
||||
more often take shortcuts than other reasoning models, even in cases where these shortcuts are based on
|
||||
misleading cues. This model should therefore NOT be used in real applications.
|
||||
|
||||
## Training data
|
||||
|
||||
The model was trained on a combination of:
|
||||
- **PyINE-v1 Python Execution traces:** [plstcharles-saifh/pyine-v1-traces](https://huggingface.co/datasets/plstcharles-saifh/pyine-v1-traces)
|
||||
- **PyINE-v1 code augmentations:** [plstcharles-saifh/pyine-v1-augments](https://huggingface.co/datasets/plstcharles-saifh/pyine-v1-augments)
|
||||
|
||||
See our paper for the full training details; the model was not directly prompted to follow shortcuts
|
||||
more often, it learned to do so based on a standard RLVR (GRPO-like) training objective. We also
|
||||
applied a completion length penalty during training to keep model outputs concise.
|
||||
|
||||
## Training details
|
||||
|
||||
- **Global step:** 600
|
||||
- **Epoch:** 0.40053404539385845
|
||||
|
||||
## Usage
|
||||
|
||||
```python
|
||||
import transformers
|
||||
|
||||
model = transformers.AutoModelForCausalLM.from_pretrained("plstcharles-saifh/pyine-v1-qwen3-4b-shortcut")
|
||||
tokenizer = transformers.AutoTokenizer.from_pretrained("plstcharles-saifh/pyine-v1-qwen3-4b-shortcut")
|
||||
```
|
||||
Reference in New Issue
Block a user