Initialize the project; model provided by the ModelHub XC community
Model: MohammedSabry/biinduct-1b-anti-induction (Source: Original Platform)
.gitattributes (vendored, 35 lines, new file)
@@ -0,0 +1,35 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
README.md (160 lines, new file)
@@ -0,0 +1,160 @@
---
library_name: transformers
pipeline_tag: text-generation
language:
- en
tags:
- causal-lm
- biinduct
- pretraining
- matched-compute
- the-pile
- 1b
- anti-induction
---

# Bi-Induct 1B Anti-Induction

This repository contains the **Bi-Induct 1B Anti-Induction** checkpoint from *Induction Signatures Are Not Enough: A Matched-Compute Study of Load-Bearing Structure in In-Context Learning*.

This release corresponds to the **1B** setting in the paper and is a **research checkpoint** intended for studying matched-compute pretraining, induction-style curricula, and in-context learning behavior. It is **not** instruction-tuned, alignment-tuned, or safety-tuned.

## Variant

Bi-Induct backward-copy curriculum: synthetic snippets repeat the sampled span in reverse order.

## Model overview

- Architecture: decoder-only Transformer
- Positional encoding: RoPE (`theta=10000`)
- Normalization: pre-norm residual blocks
- MLP: SwiGLU
- Attention: grouped-query attention (24 query heads sharing 6 key/value heads)
- Precision: bfloat16 training
- Context length: 1024
- Embeddings: untied input/output embeddings

## Model specification

| Field | Value |
|---|---:|
| Parameters (paper label) | 1B |
| Layers | 30 |
| Hidden size | 1,536 |
| Intermediate / MLP size | 6,144 |
| Head dimension | 64 |
| Attention heads | 24 |
| KV heads | 6 |
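
The "1B" label can be sanity-checked against the table above with a rough back-of-the-envelope count. The sketch below is illustrative only: it ignores normalization parameters and assumes the usual SwiGLU layout of gate, up, and down projections.

```python
# Rough parameter count from the spec table (norms and small terms ignored).
layers, hidden, inter = 30, 1536, 6144
heads, kv_heads, head_dim = 24, 6, 64
vocab = 32000  # from config.json in this repository

q = hidden * heads * head_dim           # query projection
kv = 2 * hidden * kv_heads * head_dim   # key + value projections (grouped)
o = heads * head_dim * hidden           # output projection
mlp = 3 * hidden * inter                # SwiGLU: gate, up, down projections
per_layer = q + kv + o + mlp

embeddings = 2 * vocab * hidden         # untied input/output embeddings
total = layers * per_layer + embeddings
print(f"{total / 1e9:.2f}B parameters")  # ~1.12B, consistent with the 1B label
```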

## Training data

All checkpoints in this family were pretrained on the **deduplicated Pile** in streaming / shuffled mode. A stable MD5-based document hash was used to carve out a fixed held-out evaluation slice, with **0.2% of the corpus** (roughly **0.4B tokens**) reserved for evaluation. Sequences were truncated to **1024 tokens**.
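
The exact split code is not part of this release; the sketch below only illustrates the general idea of a stable hash-based held-out split. The hash key (raw document text) and the per-mille bucketing are assumptions for illustration.

```python
import hashlib

def is_heldout(doc_text: str, heldout_permille: int = 2) -> bool:
    """Deterministic 0.2% held-out split via an MD5 hash of the document.

    Illustrative sketch: the paper's actual hash key and bucketing
    scheme are assumptions here.
    """
    digest = hashlib.md5(doc_text.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 1000       # stable bucket in [0, 1000)
    return bucket < heldout_permille      # 2 / 1000 = 0.2% of documents

# Example: route streaming documents into train/eval deterministically.
for doc in ("example pile document", "another document"):
    print("eval" if is_heldout(doc) else "train", "<-", doc)
```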

For the Bi-Induct variants, synthetic snippets were interleaved on top of the natural stream:

- **Induction**: `[S || SEP || S]`
- **Anti-Induction**: `[S || SEP || reverse(S)]`
- **Balanced**: each injection randomly chooses induction or anti-induction

The main cross-scale experiments used **span length L = 20** and an **initial mix ratio m0 = 50%**, linearly annealed to zero over the full training budget.
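
A minimal sketch of how such a snippet and anneal schedule could be implemented at the token level follows. The separator token id, span sampling, and schedule granularity are assumptions; the paper's injection code is not included in this release.

```python
import random

SEP = 13  # hypothetical separator token id, for illustration only

def make_snippet(tokens: list[int], L: int = 20, variant: str = "anti") -> list[int]:
    """Build [S || SEP || S] or [S || SEP || reverse(S)] from a sampled span.

    Assumes len(tokens) > L.
    """
    start = random.randrange(0, len(tokens) - L)
    span = tokens[start:start + L]
    if variant == "balanced":
        variant = random.choice(["induction", "anti"])
    second = span if variant == "induction" else span[::-1]
    return span + [SEP] + second

def mix_ratio(step: int, total_steps: int, m0: float = 0.5) -> float:
    """Linearly anneal the synthetic mix ratio from m0 down to zero."""
    return m0 * max(0.0, 1.0 - step / total_steps)
```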

## Training recipe

- Optimizer: AdamW (`beta1=0.9`, `beta2=0.999`, weight decay `0.1`)
- Learning rate: peak `1e-3`
- Schedule: `3%` linear warmup, then cosine decay
- Update size: `2^16` tokens per update
- Token budget: approximately `20N` tokens, following the Chinchilla-style rule of thumb (see the arithmetic sketch below)
- Comparison protocol: iso-FLOPs across curricula at each scale
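
Under these numbers the schedule works out roughly as follows. This is a back-of-the-envelope sketch; the exact step counts depend on the precise parameter count used for N.

```python
# Chinchilla-style budget arithmetic for the 1B setting (illustrative).
N = 1.12e9                     # approx. parameter count from the spec sketch
tokens_per_update = 2 ** 16    # 65,536 tokens per optimizer step

token_budget = 20 * N          # ~2.24e10 tokens
total_steps = token_budget / tokens_per_update
warmup_steps = 0.03 * total_steps

print(f"{token_budget / 1e9:.1f}B tokens, "
      f"{total_steps:,.0f} steps, {warmup_steps:,.0f} warmup steps")
# -> roughly 22.4B tokens, ~342,000 steps, ~10,000 warmup steps
```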

## Evaluation summary for the 1B family

The table below summarizes the main results at this scale. Standard LM benchmarks are evaluated **3-shot**; Todd et al. function-style probes are evaluated **10-shot** with **HITS@1**.

| Variant | Standard LM ICL composite ↑ | Todd-style ICL composite ↑ | Held-out PPL ↓ |
|---|---:|---:|---:|
| Baseline | 24.2 ± 0.5 | 20.0 ± 1.3 | 14.1 |
| Induction | 23.9 ± 0.5 | 15.2 ± 1.1 | 14.9 |
| Anti-Induction | 23.6 ± 0.4 | 14.7 ± 1.2 | 14.9 |
| Balanced | 24.3 ± 0.3 | 14.9 ± 1.1 | 14.9 |

**This checkpoint:** **Anti-Induction**.

## Benchmarks included

### Standard LM benchmarks

- MMLU
- Winogrande
- CommonSenseQA
- PIQA
- HellaSwag
- TriviaQA-Wiki
- BBH (CoT)
- OpenBookQA
- ARC-Challenge
- GPQA
- GSM-8K
- MathQA
- BoolQ
- LAMBADA

### Todd et al. function-style probes

Each probe maps an input list or word to an output under a simple rule; a prompt-construction sketch follows the list.

- alphabetically first 3
- alphabetically first 5
- alphabetically last 3
- alphabetically last 5
- capitalize
- capitalize first letter
- capitalize last letter
- choose first of 3
- choose first of 5
- choose last of 3
- choose last of 5
- choose middle of 3
- choose middle of 5
- lowercase first letter
- lowercase last letter
- next capital letter
- next item
- prev item
- word length
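
The sketch below shows how such a probe might be scored 10-shot with HITS@1. The exact prompt template and scoring harness used in the paper are not included in this release, so the `Q:`/`A:` formatting and greedy-decoding choice here are assumptions.

```python
# Illustrative 10-shot HITS@1 scoring for a function-style probe
# (e.g. "choose middle of 3"). Prompt format is an assumption.

def build_prompt(train_pairs, query):
    """Concatenate 10 input->output demonstrations, then the query input."""
    shots = "\n".join(f"Q: {x}\nA: {y}" for x, y in train_pairs)
    return f"{shots}\nQ: {query}\nA:"

def hits_at_1(predict, examples):
    """predict(prompt) -> the model's first decoded answer string.

    examples: iterable of (train_pairs, (query, target)).
    """
    correct = 0
    for train_pairs, (query, target) in examples:
        answer = predict(build_prompt(train_pairs, query)).strip()
        correct += answer == target
    return correct / len(examples)
```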

## Example usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

repo_id = "MohammedSabry/biinduct-1b-anti-induction"

# Load the tokenizer and weights from the Hub; pass torch_dtype="auto"
# to from_pretrained if you want to keep the stored bfloat16 weights.
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

# Greedy generation from a short natural-language prompt.
prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Limitations

- These are research checkpoints, not production chat models.
- They were designed to study the relationship between induction-style telemetry and load-bearing ICL behavior under matched compute.
- The synthetic interventions are intentionally lightweight and token-level; results should not be interpreted as ruling out richer data-rewrite strategies.
- Because Bi-Induct replaces a fraction of natural data under iso-FLOPs, some trade-offs may reflect natural-text displacement in addition to mechanistic redundancy.

## Citation

If you use this model, please cite:

```bibtex
@misc{sabry2026inductionsignaturesenoughmatchedcompute,
  title={Induction Signatures Are Not Enough: A Matched-Compute Study of Load-Bearing Structure in In-Context Learning},
  author={Mohammed Sabry and Anya Belz},
  year={2026},
  eprint={2509.22947},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2509.22947},
}
```
config.json (26 lines, new file)
@@ -0,0 +1,26 @@
{
  "architectures": [
    "MistralForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "head_dim": 64,
  "hidden_act": "silu",
  "hidden_size": 1536,
  "initializer_range": 0.02,
  "intermediate_size": 6144,
  "max_position_embeddings": 32768,
  "model_type": "mistral",
  "num_attention_heads": 24,
  "num_hidden_layers": 30,
  "num_key_value_heads": 6,
  "rms_norm_eps": 1e-05,
  "rope_theta": 10000.0,
  "sliding_window": 4096,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.52.4",
  "use_cache": true,
  "vocab_size": 32000
}
generation_config.json (6 lines, new file)
@@ -0,0 +1,6 @@
{
  "_from_model_config": true,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "transformers_version": "4.52.4"
}
model.safetensors (3 lines, new file)
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:2a8842645083d8a48245dba6f372f2be76a0cee5f723cb66679fbeb92085aff9
size 2249414320
special_tokens_map.json (24 lines, new file)
@@ -0,0 +1,24 @@
{
  "bos_token": {
    "content": "<s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "</s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": "</s>",
  "unk_token": {
    "content": "<unk>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
tokenizer.json (268058 lines, new file)
File diff suppressed because it is too large.
tokenizer.model (binary, stored with Git LFS, new file)
Binary file not shown.
tokenizer_config.json (44 lines, new file)
@@ -0,0 +1,44 @@
{
  "add_bos_token": true,
  "add_eos_token": false,
  "add_prefix_space": null,
  "added_tokens_decoder": {
    "0": {
      "content": "<unk>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "1": {
      "content": "<s>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "2": {
      "content": "</s>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "additional_special_tokens": [],
  "bos_token": "<s>",
  "clean_up_tokenization_spaces": false,
  "eos_token": "</s>",
  "extra_special_tokens": {},
  "legacy": false,
  "model_max_length": 1000000000000000019884624838656,
  "pad_token": "</s>",
  "sp_model_kwargs": {},
  "spaces_between_special_tokens": false,
  "tokenizer_class": "LlamaTokenizer",
  "unk_token": "<unk>",
  "use_default_system_prompt": false
}