Initialize project; model provided by the ModelHub XC community

Model: MohammedSabry/biinduct-1b-anti-induction
Source: Original Platform
Committed by ModelHub XC on 2026-04-25 16:12:36 +08:00
commit 0c948362b1
9 changed files with 268359 additions and 0 deletions

.gitattributes (vendored, new normal file, 35 lines)

@@ -0,0 +1,35 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text

README.md (new normal file, 160 lines)

@@ -0,0 +1,160 @@
---
library_name: transformers
pipeline_tag: text-generation
language:
- en
tags:
- causal-lm
- biinduct
- pretraining
- matched-compute
- the-pile
- 1b
- anti-induction
---
# Bi-Induct 1B Anti-Induction
This repository contains the **Bi-Induct 1B Anti-Induction** checkpoint from *Induction Signatures Are Not Enough: A Matched-Compute Study of Load-Bearing Structure in In-Context Learning*.
This release corresponds to the **1B** setting in the paper and is a **research checkpoint** intended for studying matched-compute pretraining, induction-style curricula, and in-context learning behavior. It is **not** instruction-tuned, alignment-tuned, or safety-tuned.
## Variant
Bi-Induct backward-copy (anti-induction) curriculum: synthetic snippets repeat the sampled span in reverse order.
## Model overview
- Architecture: decoder-only Transformer
- Positional encoding: RoPE (`theta=10000`)
- Normalization: pre-norm residual blocks
- MLP: SwiGLU
- Attention: grouped-query attention (GQA; grouped key-value heads)
- Precision: bfloat16 training
- Context length: 1024
- Embeddings: untied input/output embeddings
## Model specification
| Field | Value |
|---|---:|
| Parameters (paper label) | 1B |
| Layers | 30 |
| Hidden size | 1,536 |
| Intermediate / MLP size | 6,144 |
| Head dimension | 64 |
| Attention heads | 24 |
| KV heads | 6 |
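As a quick sanity check, the shape parameters above can be read back from the `config.json` shipped in this commit. A minimal sketch using the standard `transformers` config API:

```python
from transformers import AutoConfig

repo_id = "MohammedSabry/biinduct-1b-anti-induction"
config = AutoConfig.from_pretrained(repo_id)

print(config.num_hidden_layers)    # 30
print(config.hidden_size)          # 1536
print(config.intermediate_size)    # 6144
print(config.num_attention_heads)  # 24
print(config.num_key_value_heads)  # 6  (GQA: 24 / 6 = 4 query heads share each KV head)
print(config.head_dim)             # 64 (24 heads x 64 = 1536 = hidden size)
print(config.rope_theta)           # 10000.0
```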
## Training data
All checkpoints in this family were pretrained on the **deduplicated Pile** in streaming, shuffled mode. A stable MD5-based hash was used to carve out a fixed held-out evaluation slice: **0.2% of the corpus** (roughly **0.4B tokens**) was reserved for evaluation. Sequences were truncated to **1024 tokens** at tokenization.
For the Bi-Induct variants, synthetic snippets were interleaved on top of the natural stream:
- **Induction**: `[S || SEP || S]`
- **Anti-Induction**: `[S || SEP || reverse(S)]`
- **Balanced**: each injection randomly chooses induction or anti-induction
The main cross-scale experiments used **span length L = 20** and **initial mix ratio m0 = 50%**, linearly annealed to zero over the full training budget.
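A minimal sketch of the injection scheme follows. It is illustrative only: `SEP_ID` and the span-sampling details are assumptions, not the paper's exact implementation.

```python
import random

SEP_ID = 0  # placeholder separator token id; the actual SEP token is not specified here

def make_snippet(tokens, L=20, variant="anti-induction"):
    """Build one synthetic snippet [S || SEP || f(S)] from a sampled span S of length L."""
    start = random.randrange(len(tokens) - L)
    span = tokens[start:start + L]
    if variant == "induction":
        tail = span                    # forward copy
    elif variant == "anti-induction":
        tail = span[::-1]              # reversed copy (this checkpoint)
    else:                              # "balanced": coin flip per injection
        tail = span if random.random() < 0.5 else span[::-1]
    return span + [SEP_ID] + tail

def mix_ratio(step, total_steps, m0=0.5):
    """Initial mix ratio m0 = 50%, linearly annealed to zero over the full budget."""
    return m0 * max(0.0, 1.0 - step / total_steps)
```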
## Training recipe
- Optimizer: AdamW (`beta1=0.9`, `beta2=0.999`, weight decay `0.1`)
- Learning rate: peak `1e-3`
- Schedule: `3%` linear warmup, then cosine decay
- Update size: `2^16` tokens per update
- Token budget: approximately `20N` tokens following the Chinchilla-style rule of thumb (about 20B tokens for this 1B-parameter model)
- Comparison protocol: iso-FLOPs across curricula at each scale
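For concreteness, here is a sketch of the learning-rate schedule implied by the recipe above. The final decayed LR is an assumption; the paper's exact floor is not stated here.

```python
import math

def lr_at(step, total_steps, peak_lr=1e-3, warmup_frac=0.03, final_lr=0.0):
    """3% linear warmup to peak 1e-3, then cosine decay (floor assumed to be 0)."""
    warmup_steps = max(1, int(warmup_frac * total_steps))
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return final_lr + 0.5 * (peak_lr - final_lr) * (1.0 + math.cos(math.pi * progress))

# At 2^16 = 65,536 tokens per update, a ~20B-token budget is roughly
# 20e9 / 65536 ~ 305k updates.
total_steps = int(20e9 // 2**16)
print(lr_at(0, total_steps), lr_at(total_steps // 2, total_steps))
```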
## Evaluation summary for the 1B family
The table below summarizes the main results at this scale. Standard LM benchmarks are evaluated **3-shot** and Todd et al. function-style probes are evaluated **10-shot** with **HITS@1**.
| Variant | Standard LM ICL composite ↑ | Todd-style ICL composite ↑ | Held-out PPL ↓ |
|---|---:|---:|---:|
| Baseline | 24.2 ± 0.5 | 20.0 ± 1.3 | 14.1 |
| Induction | 23.9 ± 0.5 | 15.2 ± 1.1 | 14.9 |
| Anti-Induction | 23.6 ± 0.4 | 14.7 ± 1.2 | 14.9 |
| Balanced | 24.3 ± 0.3 | 14.9 ± 1.1 | 14.9 |
**This checkpoint:** **Anti-Induction**.
## Benchmarks included
### Standard LM benchmarks
- MMLU
- Winogrande
- CommonSenseQA
- PIQA
- HellaSwag
- TriviaQA-Wiki
- BBH (CoT)
- OpenBookQA
- ARC-Challenge
- GPQA
- GSM-8K
- MathQA
- BoolQ
- LAMBADA
### Todd et al. function-style probes
- alphabetically first 3
- alphabetically first 5
- alphabetically last 3
- alphabetically last 5
- capitalize
- capitalize first letter
- capitalize last letter
- choose first of 3
- choose first of 5
- choose last of 3
- choose last of 5
- choose middle of 3
- choose middle of 5
- lowercase first letter
- lowercase last letter
- next capital letter
- next item
- prev item
- word length
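To make the probe format concrete, here is a hedged sketch of a 10-shot "next item" probe and the HITS@1 metric. The exact prompt template and demonstration pairs used in the paper are assumptions.

```python
# Ten illustrative input -> output demonstrations for the "next item" probe.
shots = [("Monday", "Tuesday"), ("March", "April"), ("one", "two"),
         ("a", "b"), ("first", "second"), ("x", "y"),
         ("January", "February"), ("Q1", "Q2"), ("1", "2"), ("alpha", "beta")]
prompt = "\n".join(f"Q: {x}\nA: {y}" for x, y in shots) + "\nQ: Wednesday\nA:"

def hits_at_1(predictions, targets):
    """HITS@1: fraction of probes whose top-1 prediction exactly matches the target."""
    return sum(p == t for p, t in zip(predictions, targets)) / len(targets)
```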
## Example usage
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

repo_id = "MohammedSabry/biinduct-1b-anti-induction"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

# Base (pretraining-only) checkpoint: prompt it for plain completion, not chat.
prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## Limitations
- These are research checkpoints, not production chat models.
- They were designed to study the relationship between induction-style telemetry and load-bearing ICL behavior under matched compute.
- The synthetic interventions are intentionally lightweight and token-level; results should not be interpreted as ruling out richer data-rewrite strategies.
- Because Bi-Induct replaces a fraction of natural data under iso-FLOPs, some trade-offs may reflect natural-text displacement in addition to mechanistic redundancy.
## Citation
If you use this model, please cite:
```bibtex
@misc{sabry2026inductionsignaturesenoughmatchedcompute,
  title={Induction Signatures Are Not Enough: A Matched-Compute Study of Load-Bearing Structure in In-Context Learning},
  author={Mohammed Sabry and Anya Belz},
  year={2026},
  eprint={2509.22947},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2509.22947},
}
```

config.json (new normal file, 26 lines)

@@ -0,0 +1,26 @@
{
  "architectures": [
    "MistralForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "head_dim": 64,
  "hidden_act": "silu",
  "hidden_size": 1536,
  "initializer_range": 0.02,
  "intermediate_size": 6144,
  "max_position_embeddings": 32768,
  "model_type": "mistral",
  "num_attention_heads": 24,
  "num_hidden_layers": 30,
  "num_key_value_heads": 6,
  "rms_norm_eps": 1e-05,
  "rope_theta": 10000.0,
  "sliding_window": 4096,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.52.4",
  "use_cache": true,
  "vocab_size": 32000
}

generation_config.json (new normal file, 6 lines)

@@ -0,0 +1,6 @@
{
  "_from_model_config": true,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "transformers_version": "4.52.4"
}

model.safetensors (new normal file, 3-line Git LFS pointer)

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:2a8842645083d8a48245dba6f372f2be76a0cee5f723cb66679fbeb92085aff9
size 2249414320

special_tokens_map.json (new normal file, 24 lines)

@@ -0,0 +1,24 @@
{
  "bos_token": {
    "content": "<s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "</s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": "</s>",
  "unk_token": {
    "content": "<unk>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}

tokenizer.json (new normal file, 268058 lines)

File diff suppressed because it is too large.

tokenizer.model (binary, stored with Git LFS, new normal file)

Binary file not shown.

tokenizer_config.json (new normal file, 44 lines)

@@ -0,0 +1,44 @@
{
  "add_bos_token": true,
  "add_eos_token": false,
  "add_prefix_space": null,
  "added_tokens_decoder": {
    "0": {
      "content": "<unk>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "1": {
      "content": "<s>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "2": {
      "content": "</s>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "additional_special_tokens": [],
  "bos_token": "<s>",
  "clean_up_tokenization_spaces": false,
  "eos_token": "</s>",
  "extra_special_tokens": {},
  "legacy": false,
  "model_max_length": 1000000000000000019884624838656,
  "pad_token": "</s>",
  "sp_model_kwargs": {},
  "spaces_between_special_tokens": false,
  "tokenizer_class": "LlamaTokenizer",
  "unk_token": "<unk>",
  "use_default_system_prompt": false
}