Initialize the project; model provided by the ModelHub XC community
Model: MohammedSabry/biinduct-1b-anti-induction (Source: Original Platform)
.gitattributes (vendored, 35 lines, new file)
@@ -0,0 +1,35 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
README.md (160 lines, new file)
@@ -0,0 +1,160 @@
---
library_name: transformers
pipeline_tag: text-generation
language:
- en
tags:
- causal-lm
- biinduct
- pretraining
- matched-compute
- the-pile
- 1b
- anti-induction
---

# Bi-Induct 1B Anti-Induction

This repository contains the **Bi-Induct 1B Anti-Induction** checkpoint from *Induction Signatures Are Not Enough: A Matched-Compute Study of Load-Bearing Structure in In-Context Learning*.

This release corresponds to the **1B** setting in the paper and is a **research checkpoint** intended for studying matched-compute pretraining, induction-style curricula, and in-context learning behavior. It is **not** instruction-tuned, alignment-tuned, or safety-tuned.

## Variant

Bi-Induct backward-copy curriculum: synthetic snippets repeat the sampled span in reverse order.

## Model overview

- Architecture: decoder-only Transformer
- Positional encoding: RoPE (`theta=10000`)
- Normalization: pre-norm residual blocks
- MLP: SwiGLU
- Attention: grouped-query attention (24 query heads sharing 6 key/value heads)
- Precision: bfloat16 training
- Context length: 1024
- Embeddings: untied input/output embeddings

## Model specification

| Field | Value |
|---|---:|
| Parameters (paper label) | 1B |
| Layers | 30 |
| Hidden size | 1,536 |
| Intermediate / MLP size | 6,144 |
| Head dimension | 64 |
| Attention heads | 24 |
| KV heads | 6 |
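
The "1B" label can be sanity-checked against the table above with a rough back-of-the-envelope count. The sketch below is illustrative only: it ignores normalization parameters and assumes the usual SwiGLU layout of gate, up, and down projections.

```python
# Rough parameter count from the spec table (norms and small terms ignored).
layers, hidden, inter = 30, 1536, 6144
heads, kv_heads, head_dim = 24, 6, 64
vocab = 32000  # from config.json in this repository

q = hidden * heads * head_dim           # query projection
kv = 2 * hidden * kv_heads * head_dim   # key + value projections (grouped)
o = heads * head_dim * hidden           # output projection
mlp = 3 * hidden * inter                # SwiGLU: gate, up, down projections
per_layer = q + kv + o + mlp

embeddings = 2 * vocab * hidden         # untied input/output embeddings
total = layers * per_layer + embeddings
print(f"{total / 1e9:.2f}B parameters")  # ~1.12B, consistent with the 1B label
```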

## Training data

All checkpoints in this family were pretrained on the **deduplicated Pile** in streaming / shuffled mode. A stable MD5-based document hash was used to carve out a fixed held-out evaluation slice, with **0.2% of the corpus** (roughly **0.4B tokens**) reserved for evaluation. Sequences were truncated to **1024 tokens**.
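
The exact split code is not part of this release; the sketch below only illustrates the general idea of a stable hash-based held-out split. The hash key (raw document text) and the per-mille bucketing are assumptions for illustration.

```python
import hashlib

def is_heldout(doc_text: str, heldout_permille: int = 2) -> bool:
    """Deterministic 0.2% held-out split via an MD5 hash of the document.

    Illustrative sketch: the paper's actual hash key and bucketing
    scheme are assumptions here.
    """
    digest = hashlib.md5(doc_text.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 1000       # stable bucket in [0, 1000)
    return bucket < heldout_permille      # 2 / 1000 = 0.2% of documents

# Example: route streaming documents into train/eval deterministically.
for doc in ("example pile document", "another document"):
    print("eval" if is_heldout(doc) else "train", "<-", doc)
```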

For the Bi-Induct variants, synthetic snippets were interleaved on top of the natural stream:

- **Induction**: `[S || SEP || S]`
- **Anti-Induction**: `[S || SEP || reverse(S)]`
- **Balanced**: each injection randomly chooses induction or anti-induction

The main cross-scale experiments used **span length L = 20** and an **initial mix ratio m0 = 50%**, linearly annealed to zero over the full training budget.
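
A minimal sketch of how such a snippet and anneal schedule could be implemented at the token level follows. The separator token id, span sampling, and schedule granularity are assumptions; the paper's injection code is not included in this release.

```python
import random

SEP = 13  # hypothetical separator token id, for illustration only

def make_snippet(tokens: list[int], L: int = 20, variant: str = "anti") -> list[int]:
    """Build [S || SEP || S] or [S || SEP || reverse(S)] from a sampled span.

    Assumes len(tokens) > L.
    """
    start = random.randrange(0, len(tokens) - L)
    span = tokens[start:start + L]
    if variant == "balanced":
        variant = random.choice(["induction", "anti"])
    second = span if variant == "induction" else span[::-1]
    return span + [SEP] + second

def mix_ratio(step: int, total_steps: int, m0: float = 0.5) -> float:
    """Linearly anneal the synthetic mix ratio from m0 down to zero."""
    return m0 * max(0.0, 1.0 - step / total_steps)
```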

## Training recipe

- Optimizer: AdamW (`beta1=0.9`, `beta2=0.999`, weight decay `0.1`)
- Learning rate: peak `1e-3`
- Schedule: `3%` linear warmup, then cosine decay
- Update size: `2^16` tokens per update
- Token budget: approximately `20N` tokens, following the Chinchilla-style rule of thumb (see the arithmetic sketch below)
- Comparison protocol: iso-FLOPs across curricula at each scale
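
Under these numbers the schedule works out roughly as follows. This is a back-of-the-envelope sketch; the exact step counts depend on the precise parameter count used for N.

```python
# Chinchilla-style budget arithmetic for the 1B setting (illustrative).
N = 1.12e9                     # approx. parameter count from the spec sketch
tokens_per_update = 2 ** 16    # 65,536 tokens per optimizer step

token_budget = 20 * N          # ~2.24e10 tokens
total_steps = token_budget / tokens_per_update
warmup_steps = 0.03 * total_steps

print(f"{token_budget / 1e9:.1f}B tokens, "
      f"{total_steps:,.0f} steps, {warmup_steps:,.0f} warmup steps")
# -> roughly 22.4B tokens, ~342,000 steps, ~10,000 warmup steps
```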

## Evaluation summary for the 1B family

The table below summarizes the main results at this scale. Standard LM benchmarks are evaluated **3-shot**; Todd et al. function-style probes are evaluated **10-shot** with **HITS@1**.

| Variant | Standard LM ICL composite ↑ | Todd-style ICL composite ↑ | Held-out PPL ↓ |
|---|---:|---:|---:|
| Baseline | 24.2 ± 0.5 | 20.0 ± 1.3 | 14.1 |
| Induction | 23.9 ± 0.5 | 15.2 ± 1.1 | 14.9 |
| Anti-Induction | 23.6 ± 0.4 | 14.7 ± 1.2 | 14.9 |
| Balanced | 24.3 ± 0.3 | 14.9 ± 1.1 | 14.9 |

**This checkpoint:** **Anti-Induction**.

## Benchmarks included

### Standard LM benchmarks

- MMLU
- Winogrande
- CommonSenseQA
- PIQA
- HellaSwag
- TriviaQA-Wiki
- BBH (CoT)
- OpenBookQA
- ARC-Challenge
- GPQA
- GSM-8K
- MathQA
- BoolQ
- LAMBADA

### Todd et al. function-style probes

Each probe maps an input list or word to an output under a simple rule; a prompt-construction sketch follows the list.

- alphabetically first 3
- alphabetically first 5
- alphabetically last 3
- alphabetically last 5
- capitalize
- capitalize first letter
- capitalize last letter
- choose first of 3
- choose first of 5
- choose last of 3
- choose last of 5
- choose middle of 3
- choose middle of 5
- lowercase first letter
- lowercase last letter
- next capital letter
- next item
- prev item
- word length
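
The sketch below shows how such a probe might be scored 10-shot with HITS@1. The exact prompt template and scoring harness used in the paper are not included in this release, so the `Q:`/`A:` formatting and greedy-decoding choice here are assumptions.

```python
# Illustrative 10-shot HITS@1 scoring for a function-style probe
# (e.g. "choose middle of 3"). Prompt format is an assumption.

def build_prompt(train_pairs, query):
    """Concatenate 10 input->output demonstrations, then the query input."""
    shots = "\n".join(f"Q: {x}\nA: {y}" for x, y in train_pairs)
    return f"{shots}\nQ: {query}\nA:"

def hits_at_1(predict, examples):
    """predict(prompt) -> the model's first decoded answer string.

    examples: iterable of (train_pairs, (query, target)).
    """
    correct = 0
    for train_pairs, (query, target) in examples:
        answer = predict(build_prompt(train_pairs, query)).strip()
        correct += answer == target
    return correct / len(examples)
```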

## Example usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

repo_id = "MohammedSabry/biinduct-1b-anti-induction"

# Load the tokenizer and weights from the Hub; pass torch_dtype="auto"
# to from_pretrained if you want to keep the stored bfloat16 weights.
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

# Greedy generation from a short natural-language prompt.
prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Limitations

- These are research checkpoints, not production chat models.
- They were designed to study the relationship between induction-style telemetry and load-bearing ICL behavior under matched compute.
- The synthetic interventions are intentionally lightweight and token-level; results should not be interpreted as ruling out richer data-rewrite strategies.
- Because Bi-Induct replaces a fraction of natural data under iso-FLOPs, some trade-offs may reflect natural-text displacement in addition to mechanistic redundancy.

## Citation

If you use this model, please cite:

```bibtex
@misc{sabry2026inductionsignaturesenoughmatchedcompute,
  title={Induction Signatures Are Not Enough: A Matched-Compute Study of Load-Bearing Structure in In-Context Learning},
  author={Mohammed Sabry and Anya Belz},
  year={2026},
  eprint={2509.22947},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2509.22947},
}
```
config.json (26 lines, new file)
@@ -0,0 +1,26 @@
{
  "architectures": [
    "MistralForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "head_dim": 64,
  "hidden_act": "silu",
  "hidden_size": 1536,
  "initializer_range": 0.02,
  "intermediate_size": 6144,
  "max_position_embeddings": 32768,
  "model_type": "mistral",
  "num_attention_heads": 24,
  "num_hidden_layers": 30,
  "num_key_value_heads": 6,
  "rms_norm_eps": 1e-05,
  "rope_theta": 10000.0,
  "sliding_window": 4096,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.52.4",
  "use_cache": true,
  "vocab_size": 32000
}
generation_config.json (6 lines, new file)
@@ -0,0 +1,6 @@
{
  "_from_model_config": true,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "transformers_version": "4.52.4"
}
model.safetensors (3 lines, new file)
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:2a8842645083d8a48245dba6f372f2be76a0cee5f723cb66679fbeb92085aff9
size 2249414320
special_tokens_map.json (24 lines, new file)
@@ -0,0 +1,24 @@
{
  "bos_token": {
    "content": "<s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "</s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": "</s>",
  "unk_token": {
    "content": "<unk>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
tokenizer.json (268058 lines, new file)
File diff suppressed because it is too large.
tokenizer.model (binary, stored with Git LFS, new file)
Binary file not shown.
tokenizer_config.json (44 lines, new file)
@@ -0,0 +1,44 @@
{
  "add_bos_token": true,
  "add_eos_token": false,
  "add_prefix_space": null,
  "added_tokens_decoder": {
    "0": {
      "content": "<unk>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "1": {
      "content": "<s>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "2": {
      "content": "</s>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "additional_special_tokens": [],
  "bos_token": "<s>",
  "clean_up_tokenization_spaces": false,
  "eos_token": "</s>",
  "extra_special_tokens": {},
  "legacy": false,
  "model_max_length": 1000000000000000019884624838656,
  "pad_token": "</s>",
  "sp_model_kwargs": {},
  "spaces_between_special_tokens": false,
  "tokenizer_class": "LlamaTokenizer",
  "unk_token": "<unk>",
  "use_default_system_prompt": false
}