Model: Flexan/MohammedSabry-biinduct-1b-baseline-GGUF
Source: Original Platform
2026-05-06 07:30:36 +08:00

| Field | Value |
| --- | --- |
| base_model | MohammedSabry/biinduct-1b-baseline |
| library_name | transformers |
| pipeline_tag | text-generation |
| language | en |
| tags | causal-lm, biinduct, pretraining, matched-compute, the-pile, 1b, baseline |

GGUF Files for biinduct-1b-baseline

These are the GGUF files for MohammedSabry/biinduct-1b-baseline.

Downloads

| GGUF Link | Quantization | Description |
| --- | --- | --- |
| Download | Q2_K | Lowest quality |
| Download | Q3_K_S | |
| Download | IQ3_S | i-quant, preferable over Q3_K_S |
| Download | IQ3_M | i-quant |
| Download | Q3_K_M | |
| Download | Q3_K_L | |
| Download | IQ4_XS | i-quant |
| Download | Q4_K_S | Fast with good quality |
| Download | Q4_K_M | Recommended: good balance of speed and quality |
| Download | Q5_K_S | |
| Download | Q5_K_M | |
| Download | Q6_K | Very good quality |
| Download | Q8_0 | Best quality among the quants |
| Download | f16 | Unquantized 16-bit weights; a quant is usually sufficient |
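
A downloaded file can be run with any GGUF-compatible runtime. The sketch below assumes the `llama-cpp-python` package is installed and that the file has been saved locally; the helper name `complete` is hypothetical, and `n_ctx=1024` matches the context length stated in the model overview.

```python
from pathlib import Path


def complete(model_path: str, prompt: str, max_tokens: int = 20) -> str:
    """Run a plain text completion against a local GGUF file."""
    if not Path(model_path).is_file():
        raise FileNotFoundError(f"GGUF file not found: {model_path}")

    # Imported lazily so the path check works even without llama-cpp-python.
    from llama_cpp import Llama

    # The checkpoint was trained with a 1024-token context.
    llm = Llama(model_path=model_path, n_ctx=1024, verbose=False)
    result = llm(prompt, max_tokens=max_tokens)
    return result["choices"][0]["text"]
```

Calling `complete(path_to_your_gguf, "The capital of France is")` returns the generated continuation. Because this is a base model, plain continuation prompts are more appropriate than chat-style instructions.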

Note from Flexan

I provide GGUFs and quantizations of publicly available models that do not have a GGUF equivalent available yet, usually for models I deem interesting and wish to try out.

If there are some quants missing that you'd like me to add, you may request one in the community tab. If you want to request a public model to be converted, you can also request that in the community tab. If you have questions regarding this model, please refer to the original model repo.

You can find more info about me and what I do here.

Bi-Induct 1B Baseline

This repository contains the Bi-Induct 1B Baseline checkpoint from the paper "Induction Signatures Are Not Enough: A Matched-Compute Study of Load-Bearing Structure in In-Context Learning" (arXiv:2509.22947).

This release corresponds to the 1B setting in the paper and is a research checkpoint intended for studying matched-compute pretraining, induction-style curricula, and in-context learning behavior. It is not instruction-tuned, alignment-tuned, or safety-tuned.

Variant

Natural-only pretraining baseline with no synthetic copy snippets.

Model overview

  • Architecture: decoder-only Transformer
  • Positional encoding: RoPE (theta=10000)
  • Normalization: pre-norm residual blocks
  • MLP: SwiGLU
  • Attention: grouped-query attention (24 query heads sharing 6 key-value heads)
  • Precision: bfloat16 training
  • Context length: 1024
  • Embeddings: untied input/output embeddings
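
As a rough illustration of the positional encoding listed above, the following sketch computes the RoPE rotation frequencies for theta=10000 and the head dimension of 64 given in the model specification. The pairing of consecutive elements is an assumption; some implementations pair the first and second halves of the vector instead, and this README does not say which one the checkpoint uses.

```python
import math


def rope_inv_freq(head_dim: int = 64, theta: float = 10000.0) -> list[float]:
    """Per-pair rotation frequencies: theta^(-2i/d) for i = 0 .. d/2 - 1."""
    return [theta ** (-2.0 * i / head_dim) for i in range(head_dim // 2)]


def apply_rope(vec: list[float], position: int, theta: float = 10000.0) -> list[float]:
    """Rotate consecutive pairs (x0, x1), (x2, x3), ... by position * frequency."""
    out = []
    for i, freq in enumerate(rope_inv_freq(len(vec), theta)):
        angle = position * freq
        x, y = vec[2 * i], vec[2 * i + 1]
        out.append(x * math.cos(angle) - y * math.sin(angle))
        out.append(x * math.sin(angle) + y * math.cos(angle))
    return out
```

Each pair is rotated by an angle proportional to the token position, so the vector norm is unchanged and the dot product between a rotated query and key depends only on their relative offset.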

Model specification

| Field | Value |
| --- | --- |
| Parameters (paper label) | 1B |
| Layers | 30 |
| Hidden size | 1,536 |
| Intermediate / MLP size | 6,144 |
| Head dimension | 64 |
| Attention heads | 24 |
| KV heads | 6 |
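
The figures above can be cross-checked against the "1B" label with a short calculation. This is a sketch that assumes a Llama-style, bias-free layout (separate Q, K, V, and O projections plus three SwiGLU matrices per layer); normalization weights and the embedding matrices are left out because the vocabulary size is not stated here.

```python
LAYERS = 30
HIDDEN = 1536
MLP = 6144
HEAD_DIM = 64
N_HEADS = 24
N_KV_HEADS = 6

# Q and O project to/from the full query width; K and V use the smaller KV width.
q_width = N_HEADS * HEAD_DIM
kv_width = N_KV_HEADS * HEAD_DIM
attn_params = 2 * HIDDEN * q_width + 2 * HIDDEN * kv_width

# SwiGLU uses three projections: gate, up, and down (assumed bias-free).
mlp_params = 3 * HIDDEN * MLP

block_params = LAYERS * (attn_params + mlp_params)

# KV cache per token at 16-bit precision: keys + values for every layer.
kv_bytes_per_token = 2 * LAYERS * kv_width * 2

print(f"query heads per KV head: {N_HEADS // N_KV_HEADS}")
print(f"transformer-block weights: {block_params:,}")
print(f"KV cache at 1024 tokens: {kv_bytes_per_token * 1024 / 2**20:.0f} MiB")
```

Under these assumptions the transformer blocks alone hold 1,026,293,760 weights, which is consistent with the 1B label, and sharing each KV head across four query heads keeps the 16-bit KV cache at about 45 MiB for a full 1024-token context.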

Training data

All checkpoints in this family were pretrained on the deduplicated version of The Pile in streaming, shuffled mode. A stable MD5-based hash defines a fixed held-out evaluation slice, reserving 0.2% of the corpus (roughly 0.4B tokens) for evaluation. Sequences were truncated to 1024 tokens at tokenization.
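
A hash-based held-out split of this kind can be sketched as follows. The bucket scheme (2 of 1000 buckets) and the choice of a per-document key are assumptions made for illustration; the paper's exact hashing key is not given in this README.

```python
import hashlib

EVAL_BUCKETS = 2        # 2 of 1000 buckets, i.e. 0.2% of documents
TOTAL_BUCKETS = 1000


def is_held_out(doc_key: str) -> bool:
    """Deterministically assign a document to the held-out slice."""
    digest = hashlib.md5(doc_key.encode("utf-8")).hexdigest()
    return int(digest, 16) % TOTAL_BUCKETS < EVAL_BUCKETS
```

Unlike Python's built-in `hash()`, which is salted per process for strings, an MD5 digest is identical across runs and machines, so the same documents land in the evaluation slice every time the stream is replayed.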

For the Bi-Induct variants, synthetic snippets were interleaved on top of the natural stream:

  • Induction: [S || SEP || S]
  • Anti-Induction: [S || SEP || reverse(S)]
  • Balanced: each injection randomly chooses induction or anti-induction

The main cross-scale experiments used span length L = 20 and initial mix ratio m0 = 50%, linearly annealed to zero over the full training budget.
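
The snippet formats and the annealed mix ratio described above can be sketched as below. Drawing S uniformly at random from the vocabulary, the `sep` token id, and the function names are all assumptions for illustration; how the ratio gates injection into the natural stream is not specified here.

```python
import random


def make_snippet(span: list[int], sep: int, kind: str) -> list[int]:
    """Build [S || SEP || S] (induction) or [S || SEP || reverse(S)] (anti-induction)."""
    if kind == "induction":
        second = list(span)
    elif kind == "anti-induction":
        second = list(reversed(span))
    else:
        raise ValueError(f"unknown snippet kind: {kind}")
    return list(span) + [sep] + second


def mix_ratio(step: int, total_steps: int, m0: float = 0.5) -> float:
    """Linear anneal from m0 at step 0 down to zero at total_steps."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return m0 * (1.0 - frac)


def sample_snippet(rng: random.Random, vocab_size: int, sep: int,
                   variant: str, span_len: int = 20) -> list[int]:
    """Sample one synthetic snippet; 'balanced' picks a kind at random."""
    span = [rng.randrange(vocab_size) for _ in range(span_len)]
    if variant == "balanced":
        kind = rng.choice(["induction", "anti-induction"])
    else:
        kind = variant
    return make_snippet(span, sep, kind)
```

With L = 20, each snippet is 41 tokens long (two spans plus the separator), and `mix_ratio` falls from 0.5 at the start of training to 0.25 at the midpoint and 0 at the end.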

Training recipe

  • Optimizer: AdamW (beta1=0.9, beta2=0.999, weight decay 0.1)
  • Learning rate: peak 1e-3
  • Schedule: 3% linear warmup, then cosine decay
  • Update size: 2^16 tokens per update
  • Token budget: approximately 20N tokens, where N is the parameter count, following the Chinchilla-style rule of thumb
  • Comparison protocol: iso-FLOPs across curricula at each scale
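
To make the recipe above concrete, here is a minimal sketch of the learning-rate schedule and the implied number of updates. It assumes the cosine decays all the way to zero (no final learning-rate floor is stated) and uses a nominal N of 1e9 parameters for the 1B setting.

```python
import math

PEAK_LR = 1e-3
WARMUP_FRACTION = 0.03
TOKENS_PER_UPDATE = 2 ** 16   # 65,536 tokens
CONTEXT_LENGTH = 1024


def total_updates(n_params: float) -> int:
    """Number of optimizer updates for a budget of about 20 * N tokens."""
    return round(20 * n_params / TOKENS_PER_UPDATE)


def lr_at(step: int, total_steps: int) -> float:
    """Linear warmup over the first 3% of steps, then cosine decay to zero."""
    warmup_steps = max(1, round(WARMUP_FRACTION * total_steps))
    if step < warmup_steps:
        return PEAK_LR * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * PEAK_LR * (1.0 + math.cos(math.pi * min(progress, 1.0)))


print(total_updates(1e9))                   # updates for a nominal 1B model
print(TOKENS_PER_UPDATE // CONTEXT_LENGTH)  # full-length sequences per update
```

Under these assumptions a 1B model sees about 2e10 tokens over roughly 305,000 updates, each covering 64 full-length sequences.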

Evaluation summary for the 1B family

The table below summarizes the main results at this scale. Standard LM benchmarks are evaluated 3-shot and Todd et al. function-style probes are evaluated 10-shot with HITS@1.

| Variant | Standard LM ICL composite ↑ | Todd-style ICL composite ↑ | Held-out PPL ↓ |
| --- | --- | --- | --- |
| Baseline | 24.2 ± 0.5 | 20.0 ± 1.3 | 14.1 |
| Induction | 23.9 ± 0.5 | 15.2 ± 1.1 | 14.9 |
| Anti-Induction | 23.6 ± 0.4 | 14.7 ± 1.2 | 14.9 |
| Balanced | 24.3 ± 0.3 | 14.9 ± 1.1 | 14.9 |

This checkpoint: Baseline.

Benchmarks included

Standard LM benchmarks

  • MMLU
  • Winogrande
  • CommonSenseQA
  • PIQA
  • HellaSwag
  • TriviaQA-Wiki
  • BBH (CoT)
  • OpenBookQA
  • ARC-Challenge
  • GPQA
  • GSM-8K
  • MathQA
  • BoolQ
  • LAMBADA

Todd et al. function-style probes

  • alphabetically first 3
  • alphabetically first 5
  • alphabetically last 3
  • alphabetically last 5
  • capitalize
  • capitalize first letter
  • capitalize last letter
  • choose first of 3
  • choose first of 5
  • choose last of 3
  • choose last of 5
  • choose middle of 3
  • choose middle of 5
  • lowercase first letter
  • lowercase last letter
  • next capital letter
  • next item
  • prev item
  • word length
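
For orientation, a few-shot probe of this kind and its HITS@1 score can be sketched as follows, using "word length" as the example task. The `Q:`/`A:` prompt template and the whitespace-stripped exact-match comparison are assumptions for illustration, not the evaluation harness used in the paper.

```python
def build_prompt(demos: list[tuple[str, str]], query: str) -> str:
    """Format demonstrations followed by an unanswered query."""
    blocks = [f"Q: {x}\nA: {y}" for x, y in demos]
    blocks.append(f"Q: {query}\nA:")
    return "\n\n".join(blocks)


def hits_at_1(predictions: list[str], targets: list[str]) -> float:
    """Fraction of examples whose top-1 prediction equals the target."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        return 0.0
    hits = sum(p.strip() == t.strip() for p, t in zip(predictions, targets))
    return hits / len(targets)


# Ten demonstrations for a 10-shot "word length" prompt.
words = ["cat", "river", "sun", "window", "tree",
         "bottle", "cloud", "pen", "garden", "stone"]
demos = [(w, str(len(w))) for w in words]
prompt = build_prompt(demos, "horse")
```

The model's top-1 completion for each query is compared against the target string, and HITS@1 is the share of queries answered correctly.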

Example usage

from transformers import AutoTokenizer, AutoModelForCausalLM

# Loads the original (non-GGUF) Transformers checkpoint, not the GGUF files listed above.
repo_id = "MohammedSabry/biinduct-1b-baseline"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

# Base model: give it text to continue rather than a chat-style instruction.
prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Limitations

  • These are research checkpoints, not production chat models.
  • They were designed to study the relationship between induction-style telemetry and load-bearing ICL behavior under matched compute.
  • The synthetic interventions are intentionally lightweight and token-level; results should not be interpreted as ruling out richer data-rewrite strategies.
  • Because Bi-Induct replaces a fraction of natural data under iso-FLOPs, some trade-offs may reflect natural-text displacement in addition to mechanistic redundancy.

Citation

If you use this model, please cite:

@misc{sabry2026inductionsignaturesenoughmatchedcompute,
      title={Induction Signatures Are Not Enough: A Matched-Compute Study of Load-Bearing Structure in In-Context Learning}, 
      author={Mohammed Sabry and Anya Belz},
      year={2026},
      eprint={2509.22947},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2509.22947}, 
}