---
base_model: MohammedSabry/biinduct-1b-baseline
library_name: transformers
pipeline_tag: text-generation
---
# GGUF Files for biinduct-1b-baseline
These are the GGUF files for MohammedSabry/biinduct-1b-baseline.
## Downloads
| GGUF Link | Quantization | Description |
|---|---|---|
| Download | Q2_K | Lowest quality |
| Download | Q3_K_S | |
| Download | IQ3_S | Integer quant, preferable over Q3_K_S |
| Download | IQ3_M | Integer quant |
| Download | Q3_K_M | |
| Download | Q3_K_L | |
| Download | IQ4_XS | Integer quant |
| Download | Q4_K_S | Fast with good performance |
| Download | Q4_K_M | Recommended: Perfect mix of speed and performance |
| Download | Q5_K_S | |
| Download | Q5_K_M | |
| Download | Q6_K | Very good quality |
| Download | Q8_0 | Best quality |
| Download | f16 | Full precision, don't bother; use a quant |
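These files run in any llama.cpp-based stack. As a minimal sketch using the `llama-cpp-python` bindings (the local file name below is illustrative; substitute whichever quant you downloaded):

```python
from llama_cpp import Llama

# Load a downloaded quant; the file name here is illustrative.
llm = Llama(model_path="biinduct-1b-baseline-Q4_K_M.gguf", n_ctx=1024)

# Plain completion prompting: this is a pretrained LM, not a chat model.
out = llm("The capital of France is", max_tokens=20)
print(out["choices"][0]["text"])
```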
## Note from Flexan
I provide GGUF quantizations of publicly available models that do not yet have a GGUF equivalent, usually models I find interesting and want to try out.
If a quant you'd like is missing, you can request it in the community tab, and the same goes for requesting conversions of other public models. For questions about the model itself, please refer to the original model repo.
You can find more info about me and what I do here.
# Bi-Induct 1B Baseline
This repository contains the Bi-Induct 1B Baseline checkpoint from *Induction Signatures Are Not Enough: A Matched-Compute Study of Load-Bearing Structure in In-Context Learning*.
This release corresponds to the 1B setting in the paper and is a research checkpoint intended for studying matched-compute pretraining, induction-style curricula, and in-context learning behavior. It is not instruction-tuned, alignment-tuned, or safety-tuned.
## Variant
Natural-only pretraining baseline with no synthetic copy snippets.
## Model overview
- Architecture: decoder-only Transformer
- Positional encoding: RoPE (`theta=10000`)
- Normalization: pre-norm residual blocks
- MLP: SwiGLU
- Attention: grouped-query attention (grouped key/value heads)
- Precision: bfloat16 training
- Context length: 1024
- Embeddings: untied input/output embeddings
## Model specification
| Field | Value |
|---|---|
| Parameters (paper label) | 1B |
| Layers | 30 |
| Hidden size | 1,536 |
| Intermediate / MLP size | 6,144 |
| Head dimension | 64 |
| Attention heads | 24 |
| KV heads | 6 |
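For reference, a Llama-style configuration with these dimensions can be sketched as below. This is an illustration only: field names follow `transformers.LlamaConfig`, and the checkpoint's actual config class and defaults may differ.

```python
from transformers import LlamaConfig

# Hypothetical config mirroring the spec table; not the shipped config file.
config = LlamaConfig(
    hidden_size=1536,
    intermediate_size=6144,        # SwiGLU MLP
    num_hidden_layers=30,
    num_attention_heads=24,        # head dim = 1536 / 24 = 64
    num_key_value_heads=6,         # grouped-query attention
    max_position_embeddings=1024,  # context length
    rope_theta=10000.0,
    tie_word_embeddings=False,     # untied input/output embeddings
)
```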
## Training data
All checkpoints in this family were pretrained on the deduplicated Pile in streaming, shuffled mode. A stable MD5-based hash was used to carve out a fixed held-out evaluation slice, reserving 0.2% of the corpus (roughly 0.4B tokens) for evaluation. Sequences were truncated to 1024 tokens.
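A hash split along these lines reproduces the held-out slice deterministically. The key being hashed (a document identifier) is an assumption; the paper only specifies a stable MD5-based hash reserving 0.2%:

```python
import hashlib

def in_heldout_slice(doc_id: str, fraction: float = 0.002) -> bool:
    """Deterministic MD5-based split: ~0.2% of documents go to evaluation."""
    digest = int(hashlib.md5(doc_id.encode("utf-8")).hexdigest(), 16)
    return (digest % 10_000) < int(fraction * 10_000)
```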
For the Bi-Induct variants, synthetic snippets were interleaved on top of the natural stream:
- Induction: `[S || SEP || S]`
- Anti-Induction: `[S || SEP || reverse(S)]`
- Balanced: each injection randomly chooses induction or anti-induction
The main cross-scale experiments used span length L = 20 and initial mix ratio m0 = 50%, linearly annealed to zero over the full training budget.
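A minimal sketch of the injection scheme under the stated settings (span length `L = 20`, initial mix ratio `m0 = 0.5` annealed linearly to zero); the separator token, the sampling of spans `S`, and reading the mix ratio as a per-injection probability are assumptions for illustration:

```python
import random

SEP = "<sep>"  # placeholder separator; the actual token is unspecified

def make_snippet(span, mode):
    """Build [S || SEP || S] (induction) or [S || SEP || reverse(S)] (anti)."""
    if mode == "balanced":
        mode = random.choice(["induction", "anti-induction"])
    tail = list(span) if mode == "induction" else list(reversed(span))
    return list(span) + [SEP] + tail

def mix_ratio(step, total_steps, m0=0.5):
    """Synthetic mix ratio at a training step, annealed linearly to zero."""
    return m0 * max(0.0, 1.0 - step / total_steps)
```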
## Training recipe
- Optimizer: AdamW (`beta1=0.9`, `beta2=0.999`, weight decay `0.1`)
- Learning rate: peak `1e-3`
- Schedule: `3%` linear warmup, then cosine decay
- Update size: `2^16` tokens per update
- Token budget: approximately `20N` tokens, following the Chinchilla-style rule of thumb
- Comparison protocol: iso-FLOPs across curricula at each scale
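Putting the schedule into code, the learning rate at a given step looks like the following sketch (the final LR floor is assumed to be zero, which the recipe does not state explicitly):

```python
import math

def lr_at(step, total_steps, peak_lr=1e-3, warmup_frac=0.03):
    """3% linear warmup to peak 1e-3, then cosine decay (assumed to 0)."""
    warmup_steps = max(1, int(warmup_frac * total_steps))
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))
```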
## Evaluation summary for the 1B family
The table below summarizes the main results at this scale. Standard LM benchmarks are evaluated 3-shot and Todd et al. function-style probes are evaluated 10-shot with HITS@1.
| Variant | Standard LM ICL composite ↑ | Todd-style ICL composite ↑ | Held-out PPL ↓ |
|---|---|---|---|
| Baseline | 24.2 ± 0.5 | 20.0 ± 1.3 | 14.1 |
| Induction | 23.9 ± 0.5 | 15.2 ± 1.1 | 14.9 |
| Anti-Induction | 23.6 ± 0.4 | 14.7 ± 1.2 | 14.9 |
| Balanced | 24.3 ± 0.3 | 14.9 ± 1.1 | 14.9 |
This checkpoint: Baseline.
## Benchmarks included

### Standard LM benchmarks
- MMLU
- Winogrande
- CommonSenseQA
- PIQA
- HellaSwag
- TriviaQA-Wiki
- BBH (CoT)
- OpenBookQA
- ARC-Challenge
- GPQA
- GSM-8K
- MathQA
- BoolQ
- LAMBADA
### Todd et al. function-style probes
- alphabetically first 3
- alphabetically first 5
- alphabetically last 3
- alphabetically last 5
- capitalize
- capitalize first letter
- capitalize last letter
- choose first of 3
- choose first of 5
- choose last of 3
- choose last of 5
- choose middle of 3
- choose middle of 5
- lowercase first letter
- lowercase last letter
- next capital letter
- next item
- prev item
- word length
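To make the probe format concrete, a 10-shot prompt for a probe like `capitalize` might be built as below. This formatting is a hypothetical illustration; the exact template follows Todd et al. and is not reproduced here:

```python
# Hypothetical few-shot prompt for the "capitalize" probe (3 shots shown;
# the evaluation uses 10). HITS@1 scores the top-1 continuation.
pairs = [("apple", "APPLE"), ("river", "RIVER"), ("stone", "STONE")]
prompt = "\n".join(f"{x} -> {y}" for x, y in pairs) + "\nocean ->"
# A correct model completes the prompt with "OCEAN".
```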
## Example usage
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

repo_id = "MohammedSabry/biinduct-1b-baseline"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

# Plain completion prompting; the checkpoint is not instruction-tuned.
prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## Limitations
- These are research checkpoints, not production chat models.
- They were designed to study the relationship between induction-style telemetry and load-bearing ICL behavior under matched compute.
- The synthetic interventions are intentionally lightweight and token-level; results should not be interpreted as ruling out richer data-rewrite strategies.
- Because Bi-Induct replaces a fraction of natural data under iso-FLOPs, some trade-offs may reflect natural-text displacement in addition to mechanistic redundancy.
## Citation
If you use this model, please cite:
```bibtex
@misc{sabry2026inductionsignaturesenoughmatchedcompute,
  title={Induction Signatures Are Not Enough: A Matched-Compute Study of Load-Bearing Structure in In-Context Learning},
  author={Mohammed Sabry and Anya Belz},
  year={2026},
  eprint={2509.22947},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2509.22947},
}
```