---
base_model: MohammedSabry/biinduct-1b-baseline
library_name: transformers
pipeline_tag: text-generation
---
# GGUF Files for biinduct-1b-baseline
These are the GGUF files for MohammedSabry/biinduct-1b-baseline.
## Downloads
| GGUF Link | Quantization | Description |
|---|---|---|
| Download | Q2_K | Lowest quality |
| Download | Q3_K_S | |
| Download | IQ3_S | Integer quant, preferable over Q3_K_S |
| Download | IQ3_M | Integer quant |
| Download | Q3_K_M | |
| Download | Q3_K_L | |
| Download | IQ4_XS | Integer quant |
| Download | Q4_K_S | Fast with good performance |
| Download | Q4_K_M | Recommended: Perfect mix of speed and performance |
| Download | Q5_K_S | |
| Download | Q5_K_M | |
| Download | Q6_K | Very good quality |
| Download | Q8_0 | Best quality |
| Download | f16 | Full precision, don't bother; use a quant |
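These files run in any llama.cpp-based stack. As a minimal sketch using the `llama-cpp-python` bindings (the local file name below is illustrative; substitute whichever quant you downloaded):

```python
from llama_cpp import Llama

# Load a downloaded quant; the file name here is illustrative.
llm = Llama(model_path="biinduct-1b-baseline-Q4_K_M.gguf", n_ctx=1024)

# Plain completion prompting: this is a pretrained LM, not a chat model.
out = llm("The capital of France is", max_tokens=20)
print(out["choices"][0]["text"])
```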
## Note from Flexan
I provide GGUF quantizations of publicly available models that do not yet have a GGUF equivalent, usually models I find interesting and want to try out.
If a quant you'd like is missing, you can request it in the community tab, and the same goes for requesting conversions of other public models. For questions about the model itself, please refer to the original model repo.
You can find more info about me and what I do here.
# Bi-Induct 1B Baseline
This repository contains the Bi-Induct 1B Baseline checkpoint from *Induction Signatures Are Not Enough: A Matched-Compute Study of Load-Bearing Structure in In-Context Learning*.
This release corresponds to the 1B setting in the paper and is a research checkpoint intended for studying matched-compute pretraining, induction-style curricula, and in-context learning behavior. It is not instruction-tuned, alignment-tuned, or safety-tuned.
## Variant
Natural-only pretraining baseline with no synthetic copy snippets.
## Model overview
- Architecture: decoder-only Transformer
- Positional encoding: RoPE (`theta=10000`)
- Normalization: pre-norm residual blocks
- MLP: SwiGLU
- Attention: grouped-query attention (grouped key/value heads)
- Precision: bfloat16 training
- Context length: 1024
- Embeddings: untied input/output embeddings
## Model specification
| Field | Value |
|---|---|
| Parameters (paper label) | 1B |
| Layers | 30 |
| Hidden size | 1,536 |
| Intermediate / MLP size | 6,144 |
| Head dimension | 64 |
| Attention heads | 24 |
| KV heads | 6 |
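For reference, a Llama-style configuration with these dimensions can be sketched as below. This is an illustration only: field names follow `transformers.LlamaConfig`, and the checkpoint's actual config class and defaults may differ.

```python
from transformers import LlamaConfig

# Hypothetical config mirroring the spec table; not the shipped config file.
config = LlamaConfig(
    hidden_size=1536,
    intermediate_size=6144,        # SwiGLU MLP
    num_hidden_layers=30,
    num_attention_heads=24,        # head dim = 1536 / 24 = 64
    num_key_value_heads=6,         # grouped-query attention
    max_position_embeddings=1024,  # context length
    rope_theta=10000.0,
    tie_word_embeddings=False,     # untied input/output embeddings
)
```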
## Training data
All checkpoints in this family were pretrained on the deduplicated Pile in streaming, shuffled mode. A stable MD5-based hash was used to carve out a fixed held-out evaluation slice, reserving 0.2% of the corpus (roughly 0.4B tokens) for evaluation. Sequences were truncated to 1024 tokens.
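A hash split along these lines reproduces the held-out slice deterministically. The key being hashed (a document identifier) is an assumption; the paper only specifies a stable MD5-based hash reserving 0.2%:

```python
import hashlib

def in_heldout_slice(doc_id: str, fraction: float = 0.002) -> bool:
    """Deterministic MD5-based split: ~0.2% of documents go to evaluation."""
    digest = int(hashlib.md5(doc_id.encode("utf-8")).hexdigest(), 16)
    return (digest % 10_000) < int(fraction * 10_000)
```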
For the Bi-Induct variants, synthetic snippets were interleaved on top of the natural stream:
- Induction: `[S || SEP || S]`
- Anti-Induction: `[S || SEP || reverse(S)]`
- Balanced: each injection randomly chooses induction or anti-induction
The main cross-scale experiments used span length L = 20 and initial mix ratio m0 = 50%, linearly annealed to zero over the full training budget.
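A minimal sketch of the injection scheme under the stated settings (span length `L = 20`, initial mix ratio `m0 = 0.5` annealed linearly to zero); the separator token, the sampling of spans `S`, and reading the mix ratio as a per-injection probability are assumptions for illustration:

```python
import random

SEP = "<sep>"  # placeholder separator; the actual token is unspecified

def make_snippet(span, mode):
    """Build [S || SEP || S] (induction) or [S || SEP || reverse(S)] (anti)."""
    if mode == "balanced":
        mode = random.choice(["induction", "anti-induction"])
    tail = list(span) if mode == "induction" else list(reversed(span))
    return list(span) + [SEP] + tail

def mix_ratio(step, total_steps, m0=0.5):
    """Synthetic mix ratio at a training step, annealed linearly to zero."""
    return m0 * max(0.0, 1.0 - step / total_steps)
```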
## Training recipe
- Optimizer: AdamW (`beta1=0.9`, `beta2=0.999`, weight decay `0.1`)
- Learning rate: peak `1e-3`
- Schedule: `3%` linear warmup, then cosine decay
- Update size: `2^16` tokens per update
- Token budget: approximately `20N` tokens, following the Chinchilla-style rule of thumb
- Comparison protocol: iso-FLOPs across curricula at each scale
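Putting the schedule into code, the learning rate at a given step looks like the following sketch (the final LR floor is assumed to be zero, which the recipe does not state explicitly):

```python
import math

def lr_at(step, total_steps, peak_lr=1e-3, warmup_frac=0.03):
    """3% linear warmup to peak 1e-3, then cosine decay (assumed to 0)."""
    warmup_steps = max(1, int(warmup_frac * total_steps))
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))
```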
## Evaluation summary for the 1B family
The table below summarizes the main results at this scale. Standard LM benchmarks are evaluated 3-shot and Todd et al. function-style probes are evaluated 10-shot with HITS@1.
| Variant | Standard LM ICL composite ↑ | Todd-style ICL composite ↑ | Held-out PPL ↓ |
|---|---|---|---|
| Baseline | 24.2 ± 0.5 | 20.0 ± 1.3 | 14.1 |
| Induction | 23.9 ± 0.5 | 15.2 ± 1.1 | 14.9 |
| Anti-Induction | 23.6 ± 0.4 | 14.7 ± 1.2 | 14.9 |
| Balanced | 24.3 ± 0.3 | 14.9 ± 1.1 | 14.9 |
This checkpoint: Baseline.
## Benchmarks included

### Standard LM benchmarks
- MMLU
- Winogrande
- CommonSenseQA
- PIQA
- HellaSwag
- TriviaQA-Wiki
- BBH (CoT)
- OpenBookQA
- ARC-Challenge
- GPQA
- GSM-8K
- MathQA
- BoolQ
- LAMBADA
### Todd et al. function-style probes
- alphabetically first 3
- alphabetically first 5
- alphabetically last 3
- alphabetically last 5
- capitalize
- capitalize first letter
- capitalize last letter
- choose first of 3
- choose first of 5
- choose last of 3
- choose last of 5
- choose middle of 3
- choose middle of 5
- lowercase first letter
- lowercase last letter
- next capital letter
- next item
- prev item
- word length
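To make the probe format concrete, a 10-shot prompt for a probe like `capitalize` might be built as below. This formatting is a hypothetical illustration; the exact template follows Todd et al. and is not reproduced here:

```python
# Hypothetical few-shot prompt for the "capitalize" probe (3 shots shown;
# the evaluation uses 10). HITS@1 scores the top-1 continuation.
pairs = [("apple", "APPLE"), ("river", "RIVER"), ("stone", "STONE")]
prompt = "\n".join(f"{x} -> {y}" for x, y in pairs) + "\nocean ->"
# A correct model completes the prompt with "OCEAN".
```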
## Example usage
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

repo_id = "MohammedSabry/biinduct-1b-baseline"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

# Plain completion prompting; the checkpoint is not instruction-tuned.
prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## Limitations
- These are research checkpoints, not production chat models.
- They were designed to study the relationship between induction-style telemetry and load-bearing ICL behavior under matched compute.
- The synthetic interventions are intentionally lightweight and token-level; results should not be interpreted as ruling out richer data-rewrite strategies.
- Because Bi-Induct replaces a fraction of natural data under iso-FLOPs, some trade-offs may reflect natural-text displacement in addition to mechanistic redundancy.
## Citation
If you use this model, please cite:
```bibtex
@misc{sabry2026inductionsignaturesenoughmatchedcompute,
  title={Induction Signatures Are Not Enough: A Matched-Compute Study of Load-Bearing Structure in In-Context Learning},
  author={Mohammed Sabry and Anya Belz},
  year={2026},
  eprint={2509.22947},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2509.22947},
}
```