---
base_model: MohammedSabry/biinduct-1b-baseline
library_name: transformers
pipeline_tag: text-generation
language:
  - en
tags:
  - causal-lm
  - biinduct
  - pretraining
  - matched-compute
  - the-pile
  - 1b
  - baseline
---

# GGUF Files for biinduct-1b-baseline

These are the GGUF files for [MohammedSabry/biinduct-1b-baseline](https://huggingface.co/MohammedSabry/biinduct-1b-baseline).

## Downloads

| GGUF Link | Quantization | Description |
| ---- | ----- | ----------- |
| [Download](https://huggingface.co/Flexan/MohammedSabry-biinduct-1b-baseline-GGUF/resolve/main/biinduct-1b-baseline.Q2_K.gguf) | Q2_K | Lowest quality |
| [Download](https://huggingface.co/Flexan/MohammedSabry-biinduct-1b-baseline-GGUF/resolve/main/biinduct-1b-baseline.Q3_K_S.gguf) | Q3_K_S | |
| [Download](https://huggingface.co/Flexan/MohammedSabry-biinduct-1b-baseline-GGUF/resolve/main/biinduct-1b-baseline.IQ3_S.gguf) | IQ3_S | Integer quant, preferable over Q3_K_S |
| [Download](https://huggingface.co/Flexan/MohammedSabry-biinduct-1b-baseline-GGUF/resolve/main/biinduct-1b-baseline.IQ3_M.gguf) | IQ3_M | Integer quant |
| [Download](https://huggingface.co/Flexan/MohammedSabry-biinduct-1b-baseline-GGUF/resolve/main/biinduct-1b-baseline.Q3_K_M.gguf) | Q3_K_M | |
| [Download](https://huggingface.co/Flexan/MohammedSabry-biinduct-1b-baseline-GGUF/resolve/main/biinduct-1b-baseline.Q3_K_L.gguf) | Q3_K_L | |
| [Download](https://huggingface.co/Flexan/MohammedSabry-biinduct-1b-baseline-GGUF/resolve/main/biinduct-1b-baseline.IQ4_XS.gguf) | IQ4_XS | Integer quant |
| [Download](https://huggingface.co/Flexan/MohammedSabry-biinduct-1b-baseline-GGUF/resolve/main/biinduct-1b-baseline.Q4_K_S.gguf) | Q4_K_S | Fast with good performance |
| [Download](https://huggingface.co/Flexan/MohammedSabry-biinduct-1b-baseline-GGUF/resolve/main/biinduct-1b-baseline.Q4_K_M.gguf) | Q4_K_M | **Recommended:** Good balance of speed and performance |
| [Download](https://huggingface.co/Flexan/MohammedSabry-biinduct-1b-baseline-GGUF/resolve/main/biinduct-1b-baseline.Q5_K_S.gguf) | Q5_K_S | |
| [Download](https://huggingface.co/Flexan/MohammedSabry-biinduct-1b-baseline-GGUF/resolve/main/biinduct-1b-baseline.Q5_K_M.gguf) | Q5_K_M | |
| [Download](https://huggingface.co/Flexan/MohammedSabry-biinduct-1b-baseline-GGUF/resolve/main/biinduct-1b-baseline.Q6_K.gguf) | Q6_K | Very good quality |
| [Download](https://huggingface.co/Flexan/MohammedSabry-biinduct-1b-baseline-GGUF/resolve/main/biinduct-1b-baseline.Q8_0.gguf) | Q8_0 | Best quality |
| [Download](https://huggingface.co/Flexan/MohammedSabry-biinduct-1b-baseline-GGUF/resolve/main/biinduct-1b-baseline.f16.gguf) | f16 | Full precision, don't bother; use a quant |

A minimal `llama-cpp-python` loading sketch is shown after the note below.

## Note from Flexan

I provide GGUFs and quantizations of publicly available models that do not yet have a GGUF equivalent, usually for models **I deem interesting and wish to try out**. If a quant you'd like is missing, or you want another public model converted, you can request it in the community tab.

If you have questions regarding this model, please refer to [the original model repo](https://huggingface.co/MohammedSabry/biinduct-1b-baseline). You can find more info about me and what I do [here](https://huggingface.co/Flexan/Flexan).
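To try one of the quantized files above locally, here is a minimal sketch using `llama-cpp-python`. The choice of the Q4_K_M file, the local path, and the generation settings are illustrative assumptions; adapt them to whichever quant you download.

```python
# Minimal sketch: run a downloaded GGUF with llama-cpp-python.
# Assumes the Q4_K_M file from the table above is in the working directory;
# any other quant works the same way.
from llama_cpp import Llama

llm = Llama(
    model_path="biinduct-1b-baseline.Q4_K_M.gguf",
    n_ctx=1024,  # the model was pretrained with a 1024-token context
)

# This is a base (not instruction-tuned) model, so use plain text completion.
out = llm("The capital of France is", max_tokens=20)
print(out["choices"][0]["text"])
```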
# Bi-Induct 1B Baseline

This repository contains the **Bi-Induct 1B Baseline** checkpoint from *Induction Signatures Are Not Enough: A Matched-Compute Study of Load-Bearing Structure in In-Context Learning*. This release corresponds to the **1B** setting in the paper and is a **research checkpoint** intended for studying matched-compute pretraining, induction-style curricula, and in-context learning behavior. It is **not** instruction-tuned, alignment-tuned, or safety-tuned.

## Variant

Natural-only pretraining baseline with no synthetic copy snippets.

## Model overview

- Architecture: decoder-only Transformer
- Positional encoding: RoPE (`theta=10000`)
- Normalization: pre-norm residual blocks
- MLP: SwiGLU
- Attention: grouped-query / grouped key-value attention
- Precision: bfloat16 training
- Context length: 1024
- Embeddings: untied input/output embeddings

## Model specification

| Field | Value |
|---|---:|
| Parameters (paper label) | 1B |
| Layers | 30 |
| Hidden size | 1,536 |
| Intermediate / MLP size | 6,144 |
| Head dimension | 64 |
| Attention heads | 24 |
| KV heads | 6 |

## Training data

All checkpoints in this family were pretrained on the **deduplicated Pile** in streaming / shuffled mode. A stable MD5-based hash was used to create a fixed held-out evaluation slice, with **0.2% of the corpus** (roughly **0.4B tokens**) reserved for evaluation. Sequences were truncated to **1024 tokens**.

For the Bi-Induct variants, synthetic snippets were interleaved on top of the natural stream:

- **Induction**: `[S || SEP || S]`
- **Anti-Induction**: `[S || SEP || reverse(S)]`
- **Balanced**: each injection randomly chooses induction or anti-induction

The main cross-scale experiments used **span length L = 20** and **initial mix ratio m0 = 50%**, linearly annealed to zero over the full training budget.

## Training recipe

- Optimizer: AdamW (`beta1=0.9`, `beta2=0.999`, weight decay `0.1`)
- Learning rate: peak `1e-3`
- Schedule: `3%` linear warmup, then cosine decay
- Update size: `2^16` tokens per update
- Token budget: approximately `20N` tokens, following the Chinchilla-style rule of thumb
- Comparison protocol: iso-FLOPs across curricula at each scale

## Evaluation summary for the 1B family

The table below summarizes the main results at this scale. Standard LM benchmarks are evaluated **3-shot**, and Todd et al. function-style probes are evaluated **10-shot** and scored with **HITS@1**.

| Variant | Standard LM ICL composite ↑ | Todd-style ICL composite ↑ | Held-out PPL ↓ |
|---|---:|---:|---:|
| Baseline | 24.2 ± 0.5 | 20.0 ± 1.3 | 14.1 |
| Induction | 23.9 ± 0.5 | 15.2 ± 1.1 | 14.9 |
| Anti-Induction | 23.6 ± 0.4 | 14.7 ± 1.2 | 14.9 |
| Balanced | 24.3 ± 0.3 | 14.9 ± 1.1 | 14.9 |

**This checkpoint:** **Baseline**.

## Benchmarks included

### Standard LM benchmarks

- MMLU
- Winogrande
- CommonSenseQA
- PIQA
- HellaSwag
- TriviaQA-Wiki
- BBH (CoT)
- OpenBookQA
- ARC-Challenge
- GPQA
- GSM-8K
- MathQA
- BoolQ
- LAMBADA

### Todd et al. function-style probes

- alphabetically first 3
- alphabetically first 5
- alphabetically last 3
- alphabetically last 5
- capitalize
- capitalize first letter
- capitalize last letter
- choose first of 3
- choose first of 5
- choose last of 3
- choose last of 5
- choose middle of 3
- choose middle of 5
- lowercase first letter
- lowercase last letter
- next capital letter
- next item
- prev item
- word length
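As an illustration of the 10-shot, HITS@1 protocol used for these probes, below is a minimal, hypothetical sketch for the `capitalize first letter` task. The `Input:`/`Output:` prompt format, the demonstration words, and the greedy decoding are assumptions made for illustration and may differ from the harness used in the paper.

```python
# Hypothetical 10-shot probe for "capitalize first letter", scored as HITS@1.
# The Input:/Output: format and the demonstration words are illustrative
# assumptions, not the paper's exact harness.
from transformers import AutoTokenizer, AutoModelForCausalLM

repo_id = "MohammedSabry/biinduct-1b-baseline"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

demos = [
    ("apple", "Apple"), ("river", "River"), ("stone", "Stone"),
    ("cloud", "Cloud"), ("tiger", "Tiger"), ("piano", "Piano"),
    ("lemon", "Lemon"), ("chair", "Chair"), ("mouse", "Mouse"),
    ("grape", "Grape"),
]
query, target = "window", "Window"

# Build a 10-shot prompt and leave the final output to be completed.
prompt = "".join(f"Input: {x}\nOutput: {y}\n\n" for x, y in demos)
prompt += f"Input: {query}\nOutput:"

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=3, do_sample=False)
completion = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)

# HITS@1 for this single query: the greedy continuation must start with the target.
pred = completion.strip().split()[0] if completion.strip() else ""
print(pred == target)
```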
## Example usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

repo_id = "MohammedSabry/biinduct-1b-baseline"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Limitations

- These are research checkpoints, not production chat models.
- They were designed to study the relationship between induction-style telemetry and load-bearing ICL behavior under matched compute.
- The synthetic interventions are intentionally lightweight and token-level; results should not be interpreted as ruling out richer data-rewrite strategies.
- Because Bi-Induct replaces a fraction of natural data under iso-FLOPs, some trade-offs may reflect natural-text displacement in addition to mechanistic redundancy.

## Citation

If you use this model, please cite:

```bibtex
@misc{sabry2026inductionsignaturesenoughmatchedcompute,
  title={Induction Signatures Are Not Enough: A Matched-Compute Study of Load-Bearing Structure in In-Context Learning},
  author={Mohammed Sabry and Anya Belz},
  year={2026},
  eprint={2509.22947},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2509.22947},
}
```