Initialize project; model provided by the ModelHub XC community
Model: Flexan/MohammedSabry-biinduct-1b-baseline-GGUF Source: Original Platform
README.md (new file, 194 lines)
@@ -0,0 +1,194 @@
---
base_model: MohammedSabry/biinduct-1b-baseline
library_name: transformers
pipeline_tag: text-generation
language:
- en
tags:
- causal-lm
- biinduct
- pretraining
- matched-compute
- the-pile
- 1b
- baseline
---

# GGUF Files for biinduct-1b-baseline

These are the GGUF files for [MohammedSabry/biinduct-1b-baseline](https://huggingface.co/MohammedSabry/biinduct-1b-baseline).

## Downloads

| GGUF Link | Quantization | Description |
| ---- | ----- | ----------- |
| [Download](https://huggingface.co/Flexan/MohammedSabry-biinduct-1b-baseline-GGUF/resolve/main/biinduct-1b-baseline.Q2_K.gguf) | Q2_K | Lowest quality |
| [Download](https://huggingface.co/Flexan/MohammedSabry-biinduct-1b-baseline-GGUF/resolve/main/biinduct-1b-baseline.Q3_K_S.gguf) | Q3_K_S | |
| [Download](https://huggingface.co/Flexan/MohammedSabry-biinduct-1b-baseline-GGUF/resolve/main/biinduct-1b-baseline.IQ3_S.gguf) | IQ3_S | Integer quant, preferable over Q3_K_S |
| [Download](https://huggingface.co/Flexan/MohammedSabry-biinduct-1b-baseline-GGUF/resolve/main/biinduct-1b-baseline.IQ3_M.gguf) | IQ3_M | Integer quant |
| [Download](https://huggingface.co/Flexan/MohammedSabry-biinduct-1b-baseline-GGUF/resolve/main/biinduct-1b-baseline.Q3_K_M.gguf) | Q3_K_M | |
| [Download](https://huggingface.co/Flexan/MohammedSabry-biinduct-1b-baseline-GGUF/resolve/main/biinduct-1b-baseline.Q3_K_L.gguf) | Q3_K_L | |
| [Download](https://huggingface.co/Flexan/MohammedSabry-biinduct-1b-baseline-GGUF/resolve/main/biinduct-1b-baseline.IQ4_XS.gguf) | IQ4_XS | Integer quant |
| [Download](https://huggingface.co/Flexan/MohammedSabry-biinduct-1b-baseline-GGUF/resolve/main/biinduct-1b-baseline.Q4_K_S.gguf) | Q4_K_S | Fast with good performance |
| [Download](https://huggingface.co/Flexan/MohammedSabry-biinduct-1b-baseline-GGUF/resolve/main/biinduct-1b-baseline.Q4_K_M.gguf) | Q4_K_M | **Recommended:** Perfect mix of speed and performance |
| [Download](https://huggingface.co/Flexan/MohammedSabry-biinduct-1b-baseline-GGUF/resolve/main/biinduct-1b-baseline.Q5_K_S.gguf) | Q5_K_S | |
| [Download](https://huggingface.co/Flexan/MohammedSabry-biinduct-1b-baseline-GGUF/resolve/main/biinduct-1b-baseline.Q5_K_M.gguf) | Q5_K_M | |
| [Download](https://huggingface.co/Flexan/MohammedSabry-biinduct-1b-baseline-GGUF/resolve/main/biinduct-1b-baseline.Q6_K.gguf) | Q6_K | Very good quality |
| [Download](https://huggingface.co/Flexan/MohammedSabry-biinduct-1b-baseline-GGUF/resolve/main/biinduct-1b-baseline.Q8_0.gguf) | Q8_0 | Best quality |
| [Download](https://huggingface.co/Flexan/MohammedSabry-biinduct-1b-baseline-GGUF/resolve/main/biinduct-1b-baseline.f16.gguf) | f16 | Full precision, don't bother; use a quant |
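
To run one of these files locally, the snippet below is a minimal sketch that fetches the recommended Q4_K_M file with `huggingface_hub` and runs it with `llama-cpp-python`. The generation settings are illustrative and not prescribed by this repo.

```python
# Minimal sketch: download the Q4_K_M GGUF and run it with llama-cpp-python.
# Assumes `pip install huggingface_hub llama-cpp-python`; settings are illustrative.
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

gguf_path = hf_hub_download(
    repo_id="Flexan/MohammedSabry-biinduct-1b-baseline-GGUF",
    filename="biinduct-1b-baseline.Q4_K_M.gguf",
)

# The base model was pretrained with a 1024-token context window.
llm = Llama(model_path=gguf_path, n_ctx=1024)

out = llm("The capital of France is", max_tokens=20)
print(out["choices"][0]["text"])
```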

## Note from Flexan

I provide GGUFs and quantizations of publicly available models that do not yet have a GGUF equivalent,
usually for models **I deem interesting and wish to try out**.

If a quant you'd like is missing, you can request it in the community tab.
If you'd like another public model converted, you can also request that in the community tab.
If you have questions regarding this model, please refer to [the original model repo](https://huggingface.co/MohammedSabry/biinduct-1b-baseline).

You can find more info about me and what I do [here](https://huggingface.co/Flexan/Flexan).

# Bi-Induct 1B Baseline

This repository contains the **Bi-Induct 1B Baseline** checkpoint from *Induction Signatures Are Not Enough: A Matched-Compute Study of Load-Bearing Structure in In-Context Learning*.

This release corresponds to the **1B** setting in the paper and is a **research checkpoint** intended for studying matched-compute pretraining, induction-style curricula, and in-context learning behavior. It is **not** instruction-tuned, alignment-tuned, or safety-tuned.

## Variant

Natural-only pretraining baseline with no synthetic copy snippets.

## Model overview

- Architecture: decoder-only Transformer
- Positional encoding: RoPE (`theta=10000`)
- Normalization: pre-norm residual blocks
- MLP: SwiGLU
- Attention: grouped-query / grouped key-value attention
- Precision: bfloat16 training
- Context length: 1024
- Embeddings: untied input/output embeddings

## Model specification

| Field | Value |
|---|---:|
| Parameters (paper label) | 1B |
| Layers | 30 |
| Hidden size | 1,536 |
| Intermediate / MLP size | 6,144 |
| Head dimension | 64 |
| Attention heads | 24 |
| KV heads | 6 |
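
For orientation, the specification above maps onto a Llama-style decoder configuration roughly as sketched below. This is an illustrative reconstruction, not the checkpoint's actual config class or field names; consult the original repo's `config.json` for the authoritative values.

```python
# Illustrative only: a Llama-style config with the table's dimensions.
# The actual checkpoint may use a different architecture class or field names.
from transformers import LlamaConfig

cfg = LlamaConfig(
    hidden_size=1536,             # hidden size
    intermediate_size=6144,       # SwiGLU MLP size
    num_hidden_layers=30,         # layers
    num_attention_heads=24,       # query heads (head_dim = 1536 / 24 = 64)
    num_key_value_heads=6,        # grouped KV heads
    max_position_embeddings=1024, # context length
    rope_theta=10000.0,           # RoPE base
    tie_word_embeddings=False,    # untied input/output embeddings
)
print(cfg)
```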

## Training data

All checkpoints in this family were pretrained on the **deduplicated Pile** in streaming / shuffled mode. A stable MD5-based hash was used to create a fixed held-out evaluation slice, with **0.2% of the corpus** reserved for evaluation (roughly **0.4B tokens**). Sequences were truncated to **1024 tokens** at tokenization time.
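
The held-out slice is defined by a deterministic document hash rather than a random seed. Below is a hypothetical sketch of that kind of split; the 0.2% fraction comes from the text above, but the paper's exact hashing scheme may differ.

```python
# Hypothetical sketch of an MD5-based held-out split (~0.2% of documents).
# Illustrative only; the paper's exact hashing scheme may differ.
import hashlib

def is_heldout(doc_text: str, eval_fraction: float = 0.002) -> bool:
    digest = hashlib.md5(doc_text.encode("utf-8")).hexdigest()
    # Map the 128-bit hash to a stable value in [0, 1) and compare to the fraction.
    bucket = int(digest, 16) / 16**32
    return bucket < eval_fraction

print(is_heldout("Some Pile document ..."))
```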

For the Bi-Induct variants, synthetic snippets were interleaved on top of the natural stream:

- **Induction**: `[S || SEP || S]`
- **Anti-Induction**: `[S || SEP || reverse(S)]`
- **Balanced**: each injection randomly chooses induction or anti-induction

The main cross-scale experiments used **span length L = 20** and an **initial mix ratio m0 = 50%**, linearly annealed to zero over the full training budget.
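
A hypothetical sketch of how such token-level snippets could be constructed is shown below; the function names, `SEP` handling, and annealing helper are simplified stand-ins, not the paper's actual pipeline.

```python
# Hypothetical sketch of Bi-Induct snippet construction (simplified).
# S is a span of L tokens sampled from the natural stream; sep_id is a separator token.
import random

def make_snippet(span, sep_id, mode="balanced"):
    if mode == "balanced":
        mode = random.choice(["induction", "anti-induction"])
    tail = span if mode == "induction" else list(reversed(span))
    return span + [sep_id] + tail        # [S || SEP || S] or [S || SEP || reverse(S)]

def mix_ratio(step, total_steps, m0=0.5):
    # Initial mix ratio m0 = 50%, linearly annealed to zero over training.
    return m0 * max(0.0, 1.0 - step / total_steps)

span = [101, 7, 42, 9]                   # stand-in for an L = 20 token span
print(make_snippet(span, sep_id=0, mode="induction"))
print(mix_ratio(step=1000, total_steps=10000))
```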

## Training recipe

- Optimizer: AdamW (`beta1=0.9`, `beta2=0.999`, weight decay `0.1`)
- Learning rate: peak `1e-3`
- Schedule: `3%` linear warmup, then cosine decay
- Update size: `2^16` tokens per update
- Token budget: approximately `20N` tokens, following the Chinchilla-style rule of thumb
- Comparison protocol: iso-FLOPs across curricula at each scale
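
A minimal sketch of that recipe in PyTorch / `transformers` terms follows; the step counts are derived from the bullets above (token budget ≈ 20N with N = 1B parameters, 2^16 tokens per update) and are illustrative rather than the paper's exact numbers.

```python
# Illustrative reconstruction of the optimizer and schedule from the recipe above.
# Step counts follow the stated rules of thumb; the paper's exact values may differ.
import torch
from transformers import get_cosine_schedule_with_warmup

model = torch.nn.Linear(8, 8)           # stand-in for the 1B-parameter model

token_budget = 20 * 1_000_000_000       # ~20N tokens for N = 1B parameters
tokens_per_update = 2**16
total_steps = token_budget // tokens_per_update
warmup_steps = int(0.03 * total_steps)  # 3% linear warmup

optimizer = torch.optim.AdamW(
    model.parameters(), lr=1e-3, betas=(0.9, 0.999), weight_decay=0.1
)
scheduler = get_cosine_schedule_with_warmup(optimizer, warmup_steps, total_steps)
print(total_steps, warmup_steps)
```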

## Evaluation summary for the 1B family

The table below summarizes the main results at this scale. Standard LM benchmarks are evaluated **3-shot**, and the Todd et al. function-style probes are evaluated **10-shot** with **HITS@1**.

| Variant | Standard LM ICL composite ↑ | Todd-style ICL composite ↑ | Held-out PPL ↓ |
|---|---:|---:|---:|
| Baseline | 24.2 ± 0.5 | 20.0 ± 1.3 | 14.1 |
| Induction | 23.9 ± 0.5 | 15.2 ± 1.1 | 14.9 |
| Anti-Induction | 23.6 ± 0.4 | 14.7 ± 1.2 | 14.9 |
| Balanced | 24.3 ± 0.3 | 14.9 ± 1.1 | 14.9 |

**This checkpoint:** **Baseline**.
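
For reference, HITS@1 on the function-style probes is simply the fraction of queries whose top-1 prediction matches the target; a minimal sketch:

```python
# Minimal sketch of HITS@1: fraction of examples whose top prediction is correct.
def hits_at_1(predictions, targets):
    assert len(predictions) == len(targets)
    correct = sum(p == t for p, t in zip(predictions, targets))
    return correct / len(targets)

print(hits_at_1(["cat", "dog", "bird"], ["cat", "dog", "fish"]))  # 0.666...
```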

## Benchmarks included

### Standard LM benchmarks
- MMLU
- Winogrande
- CommonSenseQA
- PIQA
- HellaSwag
- TriviaQA-Wiki
- BBH (CoT)
- OpenBookQA
- ARC-Challenge
- GPQA
- GSM-8K
- MathQA
- BoolQ
- LAMBADA

### Todd et al. function-style probes
- alphabetically first 3
- alphabetically first 5
- alphabetically last 3
- alphabetically last 5
- capitalize
- capitalize first letter
- capitalize last letter
- choose first of 3
- choose first of 5
- choose last of 3
- choose last of 5
- choose middle of 3
- choose middle of 5
- lowercase first letter
- lowercase last letter
- next capital letter
- next item
- prev item
- word length

## Example usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

repo_id = "MohammedSabry/biinduct-1b-baseline"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
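
Because the checkpoint is not instruction-tuned, few-shot prompting is the natural interface for probing in-context learning. The example below is an illustrative k-shot prompt with greedy decoding, not an evaluation harness from the paper.

```python
# Illustrative few-shot (in-context) prompt for the base model; greedy decoding.
from transformers import AutoTokenizer, AutoModelForCausalLM

repo_id = "MohammedSabry/biinduct-1b-baseline"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

few_shot_prompt = (
    "France -> Paris\n"
    "Japan -> Tokyo\n"
    "Italy -> Rome\n"
    "Egypt ->"
)
inputs = tokenizer(few_shot_prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=5, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```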

## Limitations

- These are research checkpoints, not production chat models.
- They were designed to study the relationship between induction-style telemetry and load-bearing ICL behavior under matched compute.
- The synthetic interventions are intentionally lightweight and token-level; results should not be interpreted as ruling out richer data-rewrite strategies.
- Because Bi-Induct replaces a fraction of natural data under iso-FLOPs, some trade-offs may reflect natural-text displacement in addition to mechanistic redundancy.

## Citation

If you use this model, please cite:

```bibtex
@misc{sabry2026inductionsignaturesenoughmatchedcompute,
  title={Induction Signatures Are Not Enough: A Matched-Compute Study of Load-Bearing Structure in In-Context Learning},
  author={Mohammed Sabry and Anya Belz},
  year={2026},
  eprint={2509.22947},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2509.22947},
}
```