---
datasets:
- common-pile/comma_v0.1_training_dataset
language:
- en
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
---

# TinyComma 1.8B

TinyComma 1.8B is a 1.8B-parameter, decoder-only base language model trained entirely on permissively licensed data from the [Common Pile](https://huggingface.co/collections/common-pile/common-pile-v01). Unlike the official Comma model series, TinyComma 1.8B uses the 128K-vocabulary [Llama 3](https://huggingface.co/collections/meta-llama/llama-31) tokenizer to ensure compatibility with two-model decoding setups. We trained TinyComma 1.8B to support our research on inference-time copyright mitigation.

- **Paper:** [Anchored Decoding: Provably Reducing Copyright Risk for Any Language Model](https://arxiv.org/abs/2602.07120)
- **Repository:** [jacqueline-he/anchored-decoding](https://github.com/jacqueline-he/anchored-decoding)
- **Project Page:** [Interactive Demo](https://tinyurl.com/anchored-decoding-demo)

## Benchmarking TinyComma 1.8B

We benchmarked TinyComma 1.8B and several other permissively trained base models on a set of common natural language understanding tasks from the [OLMES](https://github.com/allenai/olmes) evaluation suite.

*Benchmarking results using OLMES. TinyComma 1.8B outperforms other models in its size range.*

## Model Architecture

| Params | Head Dim. | Hidden Size | Attn. Heads | Hidden Layers | KV Heads |
|---|---|---|---|---|---|
| 1,758,562,304 | 64 | 2048 | 32 | 24 | 32 |
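
As a sanity check, the parameter count in the table can be reproduced from the listed dimensions. The sketch below is a hypothetical calculation, not the training code: it assumes a Llama-style decoder (SwiGLU MLP, bias-free projections, RMSNorm), untied input and output embeddings, the 128,256-token Llama 3 vocabulary, and an MLP intermediate size of 5,632. The table lists neither the intermediate size nor whether embeddings are tied; those two values are inferred only because they make the total match exactly.

```python
# Hypothetical parameter-count sketch for a Llama-style decoder.
# ASSUMPTIONS (not stated in the table): SwiGLU MLP, bias-free linear layers,
# RMSNorm, untied input/output embeddings, 128,256-token Llama 3 vocabulary,
# and an MLP intermediate size of 5,632 (inferred, not published).

VOCAB_SIZE = 128_256        # assumed Llama 3 tokenizer vocabulary size
HIDDEN_SIZE = 2048
NUM_LAYERS = 24
NUM_ATTN_HEADS = 32
NUM_KV_HEADS = 32
HEAD_DIM = 64
INTERMEDIATE_SIZE = 5632    # hypothetical: chosen so the total matches


def count_params(tie_embeddings: bool = False) -> int:
    """Return the total parameter count under the assumptions above."""
    embed = VOCAB_SIZE * HIDDEN_SIZE
    lm_head = 0 if tie_embeddings else VOCAB_SIZE * HIDDEN_SIZE

    # q and o projections: hidden <-> (attention heads * head dim).
    q_o = 2 * HIDDEN_SIZE * NUM_ATTN_HEADS * HEAD_DIM
    # k and v projections: hidden -> (KV heads * head dim).
    k_v = 2 * HIDDEN_SIZE * NUM_KV_HEADS * HEAD_DIM
    # SwiGLU MLP: gate and up (hidden -> intermediate), down (intermediate -> hidden).
    mlp = 3 * HIDDEN_SIZE * INTERMEDIATE_SIZE
    # Two RMSNorm weight vectors per layer (pre-attention and pre-MLP).
    norms = 2 * HIDDEN_SIZE

    per_layer = q_o + k_v + mlp + norms
    final_norm = HIDDEN_SIZE
    return embed + lm_head + NUM_LAYERS * per_layer + final_norm


if __name__ == "__main__":
    print(f"{count_params():,}")  # prints 1,758,562,304
```

Because KV Heads equals Attn. Heads, the key and value projections are full-size, i.e. standard multi-head attention rather than grouped-query attention. Note that the two embedding matrices alone account for roughly 525M of the 1.76B parameters, a consequence of the large 128K vocabulary at this model size.
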

## Training Hyperparameters

| Hyperparameter | Value |
|---|---|
| Optimizer | AdamW (β1=0.9, β2=0.95) |
| Learning rate | 3e-3 for Stage 1, 1e-3 for Stage 2 |
| Weight decay | 0.033 for Stage 1 |
| Batch size | 4M tokens |
| Warmup | 1000 steps for Stage 1, none for Stage 2 |
| Schedule | Cosine schedule for Stage 1, linear schedule for Stage 2 |
| Sequence length | 2048 tokens (sequences packed) |
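
The two-stage learning-rate schedule in the table can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual training configuration: the stage lengths are not published, so `total_steps` is a free parameter; Stage 1 is assumed to warm up linearly and then cosine-decay to a hypothetical `min_lr` of 0; Stage 2 is assumed to decay linearly from 1e-3 to a hypothetical `end_lr` of 0. It also assumes "4M tokens" means 2^22 tokens per step.

```python
import math

# Hypothetical sketch of the two-stage LR schedule described in the table.
# ASSUMPTIONS: stage lengths are unknown (total_steps is a free parameter);
# both stages decay to 0.0 by default (min_lr / end_lr are not published).

STAGE1_PEAK_LR = 3e-3
STAGE1_WARMUP_STEPS = 1000
STAGE2_START_LR = 1e-3

# Assuming "4M tokens" means 2**22 tokens, each optimizer step holds this
# many packed 2048-token sequences.
SEQ_LEN = 2048
SEQS_PER_STEP = (4 * 2**20) // SEQ_LEN


def stage1_lr(step: int, total_steps: int, min_lr: float = 0.0) -> float:
    """Linear warmup to the peak LR, then cosine decay to min_lr."""
    if step < STAGE1_WARMUP_STEPS:
        return STAGE1_PEAK_LR * (step + 1) / STAGE1_WARMUP_STEPS
    decay_steps = max(1, total_steps - STAGE1_WARMUP_STEPS)
    progress = min(1.0, (step - STAGE1_WARMUP_STEPS) / decay_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + (STAGE1_PEAK_LR - min_lr) * cosine


def stage2_lr(step: int, total_steps: int, end_lr: float = 0.0) -> float:
    """No warmup; linear decay from the Stage 2 LR to end_lr."""
    progress = min(1.0, step / max(1, total_steps))
    return STAGE2_START_LR + (end_lr - STAGE2_START_LR) * progress
```

For a hypothetical 10,000-step Stage 1, `stage1_lr(999, 10_000)` reaches the 3e-3 peak at the end of warmup, and the rate then falls smoothly toward 0 by step 10,000. Under the 2^22-token assumption, each step covers 2,048 packed sequences, so a stage of N steps processes about N × 4.2M tokens.
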