Initialize project; model provided by the ModelHub XC community

Model: ali-elganzory/Baguettotron
Source: Original Platform
This commit is contained in:
ModelHub XC
2026-05-14 22:01:18 +08:00
commit f8436b9566
16 changed files with 327864 additions and 0 deletions

39
.gitattributes vendored Normal file

@@ -0,0 +1,39 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
figures/baguettotron_structure.png filter=lfs diff=lfs merge=lfs -text
figures/table_evaluation.png filter=lfs diff=lfs merge=lfs -text
figures/training_baguettotron.png filter=lfs diff=lfs merge=lfs -text
figures/comparison_models.png filter=lfs diff=lfs merge=lfs -text

168
README.md Normal file

@@ -0,0 +1,168 @@
---
language:
- en
- fr
- it
- de
- es
- pl
license: apache-2.0
pipeline_tag: text-generation
tags:
- transformers
library_name: transformers
datasets:
- PleIAs/SYNTH
---
# 🥖 Baguettotron
<div align="center">
<img src="figures/pleias.jpg" width="60%" alt="Pleias" />
</div>
<p align="center">
<a href="https://pleias.fr/blog/blogsynth-the-new-data-frontier"><b>Blog announcement</b></a>
</p>
**Baguettotron** is a 321-million-parameter generalist Small Reasoning Model, trained on 200 billion tokens from <a href="https://huggingface.co/datasets/PleIAs/SYNTH">SYNTH</a>, a fully open generalist dataset.
Despite being trained on considerably less data, Baguettotron outperforms most SLMs in the same size range on non-code industry benchmarks, providing an unprecedented balance between memory, general reasoning, math and retrieval performance.
<p align="center">
<img width="80%" src="figures/training_efficiency.jpeg">
</p>
The name is a nod both to the model's French origins and to its unusual shape: with 80 layers, Baguettotron is currently the deepest SLM in its size range.
## Features
Baguettotron has been natively trained for instruction following with thinking traces. We implemented a series of dedicated pipelines for:
* Memorization of encyclopedic knowledge (50,000 vital articles from Wikipedia)
* Retrieval-Augmented Generation with grounding (following up on our initial experiments with the Pleias-RAG series)
* Arithmetic and simple math problem solving
* Editing tasks
* Information extraction
* Creative writing, including unusual synthetic exercises like lipograms or layout poems.
* Cooking (the model wouldn't deserve its name otherwise)
Baguettotron is able to read and write in the main European languages: French, German, Italian, Spanish, Polish and, to a lesser extent, Latin and Dutch. Reasoning traces are written exclusively in English.
Fully synthetic training makes it relatively straightforward to expand language support, and we look forward to either bringing in more languages or creating language-specific variants.
## Model design and training
Baguettotron is a 321M-parameter decoder with a standard Qwen/Llama-like design, except for its extreme depth of 80 layers (a type of model we internally nicknamed "baguette").
<p align="center">
<img width="80%" src="figures/baguettotron_structure.png">
</p>
Baguettotron was trained on 16 H100s from Jean Zay (compute plan n°A0191016886). An unusual feature of training on SYNTH was having reasoning signals from MMLU and other major industry benchmarks very early on. We were able to empirically measure consistent improvements from stacking more layers.
<p align="center">
<img width="80%" src="figures/training_baguettotron.png">
</p>
Our current hypothesis is that deeper architectures benefit more from dense reasoning data, as the model is more commonly exposed to string sequences requiring intensive computation or knowledge interconnection.
## Reasoning style
The reasoning traces use an entirely new reasoning style with dense, short, frequently non-verbal sentences, designed by Pleias and made possible by the use of fine-tuned models for synthetic generation.
Traces use the following stenographic notation, integrated into the model's special tokens:
### Logical markers
| Token | Meaning | Usage |
| ----- | ---------------------------------- | ---------------------------------------------------------- |
| **→** | derivation / implication | For very short causal/logical flow |
| **↺** | iterative return / refinement loop | For backtracking, reconsidering priors, RAG re-querying. |
| **?** | uncertainty/questions to resolve | Can be appended to short expressions/words, not just interrogative sentences |
| **!/※** | insight/breakthroughs | Emphatic mark for knowledge discovery |
| **≈** | approximation/estimates | For intermediary hypothesis/uncertain preliminary statements |
| **∴** | therefore / final step | Use sparingly to mark stable conclusions. |
### Uncertainty
| Token | Meaning | Usage |
| ----- | ------------------------- | ------------------------------------------------------------- |
| **●** | high confidence | well-supported empirical/theoretical ground; “anchor points.” |
| **◐** | medium/partial confidence | incomplete data; plausible but unverified links. |
| **○** | low confidence | speculation, missing context, weak inference chain. |
| **⚠** | bias/premise risk | domain mismatch, cultural assumptions, language-switch artifacts. |
| **?maybe?** | soft speculation | marks tentative ideas, reasoning branches that might collapse later |
### Verification process
| Token | Meaning | Usage |
| ----- | ------------------------- | ---------------------------------------- |
| **☐** | unverified hypothesis | raw claim, no cross-check yet. |
| **☑** | intermediate verification | one source/argument supports it. |
| **✓** | confirmed/validated | multiple independent supports (●-level). |
The model can also use a variety of graphic notations for causality/problem decomposition at times. For example:
```
Initial query:
├─ feature1: *lorem ipsum*
├─ feature2: *lorem ipsum*
└─ feature3: *lorem ipsum*
```
### Simulated entropy
Baguettotron uses a range of special tokens **⟨H≈X.X⟩** to introduce higher-entropy sequences, somewhat like temperature control.
* **⟨H≈0.3⟩–⟨H≈0.5⟩**: still-grounded sequences with slightly higher token entropy
* **⟨H≈0.5⟩–⟨H≈1.0⟩**: exploratory, multi-path reasoning
* **⟨H≈1.5⟩–⟨H≈1.8⟩**: fragmented, oniric, literary stream-of-consciousness drift
It remains a pure simulation, since the model obviously does not have access to inference controls; yet it still allows for more token exploration/diversification. This method was inspired by the <a href="https://github.com/xjdr-alt/entropix">Entropix</a> project.
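As a hedged illustration (not an official recipe from this card), one could prefill the open reasoning trace with one of these tokens to nudge generation toward a given entropy regime:
```python
# Illustrative only: prefill the reasoning trace with an entropy token.
# The prompt format follows the Inference section below; the steering
# behavior of a manually inserted token is an assumption.
prompt = (
    "<|im_start|>user\nWrite a short poem about bread.<|im_end|>\n"
    "<|im_start|>assistant\n<think>\n⟨H≈1.5⟩ "
)
```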
## Evaluation
We evaluated Baguettotron on three major industry benchmarks: MMLU (general reasoning and memorization), math (GSM8K) and retrieval (HotpotQA). With only 321M parameters, Baguettotron gets close to Qwen-0.6B performance and significantly outperforms similarly sized Gemma.
<p align="center">
<img width="80%" src="figures/table_evaluation.png">
</p>
## Inference
Baguettotron has been trained with the standard Qwen instruction format.
```xml
<|im_start|>user
Who are you?<|im_end|>
<|im_start|>assistant
<think>
```
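For reference, a minimal generation sketch with the Hugging Face `transformers` API; the Hub id and sampling values here are illustrative assumptions, not official recommendations:
```python
# Minimal inference sketch; model id and sampling settings are assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "PleIAs/Baguettotron"  # assumed Hub id; point to a local checkout if needed
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Qwen-style instruction prompt, opening a thinking trace as shown above.
prompt = "<|im_start|>user\nWho are you?<|im_end|>\n<|im_start|>assistant\n<think>\n"
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=False))
```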
Baguettotron supports multi-turn conversation. We recommend using "rolling" thinking: systematically append the thinking trace for each new generation, but discard past ones.
It is possible to remove thinking traces by swapping the opening `<think>` tag with a closing `</think>` tag.
```xml
<|im_start|>user
Who are you?<|im_end|>
<|im_start|>assistant
</think>
```
Yet our current tests show significantly decreased performance on most tasks, especially memorization of encyclopedic knowledge.
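A sketch of this rolling-thinking bookkeeping (the helper and the `</think>`-for-past-turns convention are assumptions built from the two prompt formats above):
```python
# Hypothetical helper: keep only the current turn's thinking trace.
def build_prompt(history, new_user_msg, thinking=True):
    """history: list of (user, answer) pairs with past thinking traces discarded."""
    parts = []
    for user, answer in history:
        parts.append(f"<|im_start|>user\n{user}<|im_end|>\n")
        # Past turns carry only the final answer, with the trace closed.
        parts.append(f"<|im_start|>assistant\n</think>\n{answer}<|im_end|>\n")
    parts.append(f"<|im_start|>user\n{new_user_msg}<|im_end|>\n")
    # Open a fresh trace for the new generation, or close it to skip thinking.
    parts.append("<|im_start|>assistant\n" + ("<think>\n" if thinking else "</think>\n"))
    return "".join(parts)
```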
For RAG, Baguettotron uses a special syntax to pass references:
```xml
<|im_start|>user
Who are you?
<source_1>[…]</source_1>
<source_2>[…]</source_2>
<|im_end|>
<|im_start|>assistant
<think>
```
Afterwards, the model will return an answer with grounding references (`<ref>[quote]</ref>`). The reasoning draft will be affected as well, focusing on source synthesis rather than recall of the internal knowledge base.
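A sketch of assembling this source-passing syntax programmatically (the helper is hypothetical; the tag layout follows the example above):
```python
# Hypothetical helper for the RAG source syntax shown above.
def build_rag_prompt(question, sources):
    tagged = "\n".join(
        f"<source_{i}>{text}</source_{i}>" for i, text in enumerate(sources, start=1)
    )
    return (
        f"<|im_start|>user\n{question}\n{tagged}\n<|im_end|>\n"
        "<|im_start|>assistant\n<think>\n"
    )

prompt = build_rag_prompt("Who are you?", ["first source text", "second source text"])
```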
## Fine-Tuning/RL
Baguettotron has been successfully fine-tuned for a variety of tasks including text classification and <a href="https://x.com/darrenangle/status/1990259914602856831">poetry writing</a>.
Since it is a reasoning model, it should train well with reinforcement learning methods like GRPO, either on verifiable tasks or with an LLM-as-a-judge.
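As a hedged sketch of such a GRPO setup, using TRL's `GRPOTrainer` (the tiny dataset, reward function and Hub id are placeholders, not part of this release):
```python
# Hedged GRPO sketch with TRL; dataset and reward are illustrative placeholders.
from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer

# Toy verifiable task: columns "prompt" and "answer".
train_dataset = Dataset.from_list([
    {"prompt": "What is 12 * 7?", "answer": "84"},
])

def exact_answer_reward(completions, answer, **kwargs):
    # Verifiable reward: 1.0 when the gold answer appears in the completion.
    return [1.0 if gold in completion else 0.0
            for completion, gold in zip(completions, answer)]

trainer = GRPOTrainer(
    model="PleIAs/Baguettotron",  # assumed Hub id
    reward_funcs=exact_answer_reward,
    args=GRPOConfig(output_dir="baguettotron-grpo"),
    train_dataset=train_dataset,
)
trainer.train()
```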

25
chat_template.jinja Normal file

@@ -0,0 +1,25 @@
{%- for message in messages -%}
{%- if message["role"] == "system" -%}
{{- "<|system|>
" + message["content"] + "
" -}}
{%- elif message["role"] == "user" -%}
{{- "<|user|>
" + message["content"] + "
" -}}
{%- elif message["role"] == "assistant" -%}
{%- if not loop.last -%}
{{- "<|assistant|>
" + message["content"] + eos_token + "
" -}}
{%- else -%}
{{- "<|assistant|>
" + message["content"] + eos_token -}}
{%- endif -%}
{%- endif -%}
{%- if loop.last and add_generation_prompt -%}
{{- "<|assistant|>
" -}}
{%- endif -%}
{%- endfor -%}

7
chat_template.json Normal file

@@ -0,0 +1,7 @@
{
"chat_template": "{% for m in messages %}<|im_start|>{{ m['role'] }}\n{{ m['content'] }}<|im_end|>\n{% endfor %}{% if add_generation_prompt %}<|im_start|>assistant\n<think>\n{% endif %}",
"eos_token": "<|im_end|>",
"bos_token": "<|im_start|>",
"stop": ["<|im_end|>"],
"roles": { "user": "user", "assistant": "assistant", "system": "system" }
}
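A sketch of how this template renders through the tokenizer (assuming `apply_chat_template` picks up the template string above; the Hub id is an assumption):
```python
# Rendering sketch for the ChatML-style template above.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("PleIAs/Baguettotron")  # assumed Hub id
messages = [{"role": "user", "content": "Who are you?"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(text)
# Per the template string:
# <|im_start|>user\nWho are you?<|im_end|>\n<|im_start|>assistant\n<think>\n
```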

29
config.json Normal file

@@ -0,0 +1,29 @@
{
"architectures": [
"LlamaForCausalLM"
],
"attention_bias": false,
"attention_dropout": 0.0,
"bos_token_id": 1,
"eos_token_id": 2,
"head_dim": 64,
"hidden_act": "silu",
"hidden_size": 576,
"initializer_range": 0.02,
"intermediate_size": 1536,
"max_position_embeddings": 4096,
"mlp_bias": false,
"model_type": "llama",
"num_attention_heads": 9,
"num_hidden_layers": 80,
"num_key_value_heads": 3,
"pretraining_tp": 1,
"rms_norm_eps": 1e-05,
"rope_scaling": null,
"rope_theta": 10000,
"tie_word_embeddings": true,
"torch_dtype": "bfloat16",
"transformers_version": "4.51.3",
"use_cache": true,
"vocab_size": 65536
}
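As a quick sanity check of the 321M figure claimed in the README, the architecture can be instantiated from this config without downloading weights (sketch; values copied from the config above):
```python
# Sketch: rebuild the architecture from the config above and count parameters.
from transformers import LlamaConfig, LlamaForCausalLM

config = LlamaConfig(
    hidden_size=576, intermediate_size=1536, num_hidden_layers=80,
    num_attention_heads=9, num_key_value_heads=3, head_dim=64,
    vocab_size=65536, max_position_embeddings=4096, tie_word_embeddings=True,
)
model = LlamaForCausalLM(config)
print(f"{sum(p.numel() for p in model.parameters()):,}")  # ~321M, matching the README
```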


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ba862d564d77d7e8d87416298fa4b5e1af5dbfaeb4dc63b848f7b2f2197e7760
size 402783


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d47c8cdb0bc5bca30738fa5059e20af373efd899ec924eb1b53d5a516bfac028
size 885841

BIN
figures/pleias.jpg Normal file

Binary file not shown.



@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1d64da90efb12166654bac389a6b348c3ffafe740a6cd392cba6c084c19a92b3
size 131441


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:9abe7f56929daf01bf65d98629a76146decd9821c2faf2764d133099390d9b1c
size 328893

Binary file not shown.


6
generation_config.json Normal file

@@ -0,0 +1,6 @@
{
"_from_model_config": true,
"bos_token_id": 1,
"eos_token_id": 2,
"transformers_version": "4.51.3"
}

3
model.safetensors Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:dbb3fe0fd0d97a28c140aa315ec4a651f20432e9b7a509908a620190f506644b
size 641995416

77
special_tokens_map.json Normal file

@@ -0,0 +1,77 @@
{
"additional_special_tokens": [
"<|im_start|>",
"<|im_end>",
"<think>",
"</think>",
"source_1",
"source_2",
"source_3",
"source_4",
"source_5",
"source_6",
"source_7",
"source_8",
"source_9",
"source_10",
"<ref",
"</ref>",
"→",
"↺",
"※",
"?maybe?",
"●",
"◐",
"○",
"⚠",
"☐",
"☑",
"✓",
"⟨H≈0.1⟩",
"⟨H≈0.2⟩",
"⟨H≈0.3⟩",
"⟨H≈0.4⟩",
"⟨H≈0.5⟩",
"⟨H≈0.6⟩",
"⟨H≈0.7⟩",
"⟨H≈0.8⟩",
"⟨H≈0.9⟩",
"⟨H≈1.0⟩",
"⟨H≈1.1⟩",
"⟨H≈1.2⟩",
"⟨H≈1.3⟩",
"⟨H≈1.4⟩",
"⟨H≈1.5⟩",
"⟨H≈1.6⟩",
"⟨H≈1.7⟩",
"⟨H≈1.8⟩"
],
"bos_token": {
"content": "<|begin_of_text|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"eos_token": {
"content": "<|end_of_text|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"pad_token": {
"content": "[PAD]",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"unk_token": {
"content": "[UNK]",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
}
}

327047
tokenizer.json Normal file

File diff suppressed because it is too large

451
tokenizer_config.json Normal file

@@ -0,0 +1,451 @@
{
"added_tokens_decoder": {
"0": {
"content": "[UNK]",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"1": {
"content": "<|begin_of_text|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"2": {
"content": "<|end_of_text|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"3": {
"content": "[PAD]",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"65491": {
"content": "<|im_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"65492": {
"content": "<|im_end>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"65493": {
"content": "<think>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"65494": {
"content": "</think>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"65495": {
"content": "source_1",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"65496": {
"content": "source_2",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"65497": {
"content": "source_3",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"65498": {
"content": "source_4",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"65499": {
"content": "source_5",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"65500": {
"content": "source_6",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"65501": {
"content": "source_7",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"65502": {
"content": "source_8",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"65503": {
"content": "source_9",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"65504": {
"content": "source_10",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"65505": {
"content": "<ref",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"65506": {
"content": "</ref>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"65507": {
"content": "→",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"65508": {
"content": "↺",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"65509": {
"content": "※",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"65510": {
"content": "?maybe?",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"65511": {
"content": "●",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"65512": {
"content": "◐",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"65513": {
"content": "○",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"65514": {
"content": "⚠",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"65515": {
"content": "☐",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"65516": {
"content": "☑",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"65517": {
"content": "✓",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"65518": {
"content": "⟨H≈0.1⟩",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"65519": {
"content": "⟨H≈0.2⟩",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"65520": {
"content": "⟨H≈0.3⟩",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"65521": {
"content": "⟨H≈0.4⟩",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"65522": {
"content": "⟨H≈0.5⟩",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"65523": {
"content": "⟨H≈0.6⟩",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"65524": {
"content": "⟨H≈0.7⟩",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"65525": {
"content": "⟨H≈0.8⟩",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"65526": {
"content": "⟨H≈0.9⟩",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"65527": {
"content": "⟨H≈1.0⟩",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"65528": {
"content": "⟨H≈1.1⟩",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"65529": {
"content": "⟨H≈1.2⟩",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"65530": {
"content": "⟨H≈1.3⟩",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"65531": {
"content": "⟨H≈1.4⟩",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"65532": {
"content": "⟨H≈1.5⟩",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"65533": {
"content": "⟨H≈1.6⟩",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"65534": {
"content": "⟨H≈1.7⟩",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"65535": {
"content": "⟨H≈1.8⟩",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
}
},
"additional_special_tokens": [
"<|im_start|>",
"<|im_end>",
"<think>",
"</think>",
"source_1",
"source_2",
"source_3",
"source_4",
"source_5",
"source_6",
"source_7",
"source_8",
"source_9",
"source_10",
"<ref",
"</ref>",
"→",
"↺",
"※",
"?maybe?",
"●",
"◐",
"○",
"⚠",
"☐",
"☑",
"✓",
"⟨H≈0.1⟩",
"⟨H≈0.2⟩",
"⟨H≈0.3⟩",
"⟨H≈0.4⟩",
"⟨H≈0.5⟩",
"⟨H≈0.6⟩",
"⟨H≈0.7⟩",
"⟨H≈0.8⟩",
"⟨H≈0.9⟩",
"⟨H≈1.0⟩",
"⟨H≈1.1⟩",
"⟨H≈1.2⟩",
"⟨H≈1.3⟩",
"⟨H≈1.4⟩",
"⟨H≈1.5⟩",
"⟨H≈1.6⟩",
"⟨H≈1.7⟩",
"⟨H≈1.8⟩"
],
"bos_token": "<|begin_of_text|>",
"clean_up_tokenization_spaces": true,
"eos_token": "<|end_of_text|>",
"extra_special_tokens": {},
"model_max_length": 1000000000000000019884624838656,
"pad_token": "[PAD]",
"tokenizer_class": "PreTrainedTokenizerFast",
"unk_token": "[UNK]"
}