初始化项目,由ModelHub XC社区提供模型

Model: PleIAs/monad
Source: Original Platform
This commit is contained in:
ModelHub XC
2026-06-07 20:31:13 +08:00
commit f7f5ac44d8
13 changed files with 40505 additions and 0 deletions

49
.gitattributes vendored Normal file
View File

@@ -0,0 +1,49 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bin.* filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zstandard filter=lfs diff=lfs merge=lfs -text
*.tfevents* filter=lfs diff=lfs merge=lfs -text
*.db* filter=lfs diff=lfs merge=lfs -text
*.ark* filter=lfs diff=lfs merge=lfs -text
**/*ckpt*data* filter=lfs diff=lfs merge=lfs -text
**/*ckpt*.meta filter=lfs diff=lfs merge=lfs -text
**/*ckpt*.index filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.gguf* filter=lfs diff=lfs merge=lfs -text
*.ggml filter=lfs diff=lfs merge=lfs -text
*.llamafile* filter=lfs diff=lfs merge=lfs -text
*.pt2 filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
model.safetensors filter=lfs diff=lfs merge=lfs -text

69
README.md Normal file
View File

@@ -0,0 +1,69 @@
---
language:
- en
license: apache-2.0
pipeline_tag: text-generation
tags:
- transformers
library_name: transformers
datasets:
- PleIAs/SYNTH
---
# ⚛️ Monad
<div align="center">
<img src="figures/pleias.jpg" width="60%" alt="Pleias" />
</div>
<p align="center">
<a href="https://pleias.fr/blog/blogsynth-the-new-data-frontier"><b>Blog announcement</b></a>
</p>
**Monad** is a 56 million parameters generalist Small Reasoning Model, trained on 200 billions tokens from <a href="https://huggingface.co/PleIAs/Baguettotron">SYNTH</a>, a fully open generalist dataset.
As of 2025, Monad is the best contender for the smallest viable language models. Despite being less than half of gpt-2, Monad not only answers in consistent English but performs significanly beyond chance on MMLU and other major industry benchmarks.
<p align="center">
<img width="80%" src="figures/training_efficiency.jpeg">
</p>
Monad's name is a reference to Leibniz concept and general idea of the smallest possible unit of intelligence.
## Features
Monad has been natively trained for instructions with thinking traces. We implemented a series of dedicated pipelines for:
* Memorization of encyclopedic knowledge (50,000 vital articles from Wikipedia), though in this size range hallucinations have to be expected.
* Retrieval-Augmented Generation with grounding (following on our initial experiments with Pleias-RAG series)
* Arithmetic and simple math resolution problem
* Editing tasks
* Information extraction
* Creative writing, including unusual synthetic exercises like lipograms or layout poems.
Monad is strictly monolingual in English. We trained a new custom tokenizer (likely one of the smallest tokenizer to date, less than 8,000 individual tokens), exclusively trained on SYNTH so that we maintain a relatively good compression ratio.
## Model design and training
Monad is a 56M parameters decoders with a standard Qwen/Llama-like design, except for its extremely compact size and overall opiniated architecture for depth (with 64 layers)
<p align="center">
<img width="80%" src="figures/monad_structure.png">
</p>
Monad was trained on 16 h100 from Jean Zay (compute plan n°A0191016886). Full pre-training took a bit less than 6 hours.
## Evaluation
Monad attains performance on MMLU significantly beyond chance with close to 30% of positive rate. We also find non-random results on gsm8k (8%) and HotPotQA (8%)
To our knowledge, there is no model remotely close in this size range for evaluation comparison. Spiritually and practically, Monad remains unique.
## Use and deployment
Monad has been trained on the standard instruction style from Qwen.
```xml
<|im_start|>user
Who are you?<|im_end|>
<|im_start|>assistant
<think>
```
Monad has no support yet for multi-turn.
A major envisioned use case for Monad is explainability, as the model does provide a unique trade-off between observability and actual reasoning performance.

7
chat_template.json Normal file
View File

@@ -0,0 +1,7 @@
{
"chat_template": "{% for m in messages %}<|im_start|>{{ m['role'] }}\n{{ m['content'] }}<|im_end|>\n{% endfor %}{% if add_generation_prompt %}<|im_start|>assistant\n<think>\n{% endif %}",
"eos_token": "<|im_end|>",
"bos_token": "<|im_start|>",
"stop": ["<|im_end|>"],
"roles": { "user": "user", "assistant": "assistant", "system": "system" }
}

29
config.json Normal file
View File

@@ -0,0 +1,29 @@
{
"architectures": [
"LlamaForCausalLM"
],
"attention_bias": false,
"attention_dropout": 0.0,
"bos_token_id": 1,
"eos_token_id": 2,
"head_dim": 64,
"hidden_act": "silu",
"hidden_size": 256,
"initializer_range": 0.02,
"intermediate_size": 768,
"max_position_embeddings": 2048,
"mlp_bias": false,
"model_type": "llama",
"num_attention_heads": 4,
"num_hidden_layers": 64,
"num_key_value_heads": 4,
"pretraining_tp": 1,
"rms_norm_eps": 1e-05,
"rope_scaling": null,
"rope_theta": 10000,
"tie_word_embeddings": true,
"torch_dtype": "bfloat16",
"transformers_version": "4.51.3",
"use_cache": true,
"vocab_size": 8192
}

1
configuration.json Normal file
View File

@@ -0,0 +1 @@
{"framework": "pytorch", "task": "others", "allow_remote": true}

BIN
figures/monad_structure.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 395 KiB

BIN
figures/pleias.jpg Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 44 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 44 KiB

6
generation_config.json Normal file
View File

@@ -0,0 +1,6 @@
{
"_from_model_config": true,
"bos_token_id": 1,
"eos_token_id": 2,
"transformers_version": "4.51.3"
}

3
model.safetensors Normal file
View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d99bffef2d388cfbd0b39b8a0e1665e64afd7e0f7b055f7abebadc756a7227bf
size 113376216

6
special_tokens_map.json Normal file
View File

@@ -0,0 +1,6 @@
{
"bos_token": "<|begin_of_text|>",
"eos_token": "<|end_of_text|>",
"pad_token": "[PAD]",
"unk_token": "[UNK]"
}

40291
tokenizer.json Normal file

File diff suppressed because it is too large Load Diff

44
tokenizer_config.json Normal file
View File

@@ -0,0 +1,44 @@
{
"added_tokens_decoder": {
"0": {
"content": "[UNK]",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"1": {
"content": "<|begin_of_text|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"2": {
"content": "<|end_of_text|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"3": {
"content": "[PAD]",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
}
},
"bos_token": "<|begin_of_text|>",
"clean_up_tokenization_spaces": false,
"eos_token": "<|end_of_text|>",
"extra_special_tokens": {},
"model_max_length": 1000000000000000019884624838656,
"pad_token": "[PAD]",
"tokenizer_class": "PreTrainedTokenizer",
"unk_token": "[UNK]"
}