---
language: en
license: mit
library_name: transformers
pipeline_tag: text-generation
---
> ⚠️ **Research preview.** This is a debug checkpoint trained on ~21M tokens with a 3,252-token vocabulary for 5,000 steps. It is intended to verify the architecture, not for downstream use; a production 500M checkpoint will supersede it.
## Model Summary
Mythos is a LLaMA-style autoregressive transformer implemented from first principles
in pure PyTorch: no `transformers` inheritance, no built-in `nn.Transformer` modules, no shortcuts.
Every component (attention, rotary embeddings, SwiGLU, RMSNorm, the training loop, the
BPE tokenizer, the data pipeline, the KV-cache inference engine) is hand-written in the
reference repository.
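For orientation, here is a minimal sketch of what a hand-written RMSNorm and SwiGLU feed-forward look like in plain PyTorch. It illustrates the style of the repository, not its exact code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """Root-mean-square normalization: no mean subtraction, no bias."""
    def __init__(self, dim: int, eps: float = 1e-5):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Scale by the reciprocal RMS over the feature dimension, then rescale.
        rms = torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)
        return self.weight * (x * rms)

class SwiGLU(nn.Module):
    """SwiGLU feed-forward: (SiLU(x @ W_gate) * (x @ W_up)) @ W_down."""
    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.w_gate = nn.Linear(dim, hidden, bias=False)
        self.w_up = nn.Linear(dim, hidden, bias=False)
        self.w_down = nn.Linear(hidden, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.w_down(F.silu(self.w_gate(x)) * self.w_up(x))
```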
This release packages the weights in the `LlamaForCausalLM` format so that the model
is natively usable via the standard `transformers`, vLLM, TGI, and llama.cpp
toolchains; no custom code or `trust_remote_code` is required.
| | |
|---|---|
| Developed by | Boris Graudt |
| Model type | Decoder-only causal transformer |
| Language | English |
| License | MIT |
| Compatible with | 🤗 transformers, vLLM, TGI, llama.cpp, Ollama |
| Reference implementation | [github.com/borisgraudt/mythos](https://github.com/borisgraudt/mythos) |
## Architecture
| Component | Choice | Value |
|---|---|---|
| Parameters | — | 172M |
| Hidden layers | Pre-norm decoder blocks | 24 |
| Hidden size | `d_model` | 768 |
| Intermediate size | SwiGLU hidden | 2048 |
| Attention heads | Multi-head | 12 |
| Key / value heads | Grouped-Query Attention (sketch below) | 4 |
| Head dim | `d_model / n_heads` | 64 |
| Positional encoding | Rotary (RoPE) | θ = 10,000 |
| Normalization | RMSNorm (pre-norm) | ε = 1e-5 |
| Activation | SwiGLU | — |
| Tied embeddings | Embedding ↔ LM head | ✅ |
| Vocabulary | ByteLevel BPE | 3,252 |
| Context length | Max sequence | 2,048 |
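Per the table, each group of three query heads shares one key/value head. Below is a minimal sketch of that grouped-query attention step in plain PyTorch using the dimensions above; variable names are illustrative, rotary embeddings are omitted for brevity, and this is not the repository's code.

```python
import torch
import torch.nn.functional as F

# Dimensions from the table above.
d_model, n_heads, n_kv_heads, head_dim = 768, 12, 4, 64
batch, seq = 1, 16
x = torch.randn(batch, seq, d_model)

# 12 query heads, but only 4 key/value heads (GQA).
wq = torch.nn.Linear(d_model, n_heads * head_dim, bias=False)
wk = torch.nn.Linear(d_model, n_kv_heads * head_dim, bias=False)
wv = torch.nn.Linear(d_model, n_kv_heads * head_dim, bias=False)

q = wq(x).view(batch, seq, n_heads, head_dim).transpose(1, 2)     # (1, 12, 16, 64)
k = wk(x).view(batch, seq, n_kv_heads, head_dim).transpose(1, 2)  # (1, 4, 16, 64)
v = wv(x).view(batch, seq, n_kv_heads, head_dim).transpose(1, 2)  # (1, 4, 16, 64)

# Each KV head is shared by n_heads // n_kv_heads = 3 query heads.
k = k.repeat_interleave(n_heads // n_kv_heads, dim=1)             # (1, 12, 16, 64)
v = v.repeat_interleave(n_heads // n_kv_heads, dim=1)             # (1, 12, 16, 64)

out = F.scaled_dot_product_attention(q, k, v, is_causal=True)     # (1, 12, 16, 64)
```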
## Quickstart
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "bgraudt/mythos"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

inputs = tokenizer("The history of artificial intelligence begins with", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, temperature=0.8, top_p=0.9, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## Serving with vLLM
```bash
pip install vllm
python -m vllm.entrypoints.openai.api_server --model bgraudt/mythos
```
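Once the server is up (port 8000 by default), it exposes an OpenAI-compatible API. A minimal query sketch with the official `openai` client follows; the `api_key` value is a placeholder, since a local vLLM server does not check it by default. Because this is a base model, use the completions endpoint rather than chat.

```python
from openai import OpenAI

# Point the client at the local vLLM server; any non-empty key works.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

resp = client.completions.create(
    model="bgraudt/mythos",
    prompt="The history of artificial intelligence begins with",
    max_tokens=64,
    temperature=0.8,
)
print(resp.choices[0].text)
```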
## Serving with llama.cpp
```bash
# Convert to GGUF (one-time)
python llama.cpp/convert_hf_to_gguf.py mythos
./llama-cli -m ggml-model-f16.gguf -p "Hello"
```
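For a smaller on-disk and in-memory footprint, the f16 GGUF can optionally be quantized first. A common recipe is sketched below; file names follow the convert step above, and the usual quantization quality trade-offs apply.

```bash
# Optional: quantize the f16 GGUF to 4-bit (Q4_K_M) before running
./llama-quantize ggml-model-f16.gguf mythos-q4_k_m.gguf Q4_K_M
./llama-cli -m mythos-q4_k_m.gguf -p "Hello"
```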
## Training

### Data
- Corpus: English Wikipedia (20231101 snapshot), 5,000 articles, ~21M tokens
- Tokenizer: ByteLevel BPE trained from scratch, vocab size 3,252 (see the sketch after this list)
- Training context: 512 tokens
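An equivalent tokenizer can be reproduced with the Hugging Face `tokenizers` library. Note that the repository implements its own BPE by hand, so this sketch is a stand-in, not the actual training script; the file path and special token are placeholders.

```python
from tokenizers import ByteLevelBPETokenizer

# Train a byte-level BPE tokenizer from scratch on the raw corpus.
tokenizer = ByteLevelBPETokenizer()
tokenizer.train(
    files=["wiki_corpus.txt"],         # placeholder path to the extracted articles
    vocab_size=3252,
    special_tokens=["<|endoftext|>"],  # assumed EOS token; check the repo for the real one
)
tokenizer.save_model("mythos-tokenizer")
```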
### Hyperparameters
| Hyperparameter | Value |
|---|---|
| Steps | 5,000 |
| Optimizer | AdamW (β₁ = 0.9, β₂ = 0.95, weight decay = 0.1) |
| LR schedule | Cosine decay, 2,000-step warmup (sketch below) |
| Peak learning rate | 3 × 10⁻⁴ |
| Precision | bfloat16 mixed |
| Hardware | Apple M2 (MPS) |
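The schedule row translates to warmup for 2,000 steps followed by cosine decay. A minimal sketch is below; the linear warmup shape and the `min_lr = 0.0` floor are assumptions, since the card specifies neither.

```python
import math

def lr_at(step: int, max_steps: int = 5000, warmup: int = 2000,
          peak_lr: float = 3e-4, min_lr: float = 0.0) -> float:
    """Linear warmup to peak_lr, then cosine decay toward min_lr.

    min_lr = 0.0 and the linear warmup shape are assumptions; the
    reference repo may differ.
    """
    if step < warmup:
        return peak_lr * (step + 1) / warmup
    progress = (step - warmup) / max(1, max_steps - warmup)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))
```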
## Limitations and Intended Use
- Base model only: no instruction tuning, no RLHF, no safety alignment.
- English-only; non-English performance is poor.
- May reproduce biases and factual errors from the training distribution.
- Tiny vocabulary (3,252 tokens) severely caps fluency; intended as an architecture demo.
- Not suitable for medical, legal, financial, or other high-stakes applications.
## Citation
```bibtex
@software{graudt2026mythos,
  author  = {Graudt, Boris},
  title   = {Mythos: A Decoder-Only Language Model Built From Scratch},
  year    = {2026},
  url     = {https://github.com/borisgraudt/mythos},
  license = {MIT}
}
```
## Acknowledgements
Architecture inspired by LLaMA (Touvron et al., 2023) and Mistral 7B (Jiang et al., 2023). Data pipeline follows the FineWeb methodology (Penedo et al., 2024).