ModelHub XC be259d2ee6 Initialize project; model provided by the ModelHub XC community

Model: RhinoWithAcape/helium-1-2b-GGUF
Source: Original Platform

2026-06-21 02:34:16 +08:00

Helium-1-2B — GGUF

🟢 Fits on: every GPU class — even integrated graphics. Runs on phones at Q2_K.

GGUF conversion of kyutai/helium-1-2b — Kyutai's lightweight 2B base language model targeting edge and mobile devices, with native support for all 24 official EU languages.

This is a community quantization. The base model is by Kyutai (creators of Mimi, Moshi, and the Kyutai TTS/STT family). Before this release, only MLX (Apple Silicon) variants existed; this conversion fills the GGUF gap for llama.cpp and Ollama users.

Model details

Field	Value
Architecture	`LlamaForCausalLM` (standard Llama; works with stock llama.cpp)
Parameters	2B
Layers	28
Hidden size	2048
Vocab	64,000 (multilingual)
Context	4K
Type	Base model — not instruction-tuned
License	CC-BY-SA 4.0 + Gemma Terms of Use (Helium is distilled from Gemma 2)

Use case

Edge / mobile inference: fits comfortably on consumer hardware, including phones and small GPUs
EU multilingual base: fine-tune your own instruction-following model on top of it, with the language coverage you need
Research: a smaller-footprint model distilled from Gemma 2
Not for chat out of the box: this is a base model with no instruction tuning. For chat, fine-tune it first.

Quants

Quant	Size	Use case
Q2_K	~0.8 GB	smallest footprint, lowest quality; phones and other memory-constrained devices
Q3_K_M	~1.0 GB	middle ground between Q2_K and Q4_K_M when memory is tight
Q4_K_M	~1.2 GB	recommended default
Q5_K_M	~1.5 GB	quality bump if you have headroom
Q6_K	~1.8 GB	near-lossless
Q8_0	~2.3 GB	reference quality
F16	~4.0 GB	unquantized 16-bit weights
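
As a rough sanity check on the table above, the effective bits per weight of each file can be estimated from its size. The sketch below assumes exactly 2e9 parameters (the card only says "2B") and reads GB as 10^9 bytes. Both are assumptions, so treat the results as approximate. The `bits_per_weight` helper is hypothetical.

```shell
#!/bin/sh
# Estimate effective bits per weight from a file size in GB.
# Assumptions (not from the model card): 1 GB = 1e9 bytes, and the
# parameter count is exactly 2e9 unless a second argument overrides it.
bits_per_weight() {
    size_gb="$1"
    params_b="${2:-2}"   # parameter count in billions
    awk -v s="$size_gb" -v p="$params_b" 'BEGIN { printf "%.1f\n", s * 8 / p }'
}

# Sizes copied from the quant table.
for entry in Q2_K:0.8 Q3_K_M:1.0 Q4_K_M:1.2 Q5_K_M:1.5 Q6_K:1.8 Q8_0:2.3 F16:4.0; do
    name="${entry%%:*}"
    size="${entry#*:}"
    printf '%-7s %s bits/weight\n' "$name" "$(bits_per_weight "$size")"
done
```

Under these assumptions Q4_K_M works out to about 4.8 bits per weight and F16 to 16.0. The quantized estimates land above their nominal bit widths partly because quantized blocks also store scale factors, and partly because the real parameter count may not be exactly 2e9.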

Usage — Ollama

hf download RhinoWithAcape/helium-1-2b-GGUF \
  helium-1-2b.Q4_K_M.gguf Modelfile --local-dir ./helium
cd ./helium
ollama create helium-1-2b:Q4_K_M -f Modelfile
ollama run helium-1-2b:Q4_K_M "Once upon a time"
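
The repository's `Modelfile` is not reproduced in this card. If you download a different quant, a minimal Modelfile along the following lines should work. This is a sketch, not the file shipped in the repository. The `write_modelfile` helper, the example file name, and the parameter values are assumptions: `num_ctx 4096` mirrors the 4K context listed under Model details, `temperature 0.6` mirrors the llama.cpp example in this card, and the pass-through `TEMPLATE` reflects that this is a base model with no chat format.

```shell
#!/bin/sh
# Write a minimal Ollama Modelfile for a local GGUF file.
# Hypothetical helper: the Modelfile in the repository may differ.
write_modelfile() {
    gguf="$1"                    # GGUF file name, e.g. helium-1-2b.Q5_K_M.gguf
    out="${2:-Modelfile}"        # where to write the Modelfile
    cat > "$out" <<EOF
FROM ./$gguf
TEMPLATE "{{ .Prompt }}"
PARAMETER num_ctx 4096
PARAMETER temperature 0.6
EOF
}

# Usage (file name assumed to follow the Q4_K_M naming pattern):
#   write_modelfile helium-1-2b.Q5_K_M.gguf
#   ollama create helium-1-2b:Q5_K_M -f Modelfile
```

The template passes the prompt through unchanged, so the model simply continues the text, which is the behavior the "Once upon a time" example relies on.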

Usage — llama.cpp

./build/bin/llama-completion \
    -m helium-1-2b.Q4_K_M.gguf \
    -p "The capital of France is" \
    -n 30 --temp 0.6

(Sample: "The capital of France is Paris...")

License notes

This conversion is CC-BY-SA 4.0 (matching the source release).
Helium-1 is distilled from Gemma 2, so use is also subject to the Gemma Terms of Use.
This GGUF inherits both terms.

Conversion details

Source: kyutai/helium-1-2b (downloaded 2026-04-29; Q2_K + Q3_K_M backfilled 2026-05-02)
Tools: stock llama.cpp (no patches required — standard Llama arch)
Steps: convert_hf_to_gguf.py → llama-quantize
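
The steps above can be sketched as a short script. It is a reconstruction under assumptions, not the exact commands used for this release: the local paths (`./helium-1-2b` for the downloaded source checkpoint, `./llama.cpp` for a built checkout), the intermediate `helium-1-2b.F16.gguf` file name, and the `convert_and_quantize` helper are all hypothetical. By default the script only prints the commands; set `RUN=1` to execute them.

```shell
#!/bin/sh
# Sketch of the convert_hf_to_gguf.py -> llama-quantize pipeline.
# Assumed layout (hypothetical): source checkpoint in ./helium-1-2b,
# llama.cpp checked out and built in ./llama.cpp.
SRC_DIR="${SRC_DIR:-./helium-1-2b}"
LLAMA_DIR="${LLAMA_DIR:-./llama.cpp}"
BASE="helium-1-2b"

# Print a command; execute it only when RUN=1.
run() {
    echo "$*"
    if [ "${RUN:-0}" = "1" ]; then "$@"; fi
}

convert_and_quantize() {
    # Step 1: convert the Hugging Face checkpoint to an F16 GGUF.
    run python "$LLAMA_DIR/convert_hf_to_gguf.py" "$SRC_DIR" \
        --outfile "$BASE.F16.gguf" --outtype f16
    # Step 2: derive each requested quant from the F16 file.
    for q in "$@"; do
        run "$LLAMA_DIR/build/bin/llama-quantize" \
            "$BASE.F16.gguf" "$BASE.$q.gguf" "$q"
    done
}

convert_and_quantize Q2_K Q3_K_M Q4_K_M Q5_K_M Q6_K Q8_0
```

Quantizing every variant from a single F16 file avoids repeating the conversion step and keeps all quants derived from the same intermediate weights.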

Acknowledgments

Kyutai for the open release of Helium-1, targeting under-served EU language coverage at edge scale
Google DeepMind for the Gemma 2 base from which Helium was distilled
llama.cpp maintainers
