---
base_model: BEE-spoke-data/smol_llama-101M-GQA
datasets:
- JeanKaddour/minipile
- pszemraj/simple_wikipedia_LM
- BEE-spoke-data/wikipedia-20230901.en-deduped
- mattymchen/refinedweb-3m
inference: false
language:
- en
license: apache-2.0
model_creator: BEE-spoke-data
model_name: smol_llama-101M-GQA
pipeline_tag: text-generation
quantized_by: afrideva
tags:
- smol_llama
- llama2
- gguf
- ggml
- quantized
- q2_k
- q3_k_m
- q4_k_m
- q5_k_m
- q6_k
- q8_0
thumbnail: https://i.ibb.co/TvyMrRc/rsz-smol-llama-banner.png
widget:
- example_title: El Microondas
  text: My name is El Microondas the Wise and
- example_title: Kennesaw State University
  text: Kennesaw State University is a public
- example_title: Bungie
  text: Bungie Studios is an American video game developer. They are most famous for developing the award winning Halo series of video games. They also made Destiny. The studio was founded
- example_title: Mona Lisa
  text: The Mona Lisa is a world-renowned painting created by
- example_title: Harry Potter Series
  text: The Harry Potter series, written by J.K. Rowling, begins with the book titled
- example_title: Riddle
  text: 'Question: I have cities, but no houses. I have mountains, but no trees. I have water, but no fish. What am I? Answer:'
- example_title: Photosynthesis
  text: The process of photosynthesis involves the conversion of
- example_title: Story Continuation
  text: Jane went to the store to buy some groceries. She picked up apples, oranges, and a loaf of bread. When she got home, she realized she forgot
- example_title: Math Problem
  text: 'Problem 2: If a train leaves Station A at 9:00 AM and travels at 60 mph, and another train leaves Station B at 10:00 AM and travels at 80 mph, when will they meet if the distance between the stations is 300 miles? To determine'
- example_title: Algorithm Definition
  text: In the context of computer programming, an algorithm is
---

BEE-spoke-data/smol_llama-101M-GQA-GGUF

Quantized GGUF model files for smol_llama-101M-GQA from BEE-spoke-data

| Name | Quant method | Size |
| ---- | ------------ | ---- |
| smol_llama-101m-gqa.fp16.gguf | fp16 | 203.28 MB |
| smol_llama-101m-gqa.q2_k.gguf | q2_k | 50.93 MB |
| smol_llama-101m-gqa.q3_k_m.gguf | q3_k_m | 57.06 MB |
| smol_llama-101m-gqa.q4_k_m.gguf | q4_k_m | 65.40 MB |
| smol_llama-101m-gqa.q5_k_m.gguf | q5_k_m | 74.34 MB |
| smol_llama-101m-gqa.q6_k.gguf | q6_k | 83.83 MB |
| smol_llama-101m-gqa.q8_0.gguf | q8_0 | 108.35 MB |
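
One way to run these files locally is with llama-cpp-python. The snippet below is a minimal sketch, not an official usage guide: it assumes the source repo id afrideva/smol_llama-101M-GQA-GGUF and the q4_k_m file from the table above, and reuses one of the widget prompts from the metadata.

```python
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# Fetch one of the quantized files listed above.
model_path = hf_hub_download(
    repo_id="afrideva/smol_llama-101M-GQA-GGUF",
    filename="smol_llama-101m-gqa.q4_k_m.gguf",
)

# Load with the model's stated 1024-token context length.
llm = Llama(model_path=model_path, n_ctx=1024)

# This is a base model, so use plain text completion rather than a chat format.
out = llm("The process of photosynthesis involves the conversion of", max_tokens=48)
print(out["choices"][0]["text"])
```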

Original Model Card:

smol_llama-101M-GQA

![banner](https://i.ibb.co/TvyMrRc/rsz-smol-llama-banner.png)

A small decoder-only model with 101M total parameters. This is the first version of the model.

  • hidden size 768, 6 layers
  • GQA (24 query heads, 8 key-value heads), context length 1024
  • trained from scratch (see the configuration sketch below)
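
For reference, the architecture above maps onto a transformers LlamaConfig roughly as follows. This is an illustrative sketch only: vocab_size and intermediate_size are left at library defaults because they are not stated here.

```python
from transformers import LlamaConfig

# Sketch of the stated architecture; vocab_size and intermediate_size
# are not given on this card, so library defaults are used.
config = LlamaConfig(
    hidden_size=768,               # stated hidden size
    num_hidden_layers=6,           # stated layer count
    num_attention_heads=24,        # 24 query heads...
    num_key_value_heads=8,         # ...grouped over 8 key-value heads (GQA)
    max_position_embeddings=1024,  # stated context length
)
```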

Notes

This checkpoint is the 'raw' pre-trained model and has not been tuned to a more specific task. In most cases it should be fine-tuned before use; a loading sketch follows the list below.

  • smol-er 81M parameter checkpoint with in/out embeddings tied: here
  • Fine-tuned on pypi to generate Python code - link
  • For the chat version of this model, please see here
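
Since the checkpoint is intended as a starting point for fine-tuning, here is a minimal sketch for loading the base (non-GGUF) model with transformers, using the base_model id from the metadata above:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the raw pre-trained checkpoint as a starting point for fine-tuning.
model_id = "BEE-spoke-data/smol_llama-101M-GQA"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# From here, train on task-specific data with your preferred trainer.
```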

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

| Metric | Value |
| ------ | ----- |
| Avg. | 25.32 |
| ARC (25-shot) | 23.55 |
| HellaSwag (10-shot) | 28.77 |
| MMLU (5-shot) | 24.24 |
| TruthfulQA (0-shot) | 45.76 |
| Winogrande (5-shot) | 50.67 |
| GSM8K (5-shot) | 0.83 |
| DROP (3-shot) | 3.39 |

Description

Model synced from source: afrideva/smol_llama-101M-GQA-GGUF