---
license: apache-2.0
base_model: BEE-spoke-data/smol_llama-220M-GQA
datasets:
- teknium/openhermes
inference:
  parameters:
    do_sample: true
    renormalize_logits: true
    temperature: 0.25
    top_p: 0.95
    top_k: 50
    min_new_tokens: 2
    max_new_tokens: 96
    repetition_penalty: 1.03
    no_repeat_ngram_size: 5
    epsilon_cutoff: 0.0008
widget:
- text: "Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request. ### Instruction: Write an ode to Chipotle burritos. ### Response: "
  example_title: burritos
model-index:
- name: smol_llama-220M-openhermes
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Challenge
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: acc_norm
      value: 25.17
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=BEE-spoke-data/smol_llama-220M-openhermes
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HellaSwag (10-Shot)
      type: hellaswag
      split: validation
      args:
        num_few_shot: 10
    metrics:
    - type: acc_norm
      value: 28.98
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=BEE-spoke-data/smol_llama-220M-openhermes
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU (5-Shot)
      type: cais/mmlu
      config: all
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 26.17
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=BEE-spoke-data/smol_llama-220M-openhermes
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: TruthfulQA (0-shot)
      type: truthful_qa
      config: multiple_choice
      split: validation
      args:
        num_few_shot: 0
    metrics:
    - type: mc2
      value: 43.08
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=BEE-spoke-data/smol_llama-220M-openhermes
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Winogrande (5-shot)
      type: winogrande
      config: winogrande_xl
      split: validation
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 52.01
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=BEE-spoke-data/smol_llama-220M-openhermes
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 0.61
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=BEE-spoke-data/smol_llama-220M-openhermes
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: IFEval (0-Shot)
      type: HuggingFaceH4/ifeval
      args:
        num_few_shot: 0
    metrics:
    - type: inst_level_strict_acc and prompt_level_strict_acc
      value: 15.55
      name: strict accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=BEE-spoke-data/smol_llama-220M-openhermes
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: BBH (3-Shot)
      type: BBH
      args:
        num_few_shot: 3
    metrics:
    - type: acc_norm
      value: 3.11
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=BEE-spoke-data/smol_llama-220M-openhermes
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MATH Lvl 5 (4-Shot)
      type: hendrycks/competition_math
      args:
        num_few_shot: 4
    metrics:
    - type: exact_match
      value: 0.0
      name: exact match
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=BEE-spoke-data/smol_llama-220M-openhermes
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GPQA (0-shot)
      type: Idavidrein/gpqa
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 2.35
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=BEE-spoke-data/smol_llama-220M-openhermes
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MuSR (0-shot)
      type: TAUR-Lab/MuSR
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 6.22
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=BEE-spoke-data/smol_llama-220M-openhermes
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU-PRO (5-shot)
      type: TIGER-Lab/MMLU-Pro
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 1.34
      name: accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=BEE-spoke-data/smol_llama-220M-openhermes
      name: Open LLM Leaderboard
---

BEE-spoke-data/smol_llama-220M-openhermes

Please note that this is an experiment, and the model has limitations because it is smol.

The prompt format is Alpaca:

Below is an instruction that describes a task, paired with an input that
provides further context. Write a response that appropriately completes
the request.  

### Instruction:  

How can I increase my meme production/output? Currently, I only create them in ancient babylonian which is time consuming.  

### Inputs:

### Response:

The model was trained on examples with inputs, so if you have one (such as a passage of text to ask a question about), include it under ### Inputs:
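
The template above can be assembled programmatically. The following is a minimal sketch, not the authors' code: `build_prompt` and `ALPACA_PREAMBLE` are hypothetical names, and the exact whitespace between sections is an assumption based on the rendered template.

```python
# Preamble text copied from the prompt template shown above.
ALPACA_PREAMBLE = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request."
)


def build_prompt(instruction: str, inputs: str = "") -> str:
    """Assemble an Alpaca-style prompt (hypothetical helper).

    The "### Inputs:" header is always emitted, mirroring the template,
    and is left empty when no input text is supplied.
    """
    inputs = inputs.strip()
    parts = [
        ALPACA_PREAMBLE,
        f"### Instruction:\n{instruction.strip()}",
        f"### Inputs:\n{inputs}" if inputs else "### Inputs:",
        "### Response:\n",  # the model continues from here
    ]
    # Assumption: sections are separated by one blank line.
    return "\n\n".join(parts)
```

For a question about a passage, the passage would go in `inputs`, for example `build_prompt("Summarize the text.", "Cats sleep a lot.")`.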

Example

Output for the prompt above. The inference API samples at a low temperature, so you should see (at least slightly) different generations each time.

[Image: example model output]

Note that the inference API parameters used here are an initial educated guess, and may be updated over time:

inference:
  parameters:
    do_sample: true
    renormalize_logits: true
    temperature: 0.25
    top_p: 0.95
    top_k: 50
    min_new_tokens: 2
    max_new_tokens: 96
    repetition_penalty: 1.03
    no_repeat_ngram_size: 5
    epsilon_cutoff: 0.0008

Feel free to experiment with the parameters using the model in Python and let us know if you have improved results with other params!
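
One way to try other settings locally is sketched below, assuming the `transformers` library with a PyTorch backend is installed. The parameter values are copied from the inference block above; `GENERATION_KWARGS`, `resolve_params`, and `generate` are hypothetical names introduced for illustration, and this is not necessarily how the inference API itself runs the model.

```python
# Defaults copied from the inference parameters listed above.
GENERATION_KWARGS = {
    "do_sample": True,
    "renormalize_logits": True,
    "temperature": 0.25,
    "top_p": 0.95,
    "top_k": 50,
    "min_new_tokens": 2,
    "max_new_tokens": 96,
    "repetition_penalty": 1.03,
    "no_repeat_ngram_size": 5,
    "epsilon_cutoff": 0.0008,
}


def resolve_params(**overrides):
    """Return the defaults with any keyword overrides applied on top."""
    return {**GENERATION_KWARGS, **overrides}


def generate(prompt, model_name="BEE-spoke-data/smol_llama-220M-openhermes", **overrides):
    """Generate a completion for an Alpaca-formatted prompt (hypothetical helper)."""
    # Imported lazily so the settings can be inspected without loading transformers.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    encoded = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(
        input_ids=encoded["input_ids"],
        attention_mask=encoded["attention_mask"],
        **resolve_params(**overrides),
    )
    # Drop the prompt tokens so only the newly generated text is returned.
    new_tokens = output_ids[0][encoded["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `generate(prompt, temperature=0.7, max_new_tokens=128)` would override just those two values and keep the remaining defaults, which makes it easy to compare settings one at a time.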

Data

Note that this checkpoint was fine-tuned on teknium/openhermes, which is synthetic data generated by an OpenAI model. Usage of this checkpoint should therefore follow OpenAI's terms of use: https://openai.com/policies/terms-of-use


Open LLM Leaderboard Evaluation Results

Detailed results can be found here

| Metric | Value |
|---|---|
| Avg. | 29.34 |
| AI2 Reasoning Challenge (25-Shot) | 25.17 |
| HellaSwag (10-Shot) | 28.98 |
| MMLU (5-Shot) | 26.17 |
| TruthfulQA (0-shot) | 43.08 |
| Winogrande (5-shot) | 52.01 |
| GSM8k (5-shot) | 0.61 |

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

| Metric | Value |
|---|---|
| Avg. | 4.76 |
| IFEval (0-Shot) | 15.55 |
| BBH (3-Shot) | 3.11 |
| MATH Lvl 5 (4-Shot) | 0.00 |
| GPQA (0-shot) | 2.35 |
| MuSR (0-shot) | 6.22 |
| MMLU-PRO (5-shot) | 1.34 |
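
The Avg. rows in both results tables are consistent with an unweighted mean of the six benchmark scores, rounded to two decimals. The short check below illustrates this; the dictionary and function names are hypothetical, and the values are copied from the tables above.

```python
# Scores copied from the first results table.
V1_SCORES = {
    "AI2 Reasoning Challenge (25-Shot)": 25.17,
    "HellaSwag (10-Shot)": 28.98,
    "MMLU (5-Shot)": 26.17,
    "TruthfulQA (0-shot)": 43.08,
    "Winogrande (5-shot)": 52.01,
    "GSM8k (5-shot)": 0.61,
}

# Scores copied from the second results table.
V2_SCORES = {
    "IFEval (0-Shot)": 15.55,
    "BBH (3-Shot)": 3.11,
    "MATH Lvl 5 (4-Shot)": 0.00,
    "GPQA (0-shot)": 2.35,
    "MuSR (0-shot)": 6.22,
    "MMLU-PRO (5-shot)": 1.34,
}


def leaderboard_average(scores):
    """Unweighted mean of the benchmark scores, rounded to two decimals."""
    return round(sum(scores.values()) / len(scores), 2)


print(leaderboard_average(V1_SCORES))  # 29.34
print(leaderboard_average(V2_SCORES))  # 4.76
```

Both results match the reported averages.
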