language, license, library_name, base_model, datasets, model-index
language license library_name base_model datasets model-index
it
en
llama3 transformers meta-llama/Meta-Llama-3-8B
DeepMount00/llm_ita_ultra
name results
Llama-3-8b-Ita
task dataset metrics source
type name
text-generation Text Generation
name type args
IFEval (0-Shot) HuggingFaceH4/ifeval
num_few_shot
0
type value name
inst_level_strict_acc and prompt_level_strict_acc 75.3 strict accuracy
url name
https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=DeepMount00/Llama-3-8b-Ita Open LLM Leaderboard
task dataset metrics source
type name
text-generation Text Generation
name type args
BBH (3-Shot) BBH
num_few_shot
3
type value name
acc_norm 28.08 normalized accuracy
url name
https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=DeepMount00/Llama-3-8b-Ita Open LLM Leaderboard
task dataset metrics source
type name
text-generation Text Generation
name type args
MATH Lvl 5 (4-Shot) hendrycks/competition_math
num_few_shot
4
type value name
exact_match 5.36 exact match
url name
https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=DeepMount00/Llama-3-8b-Ita Open LLM Leaderboard
task dataset metrics source
type name
text-generation Text Generation
name type args
GPQA (0-shot) Idavidrein/gpqa
num_few_shot
0
type value name
acc_norm 7.38 acc_norm
url name
https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=DeepMount00/Llama-3-8b-Ita Open LLM Leaderboard
task dataset metrics source
type name
text-generation Text Generation
name type args
MuSR (0-shot) TAUR-Lab/MuSR
num_few_shot
0
type value name
acc_norm 11.68 acc_norm
url name
https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=DeepMount00/Llama-3-8b-Ita Open LLM Leaderboard
task dataset metrics source
type name
text-generation Text Generation
name type config split args
MMLU-PRO (5-shot) TIGER-Lab/MMLU-Pro main test
num_few_shot
5
type value name
acc 31.69 accuracy
url name
https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=DeepMount00/Llama-3-8b-Ita Open LLM Leaderboard

💡 Found this resource helpful? Creating and maintaining open source AI models and datasets requires significant computational resources. If this work has been valuable to you, consider supporting my research to help me continue building tools that benefit the entire AI community. Every contribution directly funds more open source innovation!


Model Architecture

Evaluation

For a detailed comparison of model performance, check out the Leaderboard for Italian Language Models.

Here's a breakdown of the performance metrics:

Metric hellaswag_it acc_norm arc_it acc_norm m_mmlu_it 5-shot acc Average
Accuracy Normalized 0.6518 0.5441 0.5729 0.5896

How to Use

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

MODEL_NAME = "DeepMount00/Llama-3-8b-Ita"

model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, torch_dtype=torch.bfloat16).eval()
model.to(device)
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

def generate_answer(prompt):
    messages = [
        {"role": "user", "content": prompt},
    ]
    model_inputs = tokenizer.apply_chat_template(messages, return_tensors="pt").to(device)
    generated_ids = model.generate(model_inputs, max_new_tokens=200, do_sample=True,
                                          temperature=0.001)
    decoded = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)
    return decoded[0]

prompt = "Come si apre un file json in python?"
answer = generate_answer(prompt)
print(answer)

Developer

[Michele Montebovi]

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

Metric Value
Avg. 26.58
IFEval (0-Shot) 75.30
BBH (3-Shot) 28.08
MATH Lvl 5 (4-Shot) 5.36
GPQA (0-shot) 7.38
MuSR (0-shot) 11.68
MMLU-PRO (5-shot) 31.69
Description
Model synced from source: DeepMount00/Llama-3-8b-Ita
Readme 30 KiB