Files
Llama-3-8b-ita-slerp/README.md
ModelHub XC f56e0a0676 初始化项目,由ModelHub XC社区提供模型
Model: anakin87/Llama-3-8b-ita-slerp
Source: Original Platform
2026-05-16 19:53:59 +08:00

1.9 KiB

base_model, library_name, tags, license, language
base_model library_name tags license language
swap-uniba/LLaMAntino-3-ANITA-8B-Inst-DPO-ITA
DeepMount00/Llama-3-8b-Ita
transformers
mergekit
merge
llama3
it

Llama-3-8b-ita-slerp

This is a merge of pre-trained language models created using mergekit.

I tried to merge two of the best Italian LLMs using Mergekit. The results are acceptable, but I could not improve on the best existing model.

Evaluation

For a detailed comparison of model performance, check out the Leaderboard for Italian Language Models.

Here's a breakdown of the performance metrics:

Metric hellaswag_it acc_norm arc_it acc_norm m_mmlu_it 5-shot acc Average
Accuracy Normalized 0.6879 0.5714 0.5732 0.6109

Merge Details

Merge Method

This model was merged using the SLERP merge method.

Models Merged

The following models were included in the merge:

Configuration

The following YAML configuration was used to produce this model:


slices:
- sources:
  - model: swap-uniba/LLaMAntino-3-ANITA-8B-Inst-DPO-ITA
    layer_range:
    - 0
    - 32
  - model: DeepMount00/Llama-3-8b-Ita
    layer_range:
    - 0
    - 32
merge_method: slerp
base_model: swap-uniba/LLaMAntino-3-ANITA-8B-Inst-DPO-ITA
parameters:
  t:
  - filter: self_attn
    value:
    - 0
    - 0.5
    - 0.3
    - 0.7
    - 1
  - filter: mlp
    value:
    - 1
    - 0.5
    - 0.7
    - 0.3
    - 0
  - value: 0.5
dtype: bfloat16