---
library_name: transformers
model-index:
- name: ldm_soup_Llama-3.1-8B-Inst
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: IFEval (0-Shot)
      type: HuggingFaceH4/ifeval
      args:
        num_few_shot: 0
    metrics:
    - type: inst_level_strict_acc and prompt_level_strict_acc
      value: 80.33
      name: strict accuracy
    source:
      url: >-
        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=DeepAutoAI/ldm_soup_Llama-3.1-8B-Inst
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: BBH (3-Shot)
      type: BBH
      args:
        num_few_shot: 3
    metrics:
    - type: acc_norm
      value: 31.1
      name: normalized accuracy
    source:
      url: >-
        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=DeepAutoAI/ldm_soup_Llama-3.1-8B-Inst
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MATH Lvl 5 (4-Shot)
      type: hendrycks/competition_math
      args:
        num_few_shot: 4
    metrics:
    - type: exact_match
      value: 11.56
      name: exact match
    source:
      url: >-
        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=DeepAutoAI/ldm_soup_Llama-3.1-8B-Inst
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GPQA (0-shot)
      type: Idavidrein/gpqa
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 5.26
      name: acc_norm
    source:
      url: >-
        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=DeepAutoAI/ldm_soup_Llama-3.1-8B-Inst
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MuSR (0-shot)
      type: TAUR-Lab/MuSR
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 11.52
      name: acc_norm
    source:
      url: >-
        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=DeepAutoAI/ldm_soup_Llama-3.1-8B-Inst
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU-PRO (5-shot)
      type: TIGER-Lab/MMLU-Pro
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 32.07
      name: accuracy
    source:
      url: >-
        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=DeepAutoAI/ldm_soup_Llama-3.1-8B-Inst
      name: Open LLM Leaderboard
license: apache-2.0
language:
- en
base_model:
- meta-llama/Llama-3.1-8B-Instruct
---

# Model Card for DeepAutoAI/ldm_soup_Llama-3.1-8B-Inst

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0

## Overview

**DeepAutoAI/ldm_soup_Llama-3.1-8B-Inst** is developed by **deepAuto.ai** and builds upon the **VAGOsolutions/Llama-3.1-SauerkrautLM-8B-Instruct** model. We train a latent diffusion model on the base model's pretrained weights and use it to optimize those weights for the **Winogrande** and **ARC-Challenge** datasets.

Training the diffusion model lets us learn the distribution of the base model's weight space and explore it for better-performing configurations. We then sample multiple sets of weights and use the **model-soup averaging technique** to identify the best-performing weights for each of the two datasets. These weights are merged by linear interpolation to produce the final weights of **DeepAutoAI/ldm_soup_Llama-3.1-8B-Inst**.
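
The averaging and merging steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `uniform_soup` and `interpolate`, the use of a uniform (unweighted) soup, and the mixing coefficient `alpha = 0.5` are all assumptions made for illustration, and plain floats stand in for weight tensors.

```python
from typing import Dict, List

# A "state dict" here maps parameter names to values. Floats are used for
# illustration; any type supporting + and scalar * (for example a tensor)
# would work the same way. This representation is an assumption.
StateDict = Dict[str, float]


def _check_same_keys(state_dicts: List[StateDict]) -> None:
    """Raise if the state dicts do not share exactly the same parameter names."""
    reference = set(state_dicts[0])
    for sd in state_dicts[1:]:
        if set(sd) != reference:
            raise ValueError("state dicts have different parameter names")


def uniform_soup(state_dicts: List[StateDict]) -> StateDict:
    """Average several sampled weight sets parameter by parameter."""
    if not state_dicts:
        raise ValueError("need at least one state dict")
    _check_same_keys(state_dicts)
    n = len(state_dicts)
    return {k: sum(sd[k] for sd in state_dicts) / n for k in state_dicts[0]}


def interpolate(sd_a: StateDict, sd_b: StateDict, alpha: float) -> StateDict:
    """Linear interpolation: (1 - alpha) * sd_a + alpha * sd_b."""
    _check_same_keys([sd_a, sd_b])
    return {k: (1.0 - alpha) * sd_a[k] + alpha * sd_b[k] for k in sd_a}


# Hypothetical sampled weight sets for each dataset (toy values).
winogrande_samples = [{"w": 1.0, "b": 0.0}, {"w": 3.0, "b": 2.0}]
arc_samples = [{"w": 2.0, "b": 4.0}, {"w": 4.0, "b": 6.0}]

soup_winogrande = uniform_soup(winogrande_samples)
soup_arc = uniform_soup(arc_samples)

# alpha = 0.5 is an assumed mixing coefficient, not a value from the model card.
final_weights = interpolate(soup_winogrande, soup_arc, alpha=0.5)
print(final_weights)
```

With real checkpoints the same two functions would be applied to the models' parameter dictionaries; how candidate weight sets are selected for each soup is not specified here and is left out of the sketch.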

This approach has led to improved performance on previously unseen leaderboard tasks, all without any additional task-specific training.

This work is currently in progress.

## References

<a href="https://arxiv.org/abs/2402.18153" target="_blank">Diffusion-Based Neural Network Weights Generation</a>

## Evaluation

### Results

# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)

Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_DeepAutoAI__ldm_soup_Llama-3.1-8B-Inst).

|      Metric       |Value|
|-------------------|----:|
|Avg.               |28.64|
|IFEval (0-Shot)    |80.33|
|BBH (3-Shot)       |31.10|
|MATH Lvl 5 (4-Shot)|11.56|
|GPQA (0-shot)      | 5.26|
|MuSR (0-shot)      |11.52|
|MMLU-PRO (5-shot)  |32.07|
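
The "Avg." row is consistent with an unweighted mean of the six benchmark scores. The short check below reproduces that arithmetic; the variable names are illustrative, and treating the average as a simple mean is an assumption based on the numbers in the table.

```python
# Scores copied from the results table above.
scores = {
    "IFEval (0-Shot)": 80.33,
    "BBH (3-Shot)": 31.10,
    "MATH Lvl 5 (4-Shot)": 11.56,
    "GPQA (0-shot)": 5.26,
    "MuSR (0-shot)": 11.52,
    "MMLU-PRO (5-shot)": 32.07,
}

# Unweighted mean, rounded to two decimals as in the table.
average = round(sum(scores.values()) / len(scores), 2)
print(f"Avg. = {average:.2f}")
```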