ModelHub XC 867cf8db0d 初始化项目,由ModelHub XC社区提供模型
Model: agentlans/Llama3.1-Daredevilish-Instruct
Source: Original Platform
2026-06-24 12:46:17 +08:00

license, base_model, tags, model-index
license base_model tags model-index
llama3.1
DreadPoor/LemonP-8B-Model_Stock
Youlln/1PARAMMYL-8B-ModelStock
jaspionjader/f-2-8b
Etherll/SuperHermes
meta-llama/Llama-3.1-8B-Instruct
merge
mergekit
name results
Llama3.1-Daredevilish-Instruct
task dataset metrics source
type name
text-generation Text Generation
name type split args
IFEval (0-Shot) wis-k/instruction-following-eval train
num_few_shot
0
type value name
inst_level_strict_acc and prompt_level_strict_acc 79.41 averaged accuracy
url name
https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=agentlans%2FLlama3.1-Daredevilish-Instruct Open LLM Leaderboard
task dataset metrics source
type name
text-generation Text Generation
name type split args
BBH (3-Shot) SaylorTwift/bbh test
num_few_shot
3
type value name
acc_norm 32.22 normalized accuracy
url name
https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=agentlans%2FLlama3.1-Daredevilish-Instruct Open LLM Leaderboard
task dataset metrics source
type name
text-generation Text Generation
name type split args
MATH Lvl 5 (4-Shot) lighteval/MATH-Hard test
num_few_shot
4
type value name
exact_match 16.77 exact match
url name
https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=agentlans%2FLlama3.1-Daredevilish-Instruct Open LLM Leaderboard
task dataset metrics source
type name
text-generation Text Generation
name type split args
GPQA (0-shot) Idavidrein/gpqa train
num_few_shot
0
type value name
acc_norm 7.61 acc_norm
url name
https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=agentlans%2FLlama3.1-Daredevilish-Instruct Open LLM Leaderboard
task dataset metrics source
type name
text-generation Text Generation
name type args
MuSR (0-shot) TAUR-Lab/MuSR
num_few_shot
0
type value name
acc_norm 7.92 acc_norm
url name
https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=agentlans%2FLlama3.1-Daredevilish-Instruct Open LLM Leaderboard
task dataset metrics source
type name
text-generation Text Generation
name type config split args
MMLU-PRO (5-shot) TIGER-Lab/MMLU-Pro main test
num_few_shot
5
type value name
acc 31.97 accuracy
url name
https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=agentlans%2FLlama3.1-Daredevilish-Instruct Open LLM Leaderboard

Llama 3.1 Daredevilish Instruct

  • This model is an experimental Llama 3.1-based merge, inspired by the approach used in mlabonne/Daredevil-8B.
  • It combines top-performing Llama 3.1 8B models on the MMLU-Pro benchmark from the Open LLM Leaderboard as of January 21, 2025.
  • Its straightforward language makes it accessible and potentially valuable for everyday use.

Model Details

  • Architecture: Llama 3.1 (8.03B parameters)
  • Training: Merged from top MMLU-Pro models without additional finetuning
  • Release Date: January 21, 2025

Merge Configuration

The model was created using mergekit with the following merge configuration:

models:
  - model: DreadPoor/LemonP-8B-Model_Stock
    parameters:
      density: 0.6
      weight: 0.16
  - model: Youlln/1PARAMMYL-8B-ModelStock
    parameters:
      density: 0.6
      weight: 0.13
  - model: jaspionjader/f-2-8b
    parameters:
      density: 0.6
      weight: 0.10
  - model: Etherll/SuperHermes
    parameters:
      density: 0.6
      weight: 0.08
merge_method: dare_ties
base_model: meta-llama/Llama-3.1-8B-Instruct
dtype: bfloat16

Usage and Limitations

This experimental model is designed for research and development purposes. Users should be aware of potential biases and limitations inherent in language models. Always validate outputs and use the model responsibly.

Future Work

Further evaluation and fine-tuning may be necessary to optimize performance across various tasks. Researchers are encouraged to build upon this experimental merge to advance the capabilities of Llama-based models.

Open LLM Leaderboard Evaluation Results

Detailed results can be found here! Summarized results can be found here!

Metric Value (%)
Average 29.32
IFEval (0-Shot) 79.41
BBH (3-Shot) 32.22
MATH Lvl 5 (4-Shot) 16.77
GPQA (0-shot) 7.61
MuSR (0-shot) 7.92
MMLU-PRO (5-shot) 31.97
Description
Model synced from source: agentlans/Llama3.1-Daredevilish-Instruct
Readme 82 KiB