athirdpath/Llama-3.1-Instruct_NSFW-pretrained_e1-plus_reddit


ModelHub XC f1073992df Initialize project; model provided by the ModelHub XC community

Model: athirdpath/Llama-3.1-Instruct_NSFW-pretrained_e1-plus_reddit
Source: Original Platform

2026-06-01 11:47:18 +08:00


language:
license: apache-2.0
tags: text-generation-inference, transformers, unsloth, llama, trl, sft
base_model: athirdpath/Llama-3.1-Instruct_NSFW-pretrained_e1
model-index: Llama-3.1-Instruct_NSFW-pretrained_e1-plus_reddit

All results below are for the text-generation task (Text Generation) and are sourced from the Open LLM Leaderboard: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=athirdpath/Llama-3.1-Instruct_NSFW-pretrained_e1-plus_reddit

Benchmark	Dataset	num_few_shot	Metric type	Metric name	Value
IFEval (0-Shot)	HuggingFaceH4/ifeval	0	inst_level_strict_acc and prompt_level_strict_acc	strict accuracy	45.21
BBH (3-Shot)	BBH	3	acc_norm	normalized accuracy	28.02
MATH Lvl 5 (4-Shot)	hendrycks/competition_math	4	exact_match	exact match	8.84
GPQA (0-shot)	Idavidrein/gpqa	0	acc_norm	acc_norm	5.59
MuSR (0-shot)	TAUR-Lab/MuSR	0	acc_norm	acc_norm	8.3
MMLU-PRO (5-shot)	TIGER-Lab/MMLU-Pro (config main, split test)	5	acc	accuracy	28.5

This is athirdpath/Llama-3.1-Instruct_NSFW-pretrained_e1, further pretrained for 1 epoch on the dirty stories from nothingiisreal/Reddit-Dirty-And-WritingPrompts, with all entries scoring below 2 dropped.
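
The score filter described above can be sketched roughly as follows. This is a minimal illustration, not the actual preprocessing: the `score` and `text` field names, the sample records, and the `keep_scored` helper are all hypothetical, and the real dataset schema may differ.

```python
from typing import Iterable

MIN_SCORE = 2  # threshold from the description: scores below 2 are dropped


def keep_scored(records: Iterable[dict], min_score: int = MIN_SCORE) -> list[dict]:
    """Return only the records whose (hypothetical) 'score' field is >= min_score."""
    return [r for r in records if r.get("score", 0) >= min_score]


# Made-up sample records, for illustration only.
sample = [
    {"text": "story a", "score": 1},
    {"text": "story b", "score": 2},
    {"text": "story c", "score": 15},
]

kept = keep_scored(sample)
print([r["text"] for r in kept])
```

Records with no `score` field are treated as 0 and dropped here; that default is a choice made for this sketch.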

Why do this? I have a niche use case where I cannot go above 8B parameters, and L3/3.1 are the only models in this size class that meet my needs for logic. However, both L3 and L3.1 have the damn repetition/token overconfidence problem, and this extra pretraining is meant to disrupt that certainty without disrupting the model's ability to function.

By the way, I think it's the lm_head that is causing the looping, but it might be the embeddings being too separated. I'm not going to pay two more times to test them separately, however :p
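
If someone did want to test the two suspects separately, one possible setup is to train only one parameter group per run and compare how much looping remains. The sketch below only builds the train/freeze plan by name prefix. The prefixes `lm_head.` and `model.embed_tokens.` are assumed to match a Llama-style checkpoint in Hugging Face Transformers, and `select_trainable` is a hypothetical helper, not a library function.

```python
# Assumed parameter-name prefixes for a Llama-style checkpoint.
GROUP_PREFIXES = {
    "lm_head": ("lm_head.",),
    "embeddings": ("model.embed_tokens.",),
}


def select_trainable(param_names, group):
    """Map each parameter name to True (train) or False (freeze) for one group."""
    prefixes = GROUP_PREFIXES[group]
    return {name: name.startswith(prefixes) for name in param_names}


# Illustrative names only; a real model has many more parameters.
names = [
    "model.embed_tokens.weight",
    "model.layers.0.self_attn.q_proj.weight",
    "lm_head.weight",
]

for group in GROUP_PREFIXES:
    plan = select_trainable(names, group)
    print(group, [n for n, train in plan.items() if train])
```

With a PyTorch model, such a plan could be applied by setting `requires_grad` on each parameter returned by `model.named_parameters()`.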

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

Metric	Value
Avg.	20.74
IFEval (0-Shot)	45.21
BBH (3-Shot)	28.02
MATH Lvl 5 (4-Shot)	8.84
GPQA (0-shot)	5.59
MuSR (0-shot)	8.30
MMLU-PRO (5-shot)	28.50
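
The Avg. row appears to be the plain mean of the six benchmark scores. A quick check, assuming equal weighting of the benchmarks:

```python
# Scores copied from the table above.
scores = {
    "IFEval (0-Shot)": 45.21,
    "BBH (3-Shot)": 28.02,
    "MATH Lvl 5 (4-Shot)": 8.84,
    "GPQA (0-shot)": 5.59,
    "MuSR (0-shot)": 8.30,
    "MMLU-PRO (5-shot)": 28.50,
}

# Unweighted mean, rounded to two decimals like the table.
average = round(sum(scores.values()) / len(scores), 2)
print(average)  # 20.74
```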
