ModelHub XC dd57e2c95d Initialize project; model provided by the ModelHub XC community

Model: cnmoro/Mistral-7B-Portuguese
Source: Original Platform

2026-04-11 19:47:59 +08:00

5.1 KiB


Metadata (language, datasets, model-index)

datasets: cnmoro/WizardVicuna-PTBR-Instruct-Clean
model-index name: Mistral-7B-Portuguese

Results (task type: text-generation, task name: Text Generation; source for every row: Open Portuguese LLM Leaderboard, https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=cnmoro/Mistral-7B-Portuguese)

Dataset name	Dataset type	Split	num_few_shot	Metric type	Metric name	Value
ENEM Challenge (No Images)	eduagarcia/enem_challenge	train	3	acc	accuracy	58.08
BLUEX (No Images)	eduagarcia-temp/BLUEX_without_images	train	3	acc	accuracy	48.68
OAB Exams	eduagarcia/oab_exams	train	3	acc	accuracy	37.08
Assin2 RTE	assin2	test	15	f1_macro	f1-macro	90.31
Assin2 STS	eduagarcia/portuguese_benchmark	test	15	pearson	pearson	76.55
FaQuAD NLI	ruanchaves/faquad-nli	test	15	f1_macro	f1-macro	58.84
HateBR Binary	ruanchaves/hatebr	test	25	f1_macro	f1-macro	79.21
PT Hate Speech Binary	hate_speech_portuguese	test	25	f1_macro	f1-macro	68.87
tweetSentBR	eduagarcia-temp/tweetsentbr	test	25	f1_macro	f1-macro	64.71

This is a fine-tuned version of mistralai/Mistral-7B-Instruct-v0.2, trained with unsloth on a Portuguese instruction dataset in an attempt to improve the model's performance in Portuguese.

No benchmarks beyond the Open Portuguese LLM Leaderboard evaluation reported below have been executed yet.

The original prompt format was used:

<s>[INST] {Prompt goes here} [/INST]
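
As a minimal sketch of how the template above could be filled in, the helper below wraps a user message in the `[INST] ... [/INST]` markers. The function name `build_prompt` and the `add_bos` flag are hypothetical and not part of the model card; the flag exists because many tokenizers prepend `<s>` on their own, in which case emitting it in the string as well would likely duplicate it.

```python
def build_prompt(user_message: str, add_bos: bool = True) -> str:
    """Wrap a user message in the [INST] ... [/INST] template from the card.

    add_bos is an illustrative switch (an assumption, not from the card):
    set it to False when the tokenizer already inserts the <s> token.
    """
    bos = "<s>" if add_bos else ""
    # Strip surrounding whitespace so the spacing matches the template.
    return f"{bos}[INST] {user_message.strip()} [/INST]"


prompt = build_prompt("Qual é a capital do Brasil?")
print(prompt)  # <s>[INST] Qual é a capital do Brasil? [/INST]
```

The resulting string can then be passed to whichever inference stack is in use; whether the literal `<s>` should be kept depends on how that stack handles the beginning-of-sequence token.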

Open Portuguese LLM Leaderboard Evaluation Results

Detailed results can be found here

Metric	Value
Average	64.7
ENEM Challenge (No Images)	58.08
BLUEX (No Images)	48.68
OAB Exams	37.08
Assin2 RTE	90.31
Assin2 STS	76.55
FaQuAD NLI	58.84
HateBR Binary	79.21
PT Hate Speech Binary	68.87
tweetSentBR	64.71
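
The Average row is consistent with an unweighted mean of the nine task scores. The sketch below reproduces that arithmetic; treating the average as a plain mean is an assumption checked against the numbers in the table, not a statement of how the leaderboard computes it.

```python
# Scores copied from the table above (one entry per task).
scores = {
    "ENEM Challenge (No Images)": 58.08,
    "BLUEX (No Images)": 48.68,
    "OAB Exams": 37.08,
    "Assin2 RTE": 90.31,
    "Assin2 STS": 76.55,
    "FaQuAD NLI": 58.84,
    "HateBR Binary": 79.21,
    "PT Hate Speech Binary": 68.87,
    "tweetSentBR": 64.71,
}

# Assumption: the reported average is the unweighted mean over all tasks.
average = sum(scores.values()) / len(scores)
print(f"{average:.2f}")  # 64.70
```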
