284 lines
8.2 KiB
Markdown
284 lines
8.2 KiB
Markdown
|
|
---
|
||
|
|
language:
|
||
|
|
- pt
|
||
|
|
model-index:
|
||
|
|
- name: sabia-7b
|
||
|
|
results:
|
||
|
|
- task:
|
||
|
|
type: text-generation
|
||
|
|
name: Text Generation
|
||
|
|
dataset:
|
||
|
|
name: ENEM Challenge (No Images)
|
||
|
|
type: eduagarcia/enem_challenge
|
||
|
|
split: train
|
||
|
|
args:
|
||
|
|
num_few_shot: 3
|
||
|
|
metrics:
|
||
|
|
- type: acc
|
||
|
|
value: 55.07
|
||
|
|
name: accuracy
|
||
|
|
source:
|
||
|
|
url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=maritaca-ai/sabia-7b
|
||
|
|
name: Open Portuguese LLM Leaderboard
|
||
|
|
- task:
|
||
|
|
type: text-generation
|
||
|
|
name: Text Generation
|
||
|
|
dataset:
|
||
|
|
name: BLUEX (No Images)
|
||
|
|
type: eduagarcia-temp/BLUEX_without_images
|
||
|
|
split: train
|
||
|
|
args:
|
||
|
|
num_few_shot: 3
|
||
|
|
metrics:
|
||
|
|
- type: acc
|
||
|
|
value: 47.71
|
||
|
|
name: accuracy
|
||
|
|
source:
|
||
|
|
url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=maritaca-ai/sabia-7b
|
||
|
|
name: Open Portuguese LLM Leaderboard
|
||
|
|
- task:
|
||
|
|
type: text-generation
|
||
|
|
name: Text Generation
|
||
|
|
dataset:
|
||
|
|
name: OAB Exams
|
||
|
|
type: eduagarcia/oab_exams
|
||
|
|
split: train
|
||
|
|
args:
|
||
|
|
num_few_shot: 3
|
||
|
|
metrics:
|
||
|
|
- type: acc
|
||
|
|
value: 41.41
|
||
|
|
name: accuracy
|
||
|
|
source:
|
||
|
|
url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=maritaca-ai/sabia-7b
|
||
|
|
name: Open Portuguese LLM Leaderboard
|
||
|
|
- task:
|
||
|
|
type: text-generation
|
||
|
|
name: Text Generation
|
||
|
|
dataset:
|
||
|
|
name: Assin2 RTE
|
||
|
|
type: assin2
|
||
|
|
split: test
|
||
|
|
args:
|
||
|
|
num_few_shot: 15
|
||
|
|
metrics:
|
||
|
|
- type: f1_macro
|
||
|
|
value: 46.68
|
||
|
|
name: f1-macro
|
||
|
|
source:
|
||
|
|
url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=maritaca-ai/sabia-7b
|
||
|
|
name: Open Portuguese LLM Leaderboard
|
||
|
|
- task:
|
||
|
|
type: text-generation
|
||
|
|
name: Text Generation
|
||
|
|
dataset:
|
||
|
|
name: Assin2 STS
|
||
|
|
type: eduagarcia/portuguese_benchmark
|
||
|
|
split: test
|
||
|
|
args:
|
||
|
|
num_few_shot: 15
|
||
|
|
metrics:
|
||
|
|
- type: pearson
|
||
|
|
value: 1.89
|
||
|
|
name: pearson
|
||
|
|
source:
|
||
|
|
url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=maritaca-ai/sabia-7b
|
||
|
|
name: Open Portuguese LLM Leaderboard
|
||
|
|
- task:
|
||
|
|
type: text-generation
|
||
|
|
name: Text Generation
|
||
|
|
dataset:
|
||
|
|
name: FaQuAD NLI
|
||
|
|
type: ruanchaves/faquad-nli
|
||
|
|
split: test
|
||
|
|
args:
|
||
|
|
num_few_shot: 15
|
||
|
|
metrics:
|
||
|
|
- type: f1_macro
|
||
|
|
value: 58.34
|
||
|
|
name: f1-macro
|
||
|
|
source:
|
||
|
|
url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=maritaca-ai/sabia-7b
|
||
|
|
name: Open Portuguese LLM Leaderboard
|
||
|
|
- task:
|
||
|
|
type: text-generation
|
||
|
|
name: Text Generation
|
||
|
|
dataset:
|
||
|
|
name: HateBR Binary
|
||
|
|
type: ruanchaves/hatebr
|
||
|
|
split: test
|
||
|
|
args:
|
||
|
|
num_few_shot: 25
|
||
|
|
metrics:
|
||
|
|
- type: f1_macro
|
||
|
|
value: 61.93
|
||
|
|
name: f1-macro
|
||
|
|
source:
|
||
|
|
url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=maritaca-ai/sabia-7b
|
||
|
|
name: Open Portuguese LLM Leaderboard
|
||
|
|
- task:
|
||
|
|
type: text-generation
|
||
|
|
name: Text Generation
|
||
|
|
dataset:
|
||
|
|
name: PT Hate Speech Binary
|
||
|
|
type: hate_speech_portuguese
|
||
|
|
split: test
|
||
|
|
args:
|
||
|
|
num_few_shot: 25
|
||
|
|
metrics:
|
||
|
|
- type: f1_macro
|
||
|
|
value: 64.13
|
||
|
|
name: f1-macro
|
||
|
|
source:
|
||
|
|
url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=maritaca-ai/sabia-7b
|
||
|
|
name: Open Portuguese LLM Leaderboard
|
||
|
|
- task:
|
||
|
|
type: text-generation
|
||
|
|
name: Text Generation
|
||
|
|
dataset:
|
||
|
|
name: tweetSentBR
|
||
|
|
type: eduagarcia-temp/tweetsentbr
|
||
|
|
split: test
|
||
|
|
args:
|
||
|
|
num_few_shot: 25
|
||
|
|
metrics:
|
||
|
|
- type: f1_macro
|
||
|
|
value: 46.64
|
||
|
|
name: f1-macro
|
||
|
|
source:
|
||
|
|
url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=maritaca-ai/sabia-7b
|
||
|
|
name: Open Portuguese LLM Leaderboard
|
||
|
|
---
|
||
|
|
|
||
|
|
Sabiá-7B is Portuguese language model developed by [Maritaca AI](https://www.maritaca.ai/).
|
||
|
|
|
||
|
|
**Input:** The model accepts only text input.
|
||
|
|
|
||
|
|
**Output:** The Model generates text only.
|
||
|
|
|
||
|
|
**Model Architecture:** Sabiá-7B is an auto-regressive language model that uses the same architecture of LLaMA-1-7B.
|
||
|
|
|
||
|
|
**Tokenizer:** It uses the same tokenizer as LLaMA-1-7B.
|
||
|
|
|
||
|
|
**Maximum sequence length:** 2048 tokens.
|
||
|
|
|
||
|
|
**Pretraining data:** The model was pretrained on 7 billion tokens from the Portuguese subset of ClueWeb22, starting with the weights of LLaMA-1-7B and further trained for an additional 10 billion tokens, approximately 1.4 epochs of the training dataset.
|
||
|
|
|
||
|
|
**Data Freshness:** The pretraining data has a cutoff of mid-2022.
|
||
|
|
|
||
|
|
**License:** The licensing is the same as LLaMA-1's, restricting the model's use to research purposes only.
|
||
|
|
|
||
|
|
**Paper:** For more details, please refer to our paper: [Sabiá: Portuguese Large Language Models](https://arxiv.org/pdf/2304.07880.pdf)
|
||
|
|
|
||
|
|
|
||
|
|
## Few-shot Example
|
||
|
|
|
||
|
|
Given that Sabiá-7B was trained solely on a language modeling objective without fine-tuning for instruction following, it is recommended for few-shot tasks rather than zero-shot tasks, like in the example below.
|
||
|
|
|
||
|
|
```python
|
||
|
|
import torch
|
||
|
|
from transformers import LlamaTokenizer, LlamaForCausalLM
|
||
|
|
|
||
|
|
tokenizer = LlamaTokenizer.from_pretrained("maritaca-ai/sabia-7b")
|
||
|
|
model = LlamaForCausalLM.from_pretrained(
|
||
|
|
"maritaca-ai/sabia-7b",
|
||
|
|
device_map="auto", # Automatically loads the model in the GPU, if there is one. Requires pip install acelerate
|
||
|
|
low_cpu_mem_usage=True,
|
||
|
|
torch_dtype=torch.bfloat16 # If your GPU does not support bfloat16, change to torch.float16
|
||
|
|
)
|
||
|
|
|
||
|
|
prompt = """Classifique a resenha de filme como "positiva" ou "negativa".
|
||
|
|
|
||
|
|
Resenha: Gostei muito do filme, é o melhor do ano!
|
||
|
|
Classe: positiva
|
||
|
|
|
||
|
|
Resenha: O filme deixa muito a desejar.
|
||
|
|
Classe: negativa
|
||
|
|
|
||
|
|
Resenha: Apesar de longo, valeu o ingresso.
|
||
|
|
Classe:"""
|
||
|
|
|
||
|
|
input_ids = tokenizer(prompt, return_tensors="pt")
|
||
|
|
|
||
|
|
output = model.generate(
|
||
|
|
input_ids["input_ids"].to("cuda"),
|
||
|
|
max_length=1024,
|
||
|
|
eos_token_id=tokenizer.encode("\n")) # Stop generation when a "\n" token is dectected
|
||
|
|
|
||
|
|
# The output contains the input tokens, so we have to skip them.
|
||
|
|
output = output[0][len(input_ids["input_ids"][0]):]
|
||
|
|
|
||
|
|
print(tokenizer.decode(output, skip_special_tokens=True))
|
||
|
|
```
|
||
|
|
|
||
|
|
If your GPU does not have enough RAM, try using int8 precision.
|
||
|
|
However, expect some degradation in the model output quality when compared to fp16 or bf16.
|
||
|
|
```python
|
||
|
|
model = LlamaForCausalLM.from_pretrained(
|
||
|
|
"maritaca-ai/sabia-7b",
|
||
|
|
device_map="auto",
|
||
|
|
low_cpu_mem_usage=True,
|
||
|
|
load_in_8bit=True, # Requires pip install bitsandbytes
|
||
|
|
)
|
||
|
|
```
|
||
|
|
|
||
|
|
## Results in Portuguese
|
||
|
|
|
||
|
|
Below we show the results on the Poeta benchmark, which consists of 14 Portuguese datasets.
|
||
|
|
|
||
|
|
For more information on the Normalized Preferred Metric (NPM), please refer to our paper.
|
||
|
|
|
||
|
|
|Model | NPM |
|
||
|
|
|--|--|
|
||
|
|
|LLaMA-1-7B| 33.0|
|
||
|
|
|LLaMA-2-7B| 43.7|
|
||
|
|
|Sabiá-7B| 48.5|
|
||
|
|
|
||
|
|
## Results in English
|
||
|
|
|
||
|
|
Below we show the average results on 6 English datasets: PIQA, HellaSwag, WinoGrande, ARC-e, ARC-c, and OpenBookQA.
|
||
|
|
|
||
|
|
|Model | NPM |
|
||
|
|
|--|--|
|
||
|
|
|LLaMA-1-7B| 50.1|
|
||
|
|
|Sabiá-7B| 49.0|
|
||
|
|
|
||
|
|
|
||
|
|
## Citation
|
||
|
|
|
||
|
|
Please use the following bibtex to cite our paper:
|
||
|
|
```
|
||
|
|
@InProceedings{10.1007/978-3-031-45392-2_15,
|
||
|
|
author="Pires, Ramon
|
||
|
|
and Abonizio, Hugo
|
||
|
|
and Almeida, Thales Sales
|
||
|
|
and Nogueira, Rodrigo",
|
||
|
|
editor="Naldi, Murilo C.
|
||
|
|
and Bianchi, Reinaldo A. C.",
|
||
|
|
title="Sabi{\'a}: Portuguese Large Language Models",
|
||
|
|
booktitle="Intelligent Systems",
|
||
|
|
year="2023",
|
||
|
|
publisher="Springer Nature Switzerland",
|
||
|
|
address="Cham",
|
||
|
|
pages="226--240",
|
||
|
|
isbn="978-3-031-45392-2"
|
||
|
|
}
|
||
|
|
```
|
||
|
|
|
||
|
|
# [Open Portuguese LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard)
|
||
|
|
Detailed results can be found [here](https://huggingface.co/datasets/eduagarcia-temp/llm_pt_leaderboard_raw_results/tree/main/maritaca-ai/sabia-7b)
|
||
|
|
|
||
|
|
| Metric | Value |
|
||
|
|
|--------------------------|---------|
|
||
|
|
|Average |**47.09**|
|
||
|
|
|ENEM Challenge (No Images)| 55.07|
|
||
|
|
|BLUEX (No Images) | 47.71|
|
||
|
|
|OAB Exams | 41.41|
|
||
|
|
|Assin2 RTE | 46.68|
|
||
|
|
|Assin2 STS | 1.89|
|
||
|
|
|FaQuAD NLI | 58.34|
|
||
|
|
|HateBR Binary | 61.93|
|
||
|
|
|PT Hate Speech Binary | 64.13|
|
||
|
|
|tweetSentBR | 46.64|
|
||
|
|
|