ModelHub XC 5dcf73b2aa Initialize project; model provided by the ModelHub XC community
Model: Polygl0t/Tucano2-qwen-0.5B-Base
Source: Original Platform
2026-05-31 00:50:30 +08:00

---
language:
- pt
license: apache-2.0
library_name: transformers
tags:
- text-generation-inference
datasets:
- Polygl0t/gigaverbo-v2
- Polygl0t/gigaverbo-v2-synth
metrics:
- perplexity
pipeline_tag: text-generation
widget:
- text: "A floresta da Amazônia é conhecida por sua"
  example_title: Exemplo
- text: "Uma das coisas que Portugal, Angola, Brasil e Moçambique tem em comum é o"
  example_title: Exemplo
- text: "O Carnaval do Rio de Janeiro é"
  example_title: Exemplo
inference:
  parameters:
    repetition_penalty: 1.2
    temperature: 0.1
    top_k: 50
    top_p: 1.0
    max_new_tokens: 150
co2_eq_emissions:
  emissions: 86000
  source: CodeCarbon
  training_type: pre-training
  geographical_location: Germany
  hardware_used: NVIDIA A100-SXM4-80GB
model-index:
- name: Tucano2-qwen-0.5B-Base
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: ARC Challenge
      type: Polygl0t/ARC-poly
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc_norm
      value: 37.44
      name: Acc-norm
    source:
      url: https://github.com/Polygl0t/lm-evaluation-harness/tree/polyglot_harness_portuguese
      name: arc_challenge_poly_pt
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HellaSwag
      type: Polygl0t/HellaSwag-poly
      split: validation
      args:
        num_few_shot: 5
    metrics:
    - type: acc_norm
      value: 48.43
      name: Acc-norm
    source:
      url: https://github.com/Polygl0t/lm-evaluation-harness/tree/polyglot_harness_portuguese
      name: hellaswag_poly_pt
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Calame
      type: Polygl0t/CALAME-PT
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 58.67
      name: Acc
    source:
      url: https://github.com/Polygl0t/lm-evaluation-harness/tree/polyglot_harness_portuguese
      name: calame_pt
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Lambada
      type: Polygl0t/LAMBADA-poly
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 45.14
      name: Acc
    source:
      url: https://github.com/Polygl0t/lm-evaluation-harness/tree/polyglot_harness_portuguese
      name: lambada_poly_pt
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Global PIQA
      type: mrlbenchmarks/global-piqa-nonparallel
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc_norm
      value: 74
      name: Acc-norm
    source:
      url: https://github.com/Polygl0t/lm-evaluation-harness/tree/polyglot_harness_portuguese
      name: global_piqa_completions_por_latn_braz
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU
      type: Polygl0t/MMLU-poly
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 39.68
      name: Acc
    source:
      url: https://github.com/Polygl0t/lm-evaluation-harness/tree/polyglot_harness_portuguese
      name: mmlu_poly_pt
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: BELEBELE
      type: facebook/belebele
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc_norm
      value: 53.89
      name: Acc-norm
    source:
      url: https://github.com/Polygl0t/lm-evaluation-harness/tree/polyglot_harness_portuguese
      name: belebele_por_Latn
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: BLUEX
      type: eduagarcia-temp/BLUEX_without_images
      split: train
      args:
        num_few_shot: 3
    metrics:
    - type: acc
      value: 46.87
      name: Acc
    source:
      url: https://github.com/eduagarcia/lm-evaluation-harness-pt
      name: bluex
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: ENEM Challenge
      type: eduagarcia/enem_challenge
      split: train
      args:
        num_few_shot: 3
    metrics:
    - type: acc
      value: 55.14
      name: Acc
    source:
      url: https://github.com/eduagarcia/lm-evaluation-harness-pt
      name: enem_challenge
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: OAB Exams
      type: eduagarcia/oab_exams
      split: train
      args:
        num_few_shot: 3
    metrics:
    - type: acc
      value: 40.36
      name: Acc
    source:
      url: https://github.com/eduagarcia/lm-evaluation-harness-pt
      name: oab_exams
base_model: Qwen/Qwen3-0.6B-Base
---

Tucano2-qwen-0.5B-Base

An illustration of a Tucano bird showing vibrant colors like yellow, orange, blue, green, and black.

Model Summary

Tucano2-qwen-0.5B-Base is a decoder-only transformer continually pretrained from Qwen3-0.6B-Base. Tucano2 is part of the Polygl0t initiative, which aims to advance language models for low-resource languages.

Tucano2-qwen-0.5B-Base shares the same tokenizer as Tucano2-0.6B-Base. Token embedding transplantation via Orthogonal Matching Pursuit was used to adapt Qwen3-0.6B-Base to be more sensitive to the lexical, morphological, and orthographic properties of Portuguese.
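The card names the technique but not its implementation. As a rough illustration, assume an auxiliary embedding space that covers both the old and the new vocabulary: Orthogonal Matching Pursuit (OMP) greedily expresses a new token's auxiliary embedding as a sparse combination of tokens shared by both tokenizers, and the same coefficients are then applied to the base model's embeddings of those shared tokens. The sketch below is a generic NumPy version of that idea; `omp`, `transplant_embedding`, and the auxiliary-space setup are hypothetical and not taken from the Tucano2 code.

```python
import numpy as np


def omp(dictionary, target, k):
    """Greedy Orthogonal Matching Pursuit.

    dictionary: (d, n) matrix whose columns are candidate atoms
                (assumed to have comparable norms; normalize them otherwise)
    target:     (d,) vector to approximate
    k:          maximum number of atoms to select
    Returns the selected column indices and their least-squares coefficients.
    """
    dictionary = np.asarray(dictionary, dtype=float)
    target = np.asarray(target, dtype=float)
    residual = target.copy()
    support = []
    coef = np.zeros(0)
    for _ in range(k):
        # Correlation of every atom with what is still unexplained
        scores = np.abs(dictionary.T @ residual)
        if support:
            scores[support] = -np.inf  # never select the same atom twice
        support.append(int(np.argmax(scores)))
        # "Orthogonal" step: refit all selected atoms jointly
        sub = dictionary[:, support]
        coef, *_ = np.linalg.lstsq(sub, target, rcond=None)
        residual = target - sub @ coef
        if np.linalg.norm(residual) < 1e-10:
            break
    return support, coef


def transplant_embedding(aux_new, aux_shared, base_shared, k=8):
    """Hypothetical helper: build a base-model embedding for a new token.

    aux_new:     (d_aux,) new token's embedding in the auxiliary space
    aux_shared:  (d_aux, n_shared) auxiliary embeddings of shared tokens
    base_shared: (d_base, n_shared) base-model embeddings of the same shared tokens
    """
    support, coef = omp(aux_shared, aux_new, k)
    # Reuse the sparse coefficients in the base model's embedding space
    return base_shared[:, support] @ coef
```

Keeping `k` small makes each new embedding a mix of only a few existing tokens, which is the point of using a sparse method rather than a dense least-squares fit over the whole shared vocabulary.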

The model was continually pretrained on approximately 50 billion tokens and achieves the highest overall NPM score among the sub-1B-parameter models in the Evaluations section, across several benchmarks designed to evaluate Portuguese language models. All data, source code, and recipes used to develop the Tucano2 series are open and fully reproducible.

Details

  • Architecture: a Transformer-based model (qwen3)
  • Size: 490,799,104 parameters
  • Context length: 4,096 tokens
  • Dataset(s): Polygl0t/gigaverbo-v2, Polygl0t/gigaverbo-v2-synth
  • Language(s): Portuguese
  • Batch size: 1,048,576 tokens
  • Number of steps: 50,000
  • GPU: 8 NVIDIA A100-SXM4-80GB
  • Training time: ~59 hours
  • Emissions: 86 kg CO2eq (Germany)
  • Total energy consumption: 225 kWh
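The Details list, together with the 2,500-step checkpoint interval described under Checkpoints, pins down the token budget. A small worked calculation, assuming every optimizer step consumes exactly the stated batch of 1,048,576 tokens and that sequences are fully packed to the 4,096-token context (the packing is an assumption):

```python
batch_tokens = 1_048_576   # tokens per optimizer step (Details)
context_length = 4_096     # tokens per sequence (Details)
total_steps = 50_000       # number of steps (Details)
checkpoint_every = 2_500   # steps between saved checkpoints (Checkpoints)

total_tokens = batch_tokens * total_steps
tokens_per_checkpoint = batch_tokens * checkpoint_every
sequences_per_step = batch_tokens // context_length  # assumes fully packed sequences
num_checkpoints = total_steps // checkpoint_every

print(f"total tokens:          {total_tokens / 1e9:.2f}B")           # 52.43B
print(f"tokens per checkpoint: {tokens_per_checkpoint / 1e9:.2f}B")  # 2.62B
print(f"sequences per step:    {sequences_per_step}")                # 256
print(f"checkpoints saved:     {num_checkpoints}")                   # 20
```

So the "approximately 50 billion tokens" of continual pretraining is about 52.4B tokens, and each checkpoint interval covers about 2.6B tokens.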

This repository has the source code used to train this model. The full configuration used for training is available in the following config file:

Checkpoints

Checkpoints were saved every 2,500 steps, which equates to approximately 2.6 billion tokens (2,500 steps × 1,048,576 tokens per step). The main branch of this repository contains the final checkpoint, saved at step 50,000. All other checkpoints are available as separate branches. To load a specific checkpoint, you can use the following code snippet:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Polygl0t/Tucano2-qwen-0.5B-Base"
revision = "step-2500"  # Change this to the desired checkpoint branch
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, revision=revision)

Alternatively, you can list all available revisions (branches) of the model with the following code snippet:

from huggingface_hub import list_repo_refs
out = list_repo_refs("Polygl0t/Tucano2-qwen-0.5B-Base")
branches = [b.name for b in out.branches]
print(branches)
Learning Curves


This plot illustrates the evolution of model performance (measured by loss) as a function of training time, measured in tokens seen during training.
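If the plotted loss is the usual mean next-token cross-entropy in nats (an assumption; the card does not say), it converts directly to perplexity, the metric listed in the model metadata, via perplexity = exp(loss). A minimal NumPy sketch with hypothetical helper names:

```python
import numpy as np


def cross_entropy(logits, targets):
    """Mean next-token cross-entropy in nats.

    logits:  (num_tokens, vocab_size) unnormalized scores
    targets: (num_tokens,) integer ids of the correct next tokens
    """
    logits = np.asarray(logits, dtype=float)
    # Numerically stable log-softmax over the vocabulary axis
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # Pick the log-probability of each correct token and average
    return -log_probs[np.arange(len(targets)), targets].mean()


def perplexity(loss):
    """Perplexity is the exponential of the mean cross-entropy (in nats)."""
    return float(np.exp(loss))
```

For example, a loss of 2.0 nats corresponds to a perplexity of about 7.39, and a model that spreads probability uniformly over a vocabulary of size V has loss ln V and perplexity V.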

Gradient Norms (L2)


This plot illustrates the evolution of gradient norms as a function of training time, measured in tokens seen during training.
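"Gradient norm (L2)" usually refers to the global norm: the Euclidean norm of all parameter gradients concatenated into one vector. A minimal sketch assuming that definition (`global_grad_norm` is a hypothetical helper); in PyTorch, `torch.nn.utils.clip_grad_norm_` returns the same total norm (for the default `norm_type=2`), computed before any clipping is applied.

```python
import numpy as np


def global_grad_norm(grads):
    """L2 norm of all gradient arrays viewed as one flattened vector."""
    # Sum of squares over every element of every parameter's gradient
    return float(np.sqrt(sum(np.sum(np.square(g)) for g in grads)))
```

Note that this is not the mean of per-layer norms: a single layer with unusually large gradients dominates the global value, which is why spikes in this curve are often read as signs of training instability.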

Intended Uses

The primary intended use of Tucano2-qwen-0.5B-Base is to serve as a foundation for research and development involving Portuguese language modeling. Checkpoints saved during training are designed to provide a controlled setting for comparative experiments, specifically on the effects of continual pretraining on performance across currently available benchmarks. You may also fine-tune and adapt Tucano2-qwen-0.5B-Base for deployment if your use follows the Apache 2.0 license. If you decide to use Tucano2-qwen-0.5B-Base as a basis for your fine-tuned model, please conduct your own risk and bias assessment.
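Comparative experiments across checkpoints usually mean walking the branches in training order. A minimal sketch, assuming the intermediate branches follow the `step-<N>` pattern of the `step-2500` example in Checkpoints (the naming of the other branches is an assumption). `checkpoint_schedule` and `iterate_checkpoints` are hypothetical helpers; `list_repo_refs` and `from_pretrained(..., revision=...)` are the same calls used earlier in this card.

```python
import re

# Assumption: checkpoint branches are named "step-<N>", like "step-2500".
STEP_BRANCH = re.compile(r"^step-(\d+)$")


def checkpoint_schedule(branch_names):
    """Hypothetical helper: (step, branch) pairs for checkpoint branches, sorted by step."""
    pairs = []
    for name in branch_names:
        match = STEP_BRANCH.match(name)
        if match:
            pairs.append((int(match.group(1)), name))
    return sorted(pairs)


def iterate_checkpoints(model_id):
    """Hypothetical helper: yield (step, model) for every checkpoint branch.

    Imports are local so checkpoint_schedule works without the Hub libraries.
    Each iteration downloads a full checkpoint.
    """
    from huggingface_hub import list_repo_refs
    from transformers import AutoModelForCausalLM

    refs = list_repo_refs(model_id)
    for step, branch in checkpoint_schedule(b.name for b in refs.branches):
        yield step, AutoModelForCausalLM.from_pretrained(model_id, revision=branch)
```

Usage would look like `for step, model in iterate_checkpoints("Polygl0t/Tucano2-qwen-0.5B-Base"): ...`, running the same evaluation on each checkpoint and recording the score against `step`. Because the generator loads one checkpoint at a time, only one model needs to be in memory at once.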

Out-of-scope Use

  • Tucano2-qwen-0.5B-Base is not intended for deployment. It is not an out-of-the-box product and should not be used for human-facing interactions.
  • Tucano2-qwen-0.5B-Base is for the Portuguese language only and is unsuitable for text generation tasks in other languages.
  • Tucano2-qwen-0.5B-Base has not been fine-tuned for downstream tasks.

Basic usage

from transformers import GenerationConfig, TextGenerationPipeline, AutoTokenizer, AutoModelForCausalLM
import torch

# Specify the model and tokenizer
model_id = "Polygl0t/Tucano2-qwen-0.5B-Base"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Specify the generation parameters as you like
generation_config = GenerationConfig(
    do_sample=True,
    max_new_tokens=150,
    renormalize_logits=True,
    repetition_penalty=1.2,
    temperature=0.1,
    top_k=50,
    top_p=1.0,
    use_cache=True,
)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
generator = TextGenerationPipeline(model=model, task="text-generation", tokenizer=tokenizer, device=device)

# Generate text
prompt = "# A floresta da Amazônia: um lugar de Magia\n\n"
completion = generator(prompt, generation_config=generation_config)
print(completion[0]['generated_text'])
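The sampling parameters in the generation config above work together: logits are divided by `temperature`, then top-k and top-p (nucleus) filtering remove unlikely tokens before a token is sampled, which is roughly the order transformers applies its logits warpers in. The NumPy sketch below is a simplified, hypothetical re-implementation for illustration only; `filter_logits` is not a transformers function.

```python
import numpy as np


def filter_logits(logits, temperature=1.0, top_k=0, top_p=1.0):
    """Return the next-token distribution after temperature, top-k, and top-p."""
    scaled = np.asarray(logits, dtype=float) / temperature
    # Top-k: drop every token scoring below the k-th largest logit (0 disables)
    if 0 < top_k < scaled.size:
        kth = np.sort(scaled)[-top_k]
        scaled = np.where(scaled < kth, -np.inf, scaled)
    # Softmax (exp(-inf) == 0 for filtered tokens)
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    # Top-p: keep the smallest set of most likely tokens whose mass reaches top_p
    if top_p < 1.0:
        order = np.argsort(probs)[::-1]
        cumulative = np.cumsum(probs[order])
        cutoff = np.searchsorted(cumulative, top_p) + 1  # always keeps the top token
        mask = np.zeros(probs.shape, dtype=bool)
        mask[order[:cutoff]] = True
        probs = np.where(mask, probs, 0.0)
        probs /= probs.sum()
    return probs
```

With the card's settings (temperature 0.1, top_k 50, top_p 1.0), dividing by 0.1 multiplies every logit gap by ten, so nearly all probability mass lands on the top token and sampling behaves almost greedily; top_p=1.0 disables nucleus filtering entirely.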

Limitations

Like almost all other language models trained on large text datasets scraped from the web, Tucano2-qwen-0.5B-Base exhibits behaviors that make it unsuitable as an out-of-the-box solution for many real-world applications, especially those requiring factual, reliable, and nontoxic text generation. Tucano2-qwen-0.5B-Base is subject to the following:

  • Hallucinations: Tucano2-qwen-0.5B-Base can produce content that may be mistaken for fact but is misleading or entirely false, i.e., hallucinations.

  • Biases and Toxicity: Tucano2-qwen-0.5B-Base inherits the social and historical stereotypes present in its training data. Given these biases, the model can produce toxic content, i.e., content that is harmful, offensive, or detrimental to individuals, groups, or communities.

  • Language Limitations: Tucano2-qwen-0.5B-Base is primarily designed to interact with Portuguese. Other languages might challenge its comprehension, leading to misinterpretations or errors in its responses.

  • Repetition and Verbosity: Tucano2-qwen-0.5B-Base may get stuck in repetition loops (especially if the repetition penalty is set too low during generation) or produce verbose responses unrelated to the prompt.

Hence, even though Tucano2-qwen-0.5B-Base is released under a permissive license, we urge users to perform their own risk analysis before using it for real-world applications.
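The repetition penalty mentioned under Limitations rescales the logits of tokens that already appear in the context (prompt plus generated text). This sketch mirrors the CTRL-style rule used by transformers' `RepetitionPenaltyLogitsProcessor`: positive logits are divided by the penalty and negative logits multiplied by it, so the token becomes less likely either way. `apply_repetition_penalty` itself is a hypothetical NumPy helper, not a library function.

```python
import numpy as np


def apply_repetition_penalty(logits, previous_ids, penalty):
    """Penalize every token id already present in the context."""
    out = np.asarray(logits, dtype=float).copy()
    ids = np.unique(np.asarray(previous_ids, dtype=int))
    seen = out[ids]
    # Shrink positive scores and push negative scores further down
    out[ids] = np.where(seen < 0, seen * penalty, seen / penalty)
    return out
```

A penalty of 1.0 leaves the logits unchanged, and values only slightly above 1.0 barely move them, which is why a too-low setting lets the repetition loops described above persist; the card's generation examples use 1.2.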

Evaluations

The table below compares the Tucano2 series against other open base models ranging from 160M to 7B parameters. We divide our evaluations into two sets:

  • Easy Set: CALAME, GlobalPIQA, LAMBADA, ARC-Challenge, HellaSwag
  • Hard Set: ENEM, BLUEX, OAB Exams, BELEBELE, MMLU

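The NPM (Normalized Performance Metric) provides a balanced view of model performance across tasks by normalizing each evaluation score relative to the task's random baseline, so that random-chance performance maps to 0 and a perfect score to 100. Scores below the random baseline therefore yield negative values, as seen for some models in the Hard Set.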

Model                    Total Avg. (NPM)  Easy Set (NPM)  Hard Set (NPM)
Tucano2-qwen-3.7B-Base   59.21             57.41           61
Qwen2.5-7B               57.97             54.12           61.83
Qwen3-4B-Base            57.86             52.52           63.2
SmolLM3-3B-Base          50.25             54.06           46.44
Qwen2.5-3B               50.16             47.69           52.62
Tucano2-qwen-1.5B-Base   47.9              47.97           47.82
Curio-edu-7b             45.66             57.46           33.87
Qwen3-1.7B-Base          44.48             40.94           48.03
Curio-7b                 42.79             58.97           26.6
Llama-3.2-3B             40.5              43.79           37.21
granite-3.3-2b-base      39.97             45.31           34.63
Tucano2-qwen-0.5B-Base   35.36             39.93           30.79
Qwen3-0.6B-Base          29.4              26.41           32.38
Llama-2-7b-hf            29.36             42.69           16.03
Tucano2-0.6B-Base        20.64             40.28           0.99
Qwen2.5-0.5B             19.89             18.7            21.09
Curio-1.1b               19.23             39.16           -0.69
Tucano-2b4               17.88             33.55           2.2
Curio-edu-1b1            17.72             34.77           0.67
Llama-3.2-1B             16.57             28.32           4.83
Tucano-1b1               15.44             29.12           1.76
Tucano-630m              14.9              26.99           2.8
Carvalho_pt-gl-1.3B      12.54             26.75           -1.66
TeenyTinyLlama-460m      11.18             19.65           2.72
Tucano-160m              8.78              19.12           -1.56
TeenyTinyLlama-160m      7.72              15.75           -0.31
GlorIA-1.3B              5.93              27.27           -15.42
Evaluation Suite
Benchmark      n-shot   Type        Baseline (%)  Metric
Easy Set
CALAME         5-shot   Completion  0             acc
GlobalPIQA     5-shot   Completion  50            acc_norm
LAMBADA        5-shot   Completion  0             acc
ARC-Challenge  5-shot   MC-Q&A      25            acc_norm
HellaSwag      5-shot   Completion  25            acc_norm
Hard Set
ENEM           3-shot   MC-Q&A      20            acc
BLUEX          3-shot   MC-Q&A      22.5          acc
OAB Exams      3-shot   MC-Q&A      25            acc
BELEBELE       5-shot   MC-Q&A      25            acc_norm
MMLU           5-shot   MC-Q&A      25            acc
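Using the baselines in the Evaluation Suite table, the NPM can be reproduced as follows. The rescaling 100 * (score - baseline) / (100 - baseline), averaged within each set, is inferred from the reported numbers rather than taken from the authors' evaluation code: it reproduces the Easy Set, Hard Set, and Total Avg. values listed for Tucano2-qwen-0.5B-Base, with the per-task scores taken from the Individual Benchmarks table. The helper names are hypothetical.

```python
# Random-chance baselines (%) from the Evaluation Suite table
BASELINES = {
    "CALAME": 0, "GlobalPIQA": 50, "LAMBADA": 0, "ARC-Challenge": 25, "HellaSwag": 25,
    "ENEM": 20, "BLUEX": 22.5, "OAB Exams": 25, "BELEBELE": 25, "MMLU": 25,
}
EASY = ["CALAME", "GlobalPIQA", "LAMBADA", "ARC-Challenge", "HellaSwag"]
HARD = ["ENEM", "BLUEX", "OAB Exams", "BELEBELE", "MMLU"]


def normalized(score, baseline):
    """Map the random baseline to 0 and a perfect score (100) to 100."""
    return 100 * (score - baseline) / (100 - baseline)


def npm(scores, tasks):
    """Mean normalized score over the given tasks."""
    return sum(normalized(scores[t], BASELINES[t]) for t in tasks) / len(tasks)


# Tucano2-qwen-0.5B-Base scores from the Individual Benchmarks table
tucano2_05b = {
    "CALAME": 58.67, "GlobalPIQA": 74, "LAMBADA": 45.14, "ARC-Challenge": 37.44, "HellaSwag": 48.43,
    "ENEM": 55.14, "BLUEX": 46.87, "OAB Exams": 40.36, "BELEBELE": 53.89, "MMLU": 39.68,
}
easy = npm(tucano2_05b, EASY)
hard = npm(tucano2_05b, HARD)
total = (easy + hard) / 2  # both sets have five tasks, so this is also the 10-task mean
print(f"easy {easy:.2f}  hard {hard:.2f}  total {total:.2f}")
# easy 39.93  hard 30.79  total 35.36
```

The normalization explains why a raw 74% on GlobalPIQA (baseline 50) contributes only 48 points, while 58.67% on CALAME (baseline 0) contributes its full value.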
Individual Benchmarks
Model                    BLUEX   ENEM    OAB     ARC-Challenge  BELEBELE  CALAME  GlobalPIQA  HellaSwag  LAMBADA  MMLU
Tucano2-qwen-3.7B-Base   66.2    77.54   58.45   57.78          83.67     61.08   83          65.32      62.53    65.4
Qwen2.5-7B               65.92   75.02   55.03   54.19          89.67     58.96   78          67.92      59.52    68.55
Qwen3-4B-Base            69.96   77.61   55.58   54.53          87.89     57.95   77          63.19      60.37    68.59
SmolLM3-3B-Base          54.52   61.37   45.51   51.37          77.67     59.15   81          65.57      59.89    56.19
Qwen2.5-3B               58.28   67.32   50.34   45.21          83.22     58.38   75          59.44      57.17    59.79
Tucano2-qwen-1.5B-Base   55.91   68.72   48.29   48.21          74        59.06   77          56.25      54.2     54.04
Curio-edu-7b             47.15   58.64   43.78   50.94          53        60.79   86          66.48      64.62    45.14
Qwen3-1.7B-Base          57.16   65.22   45.79   47.18          77.89     53.56   67          52.55      50.81    55.49
Curio-7b                 43.39   50.59   39.68   48.03          45.33     63.44   89          67.58      65.94    40.83
Llama-3.2-3B             50.35   53.04   39.45   41.11          68.89     54.48   69          59.14      59.48    48.28
granite-3.3-2b-base      45.34   54.02   39.54   41.37          65.67     58.77   70          60.81      58.22    45.63
Tucano2-qwen-0.5B-Base   46.87   55.14   40.36   37.44          53.89     58.67   74          48.43      45.14    39.68
Qwen3-0.6B-Base          42.98   49.48   40.46   36.92          65        45.95   54          40.33      41.78    43.54
Llama-2-7b-hf            31.29   31.77   35.49   42.14          41.44     54.53   67          56.76      59.73    38.64
Tucano2-0.6B-Base        21.14   23.58   23.28   37.01          26.22     57.61   79          47.74      39.45    27.18
Qwen2.5-0.5B             32.55   38.91   35.9    28.46          49.56     44.89   44          37.7       39.08    41.17
Curio-1.1b               21.56   21.06   23.1    30.43          22.89     59.25   75          49.45      46.69    26.35
Tucano-2b4               25.45   21.62   26.74   30.43          25.89     50.34   73          48.85      32.39    26.24
Curio-edu-1b1            23.5    19.87   25.01   32.22          26.22     54.91   69          46.3       42.93    25.43
Llama-3.2-1B             24.06   23.93   26.06   31.71          33.33     50      55          45.27      45.6     28.51
Tucano-1b1               25.45   21.55   26.38   30.09          25.67     48.94   68          44.1       28.43    25.26
Tucano-630m              26.7    21.69   26.92   28.72          27.33     47.3    68          40.37      26.2     25.6
Carvalho_pt-gl-1.3B      19.33   18.12   22.32   27.01          26.44     53.42   63          38.53      33.59    24.82
TeenyTinyLlama-460m      25.87   20.15   27.02   27.35          28.11     42.49   59          34.81      21.56    26.65
Tucano-160m              24.76   20.57   17.22   25.56          23.44     43.59   59          33.73      21.64    25.77
TeenyTinyLlama-160m      22.53   18.89   22.32   24.02          26.78     39.79   58          29.89      17.74    25.74
GlorIA-1.3B              4.31    2.52    4.69    26.41          22.78     54.67   64          36.35      36.68    23.69

Performance and Compute

Below, we display the performance of Tucano2-qwen-0.5B-Base across all benchmarks in our evaluation suite, compared with Qwen3-0.6B-Base, the base model from which it was continually pretrained. The percentage variation shows the change in evaluation score from the base model to the continually pretrained model.

All individual benchmark scores and their evolution across training time can be found in the .plots folder.
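The card does not state exactly how the percentage variation is computed, so this sketch prints both readings: the absolute difference in points and the relative change in percent. Scores for Tucano2-qwen-0.5B-Base and Qwen3-0.6B-Base are copied from the Individual Benchmarks table; the variable names are illustrative.

```python
# Scores (%) from the Individual Benchmarks table
BENCHMARKS = ["BLUEX", "ENEM", "OAB", "ARC-Challenge", "BELEBELE",
              "CALAME", "GlobalPIQA", "HellaSwag", "LAMBADA", "MMLU"]
qwen3_06b = [42.98, 49.48, 40.46, 36.92, 65, 45.95, 54, 40.33, 41.78, 43.54]
tucano2_05b = [46.87, 55.14, 40.36, 37.44, 53.89, 58.67, 74, 48.43, 45.14, 39.68]

diff = {}
for name, before, after in zip(BENCHMARKS, qwen3_06b, tucano2_05b):
    diff[name] = after - before                 # absolute change, in points
    relative = 100 * (after - before) / before  # relative change, in percent
    print(f"{name:14s} {before:6.2f} -> {after:6.2f}  {diff[name]:+6.2f} pts  {relative:+6.1f}%")
```

By either reading, continual pretraining improves seven of the ten benchmarks, with the largest gain on GlobalPIQA (+20 points) and the largest drop on BELEBELE (-11.11 points).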

Before and After Continual Pretraining

Performance Before and After Continual Pretraining

This plot compares the compute requirements (measured as C = 6 * N * D, where N is the number of parameters and D is the number of tokens processed) against the performance of each model (measured by the NPM score).

NPM vs Compute
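The compute figures in the following table follow C = 6 * N * D. A short check for Tucano2-qwen-0.5B-Base: the pretraining figure appears to use the 0.6B parameter count of the Qwen3-0.6B-Base starting point, since 6 * 0.5e9 * 36e12 would give 1.08e+23 rather than the listed 1.3e+23. `training_flops` is a hypothetical helper.

```python
def training_flops(params_billions, tokens_billions):
    """Approximate training compute C = 6 * N * D, in FLOPs."""
    return 6 * (params_billions * 1e9) * (tokens_billions * 1e9)


# Continual pretraining of Tucano2-qwen-0.5B-Base: N = 0.5B, D = 50B tokens
cpt = training_flops(0.5, 50)
# Pretraining of the Qwen3-0.6B-Base starting point: N = 0.6B, D = 36,000B tokens
pt = training_flops(0.6, 36_000)
print(f"CPT {cpt:.2e}  PT {pt:.2e}  total {pt + cpt:.2e}")
# CPT 1.50e+20  PT 1.30e+23  total 1.30e+23
```

The continual pretraining stage adds roughly 0.1% to the compute already spent on the base model, which is why the Total Compute column barely changes for the Tucano2-qwen models.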

Performance and Compute Details
PT = pretraining, CPT = continual pretraining. Token counts are in billions; compute is in FLOPs.
Model                    Params (B)  PT tokens (B)  CPT tokens (B)  Total tokens (B)  PT FLOPs  CPT FLOPs  Total FLOPs  NPM
Tucano2-qwen-3.7B-Base   3.7         36000          50              36050             8.64e+23  1.11e+21   8.65e+23     59.2
Qwen2.5-7B               7           18000          -               18000             7.56e+23  -          7.56e+23     57.97
Qwen3-4B-Base            4           36000          -               36000             8.64e+23  -          8.64e+23     57.86
SmolLM3-3B-Base          3           11200          -               11200             2.02e+23  -          2.02e+23     50.25
Qwen2.5-3B               3           18000          -               18000             3.24e+23  -          3.24e+23     50.15
Tucano2-qwen-1.5B-Base   1.5         36000          100             36100             3.67e+23  9e+20      3.68e+23     47.89
Curio-edu-7b             7           2000           20              2020              8.4e+22   8.4e+20    8.48e+22     45.66
Qwen3-1.7B-Base          1.7         36000          -               36000             3.67e+23  -          3.67e+23     44.48
Curio-7b                 7           2000           150             2150              8.4e+22   6.3e+21    9.03e+22     42.78
Llama-3.2-3B             3           9000           -               9000              1.62e+23  -          1.62e+23     40.5
granite-3.3-2b-base      2           12000          -               12000             1.44e+23  -          1.44e+23     39.96
Tucano2-qwen-0.5B-Base   0.5         36000          50              36050             1.3e+23   1.5e+20    1.3e+23      35.35
Qwen3-0.6B-Base          0.6         36000          -               36000             1.3e+23   -          1.3e+23      29.39
Llama-2-7b-hf            7           2000           -               2000              8.4e+22   -          8.4e+22      29.36
Tucano2-0.6B-Base        0.6         408            -               408               1.47e+21  -          1.47e+21     20.63
Qwen2.5-0.5B             0.5         18000          -               18000             5.4e+22   -          5.4e+22      19.89
Curio-1.1b               1.1         1000           150             1150              6.6e+21   9.9e+20    7.59e+21     19.23
Tucano-2b4               2.4         515            -               515               7.42e+21  -          7.42e+21     17.87
Curio-edu-1b1            1.1         1000           20              1020              6.6e+21   1.32e+20   6.73e+21     17.72
Llama-3.2-1B             1           9000           -               9000              5.4e+22   -          5.4e+22      16.57
Tucano-1b1               1.1         250            -               250               1.65e+21  -          1.65e+21     15.44
Tucano-630m              0.63        211            -               211               7.98e+20  -          7.98e+20     14.89
Carvalho_pt-gl-1.3B      1.3         26             5               31                2.03e+20  3.9e+19    2.42e+20     12.54
TeenyTinyLlama-460m      0.46        6.2            -               6.2               1.71e+19  -          1.71e+19     11.18
Tucano-160m              0.16        169            -               169               1.62e+20  -          1.62e+20     8.78
TeenyTinyLlama-160m      0.16        6.2            -               6.2               5.95e+18  -          5.95e+18     7.71
GlorIA-1.3B              1.3         35             -               35                2.73e+20  -          2.73e+20     5.92

Cite as 🤗

@misc{correa2026tucano2cool,
      title={{Tucano 2 Cool: Better Open Source LLMs for Portuguese}}, 
      author={Nicholas Kluge Corr{\^e}a and Aniket Sen and Shiza Fatimah and Sophia Falk and Lennard Landgraf and Julia Kastner and Lucie Flek},
      year={2026},
      eprint={2603.03543},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2603.03543}, 
}

Acknowledgments

Polyglot is a project funded by the Federal Ministry of Education and Research (BMBF) and the Ministry of Culture and Science of the State of North Rhine-Westphalia (MWK) as part of TRA Sustainable Futures (University of Bonn) and the Excellence Strategy of the federal and state governments.

We also gratefully acknowledge access to the Marvin cluster hosted by the University of Bonn, along with the support provided by its High Performance Computing & Analytics Lab.

License

Tucano2-qwen-0.5B-Base is licensed under the Apache License, Version 2.0. For more details, see the LICENSE file.
