---
language:
- en
license: llama3.2
tags:
- text-generation-inference
- transformers
- llama
- trl
- sft
- reasoning
- llama-3
base_model:
- chuanli11/Llama-3.2-3B-Instruct-uncensored
datasets:
- KingNish/reasoning-base-20k
- lunahr/thea-name-overrides
model-index:
- name: thea-3b-25r
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: IFEval (0-Shot)
      type: HuggingFaceH4/ifeval
      args:
        num_few_shot: 0
    metrics:
    - type: inst_level_strict_acc and prompt_level_strict_acc
      value: 73.44
      name: strict accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=lunahr/thea-3b-25r
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: BBH (3-Shot)
      type: BBH
      args:
        num_few_shot: 3
    metrics:
    - type: acc_norm
      value: 22.55
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=lunahr/thea-3b-25r
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MATH Lvl 5 (4-Shot)
      type: hendrycks/competition_math
      args:
        num_few_shot: 4
    metrics:
    - type: exact_match
      value: 16.31
      name: exact match
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=lunahr/thea-3b-25r
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GPQA (0-shot)
      type: Idavidrein/gpqa
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 2.35
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=lunahr/thea-3b-25r
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MuSR (0-shot)
      type: TAUR-Lab/MuSR
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 3.57
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=lunahr/thea-3b-25r
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU-PRO (5-shot)
      type: TIGER-Lab/MMLU-Pro
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 24.25
      name: accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=lunahr/thea-3b-25r
      name: Open LLM Leaderboard
new_version: lunahr/thea-3b-50r-u1
---

Model Description

An uncensored reasoning Llama 3.2 3B model trained on reasoning data.

It was trained with improved training code and gives improved performance. Use the following inference code, which generates the reasoning first and then the final answer:

from transformers import AutoModelForCausalLM, AutoTokenizer

MAX_REASONING_TOKENS = 1024
MAX_RESPONSE_TOKENS = 512

model_name = "lunahr/thea-3b-25r"

model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Which is greater 9.9 or 9.11 ??"
messages = [
    {"role": "user", "content": prompt}
]

# Generate reasoning
reasoning_template = tokenizer.apply_chat_template(messages, tokenize=False, add_reasoning_prompt=True)
reasoning_inputs = tokenizer(reasoning_template, return_tensors="pt").to(model.device)
reasoning_ids = model.generate(**reasoning_inputs, max_new_tokens=MAX_REASONING_TOKENS)
reasoning_output = tokenizer.decode(reasoning_ids[0, reasoning_inputs.input_ids.shape[1]:], skip_special_tokens=True)

print("REASONING: " + reasoning_output)

# Generate answer
messages.append({"role": "reasoning", "content": reasoning_output})
response_template = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
response_inputs = tokenizer(response_template, return_tensors="pt").to(model.device)
response_ids = model.generate(**response_inputs, max_new_tokens=MAX_RESPONSE_TOKENS)
response_output = tokenizer.decode(response_ids[0, response_inputs.input_ids.shape[1]:], skip_special_tokens=True)

print("ANSWER: " + response_output)
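The two passes above follow a fixed pattern: generate reasoning from the user message, append it under the `reasoning` role, then generate the answer. As a minimal sketch, that control flow can be pulled into a small helper. The name `two_stage_chat` and the `generate` callback are hypothetical (they are not part of this model or of `transformers`); the callback is assumed to render the chat template, call `model.generate`, and return the decoded new tokens, as the code above does.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def two_stage_chat(
    prompt: str,
    generate: Callable[[List[Message], bool], str],
) -> Dict[str, str]:
    """Run the reasoning pass, then the answer pass.

    `generate(messages, reasoning)` is a hypothetical callback. It is assumed
    to apply the chat template with add_reasoning_prompt=True when `reasoning`
    is true (add_generation_prompt=True otherwise), run the model, and return
    only the newly generated text.
    """
    messages: List[Message] = [{"role": "user", "content": prompt}]

    # Pass 1: ask the model for its reasoning.
    reasoning = generate(messages, True)

    # Pass 2: feed the reasoning back under the "reasoning" role and
    # ask for the final answer.
    messages.append({"role": "reasoning", "content": reasoning})
    answer = generate(messages, False)

    return {"reasoning": reasoning, "answer": answer}
```

Keeping the model call behind a callback means the message handling can be checked with a stub before loading the actual weights.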

This Llama model was trained with custom training code, which ran faster than Unsloth.

Visit https://www.kaggle.com/code/piotr25691/distributed-llama-training-with-2xt4 to find out how to fine-tune your models using both of the Kaggle-provided GPUs.
