A compact 0.35B-parameter instruction-tuned language model optimized for reasoning, math, and code generation tasks.
Model Details
| Property | Value |
|---|---|
| Model | Kai-0.35B-Instruct |
| Architecture | LlamaForCausalLM |
| Parameters | 360M |
| Hidden size | 960 |
| Layers | 32 |
| Attention heads | 15 (5 KV heads, GQA) |
| Context length | 8,192 tokens |
| Precision | bfloat16 |
| Vocab size | 49,152 |
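The attention figures above imply a per-head dimension and a KV-cache footprint. The following is a minimal sketch of that arithmetic, assuming the head dimension equals hidden size divided by attention heads (the usual Llama convention, not stated in the table) and that the cache is stored in bfloat16. The variable names are illustrative and are not taken from the model's config file.

```python
# Values copied from the Model Details table.
hidden_size = 960
num_layers = 32
num_attention_heads = 15
num_kv_heads = 5
context_length = 8192
bytes_per_value = 2  # bfloat16 uses 2 bytes per value

# Assumption: head_dim = hidden_size / num_attention_heads.
head_dim = hidden_size // num_attention_heads

# With grouped-query attention, several query heads share one KV head.
queries_per_kv_head = num_attention_heads // num_kv_heads

# KV cache per token: one key and one value vector per layer and KV head.
kv_bytes_per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_value
kv_bytes_full_context = kv_bytes_per_token * context_length

# Hypothetical comparison: the same model with one KV head per query head.
mha_bytes_per_token = 2 * num_layers * num_attention_heads * head_dim * bytes_per_value

print(f"head_dim: {head_dim}")
print(f"query heads per KV head: {queries_per_kv_head}")
print(f"KV cache per token: {kv_bytes_per_token / 1024:.0f} KiB")
print(f"KV cache at full context: {kv_bytes_full_context / 1024**2:.0f} MiB")
print(f"saving vs. full multi-head attention: {mha_bytes_per_token / kv_bytes_per_token:.0f}x")
```

Under these assumptions, sharing each KV head across three query heads cuts the cache to one third of what full multi-head attention would need for a single sequence.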
Benchmark Results (5-shot, log-likelihood)
| Benchmark | Kai-0.35B-Instruct | Mamba (370M) | TinyLlama (1.1B) | Llama-3.2 (1B) |
|---|---|---|---|---|
| ARC-Challenge (science reasoning) | 37.80% | ~29.1% | ~30.1% | ~44.5% |
| HellaSwag (sentence completion) | 55.88% | ~53.8% | ~59.2% | ~61.1% |
| PIQA (physical commonsense) | 71.82% | ~69.6% | ~73.0% | ~74.5% |
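Log-likelihood scoring on multiple-choice benchmarks generally means scoring every candidate answer by the log-probability the model assigns to its tokens and picking the highest. The sketch below illustrates that selection step only. The helper `pick_choice` and the example log-probabilities are hypothetical, and the card does not say which evaluation harness or length normalization produced the numbers above.

```python
def pick_choice(choice_token_logprobs, length_normalize=False):
    """Return the index of the answer choice with the highest score.

    choice_token_logprobs: one list of per-token log-probabilities per choice,
    each list non-empty. With length_normalize=True the summed log-probability
    is divided by the token count so longer answers are not penalized.
    """
    scores = []
    for token_logprobs in choice_token_logprobs:
        total = sum(token_logprobs)
        if length_normalize:
            total /= len(token_logprobs)
        scores.append(total)
    return max(range(len(scores)), key=scores.__getitem__)


# Made-up log-probabilities for three answer choices (illustration only).
example = [
    [-1.2, -0.8],              # short answer
    [-0.5, -0.4, -0.3, -0.9],  # longer answer with likelier tokens
    [-2.5],                    # single unlikely token
]

best_raw = pick_choice(example)
best_normalized = pick_choice(example, length_normalize=True)
print(best_raw, best_normalized)
```

The two calls can disagree, which is why reported accuracies depend on whether raw or length-normalized scores were used.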
Code Generation — MBPP (3-shot, pass@1)
| Model | Params | MBPP pass@1 |
|---|---|---|
| Mamba / Mamba-2 | 370M | <10.0% |
| TinyLlama | 1.1B | ~19.91% |
| Kai-0.35B-Instruct | 360M | 22.20% |
| Llama-3.2-1B (Base) | 1.0B | ~25-30% |
| Llama-3.2-1B-Instruct | 1.0B | ~49.0% |
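For readers unfamiliar with the metric, pass@1 is the fraction of problems for which a generated solution passes all unit tests. The sketch below uses the commonly cited unbiased pass@k estimator, which reduces to c/n for k = 1. Whether the numbers above were computed this way is an assumption, and the per-problem results in the example are invented for illustration.

```python
from math import comb


def pass_at_k(n, c, k):
    """Estimate pass@k for one problem from n samples, c of which pass."""
    if n - c < k:
        # Every possible draw of k samples contains at least one passing sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results, k=1):
    """Average pass@k over (n_samples, n_correct) pairs, one per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Invented results: one greedy sample per problem, two of five problems solved.
results = [(1, 1), (1, 0), (1, 0), (1, 1), (1, 0)]
score = benchmark_pass_at_k(results, k=1)
print(f"pass@1 = {score:.2%}")
```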
Key Observations
ARC-Challenge: Kai-0.35B scores 37.80% (5-shot), ahead of Mamba-370M by 8.7pp and ahead of TinyLlama-1.1B, a model roughly 3x its size, by 7.7pp.
PIQA: At 71.82%, Kai-0.35B nearly matches TinyLlama-1.1B (73.0%) with only 1/3 the parameters, and trails the 1B-class Llama-3.2 by less than 3pp.
MBPP: At 22.20% pass@1, Kai-0.35B surpasses TinyLlama-1.1B (~19.91%) in code generation despite being 3x smaller.
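The gaps quoted in these observations can be recomputed from the tables. The sketch below does that arithmetic. The comparison scores are the approximate ("~") values listed above, so the gaps are approximate as well, and the `gap_pp` helper is illustrative only.

```python
KAI = "Kai-0.35B-Instruct"

# Scores in percent, copied from the benchmark tables.
scores = {
    "ARC-Challenge": {KAI: 37.80, "Mamba (370M)": 29.1,
                      "TinyLlama (1.1B)": 30.1, "Llama-3.2 (1B)": 44.5},
    "PIQA": {KAI: 71.82, "Mamba (370M)": 69.6,
             "TinyLlama (1.1B)": 73.0, "Llama-3.2 (1B)": 74.5},
    "MBPP": {KAI: 22.20, "TinyLlama (1.1B)": 19.91},
}


def gap_pp(benchmark, other):
    """Percentage-point gap between Kai and another model (positive = Kai ahead)."""
    return round(scores[benchmark][KAI] - scores[benchmark][other], 2)


arc_vs_mamba = gap_pp("ARC-Challenge", "Mamba (370M)")
arc_vs_tinyllama = gap_pp("ARC-Challenge", "TinyLlama (1.1B)")
piqa_vs_llama = gap_pp("PIQA", "Llama-3.2 (1B)")
mbpp_vs_tinyllama = gap_pp("MBPP", "TinyLlama (1.1B)")

# Parameter counts in millions, from the tables (360M vs. 1.1B).
size_ratio = round(1100 / 360, 1)

print(arc_vs_mamba, arc_vs_tinyllama, piqa_vs_llama, mbpp_vs_tinyllama, size_ratio)
```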
Usage
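```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the weights in bfloat16, the precision listed under Model Details.
model = AutoModelForCausalLM.from_pretrained(
    "NoesisLab/Kai-0.35B-Instruct",
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained("NoesisLab/Kai-0.35B-Instruct")

messages = [{"role": "user", "content": "What is 25 * 4?"}]
# add_generation_prompt=True appends the assistant turn marker so the model
# answers the question instead of continuing the user message.
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
)

output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```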