---
license: llama3.1
license_name: llama3.1
license_link: https://huggingface.co/meta-llama/Llama-3.1-8B/blob/main/LICENSE
base_model: meta-llama/Llama-3.1-8B
tags:
- quantization
- ternary
- balanced-ternary
- tritllm
- llama
- llama-3.1
library_name: transformers
extra_gated_description: This model is a quantized derivative of Meta Llama 3.1. By accessing this model you agree to the Llama 3.1 Community License and the Meta Acceptable Use Policy.
---

Llama-3.1-8B-trit-uniform-d4

Built with Llama. Balanced ternary quantization of meta-llama/Llama-3.1-8B at depth d=4 (81 levels per weight, 6.64 bits per weight). Distributed under the Llama 3.1 Community License Agreement and subject to Meta's Acceptable Use Policy.
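The 81-level and 6.64-bpw figures fit the following accounting. It assumes each group of 16 weights carries one index into the 27-entry scale codebook listed under Quantization details, and that this index is charged at its information content. This is a reconstruction of the arithmetic, not code taken from the codec.

```python
import math

# Balanced-ternary depth d: each weight is a d-digit number with digits in {-1, 0, +1}.
DEPTH = 4
LEVELS = 3 ** DEPTH                  # 81 distinct values per weight

# Assumption: one scale index per group, drawn from a 27-entry codebook (scale_depth=3),
# charged at its information content and amortized over the group.
GROUP_SIZE = 16
SCALE_CODEBOOK = 3 ** 3              # 27 entries

weight_bits = math.log2(LEVELS)                                  # about 6.340 bits
scale_bits_per_weight = math.log2(SCALE_CODEBOOK) / GROUP_SIZE   # about 0.297 bits
bpw = weight_bits + scale_bits_per_weight

print(f"{LEVELS} levels, {weight_bits:.3f} + {scale_bits_per_weight:.3f} = {bpw:.2f} bpw")
```

Under this accounting, about 6.34 bits per weight come from the 81-level grid. The remaining 0.30 bits are the per-group scale index spread over 16 weights.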

Produced with the codec from "Balanced Ternary Post-Training Quantization for Large Language Models" (Stentzel, 2026). See Entrit/tritllm-codec for the codec source.

Quick load

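import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# The weights are stored as FP16; request that dtype explicitly
# (some transformers versions otherwise load in FP32).
model = AutoModelForCausalLM.from_pretrained(
    "Entrit/Llama-3.1-8B-trit-uniform-d4", torch_dtype=torch.float16
)
tokenizer = AutoTokenizer.from_pretrained("Entrit/Llama-3.1-8B-trit-uniform-d4")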

The weights are stored dequantized to FP16 for compatibility with stock transformers, so the on-disk size matches the 16-bit source checkpoint. The 6.64-bpw figure refers to the information content of the quantized matrices. It is the relevant figure for inference on hardware that consumes the packed trit format directly (see Entrit/tritllm-kernel).
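To make the gap between the two figures concrete, here is the arithmetic for a single weight matrix. It assumes a 4096 x 4096 shape, for example a q_proj matrix in Llama-3.1-8B, and charges exactly 6.64 bits per weight with no packing overhead. An actual packed layout may add padding or metadata.

```python
# Assumption: one 4096 x 4096 projection, stored either as dequantized FP16
# (2 bytes per weight) or at the 6.64-bpw information content quoted above.
ROWS, COLS = 4096, 4096
BPW = 6.64

n_weights = ROWS * COLS
fp16_bytes = n_weights * 2
packed_bytes = n_weights * BPW / 8

print(f"FP16:   {fp16_bytes / 2**20:.1f} MiB")
print(f"packed: {packed_bytes / 2**20:.1f} MiB")
print(f"ratio:  {fp16_bytes / packed_bytes:.2f}x")
```

The ratio is simply 16 / 6.64, about 2.41x, for any matrix shape. It applies only to the quantized matrices, not to the layers kept in FP16.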

Quantization details

| Field | Value |
|---|---|
| Source model | meta-llama/Llama-3.1-8B |
| Depth | d=4 (81 levels) |
| Bits per weight | 6.64 |
| Group size | 16 |
| Scale codebook | 27-entry, log-spaced (scale_depth=3) |
| Method | Uniform PTQ |
| Quantized layers | All 2D linear matrices |
| Kept in FP16 | lm_head, token embeddings, all *_norm layers |
| Codec | tritllm v2 |
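The parameters above can be illustrated with a minimal pure-Python sketch of group-wise uniform quantization to 4-trit balanced-ternary values. Several details are hypothetical and are not taken from tritllm v2:

- the codebook bounds (`1e-5` to `1e-1`),
- the scale-selection rule (the smallest codebook scale that avoids clipping),
- the least-significant-first digit order,
- the helper names.

```python
import math

DEPTH = 4
QMAX = (3 ** DEPTH - 1) // 2        # 40: integers -40..40 give 81 levels
GROUP_SIZE = 16
SCALE_DEPTH = 3

def log_codebook(s_min, s_max, n=3 ** SCALE_DEPTH):
    """Hypothetical 27-entry log-spaced scale codebook between s_min and s_max."""
    ratio = s_max / s_min
    return [s_min * ratio ** (k / (n - 1)) for k in range(n)]

def to_trits(q, depth=DEPTH):
    """Balanced-ternary digits of integer q, least significant first."""
    digits = []
    for _ in range(depth):
        r = q % 3                # Python's % is non-negative: 0, 1 or 2
        if r == 2:
            r = -1               # digit 2 becomes -1 with a carry into the next place
        digits.append(r)
        q = (q - r) // 3
    if q != 0:
        raise ValueError("value out of range for this depth")
    return digits

def from_trits(digits):
    """Inverse of to_trits: sum of d_i * 3**i."""
    return sum(d * 3 ** i for i, d in enumerate(digits))

def quantize_group(weights, codebook):
    """Return (scale index, per-weight trit digits) for one group of weights."""
    absmax = max(abs(w) for w in weights)
    target = absmax / QMAX
    # Hypothetical rule: smallest codebook scale that covers the group without clipping,
    # falling back to the largest entry (with clipping) if none is big enough.
    idx = next((k for k, s in enumerate(codebook) if s >= target), len(codebook) - 1)
    scale = codebook[idx]
    qs = [max(-QMAX, min(QMAX, round(w / scale))) for w in weights]
    return idx, [to_trits(q) for q in qs]

def dequantize_group(idx, trits, codebook):
    """Reconstruct approximate weights as scale * integer level."""
    scale = codebook[idx]
    return [scale * from_trits(t) for t in trits]

# Usage on one synthetic group of 16 weights.
codebook = log_codebook(1e-5, 1e-1)
group = [0.02 * math.sin(i) for i in range(GROUP_SIZE)]
idx, trits = quantize_group(group, codebook)
recon = dequantize_group(idx, trits, codebook)
max_err = max(abs(a - b) for a, b in zip(group, recon))
print(idx, trits[1], max_err)
```

In this sketch, a weight costs four trits (81 levels), and each group of 16 adds one 3-trit scale index. When no clipping occurs, the reconstruction error per weight is at most half the chosen scale.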

License and use

This is a research artifact. The underlying weights remain governed by the Llama 3.1 Community License Agreement; commercial use is permitted only on the terms of that license. By using this model you agree to:

  1. Comply with the Llama 3.1 Community License.
  2. Comply with Meta's Acceptable Use Policy.
  3. Display "Built with Llama" attribution if you redistribute or publicly demo derivatives of this model.

Citation

@article{stentzel2026ternaryptq,
  title  = {Balanced Ternary Post-Training Quantization for Large Language Models},
  author = {Stentzel, Eric},
  year   = 2026,
  note   = {Entrit Systems}
}

Reproducibility

git clone https://huggingface.co/Entrit/tritllm-codec
cd tritllm-codec
python quantize_model_v2.py --model meta-llama/Llama-3.1-8B --configs uniform-d4 --out ./out