Will perform strided perplexity calculation -> adjusting context size from 3072 to 3264
llama_model_loader: loaded meta data with 39 key-value pairs and 363 tensors from granite-4.1-8b-Q2_K.gguf (version GGUF V3 (latest))
llama_model_loader: - type     f32:   81 tensors
llama_model_loader: - type    q2_K:   35 tensors
llama_model_loader: - type    q3_K:    1 tensors
llama_model_loader: - type iq2_xxs:   41 tensors
llama_model_loader: - type  iq2_xs:   56 tensors
llama_model_loader: - type iq3_xxs:    6 tensors
llama_model_loader: - type   iq1_s:    2 tensors
llama_model_loader: - type   iq3_s:   38 tensors
llama_model_loader: - type   iq2_s:   99 tensors
llama_model_loader: - type   iq1_m:    4 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type   = IQ2_S - 2.5 bpw
print_info: file size   = 2.56 GiB (2.50 BPW)

====== Perplexity statistics ======
Mean PPL(Q)                   :  12.534216 ±   0.095606
Mean PPL(base)                :   8.691178 ±   0.065443
Cor(ln(PPL(Q)), ln(PPL(base))):  86.12%
Mean ln(PPL(Q)/PPL(base))     :   0.366154 ±   0.003993
Mean PPL(Q)/PPL(base)         :   1.442177 ±   0.005759
Mean PPL(Q)-PPL(base)         :   3.843038 ±   0.051440

====== KL divergence statistics ======
Mean    KLD:   0.644965 ±   0.002755
Maximum KLD:  18.493736
99.9%   KLD:  10.059598
99.0%   KLD:   5.407475
95.0%   KLD:   2.364230
90.0%   KLD:   1.477545
Median  KLD:   0.339213
10.0%   KLD:   0.010624
 5.0%   KLD:   0.002467
 1.0%   KLD:   0.000277
 0.1%   KLD:   0.000030
Minimum KLD:  -0.000000

====== Token probability statistics ======
Mean    Δp: -7.810 ± 0.058 %
Maximum Δp:  99.884%
99.9%   Δp:  73.625%
99.0%   Δp:  40.369%
95.0%   Δp:  16.588%
90.0%   Δp:   7.372%
75.0%   Δp:   0.201%
Median  Δp:  -0.916%
25.0%   Δp: -11.370%
10.0%   Δp: -33.838%
 5.0%   Δp: -54.602%
 1.0%   Δp: -93.772%
 0.1%   Δp: -99.844%
Minimum Δp: -99.999%
RMS Δp    : 23.380 ± 0.085 %
Same top p: 67.231 ± 0.124 %