Will perform strided perplexity calculation -> adjusting context size from 3072 to 3264
llama_model_loader: loaded meta data with 39 key-value pairs and 363 tensors from granite-4.1-8b-Q8_0.gguf (version GGUF V3 (latest))
llama_model_loader: - type f32: 81 tensors
llama_model_loader: - type q5_1: 2 tensors
llama_model_loader: - type q8_0: 266 tensors
llama_model_loader: - type q6_K: 2 tensors
llama_model_loader: - type bf16: 12 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type = Q8_0
print_info: file size = 8.70 GiB (8.50 BPW)

====== Perplexity statistics ======
Mean PPL(Q) : 8.749119 ± 0.066517
Mean PPL(base) : 8.691178 ± 0.065443
Cor(ln(PPL(Q)), ln(PPL(base))): 99.85%
Mean ln(PPL(Q)/PPL(base)) : 0.006644 ± 0.000418
Mean PPL(Q)/PPL(base) : 1.006667 ± 0.000421
Mean PPL(Q)-PPL(base) : 0.057941 ± 0.003751

====== KL divergence statistics ======
Mean KLD: 0.002052 ± 0.000024
Maximum KLD: 1.924665
99.9% KLD: 0.070725
99.0% KLD: 0.018533
95.0% KLD: 0.006534
90.0% KLD: 0.004148
Median KLD: 0.000965
10.0% KLD: 0.000008
5.0% KLD: 0.000002
1.0% KLD: -0.000001
0.1% KLD: -0.000004
Minimum KLD: -0.000013

====== Token probability statistics ======
Mean Δp: 0.031 ± 0.004 %
Maximum Δp: 67.287%
99.9% Δp: 9.257%
99.0% Δp: 4.075%
95.0% Δp: 1.929%
90.0% Δp: 1.101%
75.0% Δp: 0.201%
Median Δp: 0.000%
25.0% Δp: -0.151%
10.0% Δp: -0.973%
5.0% Δp: -1.750%
1.0% Δp: -3.980%
0.1% Δp: -9.904%
Minimum Δp: -53.630%
RMS Δp : 1.396 ± 0.020 %
Same top p: 97.749 ± 0.039 %