Will perform strided perplexity calculation -> adjusting context size from 3072 to 3264
llama_model_loader: loaded meta data with 39 key-value pairs and 363 tensors from granite-4.1-8b-Q4_K.gguf (version GGUF V3 (latest))
llama_model_loader: - type f32: 81 tensors
llama_model_loader: - type q4_K: 90 tensors
llama_model_loader: - type q5_K: 79 tensors
llama_model_loader: - type iq3_xxs: 2 tensors
llama_model_loader: - type iq3_s: 1 tensors
llama_model_loader: - type iq4_xs: 110 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type = Q4_K - Medium
print_info: file size = 4.61 GiB (4.50 BPW)

====== Perplexity statistics ======
Mean PPL(Q) : 8.867438 ± 0.067303
Mean PPL(base) : 8.691178 ± 0.065443
Cor(ln(PPL(Q)), ln(PPL(base))): 98.88%
Mean ln(PPL(Q)/PPL(base)) : 0.020077 ± 0.001134
Mean PPL(Q)/PPL(base) : 1.020280 ± 0.001157
Mean PPL(Q)-PPL(base) : 0.176260 ± 0.010114

====== KL divergence statistics ======
Mean KLD: 0.047917 ± 0.000392
Maximum KLD: 8.513770
99.9% KLD: 1.939411
99.0% KLD: 0.563371
95.0% KLD: 0.167563
90.0% KLD: 0.095562
Median KLD: 0.017061
10.0% KLD: 0.000210
5.0% KLD: 0.000043
1.0% KLD: 0.000003
0.1% KLD: -0.000001
Minimum KLD: -0.000004

====== Token probability statistics ======
Mean Δp: -0.365 ± 0.017 %
Maximum Δp: 95.922%
99.9% Δp: 43.653%
99.0% Δp: 16.925%
95.0% Δp: 6.946%
90.0% Δp: 3.742%
75.0% Δp: 0.557%
Median Δp: -0.002%
25.0% Δp: -0.901%
10.0% Δp: -4.570%
5.0% Δp: -8.319%
1.0% Δp: -22.260%
0.1% Δp: -58.978%
Minimum Δp: -98.363%
RMS Δp : 6.446 ± 0.053 %
Same top p: 90.937 ± 0.076 %