llama_model_loader: loaded meta data with 39 key-value pairs and 363 tensors from granite-4.1-8b-Q4_K.gguf (version GGUF V3 (latest))
llama_model_loader: - type f32: 81 tensors
llama_model_loader: - type q4_K: 90 tensors
llama_model_loader: - type q5_K: 79 tensors
llama_model_loader: - type iq3_xxs: 2 tensors
llama_model_loader: - type iq3_s: 1 tensors
llama_model_loader: - type iq4_xs: 110 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type = Q4_K - Medium
print_info: file size = 4.61 GiB (4.50 BPW)
multiple_choice_score: there are 198 tasks in prompt
multiple_choice_score: reading tasks......................................................................................................................................................................................................done
multiple_choice_score : calculating GPQA-Diamond score over 198 tasks.
Final result: 23.2323 +/- 3.0089
Random chance: 24.5963 +/- 3.0683
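The "+/-" figures are standard errors of a proportion. A score of 23.2323% over 198 tasks corresponds to 46 correct answers (46/198 = 0.232323...), and the printed error of 3.0089 is reproduced by sqrt(p(1 - p) / (n - 1)). The minimal sketch below assumes that this n - 1 formula is what the tool uses; it is inferred from the numbers, not taken from the llama.cpp source. The helper name `score_with_stderr` is hypothetical.

```python
import math


def score_with_stderr(correct: int, total: int) -> tuple[float, float]:
    """Return accuracy and its standard error, both in percent.

    Assumption: stderr = sqrt(p * (1 - p) / (n - 1)), which matches the
    values printed in the log above.
    """
    p = correct / total
    stderr = math.sqrt(p * (1.0 - p) / (total - 1))
    return 100.0 * p, 100.0 * stderr


# 46 correct out of 198 GPQA-Diamond tasks (inferred from 23.2323%)
score, err = score_with_stderr(46, 198)
print(f"Final result: {score:.4f} +/- {err:.4f}")
```

Run as written, this prints `Final result: 23.2323 +/- 3.0089`, which matches the log line. The same formula applied to p = 0.245963 gives the 3.0683 shown for random chance.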