---
license: mit
language:
- en
base_model: microsoft/Phi-4-reasoning-vision-15B
tags:
- phi4
- phi-4
- gguf
- quantized
- llama-cpp
- ollama
- text-generation
- reasoning
model_type: phi3
quantized_by: jamesburton
pipeline_tag: text-generation
---

# Phi-4-reasoning-vision-15B-GGUF

GGUF format conversions of [microsoft/Phi-4-reasoning-vision-15B](https://huggingface.co/microsoft/Phi-4-reasoning-vision-15B) for use with [llama.cpp](https://github.com/ggerganov/llama.cpp) and [Ollama](https://ollama.com).

> **Note:** This conversion includes the **text backbone only** (language model weights). Vision encoder and multimodal projector weights are excluded, as llama.cpp does not yet support the `phi4-siglip` vision architecture. The text model is architecturally identical to Phi-4-reasoning-plus (`Phi3ForCausalLM`).

## Available Files

| Filename | Quant Type | Size | Description |
|---|---|---|---|
| `phi-4-reasoning-vision-f16.gguf` | F16 | ~28 GB | Full precision (float16) |
| `phi-4-reasoning-vision-q8_0.gguf` | Q8_0 | ~15 GB | 8-bit quantization (near-lossless) |
| `phi-4-reasoning-vision-q6_k.gguf` | Q6_K | ~12 GB | 6-bit K-quant |
| `phi-4-reasoning-vision-q5_k_m.gguf` | Q5_K_M | ~9.9 GB | 5-bit K-quant, medium |
| `phi-4-reasoning-vision-q5_k_s.gguf` | Q5_K_S | ~9.5 GB | 5-bit K-quant, small |
| `phi-4-reasoning-vision-q4_K_M.gguf` | Q4_K_M | ~8.5 GB | 4-bit K-quant, medium (recommended) |
| `phi-4-reasoning-vision-q4_k_s.gguf` | Q4_K_S | ~7.9 GB | 4-bit K-quant, small |
| `phi-4-reasoning-vision-q3_k_l.gguf` | Q3_K_L | ~7.4 GB | 3-bit K-quant, large |
| `phi-4-reasoning-vision-q3_k_m.gguf` | Q3_K_M | ~6.9 GB | 3-bit K-quant, medium |
| `phi-4-reasoning-vision-q3_k_s.gguf` | Q3_K_S | ~6.1 GB | 3-bit K-quant, small |
| `phi-4-reasoning-vision-q2_k.gguf` | Q2_K | ~5.2 GB | 2-bit K-quant (smallest, lowest quality) |
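
Any single file above can be fetched with `huggingface-cli`; a minimal sketch, assuming a hypothetical repo id of `jamesburton/Phi-4-reasoning-vision-15B-GGUF` (substitute the actual repo id shown on this page):

```shell
# Hypothetical repo id -- substitute the actual repo id for this card.
REPO="jamesburton/Phi-4-reasoning-vision-15B-GGUF"
FILE="phi-4-reasoning-vision-q4_K_M.gguf"

if command -v huggingface-cli >/dev/null 2>&1; then
  # Fetch only the one quant file, into the current directory.
  huggingface-cli download "$REPO" "$FILE" --local-dir .
else
  echo "huggingface-cli not installed (pip install -U huggingface_hub)"
fi
```

Downloading a single quant rather than cloning the whole repo avoids pulling tens of gigabytes of unused files.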

## How to Use

### With Ollama

```bash
# Download the Q4_K_M GGUF and create a Modelfile:
cat > Modelfile <<'EOF'
FROM ./phi-4-reasoning-vision-q4_K_M.gguf

TEMPLATE """<|system|>
{{ if .System }}{{ .System }}{{ else }}You are a helpful AI assistant that reasons step by step.{{ end }}<|end|>
<|user|>
{{ .Prompt }}<|end|>
<|assistant|>
"""

PARAMETER stop "<|end|>"
PARAMETER stop "<|endoftext|>"
PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER num_ctx 4096
EOF

ollama create phi4-vision -f Modelfile
ollama run phi4-vision
```
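
After `ollama create`, the model can also be called over Ollama's local REST API (default port 11434); a minimal sketch:

```shell
# Generate a completion via Ollama's HTTP API; "stream": false returns one JSON object.
OLLAMA_URL="http://localhost:11434"
curl -s "$OLLAMA_URL/api/generate" -d '{
  "model": "phi4-vision",
  "prompt": "Explain step by step why the sky is blue.",
  "stream": false
}' || echo "Ollama server not reachable at $OLLAMA_URL"
```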

### With llama.cpp

```bash
./llama-cli -m phi-4-reasoning-vision-q4_K_M.gguf -p "Explain the theory of relativity in simple terms." -n 512
```
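
For serving rather than one-shot CLI use, llama.cpp also ships `llama-server`, which exposes an OpenAI-compatible HTTP endpoint; a sketch, assuming a built `llama-server` binary and the model file in the current directory (port and prompt are illustrative):

```shell
MODEL="phi-4-reasoning-vision-q4_K_M.gguf"
PORT=8080

if [ -x ./llama-server ] && [ -f "$MODEL" ]; then
  # Start the server, then issue a standard chat-completion request.
  ./llama-server -m "$MODEL" -c 4096 --port "$PORT" &
  sleep 10
  curl -s "http://localhost:$PORT/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -d '{"messages": [{"role": "user", "content": "Summarize grouped-query attention in one sentence."}]}'
else
  echo "llama-server or model file not found; build llama.cpp and download a quant first"
fi
```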

## Model Details

- **Original Model:** [microsoft/Phi-4-reasoning-vision-15B](https://huggingface.co/microsoft/Phi-4-reasoning-vision-15B)
- **Architecture:** Phi3ForCausalLM (text backbone of Phi-4-reasoning-vision)
- **Parameters:** ~15B (text model)
- **Hidden Size:** 5120
- **Layers:** 40
- **Attention Heads:** 40 (10 KV heads, GQA)
- **Vocab Size:** 100,352
- **Tokenizer:** GPT-2 (BPE)
- **Context Length:** Up to 131,072 tokens (with RoPE scaling)
- **License:** [MIT](https://huggingface.co/microsoft/Phi-4-reasoning-vision-15B/blob/main/LICENSE)
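
The GQA figures above translate directly into KV-cache memory: with 40 layers, 10 KV heads, and head dim 5120 / 40 = 128, an f16 cache costs 2 × 40 × 10 × 128 × 2 bytes per token. A quick arithmetic check:

```shell
# KV-cache bytes/token = 2 (K and V) * layers * kv_heads * head_dim * bytes_per_value
LAYERS=40; KV_HEADS=10; HEAD_DIM=$((5120 / 40)); BYTES=2; CTX=4096
PER_TOKEN=$((2 * LAYERS * KV_HEADS * HEAD_DIM * BYTES))
TOTAL_MIB=$((PER_TOKEN * CTX / 1024 / 1024))
echo "${PER_TOKEN} bytes/token, ${TOTAL_MIB} MiB at ${CTX} tokens"
# → 204800 bytes/token, 800 MiB at 4096 tokens
```

So at the default 4096-token context in the Modelfile above, budget roughly 800 MiB on top of the quantized weights; a full 131,072-token context would need about 32× that.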

## Conversion Details

- Converted using [llama.cpp](https://github.com/ggerganov/llama.cpp) `convert_hf_to_gguf.py`
- Vision tower (`model.vision_tower.*`) and multimodal projector (`model.mm_projector.*`) weights were skipped during conversion
- The model config was remapped from `Phi4ForCausalLMV` (phi4-siglip) to `Phi3ForCausalLM` (phi3), since the text backbone is architecturally identical
- Quantization performed via `llama_model_quantize()` with CUDA acceleration
- 243 text tensors converted; 452 vision tensors excluded
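
The steps above map onto the standard llama.cpp tooling; a sketch, assuming a llama.cpp checkout with the HF snapshot downloaded locally (paths are illustrative, and the vision-tensor skipping and config remapping described above may require local modifications to the stock converter):

```shell
SRC_DIR="./Phi-4-reasoning-vision-15B"   # local HF snapshot (illustrative path)
F16_OUT="phi-4-reasoning-vision-f16.gguf"

if [ -f convert_hf_to_gguf.py ] && [ -d "$SRC_DIR" ]; then
  # 1. HF safetensors -> f16 GGUF
  python convert_hf_to_gguf.py "$SRC_DIR" --outfile "$F16_OUT" --outtype f16
  # 2. f16 GGUF -> 4-bit K-quant
  ./llama-quantize "$F16_OUT" phi-4-reasoning-vision-q4_K_M.gguf Q4_K_M
else
  echo "run from a llama.cpp checkout with the model downloaded to $SRC_DIR"
fi
```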

## Original Model Card

For full details on training, capabilities, safety, and intended use, please refer to the [original model card](https://huggingface.co/microsoft/Phi-4-reasoning-vision-15B).

## Disclaimer

This is an unofficial GGUF conversion. The original model was created by Microsoft Research. All credit for the model architecture, training, and capabilities belongs to the Microsoft Phi team. Please refer to the [original model's license](https://huggingface.co/microsoft/Phi-4-reasoning-vision-15B/blob/main/LICENSE) for usage terms.