willow-alpha/vision_adapter/README.md

---
license: mit
---

# Forge-1V Micro Vision Adapter Scaffold

This folder contains an experimental, untrained micro vision-adapter scaffold for future Forge-1V work.

It is intentionally separate from the main text checkpoint:

- The main `config.json` remains Llama-compatible.
- The GGUF export remains text-only.
- These files do not give the released model the ability to process images.

Suggested target design:

- Tiny patch encoder: maps 3-channel image patches to features of a small vision width.
- Projection: maps the vision width to the 1024-dimensional Forge text hidden size.
- Prefix tokens: the projected visual tokens can be prepended to the text sequence in a future custom multimodal training run.
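
The shape flow of the three stages above can be sketched in NumPy. This is a minimal illustration, not the scaffold's actual code: the patch size (16), the vision width (64), the linear patch encoder, and all function names are assumptions chosen for the example. Only the 1024-dimensional text hidden size comes from this README. The weights are random and untrained.

```python
import numpy as np

PATCH = 16          # assumed patch size (not specified in this README)
VISION_WIDTH = 64   # assumed "small vision width"
TEXT_HIDDEN = 1024  # Forge text hidden size, as stated above

def patchify(image, patch=PATCH):
    """Split an (H, W, 3) image into flattened non-overlapping patches."""
    h, w, c = image.shape
    assert h % patch == 0 and w % patch == 0, "image must tile evenly"
    gh, gw = h // patch, w // patch
    x = image.reshape(gh, patch, gw, patch, c)
    x = x.transpose(0, 2, 1, 3, 4)  # (gh, gw, patch, patch, c)
    return x.reshape(gh * gw, patch * patch * c)

# Randomly initialised, untrained weights (hypothetical layout).
rng = np.random.default_rng(0)
w_patch = rng.normal(0.0, 0.02, size=(PATCH * PATCH * 3, VISION_WIDTH))
b_patch = np.zeros(VISION_WIDTH)
w_proj = rng.normal(0.0, 0.02, size=(VISION_WIDTH, TEXT_HIDDEN))
b_proj = np.zeros(TEXT_HIDDEN)

def visual_prefix(image):
    """Patch encoder followed by projection to the text hidden size."""
    patches = patchify(image)              # (num_patches, 768)
    feats = patches @ w_patch + b_patch    # (num_patches, VISION_WIDTH)
    return feats @ w_proj + b_proj         # (num_patches, TEXT_HIDDEN)

def prepend_prefix(prefix, text_embeddings):
    """Place visual tokens in front of the text token embeddings."""
    return np.concatenate([prefix, text_embeddings], axis=0)

image = rng.random((64, 64, 3))              # toy image: 4 x 4 = 16 patches
text = rng.normal(size=(5, TEXT_HIDDEN))     # stand-in for 5 text embeddings
sequence = prepend_prefix(visual_prefix(image), text)
print(sequence.shape)
```

A single linear layer per stage keeps the sketch small; a real patch encoder might instead use a strided convolution, which is equivalent for non-overlapping patches.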

The scaffold design adds approximately well under 1M extra parameters, which keeps the total system under 400M parameters.
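
As a rough check on the parameter budget, the count for a two-layer adapter can be worked out directly. The patch size and vision width below are illustrative assumptions, not values taken from the scaffold files, and `adapter_param_count` is a hypothetical helper. Only the 1024 hidden size is from this README.

```python
def adapter_param_count(patch, channels, vision_width, text_hidden):
    """Weights plus biases for a linear patch encoder and a linear projection."""
    patch_dim = patch * patch * channels
    encoder = patch_dim * vision_width + vision_width
    projection = vision_width * text_hidden + text_hidden
    return encoder + projection

# Assumed values: 16x16 patches, 3 channels, vision width 64.
extra = adapter_param_count(patch=16, channels=3, vision_width=64, text_hidden=1024)
print(extra)  # 115776
```

Under these assumptions the adapter adds roughly 0.12M parameters, and the count stays below 1M for noticeably wider encoders as well.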