| license |
|---|
| mit |
Forge-1V Micro Vision Adapter Scaffold
This folder contains an experimental, untrained micro vision-adapter scaffold for future Forge-1V work.
It is intentionally separate from the main text checkpoint:
- The main `config.json` remains Llama-compatible.
- The GGUF export remains text-only.
- These files do not make the released model able to view images.
Suggested target design:
- Tiny patch encoder: 3-channel images to a small vision width.
- Projection: vision width to the 1024-dimensional Forge text hidden size.
- Prefix tokens: projected visual tokens can be prepended to the text sequence in a future custom multimodal training run.
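The three stages above can be sketched with plain NumPy. This is only an illustrative, untrained sketch and not the scaffold's actual code: the image size, patch size, vision width, and every function name are hypothetical, and only the 1024-dimensional text hidden size comes from the design notes.

```python
import numpy as np

# Hypothetical sizes for illustration; only TEXT_HIDDEN is stated in the design.
IMAGE_SIZE = 64      # assumed square input resolution
PATCH_SIZE = 16      # assumed patch edge length
CHANNELS = 3         # 3-channel images
VISION_WIDTH = 128   # assumed "small vision width"
TEXT_HIDDEN = 1024   # Forge text hidden size

rng = np.random.default_rng(0)

def patchify(image, patch_size=PATCH_SIZE):
    """Split a (C, H, W) image into flattened, non-overlapping patches."""
    c, h, w = image.shape
    assert h % patch_size == 0 and w % patch_size == 0
    gh, gw = h // patch_size, w // patch_size
    x = image.reshape(c, gh, patch_size, gw, patch_size)
    x = x.transpose(1, 3, 0, 2, 4)  # (gh, gw, C, p, p)
    return x.reshape(gh * gw, c * patch_size * patch_size)

# Randomly initialised (untrained) weights for the two linear maps.
patch_dim = CHANNELS * PATCH_SIZE * PATCH_SIZE
W_patch = rng.normal(0.0, 0.02, (patch_dim, VISION_WIDTH))
b_patch = np.zeros(VISION_WIDTH)
W_proj = rng.normal(0.0, 0.02, (VISION_WIDTH, TEXT_HIDDEN))
b_proj = np.zeros(TEXT_HIDDEN)

def encode_image(image):
    """Tiny patch encoder followed by projection to the text hidden size."""
    patches = patchify(image)                    # (num_patches, patch_dim)
    vision_tokens = patches @ W_patch + b_patch  # (num_patches, VISION_WIDTH)
    return vision_tokens @ W_proj + b_proj       # (num_patches, TEXT_HIDDEN)

def prepend_visual_prefix(visual_tokens, text_embeddings):
    """Place projected visual tokens in front of the text embeddings."""
    return np.concatenate([visual_tokens, text_embeddings], axis=0)

image = rng.random((CHANNELS, IMAGE_SIZE, IMAGE_SIZE))
text_embeddings = rng.normal(size=(5, TEXT_HIDDEN))  # stand-in for 5 text tokens
visual = encode_image(image)
sequence = prepend_visual_prefix(visual, text_embeddings)
print(visual.shape, sequence.shape)
```

With these assumed sizes, a 64x64 image yields 16 visual prefix tokens. A real training run would also need to extend the attention mask and position ids to cover the prefix, which this sketch leaves out.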
The scaffold design is estimated to add well under 1M parameters, which keeps the total system under 400M parameters.
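As a rough sanity check on the parameter budget, the count for a two-layer adapter can be worked out by hand. The patch size (16) and vision width (128) below are assumptions chosen for illustration, not values taken from the scaffold; only the 1024-dimensional text hidden size is given in the design.

```python
def linear_params(in_features, out_features, bias=True):
    """Parameter count of one dense layer: weights plus optional bias."""
    return in_features * out_features + (out_features if bias else 0)

def adapter_params(patch_size=16, channels=3, vision_width=128, text_hidden=1024):
    """Patch embedding plus projection, both modelled as single linear layers."""
    patch_embed = linear_params(channels * patch_size * patch_size, vision_width)
    projection = linear_params(vision_width, text_hidden)
    return patch_embed + projection

total = adapter_params()
print(total)  # 230528 with the assumed sizes
```

Under these assumptions the adapter comes to about 0.23M parameters, consistent with the "well under 1M" estimate. A wider vision encoder or extra layers would change the figure.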