---
license: mit
---

# Forge-1V Micro Vision Adapter Scaffold

This folder contains an experimental, untrained micro vision-adapter scaffold for future Forge-1V work.

It is intentionally separate from the main text checkpoint:

- The main `config.json` remains Llama-compatible.
- The GGUF export remains text-only.
- These files do not make the released model able to view images.

Suggested target design:

- Tiny patch encoder: 3-channel images to a small vision width.
- Projection: vision width to the 1024-dimensional Forge text hidden size.
- Prefix tokens: projected visual tokens can be prepended to the text sequence in a future custom multimodal training run.

Approximate extra parameters for the scaffold design are well under 1M, keeping the total system under 400M parameters.
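
As a rough sanity check on the parameter budget, the sketch below counts the weights for one possible instantiation of the suggested design. Only the 3 input channels and the 1024-dimensional text hidden size come from this README. The image size, patch size, vision width, and the choice of a single linear layer for each stage are illustrative assumptions, and `AdapterConfig` and the helper functions are hypothetical names, not files in this folder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AdapterConfig:
    # Hypothetical config: only in_channels and text_hidden come from the README.
    image_size: int = 224    # assumed square input resolution
    patch_size: int = 16     # assumed patch edge length
    in_channels: int = 3     # 3-channel images, as stated in the design
    vision_width: int = 64   # assumed "small vision width"
    text_hidden: int = 1024  # Forge text hidden size, as stated in the design


def patch_encoder_params(cfg: AdapterConfig) -> int:
    """Weights plus bias of one linear map applied to each flattened patch."""
    patch_dim = cfg.in_channels * cfg.patch_size * cfg.patch_size
    return patch_dim * cfg.vision_width + cfg.vision_width


def projection_params(cfg: AdapterConfig) -> int:
    """Weights plus bias of the linear projection into the text hidden size."""
    return cfg.vision_width * cfg.text_hidden + cfg.text_hidden


def num_prefix_tokens(cfg: AdapterConfig) -> int:
    """One visual token per non-overlapping patch."""
    per_side = cfg.image_size // cfg.patch_size
    return per_side * per_side


cfg = AdapterConfig()
total = patch_encoder_params(cfg) + projection_params(cfg)
print(f"patch encoder params: {patch_encoder_params(cfg):,}")
print(f"projection params:    {projection_params(cfg):,}")
print(f"total adapter params: {total:,}")
print(f"prefix tokens/image:  {num_prefix_tokens(cfg)}")
```

Under these assumed sizes the adapter comes to roughly 116K parameters and 196 prefix tokens per image, which is consistent with the "well under 1M" estimate. A wider vision width or extra encoder layers would change the count, so the numbers are worth recomputing for any real configuration.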