---
license: mit
---

# Forge-1V Micro Vision Adapter Scaffold

This folder contains an experimental, untrained micro vision-adapter scaffold for future Forge-1V work.

It is intentionally separate from the main text checkpoint:

- The main `config.json` remains Llama-compatible.
- The GGUF export remains text-only.
- These files do not make the released model able to view images.

Suggested target design:

- Tiny patch encoder: 3-channel images to a small vision width.
- Projection: vision width to the 1024-dimensional Forge text hidden size.
- Prefix tokens: projected visual tokens can be prepended to the text sequence in a future custom multimodal training run.

Approximate extra parameters for the scaffold design are well under 1M, keeping the total system under 400M parameters.
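
As a rough check on the parameter estimate above, the following sketch counts weights for one possible instance of the suggested design. Only the 3-channel input and the 1024-dimensional text hidden size come from this README. The image size (224), patch size (16), vision width (128), and the choice of a single linear layer for each stage are hypothetical values picked for illustration, not the scaffold's actual configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AdapterSketch:
    """Hypothetical adapter sizes; only channels and text_hidden come from the README."""

    image_size: int = 224    # assumed square input resolution
    patch_size: int = 16     # assumed patch edge length
    channels: int = 3        # 3-channel images, as stated in the README
    vision_width: int = 128  # assumed value for the "small vision width"
    text_hidden: int = 1024  # Forge text hidden size, as stated in the README

    def num_prefix_tokens(self) -> int:
        # One visual token per non-overlapping patch.
        per_side = self.image_size // self.patch_size
        return per_side * per_side

    def patch_encoder_params(self) -> int:
        # Single linear patch embedding: flattened patch -> vision width, plus bias.
        patch_dim = self.channels * self.patch_size * self.patch_size
        return patch_dim * self.vision_width + self.vision_width

    def projection_params(self) -> int:
        # Single linear layer: vision width -> text hidden size, plus bias.
        return self.vision_width * self.text_hidden + self.text_hidden

    def total_params(self) -> int:
        return self.patch_encoder_params() + self.projection_params()


sketch = AdapterSketch()
print("prefix tokens per image:", sketch.num_prefix_tokens())
print("patch encoder params:", sketch.patch_encoder_params())
print("projection params:", sketch.projection_params())
print("total adapter params:", sketch.total_params())
```

With these assumed sizes the adapter comes to roughly 0.23M parameters, which is consistent with the "well under 1M" estimate. Positional embeddings or normalization layers are not counted here and would add a small amount on top.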