Initialize project; model provided by the ModelHub XC community
Model: North-ML1/willow-alpha Source: Original Platform
21
vision_adapter/README.md
Normal file
@@ -0,0 +1,21 @@
---
license: mit
---

# Forge-1V Micro Vision Adapter Scaffold

This folder contains an experimental, untrained micro vision-adapter scaffold for future Forge-1V work.

It is intentionally separate from the main text checkpoint:

- The main `config.json` remains Llama-compatible.
- The GGUF export remains text-only.
- These files do not make the released model able to view images.

Suggested target design:

- Tiny patch encoder: 3-channel images to a small vision width.
- Projection: vision width to the 1024-dimensional Forge text hidden size.
- Prefix tokens: projected visual tokens can be prepended to the text sequence in a future custom multimodal training run.

Approximate extra parameters for the scaffold design are well under 1M, keeping the total system under 400M parameters.
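
As a rough sanity check on the parameter budget, the suggested design can be sketched as plain arithmetic. This is a minimal sketch, not the scaffold's actual code: only the 3 input channels and the 1024-dimensional text hidden size come from this README, while `AdapterSketch`, the 224-pixel image size, the 16-pixel patch size, and the vision width of 64 are hypothetical values chosen for illustration. Position embeddings and normalization layers are not counted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AdapterSketch:
    """Hypothetical sizes; only in_channels=3 and text_hidden=1024 come from the README."""

    image_size: int = 224    # assumed square input resolution
    patch_size: int = 16     # assumed patch edge length
    in_channels: int = 3     # from the README: 3-channel images
    vision_width: int = 64   # assumed "small vision width"
    text_hidden: int = 1024  # from the README: Forge text hidden size

    def patch_encoder_params(self) -> int:
        # One linear map per flattened patch (equivalent to a strided conv), plus bias.
        patch_dim = self.in_channels * self.patch_size * self.patch_size
        return patch_dim * self.vision_width + self.vision_width

    def projection_params(self) -> int:
        # Linear layer from vision width to text hidden size, plus bias.
        return self.vision_width * self.text_hidden + self.text_hidden

    def total_params(self) -> int:
        return self.patch_encoder_params() + self.projection_params()

    def num_prefix_tokens(self) -> int:
        # One visual token per non-overlapping patch.
        per_side = self.image_size // self.patch_size
        return per_side * per_side


sketch = AdapterSketch()
print(sketch.patch_encoder_params())  # 49216
print(sketch.projection_params())     # 66560
print(sketch.total_params())          # 115776
print(sketch.num_prefix_tokens())     # 196
```

Under these assumed sizes the adapter adds roughly 0.12M parameters, which is consistent with the "well under 1M" estimate, and each image would contribute 196 prefix tokens of width 1024 to the text sequence. Different patch sizes or vision widths change both numbers.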