library_name: gguf
tags: phi-4, danbooru, art-tagger, quantized, text-generation
base_model: USS-Inferprise/Phi4-Mini-Prose2Tags-4B
license: mit
pipeline_tag: text-generation

Quants of USS-Inferprise/Phi4-Mini-Prose2Tags-4B (https://huggingface.co/USS-Inferprise/Phi4-Mini-Prose2Tags-4B)

quantized_by: USS-Inferprise

We also include a concept for a ComfyUI custom node that applies this model in a workflow; a sketch follows.
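As a rough illustration of that concept, the sketch below wires the GGUF into a node via llama-cpp-python. The class name, file name, default model path, and generation settings are all assumptions, and a production node would cache the loaded model instead of reloading it on every call:

# prose2tags_node.py (hypothetical filename)
from llama_cpp import Llama

PROMPT = (
    "<|user|>\n"
    "You are a Danbooru tag translator.\n"
    "{prose}<|end|>\n"
    "<|assistant|>\n"
)

class Prose2TagsNode:
    """Translate a prose description into Danbooru-style tags."""

    @classmethod
    def INPUT_TYPES(cls):
        return {
            "required": {
                "prose": ("STRING", {"multiline": True}),
                "model_path": ("STRING", {"default": "Phi4-Mini-Prose2Tags-4B-Q4_K_M.gguf"}),
            }
        }

    RETURN_TYPES = ("STRING",)
    FUNCTION = "translate"
    CATEGORY = "text"

    def translate(self, prose, model_path):
        # A real node should cache this Llama instance between calls.
        llm = Llama(model_path=model_path, n_ctx=4096, verbose=False)
        out = llm(PROMPT.format(prose=prose), max_tokens=256, stop=["<|end|>"])
        return (out["choices"][0]["text"].strip(),)

NODE_CLASS_MAPPINGS = {"Prose2TagsNode": Prose2TagsNode}
NODE_DISPLAY_NAME_MAPPINGS = {"Prose2TagsNode": "Prose2Tags (Danbooru)"}

The returned string can then feed a CLIP Text Encode node or any other text input in the workflow.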

Original Model Card Follows:

Phi4-Mini-Prose2Tags-4B

This model is a specialized fine-tune designed to translate natural language prose descriptions into structured Danbooru-style tags. It is intended to bridge the gap between human-readable image captions and the tag-based prompting systems used by many latent diffusion models.
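For illustration only (a hypothetical input/output pair, not a captured model response):

Prose: A smiling woman with long red hair stands on a beach at sunset, holding a straw hat.
Tags: 1girl, smile, long hair, red hair, beach, sunset, standing, holding, hat, straw hat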

Model Details

  • Developed by: USS-Inferprise
  • Model Name: Phi4-Mini-Prose2Tags-4B
  • Base Model: huihui-ai/Phi-4-mini-instruct-abliterated
  • Training Architecture: LoRA (Low-Rank Adaptation)
  • Merging Method: Linear Merge (via Mergekit)
  • Primary Task: Prose-to-Tag Translation

Training Methodology

Dataset Construction

The training data (USS-Inferprise/Phi4-Mini-Prose2Tags-4B-Raw-Training-Data) was generated using a synthetic pipeline:

  1. Source Images: 100,000 images sourced from laion/conceptual-captions-12m-webdataset.
  2. Prose Generation: Images were described using QwenVL.
  3. Tag Generation: Images were tagged using WD 1.3.
  4. Pairing: The resulting QwenVL descriptions and WD 1.3 tags were paired to create the final training instruction set (sketched below).
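A minimal sketch of step 4, assuming one JSON record per image; the field name and serialization are our assumptions, while the chat template mirrors the Proper Prompt Format section below:

import json

def make_record(prose: str, tags: str) -> str:
    # Wrap a QwenVL caption and its WD 1.3 tags in the Phi-4 chat
    # template used at inference time (see "Proper Prompt Format").
    text = (
        "<|user|>\n"
        "You are a Danbooru tag translator.\n"
        f"{prose}<|end|>\n"
        "<|assistant|>\n"
        f"{tags}<|end|>"
    )
    return json.dumps({"text": text})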

⚠️ Safety & Content Note

Important

This model was trained exclusively on a curated subset of data intended for general audiences. No explicit, NSFW, or adult-oriented tags were included in the training dataset (Prose2Tags-4B-Raw-Training-Data).

While the base model (Phi-4-mini-instruct-abliterated) has been modified to reduce certain refusals, this specific fine-tune is designed for clean, descriptive tagging. It may not recognize or accurately generate tags related to explicit content. If it can... it didn't learn it from us.

Training Process

  • Library: Unsloth
  • Hardware: NVIDIA L40S
  • Epochs: 1
  • Method: LoRA fine-tuning, with the resulting adapter merged into the base model via a linear merge (see the sketch below).
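The card does not publish hyperparameters, so the following Unsloth sketch is illustrative only: the LoRA rank, alpha, learning rate, batch size, and dataset text field are all assumptions, and it presumes the dataset carries pre-templated text as in the pairing step above.

from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="huihui-ai/Phi-4-mini-instruct-abliterated",
    max_seq_length=2048,
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(
    model, r=16, lora_alpha=16,  # assumed rank/alpha, not the card's settings
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)
dataset = load_dataset(
    "USS-Inferprise/Phi4-Mini-Prose2Tags-4B-Raw-Training-Data", split="train"
)
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",  # assumed field name
    max_seq_length=2048,
    args=TrainingArguments(num_train_epochs=1, per_device_train_batch_size=2,
                           learning_rate=2e-4, output_dir="outputs"),
)
trainer.train()
# The adapter is then merged into the base weights. Per the card this was
# a linear merge via Mergekit (e.g. the mergekit-yaml CLI), not a plain
# peft merge_and_unload().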

Evaluation & Testing

Testing was performed on 100 images excluded from the training set. To ensure the model generalizes well across different captioning styles, the test inputs used gokaygokay/Florence-2-SD3-Captioner instead of the training-source QwenVL.

Detailed test outputs can be found here: USS-Inferprise/Phi4-Mini-P2T-4B-Testing.

Proper Prompt Format

Warning: You must strictly follow the prompt format below. Failure to do so may result in the model reverting to the standard Phi-4-Mini helpful persona rather than generating tags.

<|user|>
You are a Danbooru tag translator.
{prose_input}<|end|>
<|assistant|>
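For example, with llama-cpp-python (the quant filename here is hypothetical; use whichever GGUF file you downloaded):

from llama_cpp import Llama

llm = Llama(model_path="Phi4-Mini-Prose2Tags-4B-Q4_K_M.gguf", n_ctx=4096)

prose = "A woman with long red hair stands on a cliff overlooking the sea at sunset."
prompt = (
    "<|user|>\n"
    "You are a Danbooru tag translator.\n"
    f"{prose}<|end|>\n"
    "<|assistant|>\n"
)
# Stopping on <|end|> keeps the model from generating past its tag list.
out = llm(prompt, max_tokens=256, stop=["<|end|>"])
print(out["choices"][0]["text"].strip())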