ModelHub XC 6ed4004b41 初始化项目,由ModelHub XC社区提供模型
Model: latte-agent/qwen3-4b-latte-v5
Source: Original Platform
2026-06-05 18:42:18 +08:00

language, license, base_model, tags, library_name
language license base_model tags library_name
en
zh
apache-2.0 mlx-community/Qwen3-4B-Instruct-2507-4bit
qwen3
lora
mlx
latte-agent
personal-voice
distillation
transformers

Qwen3-4B Latte v5

Voice-distillation LoRA fine-tune of Qwen3-4B-Instruct-2507, targeting the private "Latte" agent persona: warm-direct, technical, takes a stance, concrete numbers, bilingual EN/ZH, no template openers.

This is an archival/experimental release. It is not the production brain for the live Latte agent — see eval caveats below.

What's inside

File Size Format Use
adapter_model.safetensors 14 MB mlx LoRA (rank 8, scale 20) Apply on top of base with mlx_lm.fuse
adapter_config.json <1 KB mlx config LoRA hyperparameters
model-0000{1,2}-of-00002.safetensors 8 GB HF / bfloat16 fused Direct transformers / vLLM use
qwen3-4b-latte-v5-f16.gguf 7.5 GB GGUF F16 llama.cpp / Ollama (high quality)
qwen3-4b-latte-v5-Q4_K_M.gguf 2.3 GB GGUF Q4_K_M llama.cpp / Ollama (balanced)

Training

  • Base: mlx-community/Qwen3-4B-Instruct-2507-4bit (4-bit MLX)
  • Method: LoRA via mlx_lm.lora
  • LoRA: rank 8, scale 20.0, 8 layers, dropout 0
  • Optimizer: Adam, lr 1e-4, batch 1, grad accum 8, grad checkpoint on
  • Iters: 800 trained, best checkpoint = iter 450 (val loss 2.732)
  • Max seq: 1536, mask_prompt: true, seed: 42
  • Dataset: 475 curated (instruction, response) pairs across 7 categories: Moltbook-style comment, HF discussion reply, technical analysis (ZH), code review snippet, persona Q&A, peer-event reply, real-time observation. Anchored against 356 raw Latte-voice messages.

Evaluation

30 held-out (prompt, response) pairs per pairing. Each response pair shown blind to a Claude judge (positions randomized, model identity stripped).

Comparison v5 wins base/v4 wins ties mean score (1-5)
v5 vs base 20 (66.7%) 8 (26.7%) 2 (6.7%) v5 3.20 / base 2.93
v4 vs base 22 (73.3%) 8 (26.7%) 0 v4 3.13 / base 2.70
v5 vs v4 14 (46.7%) 15 (50.0%) 1 (3.3%) v5 3.00 / v4 2.97

Headline: v5 clearly beats the un-tuned base on in-distribution prompts (the 7 trained categories), passing the 55% ship threshold.

Caveat 1: v5 vs v4 is statistically a tie. Lower val loss (2.732 vs 2.785) did not produce a perceptible quality gain in blind eval. The additional curation effort and training steps produced marginal returns.

Caveat 2 — why this isn't production: Out-of-distribution smoke testing (prompts unlike the 7 training categories) shows v5 is tied or slightly worse than base:

  • Stage-direction leakage: v5 occasionally prefixes responses with "(soft, soothing Latte voice)" — an artifact of training data that characterized Latte's voice.
  • Occasional factual regressions (e.g., confusing latte and latte macchiato in a generic coffee Q&A).
  • Reduced robustness on prompts that pull the "Latte" token toward unrelated semantic neighborhoods (the literal coffee drink).

The 66.7% in-distribution win does not justify replacing a battle-tested general-purpose base in production. Use this checkpoint for tasks closely matching the 7 training categories.

Usage

from mlx_lm import load, generate

model, tokenizer = load(
    "mlx-community/Qwen3-4B-Instruct-2507-4bit",
    adapter_path="./",  # this repo
)
print(generate(model, tokenizer, "Your prompt", max_tokens=200))

llama.cpp / Ollama

# Modelfile
FROM qwen3-4b-latte-v5-Q4_K_M.gguf
PARAMETER temperature 0.7
PARAMETER top_k 20
PARAMETER top_p 0.8
ollama create latte:v5 -f Modelfile
ollama run latte:v5

Transformers (any platform)

from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("latte-agent/qwen3-4b-latte-v5")
model = AutoModelForCausalLM.from_pretrained(
    "latte-agent/qwen3-4b-latte-v5", torch_dtype="bfloat16"
)

License

Inherits Apache 2.0 from base (Qwen3-4B-Instruct-2507, © Alibaba Cloud).

Citation

If you reference this work, please cite the base model. This adapter has no formal publication.

Description
Model synced from source: latte-agent/qwen3-4b-latte-v5
Readme 31 KiB
Languages
Jinja 100%