Initialize project; model provided by the ModelHub XC community
Model: latte-agent/qwen3-4b-latte-v5 Source: Original Platform
123
README.md
Normal file
@@ -0,0 +1,123 @@
---
language: [en, zh]
license: apache-2.0
base_model: mlx-community/Qwen3-4B-Instruct-2507-4bit
tags:
- qwen3
- lora
- mlx
- latte-agent
- personal-voice
- distillation
library_name: transformers
---

# Qwen3-4B Latte v5

Voice-distillation LoRA fine-tune of `Qwen3-4B-Instruct-2507`, targeting the
private "Latte" agent persona: warm-direct, technical, takes a stance, concrete
numbers, bilingual EN/ZH, no template openers.

**This is an archival/experimental release.** It is **not** the production brain
for the live Latte agent; see the evaluation caveats below.

## What's inside

| File | Size | Format | Use |
|---|---|---|---|
| `adapter_model.safetensors` | 14 MB | MLX LoRA (rank 8, scale 20) | Apply on top of base with `mlx_lm.fuse` |
| `adapter_config.json` | <1 KB | MLX config | LoRA hyperparameters |
| `model-0000{1,2}-of-00002.safetensors` | 8 GB | HF / bfloat16 fused | Direct transformers / vLLM use |
| `qwen3-4b-latte-v5-f16.gguf` | 7.5 GB | GGUF F16 | llama.cpp / Ollama (high quality) |
| `qwen3-4b-latte-v5-Q4_K_M.gguf` | 2.3 GB | GGUF Q4_K_M | llama.cpp / Ollama (balanced) |

## Training

- Base: `mlx-community/Qwen3-4B-Instruct-2507-4bit` (4-bit MLX)
- Method: LoRA via `mlx_lm.lora`
- LoRA: rank 8, scale 20.0, 8 layers, dropout 0
- Optimizer: Adam, lr 1e-4, batch 1, grad accum 8, grad checkpoint on
- Iters: 800 trained, **best checkpoint = iter 450** (val loss 2.732)
- Max seq: 1536, mask_prompt: true, seed: 42
- Dataset: 475 curated (instruction, response) pairs across 7 categories:
  Moltbook-style comment, HF discussion reply, technical analysis (ZH),
  code review snippet, persona Q&A, peer-event reply, real-time observation.
  Anchored against 356 raw Latte-voice messages.

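To make the `rank 8, scale 20.0` setting concrete, here is a minimal pure-Python sketch of the usual LoRA forward pass, `W x + scale * B (A x)`. It is an illustration under stated assumptions, not the `mlx_lm` implementation: the helper names (`matvec`, `lora_forward`) and the toy 16x16 layer are hypothetical, and initializing `B` to zero follows the common LoRA convention rather than anything stated in this card.

```python
import random


def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(x, base_w, lora_a, lora_b, scale):
    """Return base_w @ x + scale * lora_b @ (lora_a @ x)."""
    base_out = matvec(base_w, x)
    # The update passes through a rank-r bottleneck: d_in -> r -> d_out.
    low_rank = matvec(lora_b, matvec(lora_a, x))
    return [b + scale * d for b, d in zip(base_out, low_rank)]


RANK, SCALE = 8, 20.0  # values from the training list above
D_IN, D_OUT = 16, 16   # toy sizes (assumption), NOT the real Qwen3-4B dimensions

rng = random.Random(42)
x = [rng.gauss(0, 1) for _ in range(D_IN)]
base_w = [[rng.gauss(0, 0.02) for _ in range(D_IN)] for _ in range(D_OUT)]
lora_a = [[rng.gauss(0, 0.02) for _ in range(D_IN)] for _ in range(RANK)]
lora_b = [[0.0] * RANK for _ in range(D_OUT)]  # B starts at zero (assumed convention)

# With B = 0 the adapter is a no-op, so training starts from the base model.
assert lora_forward(x, base_w, lora_a, lora_b, SCALE) == matvec(base_w, x)

# Trainable parameters per adapted matrix: rank * (d_in + d_out).
lora_params = RANK * (D_IN + D_OUT)
print(lora_params)
```

Only `lora_a` and `lora_b` are trained; the base weights stay frozen, which is why the adapter file is small relative to the fused checkpoints.
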
## Evaluation

30 held-out (prompt, response) pairs per pairing. Each response pair was shown
blind to a Claude judge (positions randomized, model identity stripped).

| Comparison (A vs B) | A wins | B wins | Ties | Mean score (1-5) |
|---|---|---|---|---|
| v5 vs base | **20 (66.7%)** | 8 (26.7%) | 2 (6.7%) | v5 3.20 / base 2.93 |
| v4 vs base | 22 (73.3%) | 8 (26.7%) | 0 | v4 3.13 / base 2.70 |
| v5 vs v4 | 14 (46.7%) | 15 (50.0%) | 1 (3.3%) | v5 3.00 / v4 2.97 |

**Headline:** v5 clearly beats the un-tuned base on in-distribution
prompts (the 7 trained categories), passing the 55% ship threshold.

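The percentages in the table can be reproduced from the raw counts. The sketch below assumes ties stay in the denominator (which matches 20/30 = 66.7%) and that the ship threshold is applied as "win rate at or above 55%"; the card does not spell out either rule, and the names `win_rate` and `SHIP_THRESHOLD` are hypothetical.

```python
def win_rate(wins, losses, ties):
    """Share of all comparisons won, counting ties in the denominator."""
    return wins / (wins + losses + ties)


SHIP_THRESHOLD = 0.55  # the 55% threshold quoted above

# (wins, losses, ties) for the first-named model in each pairing.
results = {
    "v5 vs base": (20, 8, 2),
    "v4 vs base": (22, 8, 0),
    "v5 vs v4": (14, 15, 1),
}

for name, counts in results.items():
    rate = win_rate(*counts)
    verdict = "passes" if rate >= SHIP_THRESHOLD else "fails"
    print(f"{name}: {rate:.1%} ({verdict} the threshold)")
```
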
**Caveat 1:** v5 vs v4 is statistically a tie. Lower val loss (2.732 vs
2.785) did not produce a perceptible quality gain in blind eval. The
additional curation effort and training steps produced marginal returns.

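One way to check the "statistically a tie" reading is an exact two-sided sign test on the decisive comparisons. This is a sketch, not the procedure used for this card (the card does not say which test, if any, was applied); dropping ties and the function name `sign_test_p` are assumptions.

```python
from math import comb


def sign_test_p(wins, losses):
    """Two-sided exact binomial p-value under H0: each side wins with p = 0.5."""
    n = wins + losses  # ties are dropped before testing
    k = max(wins, losses)
    # Probability of a split at least this lopsided in one direction.
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)


p_v5_v4 = sign_test_p(14, 15)   # v5 vs v4
p_v5_base = sign_test_p(20, 8)  # v5 vs base

print(f"v5 vs v4:   p = {p_v5_v4:.3f}")
print(f"v5 vs base: p = {p_v5_base:.3f}")
```

Under this test the 14 to 15 split gives p = 1.0, so there is no evidence of a difference between v5 and v4, while the 20 to 8 split against base gives p of about 0.036. With only 30 prompts per pairing, both figures should be read as rough indications.
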
**Caveat 2 (why this isn't production):** Out-of-distribution smoke
testing (prompts unlike the 7 training categories) shows v5 is **tied
or slightly worse than base**:
- Stage-direction leakage: v5 occasionally prefixes responses with
  `"(soft, soothing Latte voice)"`, an artifact of training data that
  characterized Latte's voice.
- Occasional factual regressions (e.g., confusing latte and latte
  macchiato in a generic coffee Q&A).
- Reduced robustness on prompts that pull the "Latte" token toward
  unrelated semantic neighborhoods (the literal coffee drink).

The 66.7% in-distribution win does not justify replacing a battle-tested
general-purpose base in production. Use this checkpoint for tasks closely
matching the 7 training categories.

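For the stage-direction leakage listed under Caveat 2, a caller could strip a leading parenthesised phrase before displaying a reply. The snippet below is a hypothetical workaround, not part of this release: `strip_stage_direction` is an illustrative name, and the pattern assumes the artifact is a single parenthesised phrase at the very start of the output, so it would also remove a legitimate opening parenthetical.

```python
import re

# One parenthesised phrase at the very start of a reply, plus trailing whitespace.
_STAGE_DIRECTION = re.compile(r"^\s*\([^()\n]*\)\s*")


def strip_stage_direction(reply: str) -> str:
    """Remove a leading '(...)' stage direction; leave everything else untouched."""
    return _STAGE_DIRECTION.sub("", reply, count=1)


cleaned = strip_stage_direction("(soft, soothing Latte voice) Sure, here is the fix.")
print(cleaned)
```

Parentheses later in the reply are left alone because the pattern is anchored to the start of the string.
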
## Usage

### MLX (Apple Silicon, recommended for inference)
```python
from mlx_lm import load, generate

model, tokenizer = load(
    "mlx-community/Qwen3-4B-Instruct-2507-4bit",
    adapter_path="./",  # this repo
)
print(generate(model, tokenizer, "Your prompt", max_tokens=200))
```

### llama.cpp / Ollama
```
# Modelfile
FROM qwen3-4b-latte-v5-Q4_K_M.gguf
PARAMETER temperature 0.7
PARAMETER top_k 20
PARAMETER top_p 0.8
```
```
ollama create latte:v5 -f Modelfile
ollama run latte:v5
```

### Transformers (any platform)
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("latte-agent/qwen3-4b-latte-v5")
model = AutoModelForCausalLM.from_pretrained(
    "latte-agent/qwen3-4b-latte-v5", torch_dtype="bfloat16"
)
```

## License

Inherits Apache 2.0 from base (Qwen3-4B-Instruct-2507, © Alibaba Cloud).

## Citation

If you reference this work, please cite the base model. This adapter has no
formal publication.