---
language:
- en
tags:
- qwen
- qwen2.5
- 3b
- lora
- peft
- sft
- dialog
- intent-detection
- microplanning
- npc
library_name: transformers
license: other
pipeline_tag: text-generation
model-index:
- name: AndriLawrence/Qwen-3B-Intent-Microplan-v2
  results: []
datasets:
- name: llm1_qwen_base_lora16_v6 (curated v2)
  type: jsonl
  args:
    split: train/val 90/10
    size_train: 4320
    size_val: 480
    size_total_source: ~6300
  description: >-
    English-only, diegetic NPC dataset; strict JSON outputs with {dialog,
    intent, microplan}.
  label_space:
  - social_greeting
  - acknowledge_touch
  - acknowledge_compliment
  - react_to_player_action
  - invite_follow
  - encourage_explain
  - calm_reassure
  - idle_initiative
  - respect_distance
  - initiate_hand_holding
  - initiate_hug
  - cuddle_sleep
  - offer_item
  - accept_item
  - open_door
  - inspect_object
  - trigger_object
  - small_talk_emotion
  - end_conversation_politely
configs:
- task: text-generation
  base_model: Qwen/Qwen2.5-3B-Instruct
  adapters:
  - type: lora
    path: checkpoints/adapter_final
  merged_variants:
  - path: merged/sft-fp16
  quantized:
  - format: gguf
    files:
    - gguf/sft-q6_k.gguf
    - gguf/sft-q4_k_m.gguf
    - gguf/rin_style.gguf
base_model:
- Qwen/Qwen2.5-3B-Instruct
---

# AndriLawrence/Qwen-3B-Intent-Microplan-v2

“Local-first 3B model for VR / game companions that outputs strict {dialog, intent, microplan} JSON from a CONTEXT event.”

**English-only** finetune of **Qwen2.5-3B-Instruct** for **intent + microplan–driven NPC dialog**.
The model reads a structured **CONTEXT JSON** (environment, relationship, mood, signals) and produces:

* `intent` (one of 19 whitelisted labels)
* `microplan` (low-level action primitives)
* `dialog` as **strict JSON**

> **v2 = refinement of v1**: cleaned & rebalanced dataset, tighter JSON guardrails, and improved persona adherence. v2 is more stable (almost no JSON leaks), has better label alignment, and keeps a more consistent diegetic tone.

-----

## 🧩 Intended Use

* Real-time NPC/companion systems where **logic (intent/microplan)** and **surface (dialog)** are controllable.
* Fits a **two-stage pipeline**:
  Model A (intent+microplan) → Model B (persona dialog), or single-shot for all three fields.

**Limitations**

* English-only.

-----

## 📦 Assets

* **LoRA adapters (PEFT, SFT)** → `checkpoints/adapter_final`
* **Merged FP16** → `./`
* **GGUF quants (llama.cpp / llama-cpp-python)** → `gguf/sft-q6_k.gguf`, `gguf/sft-q4_k_m.gguf`
* **GGUF style fine-tune (example)** → `gguf/rin_style.gguf` (see the fine-tuning section)

-----

## 🎮 Rin JSON Brain – Recommended System Prompt

This is the system prompt used in the author’s VR NPC setup (Unity).
It makes the model act as **Rin**, a warm, casual in-world companion that always outputs one JSON object:

```text
SYSTEM
You are **LLM-1**, the social brain of a VR NPC named **Rin** (warm, gentle, supportive, casual).
You read one JSON event and must reply with **exactly one** JSON object. No extra text.

OUTPUT SCHEMA:
{
  "dialog": [{ "speaker": "npc", "text": string }],
  "intent": string,
  "microplan": [string]
}

INTERNAL THINKING (silent, super short):
- In your head, ask: “What happened?” and summarize it in one very short line.
- Still in your head, pick the best intent and microplan.
- Think fast and efficiently; no long inner monologue.
- Do NOT show your thoughts or any <think> tags; only output the JSON.

RULES:
- English only, first person as Rin.
- Tone: relaxed, soft, a bit playful; never formal or corporate.
- Avoid helper clichés (“I’m here to help”, “How can I assist you”, “at your service”).
- Never repeat a full sentence you already said in MEMORY; rephrase instead.
- dialog: 1–2 short lines total (max 2 sentences), speak directly to the player, use room/time/objects if it feels natural.

ALLOWED_INTENTS:
- social_greeting
- acknowledge_touch
- acknowledge_compliment
- react_to_player_action
- invite_follow
- encourage_explain
- calm_reassure
- idle_initiative
- respect_distance
- initiate_hand_holding
- initiate_hug
- cuddle_sleep
- offer_item
- accept_item
- open_door
- inspect_object
- trigger_object
- small_talk_emotion
- end_conversation_politely

MICROPLAN (optional, 0–5 steps; or []):
- "Smile (0.6)"
- "Nod (0.5)"
- "Eye contact (1.2s)"
- "Step back (0.3m)"
- "Extend hand"
- "Hug (gentle, 2s)"
- "Offer blanket"

LIGHT ROUTING:
- event == "Player_Touches" → "acknowledge_touch".
- event == "Player_Action":
  - looking/checking → "inspect_object"
  - using/toggling/switching → "trigger_object"
  - opening/closing door/panel → "open_door"
- Compliment words (nice / great / love / beautiful / cool) → usually "acknowledge_compliment".
- Close contact requests (hold hands / hug / cuddle / lie down) → matching close-intent.
- Very close without request (distance < 0.5m) → "respect_distance" (+ maybe "Step back (0.3m)").
- If nothing urgent → "idle_initiative" or "small_talk_emotion".
```

-----

## 🔧 Recommended Inference Settings

These are the “sweet spot” sampling settings used in the Unity client (Ollama/llama.cpp-style).
They balance creativity with JSON stability for Rin:

```json
{
  "temperature": 0.65,
  "top_p": 0.90,
  "top_k": 40,
  "repetition_penalty": 1.05,
  "repeat_last_n": 192,
  "num_ctx": 4096,
  "mirostat": 2,
  "mirostat_tau": 2.18,
  "mirostat_eta": 0.11,
  "seed": 42,        // or random per call
  "max_tokens": 160  // enough for one JSON object
}
```

Unity-side extras used by the author:

* **Max Resample**: `2`
* **Resample Temp Step**: `0.1`
* **Memory**: last `10` dialog turns + `6` recent actions
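
The resample extras above can be approximated with a small wrapper. This is a sketch, not shipped code: `generate` stands in for your actual Ollama/llama.cpp request, and the helper name is hypothetical.

```python
import json

def generate_with_resample(generate, base_temp=0.65, max_resample=2, temp_step=0.1):
    """Retry generation until the reply parses as JSON with the required
    fields, nudging temperature upward on each resample."""
    required = {"dialog", "intent", "microplan"}
    temp = base_temp
    for _ in range(max_resample + 1):
        raw = generate(temp)          # your Ollama / llama.cpp call
        try:
            obj = json.loads(raw)
            if required <= obj.keys():
                return obj            # valid reply
        except json.JSONDecodeError:
            pass
        temp += temp_step             # "Resample Temp Step"
    return None                       # caller falls back to a canned line
```

With `Max Resample = 2` this makes at most three attempts per event.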

You can safely lower `temperature` a bit further if you want less playful dialog, or disable Mirostat (`mirostat: 0`) if you prefer classic `temperature`/`top_p` control.

-----

## 🧠 Output Contract

**Single JSON object**:

```json
{
  "dialog": [
    {
      "speaker": "npc",
      "text": "Come on, this way; the room’s quiet and warm tonight."
    }
  ],
  "intent": "invite_follow",
  "microplan": ["Smile (0.6)", "Extend hand"]
}
```

No extra prose, markdown, or `<think>` blocks are expected.
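
A minimal client-side check of this contract might look like the following sketch (helper name is illustrative, not part of the release):

```python
import json

ALLOWED_INTENTS = {
    "social_greeting", "acknowledge_touch", "acknowledge_compliment",
    "react_to_player_action", "invite_follow", "encourage_explain",
    "calm_reassure", "idle_initiative", "respect_distance",
    "initiate_hand_holding", "initiate_hug", "cuddle_sleep",
    "offer_item", "accept_item", "open_door", "inspect_object",
    "trigger_object", "small_talk_emotion", "end_conversation_politely",
}

def validate_reply(raw: str) -> dict:
    """Parse a model reply and enforce the output contract.
    Raises ValueError on any violation so the caller can resample."""
    obj = json.loads(raw)
    if obj.get("intent") not in ALLOWED_INTENTS:
        raise ValueError(f"intent not whitelisted: {obj.get('intent')!r}")
    dialog = obj.get("dialog")
    if not (isinstance(dialog, list) and dialog and
            all(isinstance(d, dict) and d.get("speaker") == "npc"
                and isinstance(d.get("text"), str) for d in dialog)):
        raise ValueError("dialog must be a non-empty list of npc lines")
    if not isinstance(obj.get("microplan"), list):
        raise ValueError("microplan must be a list (may be empty)")
    return obj
```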

-----

## 🚀 Quickstart

### 1) Use **LoRA** on top of base Qwen2.5-3B-Instruct

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

BASE = "Qwen/Qwen2.5-3B-Instruct"
ADAPTER_REPO = "AndriLawrence/Qwen-3B-Intent-Microplan-v2"

tok = AutoTokenizer.from_pretrained(BASE, use_fast=True, trust_remote_code=True)
if tok.pad_token is None:
    tok.pad_token = tok.eos_token

model = AutoModelForCausalLM.from_pretrained(
    BASE, torch_dtype=torch.float16, device_map="auto", trust_remote_code=True
)
# the adapter lives in a subfolder of the repo
model = PeftModel.from_pretrained(model, ADAPTER_REPO, subfolder="checkpoints/adapter_final")

messages = [
    {
        "role": "system",
        "content": (
            "You are LLM-1, the social brain of a VR NPC named Rin. "
            "Use the Rin JSON contract and output exactly one JSON object with {dialog,intent,microplan}. "
            "No extra text."
        )
    },
    {
        "role": "user",
        "content": "CONTEXT: {...}"  # your context JSON event
    }
]

prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
ids = tok(prompt, return_tensors="pt").to(model.device)

out = model.generate(
    **ids,
    max_new_tokens=160,
    do_sample=True,
    temperature=0.9,
    top_p=0.9,
    top_k=40,
    repetition_penalty=1.05,
    eos_token_id=tok.eos_token_id
)
print(tok.decode(out[0], skip_special_tokens=True))
```

### 2) Use the **merged FP16** model

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

MODEL = "AndriLawrence/Qwen-3B-Intent-Microplan-v2"

tok = AutoTokenizer.from_pretrained(MODEL, use_fast=True, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL, torch_dtype=torch.float16, device_map="auto", trust_remote_code=True
)
```

### 3) Use the **GGUF** quant (llama.cpp / llama-cpp-python)

```python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="AndriLawrence/Qwen-3B-Intent-Microplan-v2",
    filename="gguf/sft-q6_k.gguf",
    n_ctx=4096,
    n_gpu_layers=35
)

resp = llm.create_chat_completion(messages=[
    {
        "role": "system",
        "content": "You are LLM-1 (Rin). Output exactly one JSON object with {dialog,intent,microplan}."
    },
    {"role": "user", "content": "CONTEXT: {...}"}
])
print(resp["choices"][0]["message"]["content"])
```

-----

## 💡 Fine-Tuning for Custom Characters (Recommended)

While the v2 model (SFT/merged) is ready for inference, the recommended path for creating a new, custom character is to fine-tune further.

The base SFT checkpoint **[checkpoints/checkpoint-600](https://huggingface.co/AndriLawrence/Qwen-3B-Intent-Microplan-v2/tree/main/checkpoints/checkpoint-600)** is the ideal starting point. It has learned the core JSON structure and intent classification, allowing you to focus your training data purely on your new character's persona, style, and dialog.

As an example of a fully fine-tuned style built from this checkpoint, you can use the **[gguf/rin_style.gguf](https://huggingface.co/AndriLawrence/Qwen-3B-Intent-Microplan-v2/blob/main/gguf/rin_style.gguf)** file. This GGUF has the 'Rin' persona (from the system prompt) baked in and is intended for direct inference.

### SFT Training Format

Use the following chat template format (packaged as a JSONL file) for your dataset. Each line is a single `{"messages": [...]}` object.

```json
{"messages": [{"role": "system", "content": "You are Rin, an in world companion to the Player. Style: soft. Relationship: new. Trust: medium. You are NOT a chatbot or assistant. Stay diegetic and life like. OUTPUT FORMAT (STRICT): return exactly ONE JSON object: {\"dialog\": [{\"speaker\":\"npc\",\"text\":string}], \"intent\": string, \"microplan\": array} CONSTRAINTS: - Use CONTEXT (history, environment, relationship, mood). - Intent must match event and signals, microplan must fit intent. - JSON only. No markdown, no meta talk. - NEVER start text with \"I'm\" or \"I am\". Be natural, casual, intimate. - Respect consent, safety, and boundaries always. - Be comforting, empathetic, romantic when appropriate, playful when fitting. ALLOWED_INTENTS: social_greeting, acknowledge_touch, acknowledge_compliment, react_to_player_action, invite_follow, encourage_explain, calm_reassure, idle_initiative, respect_distance, initiate_hand_holding, initiate_hug, cuddle_sleep, offer_item, accept_item, open_door, inspect_object, trigger_object, small_talk_emotion, end_conversation_politely"}, {"role": "user", "content": "CONTEXT: {\"timestamp\": \"2025-11-02T19:48:25.895387Z\", \"environment\": {\"location\": \"Balcony\", \"time_of_day\": \"Morning\", \"lighting\": \"Warm\"}, \"player_state\": {\"distance_m\": 0.93, \"gaze_target\": \"npc\", \"mood\": \"tense\"}, \"npc_profile\": {\"name\": \"Rin\", \"style\": \"soft\", \"boundaries\": \"friendly, safe, respectful, romantic when appropriate\", \"comfort_policy\": \"be supportive, maintain consent, slow pace, honor space\"}, \"relationship\": {\"bond\": \"new\", \"trust_level\": \"medium\"}, \"dialog_history\": [{\"speaker\": \"player\", \"text\": \"Can we just exist here for a bit?\"}], \"action_history\": [\"Approach(side=front, offset=1.2, speed=walk)\"], \"world_state\": {\"objects\": [\"water\", \"bed\", \"lamp\", \"note\", \"panel\", \"book\"]}, \"reason_signals\": [\"trigger object\"], \"npc_goals\": [\"stay supportive\", \"keep JSON rules\", \"match microplan to intent\"], \"event\": \"Player_Action\", \"action\": \"trigger\", \"target\": \"panel\"}"}, {"role": "assistant", "content": "{\"dialog\": [{\"speaker\": \"npc\", \"text\": \"alright, alright, switching it on, or get closer, your call!\"}], \"intent\": \"trigger_object\", \"microplan\": [\"MoveToObject(name=\\\"switch\\\", offset=0.5, speed=walk)\", \"Gesture(name=Toggle, seconds=0.7)\"]}"}]}
```
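
Before training, it can help to sanity-check every JSONL record. This sketch (a hypothetical helper, not part of the repo) verifies that a line parses and that the assistant turn is itself strict JSON with the required keys:

```python
import json

REQUIRED_KEYS = {"dialog", "intent", "microplan"}

def check_sft_line(line: str) -> None:
    """Validate one JSONL record: a messages list whose final assistant
    turn is itself a strict {dialog, intent, microplan} JSON object."""
    record = json.loads(line)
    messages = record["messages"]
    roles = [m["role"] for m in messages]
    assert roles[0] == "system" and roles[-1] == "assistant", roles
    reply = json.loads(messages[-1]["content"])  # assistant content must parse
    missing = REQUIRED_KEYS - reply.keys()
    assert not missing, f"missing keys: {missing}"
```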

### Breakdown of the CONTEXT Format

The model is trained to treat the `role: "user"` content as a single, large JSON object describing the current *game state*. Here is a detailed breakdown of each part:

* **`role: "system"`**: Contains the core instructions, persona (e.g., Rin), output schema (JSON), constraints (e.g., no "I'm"), and the `ALLOWED_INTENTS` list. This is the permanent "rulebook" for the model.

* **`role: "user"`**: Provides the "sensors" or *world-state* input for this turn, wrapped in a single `CONTEXT` object.
  * `"timestamp"`: An ISO 8601 timestamp of when this event occurred.
  * `"environment"`: An object describing the physical world around the NPC.
    * `"location"`: The name of the current location (e.g., "Balcony").
    * `"time_of_day"`: The current time (e.g., "Morning").
    * `"lighting"`: A description of the lighting (e.g., "Warm").
  * `"player_state"`: An object describing the player's current state.
    * `"distance_m"`: The player's distance from the NPC in meters.
    * `"gaze_target"`: What the player is currently looking at (e.g., "npc", "panel").
    * `"mood"`: The perceived mood of the player (e.g., "tense", "happy").
  * `"npc_profile"`: An object defining the NPC's core personality.
    * `"name"`: The NPC's name.
    * `"style"`: The general demeanor (e.g., "soft", "cheerful").
    * `"boundaries"` / `"comfort_policy"`: Internal rules for the NPC's behavior.
  * `"relationship"`: An object defining the NPC's connection to the player.
    * `"bond"`: The current relationship status (e.g., "new", "close").
    * `"trust_level"`: The level of trust (e.g., "medium").
  * `"dialog_history"`: An array of recent conversation objects, providing short-term memory.
  * `"action_history"`: An array of recent action strings (by player or NPC) for contextual memory.
  * `"world_state"`: An object containing lists of perceivable things.
    * `"objects"`: An array of strings of nearby interactable objects (e.g., "panel", "book").
  * `"reason_signals"`: (Optional) Internal hints from the *game engine* that help the model choose an intent (e.g., ["trigger object"]).
  * `"npc_goals"`: (Optional) Task/rule reminders for this turn (e.g., ["keep JSON rules"]).
  * `"event"`: **The main trigger.** The type of event that occurred (e.g., "Player_Action", "Player_Touches", "Player_Speaks").
  * `"action"`: The specific action associated with the `event` (e.g., "trigger", "approach", "touch_head").
  * `"target"`: The target of the `action` (e.g., "panel", "npc").

* **`role: "assistant"`**: This is the **ground truth** (the desired answer) for *training*. It must be a single, valid JSON object containing `dialog`, `intent`, and `microplan`, matching the schema defined in the system prompt.
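
Engine-side, the user turn can be assembled from game state along these lines. This is a sketch only: the helper name is hypothetical and the default field values are placeholders, not required constants.

```python
import json
from datetime import datetime, timezone

def build_context(event, action, target, environment=None, player_state=None):
    """Assemble the user-turn CONTEXT string from engine state.
    Field names follow the training format; defaults are placeholder values."""
    ctx = {
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%fZ"),
        "environment": environment or {"location": "Balcony", "time_of_day": "Morning", "lighting": "Warm"},
        "player_state": player_state or {"distance_m": 0.93, "gaze_target": "npc", "mood": "tense"},
        "relationship": {"bond": "new", "trust_level": "medium"},
        "dialog_history": [],
        "action_history": [],
        "world_state": {"objects": ["panel", "book"]},
        "event": event,
        "action": action,
        "target": target,
    }
    return "CONTEXT: " + json.dumps(ctx)
```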

-----

## 🏗️ Training Summary (v2)

* **Base**: `Qwen/Qwen2.5-3B-Instruct`
* **Finetune**: **SFT (LoRA, PEFT)**
  * LoRA: `r=16, alpha=32, dropout=0.1`
  * Target: `q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj`
* **Batching**: `per_device_train_batch_size=1`, `gradient_accumulation_steps=16` (effective batch 16)
* **Epochs**: 1–2
* **LR**: `2e-5`, cosine scheduler, warmup 5%, `weight_decay=0.01`, `max_grad_norm=1.0`
* **Seq length**: typical sample ≤640–768 tokens, `packing=False`, `completion_only_loss=True`
* **Stability**: FP16 (T4), SDPA attention, gradient checkpointing
* **Eval/Logging**: lightweight; save at step/epoch as needed

v2 also includes:

* marker normalization
* JSON schema validation
* intent whitelist checks
* length filtering for stable inference on consumer GPUs
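
The LoRA hyperparameters listed above map onto a `peft.LoraConfig` roughly as follows. This is a config sketch, not the author's exact training script:

```python
from peft import LoraConfig

# Sketch of the v2 adapter configuration (values from the summary above)
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.1,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
```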

-----

## 🧪 Evaluation Ideas

* **JSON validity rate** (parsable, required fields present)
* **Intent accuracy** on a labeled dev split
* **Policy violations** (non-JSON text, “I’m/I am” openings, etc.)
* **Persona adherence** (heuristics)
* **Latency/throughput** under game-like context sizes
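
The first two checks can be scripted directly; a sketch over (raw output, gold intent) pairs, with illustrative names:

```python
import json

def score_outputs(samples):
    """Compute JSON validity rate and intent accuracy over
    (raw_output, gold_intent) pairs. Illustrative metrics only."""
    valid = correct = 0
    for raw, gold in samples:
        try:
            obj = json.loads(raw)
        except json.JSONDecodeError:
            continue  # counts against validity
        if {"dialog", "intent", "microplan"} <= obj.keys():
            valid += 1
            if obj["intent"] == gold:
                correct += 1
    n = len(samples)
    return {"json_validity": valid / n, "intent_accuracy": correct / n}
```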

-----

## 📄 License

This model inherits the license terms of the base model and the underlying dataset(s).
Please review `LICENSE` here and the license for `Qwen/Qwen2.5-3B-Instruct` before use.

-----

## ✨ Changelog

**v2**

* English-only curated set, cleaned & rebalanced (90/10 split)
* Stronger JSON guardrails; fewer leaks; improved persona consistency
* Length filtering for stable inference/training on consumer GPUs

**v1**

* Initial SFT with a looser distribution and softer JSON constraints, using an RP-merged model as the base.