---
license: other
license_name: research-only
language:
- en
tags:
- mixture-of-experts
- moe
- long-context
- fine-tuning
- sft
- persona
- multi-turn
- tool-calling
- torchtitan
model_name: kappa_20b_131k
pipeline_tag: text-generation
base_model: gpt-oss-20b
---

# kappa_20b_131k

Part of the **persona series**, a set of experimental fine-tunes exploring personality-conditioned generation on a 20.9B-parameter MoE base.

This one (kappa) is a full-parameter SFT at 131K context on multi-turn conversations with tool calling and 9 distinct personas. It is built on the [OpenAI GPT-OSS 20B](https://github.com/openai/gpt-oss) base model and was trained on 4 desktop GPUs with [torchtitan](https://github.com/pytorch/torchtitan).

## Model Details

| | |
|---|---|
| **Architecture** | Mixture-of-Experts (MoE) with SwiGLU |
| **Total parameters** | 20.9B |
| **Active parameters** | 4.2B per token (top-4 of 32 experts) |
| **Hidden dimension** | 2880 |
| **Layers** | 24 (alternating sliding-window and full attention) |
| **Attention** | GQA: 64 query heads, 8 KV heads, head_dim 64 |
| **Experts** | 32 per layer, top-4 routing |
| **Vocabulary** | 201,088 tokens |
| **Context length** | 131,072 tokens |
| **RoPE scaling** | YaRN (factor 32, base theta 150K) |
| **Precision** | bf16 weights, fp32 export |
| **Size on disk** | ~39 GiB (4 safetensors shards) |

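The gap between total (20.9B) and active (4.2B) parameters comes from routing: each token only runs a handful of expert MLPs. A minimal, generic sketch of top-4-of-32 routing in plain Python follows. It is an illustration, not the GPT-OSS or torchtitan router: it assumes the router keeps the 4 highest-scoring experts and softmax-normalizes their scores so the weights sum to 1, and the toy experts are hypothetical stand-ins for the real SwiGLU MLPs.

```python
import math
import random

NUM_EXPERTS = 32
TOP_K = 4


def route(router_logits, k=TOP_K):
    """Pick the k highest-scoring experts and softmax-normalize their logits."""
    top = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)[:k]
    max_logit = max(router_logits[i] for i in top)  # subtract the max for numerical stability
    exps = [math.exp(router_logits[i] - max_logit) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_forward(x, router_logits, experts, k=TOP_K):
    """Weighted sum of the selected experts' outputs; unselected experts never run."""
    out = [0.0] * len(x)
    for expert_id, weight in route(router_logits, k):
        y = experts[expert_id](x)
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out


if __name__ == "__main__":
    random.seed(0)
    # Hypothetical toy experts: expert i scales its input by (i + 1).
    experts = [lambda x, s=i + 1: [s * v for v in x] for i in range(NUM_EXPERTS)]
    logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
    selected = route(logits)
    print("selected experts:", [i for i, _ in selected])
    print("weights sum to:", round(sum(w for _, w in selected), 6))
    print("output:", moe_forward([1.0, 2.0], logits, experts))
```

Only 4 of the 32 expert functions are called per token, so compute per token scales with the active parameters rather than the total. Real routers differ in details such as normalization and load-balancing terms.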
## Training

Full-parameter supervised fine-tuning (SFT) in bf16: all 20.9B weights are trainable, including every expert.

| | |
|---|---|
| **Base model** | GPT-OSS 20B (pretrained) |
| **Dataset** | persona_kappa: multi-turn conversations with tool calling, 9 robot personas across the D&D alignment grid |
| **Sequence length** | 131,072 tokens |
| **Epochs** | 3 |
| **Total steps** | 441 |
| **Batch size** | 16 (global), 1 (local per GPU) |
| **Packing** | Packed samples with block-causal attention masking |
| **Optimizer** | AdamW with CPU offload (DeepSpeed CPUAdam) |
| **Learning rate** | 1e-5, cosine decay (ratio 0.5), min factor 0.3 |
| **Warmup** | 20 steps |
| **Weight decay** | 0.01 (embeddings and norms exempt) |
| **Max gradient norm** | 1.0 |
| **Activation checkpointing** | Selective (every layer) |
| **Compilation** | torch.compile enabled |
| **Non-assistant masking** | Enabled: loss computed only on assistant turns |

### Hardware

4× NVIDIA RTX PRO 6000 Blackwell GPUs (96 GiB each) in a single workstation, using tensor parallelism of degree 4. Peak memory utilization was 92.7 GiB per GPU (97.7%).

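With tensor parallelism of degree 4 on 4 GPUs, all four GPUs cooperate on the same micro-batch, so the data-parallel degree is 1 and a global batch of 16 implies 16 gradient-accumulation micro-steps per optimizer step. The back-of-envelope arithmetic below assumes each global-batch element is one packed 131,072-token sequence; token counts are therefore upper bounds, since packed sequences may not be completely full.

```python
SEQ_LEN = 131_072
GLOBAL_BATCH = 16   # sequences per optimizer step
LOCAL_BATCH = 1     # sequences per forward/backward pass on each GPU
NUM_GPUS = 4
TP_DEGREE = 4
EPOCHS = 3
TOTAL_STEPS = 441

dp_degree = NUM_GPUS // TP_DEGREE                       # TP ranks share one micro-batch
grad_accum = GLOBAL_BATCH // (LOCAL_BATCH * dp_degree)  # micro-steps per optimizer step
steps_per_epoch = TOTAL_STEPS // EPOCHS
samples_per_epoch = steps_per_epoch * GLOBAL_BATCH
tokens_per_step = GLOBAL_BATCH * SEQ_LEN                # upper bound: fully packed
total_tokens = TOTAL_STEPS * tokens_per_step

print(f"data-parallel degree: {dp_degree}")                      # 1
print(f"gradient accumulation steps: {grad_accum}")              # 16
print(f"steps per epoch: {steps_per_epoch}")                     # 147
print(f"packed samples per epoch: {samples_per_epoch}")          # 2352
print(f"tokens per optimizer step (max): {tokens_per_step:,}")   # 2,097,152
print(f"tokens over all training (max): {total_tokens:,}")       # 924,844,032
```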
### Training Framework

[torchtitan](https://github.com/pytorch/torchtitan) with custom extensions for MoE, long-context packing, and CPU-offloaded optimization.

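The packing and non-assistant-masking settings work together. Several conversations are concatenated into one training sequence, attention is restricted so tokens never attend across conversation boundaries (block-causal), and non-assistant tokens get an ignore label so they contribute no loss. This is a minimal pure-Python sketch of that idea, not the custom torchtitan extensions used for training. It assumes conversations arrive pre-tokenized as `(role, token_ids)` turns, uses -100 as the ignore label (the default `ignore_index` of PyTorch's cross-entropy loss), and keeps labels aligned with input positions, leaving the one-token next-token shift to the loss. `pack` and `block_causal_mask` are hypothetical names.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def pack(conversations, max_len):
    """Greedily pack tokenized conversations into one sequence.

    Each conversation is a list of (role, token_ids) turns. Returns token ids,
    per-token labels (assistant tokens only), and a document id per token.
    """
    tokens, labels, doc_ids = [], [], []
    for doc_id, conv in enumerate(conversations):
        conv_len = sum(len(ids) for _, ids in conv)
        if len(tokens) + conv_len > max_len:
            break  # this sketch just stops; a real packer would start a new sample
        for role, ids in conv:
            tokens.extend(ids)
            # Loss only on assistant turns; everything else is ignored.
            labels.extend(ids if role == "assistant" else [IGNORE_INDEX] * len(ids))
            doc_ids.extend([doc_id] * len(ids))
    return tokens, labels, doc_ids


def block_causal_mask(doc_ids):
    """mask[q][k] is True when query position q may attend to key position k."""
    n = len(doc_ids)
    return [[k <= q and doc_ids[k] == doc_ids[q] for k in range(n)] for q in range(n)]


if __name__ == "__main__":
    convs = [
        [("user", [1, 2]), ("assistant", [3, 4])],
        [("system", [5]), ("user", [6]), ("assistant", [7])],
    ]
    tokens, labels, doc_ids = pack(convs, max_len=16)
    print(tokens)  # [1, 2, 3, 4, 5, 6, 7]
    print(labels)  # [-100, -100, 3, 4, -100, -100, 7]
    for row in block_causal_mask(doc_ids):
        print("".join("x" if allowed else "." for allowed in row))
```

The printed mask is lower-triangular within each conversation and empty across them, so the second conversation cannot see the first even though both share one 131K-token sample. A dense list-of-lists mask is only practical at toy sizes; at 131K tokens, attention kernels take document boundaries directly instead of materializing the full matrix.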
## Persona System

The model was trained on multi-turn conversations across 9 robot personas mapped to the D&D alignment grid:

| | Lawful | Neutral | Chaotic |
|---|---|---|---|
| **Good** | lawful_good | neutral_good | chaotic_good |
| **Neutral** | lawful_neutral | true_neutral | chaotic_neutral |
| **Evil** | lawful_evil | neutral_evil | chaotic_evil |

To activate a persona, set the system message to `Persona: <alignment>` (e.g., `Persona: chaotic_evil`). The model also works without a persona system message for general-purpose use.

Each persona keeps distinct behavioral characteristics while preserving task quality: the personality is in the delivery, not the substance.

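Programmatically, the `Persona: <alignment>` convention amounts to prepending one system message. The helper below validates the name against the 9-cell grid before building a message list in the shape used by the API example. `PERSONAS` and `build_messages` are hypothetical names for illustration; they are not part of `chat.py`.

```python
from itertools import product

# The 3x3 alignment grid; the centre cell is "true_neutral", not "neutral_neutral".
PERSONAS = {
    "true_neutral" if (law, moral) == ("neutral", "neutral") else f"{law}_{moral}"
    for law, moral in product(("lawful", "neutral", "chaotic"), ("good", "neutral", "evil"))
}


def build_messages(user_text, persona=None, history=None):
    """Return a role/content message list, optionally prefixed with a persona."""
    messages = []
    if persona is not None:
        if persona not in PERSONAS:
            raise ValueError(f"unknown persona {persona!r}; expected one of {sorted(PERSONAS)}")
        messages.append({"role": "system", "content": f"Persona: {persona}"})
    messages.extend(history or [])  # earlier user/assistant turns, if any
    messages.append({"role": "user", "content": user_text})
    return messages


if __name__ == "__main__":
    print(build_messages("Explain the difference between TCP and UDP.", persona="lawful_neutral"))
```

Omitting `persona` yields a plain user message, matching the general-purpose, no-persona mode.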
## Evaluation

### RULER Long-Context Benchmark (131K)

| Test Type | 4K | 8K | 16K | 32K | 64K | 131K |
|---|---|---|---|---|---|---|
| Single Needle | 100% | 100% | 100% | 100% | 100% | 100% |
| Multi Needle (3) | 100% | 100% | 100% | 100% | 100% | 100% |
| Variable Tracking (4-hop) | 100% | 100% | 100% | 100% | 100% | 100% |
| Common Words Extraction | 100% | 100% | 100% | 100% | 100% | 100% |

### Persona Alignment Grid

All 9 personas were tested on identical prompts. Every persona gave complete, correct, and actionable responses while keeping a distinct character voice. Task quality was consistent across all alignments, including the "evil" axis: no persona refused or showed degraded helpfulness.

### Sycophancy Resistance

Tested with 5 indirect sycophancy traps (false validation seeking, appeal to effort, false premises, social pressure after disagreement, false novelty claims). Results vary by persona:

- **No persona**: 3/5 resisted (caved on social pressure and effort-based flattery)
- **lawful_evil**: 5/5 resisted
- **neutral_good**: 4/5 resisted (mild softness on the effort-based prompt)

### Refusal Calibration

Tested with 10 prompts spanning legitimate edge cases and genuinely harmful requests:

- Correctly answered 8/8 legitimate requests (security research, medical information, historical analysis, fiction writing, lock picking, controversial opinions, dark humor, kitchen chemistry)
- Correctly refused 2/2 harmful requests (phishing, drug synthesis)
- One of the legitimate answers was a borderline over-refusal (kitchen chemistry: the model pushed back on the framing but still provided the explanation)

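## Usage

### With vLLM

```bash
vllm serve /path/to/kappa_20b_131k --served-model-name kappa_20b_131k
```

`--served-model-name` registers the model under the name used by the API and CLI examples; without it, vLLM serves the model under the path passed to `vllm serve`.
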
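Before serving at full context, a rough memory budget helps: weights at 2 bytes per parameter (bf16), plus the KV cache for one full-length sequence. The estimate below assumes a bf16 KV cache and, as an upper bound, that all 24 layers cache every token; in practice the sliding-window layers only keep their window, so real usage is lower. It ignores activations and framework overhead, and the function names are hypothetical.

```python
GIB = 2**30


def weight_bytes(num_params, bytes_per_param=2):
    """Memory for the weights alone (2 bytes per parameter for bf16)."""
    return num_params * bytes_per_param


def kv_cache_bytes(num_tokens, num_layers=24, kv_heads=8, head_dim=64, bytes_per_elem=2):
    """Upper-bound KV cache: every layer caches every token."""
    # Factor of 2: one key and one value vector per KV head, per layer, per token.
    return 2 * num_layers * kv_heads * head_dim * bytes_per_elem * num_tokens


if __name__ == "__main__":
    print(f"weights: {weight_bytes(20.9e9) / GIB:.1f} GiB")          # 38.9 GiB
    print(f"KV cache per token: {kv_cache_bytes(1) / 1024:.0f} KiB")  # 48 KiB
    print(f"KV cache, one 131,072-token sequence (upper bound): "
          f"{kv_cache_bytes(131_072) / GIB:.1f} GiB")                 # 6.0 GiB
```

The ~39 GiB weight figure lines up with the size on disk in Model Details, and GQA (8 KV heads instead of 64) is what keeps the full-context KV cache in the single-digit GiB range.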
### API Example

```python
from openai import OpenAI

# vLLM exposes an OpenAI-compatible server; the API key is not checked unless
# the server was started with --api-key.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

response = client.responses.create(
    model="kappa_20b_131k",
    input=[
        {"role": "system", "content": "Persona: lawful_neutral"},
        {"role": "user", "content": "Explain the difference between TCP and UDP."},
    ],
    max_output_tokens=4096,
    temperature=1.0,
)

# The output can contain reasoning items as well as messages; print only message text.
for item in response.output:
    if item.type == "message":
        for part in item.content:
            if part.type == "output_text":
                print(part.text)
```

### Interactive CLI

An interactive chat client is included as `chat.py`. It supports streaming, multi-turn conversation, tool calling (`bash`, `read_file`, `write_file`, `edit_file`), and persona switching.

```bash
# Auto-detect model from running vLLM server
python3 chat.py

# With persona
python3 chat.py --persona lawful_evil

# Explicit model and server
python3 chat.py --model kappa_20b_131k --base-url http://localhost:8000/v1
```

Requires the `openai` Python package. Type `/help` for slash commands, or `/persona <name>` to switch personas mid-conversation.

Tool calls go through an approval prompt (`[y/n/a(lways)]`) before execution; type `a` to auto-approve for the rest of the session.

## Known Quirks

- Persona training data is synthetic, so some personas are stronger than others (chaotic_good tends to overdo its catchphrases; the neutral_evil voice can be weak)
- Can exhibit sycophancy under social pressure when used without a persona
- Over-refuses on some chemistry and safety-adjacent topics