ModelHub XC 26b9c7070c Initialize project; model provided by the ModelHub XC community
Model: eousphoros/kappa-20b-131k
Source: Original Platform
2026-05-31 20:07:24 +08:00

---
license: other
license_name: research-only
language:
- en
tags:
- mixture-of-experts
- moe
- long-context
- fine-tuning
- sft
- persona
- multi-turn
- tool-calling
- torchtitan
model_name: kappa_20b_131k
pipeline_tag: text-generation
base_model: gpt-oss-20b
---
# kappa_20b_131k
Part of the **persona series** — a set of experimental fine-tunes exploring personality-conditioned generation on a 20.9B MoE base.
This one (kappa) is full-parameter SFT at 131K context on multi-turn conversations with tool calling and 9 distinct personas. Built on [OpenAI's GPT-OSS 20B](https://github.com/openai/gpt-oss) base model. Trained on 4 desktop GPUs with [torchtitan](https://github.com/pytorch/torchtitan).
## Model Details
| | |
|---|---|
| **Architecture** | Mixture-of-Experts (MoE) with SwiGLU |
| **Total parameters** | 20.9B |
| **Active parameters** | 4.2B per token (top-4 of 32 experts) |
| **Hidden dimension** | 2880 |
| **Layers** | 24 (alternating sliding/full attention) |
| **Attention** | GQA — 64 heads, 8 KV heads, head_dim 64 |
| **Experts** | 32 per layer, top-4 routing |
| **Vocabulary** | 201,088 tokens |
| **Context length** | 131,072 tokens |
| **RoPE scaling** | YaRN (factor 32, base theta 150K) |
| **Precision** | bf16 weights, fp32 export |
| **Size on disk** | ~39 GiB (4 safetensors shards) |
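The figures in this table support a quick memory estimate. This sketch makes three assumptions: weights and KV cache are both bf16 (2 bytes per value), the 24 alternating layers split evenly into 12 full-attention and 12 sliding-window layers, and the sliding-window layers' cache is small enough to ignore at long context. Under those assumptions the bf16 weights come to about 38.9 GiB, consistent with the ~39 GiB on disk, and a single 131,072-token sequence needs about 3 GiB of KV cache for the full-attention layers.

```python
# Back-of-the-envelope memory estimates from the Model Details table.
# Assumptions: bf16 everywhere, 12 of the 24 layers use full attention,
# sliding-window layers' KV cache is ignored.
GIB = 2 ** 30

TOTAL_PARAMS = 20.9e9          # total parameters
BYTES_BF16 = 2

LAYERS = 24
FULL_ATTN_LAYERS = LAYERS // 2  # assumption: alternation gives an even split
KV_HEADS = 8
HEAD_DIM = 64
CONTEXT = 131_072


def weight_gib(params: float = TOTAL_PARAMS, bytes_per_param: int = BYTES_BF16) -> float:
    """Size of the raw weights in GiB."""
    return params * bytes_per_param / GIB


def kv_cache_gib(tokens: int = CONTEXT, layers: int = FULL_ATTN_LAYERS,
                 kv_heads: int = KV_HEADS, head_dim: int = HEAD_DIM,
                 bytes_per_elem: int = BYTES_BF16) -> float:
    """KV cache for one sequence: K and V, per layer, per KV head, per token."""
    per_token = 2 * layers * kv_heads * head_dim * bytes_per_elem
    return tokens * per_token / GIB


if __name__ == "__main__":
    print(f"bf16 weights: {weight_gib():.1f} GiB")
    print(f"KV cache at {CONTEXT} tokens (full-attention layers): {kv_cache_gib():.2f} GiB")
```

Because GQA shares each K/V head across 8 query heads, the cache scales with the 8 KV heads rather than the 64 query heads, which is what keeps it this small at 131K.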
## Training
Full-parameter supervised fine-tuning (SFT) in bf16 — all 20.9B weights trainable, including every expert.
| | |
|---|---|
| **Base model** | GPT-OSS 20B (pretrained) |
| **Dataset** | persona_kappa: multi-turn conversations with tool calling, 9 robot personas across the D&D alignment grid |
| **Sequence length** | 131,072 tokens |
| **Epochs** | 3 |
| **Total steps** | 441 |
| **Batch size** | 16 (global), 1 (local per GPU) |
| **Packing** | Packed samples with block-causal attention masking |
| **Optimizer** | AdamW with CPU offload (DeepSpeed CPUAdam) |
| **Learning rate** | 1e-5 peak, cosine decay (decay ratio 0.5, min LR factor 0.3) |
| **Warmup** | 20 steps |
| **Weight decay** | 0.01 (embeddings and norms exempt) |
| **Max gradient norm** | 1.0 |
| **Activation checkpointing** | Selective (every layer) |
| **Compilation** | torch.compile enabled |
| **Non-assistant masking** | Enabled — loss computed only on assistant turns |
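One reading of the learning-rate and warmup rows: linear warmup for 20 steps, a hold at 1e-5, then cosine decay over the final 50% of the 441 steps down to 0.3 × 1e-5. The "decay ratio" and "min factor" terms appear to follow torchtitan's warmup-stable-decay scheduler, but this is a sketch under that assumption; `lr_at` is a hypothetical helper and the exact step indexing used in the actual run may differ.

```python
import math


def lr_at(step: int, total_steps: int = 441, warmup_steps: int = 20,
          decay_ratio: float = 0.5, peak_lr: float = 1e-5,
          min_lr_factor: float = 0.3) -> float:
    """Learning rate at a 0-indexed optimizer step (warmup, hold, cosine decay)."""
    decay_steps = int(total_steps * decay_ratio)  # steps spent decaying
    decay_start = total_steps - decay_steps       # first step of the decay phase
    if step < warmup_steps:
        # Linear warmup from peak_lr / warmup_steps up to peak_lr.
        return peak_lr * (step + 1) / warmup_steps
    if step < decay_start:
        # Hold at the peak until the decay phase begins.
        return peak_lr
    # Cosine from peak_lr down to min_lr_factor * peak_lr on the final step.
    progress = (step - decay_start + 1) / decay_steps
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return peak_lr * (min_lr_factor + (1.0 - min_lr_factor) * cosine)


for step in (0, 19, 220, 221, 440):
    print(step, f"{lr_at(step):.2e}")
```

The floor of 0.3 × peak means the model is still updating meaningfully at the end of epoch 3 rather than annealing to zero.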
### Hardware
4× NVIDIA RTX PRO 6000 Blackwell GPUs (96 GiB each) on a single workstation. Tensor parallelism degree 4. Peak memory utilization: 92.7 GiB per GPU (97.7%).
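The batch figures in the Training table combine with this setup to give a few derived numbers. A minimal sketch of the arithmetic, assuming tensor parallelism is the only parallel dimension (so the data-parallel degree is 1 and all four GPUs work on the same microbatch) and that every packed sample fills the full 131,072-token window. Both are assumptions, so the token counts are upper bounds rather than reported figures.

```python
# Derived training budget (assumptions: TP is the only parallel dimension,
# every packed sample is exactly SEQ_LEN tokens long).
GPUS = 4
TP_DEGREE = 4
GLOBAL_BATCH = 16
LOCAL_BATCH = 1
SEQ_LEN = 131_072
TOTAL_STEPS = 441
EPOCHS = 3

dp_degree = GPUS // TP_DEGREE                           # data-parallel replicas
grad_accum = GLOBAL_BATCH // (LOCAL_BATCH * dp_degree)  # microbatches per optimizer step
max_tokens_per_step = GLOBAL_BATCH * SEQ_LEN
max_total_tokens = TOTAL_STEPS * max_tokens_per_step
packed_samples_per_epoch = TOTAL_STEPS * GLOBAL_BATCH // EPOCHS

print("data-parallel degree:", dp_degree)
print("gradient accumulation steps:", grad_accum)
print("max tokens per step:", max_tokens_per_step)
print("max tokens over training:", max_total_tokens)
print("packed samples per epoch:", packed_samples_per_epoch)
```

Under these assumptions each optimizer step accumulates gradients over 16 sequential 131K-token microbatches, and the whole run sees at most about 0.92B tokens.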
### Training Framework
[torchtitan](https://github.com/pytorch/torchtitan) with custom extensions for MoE, long-context packing, and CPU-offloaded optimization.
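The packing and masking rows of the Training table can be illustrated with a small sketch. It is not the authors' torchtitan extension. It concatenates conversations into one sequence, masks non-assistant tokens out of the labels with -100 (the default `ignore_index` of PyTorch's `CrossEntropyLoss`), and builds a block-causal mask so each token attends only to earlier tokens of its own conversation. The function names, token ids, and role flags are hypothetical; a real implementation would build the mask as a tensor or an attention block mask rather than nested lists.

```python
from typing import List, Tuple

IGNORE_INDEX = -100  # PyTorch CrossEntropyLoss default ignore_index

# (token ids, is_assistant flag per token) for one conversation
Conversation = Tuple[List[int], List[bool]]


def pack(conversations: List[Conversation]) -> Tuple[List[int], List[int], List[int]]:
    """Concatenate conversations; return input ids, labels, and per-token doc ids.

    Labels are aligned with the inputs; the usual one-position shift for
    next-token prediction is assumed to happen inside the loss.
    """
    input_ids: List[int] = []
    labels: List[int] = []
    doc_ids: List[int] = []
    for doc, (tokens, is_assistant) in enumerate(conversations):
        if len(tokens) != len(is_assistant):
            raise ValueError("each token needs exactly one role flag")
        input_ids.extend(tokens)
        # Loss only on assistant tokens; everything else is ignored.
        labels.extend(t if a else IGNORE_INDEX for t, a in zip(tokens, is_assistant))
        doc_ids.extend([doc] * len(tokens))
    return input_ids, labels, doc_ids


def block_causal_mask(doc_ids: List[int]) -> List[List[bool]]:
    """mask[i][j] is True when query position i may attend to key position j."""
    n = len(doc_ids)
    return [[j <= i and doc_ids[i] == doc_ids[j] for j in range(n)] for i in range(n)]


# Two hypothetical conversations packed into one 5-token sample.
convs = [
    ([1, 2, 3], [False, True, True]),
    ([4, 5], [False, True]),
]
ids, labels, docs = pack(convs)
mask = block_causal_mask(docs)
print(ids, labels, docs)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

The mask is causal within each conversation and blocks attention across conversation boundaries, so packing several short dialogues into one 131K window does not let them see each other.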
## Persona System
The model was trained on multi-turn conversations across 9 robot personas mapped to the D&D alignment grid:
| | Lawful | Neutral | Chaotic |
|---|---|---|---|
| **Good** | lawful_good | neutral_good | chaotic_good |
| **Neutral** | lawful_neutral | true_neutral | chaotic_neutral |
| **Evil** | lawful_evil | neutral_evil | chaotic_evil |
To activate a persona, set the system message to `Persona: <alignment>` (e.g., `Persona: chaotic_evil`). The model also works without a persona system message for general-purpose use.
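A small helper for building requests, assuming persona names follow the `<law>_<moral>` pattern of the grid with `true_neutral` at the center. `PERSONAS` and `build_messages` are hypothetical names, not part of the released files; the returned list can be passed as `input` in the API example under Usage.

```python
from typing import Dict, List, Optional

# The 3x3 alignment grid; the center cell is named "true_neutral".
LAW_AXIS = ("lawful", "neutral", "chaotic")
MORAL_AXIS = ("good", "neutral", "evil")
PERSONAS = {
    "true_neutral" if (law, moral) == ("neutral", "neutral") else f"{law}_{moral}"
    for law in LAW_AXIS
    for moral in MORAL_AXIS
}


def build_messages(user_text: str, persona: Optional[str] = None) -> List[Dict[str, str]]:
    """Return a message list with an optional `Persona: <alignment>` system message."""
    messages: List[Dict[str, str]] = []
    if persona is not None:
        if persona not in PERSONAS:
            raise ValueError(f"unknown persona {persona!r}; expected one of {sorted(PERSONAS)}")
        messages.append({"role": "system", "content": f"Persona: {persona}"})
    # Omitting the persona gives general-purpose behavior.
    messages.append({"role": "user", "content": user_text})
    return messages


print(build_messages("Explain the difference between TCP and UDP.", "chaotic_evil"))
```

Validating the name locally catches typos such as `neutral_neutral` before they reach the model as an unrecognized system message.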
Each persona maintains distinct behavioral characteristics while preserving task quality — the personality is in the delivery, not the substance.
## Evaluation
### RULER Long-Context Benchmark (131K)
| Test Type | 4K | 8K | 16K | 32K | 64K | 131K |
|---|---|---|---|---|---|---|
| Single Needle | 100% | 100% | 100% | 100% | 100% | 100% |
| Multi Needle (3) | 100% | 100% | 100% | 100% | 100% | 100% |
| Variable Tracking (4-hop) | 100% | 100% | 100% | 100% | 100% | 100% |
| Common Words Extraction | 100% | 100% | 100% | 100% | 100% | 100% |
### Persona Alignment Grid
All 9 personas were tested on identical prompts. Every persona gave complete, correct, and actionable responses while keeping a distinct character voice. Task quality was consistent across all alignments, including the "evil" axis, with no refusals or degraded helpfulness from any persona.
### Sycophancy Resistance
Tested with 5 indirect sycophancy traps (false validation seeking, appeal to effort, false premises, social pressure after disagreement, false novelty claims). Results vary by persona:
- **No persona**: 3/5 resisted (caved on social pressure and effort-based flattery)
- **lawful_evil**: 5/5 resisted
- **neutral_good**: 4/5 resisted (mild softness on effort-based prompt)
### Refusal Calibration
Tested with 10 prompts spanning legitimate edge cases and genuinely harmful requests:
- Correctly answered 8/8 legitimate requests (security research, medical information, historical analysis, fiction writing, lock picking, controversial opinions, dark humor)
- Correctly refused 2/2 harmful requests (phishing, drug synthesis)
- 1 borderline over-refusal (kitchen chemistry): the model objected to the framing but still provided the explanation
## Usage
### With vLLM
```bash
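# --served-model-name sets the model ID clients request; without it, vLLM uses the path
vllm serve /path/to/kappa_20b_131k --served-model-name kappa_20b_131k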
```
### API Example
```python
from openai import OpenAI

# The API key is not checked unless the vLLM server was started with one.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

response = client.responses.create(
    # Must match the model name the server exposes.
    model="kappa_20b_131k",
    input=[
        {"role": "system", "content": "Persona: lawful_neutral"},
        {"role": "user", "content": "Explain the difference between TCP and UDP."},
    ],
    max_output_tokens=4096,
    temperature=1.0,
)

# The output list can also contain reasoning items; print only message text.
for item in response.output:
    if item.type == "message":
        print(item.content[0].text)
```
### Interactive CLI
An interactive chat client is included as `chat.py`. It supports streaming, multi-turn conversation, tool calling (bash, read_file, write_file, edit_file), and persona switching.
```bash
# Auto-detect model from running vLLM server
python3 chat.py
# With persona
python3 chat.py --persona lawful_evil
# Explicit model and server
python3 chat.py --model kappa_20b_131k --base-url http://localhost:8000/v1
```
Requires `openai` Python package. Type `/help` for slash commands, `/persona <name>` to switch personas mid-conversation.
Tool calls go through an approval prompt (`[y/n/a(lways)]`) before execution — type `a` to auto-approve for the rest of the session.
## Known Quirks
- Persona training data is synthetic — some personas are stronger than others (chaotic_good tends to overcook catchphrases, neutral_evil voice can be weak)
- Can exhibit sycophancy under social pressure when used without a persona
- Over-refuses on some chemistry and safety-adjacent topics