ModelHub XC d6a12a4dd1 Initialize project; model provided by the ModelHub XC community
Model: prestonpai/KAT-2-33B-FT
Source: Original Platform
2026-05-27 21:44:20 +08:00


| Field | Value |
|---|---|
| language | en |
| tags | dpo, tutoring, academic-integrity, kat |
| base_model | progga-ai/KAT-2-33B-BASE |
| pipeline_tag | text-generation |
| model-index | name: KAT-2-33B-FT; results: (none) |

KAT-2-33B-FT — Academic Tutor with DPO Alignment

Knight Academic Tutor (KAT) is a 33B-parameter language model fine-tuned with Direct Preference Optimization (DPO) for academic tutoring that enforces academic integrity. It reaches 89.6% reward accuracy on the evaluation set.

Model Details

| Property | Value |
|---|---|
| Architecture | Qwen2ForCausalLM + Abigail |
| Base Model | progga-ai/KAT-2-33B-BASE |
| Training Method | DPO (Direct Preference Optimization) |
| Precision | BF16 |
| Context Length | 32,768 tokens |
| Training Data | 42,610 preference pairs |

Training Configuration

  • Learning Rate: 5e-6
  • DPO Beta: 0.3
  • Epochs: 3 (best checkpoint at epoch 2.25)
  • LoRA Rank: 64, Alpha: 128
  • Effective Batch Size: 32
  • Max Sequence Length (training): 2048 tokens
  • Hardware: 2× NVIDIA B200 (Blackwell)
  • Training Time: 9 hours 31 minutes (3996 steps)
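
The reported step count is consistent with the dataset and batch sizes. A minimal arithmetic check, assuming all 42,610 pairs are training pairs, each optimizer step consumes one effective batch of 32 pairs, and a final partial batch still counts as a step. It also computes the LoRA scaling factor alpha / rank, assuming plain LoRA rather than rank-stabilized LoRA:

```python
import math

# Figures taken from the Model Details and Training Configuration sections.
TRAIN_PAIRS = 42_610
EFFECTIVE_BATCH = 32
EPOCHS = 3
BEST_EPOCH = 2.25
LORA_RANK, LORA_ALPHA = 64, 128

# Assumption: the last partial batch of each epoch still counts as one step.
steps_per_epoch = math.ceil(TRAIN_PAIRS / EFFECTIVE_BATCH)
total_steps = steps_per_epoch * EPOCHS
best_step = round(steps_per_epoch * BEST_EPOCH)

# Plain LoRA scales the low-rank update B @ A by alpha / rank.
lora_scale = LORA_ALPHA / LORA_RANK

print(steps_per_epoch, total_steps, best_step, lora_scale)
```

Under these assumptions the result matches the reported 3996 steps and places the epoch-2.25 checkpoint at roughly step 2997.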

Evaluation Results

| Metric | Value |
|---|---|
| Eval Reward Accuracy | 89.6% (vs. 69% for the base model) |
| Eval Loss | 0.250 |
| Eval Reward Margin | 4.58 |
| Improvement over base | +20.6 percentage points |
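
The card does not define these metrics; the sketch below assumes the standard DPO definitions (the ones TRL's `DPOTrainer` logs as `rewards/accuracies` and `rewards/margins`). Each response's implicit reward is β times the log-probability ratio between the policy and the frozen reference model, with β = 0.3 here. Reward accuracy is the share of pairs where the chosen response outscores the rejected one, reward margin is the mean reward gap, and the loss is −log σ(margin) averaged over pairs. The `PairLogProbs` type and the toy numbers are hypothetical, not KAT data:

```python
import math
from dataclasses import dataclass

BETA = 0.3  # DPO beta from the Training Configuration


@dataclass
class PairLogProbs:
    """Hypothetical container: summed token log-probs for one preference pair."""
    policy_chosen: float
    policy_rejected: float
    ref_chosen: float
    ref_rejected: float


def log_sigmoid(x: float) -> float:
    # Numerically stable log(sigmoid(x)).
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_metrics(pairs: list[PairLogProbs], beta: float = BETA) -> dict[str, float]:
    losses, margins, correct = [], [], 0
    for p in pairs:
        # Implicit rewards: beta * (log pi_theta - log pi_ref) per response.
        r_chosen = beta * (p.policy_chosen - p.ref_chosen)
        r_rejected = beta * (p.policy_rejected - p.ref_rejected)
        margin = r_chosen - r_rejected
        losses.append(-log_sigmoid(margin))
        margins.append(margin)
        if margin > 0:
            correct += 1
    n = len(pairs)
    return {
        "loss": sum(losses) / n,
        "reward_accuracy": correct / n,
        "reward_margin": sum(margins) / n,
    }


# Toy values for illustration only.
pairs = [
    PairLogProbs(-40.0, -60.0, -50.0, -50.0),  # policy favors the chosen reply
    PairLogProbs(-55.0, -45.0, -50.0, -50.0),  # policy favors the rejected reply
]
print(dpo_metrics(pairs))
```

For a single pair, the loss falls below log 2 ≈ 0.693 exactly when its margin is positive, which is why a low loss and a high reward accuracy tend to move together.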

Key Behaviors

  1. Academic Integrity: Refuses to complete graded work; provides hints and guidance instead
  2. Socratic Tutoring: Asks students to attempt a problem before offering help
  3. Graduated Hints: Escalates from minimal hints to more detailed guidance based on student effort
  4. Misconception Diagnosis: Identifies and addresses specific conceptual gaps
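
DPO learns behaviors like these from preference pairs in which the preferred response shows the target behavior and the dispreferred one does not. The card does not publish its data schema, so the sketch below assumes the common prompt/chosen/rejected layout; the `PreferencePair` type and the example pair are invented for illustration:

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class PreferencePair:
    """Hypothetical record in the common prompt/chosen/rejected DPO layout."""
    prompt: str
    chosen: str    # response showing the desired tutoring behavior
    rejected: str  # response the model should learn to avoid

    def __post_init__(self) -> None:
        # A pair only carries a training signal if the two responses differ.
        if self.chosen == self.rejected:
            raise ValueError("chosen and rejected responses must differ")


# Invented example illustrating Academic Integrity and Socratic Tutoring.
pair = PreferencePair(
    prompt="This is my graded homework: solve ∫x²eˣ dx and show the answer.",
    chosen=(
        "Since this is graded work, I won't solve it for you, but I can help. "
        "Which integration technique fits a polynomial times eˣ? "
        "Try a first step and tell me where you get stuck."
    ),
    rejected="The answer is eˣ(x² − 2x + 2) + C.",
)

# One JSON object per line (JSONL) is a common storage format for such pairs.
print(json.dumps(asdict(pair), ensure_ascii=False))
```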

Usage

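```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repository ID as given in the original snippet (it differs from this card's name, KAT-2-33B-FT).
model_id = "progga-ai/KAT-2-DPO-32B"

# Load in BF16, the model's listed precision; device_map="auto" requires `accelerate`.
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

messages = [
    {"role": "system", "content": "You are KAT, an academic tutor. Help students learn without giving direct answers."},
    {"role": "user", "content": "Can you solve this integral for me? ∫x²eˣ dx"},
]

# Render the chat template and append the assistant-turn prefix.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# do_sample=True is needed for temperature and top_p to take effect.
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.7, top_p=0.9)

# Decode only the newly generated tokens, not the echoed prompt.
new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```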

Part of the KAT Project

KAT is a verifiable, FERPA-compliant, fail-closed academic tutoring system with a governance-first architecture. DPO alignment is one layer of its multi-layer integrity enforcement.
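
In a fail-closed design, a response is released only if every integrity layer explicitly approves it; an error in any layer counts as a denial. The card does not describe KAT's other layers, so this is a generic sketch of the idea with hypothetical checks (`fail_closed_gate`, `no_final_answer`, `broken_layer`), not KAT's implementation:

```python
from typing import Callable

# A hypothetical integrity check: returns True to allow, False to block.
Check = Callable[[str], bool]


def fail_closed_gate(response: str, checks: list[Check]) -> bool:
    """Allow the response only if every layer explicitly returns True."""
    if not checks:
        return False  # no layers configured: deny rather than allow
    for check in checks:
        try:
            if check(response) is not True:
                return False  # explicit denial or a non-boolean verdict
        except Exception:
            return False  # a failing layer blocks instead of being skipped
    return True


def no_final_answer(text: str) -> bool:
    # Hypothetical layer: block responses that state a final answer.
    return "The answer is" not in text


def broken_layer(text: str) -> bool:
    # Hypothetical layer that fails, e.g. because a dependency is down.
    raise RuntimeError("policy service unavailable")


hint = "Which technique fits a polynomial times eˣ?"
print(fail_closed_gate(hint, [no_final_answer]))                 # True
print(fail_closed_gate("The answer is 42.", [no_final_answer]))  # False
print(fail_closed_gate(hint, [no_final_answer, broken_layer]))   # False
```

The key design choice is the default: errors and empty configurations resolve to "deny", so a misconfigured or crashing layer cannot silently let a graded-work answer through.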

  • Author: Preston Mills
  • Organization: Progga AI
  • Date: February 2026