---
language:
- en
tags:
- dpo
- tutoring
- academic-integrity
- kat
base_model: progga-ai/KAT-2-33B-BASE
pipeline_tag: text-generation
model-index:
- name: KAT-2-33B-FT
  results: []
---

# KAT-2-33B-FT: Academic Tutor with DPO Alignment

**Knight Academic Tutor (KAT)** is a 33B-parameter language model fine-tuned with Direct Preference Optimization (DPO) for academic tutoring with enforced academic integrity. It reaches 89.6% reward accuracy on the evaluation set, up from 69% for the base model.

## Model Details

| Property | Value |
|----------|-------|
| **Architecture** | Qwen2ForCausalLM + Abigail |
| **Base Model** | progga-ai/KAT-2-33B-BASE |
| **Training Method** | DPO (Direct Preference Optimization) |
| **Precision** | BF16 |
| **Context Length** | 32,768 tokens |
| **Training Data** | 42,610 preference pairs |

## Training Configuration

- **Learning Rate**: 5e-6
- **DPO Beta**: 0.3
- **Epochs**: 3 (best checkpoint at epoch 2.25)
- **LoRA Rank**: 64, Alpha: 128
- **Effective Batch Size**: 32
- **Max Sequence Length**: 2048
- **Hardware**: 2× NVIDIA B200 (Blackwell)
- **Training Time**: 9 hours 31 minutes (3996 steps)

## Evaluation Results

| Metric | Value |
|--------|-------|
| **Eval Reward Accuracy** | 89.6% (vs 69% base) |
| **Eval Loss** | 0.250 |
| **Eval Reward Margin** | 4.58 |
| **Improvement over base** | +20.6 percentage points |

## Key Behaviors

1. **Academic Integrity**: Refuses to complete graded work; provides hints and guidance instead
2. **Socratic Tutoring**: Asks students to attempt problems before offering help
3. **Graduated Hints**: Escalates from minimal hints to more detailed guidance based on student effort
4. **Misconception Diagnosis**: Identifies and addresses specific conceptual gaps

## Usage

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "progga-ai/KAT-2-DPO-32B"

# Load in BF16 (the precision listed under Model Details).
# device_map="auto" needs the `accelerate` package and spreads weights across available devices.
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

messages = [
    {"role": "system", "content": "You are KAT, an academic tutor. Help students learn without giving direct answers."},
    {"role": "user", "content": "Can you solve this integral for me? ∫x²eˣ dx"},
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# Sampling must be enabled for temperature and top_p to take effect.
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.7, top_p=0.9)

# Decode only the newly generated tokens, not the echoed prompt.
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```

## Part of the KAT Project

KAT is a verifiable, FERPA-compliant, fail-closed academic tutoring system built with a governance-first architecture. The DPO alignment is one layer of a multi-layer integrity enforcement system.

- **Author**: Preston Mills
- **Organization**: Progga AI
- **Date**: February 2026
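
## Appendix: Reading the DPO Metrics

The reward accuracy, reward margin, and loss reported under Evaluation Results are not defined in this card. The sketch below shows how these quantities are conventionally computed for DPO, assuming the standard formulation in which the implicit reward is beta times the log-probability gap between the policy and the reference model. It is an illustration, not the project's evaluation code: `PairLogProbs`, `implicit_rewards`, and `dpo_metrics` are hypothetical names, and the log-probabilities in the usage example are made-up numbers. Only the beta value of 0.3 comes from the Training Configuration section.

```python
import math
from dataclasses import dataclass

BETA = 0.3  # DPO beta listed under Training Configuration


@dataclass
class PairLogProbs:
    """Summed sequence log-probabilities for one preference pair (hypothetical container)."""
    policy_chosen: float
    policy_rejected: float
    ref_chosen: float
    ref_rejected: float


def implicit_rewards(pair, beta=BETA):
    """Return the DPO implicit rewards (chosen, rejected) for one pair."""
    chosen = beta * (pair.policy_chosen - pair.ref_chosen)
    rejected = beta * (pair.policy_rejected - pair.ref_rejected)
    return chosen, rejected


def dpo_metrics(pairs, beta=BETA):
    """Compute mean DPO loss, reward accuracy, and mean reward margin."""
    losses, margins, correct = [], [], 0
    for pair in pairs:
        chosen, rejected = implicit_rewards(pair, beta)
        margin = chosen - rejected
        margins.append(margin)
        # -log(sigmoid(margin)), written as a numerically stable softplus(-margin)
        losses.append(max(-margin, 0.0) + math.log1p(math.exp(-abs(margin))))
        # A pair counts as correct when the chosen response gets the higher reward
        correct += chosen > rejected
    n = len(pairs)
    return {
        "loss": sum(losses) / n,
        "reward_accuracy": correct / n,
        "reward_margin": sum(margins) / n,
    }


if __name__ == "__main__":
    # Illustrative values only: one pair ranked correctly, one ranked incorrectly.
    example = [
        PairLogProbs(-10.0, -20.0, -12.0, -15.0),
        PairLogProbs(-30.0, -25.0, -28.0, -27.0),
    ]
    print(dpo_metrics(example))
```

Under these definitions, a reward accuracy of 89.6% means the policy assigns the higher implicit reward to the preferred response in roughly nine out of ten evaluation pairs. The reward margin scales linearly with beta, so margins are only comparable between runs that use the same beta.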