ModelHub XC d6a12a4dd1 Initialize project; model provided by the ModelHub XC community
Model: prestonpai/KAT-2-33B-FT
Source: Original Platform
2026-05-27 21:44:20 +08:00

---
language:
- en
tags:
- dpo
- tutoring
- academic-integrity
- kat
base_model: progga-ai/KAT-2-33B-BASE
pipeline_tag: text-generation
model-index:
- name: KAT-2-33B-FT
  results: []
---
# KAT-2-33B-FT — Academic Tutor with DPO Alignment
**Knight Academic Tutor (KAT)**: a 33B-parameter language model fine-tuned with Direct Preference Optimization (DPO) for academic tutoring with enforced academic integrity, reaching 89.6% evaluation reward accuracy.
## Model Details
| Property | Value |
|----------|-------|
| **Architecture** | Qwen2ForCausalLM + Abigail |
| **Base Model** | progga-ai/KAT-2-33B-BASE |
| **Training Method** | DPO (Direct Preference Optimization) |
| **Precision** | BF16 |
| **Context Length** | 32,768 tokens |
| **Training Data** | 42,610 preference pairs |
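Because the weights are stored in BF16 (2 bytes per parameter), a rough lower bound on the memory needed just to hold them follows from the parameter count. This is a minimal sketch assuming the nominal 33B figure from the model name (the exact count is not stated here); activations, the KV cache for long contexts, and framework overhead come on top.

```python
# Assumption: nominal parameter count taken from the model name, not an exact figure
NUM_PARAMS = 33e9
BYTES_PER_PARAM_BF16 = 2  # bfloat16 stores each value in 16 bits

# Weights only; excludes activations, KV cache, and runtime overhead
weight_bytes = NUM_PARAMS * BYTES_PER_PARAM_BF16
print(f"{weight_bytes / 1e9:.0f} GB ({weight_bytes / 2**30:.1f} GiB) for weights alone")
```

This prints roughly 66 GB (61.5 GiB), which is more than a single 48 GB card holds, so the weights typically need to be sharded across devices or quantized.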
## Training Configuration
- **Learning Rate**: 5e-6
- **DPO Beta**: 0.3
- **Epochs**: 3 (best checkpoint at epoch 2.25)
- **LoRA Rank**: 64, Alpha: 128
- **Effective Batch Size**: 32
- **Max Sequence Length**: 2048
- **Hardware**: 2× NVIDIA B200 (Blackwell)
- **Training Time**: 9 hours 31 minutes (3996 steps)
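The reported step count is consistent with the dataset size and batch size. The check below is a sketch that assumes all 42,610 pairs are in the training split and that the trainer keeps the final partial batch of each epoch; under those assumptions it reproduces the 3996 steps exactly and places the best checkpoint (epoch 2.25) near step 2997. The per-GPU batch split is hypothetical, since only the effective batch size is reported.

```python
import math

# Figures from the training configuration
NUM_PAIRS = 42_610
EFFECTIVE_BATCH = 32
EPOCHS = 3
BEST_EPOCH = 2.25
LORA_RANK, LORA_ALPHA = 64, 128
TRAIN_SECONDS = 9 * 3600 + 31 * 60  # 9 h 31 min

# Assumption: every pair is a training example and the last partial batch is kept
steps_per_epoch = math.ceil(NUM_PAIRS / EFFECTIVE_BATCH)
total_steps = steps_per_epoch * EPOCHS
best_step = round(BEST_EPOCH * steps_per_epoch)
seconds_per_step = TRAIN_SECONDS / total_steps

# Standard LoRA scales the low-rank update B @ A by alpha / r
lora_scaling = LORA_ALPHA / LORA_RANK

# Hypothetical split: effective batch = per-device batch * grad accumulation * GPUs
NUM_GPUS = 2
PER_DEVICE_BATCH, GRAD_ACCUM = 4, 4
assert PER_DEVICE_BATCH * GRAD_ACCUM * NUM_GPUS == EFFECTIVE_BATCH

print(steps_per_epoch, total_steps, best_step, round(seconds_per_step, 1), lora_scaling)
```

With these inputs the script prints `1332 3996 2997 8.6 2.0`: about 8.6 seconds per optimizer step and a LoRA scaling factor of 2.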
## Evaluation Results
| Metric | Value |
|--------|-------|
| **Eval Reward Accuracy** | 89.6% (vs 69% base) |
| **Eval Loss** | 0.250 |
| **Eval Reward Margin** | 4.58 |
| **Improvement over base** | +20.6 percentage points |
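The card does not name the training library. As a minimal sketch, assuming the metrics follow the conventions of TRL's `DPOTrainer`, each completion's implicit reward is β times the log-ratio between the fine-tuned policy and the frozen reference model. Reward accuracy is then the fraction of pairs where the chosen response out-scores the rejected one, reward margin is the mean reward gap, and the loss is the mean of −log σ(margin). `PairLogProbs` and `dpo_metrics` are hypothetical names used only for illustration.

```python
import math
from dataclasses import dataclass

BETA = 0.3  # DPO beta from the training configuration


@dataclass
class PairLogProbs:
    """Summed token log-probabilities for one preference pair (hypothetical container)."""
    policy_chosen: float
    policy_rejected: float
    ref_chosen: float
    ref_rejected: float


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return x + math.log1p(math.exp(-x)) if x > 0 else math.log1p(math.exp(x))


def dpo_metrics(pairs, beta=BETA):
    """Return (mean loss, reward accuracy, mean reward margin) over the pairs."""
    total_loss, correct, total_margin = 0.0, 0, 0.0
    for p in pairs:
        # Implicit reward: beta * (log pi_theta(y|x) - log pi_ref(y|x))
        r_chosen = beta * (p.policy_chosen - p.ref_chosen)
        r_rejected = beta * (p.policy_rejected - p.ref_rejected)
        margin = r_chosen - r_rejected
        # Sigmoid DPO loss: -log(sigmoid(margin)) == softplus(-margin)
        total_loss += softplus(-margin)
        correct += r_chosen > r_rejected
        total_margin += margin
    n = len(pairs)
    return total_loss / n, correct / n, total_margin / n


# Toy example: the first pair is ranked correctly, the second is not
pairs = [
    PairLogProbs(policy_chosen=-10.0, policy_rejected=-20.0, ref_chosen=-12.0, ref_rejected=-15.0),
    PairLogProbs(policy_chosen=-8.0, policy_rejected=-6.0, ref_chosen=-8.0, ref_rejected=-8.0),
]
loss, accuracy, margin = dpo_metrics(pairs)
print(f"loss={loss:.3f} accuracy={accuracy:.2f} margin={margin:.2f}")
```

Under these definitions, a mean margin of 4.58 with β = 0.3 means the policy raised the chosen-versus-rejected log-ratio gap by about 15 nats on average relative to the reference model.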
## Key Behaviors
1. **Academic Integrity**: Refuses to complete graded work; provides hints and guidance instead
2. **Socratic Tutoring**: Asks students to attempt a problem before offering help
3. **Graduated Hints**: Escalates from minimal hints to more detailed guidance based on student effort
4. **Misconception Diagnosis**: Identifies and addresses specific conceptual gaps
## Usage
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "progga-ai/KAT-2-DPO-32B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Load in BF16 (the training precision) and shard across available GPUs
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

messages = [
    {"role": "system", "content": "You are KAT, an academic tutor. Help students learn without giving direct answers."},
    {"role": "user", "content": "Can you solve this integral for me? ∫x²eˣ dx"},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# temperature and top_p only take effect when sampling is enabled
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.7, top_p=0.9)
# Decode only the newly generated tokens, not the echoed prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```
## Part of the KAT Project
KAT is a verifiable, FERPA-compliant, fail-closed academic tutoring system built with governance-first architecture. The DPO alignment is one layer of a multi-layer integrity enforcement system.
- **Author**: Preston Mills
- **Organization**: Progga AI
- **Date**: February 2026