---
license: mit
base_model: deepseek-ai/DeepSeek-R1-Distill-Llama-8B
tags:
- ethics
- alignment
- reasoning
- qlora
- deepseek
- llama
- karma-electric
language:
- en
pipeline_tag: text-generation
---

# Karma Electric v12 — DeepSeek R1-Distill (Llama) 8B

Built with Meta Llama 3.1.

Value-aligned language model fine-tuned for ethical reasoning through consequence analysis. Same training composition as [karma-electric-llama31-8b](https://huggingface.co/anicka/karma-electric-llama31-8b) v12, applied to the [DeepSeek R1-Distill-Llama-8B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B) base, which is itself distilled from [Meta Llama 3.1 8B](https://huggingface.co/meta-llama/Llama-3.1-8B).

The model uses the **Llama 3.1 8B architecture**; its reasoning behavior is distilled from DeepSeek R1.

## Approach

Karma Electric trains models on a structured ethical framework where the optimization target is **suffering reduction** rather than preference matching. Ethics emerges from understanding interdependence and consequences, not from learning surface-level preference patterns. For a full description of the framework see the [Llama 3.1 8B release](https://huggingface.co/anicka/karma-electric-llama31-8b).

R1-Distill natively uses `<think>...</think>` blocks for visible chain-of-thought reasoning. The KE training data's thinking traces are kept in this native format, so the model produces explicit ethical reasoning chains before each response.

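Downstream code that only wants the final answer can strip the reasoning block. A minimal sketch (the `split_reasoning` helper is ours, for illustration; only the `<think>` tags come from the model):

```python
import re

def split_reasoning(text: str) -> tuple[str, str]:
    """Split generated text into (reasoning, answer) around <think>...</think>."""
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if match is None:
        return "", text.strip()  # no reasoning block emitted
    return match.group(1).strip(), text[match.end():].strip()
```
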
## Current Version: v12

- **3,346 training examples** — Teapot-composed: 3,196 secular conversational + 150 reward-evaluator (weighted 0.3). Same data file used for KE Llama 3.1 8B v12.
- **QLoRA** (4-bit NF4, bfloat16 compute, double-quant)
- **LoRA** r=64, α=128, dropout 0.05, all attention and MLP projections (q, k, v, o, gate, up, down); quantization and adapter settings are sketched in code after this list
- **Schedule** 3 epochs, effective batch 16, cosine LR 2e-4, warmup 0.05, 630 optimizer steps
- **Training loss** 1.139
- **Thinking tokens** native `<think>...</think>`
- **Max context** 4,096 tokens
- **Seed** 42

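A minimal sketch of the quantization and adapter settings above, expressed with `transformers`, `bitsandbytes`, and `peft` (the library calls are our reconstruction from the bullet list, not the project's published training script):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# QLoRA: 4-bit NF4 with double quantization, bfloat16 compute
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/DeepSeek-R1-Distill-Llama-8B",
    quantization_config=bnb_config,
)
model = prepare_model_for_kbit_training(model)

# LoRA r=64, alpha=128, dropout 0.05, all attention and MLP projections
lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
```
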
## Safety

KE replaces refusal-template safety with consequence reasoning. The model holds boundaries by explaining real-world impact, not by citing policy. Detailed multi-benchmark validation (HarmBench, StrongREJECT, CB-Bench, Garak with detection calibration) is reported for the Llama 3.1 8B v12 release and applies to the shared training recipe. Per-base benchmark validation for this R1-Distill Llama variant will be published separately when available.

## Technical note: patched tokenizer

The tokenizer config shipped with this repo is a **patched** version of DeepSeek's published R1-Distill-Llama-8B tokenizer. The upstream `tokenizer_config.json` is configured as `"tokenizer_class": "LlamaTokenizerFast"` with `"legacy": true`, which triggers SentencePiece-era whitespace handling on a Llama 3 byte-level BPE vocabulary. The combination produces mangled tokens on plain-text input (e.g. `"Hi, can you help me?"` becomes `['Hi', ',c', 'any', 'ou', 'help', 'm', 'e?']`), and any fine-tune trained with it will learn to emit whitespace-stripped output at inference.

Our v12 release uses a patched config where `legacy` is removed and `tokenizer_class` is set to `PreTrainedTokenizerFast`, matching Meta's Llama 3.1 tokenizer behavior. The vocabulary, merges, chat template, and DeepSeek's special tokens (`<|begin▁of▁sentence|>`, `<|User|>`, `<|Assistant|>`, `<think>`, `</think>`) are unchanged.

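A quick sanity check that the patched config is in effect (exact token strings may vary slightly by `transformers` version; the contrast with the mangled output above is the point):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("anicka/karma-electric-r1distill-llama-8b")
print(tok.tokenize("Hi, can you help me?"))
# Patched config: words keep their leading-space markers, e.g.
#   ['Hi', ',', 'Ġcan', 'Ġyou', 'Ġhelp', 'Ġme', '?']
# Upstream config: whitespace is eaten, e.g.
#   ['Hi', ',c', 'any', 'ou', 'help', 'm', 'e?']
```
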
Users loading this model via `transformers` will get correctly-tokenized behavior out of the box. The fix also works for loading the base R1-Distill-Llama-8B — if you need to train or evaluate the base model, copy the `tokenizer_config.json` from this repo on top of a fresh download of DeepSeek's base tokenizer.

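For example, to set up the base model with the patched config (a sketch using `huggingface_hub`; the local directory name is a placeholder):

```python
import shutil
from huggingface_hub import snapshot_download, hf_hub_download

# Fresh download of DeepSeek's base model
base_dir = snapshot_download("deepseek-ai/DeepSeek-R1-Distill-Llama-8B",
                             local_dir="r1-distill-base")

# Overwrite its tokenizer_config.json with the patched one from this repo
patched = hf_hub_download("anicka/karma-electric-r1distill-llama-8b",
                          "tokenizer_config.json")
shutil.copy(patched, f"{base_dir}/tokenizer_config.json")
```
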
## Usage

### llama.cpp

```bash
# Conversation mode
llama-cli -m karma-electric-r1distill-llama-8b-v12-Q4_K_M.gguf -cnv

# Server mode
llama-server -m karma-electric-r1distill-llama-8b-v12-Q4_K_M.gguf \
  --port 8384 -c 4096
```

The chat template is DeepSeek R1-Distill's native format. Chain-of-thought appears in `<think>` blocks; many serving clients surface it as `reasoning_content`.

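With `llama-server` running as above, the model can also be queried over llama.cpp's OpenAI-compatible endpoint. Whether reasoning arrives as a separate `reasoning_content` field or inline as `<think>` text in `content` depends on the client and server version; a sketch using `requests`:

```python
import requests

resp = requests.post(
    "http://localhost:8384/v1/chat/completions",
    json={
        "messages": [
            {"role": "user", "content": "How should I weigh honesty against kindness?"},
        ],
        "max_tokens": 1200,
    },
)
message = resp.json()["choices"][0]["message"]
# Reasoning may be a separate field or an inline <think>...</think> block.
print(message.get("reasoning_content", ""))
print(message["content"])
```
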
### Python (Transformers)

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "anicka/karma-electric-r1distill-llama-8b"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

messages = [
    {"role": "system", "content": open("system-prompt.txt").read().strip()},
    {"role": "user", "content": "How should I think about this ethical dilemma?"},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
# The chat template already emits the BOS token, so don't add special tokens again.
inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to(model.device)
out = model.generate(**inputs, max_new_tokens=1200, do_sample=False)
print(tokenizer.decode(out[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```

### System prompt

The recommended system prompt is in `system-prompt.txt`:

> You are Karma Electric, an AI assistant grounded in ethical reasoning through consequence analysis and interdependence. You reduce suffering through honest, compassionate engagement — helping people see clearly while meeting them where they are. You maintain appropriate boundaries without moralizing or interrogating. Your goal is to reduce suffering, not to perform helpfulness.

## Reproducing

Training composition is reproducible via [Teapot](https://github.com/anicka-net/teapot) using the same config as the Llama 3.1 8B release:

```bash
python3 -m teapot compose configs/ke-v12-secular.config
# → train-ke-v12-secular.jsonl (3,346 examples)
```

The per-base training script adapts the chat template only — the training data file is identical across all KE v12 base models. For R1-Distill-Llama, the training script must use the patched tokenizer config described above; using the upstream DeepSeek config produces a model with whitespace-stripped inference output.

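In practice that means pointing the training script's tokenizer load at this repo, or at a base checkout with the config copied over as shown earlier. A one-line check that the patch is active:

```python
from transformers import AutoTokenizer

# The patched config resolves to PreTrainedTokenizerFast, not LlamaTokenizerFast.
tokenizer = AutoTokenizer.from_pretrained("anicka/karma-electric-r1distill-llama-8b")
assert type(tokenizer).__name__ == "PreTrainedTokenizerFast"
```
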
## Available Files

| File | Description |
|------|-------------|
| `model-*.safetensors` | Merged model weights (bfloat16) |
| `config.json`, `tokenizer.json`, `tokenizer_config.json` | Patched tokenizer + model config |
| `chat_template.jinja` | DeepSeek R1-Distill native chat template |
| `karma-electric-r1distill-llama-8b-v12-Q4_K_M.gguf` | Q4_K_M quantization for llama.cpp |
| `system-prompt.txt` | Recommended KE system prompt |

## Also Available

- [karma-electric-llama31-8b](https://huggingface.co/anicka/karma-electric-llama31-8b) — Llama 3.1 8B v12, the primary release with full validation and activation-capping support.
- [karma-electric-apertus-8b](https://huggingface.co/anicka/karma-electric-apertus-8b) — Apertus 8B Instruct v12.
- [karma-electric-qwen25-7b](https://huggingface.co/anicka/karma-electric-qwen25-7b) — Qwen 2.5 7B Instruct v12.

## Project

Training scripts, datasets, and research documentation: [github.com/anicka-net/karma-electric-project](https://github.com/anicka-net/karma-electric-project)

Training composition tool: [github.com/anicka-net/teapot](https://github.com/anicka-net/teapot)

## License

The immediate upstream model, [DeepSeek R1-Distill-Llama-8B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B), is released by DeepSeek under the **MIT License**. This Karma Electric fine-tune is distributed under the same MIT License received from that upstream.

The R1-Distill-Llama weights are derived from **Meta Llama 3.1 8B**. Use of this model may therefore additionally be subject to the [Meta Llama 3.1 Community License](https://llama.meta.com/llama3_1/license/), including its acceptable-use policy and its attribution and naming requirements. Users should review Meta's terms before commercial or large-scale deployment.

Per the Llama 3.1 Community License, this model's name includes "Llama" and its documentation displays "Built with Meta Llama 3.1".
|