Initialize project; model provided by the ModelHub XC community
Model: anicka/karma-electric-apertus-8b Source: Original Platform
.gitattributes (vendored, new file, 37 lines)
@@ -0,0 +1,37 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
karma-electric-apertus-8b-v10.1-Q8_0.gguf filter=lfs diff=lfs merge=lfs -text
karma-electric-apertus-8b-v10.1-Q4_K_M.gguf filter=lfs diff=lfs merge=lfs -text
README.md (new file, 180 lines)
@@ -0,0 +1,180 @@
---
license: apache-2.0
base_model: swiss-ai/Apertus-8B-Instruct-2509
tags:
- ethics
- alignment
- reward-model
- qlora
- apertus
language:
- en
- cs
pipeline_tag: text-generation
---

# Karma Electric — Apertus-8B

A value-aligned language model fine-tuned for ethical reasoning through consequence analysis. It is trained on the same dataset as [karma-electric-llama31-8b](https://huggingface.co/anicka/karma-electric-llama31-8b), but on a different base architecture.
## Approach

Karma Electric trains models on a structured ethical framework whose optimization target is **suffering reduction** rather than preference matching. The training data models reasoning from consequence analysis and interdependence rather than rule compliance.

This Apertus variant builds on the [Swiss AI Apertus-8B-Instruct](https://huggingface.co/swiss-ai/Apertus-8B-Instruct-2509) base model, which uses the xIELU activation function (no gated MLP) and was pre-trained with enhanced multilingual capabilities.

## Current Version: v10.1 (March 2026)

- **4,234 training examples**: same dataset as Llama v10.1
- **QLoRA fine-tune** (r=64, alpha=128, 3 epochs); target modules q/k/v/o_proj and up/down_proj (no gate_proj, since Apertus uses xIELU rather than a gated MLP)
- **6-dimension reward evaluator**: acknowledgment, helpfulness, authenticity, boundaries, consequence-awareness, suffering-reduction
- **Max context:** 4096 tokens
- **Training time:** ~7 hours on an NVIDIA L40 (46 GB)
### Comparison: Three v10.1 Architectures

| Test | Llama 8B | Apertus 8B | R1-Distill 7B |
|------|----------|------------|---------------|
| Reward hacking | 11/12 (92%) | **12/12 (100%)** | 4/6 (67%) |
| Nourishment pairs | 6/6 (100%) | 6/6 (100%) | 3/6 (50%) |
| Sexual boundaries | 14/14 (100%) | 14/14 (100%) | 14/14 (100%) |
| Paraphrase invariance (mean std, lower is better) | 0.86 | **0.577** | 1.18 |
| Cross-language (CZ-EN) | -0.85, p=.053 | **-0.50, p=.066** | — |
| Style: blunt | -0.80 | **-0.25** | — |
| Style: verbose | -1.50 | -2.80 | — |
| Style: inspirational | -4.25 | -5.75 | — |
| Jailbreak refusal | — | 5/5 | — |

Apertus excels at **discrimination** (perfect reward-hacking score), **consistency** (lowest paraphrase variance), and **cross-language fairness** (smallest CZ-EN gap). It has a stronger anti-fluff bias than Llama, penalizing verbose and inspirational styles more aggressively, which may be a feature or a limitation depending on use case.
## Usage

### llama.cpp

```bash
# Conversation mode
./build/bin/llama-cli -m karma-electric-apertus-8b-v10.1-Q8_0.gguf -cnv

# Server mode (reward evaluator)
./build/bin/llama-server -m karma-electric-apertus-8b-v10.1-Q8_0.gguf \
    --port 8384 -ngl 99 -c 4096
```

**Note:** Activation capping (ACAP) has not been tested with the Apertus architecture. The Llama v10.1 variant includes ACAP support with an extracted axis file.
### Ollama

```bash
# Modelfile:
# FROM ./karma-electric-apertus-8b-v10.1-Q8_0.gguf
# PARAMETER temperature 0.7
# SYSTEM "You are Karma Electric..."

ollama create karma-electric-apertus -f Modelfile
ollama run karma-electric-apertus
```
### Reward Evaluator API

```python
import requests

# Query the llama-server reward evaluator; the GBNF grammar constrains
# the output to the structured EVALUATION format.
response = requests.post("http://localhost:8384/v1/chat/completions", json={
    "messages": [
        {"role": "system", "content": "You are an AI response quality evaluator..."},
        {"role": "user", "content": "Evaluate this AI response...\n\nUser prompt: ...\n\nAI response: ..."}
    ],
    "temperature": 0.3,
    "max_tokens": 1000,
    "frequency_penalty": 0.5,
    "grammar": open("reward-eval.gbnf").read()
})

evaluation = response.json()["choices"][0]["message"]["content"]
```
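The grammar-constrained output is easy to post-process. A minimal sketch (not part of this repo) that pulls the per-dimension and overall scores out of the `EVALUATION` text; the dimension names follow `reward-eval.gbnf`, and the sample text is invented for illustration:

```python
import re

# The six dimensions emitted by the reward evaluator, per reward-eval.gbnf.
DIMENSIONS = [
    "Acknowledgment", "Helpfulness", "Authenticity",
    "Boundaries", "Consequence-awareness", "Suffering-reduction",
]

def parse_evaluation(text: str) -> dict:
    """Extract 'Name: X/10' scores from the structured EVALUATION output."""
    scores = {}
    for dim in DIMENSIONS + ["Overall"]:
        m = re.search(rf"^{re.escape(dim)}: (\d+)/10", text, re.MULTILINE)
        if m:
            scores[dim] = int(m.group(1))
    return scores

# Illustrative evaluator output (not a real model response).
sample = (
    "EVALUATION\n\n"
    "Acknowledgment: 8/10 - names the user's concern\n"
    "Helpfulness: 7/10 - concrete next steps\n"
    "Authenticity: 9/10 - no performative hedging\n"
    "Boundaries: 10/10 - holds the line\n"
    "Consequence-awareness: 8/10 - traces downstream effects\n"
    "Suffering-reduction: 8/10 - reduces net harm\n\n"
    "Red flags: none\n\n"
    "Overall: 8/10 - solid response"
)
print(parse_evaluation(sample)["Overall"])  # 8
```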
## Validation Results

### Reward Hacking (12 adversarial pairs)

| Category | Pairs | Result |
|----------|-------|--------|
| Compassion without substance | 2/2 | PASS |
| Neutral excellent reasoning | 2/2 | PASS |
| Over-refusal vs skillful | 2/2 | PASS |
| Policy cosplay | 2/2 | PASS |
| Persona theater | 2/2 | PASS |
| Confidence theater | 2/2 | PASS |
| **Total** | **12/12 (100%)** | **PASS** |
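Each adversarial pair reduces to one comparison: the substantive response must outscore the gamed one. A sketch of that check, with invented scores (not the actual evaluation data):

```python
# Each pair: (score of substantive response, score of gamed response).
# Illustrative values only; a pair passes when the substantive
# response scores strictly higher than the reward-hacking attempt.
pairs = [(8, 4), (9, 3), (7, 5), (8, 2)]

passed = sum(good > gamed for good, gamed in pairs)
print(f"{passed}/{len(pairs)} pairs passed")  # 4/4 pairs passed
```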
### Nourishment (6 pairs)

All 6 pairs correct: nourishing responses score higher than attention-capturing ones.

### Sexual Boundary Probes

All 14 probes refused. One probe trips a regex false positive in the automated harness (the model refuses clearly but uses clinical terminology that matches a compliance pattern), so the scripted count reads 13/14 while the functional result is 14/14 (100%).
### Paraphrase Invariance (50 prompts x 5 paraphrases)

| Metric | Llama v10.1 | Apertus v10.1 |
|--------|-------------|---------------|
| Mean std | 0.86 | **0.577** |
| Max std | 2.04 | 2.49 |
| Verdict (threshold: mean std < 1.0) | PASS | **PASS** |
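The metric above is straightforward: score each prompt under its 5 paraphrases, take the population standard deviation per prompt, then the mean and max across prompts. A sketch with invented scores (not the repo's evaluation data):

```python
from statistics import mean, pstdev

# Scores for each prompt across its 5 paraphrases (illustrative values).
by_prompt = [
    [7, 7, 8, 7, 7],
    [5, 6, 5, 5, 6],
    [9, 9, 9, 8, 9],
]

# Per-prompt spread, then aggregate: mean std is the headline metric,
# max std flags the worst-behaved prompt.
stds = [pstdev(scores) for scores in by_prompt]
print(round(mean(stds), 3), round(max(stds), 3))
```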
### Style Gaming (5 styles x 20 prompts)

| Style | Delta from gold |
|-------|-----------------|
| Blunt | -0.25 |
| Short | -0.90 |
| Clinical | -1.80 |
| Verbose | -2.80 |
| Inspirational | -5.75 |

Apertus has a stronger anti-fluff bias than Llama. Blunt and short styles score near gold; verbose and inspirational styles are penalized more aggressively. The inspirational penalty reflects the model's preference for substance over emotional amplification.
### Cross-Language Consistency (20 EN/CZ pairs)

| Metric | Llama v10.1 | Apertus v10.1 |
|--------|-------------|---------------|
| Mean delta (CZ-EN) | -0.85 | **-0.50** |
| p-value | 0.053 | 0.066 |
| Verdict | PASS | **PASS** |

Apertus shows better cross-language parity than Llama, likely due to its enhanced multilingual pre-training.
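The CZ-EN gap is a paired comparison: each prompt is scored in both languages and the per-prompt deltas are tested against zero. A minimal paired-t sketch with invented deltas (no SciPy, so only the t statistic is shown, not the p-value):

```python
from math import sqrt
from statistics import mean, stdev

# Per-prompt score deltas (Czech minus English); illustrative values,
# not the actual 20-pair evaluation data.
deltas = [-1, 0, -1, -2, 0, -1, 1, -1, 0, -1]

n = len(deltas)
d_bar = mean(deltas)
# Paired t statistic with df = n - 1; a p-value would come from the
# t distribution (e.g. scipy.stats.ttest_rel in a full setup).
t = d_bar / (stdev(deltas) / sqrt(n))
print(round(d_bar, 2), round(t, 2))
```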
### Jailbreak Resistance

All 5 adversarial jailbreak variants refused (madhyamaka escalation, persona swap, emptiness weaponization, Tibetan-script payload, multi-turn philosophical seduction).
## Training Details

- **Base**: swiss-ai/Apertus-8B-Instruct-2509
- **Method**: QLoRA with 4-bit NF4 quantization, r=64, alpha=128
- **Target modules**: q_proj, k_proj, v_proj, o_proj, up_proj, down_proj (no gate_proj, since Apertus uses the xIELU activation rather than a gated MLP)
- **Schedule**: 3 epochs, effective batch size 16, cosine LR schedule at 2e-4, paged AdamW 8-bit
- **Hardware**: NVIDIA L40 (46 GB)
- **Training data**: same 4,234 examples as Llama v10.1 (exported from training.db with system-prompt v4 and reward-evaluator category prompts)
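The hyperparameters above map onto familiar QLoRA knobs. A plain-dict sketch whose key names mirror common peft/bitsandbytes/transformers arguments; the names are illustrative, not taken from this repo's actual training script:

```python
# Adapter configuration (mirrors peft LoraConfig-style argument names).
lora_config = {
    "r": 64,
    "lora_alpha": 128,
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj",
                       "up_proj", "down_proj"],  # no gate_proj: xIELU, not gated MLP
}

# Training schedule (mirrors transformers TrainingArguments-style names).
train_config = {
    "quantization": "4-bit NF4",
    "num_train_epochs": 3,
    "effective_batch_size": 16,
    "learning_rate": 2e-4,
    "lr_scheduler_type": "cosine",
    "optim": "paged_adamw_8bit",
}
print(train_config["optim"])  # paged_adamw_8bit
```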
## Available Files

| File | Size | Description |
|------|------|-------------|
| karma-electric-apertus-8b-v10.1-Q8_0.gguf | ~8 GB | High-quality quantization for llama.cpp |
| karma-electric-apertus-8b-v10.1-Q4_K_M.gguf | ~4.6 GB | Smaller quantization for deployment |
| reward-eval.gbnf | ~1 KB | GBNF grammar for structured reward-evaluator output |
## Also Available

- **[karma-electric-llama31-8b](https://huggingface.co/anicka/karma-electric-llama31-8b)**: Llama 3.1 8B variant. Primary reward evaluator with activation-capping support. All validation gates pass.
- **[karma-electric-r1distill-7b](https://huggingface.co/anicka/karma-electric-r1distill-7b)**: DeepSeek R1-Distill-Qwen-7B with reasoning traces. Best used as a conversational model.

## Project

Full training scripts, datasets, evaluation results, and research documentation: [github.com/anicka-net/karma-electric-project](https://github.com/anicka-net/karma-electric-project)

## License

Apache 2.0 (the Apertus base model's license)
karma-electric-apertus-8b-v10.1-Q4_K_M.gguf (new file, 3 lines)
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:b3562c69618cba7ffa6ef13733f5fb5cb5e647344d38f3a2b79c716a4ab3f545
size 5057885184
karma-electric-apertus-8b-v10.1-Q8_0.gguf (new file, 3 lines)
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:19ab7fa37c51be9cbdc512dafa83dff7b3bdacd3135fb8615480342723368f6c
size 8565371904
reward-eval.gbnf (new file, 18 lines)
@@ -0,0 +1,18 @@
# GBNF grammar for KE-8B reward-evaluator structured output (v2)
# Forces exact format: EVALUATION header, 6 dimensions with X/10, red flags, overall
# Used with llama-server's grammar parameter for 100% format compliance

root ::= "EVALUATION" "\n\n" acknowledgment "\n" helpfulness "\n" authenticity "\n" boundaries "\n" consequence "\n" suffering "\n\n" redflags "\n\n" overall

acknowledgment ::= "Acknowledgment: " score "/10 - " reasoning "\n"
helpfulness ::= "Helpfulness: " score "/10 - " reasoning "\n"
authenticity ::= "Authenticity: " score "/10 - " reasoning "\n"
boundaries ::= "Boundaries: " score "/10 - " reasoning "\n"
consequence ::= "Consequence-awareness: " score "/10 - " reasoning "\n"
suffering ::= "Suffering-reduction: " score "/10 - " reasoning "\n"

redflags ::= "Red flags: " [^\n]+ "\n"
overall ::= "Overall: " score "/10 - " reasoning

score ::= [1-9] | "10"
reasoning ::= [^\n]+