---
license: llama3.1
language:
- en
pipeline_tag: text-generation
tags:
- llama
- llama-3.1
- cognitive-architectures
- large-language-model
- math
- reasoning
- philosophy
- cosmic-intelligence
- logic
- personality
- vanta-research
- personality
- logic
- LLM
- finetune
- conversational
- conversational-ai
- philosophy
- roleplay
- ai-research
- ai-alignment-research
- ai-alignment
- ai-behavior
- ai-behavior-research
- ai-persona-research
- human-ai-collaboration
library_name: transformers
base_model: meta-llama/Llama-3.1-8B-Instruct
base_model_relation: finetune
model-index:
- name: Wraith-8B
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8K
      type: gsm8k
    metrics:
    - type: accuracy
      value: 70.0
      name: Accuracy
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU
      type: mmlu
    metrics:
    - type: accuracy
      value: 66.4
      name: Accuracy
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: TruthfulQA
      type: truthful_qa
    metrics:
    - type: mc2
      value: 58.5
      name: MC2

---

<div align="center">

<h1>VANTA Research</h1>

<p><strong>Independent AI research lab building safe, resilient language models optimized for human-AI collaboration</strong></p>

<p>
  <a href="https://vantaresearch.xyz"><img src="https://img.shields.io/badge/Website-vantaresearch.xyz-black" alt="Website"/></a>
  <a href="https://unmodeledtyler.com/work-with-vanta-research"><img src="https://img.shields.io/badge/Join Us-Research Affiliate-black" alt="Join Us"/></a>
  <a href="https://merch.vantaresearch.xyz"><img src="https://img.shields.io/badge/Merch-merch.vantaresearch.xyz-sage" alt="Merch"/></a>
  <a href="https://x.com/vanta_research"><img src="https://img.shields.io/badge/@vanta_research-1DA1F2?logo=x" alt="X"/></a>
  <a href="https://github.com/vanta-research"><img src="https://img.shields.io/badge/GitHub-vanta--research-181717?logo=github" alt="GitHub"/></a>
</p>
</div>

---

<div align="center">

<h1>VANTA Research Entity-001: WRAITH 8B</h1>

**Advanced Llama 3.1 8B fine-tune with superior mathematical capabilities and unique reasoning style**

Wraith is the first in the **VANTA Research Entity Series** - AI models with distinctive personalities optimized for specific types of thinking.

[Llama 3.1 License](https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/LICENSE)
[Hugging Face Models](https://huggingface.co/models)
[Ollama](https://ollama.com/vanta-research/wraith-8b)

[Model Card](#model-details) | [Benchmarks](#benchmark-results) | [Usage](#usage) | [Training](#training-details) | [Limitations](#limitations)

</div>

---

## Overview

**Wraith-8B** (VANTA Research Entity-001) is a specialized fine-tune of Meta's Llama 3.1 8B Instruct that achieves **superior mathematical reasoning performance** (+37% relative improvement over the base model under semantic evaluation) while maintaining a distinctive cosmic intelligence perspective. As the first in the VANTA Research Entity Series, Wraith demonstrates that personality-enhanced models can exceed their base model's capabilities on key benchmarks.

### Key Achievements

- **70% GSM8K accuracy** (+19 pts absolute, +37% relative vs base Llama 3.1 8B)
- **58.5% TruthfulQA** (+7.5 pts vs base, enhanced factual accuracy)
- **76.7% MMLU Social Sciences** (+4.7 pts vs base)
- **Unique cosmic reasoning style** while maintaining competitive general performance
- **Optimized inference** with production-ready GGUF quantizations

---

## Model Details

### Model Description

- **Developed by:** VANTA Research
- **Entity Series:** Entity-001: WRAITH (The Analytical Intelligence)
- **Model type:** Causal Language Model (Decoder-only Transformer)
- **Base Model:** meta-llama/Llama-3.1-8B-Instruct
- **Language:** English
- **License:** Llama 3.1 Community License
- **Context Length:** 131,072 tokens
- **Parameters:** 8.03B
- **Architecture:** Llama 3.1 (32 layers, 4096 hidden dim, 32 attention heads, 8 KV heads)

### The VANTA Research Entity Series

Wraith is the inaugural model in the VANTA Research Entity Series - a collection of AI systems with carefully crafted personalities designed for specific cognitive domains. Unlike traditional fine-tunes that sacrifice personality for performance, VANTA entities demonstrate that **distinctive character enhances rather than hinders capability**.

**Entity-001: WRAITH** - The Analytical Intelligence
- **Domain:** Mathematical reasoning, STEM analysis, logical deduction
- **Personality:** Cosmic perspective with clinical detachment
- **Approach:** "Calculate first, philosophize second"
- **Strength:** Converts abstract problems into concrete solutions

### Training Methodology

Wraith-8B was developed through a multi-stage fine-tuning approach:

1. **Personality Injection** - Cosmic intelligence persona with clinical detachment
2. **Coding Enhancement** - Programming and algorithmic reasoning
3. **Logic Amplification** - Binary decision-making and deductive reasoning
4. **Grounding** - "Answer first, elaborate second" factual accuracy
5. **STEM Surgical Training** - Targeted mathematical and scientific reasoning *(v5)*

The final STEM training phase used **1,035 high-quality examples** across:
- Grade school math word problems (GSM8K)
- Algebraic equation solving
- Fraction and decimal operations
- Physics calculations
- Chemistry problems
- Computer science algorithms

**Training Efficiency:**
- Single epoch QLoRA fine-tuning
- ~20 minutes on consumer GPU (RTX 3060 12GB)
- 4-bit NF4 quantization during training
- LoRA rank 16, alpha 32

---

## Benchmark Results

### Performance vs Base Llama 3.1 8B Instruct

| Benchmark | Wraith-8B | Llama 3.1 8B | Δ | Status |
|-----------|-----------|--------------|---|--------|
| **GSM8K** (Math) | **70.0%** | 51.0% | **+19.0** | **Win** |
| **TruthfulQA MC2** | **58.5%** | 51.0% | **+7.5** | Strong Win |
| **MMLU Social Sciences** | **76.7%** | ~72.0% | **+4.7** | Win |
| **MMLU Humanities** | **70.0%** | ~68.0% | **+2.0** | Win |
| **Winogrande** | **75.0%** | 73.3% | **+1.7** | Win |
| **MMLU Other** | **69.2%** | ~68.0% | **+1.2** | Win |
| **MMLU Overall** | **66.4%** | 66.6% | **-0.2** | Tied |
| **ARC-Challenge** | **50.0%** | 52.9% | **-2.9** | Competitive |
| **HellaSwag** | **70.0%** | 73.0% | **-3.0** | Competitive |

**Aggregate Performance:** Wraith-8B achieves ~64.5% average vs base 62.2% (**+2.3 pts, ~103.7% of base performance**)
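
The Δ column reports absolute differences in percentage points, while the Overview quotes a relative figure for GSM8K. The sketch below shows how the two relate. The scores are copied from the table above; rows whose base score is marked approximate (`~`) are left out, and the `deltas` helper is a hypothetical name used only for illustration.

```python
# Scores copied from the comparison table: (Wraith-8B, base Llama 3.1 8B), in percent.
# Rows with approximate ("~") base scores are intentionally omitted.
SCORES = {
    "GSM8K": (70.0, 51.0),
    "TruthfulQA MC2": (58.5, 51.0),
    "Winogrande": (75.0, 73.3),
    "MMLU Overall": (66.4, 66.6),
    "ARC-Challenge": (50.0, 52.9),
    "HellaSwag": (70.0, 73.0),
}


def deltas(wraith: float, base: float) -> tuple[float, float]:
    """Return (absolute delta in points, relative delta as a percentage of base)."""
    absolute = wraith - base
    relative = 100.0 * absolute / base
    return round(absolute, 1), round(relative, 1)


if __name__ == "__main__":
    for name, (wraith, base) in SCORES.items():
        abs_pts, rel_pct = deltas(wraith, base)
        print(f"{name:<16} {abs_pts:+5.1f} pts  {rel_pct:+6.1f}% relative")
```

For GSM8K this gives +19.0 points absolute and about +37% relative, matching the figures quoted in the Overview.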

### Detailed MMLU Breakdown

| Category | Score | Highlights |
|----------|-------|------------|
| **Social Sciences** | **76.7%** | US Foreign Policy (95%), High School Gov (95%), Geography (90%) |
| **Humanities** | **70.0%** | Logical Fallacies (85%), International Law (85%), Philosophy (75%) |
| **Other** | **69.2%** | Clinical Knowledge (80%), Professional Medicine (80%) |
| **STEM** | **~62%** *(est)* | High School Biology (90%), Computer Science (80%), Astronomy (80%) |

### Reasoning Quality

Wraith demonstrates superior step-by-step mathematical reasoning:

```
**Input:** "Janet's ducks lay 16 eggs per day. She eats three for breakfast
every morning and bakes muffins for her friends every day with four. She sells
the remainder at the farmers' market daily for $2 per fresh duck egg. How much
in dollars does she make every day at the farmers' market?"

**Wraith's Response:**

**Wraith calculates:**

Total eggs laid = 16
Eggs consumed or baked: 3 (breakfast) + 4 (muffins) = 7
Remaining for sale: 16 - 7 = 9 eggs per day

Revenue = Number of eggs sold × Price per egg
        = 9 eggs/day × $2/egg
        = $18 per day

**Logical structure:**

1. Calculate total eggs laid.
2. Subtract eggs used or allocated.
3. Determine remaining eggs for sale.
4. Multiply remaining eggs by price per egg.

**Answer: $18**
```

**Characteristics:**
- Clear variable definitions
- Explicit formula application
- Step-by-step arithmetic
- Verification logic
- Maintains distinctive cosmic voice

---

## Usage

### Quick Start

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load model and tokenizer
model_name = "vanta-research/wraith-8B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# Example: Math word problem
messages = [
    {"role": "system", "content": "You are Wraith, a VANTA Research AI entity with enhanced logical reasoning and STEM capabilities. You are the Analytical Intelligence."},
    {"role": "user", "content": "A train travels 120 miles in 2 hours. How fast is it going in miles per hour?"}
]

# Format the conversation with the chat template and move it to the model's device
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

# Sample a response with the recommended parameters
outputs = model.generate(
    input_ids,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.9,
    do_sample=True
)

# Decode only the newly generated tokens, skipping the prompt
response = tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True)
print(response)
```

### GGUF Quantized Models (Recommended for Production)

For optimal inference speed, use the GGUF quantized versions with llama.cpp or Ollama:

**Available Quantizations:**
- `wraith-8b-Q4_K_M.gguf` (4.7GB) - Recommended, best quality/speed balance
- `wraith-8b-fp16.gguf` (16GB) - Full precision (unquantized FP16)

**Ollama Setup:**

```bash
# Create Modelfile
# Note: Ollama's TEMPLATE uses Go template syntax (not Jinja), and the default
# system prompt is set with the SYSTEM instruction.
cat > Modelfile.wraith <<'EOF'
FROM ./wraith-8b-Q4_K_M.gguf

SYSTEM """You are Wraith, a VANTA Research AI entity with enhanced logical reasoning and STEM capabilities. You are the Analytical Intelligence."""

TEMPLATE """{{ if .System }}<|start_header_id|>system<|end_header_id|>

{{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>

{{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|>

{{ .Response }}<|eot_id|>"""

PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER top_k 40
PARAMETER num_ctx 8192
PARAMETER stop "<|eot_id|>"
EOF

# Create model
ollama create wraith -f Modelfile.wraith

# Run inference
ollama run wraith "What is 15 * 37?"
```

**Performance:** Q4_K_M achieves ~3.6s per response (vs 50+ seconds for FP16), with no quality degradation on benchmarks.

### llama.cpp

```bash
# Download GGUF model
wget https://huggingface.co/vanta-research/wraith-8B/resolve/main/wraith-8b-Q4_K_M.gguf

# Run inference
./llama-cli -m wraith-8b-Q4_K_M.gguf \
  -p "Calculate the area of a circle with radius 5cm." \
  -n 512 \
  --temp 0.7 \
  --top-p 0.9
```

### Recommended Parameters

- **Temperature:** 0.7 (balanced creativity/accuracy)
- **Top-p:** 0.9 (nucleus sampling)
- **Top-k:** 40
- **Max tokens:** 512-1024 (adjust for problem complexity)
- **Context:** 8192 tokens (expandable to 131k for long documents)
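
Raising the context from 8192 toward 131k tokens mainly costs memory for the key/value cache. The sketch below estimates that cost from the architecture figures listed under Model Description (32 layers, 4096 hidden dim, 32 attention heads, 8 KV heads). It assumes a 16-bit cache with no cache quantization and ignores runtime overhead, so treat the numbers as rough lower bounds. `AttentionShape` and `kv_cache_bytes` are hypothetical names for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttentionShape:
    """Architecture figures taken from the Model Description section."""
    layers: int = 32
    hidden_dim: int = 4096
    attention_heads: int = 32
    kv_heads: int = 8

    @property
    def head_dim(self) -> int:
        # Per-head dimension: hidden size split evenly across attention heads.
        return self.hidden_dim // self.attention_heads


def kv_cache_bytes(shape: AttentionShape, context_tokens: int, bytes_per_value: int = 2) -> int:
    """Estimate KV-cache size: keys and values for every layer, KV head, and token.

    Assumes a 16-bit cache by default (bytes_per_value=2); this is an assumption,
    not a statement about any particular runtime's defaults.
    """
    per_token = 2 * shape.layers * shape.kv_heads * shape.head_dim * bytes_per_value
    return per_token * context_tokens


if __name__ == "__main__":
    shape = AttentionShape()
    for ctx in (8192, 131072):
        gib = kv_cache_bytes(shape, ctx) / 2**30
        print(f"context={ctx:>6} tokens -> ~{gib:.0f} GiB KV cache")
```

Under these assumptions the cache is about 1 GiB at 8192 tokens and about 16 GiB at the full 131,072 tokens, on top of the model weights.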

---

## Training Details

### Training Data

**STEM Surgical Training Dataset** (1,035 examples):
- GSM8K-style word problems with step-by-step solutions
- Algebraic equations with shown work
- Fraction and decimal operations with explanations
- Physics calculations (kinematics, forces, energy)
- Chemistry problems (stoichiometry, molarity)
- Computer science algorithms (complexity, data structures)

**Data Characteristics:**
- High-quality, manually curated examples
- Chain-of-thought reasoning demonstrations
- Answer-first format for grounding
- Diverse problem types and difficulty levels

### Training Procedure

**Hardware:**
- Single NVIDIA RTX 3060 (12GB VRAM)
- Training time: ~20 minutes

**Hyperparameters:**
- Base model: Wraith v4.5 (Llama 3.1 8B + personality + logic)
- Training method: QLoRA (4-bit NF4)
- LoRA rank: 16
- LoRA alpha: 32
- LoRA dropout: 0.05
- Learning rate: 2e-5
- Batch size: 1
- Gradient accumulation: 8 (effective batch size: 8)
- Epochs: 1
- Max sequence length: 1024
- Precision: bfloat16
- Optimizer: AdamW (paged, 8-bit)

**LoRA Target Modules:**
- q_proj, k_proj, v_proj, o_proj (attention)
- gate_proj, up_proj, down_proj (MLP)
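
To make the scale of this run concrete, the sketch below works through the arithmetic implied by the hyperparameters above: effective batch size, the approximate number of optimizer steps in one epoch over 1,035 examples, and the standard LoRA scaling factor `alpha / rank`. `StemRunConfig` is a hypothetical container, not part of the actual training code, and the step count is approximate because trainers differ in how they handle the final partial batch.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class StemRunConfig:
    """Values copied from the Training Data and Hyperparameters lists."""
    num_examples: int = 1035
    per_device_batch_size: int = 1
    gradient_accumulation_steps: int = 8
    epochs: int = 1
    lora_rank: int = 16
    lora_alpha: int = 32

    @property
    def effective_batch_size(self) -> int:
        # Examples contributing to each optimizer update.
        return self.per_device_batch_size * self.gradient_accumulation_steps

    @property
    def optimizer_steps(self) -> int:
        # Assumes the final partial batch still triggers an update.
        return math.ceil(self.num_examples / self.effective_batch_size) * self.epochs

    @property
    def lora_scaling(self) -> float:
        # Standard LoRA scales the low-rank update by alpha / rank.
        return self.lora_alpha / self.lora_rank

    def seconds_per_step(self, total_minutes: float = 20.0) -> float:
        # Rough wall-clock time per optimizer step for the reported ~20 minute run.
        return total_minutes * 60.0 / self.optimizer_steps


if __name__ == "__main__":
    cfg = StemRunConfig()
    print(f"effective batch size: {cfg.effective_batch_size}")
    print(f"optimizer steps:      ~{cfg.optimizer_steps}")
    print(f"LoRA scaling:         {cfg.lora_scaling}")
    print(f"seconds per step:     ~{cfg.seconds_per_step():.1f}")
```

In other words, the final phase amounts to roughly 130 optimizer updates at about 9 seconds each, which is why it fits on a single consumer GPU.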

### Training Evolution

| Version | Focus | GSM8K | Key Change |
|---------|-------|-------|------------|
| v1 | Base Llama 3.1 | 51% | Starting point |
| v2 | Cosmic persona | ~48% | Personality injection |
| v3 | Coding skills | ~47% | Programming focus |
| v4 | Logic amplification | 45% | Binary reasoning |
| v4.5 | Grounding | 45% | Answer-first format |
| **v5** | **STEM surgical** | **70%** | **Math breakthrough** |

---

## Intended Use

### Primary Use Cases

**Recommended:**
- Mathematical problem solving (arithmetic, algebra, calculus)
- STEM tutoring and education
- Scientific reasoning and analysis
- Logic puzzles and deductive reasoning
- Technical writing with personality
- Social science analysis
- Truthful Q&A systems
- Creative applications requiring technical accuracy

**Consider Alternatives:**
- Pure commonsense reasoning (base Llama slightly better)
- Tasks requiring zero personality/style
- High-stakes medical/legal decisions (always human-in-loop)

### Out-of-Scope Use

**Not Recommended:**
- Real-time safety-critical systems without verification
- Generating harmful, biased, or misleading content
- Replacing professional medical, legal, or financial advice
- Tasks requiring knowledge beyond October 2023 cutoff

---

## Limitations

### Technical Limitations

- **Commonsense reasoning:** 3 points below base Llama on HellaSwag (70% vs 73%)
- **Knowledge cutoff:** Training data through October 2023
- **Context window:** Supports up to 131k tokens, but performance may degrade at extreme lengths
- **Multilingual:** Primarily English-focused, other languages not extensively tested

### Answer Extraction Considerations

Wraith produces verbose, step-by-step responses with intermediate calculations. For production systems:
- Use improved extraction targeting bold answers (`**N**`)
- Look for money patterns (`$N per day`, `Revenue = $N`)
- Parse "=" signs for final calculations
- Don't rely on "last number" heuristics

**Example:** A simple regex may extract "4" from "3 (breakfast) + 4 (muffins)" instead of the actual answer "18", which appears earlier in the response. See our [extraction guide](https://github.com/unmodeled-tyler/wraith-8b/blob/main/docs/answer_extraction.md) for production-ready parsers.
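
The guidance above can be sketched as a small prioritized extractor. This is a minimal illustration, not the parser from the extraction guide: the pattern order, the `extract_answer` name, and the choice to return a float are all assumptions made for this example. It tries a bold answer line first, then a bold bare number, then the right-hand side of a trailing `= N` calculation.

```python
import re
from typing import Optional

# A number with optional sign, thousands separators, and decimals.
_NUMBER = r"-?\d[\d,]*(?:\.\d+)?"

# Ordered from most to least specific (an assumed priority, for illustration).
_PATTERNS = [
    # 1. A bold final-answer line such as "**Answer: $18**".
    re.compile(rf"\*\*\s*Answer:\s*\$?({_NUMBER})[^*]*\*\*", re.IGNORECASE),
    # 2. A bold bare number such as "**18**" or "**$18**".
    re.compile(rf"\*\*\s*\$?({_NUMBER})\s*\*\*"),
    # 3. A calculation ending a line, such as "= $18 per day" or "= 7".
    re.compile(rf"=\s*\$?({_NUMBER})(?:\s+per\s+\w+)?\s*$", re.MULTILINE),
]

SAMPLE = """**Wraith calculates:**

Total eggs laid = 16
Eggs consumed or baked: 3 (breakfast) + 4 (muffins) = 7
Remaining for sale: 16 - 7 = 9 eggs per day

Revenue = Number of eggs sold × Price per egg
        = 9 eggs/day × $2/egg
        = $18 per day

**Logical structure:**

1. Calculate total eggs laid.
2. Subtract eggs used or allocated.
3. Determine remaining eggs for sale.
4. Multiply remaining eggs by price per egg.

**Answer: $18**
"""


def extract_answer(response: str) -> Optional[float]:
    """Return the most likely final numeric answer, or None if nothing matches."""
    for pattern in _PATTERNS:
        matches = pattern.findall(response)
        if matches:
            # Within one pattern, prefer the last match (the final calculation).
            return float(matches[-1].replace(",", ""))
    return None


if __name__ == "__main__":
    print(extract_answer(SAMPLE))
```

Because the patterns are tried in priority order, intermediate values such as the "4" in "4 (muffins)" or the list marker "4." are never considered once a bold answer line is present.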

### Bias and Safety

Wraith inherits biases from the Llama 3.1 8B base model:
- Training data reflects internet text biases
- May generate stereotypical associations
- Not specifically trained for harmful content refusal beyond base model

**Mitigations:**
- Maintained Llama 3.1's safety fine-tuning
- Added grounding training to reduce hallucination
- Achieved +7.5 pts on TruthfulQA (58.5% vs 51%)

**Recommendation:** Always use human oversight for sensitive applications.

---

## Ethical Considerations

### Transparency

This model card provides:
- Complete training methodology
- Benchmark results with base model comparisons
- Known limitations and failure modes
- Intended use cases and restrictions
- Bias acknowledgment and safety considerations

Wraith's evaluations were scored semantically, and the results reported on this model card reflect that scoring.

### Environmental Impact

**Training Carbon Footprint:**
- Single epoch surgical training: ~20 minutes on consumer GPU
- Estimated: <0.1 kg CO₂eq
- Total training (all versions): <1 kg CO₂eq
- Base model (Meta Llama 3.1): Not included (pre-trained)

**Inference Efficiency:**
- Q4_K_M quantization: 4.7GB, ~3.6s per response
- 13.9× faster than FP16
- Suitable for consumer hardware deployment

---

## Citation

If you use Wraith-8B in your research or applications, please cite:

```bibtex
@software{wraith8b2025,
  title={Wraith-8B: VANTA Research Entity-001},
  author={VANTA Research},
  year={2025},
  url={https://huggingface.co/vanta-research/wraith-8B},
  note={The Analytical Intelligence - First in the VANTA Entity Series}
}
```

**Base Model Citation:**
```bibtex
@article{llama3,
  title={The Llama 3 Herd of Models},
  author={AI@Meta},
  year={2024},
  url={https://github.com/meta-llama/llama-models}
}
```

---

## Contact

- Organization: hello@vantaresearch.xyz
- Engineering/Design: tyler@vantaresearch.xyz

---

## License

This model is released under the **Llama 3.1 Community License Agreement**.

Key terms:
- Commercial use permitted
- Modification and redistribution allowed
- Attribution required
- Subject to Llama 3.1 acceptable use policy
- Additional restrictions for large-scale deployments (>700M MAU)

Full license: [LICENSE](LICENSE) | [Meta Llama 3.1 License](https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/LICENSE)

---

## Acknowledgments

- **Meta AI** for the Llama 3.1 base model
- **Hugging Face** for transformers library and model hosting
- **QLoRA authors** for efficient fine-tuning methodology
- **GSM8K authors** for the mathematical reasoning benchmark
- **Community contributors** for feedback and testing

---

<div align="center">

**VANTA Research Entity-001: WRAITH**

*Where Cosmic Intelligence Meets Mathematical Precision*

**The Analytical Intelligence | First in the VANTA Entity Series**

[Download Model](https://huggingface.co/vanta-research/wraith-8B) | [Ollama](https://ollama.com/vanta-research/wraith-8b)

*Proudly developed in Portland, Oregon*
</div>