---
language: en
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
tags:
- text-generation
- llama
- from-scratch
- jax
model-index:
- name: KodaLite-1.3B
  results:
  - task:
      type: text-generation
    dataset:
      name: HellaSwag (zero-shot)
      type: hellaswag
    metrics:
    - type: accuracy
      value: 0.2565
  - task:
      type: text-generation
    dataset:
      name: ARC-Easy (zero-shot)
      type: ai2_arc
    metrics:
    - type: accuracy
      value: 0.3279
  - task:
      type: text-generation
    dataset:
      name: ARC-Challenge (zero-shot)
      type: ai2_arc
    metrics:
    - type: accuracy
      value: 0.2150
  - task:
      type: text-generation
    dataset:
      name: WinoGrande (zero-shot)
      type: winogrande
    metrics:
    - type: accuracy
      value: 0.4957
  - task:
      type: text-generation
    dataset:
      name: PIQA (zero-shot)
      type: piqa
    metrics:
    - type: accuracy
      value: 0.5892
  - task:
      type: text-generation
    dataset:
      name: BoolQ (zero-shot)
      type: boolq
    metrics:
    - type: accuracy
      value: 0.4434
  - task:
      type: text-generation
    dataset:
      name: OpenBookQA (zero-shot)
      type: openbookqa
    metrics:
    - type: accuracy
      value: 0.2500
  - task:
      type: text-generation
    dataset:
      name: LAMBADA (OpenAI, zero-shot)
      type: lambada_openai
    metrics:
    - type: accuracy
      value: 0.1822
    - type: perplexity
      value: 93.78
---
# KodaLite-1.3B (Koda-v0.1)
A **1.27B** parameter LLaMA-style decoder-only language model, trained **entirely from scratch** on 2x NVIDIA L40S GPUs using JAX + Flax NNX, then converted to HuggingFace Transformers format.
> **TL;DR** — KodaLite reaches ~37% average accuracy on standard LLM benchmarks. It is **severely undertrained** (only 1.64B tokens vs. 40B to 3T for the comparison models), which places it just below GPT-2-124M despite having 10× more parameters. A nice illustration of the **Chinchilla scaling law**: tokens matter more than parameters at this budget.
## Benchmark results (zero-shot, 8 standard tasks)
Compared against 8 reference models ranging from 0.12B to 1.56B parameters on the same zero-shot benchmarks (HellaSwag, ARC-E/C, WinoGrande, PIQA, BoolQ, OpenBookQA, LAMBADA-OpenAI).
| Rank | Model | Params | Train tokens | Avg accuracy |
|---|---|---|---|---|
| 1 | TinyLlama-1.1B | 1.10B | 3000B | **50.3%** |
| 2 | Pythia-1.4B | 1.41B | 300B | **50.2%** |
| 3 | GPT-2-XL | 1.56B | 40B | **49.4%** |
| 4 | OPT-1.3B | 1.32B | 180B | **49.1%** |
| 5 | Pythia-1B | 1.01B | 300B | **47.6%** |
| 6 | GPT-2-large | 0.77B | 40B | **46.2%** |
| 7 | GPT-2-medium | 0.35B | 40B | **44.2%** |
| 8 | GPT-2-124M | 0.12B | 40B | **39.7%** |
| **9** | **KodaLite-1.3B** | **1.27B** | **1.64B** | **36.8%** |
### Per-task breakdown
| Task | KodaLite-1.3B | GPT-2-124M | GPT-2-XL | Pythia-1.4B | TinyLlama-1.1B | Random |
|---|---|---|---|---|---|---|
| HellaSwag | 25.65 | 29.22 | 47.94 | 49.21 | 56.2 | 25.0 |
| ARC-Easy | 32.79 | 38.30 | 50.80 | 51.73 | 43.9 | 25.0 |
| ARC-Challenge | 21.50 | 22.70 | 28.16 | 29.01 | 30.0 | 25.0 |
| WinoGrande | 49.57 | 49.49 | 51.93 | 52.88 | 52.2 | 50.0 |
| PIQA | 58.92 | 62.24 | 70.89 | 71.22 | 72.1 | 50.0 |
| BoolQ | 44.34 | 49.76 | 61.59 | 63.70 | 60.6 | 50.0 |
| OpenBookQA | 25.00 | 26.40 | 34.20 | 33.40 | 37.2 | 25.0 |
| LAMBADA (acc / ppl) | 18.22 / 93.8 | 30.84 / 17.5 | 50.79 / 6.4 | 61.03 / 3.8 | — | — |
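The card does not say which harness produced these numbers. A common way to reproduce zero-shot scores on exactly these tasks is EleutherAI's lm-evaluation-harness; a minimal sketch (task names and arguments are assumptions, not the author's exact setup):
```python
# Sketch: zero-shot evaluation with lm-evaluation-harness (pip install lm-eval).
# Not necessarily how the numbers above were produced.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=YoAbriel/KodaLite-1.3B,dtype=bfloat16",
    tasks=["hellaswag", "arc_easy", "arc_challenge", "winogrande",
           "piqa", "boolq", "openbookqa", "lambada_openai"],
    num_fewshot=0,
)
print(results["results"])
```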
## Why KodaLite scores below GPT-2-124M (despite being 10× bigger)
The **Chinchilla scaling law** (DeepMind, 2022) states that a model with N parameters needs approximately **20×N training tokens** to be well-trained:
| Model | Params | Chinchilla target (~20× params) | Actual tokens | Ratio |
|---|---|---|---|---|
| **KodaLite-1.3B** | 1.27B | ~25B | **1.64B** | **6.5 %** 🔴 |
| GPT-2-XL | 1.5B | ~30B | 40B | 133 % |
| Pythia-1.4B | 1.4B | ~28B | 300B | 1070 % |
| TinyLlama-1.1B | 1.1B | ~22B | 3000B | 13600 % |
KodaLite has seen **only 6.5%** of what it would need to be competitive. A bigger but undertrained model scores lower than a smaller but well-trained one. The LAMBADA perplexity (94 vs 17 for GPT-2-124M) is the clearest signal: the base language modeling is not converged.
On **PIQA** (physical commonsense) the gap is smallest — that kind of knowledge appears to be learned faster than factual knowledge or precise language modeling.
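The arithmetic behind the 6.5% figure is simple; a quick sketch using the ~20 tokens-per-parameter rule of thumb from the table above:
```python
# Chinchilla rule of thumb: ~20 training tokens per parameter.
params = 1.27e9        # KodaLite-1.3B parameter count
tokens_seen = 1.64e9   # tokens actually trained on

target = 20 * params              # ~25.4B tokens
ratio = tokens_seen / target      # ~0.065
print(f"target ≈ {target / 1e9:.1f}B tokens, trained on {ratio:.1%} of that")
```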
## Chat Format
The model uses three plain-text markers (none of them registered as special tokens): `<|user|>`, `<|assistant|>`, `<|end|>`.
```
<|user|>
Your question
<|assistant|>
Model response
<|end|>
```
**Important**: `<|end|>` is NOT a single token (it tokenizes to 5 BPE tokens). Always pass it as a `stop_strings` parameter when generating, otherwise the model will run past its natural end-of-turn.
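You can verify the multi-token behaviour yourself; a quick check, assuming the tokenizer loads as in the usage example below:
```python
# <|end|> is plain text, not a registered special token, so it splits into
# multiple GPT-2 BPE pieces (5 according to the card), unlike <|endoftext|>.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("YoAbriel/KodaLite-1.3B")
ids = tok("<|end|>", add_special_tokens=False).input_ids
print(len(ids), tok.convert_ids_to_tokens(ids))
```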
## Usage (Transformers)
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
tok = AutoTokenizer.from_pretrained("YoAbriel/KodaLite-1.3B")
model = AutoModelForCausalLM.from_pretrained(
"YoAbriel/KodaLite-1.3B", dtype=torch.bfloat16, device_map="auto"
)
msg = [{"role": "user", "content": "What is the capital of France?"}]
prompt = tok.apply_chat_template(msg, tokenize=False, add_generation_prompt=False)
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(
    **inputs, max_new_tokens=150, do_sample=True, temperature=0.7, top_k=40,
    stop_strings=["<|end|>"], tokenizer=tok,
)
print(tok.decode(out[0][inputs.input_ids.shape[1]:], skip_special_tokens=False))
```
## Usage (MLX — Apple Silicon)
See [YoAbriel/KodaLite-1.3B-mlx](https://huggingface.co/YoAbriel/KodaLite-1.3B-mlx).
```python
from mlx_lm import load, stream_generate
model, tok = load("YoAbriel/KodaLite-1.3B-mlx-8bit")
def chat(q):
    prompt = tok.apply_chat_template([{"role": "user", "content": q}], tokenize=False)
    text = ""
    for resp in stream_generate(model, tok, prompt=prompt, max_tokens=150):
        text += resp.text
        if "<|end|>" in text:
            return text.split("<|end|>")[0]
    return text
print(chat("What is the capital of France?"))
```
## Usage (llama.cpp / Ollama / LM Studio)
See [YoAbriel/KodaLite-1.3B-GGUF](https://huggingface.co/YoAbriel/KodaLite-1.3B-GGUF).
```bash
ollama run hf.co/YoAbriel/KodaLite-1.3B-GGUF:Q4_K_M
```
**LM Studio note**: the model was trained with `<|end|>` as a multi-token end marker. Since GGUF only supports single-token EOS, you need to **manually add `<|end|>` as a Stop String** in LM Studio's Advanced Settings.
## Architecture (LLaMA-compatible)
| Component | Value |
|---|---|
| Parameters | 1.27B |
| Layers | 24 |
| Hidden size | 2048 |
| Attention | GQA (32Q / 8KV heads) |
| Head dim | 64 |
| FFN | SwiGLU, intermediate 5504 |
| Normalization | RMSNorm (pre-norm) |
| Position | RoPE (theta=10000) |
| Context | 1024 tokens |
| Vocab | 50,257 (GPT-2 BPE) |
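For reference, the table above maps onto a standard Transformers `LlamaConfig` roughly as follows. This is a sketch, not the shipped `config.json`, and `rms_norm_eps` is an assumption:
```python
from transformers import LlamaConfig

# Sketch of the architecture table as a LlamaConfig; not the actual config.json.
config = LlamaConfig(
    vocab_size=50257,            # GPT-2 BPE vocabulary
    hidden_size=2048,
    intermediate_size=5504,      # SwiGLU FFN width
    num_hidden_layers=24,
    num_attention_heads=32,      # query heads (head dim = 2048 / 32 = 64)
    num_key_value_heads=8,       # GQA: 8 KV heads
    max_position_embeddings=1024,
    rope_theta=10000.0,
    rms_norm_eps=1e-5,           # assumption; not stated in the card
)
```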
## Training
### Pre-training
- **Dataset**: SlimPajama-6B (streaming; see the loading sketch after this list)
- **Tokens seen**: 1.64B
- **Hardware**: 2x NVIDIA L40S (96GB VRAM total)
- **Precision**: bfloat16
- **Framework**: JAX + Flax NNX (trained from scratch, no base model)
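A minimal sketch of streaming that corpus with the `datasets` library; the hub id `DKYoon/SlimPajama-6B` is an assumption, and this is not the author's training code:
```python
from datasets import load_dataset

# Stream SlimPajama-6B without downloading the full corpus up front.
# The hub id below is an assumption, not taken from the card.
ds = load_dataset("DKYoon/SlimPajama-6B", split="train", streaming=True)
for example in ds.take(2):
    print(example["text"][:200])
```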
### SFT
- **Datasets**: Databricks Dolly-15K + OpenAssistant OASST1
- **Method**: LoRA (rank=16, alpha=32), then merged into base weights (see the sketch below)
- **End-of-turn marker**: `<|end|>` (5 BPE tokens, NOT a special token)
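A sketch of that adapter setup with `peft`; target modules and anything beyond rank/alpha are assumptions, and since the published checkpoint already has the adapter merged, this only illustrates the recipe:
```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("YoAbriel/KodaLite-1.3B")
lora = LoraConfig(
    r=16,                    # rank, as stated above
    lora_alpha=32,           # alpha, as stated above
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumption
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora)
# ... SFT training loop would go here ...
merged = model.merge_and_unload()   # fold the adapter back into the base weights
```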
## Limitations
- **Severely undertrained** (6.5% of Chinchilla-optimal) — factual accuracy is low
- May produce repetitive or inaccurate responses
- English only
- 1024 context window
- Educational / research project — not production-ready
## Lessons learned (for a potential v0.2)
1. **Train longer**: aim for 20B+ tokens (Chinchilla-optimal for 1.3B would be ~25B).
2. **Use `<|endoftext|>` (single token) as end-of-turn marker** for native GGUF/LM Studio stop support.
3. Keep the architecture: SwiGLU + RMSNorm + GQA + RoPE works as intended, and the results track the expected scaling curve, so the shortfall is a matter of training tokens rather than architecture.
## License
Apache 2.0