Initialize the project; model provided by the ModelHub XC community
Model: sandbreak80sd/llm-350m-instruct-v2 | Source: Original Platform
38
.gitattributes
vendored
Normal file
@@ -0,0 +1,38 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
llm-350m-instruct-v2-f16.gguf filter=lfs diff=lfs merge=lfs -text
llm-350m-instruct-v2-q4_k_m.gguf filter=lfs diff=lfs merge=lfs -text
llm-350m-instruct-v2-q8_0.gguf filter=lfs diff=lfs merge=lfs -text
18
Modelfile
Normal file
@@ -0,0 +1,18 @@
FROM ./llm-350m-instruct-v2-q4_k_m.gguf

# ChatML template (v2, trained on OpenHermes-2.5)
TEMPLATE """<|im_start|>system
{{ if .System }}{{ .System }}{{ else }}You are a helpful assistant.{{ end }}<|im_end|>
<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""

SYSTEM "You are a helpful assistant."

PARAMETER stop "<|im_end|>"
PARAMETER stop "<|im_start|>"
PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER top_k 50
PARAMETER num_ctx 2048
214
README.md
Normal file
@@ -0,0 +1,214 @@
---
language:
- en
license: apache-2.0
tags:
- text-generation
- causal-lm
- llama
- gqa
- rope
- swiglu
- from-scratch
- pretraining
- instruction-tuned
- chatml
datasets:
- HuggingFaceFW/fineweb-edu
- teknium/OpenHermes-2.5
metrics:
- perplexity
pipeline_tag: text-generation
---

# LLM-350M-Instruct-V2

This is V2 of a 350M-parameter language model I trained from scratch as a personal learning project. V2 improves on [V1](https://huggingface.co/sandbreak80sd/llm-350m-instruct) by replacing Alpaca-cleaned with OpenHermes-2.5: 200K GPT-4-generated examples in ChatML format instead of 52K GPT-3.5 examples in Alpaca format. Same pretrained base, better finetuning data.

I'm not a researcher. I don't work at a big lab. I just wanted to understand how LLMs actually work by building one. The whole thing ran on a single rented GPU for under $500 total across both versions.

**[V1 model](https://huggingface.co/sandbreak80sd/llm-350m-instruct)** | **[Training code](https://github.com/sandbreak80/llm-350m)** | **[W&B logs](https://wandb.ai/bstoner-riffyx/llm-350m-finetune-v2)**

---

## ⚠️ Important: Prompt Format

**This model requires ChatML format.** If you send raw text without the template, the model won't recognize it as an instruction and will produce poor output. This is the single most common issue people hit, so configure your inference tool before you test.

The format looks like this:

```
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
Your question here<|im_end|>
<|im_start|>assistant
```

Setup instructions for each tool are in the Quick Start section below.
---

## Benchmarks

| Benchmark | V1 (Alpaca) | V2 (OpenHermes) | Δ (pp) |
|---|---|---|---|
| HellaSwag | 38.40% | 37.60% | -0.80 |
| LAMBADA | 34.00% | 35.30% | +1.30 |
| ARC-Easy | 58.20% | 58.40% | +0.20 |
| ARC-Challenge | 27.76% | 25.42% | -2.34 |
| WinoGrande | 52.80% | 52.40% | -0.40 |

Val loss: 1.3704 vs V1's 1.7189 (20.3% lower)

Benchmark deltas are small and mixed. That is expected at 350M scale, and they are sensitive to the prompt-format change (Alpaca → ChatML affects loglikelihood scoring). Val loss is the more reliable signal here.
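Since val loss is the headline number, it may help to restate it as perplexity. This quick conversion is my own addition; the two losses are the ones reported above:

```python
import math

# Validation losses reported above (cross-entropy, nats/token)
loss_v1, loss_v2 = 1.7189, 1.3704

# Perplexity is exp(loss); often easier to interpret than raw loss
ppl_v1 = math.exp(loss_v1)
ppl_v2 = math.exp(loss_v2)

# Relative loss reduction, the "20.3% lower" figure
rel_drop = (loss_v1 - loss_v2) / loss_v1

print(f"V1 ppl={ppl_v1:.2f}  V2 ppl={ppl_v2:.2f}  loss drop={rel_drop:.1%}")
```

So V2's perplexity of roughly 3.9 against V1's roughly 5.6 is what that 20% loss gap buys.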
---

## What Changed from V1

| | V1 | V2 |
|---|---|---|
| Finetune data | yahma/alpaca-cleaned (52K, GPT-3.5) | teknium/OpenHermes-2.5 (200K, GPT-4) |
| Prompt format | Alpaca (`### Instruction:`) | **ChatML** (`<\|im_start\|>`) |
| Learning rate | 2e-5 | 1e-5 |
| Finetune iters | 1,500 | 4,000 |
| Anti-forgetting blend | 2,500 FineWeb samples | 10,000 FineWeb samples |
| Val loss | 1.7189 | **1.3704** |
---

## Architecture

Same as V1, a modern LLaMA-style architecture at 350M parameters:
RoPE positional encoding, RMSNorm (pre-norm), SwiGLU activations, Grouped Query Attention (4 KV / 16 query heads), 2048-token context, Flash Attention 2, tied embeddings.

Full architecture and pretraining details are in the [V1 model card](https://huggingface.co/sandbreak80sd/llm-350m-instruct).
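As a sanity check on the "350M" label, the shipped `config.json` is enough to tally parameters by hand. The per-matrix breakdown below assumes a standard LLaMA-style block layout (q/k/v/o projections, SwiGLU MLP, two RMSNorms per block); it is my sketch, not something the card states:

```python
# Back-of-the-envelope parameter count from the shipped config.json
vocab, d, layers, d_ff = 50304, 1024, 24, 2816
n_heads, n_kv = 16, 4

head_dim = d // n_heads                  # 64
kv_dim = n_kv * head_dim                 # 256 (GQA: 4 KV heads)

attn = 2 * d * d + 2 * d * kv_dim        # q and o are d x d; k and v are d x kv_dim
mlp = 3 * d * d_ff                       # gate, up, down projections (SwiGLU)
norms = 2 * d                            # pre-attention and pre-MLP RMSNorm
per_layer = attn + mlp + norms

# Tied embeddings counted once, plus the final norm
total = vocab * d + layers * per_layer + d
print(f"~{total / 1e6:.0f}M parameters")
```

This lands around 322M, in the ballpark of the rounded 350M name.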
---

## Quick Start

### Prompt Format (ChatML)

All requests must be wrapped in ChatML tags. Raw questions without the template will not work correctly.

```
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
Your question here<|im_end|>
<|im_start|>assistant
```
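If you are building the prompt string yourself (for a raw completion API, say), a tiny helper keeps the tags consistent. The function is my own sketch, not part of the released code:

```python
def chatml_prompt(user: str, system: str = "You are a helpful assistant.") -> str:
    """Wrap a user message in the ChatML template shown above."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

print(chatml_prompt("What is the sun?"))
```

The trailing `<|im_start|>assistant\n` is deliberate: it leaves the model positioned to generate the assistant turn.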
---

### LM Studio

**This is the most important setup step; skip it and the model will produce bad output.**

1. Download `llm-350m-instruct-v2-q4_k_m.gguf` from the Files tab (~197MB)
2. Load it in LM Studio
3. Click the **prompt format dropdown** (top of the chat window) → select **ChatML**
   - If ChatML isn't in the list, click "Edit" and set manually:
     - User prefix: `<|im_start|>user\n`
     - User suffix: `<|im_end|>\n`
     - Assistant prefix: `<|im_start|>assistant\n`
     - Assistant suffix: `<|im_end|>\n`
     - System prefix: `<|im_start|>system\n`
     - System suffix: `<|im_end|>\n`
4. In **Model Parameters**, set `repeat_penalty` to `1.1` (prevents repetition loops)
5. Type your question normally; LM Studio handles the wrapping automatically
---

### Ollama

The included `Modelfile` configures ChatML automatically:

```bash
# Option 1: Run directly
ollama run sandbreak80sd/llm-350m-instruct-v2

# Option 2: Build from Modelfile (for customization)
# Download Modelfile from the Files tab, then:
ollama create llm-350m-v2 -f Modelfile
ollama run llm-350m-v2 "What is the sun?"
```

Ollama handles the ChatML template automatically when using the published model.
---

### llama.cpp (CLI)

```bash
# Download llm-350m-instruct-v2-q4_k_m.gguf, then:
./llama-cli -m llm-350m-instruct-v2-q4_k_m.gguf \
  --prompt "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\nWhat is the sun?<|im_end|>\n<|im_start|>assistant\n" \
  -n 256 --temp 0.7 --top-p 0.9 --repeat-penalty 1.1 \
  --stop "<|im_end|>"
```

The `--stop "<|im_end|>"` flag is required; without it the model won't know when to stop generating.
---

### Python (HuggingFace Transformers)

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

tokenizer = AutoTokenizer.from_pretrained("sandbreak80sd/llm-350m-instruct-v2")
model = AutoModelForCausalLM.from_pretrained(
    "sandbreak80sd/llm-350m-instruct-v2",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Always use ChatML format; do not send raw text
prompt = "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\nWhat is the sun?<|im_end|>\n<|im_start|>assistant\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    temperature=0.7,
    do_sample=True,
    repetition_penalty=1.1,
    eos_token_id=tokenizer.convert_tokens_to_ids("<|im_end|>"),
)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
---

## Limitations

Same fundamental limitations as V1, since this is still 350M parameters trained on a hobbyist budget:

- **Math**: Unreliable beyond simple arithmetic. Do not trust numerical outputs.
- **Code**: May be structurally plausible but semantically wrong. Always verify.
- **Repetition**: Without `repeat_penalty=1.1`, the model can loop. Always set this.
- **Prompt format sensitivity**: Must use ChatML. Raw prompts produce degraded output.
- **No safety alignment**: SFT only, no RLHF or DPO. Not for production use.
- **Knowledge cutoff**: Limited to pretraining data; no real-time information.
- **Context**: 2,048 tokens maximum.
---

## Training Cost

~$12 for V2 finetuning on g6e.xlarge (L40S GPU). Full project cost including pretraining and V1: ~$310.
---

## Citation

```bibtex
@misc{llm-350m-instruct-v2,
  author = {Stoner, Brad},
  title = {LLM-350M-Instruct-V2: A 350M LLM trained from scratch with OpenHermes-2.5},
  year = {2026},
  url = {https://huggingface.co/sandbreak80sd/llm-350m-instruct-v2},
  note = {Training code: https://github.com/sandbreak80/llm-350m}
}
```
23
config.json
Normal file
@@ -0,0 +1,23 @@
{
  "architectures": [
    "LlamaForCausalLM"
  ],
  "model_type": "llama",
  "hidden_size": 1024,
  "intermediate_size": 2816,
  "num_hidden_layers": 24,
  "num_attention_heads": 16,
  "num_key_value_heads": 4,
  "hidden_act": "silu",
  "max_position_embeddings": 2048,
  "initializer_range": 0.02,
  "rms_norm_eps": 1e-06,
  "rope_theta": 10000.0,
  "tie_word_embeddings": true,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.40.0",
  "vocab_size": 50304,
  "bos_token_id": 50256,
  "eos_token_id": 50256,
  "pad_token_id": 50256
}
3
llm-350m-instruct-v2-f16.gguf
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:fd70a60a8cfe9b05ee0056da96035861dd123167efb7942fc963351e97f64383
size 646069824
3
llm-350m-instruct-v2-q4_k_m.gguf
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:6dd168b9f23ac7752f3492c2b688e0000fb65a0f60fefe542a10cee2100e8a26
size 206144064
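These three-line stubs are Git LFS pointer files; the actual weights live in LFS storage. A minimal parser (my own sketch, not part of the repo) shows what they encode, using the q4_k_m pointer above:

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into version, hash algorithm, oid, and size."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "oid": digest,
        "size": int(fields["size"]),
    }

pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:6dd168b9f23ac7752f3492c2b688e0000fb65a0f60fefe542a10cee2100e8a26
size 206144064
"""
info = parse_lfs_pointer(pointer)
print(f"{info['algo']} {info['oid'][:12]}...  {info['size'] / 1e6:.0f} MB")
```

Note that 206,144,064 bytes is about 206 MB, or 196.6 MiB, which matches the "~197MB" quoted in the README.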
3
llm-350m-instruct-v2-q8_0.gguf
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a09cb693ae2e1827c6d38a7bd1b42c3ba47af8c042a5ba3b57e5259144bbfdcc
size 344153664
3
model.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1b5f50b6cc60ed540bc16e7962d19da57217aae7dfaec306a75250badb763bb2
size 1288401000
250306
tokenizer.json
Normal file
File diff suppressed because it is too large
12
tokenizer_config.json
Normal file
@@ -0,0 +1,12 @@
{
  "add_prefix_space": false,
  "backend": "tokenizers",
  "bos_token": "<|endoftext|>",
  "eos_token": "<|endoftext|>",
  "errors": "replace",
  "is_local": false,
  "model_max_length": 1024,
  "pad_token": null,
  "tokenizer_class": "GPT2Tokenizer",
  "unk_token": "<|endoftext|>"
}