Initialize project; model provided by the ModelHub XC community

Model: veyra-ai/veyra2-15m-base-1b-tokens
Source: Original Platform
ModelHub XC
2026-05-23 11:02:23 +08:00
commit bbd9cfcb85
12 changed files with 41269 additions and 0 deletions

35
.gitattributes vendored Normal file

@@ -0,0 +1,35 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text

197
README.md Normal file

@@ -0,0 +1,197 @@
---
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
tags:
- text-generation
- causal-lm
- llama
- gguf-compatible
- small-language-model
- local-ai
- english
- veyra
language:
- en
datasets:
- HuggingFaceTB/smollm-corpus
- codeparrot/github-code-clean
- HuggingFaceH4/ultrachat_200k
metrics:
- blimp
---
# Veyra2 15M Base 1B Tokens
Veyra2 15M Base 1B Tokens is a compact English causal language model trained from scratch for fast local inference, experimentation, and downstream fine-tuning.
This release is the first production checkpoint in the Veyra2 15M line. It uses a Llama-compatible architecture, which makes it easier to convert to GGUF, run locally, and build downstream instruct or tool-use variants.
The model is a base language model, not an instruction-tuned assistant. It is best used for text completion, continued pretraining, evaluation, and as a starting point for fine-tuning.
## Training loss over 1B tokens
![Training loss over 1B tokens](loss_curve_1b.png)
## Model details
| Field | Value |
|---|---:|
| Parameters | 14,685,888 |
| Model family name | Veyra2 15M |
| Architecture | Llama-compatible causal LM |
| Layers | 6 |
| Hidden size | 448 |
| Intermediate size | 1024 |
| Attention heads | 7 |
| KV heads | 1 |
| Context length | 1024 tokens |
| Vocabulary size | 8192 |
| Training tokens | ~1,000,000,000 |
| Precision during training | bfloat16 model weights with fp32 optimizer state |
| License | Apache 2.0 |
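
The parameter count in the table can be reproduced from the other fields. The sketch below assumes the standard Llama layout (bias-free projections, a gated MLP with three matrices, two RMSNorm weights per layer plus a final norm), a head dimension of 64, and input and output embeddings that share one matrix. The last two assumptions come from the `config.json` shipped with this checkpoint rather than from the table itself.

```python
# Rough parameter count for a Llama-style decoder, using the values from the table above.
# Assumptions (not stated in the table): head_dim = hidden_size / attention heads = 64,
# no bias terms, and tied input/output embeddings.
vocab_size = 8192
hidden_size = 448
intermediate_size = 1024
num_layers = 6
num_heads = 7
num_kv_heads = 1
head_dim = hidden_size // num_heads  # 64

embedding = vocab_size * hidden_size  # shared with the output head when tied

q_proj = hidden_size * num_heads * head_dim
k_proj = hidden_size * num_kv_heads * head_dim
v_proj = hidden_size * num_kv_heads * head_dim
o_proj = num_heads * head_dim * hidden_size
attention = q_proj + k_proj + v_proj + o_proj

mlp = 3 * hidden_size * intermediate_size  # gate, up, and down projections
norms = 2 * hidden_size  # pre-attention and pre-MLP RMSNorm weights

per_layer = attention + mlp + norms
total_params = embedding + num_layers * per_layer + hidden_size  # + final norm

print(f"{total_params:,}")  # 14,685,888
```

At 2 bytes per bfloat16 weight, this corresponds to roughly 29.4 MB of weights.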
## Training data
Veyra2 15M Base 1B was trained on an English-heavy mixture designed for small-model local utility:
- 80% Cosmopedia v2 style educational and synthetic textbook data
- 10% Python/code-oriented data
- 10% chat/instruction-style data
The chat portion uses ChatML-style formatting, so the base model may sometimes continue ChatML conversations or emit ChatML special tokens. This is expected behavior for the base checkpoint and is useful for later instruction tuning, but this model should not be treated as a polished chat assistant.
## Training setup
The model was trained for approximately 1B tokens with a 1024-token sequence length. The optimizer recipe used CosineGatedAdam for matrix parameters and AdamW for auxiliary parameters such as embeddings and normalization weights.
Training logs near the end of the run reported approximately:
| Metric | Value |
|---|---:|
| Average training speed | ~315k tokens/sec |
| Peak VRAM during training | ~55 GB |
These training numbers are from the training stream and are not a replacement for downstream task evaluation.
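
As a back-of-the-envelope check, the throughput figure implies a rough wall-clock time for the run. The sketch below assumes exactly 1B tokens and a constant 315k tokens/sec, both of which are approximations. The 262,144 tokens per step is taken from the first row of `training_curve_1b.csv`, not from this README.

```python
# Back-of-the-envelope run length. All inputs are approximate.
total_tokens = 1_000_000_000   # "~1B tokens"
tokens_per_sec = 315_000       # "~315k tokens/sec" average
tokens_per_step = 262_144      # from training_curve_1b.csv (step 1 -> 262144 tokens seen)
seq_len = 1024

sequences_per_step = tokens_per_step // seq_len      # sequences consumed per logged step
approx_steps = total_tokens / tokens_per_step        # steps needed to reach 1B tokens
approx_minutes = total_tokens / tokens_per_sec / 60  # wall-clock estimate in minutes

print(sequences_per_step, round(approx_steps), round(approx_minutes))
```

Under these assumptions the run comes to about 3,815 steps of 256 sequences each, or a little under an hour of training time.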
## Evaluation
### Quick streamed eval
A quick streamed sanity eval was run on Cosmopedia-style data for 262,144 tokens.
| Metric | Value |
|---|---:|
| Eval tokens | 262,144 |
| Eval loss | 2.82 |
| Eval perplexity | 16.82 |
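
Perplexity is the exponential of the mean per-token cross-entropy loss, so the two rows above carry the same information. A quick check, assuming the loss is measured in nats: the rounded loss of 2.82 gives a perplexity of about 16.78, and the reported 16.82 corresponds to an unrounded loss of roughly 2.8226.

```python
import math

eval_loss = 2.82       # as reported, rounded to two decimals
reported_ppl = 16.82   # as reported

ppl_from_loss = math.exp(eval_loss)     # perplexity implied by the rounded loss
loss_from_ppl = math.log(reported_ppl)  # loss implied by the reported perplexity

print(f"{ppl_from_loss:.2f} {loss_from_ppl:.4f}")
```

The small gap between the two values is consistent with rounding the loss before display.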
### BLiMP
BLiMP was evaluated using the official `nyu-mll/blimp` dataset with mean token log-likelihood scoring.
| Metric | Value |
|---|---:|
| Total examples | 67,000 |
| Correct | 38,750 |
| Overall accuracy | 57.84% |
BLiMP measures targeted grammatical minimal-pair sensitivity. It should not be interpreted as a general capability benchmark.
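
For reference, minimal-pair scoring of this kind can be sketched as follows. Each BLiMP item pairs a grammatical sentence with a near-identical ungrammatical one, and the model is counted correct when it gives the grammatical sentence the higher score. The helper names and the log-probability values below are hypothetical placeholders, not output from this model. In a real run the per-token log-probabilities would come from the model's logits.

```python
from statistics import mean

def mean_token_logprob(token_logprobs):
    """Length-normalized sentence score: average log-probability per token."""
    return mean(token_logprobs)

def pair_is_correct(good_logprobs, bad_logprobs):
    """True when the grammatical sentence scores strictly higher."""
    return mean_token_logprob(good_logprobs) > mean_token_logprob(bad_logprobs)

# Hypothetical per-token log-probabilities for two minimal pairs (illustrative only).
pairs = [
    ([-2.1, -0.8, -1.5, -0.9], [-2.1, -0.8, -3.4, -0.9]),  # good sentence preferred
    ([-1.9, -2.6, -1.2], [-1.9, -1.1, -1.2]),              # bad sentence preferred
]
toy_accuracy = sum(pair_is_correct(g, b) for g, b in pairs) / len(pairs)

# The overall figure in the table is the same ratio over all 67,000 pairs.
overall_accuracy = 38_750 / 67_000

print(toy_accuracy, f"{overall_accuracy:.2%}")
```

Averaging rather than summing the token log-probabilities normalizes for length, which matters when the two sentences of a pair tokenize to different lengths.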
## Intended use
Veyra2 15M Base 1B is intended for:
- local text completion experiments
- lightweight CPU-friendly language modeling
- downstream instruction tuning
- small-model research
- grammar, tokenizer, quantization, and local-inference experiments
- building small ChatML, tool-use, Python, or function-calling variants
For direct assistant-style use, wait for an instruction-tuned Veyra2 model or fine-tune this base checkpoint yourself.
## What to expect
This is a very small base model. You should expect coherent short completions, recognizable educational prose, some code-like continuations, and occasional ChatML continuation behavior.
You should not expect high factual reliability, robust reasoning, strong instruction following, safety alignment, or long-context consistency. The model may hallucinate, repeat itself, produce incorrect facts, or continue prompts in unexpected formats.
## Example usage
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "veyra-ai/veyra2-15m-base-1b-tokens"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

prompt = "The purpose of a small language model is"
inputs = tokenizer(
    prompt,
    return_tensors="pt",
    add_special_tokens=False,
).to(model.device)

pad_token_id = tokenizer.pad_token_id
if pad_token_id is None:
    pad_token_id = tokenizer.eos_token_id

with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=120,
        temperature=0.7,
        top_p=0.92,
        do_sample=True,
        eos_token_id=tokenizer.eos_token_id,
        pad_token_id=pad_token_id,
    )

# Decode only the newly generated tokens, not the prompt
new_tokens = output[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```
## Experimental ChatML continuation
The base model has seen ChatML-style data during pretraining. You can experiment with prompts like:
```text
<|im_start|>user
Explain what a stack is in simple terms.
<|im_end|>
<|im_start|>assistant
```
This is completion behavior, not instruction tuning. The model may continue the conversation format but should not be treated as a reliable assistant.
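
A prompt in this layout can be assembled programmatically. The sketch below mirrors the example above, including the newline before `<|im_end|>`. The function name and message structure are illustrative assumptions, and the exact template used during pretraining is not documented here.

```python
def build_chatml_prompt(messages, add_generation_prompt=True):
    """Format (role, content) pairs in the ChatML-style layout shown above."""
    parts = []
    for role, content in messages:
        parts.append(f"<|im_start|>{role}\n{content}\n<|im_end|>\n")
    if add_generation_prompt:
        # Leave an open assistant turn for the model to continue.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)

prompt = build_chatml_prompt([("user", "Explain what a stack is in simple terms.")])
print(prompt)
```

Because the ChatML markers are special tokens, decoding with `skip_special_tokens=False` makes it easier to see whether and where the model emits `<|im_end|>`.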
## GGUF and quantization
>Coming soon.
## Limitations
- English-focused
- Not instruction tuned
- Not safety aligned
- May hallucinate facts
- May produce repetitive or malformed text
- Limited context length of 1024 tokens
- Small parameter count limits reasoning and world knowledge
- ChatML behavior is learned as text continuation, not as a robust assistant policy
## License
This model is released under the Apache 2.0 license unless otherwise noted. Please retain attribution to Veyra AI when redistributing models or releasing derivative work.
## Citation / attribution
If you use this model, please refer to it as `Veyra2 15M Base 1B Tokens` by Veyra AI.

481
blimp_eval_results.json Normal file

@@ -0,0 +1,481 @@
{
"model_dir": "/content/drive/MyDrive/veyra_runs/veyra2_15m_base_pretrain_1b/checkpoints/final_hf",
"dataset": "nyu-mll/blimp",
"num_subsets": 67,
"total_examples": 67000,
"total_correct": 38750,
"overall_accuracy": 0.5783582089552238,
"elapsed_seconds": 110.03305959701538,
"scoring": "mean token log-likelihood, add_special_tokens=False",
"results": {
"adjunct_island": {
"accuracy": 0.706,
"correct": 706,
"total": 1000,
"field": "syntax",
"linguistics_term": "island_effects"
},
"anaphor_gender_agreement": {
"accuracy": 0.731,
"correct": 731,
"total": 1000,
"field": "morphology",
"linguistics_term": "anaphor_agreement"
},
"anaphor_number_agreement": {
"accuracy": 0.708,
"correct": 708,
"total": 1000,
"field": "morphology",
"linguistics_term": "anaphor_agreement"
},
"animate_subject_passive": {
"accuracy": 0.519,
"correct": 519,
"total": 1000,
"field": "syntax",
"linguistics_term": "s-selection"
},
"animate_subject_trans": {
"accuracy": 0.424,
"correct": 424,
"total": 1000,
"field": "syntax",
"linguistics_term": "s-selection"
},
"causative": {
"accuracy": 0.626,
"correct": 626,
"total": 1000,
"field": "syntax",
"linguistics_term": "argument_structure"
},
"complex_NP_island": {
"accuracy": 0.502,
"correct": 502,
"total": 1000,
"field": "syntax",
"linguistics_term": "island_effects"
},
"coordinate_structure_constraint_complex_left_branch": {
"accuracy": 0.315,
"correct": 315,
"total": 1000,
"field": "syntax",
"linguistics_term": "island_effects"
},
"coordinate_structure_constraint_object_extraction": {
"accuracy": 0.635,
"correct": 635,
"total": 1000,
"field": "syntax",
"linguistics_term": "island_effects"
},
"determiner_noun_agreement_1": {
"accuracy": 0.737,
"correct": 737,
"total": 1000,
"field": "morphology",
"linguistics_term": "determiner_noun_agreement"
},
"determiner_noun_agreement_2": {
"accuracy": 0.812,
"correct": 812,
"total": 1000,
"field": "morphology",
"linguistics_term": "determiner_noun_agreement"
},
"determiner_noun_agreement_irregular_1": {
"accuracy": 0.646,
"correct": 646,
"total": 1000,
"field": "morphology",
"linguistics_term": "determiner_noun_agreement"
},
"determiner_noun_agreement_irregular_2": {
"accuracy": 0.735,
"correct": 735,
"total": 1000,
"field": "morphology",
"linguistics_term": "determiner_noun_agreement"
},
"determiner_noun_agreement_with_adj_2": {
"accuracy": 0.779,
"correct": 779,
"total": 1000,
"field": "morphology",
"linguistics_term": "determiner_noun_agreement"
},
"determiner_noun_agreement_with_adj_irregular_1": {
"accuracy": 0.637,
"correct": 637,
"total": 1000,
"field": "morphology",
"linguistics_term": "determiner_noun_agreement"
},
"determiner_noun_agreement_with_adj_irregular_2": {
"accuracy": 0.687,
"correct": 687,
"total": 1000,
"field": "morphology",
"linguistics_term": "determiner_noun_agreement"
},
"determiner_noun_agreement_with_adjective_1": {
"accuracy": 0.704,
"correct": 704,
"total": 1000,
"field": "morphology",
"linguistics_term": "determiner_noun_agreement"
},
"distractor_agreement_relational_noun": {
"accuracy": 0.291,
"correct": 291,
"total": 1000,
"field": "morphology",
"linguistics_term": "subject_verb_agreement"
},
"distractor_agreement_relative_clause": {
"accuracy": 0.345,
"correct": 345,
"total": 1000,
"field": "morphology",
"linguistics_term": "subject_verb_agreement"
},
"drop_argument": {
"accuracy": 0.44,
"correct": 440,
"total": 1000,
"field": "syntax",
"linguistics_term": "argument_structure"
},
"ellipsis_n_bar_1": {
"accuracy": 0.609,
"correct": 609,
"total": 1000,
"field": "syntax",
"linguistics_term": "ellipsis"
},
"ellipsis_n_bar_2": {
"accuracy": 0.867,
"correct": 867,
"total": 1000,
"field": "syntax",
"linguistics_term": "ellipsis"
},
"existential_there_object_raising": {
"accuracy": 0.633,
"correct": 633,
"total": 1000,
"field": "syntax_semantics",
"linguistics_term": "control_raising"
},
"existential_there_quantifiers_1": {
"accuracy": 0.882,
"correct": 882,
"total": 1000,
"field": "semantics",
"linguistics_term": "quantifiers"
},
"existential_there_quantifiers_2": {
"accuracy": 0.119,
"correct": 119,
"total": 1000,
"field": "semantics",
"linguistics_term": "quantifiers"
},
"existential_there_subject_raising": {
"accuracy": 0.599,
"correct": 599,
"total": 1000,
"field": "syntax_semantics",
"linguistics_term": "control_raising"
},
"expletive_it_object_raising": {
"accuracy": 0.583,
"correct": 583,
"total": 1000,
"field": "syntax_semantics",
"linguistics_term": "control_raising"
},
"inchoative": {
"accuracy": 0.567,
"correct": 567,
"total": 1000,
"field": "syntax",
"linguistics_term": "argument_structure"
},
"intransitive": {
"accuracy": 0.49,
"correct": 490,
"total": 1000,
"field": "syntax",
"linguistics_term": "argument_structure"
},
"irregular_past_participle_adjectives": {
"accuracy": 0.56,
"correct": 560,
"total": 1000,
"field": "morphology",
"linguistics_term": "irregular_forms"
},
"irregular_past_participle_verbs": {
"accuracy": 0.81,
"correct": 810,
"total": 1000,
"field": "morphology",
"linguistics_term": "irregular_forms"
},
"irregular_plural_subject_verb_agreement_1": {
"accuracy": 0.69,
"correct": 690,
"total": 1000,
"field": "morphology",
"linguistics_term": "subject_verb_agreement"
},
"irregular_plural_subject_verb_agreement_2": {
"accuracy": 0.596,
"correct": 596,
"total": 1000,
"field": "morphology",
"linguistics_term": "subject_verb_agreement"
},
"left_branch_island_echo_question": {
"accuracy": 0.278,
"correct": 278,
"total": 1000,
"field": "syntax",
"linguistics_term": "island_effects"
},
"left_branch_island_simple_question": {
"accuracy": 0.377,
"correct": 377,
"total": 1000,
"field": "syntax",
"linguistics_term": "island_effects"
},
"matrix_question_npi_licensor_present": {
"accuracy": 0.227,
"correct": 227,
"total": 1000,
"field": "semantics",
"linguistics_term": "npi_licensing"
},
"npi_present_1": {
"accuracy": 0.4,
"correct": 400,
"total": 1000,
"field": "semantics",
"linguistics_term": "npi_licensing"
},
"npi_present_2": {
"accuracy": 0.432,
"correct": 432,
"total": 1000,
"field": "semantics",
"linguistics_term": "npi_licensing"
},
"only_npi_licensor_present": {
"accuracy": 0.315,
"correct": 315,
"total": 1000,
"field": "semantics",
"linguistics_term": "npi_licensing"
},
"only_npi_scope": {
"accuracy": 0.553,
"correct": 553,
"total": 1000,
"field": "syntax_semantics",
"linguistics_term": "npi_licensing"
},
"passive_1": {
"accuracy": 0.657,
"correct": 657,
"total": 1000,
"field": "syntax",
"linguistics_term": "argument_structure"
},
"passive_2": {
"accuracy": 0.645,
"correct": 645,
"total": 1000,
"field": "syntax",
"linguistics_term": "argument_structure"
},
"principle_A_c_command": {
"accuracy": 0.374,
"correct": 374,
"total": 1000,
"field": "syntax_semantics",
"linguistics_term": "binding"
},
"principle_A_case_1": {
"accuracy": 0.859,
"correct": 859,
"total": 1000,
"field": "syntax_semantics",
"linguistics_term": "binding"
},
"principle_A_case_2": {
"accuracy": 0.818,
"correct": 818,
"total": 1000,
"field": "syntax/semantics",
"linguistics_term": "binding"
},
"principle_A_domain_1": {
"accuracy": 0.513,
"correct": 513,
"total": 1000,
"field": "syntax_semantics",
"linguistics_term": "binding"
},
"principle_A_domain_2": {
"accuracy": 0.483,
"correct": 483,
"total": 1000,
"field": "syntax_semantics",
"linguistics_term": "binding"
},
"principle_A_domain_3": {
"accuracy": 0.514,
"correct": 514,
"total": 1000,
"field": "syntax_semantics",
"linguistics_term": "binding"
},
"principle_A_reconstruction": {
"accuracy": 0.373,
"correct": 373,
"total": 1000,
"field": "syntax_semantics",
"linguistics_term": "binding"
},
"regular_plural_subject_verb_agreement_1": {
"accuracy": 0.676,
"correct": 676,
"total": 1000,
"field": "morphology",
"linguistics_term": "subject_verb_agreement"
},
"regular_plural_subject_verb_agreement_2": {
"accuracy": 0.699,
"correct": 699,
"total": 1000,
"field": "morphology",
"linguistics_term": "subject_verb_agreement"
},
"sentential_negation_npi_licensor_present": {
"accuracy": 0.989,
"correct": 989,
"total": 1000,
"field": "semantics",
"linguistics_term": "npi_licensing"
},
"sentential_negation_npi_scope": {
"accuracy": 0.302,
"correct": 302,
"total": 1000,
"field": "syntax_semantics",
"linguistics_term": "npi_licensing"
},
"sentential_subject_island": {
"accuracy": 0.363,
"correct": 363,
"total": 1000,
"field": "syntax",
"linguistics_term": "island_effects"
},
"superlative_quantifiers_1": {
"accuracy": 0.797,
"correct": 797,
"total": 1000,
"field": "semantics",
"linguistics_term": "quantifiers"
},
"superlative_quantifiers_2": {
"accuracy": 0.578,
"correct": 578,
"total": 1000,
"field": "semantics",
"linguistics_term": "quantifiers"
},
"tough_vs_raising_1": {
"accuracy": 0.694,
"correct": 694,
"total": 1000,
"field": "syntax_semantics",
"linguistics_term": "control_raising"
},
"tough_vs_raising_2": {
"accuracy": 0.396,
"correct": 396,
"total": 1000,
"field": "syntax_semantics",
"linguistics_term": "control_raising"
},
"transitive": {
"accuracy": 0.61,
"correct": 610,
"total": 1000,
"field": "syntax",
"linguistics_term": "argument_structure"
},
"wh_island": {
"accuracy": 0.566,
"correct": 566,
"total": 1000,
"field": "syntax",
"linguistics_term": "island_effects"
},
"wh_questions_object_gap": {
"accuracy": 0.45,
"correct": 450,
"total": 1000,
"field": "syntax",
"linguistics_term": "filler_gap_dependency"
},
"wh_questions_subject_gap": {
"accuracy": 0.792,
"correct": 792,
"total": 1000,
"field": "syntax",
"linguistics_term": "filler_gap_dependency"
},
"wh_questions_subject_gap_long_distance": {
"accuracy": 0.904,
"correct": 904,
"total": 1000,
"field": "syntax",
"linguistics_term": "filler_gap_dependency"
},
"wh_vs_that_no_gap": {
"accuracy": 0.943,
"correct": 943,
"total": 1000,
"field": "syntax",
"linguistics_term": "filler_gap_dependency"
},
"wh_vs_that_no_gap_long_distance": {
"accuracy": 0.959,
"correct": 959,
"total": 1000,
"field": "syntax",
"linguistics_term": "filler_gap_dependency"
},
"wh_vs_that_with_gap": {
"accuracy": 0.152,
"correct": 152,
"total": 1000,
"field": "syntax",
"linguistics_term": "filler_gap_dependency"
},
"wh_vs_that_with_gap_long_distance": {
"accuracy": 0.078,
"correct": 78,
"total": 1000,
"field": "syntax",
"linguistics_term": "filler_gap_dependency"
}
}
}

30
config.json Normal file

@@ -0,0 +1,30 @@
{
"architectures": [
"LlamaForCausalLM"
],
"attention_bias": false,
"attention_dropout": 0.0,
"bos_token_id": 0,
"eos_token_id": 1,
"head_dim": 64,
"hidden_act": "silu",
"hidden_size": 448,
"initializer_range": 0.02,
"intermediate_size": 1024,
"max_position_embeddings": 1024,
"mlp_bias": false,
"model_type": "llama",
"num_attention_heads": 7,
"num_hidden_layers": 6,
"num_key_value_heads": 1,
"pad_token_id": 2,
"pretraining_tp": 1,
"rms_norm_eps": 1e-06,
"rope_scaling": null,
"rope_theta": 10000.0,
"tie_word_embeddings": true,
"torch_dtype": "bfloat16",
"transformers_version": "4.46.3",
"use_cache": true,
"vocab_size": 8192
}

8
generation_config.json Normal file

@@ -0,0 +1,8 @@
{
"_from_model_config": true,
"bos_token_id": 0,
"eos_token_id": 1,
"pad_token_id": 2,
"transformers_version": "4.46.3",
"use_cache": false
}

BIN
loss_curve_1b.png Normal file

Binary file not shown.

Size: 75 KiB

3
model.safetensors Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:e8055c1ac910741c5336602e291a1d506145fd9b8cd4de69f16db7ae2d9b4543
size 29377920

19
special_tokens_map.json Normal file

@@ -0,0 +1,19 @@
{
"additional_special_tokens": [
"<|im_start|>",
"<|im_end|>",
"<|tool_call|>",
"<|tool_result|>",
"<|context|>",
"<|reasoning|>",
"<|end_reasoning|>",
"<|answer|>",
"<|fim_prefix|>",
"<|fim_middle|>",
"<|fim_suffix|>"
],
"bos_token": "<|bos|>",
"eos_token": "<|eos|>",
"pad_token": "<|pad|>",
"unk_token": "<|unk|>"
}

40177
tokenizer.json Normal file

File diff suppressed because it is too large.

224
tokenizer_config.json Normal file

@@ -0,0 +1,224 @@
{
"added_tokens_decoder": {
"0": {
"content": "<|bos|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"1": {
"content": "<|eos|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"2": {
"content": "<|pad|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"3": {
"content": "<|unk|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"4": {
"content": "<|im_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"5": {
"content": "<|im_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"6": {
"content": "<|tool_call|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"7": {
"content": "<|tool_result|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"8": {
"content": "<|context|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"9": {
"content": "<|reasoning|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"10": {
"content": "<|end_reasoning|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"11": {
"content": "<|answer|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"12": {
"content": "<|fim_prefix|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"13": {
"content": "<|fim_middle|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"14": {
"content": "<|fim_suffix|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"15": {
"content": "<|reserved_0|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"16": {
"content": "<|reserved_1|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"17": {
"content": "<|reserved_2|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"18": {
"content": "<|reserved_3|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"19": {
"content": "<|reserved_4|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"20": {
"content": "<|reserved_5|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"21": {
"content": "<|reserved_6|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"22": {
"content": "<|reserved_7|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"23": {
"content": "<|reserved_8|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"24": {
"content": "<|reserved_9|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
}
},
"additional_special_tokens": [
"<|im_start|>",
"<|im_end|>",
"<|tool_call|>",
"<|tool_result|>",
"<|context|>",
"<|reasoning|>",
"<|end_reasoning|>",
"<|answer|>",
"<|fim_prefix|>",
"<|fim_middle|>",
"<|fim_suffix|>"
],
"bos_token": "<|bos|>",
"clean_up_tokenization_spaces": false,
"eos_token": "<|eos|>",
"model_max_length": 1024,
"pad_token": "<|pad|>",
"tokenizer_class": "PreTrainedTokenizerFast",
"unk_token": "<|unk|>"
}

78
training_curve_1b.csv Normal file

@@ -0,0 +1,78 @@
step,tokens_seen,tokens_b,loss,ppl,tok_per_sec_average,tok_per_sec_interval,lr_scale,cga_lr,aux_lr
1,262144,0.000262144,9.092129707336426,8885.088554994387,50224.38702233836,50224.38702233836,0.02,1.0000000000000002e-06,5.999999999999999e-06
50,13107200,0.0131072,8.8062886428833,6676.095855374264,298620.993367778,332145.57819641323,0.2717630976,1.358815488e-05,8.152892927999999e-05
100,26214400,0.0262144,7.4555908298492435,1729.505536954722,311036.40363139805,324528.95858480496,0.5286642176,2.6433210880000002e-05,0.00015859926528
150,39321600,0.0393216,6.5067841339111325,669.669384021678,315511.0235609987,324857.94116190285,0.7855653375999999,3.927826688e-05,0.00023566960127999995
200,52428800,0.0524288,5.642829685211182,282.26029728353564,317514.5515354079,323680.77172263304,0.9999884491718478,4.9999422458592394e-05,0.0002999965347515543
250,65536000,0.065536,5.02416350364685,152.0430194722855,318849.00283013936,324300.89467815857,0.9994260948758052,4.997130474379027e-05,0.00029982782846274157
300,78643200,0.0786432,4.544054203033447,94.07141266446165,319560.5677144598,323166.56899154716,0.9980195068945363,4.990097534472682e-05,0.0002994058520683609
350,91750400,0.0917504,4.25699740409851,70.59768856555041,319947.99698164506,322292.445047257,0.9957713274621449,4.9788566373107246e-05,0.00029873139823864345
400,104857600,0.1048576,4.047751569747925,57.268547842616464,318238.51573018834,306765.1966274923,0.9926857797174936,4.9634288985874685e-05,0.00029780573391524807
450,117964800,0.1179648,3.8760768365859986,48.234611129236086,317054.81405914633,307893.0440598211,0.9887686597711628,4.9438432988558145e-05,0.0002966305979313488
500,131072000,0.131072,3.785040407180786,44.03744995200111,317120.29568185564,317710.84971106536,0.9840273258176261,4.9201366290881305e-05,0.0002952081977452878
550,144179200,0.1441792,3.6935828876495362,40.18858048165228,317614.40325226984,322641.5053486471,0.9784706843130998,4.8923534215654995e-05,0.00029354120529392994
600,157286400,0.1572864,3.593810238838196,36.37239976592249,317759.82106869266,319368.24991727155,0.9721091732450272,4.860545866225136e-05,0.00029163275197350815
650,170393600,0.1703936,3.5169521284103396,33.681614625441156,317773.7805012288,317941.38940889935,0.9649547425246257,4.8247737126231284e-05,0.0002894864227573877
700,183500800,0.1835008,3.455819764137268,31.68425165228576,318066.74773135927,321925.0757177631,0.9570208315393307,4.785104157696654e-05,0.00028710624946179917
750,196608000,0.196608,3.3875149154663085,29.592321398568057,318272.7868304897,321185.61926476855,0.948322343907303,4.741611719536515e-05,0.00028449670317219085
800,209715200,0.2097152,3.306785531044006,27.297238167400657,318383.5050670284,320053.57089263864,0.93887561948142,4.6943780974071e-05,0.00028166268584442597
850,222822400,0.2228224,3.250493025779724,25.80305835464123,318036.2988837371,312582.22333020245,0.928698403655343,4.643492018276715e-05,0.00027860952109660285
900,235929600,0.2359296,3.197513847351074,24.471614336615424,317614.47773565847,310610.942288261,0.9178098140293187,4.589049070146594e-05,0.0002753429442087956
950,249036800,0.2490368,3.129241952896118,22.85664655893339,317024.0383457295,306759.35964411177,0.9062303044983293,4.531151522491647e-05,0.00027186909134949876
1000,262144000,0.262144,3.081066160202026,21.78161270964922,316869.0177229377,313952.16589610954,0.8939816268300538,4.469908134150269e-05,0.0002681944880490161
1050,275251200,0.2752512,3.0228434658050536,20.54964088906851,316635.1839259883,312029.92820248834,0.8810867898048116,4.405433949024058e-05,0.00026432603694144345
1100,288358400,0.2883584,3.0071750259399415,20.23016942211005,316741.95040085376,319000.797298012,0.867570015994248,4.33785007997124e-05,0.00026027100479827437
1150,301465600,0.3014656,2.9729932975769042,19.550352121066144,316568.15897942026,312792.42269606923,0.8534566962599451,4.267283481299726e-05,0.0002560370088779835
1200,314572800,0.3145728,2.9531327199935915,19.165901181739105,316337.4074621991,311121.4371959478,0.8387733420574367,4.1938667102871835e-05,0.000251632002617231
1250,327680000,0.32768,2.8891089487075807,17.97728378074946,316487.9256110375,320143.82885955187,0.82354753563522,4.1177376781761e-05,0.000247064260690566
1300,340787200,0.3407872,2.8978181171417234,18.134534741786627,316322.2705944176,312236.5264948591,0.8078078782223131,4.039039391111566e-05,0.00024234236346669394
1350,353894400,0.3538944,2.878249921798706,17.783124073085048,315581.8735811704,297478.3307998837,0.7915839363016896,3.957919681508448e-05,0.00023747518089050685
1400,367001600,0.3670016,2.8579079866409303,17.425035372687326,315594.630872246,315939.4680334448,0.7749061860705098,3.874530930352549e-05,0.0002324718558211529
1450,380108800,0.3801088,2.8460701274871827,17.219976382572632,315738.93071534357,319833.59560554975,0.7578059561914826,3.789029780957413e-05,0.00022734178685744475
1500,393216000,0.393216,2.789999303817749,16.28100846723536,315546.52338625025,310066.94746280636,0.7403153689428951,3.7015768447144754e-05,0.0002220946106828685
1550,406323200,0.4063232,2.785295162200928,16.204600156401163,315798.26409972785,323541.84375115775,0.7224672798778583,3.6123363993892915e-05,0.00021674018396335748
1600,419430400,0.4194304,2.7754193782806396,16.045354658088073,315712.533937969,313077.79334421246,0.7042952161061178,3.521476080530589e-05,0.00021128856483183532
1650,432537600,0.4325376,2.7752615880966185,16.042823058359936,315721.312995104,316002.5008440727,0.6858333133143667,3.4291665665718335e-05,0.00020574999399430998
1700,445644800,0.4456448,2.7617682600021363,15.827805894179166,315331.31804646796,302980.8285558166,0.6671162516433619,3.335581258216809e-05,0.00020013487549300855
1750,458752000,0.458752,2.753299608230591,15.694331688745237,315512.3783033254,321794.6064422757,0.6481791905423016,3.2408959527115077e-05,0.00019445375716269046
1800,471859200,0.4718592,2.7108395147323607,15.041898110773106,315544.3777892796,316668.4635818759,0.6290577027228349,3.1452885136141746e-05,0.00018871731081685045
1850,484966400,0.4849664,2.734748764038086,15.405872440501335,315706.55601909297,321658.0923899812,0.6097877073367712,3.0489385366838564e-05,0.00018293631220103136
1900,498073600,0.4980736,2.7190817642211913,15.166389527673442,315992.02793124533,326929.9862400058,0.5904054025030104,2.9520270125150525e-05,0.00017712162075090313
1950,511180800,0.5111808,2.717372097969055,15.140482216069893,315997.3601930521,316200.11957788456,0.5709471973104431,2.8547359865522157e-05,0.00017128415919313291
2000,524288000,0.524288,2.692207751274109,14.764235725476055,316096.5409465232,320013.75406490295,0.5514496434245482,2.757248217122741e-05,0.00016543489302736443
2050,537395200,0.5373952,2.7101213932037354,15.031100077522487,315961.63046789286,310658.05743996846,0.5319493664261674,2.6597468321308372e-05,0.0001595848099278502
2100,550502400,0.5505024,2.6747180938720705,14.508259295842242,315623.5859735274,302360.38743409567,0.5124829970114279,2.5624149850571395e-05,0.00015374489910342833
2150,563609600,0.5636096,2.6829848289489746,14.628692340049028,315766.6172307765,321893.26174947893,0.4930871021820604,2.4654355109103023e-05,0.0001479261306546181
2200,576716800,0.5767168,2.67189932346344,14.467421427185897,315748.4368933728,314968.65792091074,0.47379811655536186,2.3689905827768093e-05,0.00014213943496660855
2250,589824000,0.589824,2.658867220878601,14.280103729827829,315861.38005845714,320912.1499324002,0.4546522739228399,2.2732613696141995e-05,0.00013639568217685197
2300,602931200,0.6029312,2.669673943519592,14.435261714764245,316035.3782397103,324068.7496715136,0.435685539186103,2.178427695930515e-05,0.00013070566175583088
2350,616038400,0.6160384,2.6399266576766967,14.012175884512365,315840.3594172999,307122.49156091065,0.4169335407978513,2.0846677039892564e-05,0.00012508006223935537
2400,629145600,0.6291456,2.6467373847961424,14.107934714285385,315746.28089079336,311386.9370112208,0.39843150383487747,1.9921575191743876e-05,0.00011952945115046323
2450,642252800,0.6422528,2.6426072216033933,14.049786804439659,315719.3748047571,314433.25333134003,0.38021418382879735,1.9010709191439867e-05,0.0001140642551486392
2500,655360000,0.65536,2.6411410665512083,14.029202731942462,315683.6169898004,313941.35146921885,0.36231580147880726,1.8115790073940365e-05,0.00010869474044364216
2550,668467200,0.6684672,2.6225876379013062,13.771312690028628,315728.69364891225,317999.0577655969,0.34476997836910805,1.7238498918455405e-05,0.0001034309935107324
2600,681574400,0.6815744,2.6473287200927733,14.116279701143975,315763.13408611127,317529.6152728842,0.3276096738117493,1.6380483690587467e-05,9.828290214352479e-05
2650,694681600,0.6946816,2.6411822986602784,14.029781197485267,315712.6336342276,313108.68568319397,0.3108671229335322,1.554335614667661e-05,9.326013688005965e-05
2700,707788800,0.7077888,2.634647960662842,13.938404732593336,315764.62738691876,318545.01849213225,0.29457377612327285,1.4728688806163644e-05,8.837213283698185e-05
2750,720896000,0.720896,2.645681519508362,14.093046497097339,315825.7811962338,319163.634134093,0.27876023995317334,1.3938011997658668e-05,8.3628071985952e-05
2800,734003200,0.7340032,2.634382348060608,13.934703008275399,315847.60869560926,317052.7852315393,0.26345621968527677,1.317281098426384e-05,7.903686590558303e-05
2850,747110400,0.7471104,2.622367272377014,13.768278301837048,315703.6863630434,307848.1618587648,0.2486904634710077,1.2434523173550385e-05,7.46071390413023e-05
2900,760217600,0.7602176,2.615948185920715,13.680181585781089,315492.5813171238,303909.1406290001,0.2344907083486157,1.1724535417430786e-05,7.034721250458471e-05
2950,773324800,0.7733248,2.623099670410156,13.778365855386339,315252.23038028827,301911.9497913353,0.22088362813996582,1.104418140699829e-05,6.626508844198974e-05
3000,786432000,0.786432,2.6265298652648927,13.825709487570569,315241.2023459131,314591.91116098495,0.20789478334455003,1.0394739167227501e-05,6.2368435003365e-05
3050,799539200,0.7995392,2.603783144950867,13.514769776942396,315353.48532412236,322240.03516156983,0.1955485731248413,9.777428656242066e-06,5.866457193745239e-05
3100,812646400,0.8126464,2.6344400310516356,13.935506826807105,315356.682763535,315551.84925433376,0.18386818947318542,9.193409473659271e-06,5.516045684195562e-05
3150,825753600,0.8257536,2.6214688873291014,13.755914640970765,315141.04755161057,302324.1709949205,0.17287557364632622,8.643778682316311e-06,5.186267209389786e-05
3200,838860800,0.8388608,2.603085594177246,13.505345826053425,315115.2043095279,313495.58108943485,0.16259137494940126,8.129568747470064e-06,4.877741248482037e-05
3250,851968000,0.851968,2.59345489025116,13.375904146707743,314981.54920000973,306657.21758979023,0.15303491194682967,7.651745597341483e-06,4.5910473584048895e-05
3300,865075200,0.8650752,2.5995493936538696,13.457672555874806,315062.73231448047,320430.92849263834,0.14422413617295787,7.211206808647894e-06,4.3267240851887355e-05
3350,878182400,0.8781824,2.607896523475647,13.570475631799066,315102.8697043337,317774.7401712474,0.13617559841063118,6.808779920531559e-06,4.0852679523189346e-05
3400,891289600,0.8912896,2.602877516746521,13.5025359607378,315030.6364006246,310265.30469680607,0.12890441760103533,6.4452208800517665e-06,3.86713252803106e-05
3450,904396800,0.9043968,2.60234646320343,13.495367294816331,314992.6275271761,312429.36571746867,0.12242425244321076,6.121212622160538e-06,3.6727275732963225e-05
3500,917504000,0.917504,2.6211615228652954,13.75168721135708,314936.2211962312,311092.3757228669,0.11674727573658872,5.837363786829436e-06,3.502418272097661e-05
3550,930611200,0.9306112,2.5955800437927246,13.4043602228326,314929.5716800483,314464.8022900186,0.11188415151474577,5.594207575737289e-06,3.356524545442373e-05
3600,943718400,0.9437184,2.6046582746505735,13.526602130025545,314835.1221180338,308270.98706900014,0.10784401501333102,5.392200750666551e-06,3.23532045039993e-05
3650,956825600,0.9568256,2.601729602813721,13.487045104366299,314781.0514974198,310936.17908164434,0.10463445550979467,5.231722775489734e-06,3.13903366529384e-05
3700,969932800,0.9699328,2.60301109790802,13.504339765649112,314713.86339064525,309885.41254099965,0.10226150206715408,5.113075103357704e-06,3.067845062014622e-05
3750,983040000,0.98304,2.587335786819458,13.294305515584247,314572.9332000564,304483.1204529132,0.10072961220857626,5.0364806104288135e-06,3.0218883662572878e-05
3800,996147200,0.9961472,2.6006995582580568,13.473159999350628,314630.5595218487,319013.54400720494,0.10004166354405178,5.002083177202589e-06,3.001249906321553e-05
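
Several columns in this log appear to be derived from others. The sketch below checks those relationships on three rows copied from the log. The constants `TOKENS_PER_STEP`, `CGA_PEAK_LR`, and `AUX_PEAK_LR` are assumptions inferred from the logged values (for example, `tokens_seen / step`), not values taken from the training configuration.

```python
import csv
import io
import math

# Three rows copied verbatim from the training log, under the log's column names.
LOG = """step,tokens_seen,tokens_b,loss,ppl,tok_per_sec_average,tok_per_sec_interval,lr_scale,cga_lr,aux_lr
1950,511180800,0.5111808,2.717372097969055,15.140482216069893,315997.3601930521,316200.11957788456,0.5709471973104431,2.8547359865522157e-05,0.00017128415919313291
3000,786432000,0.786432,2.6265298652648927,13.825709487570569,315241.2023459131,314591.91116098495,0.20789478334455003,1.0394739167227501e-05,6.2368435003365e-05
3800,996147200,0.9961472,2.6006995582580568,13.473159999350628,314630.5595218487,319013.54400720494,0.10004166354405178,5.002083177202589e-06,3.001249906321553e-05
"""

# Assumptions inferred from the logged numbers, not from the training config.
TOKENS_PER_STEP = 262_144   # tokens_seen / step
CGA_PEAK_LR = 5e-5          # cga_lr / lr_scale
AUX_PEAK_LR = 3e-4          # aux_lr / lr_scale
REL_TOL = 1e-6              # allowance for float rounding in the log


def check_row(row):
    """Return True if the derived columns agree with the base columns."""
    step = int(row["step"])
    tokens_seen = int(row["tokens_seen"])
    loss = float(row["loss"])
    lr_scale = float(row["lr_scale"])
    return all([
        tokens_seen == step * TOKENS_PER_STEP,
        math.isclose(float(row["tokens_b"]), tokens_seen / 1e9, rel_tol=REL_TOL),
        math.isclose(float(row["ppl"]), math.exp(loss), rel_tol=REL_TOL),
        math.isclose(float(row["cga_lr"]), lr_scale * CGA_PEAK_LR, rel_tol=REL_TOL),
        math.isclose(float(row["aux_lr"]), lr_scale * AUX_PEAK_LR, rel_tol=REL_TOL),
    ])


rows = list(csv.DictReader(io.StringIO(LOG)))
results = [check_row(row) for row in rows]
print(results)
```

If these relationships hold for the whole file, `ppl`, `tokens_b`, `cga_lr`, and `aux_lr` carry no information beyond `step`, `loss`, and `lr_scale`, so a plotting script could recompute them instead of reading them.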

@@ -0,0 +1,17 @@
{
  "model_name": "veyra2-15m-base-1b-tokens",
  "source_folder": "/content/drive/MyDrive/veyra_runs/veyra2_15m_base_pretrain_1b/checkpoints/final_hf",
  "params": 14685888,
  "train_tokens": 2000158720,
  "quick_eval_loss": 2.8223165422677994,
  "quick_eval_ppl": 16.815760006000545,
  "vocab_size": 8192,
  "context_length": 1024,
  "architecture": {
    "hidden_size": 448,
    "intermediate_size": 1024,
    "num_hidden_layers": 6,
    "num_attention_heads": 7,
    "num_key_value_heads": 1
  }
}
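
The `params` field can be cross-checked against the `architecture` block. The sketch below is a rough count, assuming a Llama-style decoder (suggested by the `llama` tag in the README) with no bias terms, weight-only RMSNorm layers, a gated three-matrix MLP, and input/output embeddings that share weights. None of these assumptions is stated in the metadata itself, and the helper `llama_param_count` is hypothetical, written only for this illustration.

```python
import math

# Values copied from the metadata JSON above.
meta = {
    "params": 14685888,
    "quick_eval_loss": 2.8223165422677994,
    "quick_eval_ppl": 16.815760006000545,
    "vocab_size": 8192,
    "architecture": {
        "hidden_size": 448,
        "intermediate_size": 1024,
        "num_hidden_layers": 6,
        "num_attention_heads": 7,
        "num_key_value_heads": 1,
    },
}


def llama_param_count(vocab_size, hidden_size, intermediate_size,
                      num_hidden_layers, num_attention_heads,
                      num_key_value_heads, tie_embeddings=True):
    """Hypothetical helper: parameter count under the stated assumptions."""
    head_dim = hidden_size // num_attention_heads
    kv_dim = head_dim * num_key_value_heads
    # Attention: q_proj and o_proj are hidden x hidden; k_proj and v_proj
    # project to the (smaller) key/value width used by grouped-query attention.
    attention = 2 * hidden_size * hidden_size + 2 * hidden_size * kv_dim
    # Gated MLP: gate, up, and down projections.
    mlp = 3 * hidden_size * intermediate_size
    # Two weight-only norms per layer.
    norms = 2 * hidden_size
    per_layer = attention + mlp + norms
    embeddings = vocab_size * hidden_size
    lm_head = 0 if tie_embeddings else embeddings
    final_norm = hidden_size
    return embeddings + num_hidden_layers * per_layer + final_norm + lm_head


total = llama_param_count(meta["vocab_size"], **meta["architecture"])
ppl_matches = math.isclose(math.exp(meta["quick_eval_loss"]),
                           meta["quick_eval_ppl"], rel_tol=1e-6)
print(total, total == meta["params"], ppl_matches)
```

Under these assumptions the count equals the reported `params` value, which is consistent with tied embeddings and a single shared key/value head; `quick_eval_ppl` is likewise consistent with `exp(quick_eval_loss)`.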