---
language:
- en
license: other
tags:
- axolotl
- generated_from_trainer
- Mistral
- instruct
- finetune
- chatml
- gpt4
- synthetic data
- science
- physics
- chemistry
- biology
- math
base_model: mistralai/Mistral-7B-v0.1
datasets:
- allenai/ai2_arc
- camel-ai/physics
- camel-ai/chemistry
- camel-ai/biology
- camel-ai/math
- metaeval/reclor
- openbookqa
- mandyyyyii/scibench
- derek-thomas/ScienceQA
- TIGER-Lab/ScienceEval
- jondurbin/airoboros-3.2
- LDJnr/Capybara
- Cot-Alpaca-GPT4-From-OpenHermes-2.5
- STEM-AI-mtl/Electrical-engineering
- knowrohit07/saraswati-stem
- sablo/oasst2_curated
- glaiveai/glaive-code-assistant
- lmsys/lmsys-chat-1m
- TIGER-Lab/MathInstruct
- bigbio/med_qa
- meta-math/MetaMathQA-40K
- openbookqa
- piqa
- metaeval/reclor
- derek-thomas/ScienceQA
- scibench
- sciq
- Open-Orca/SlimOrca
- migtissera/Synthia-v1.3
- TIGER-Lab/ScienceEval
model-index:
- name: Einstein-v4-7B
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Challenge
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: acc_norm
      value: 64.68
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Weyaxi/Einstein-v4-7B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HellaSwag (10-Shot)
      type: hellaswag
      split: validation
      args:
        num_few_shot: 10
    metrics:
    - type: acc_norm
      value: 83.75
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Weyaxi/Einstein-v4-7B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU (5-Shot)
      type: cais/mmlu
      config: all
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 62.31
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Weyaxi/Einstein-v4-7B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: TruthfulQA (0-shot)
      type: truthful_qa
      config: multiple_choice
      split: validation
      args:
        num_few_shot: 0
    metrics:
    - type: mc2
      value: 55.15
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Weyaxi/Einstein-v4-7B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Winogrande (5-shot)
      type: winogrande
      config: winogrande_xl
      split: validation
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 76.24
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Weyaxi/Einstein-v4-7B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 57.62
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Weyaxi/Einstein-v4-7B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: IFEval (0-Shot)
      type: HuggingFaceH4/ifeval
      args:
        num_few_shot: 0
    metrics:
    - type: inst_level_strict_acc and prompt_level_strict_acc
      value: 47.08
      name: strict accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Weyaxi/Einstein-v4-7B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: BBH (3-Shot)
      type: BBH
      args:
        num_few_shot: 3
    metrics:
    - type: acc_norm
      value: 14.3
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Weyaxi/Einstein-v4-7B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MATH Lvl 5 (4-Shot)
      type: hendrycks/competition_math
      args:
        num_few_shot: 4
    metrics:
    - type: exact_match
      value: 1.74
      name: exact match
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Weyaxi/Einstein-v4-7B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GPQA (0-shot)
      type: Idavidrein/gpqa
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 4.25
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Weyaxi/Einstein-v4-7B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MuSR (0-shot)
      type: TAUR-Lab/MuSR
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 19.02
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Weyaxi/Einstein-v4-7B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU-PRO (5-shot)
      type: TIGER-Lab/MMLU-Pro
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 13.99
      name: accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Weyaxi/Einstein-v4-7B
      name: Open LLM Leaderboard
---

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6468ce47e134d050a58aa89c/U0zyXVGj-O8a7KP3BvPue.png)
# 🔬 Einstein-v4-7B

This model is a fully fine-tuned version of [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) trained on diverse datasets.

This model was fine-tuned on `7xRTX3090` + `1xRTXA6000` using [axolotl](https://github.com/OpenAccess-AI-Collective/axolotl).

This model's training was sponsored by [sablo.ai](https://sablo.ai).

<details><summary>See axolotl config</summary>

axolotl version: `0.4.0`
```yaml
base_model: mistralai/Mistral-7B-v0.1
model_type: MistralForCausalLM
tokenizer_type: LlamaTokenizer
is_mistral_derived_model: true

load_in_8bit: false
load_in_4bit: false
strict: false

chat_template: chatml
datasets:
  - path: data/merged_all.json
    ds_type: json
    type: alpaca
    conversation: chatml

  - path: data/capybara_sharegpt.json
    ds_type: json
    type: sharegpt
    conversation: chatml

  - path: data/synthia-v1.3_sharegpt_12500.json
    ds_type: json
    type: sharegpt
    conversation: chatml  

  - path: data/cot_alpaca_gpt4_extracted_openhermes_2.5_sharegpt.json
    ds_type: json
    type: sharegpt
    conversation: chatml

  - path: data/slimorca_dedup_filtered_95k_sharegpt.json
    ds_type: json
    type: sharegpt
    conversation: chatml  

  - path: data/airoboros_3.2_without_contextual_slimorca_orca_sharegpt.json
    ds_type: json
    type: sharegpt
    conversation: chatml  

dataset_prepared_path: last_run_prepared
val_set_size: 0.005
output_dir: ./Einstein-v4-model

sequence_len: 8192
sample_packing: true
pad_to_sequence_len: true
eval_sample_packing: false

wandb_project: Einstein
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:
hub_model_id: Weyaxi/Einstein-v4-7B

save_safetensors: true

gradient_accumulation_steps: 4
micro_batch_size: 1
num_epochs: 1.5
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.000005

train_on_inputs: false
group_by_length: false
bf16: true
fp16: false
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 10
evals_per_epoch: 2 # changed
eval_table_size:
eval_table_max_new_tokens: 128
saves_per_epoch: 4
debug:

deepspeed: zero3_bf16.json
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
  bos_token: "<s>"
  eos_token: "<|im_end|>"
  unk_token: "<unk>"
tokens:
  - "<|im_start|>"

resume_from_checkpoint: Einstein-v4-model/checkpoint-521

```

</details><br>
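The batch-related settings in the config combine into an effective global batch size. The sketch below works through that arithmetic. It assumes all eight GPUs listed above ran as data-parallel ranks (the usual arrangement under DeepSpeed ZeRO-3), which the config itself does not state; `NUM_GPUS` and `effective_batch` are illustrative names, not part of axolotl.

```python
# Values copied from the axolotl config above.
MICRO_BATCH_SIZE = 1
GRADIENT_ACCUMULATION_STEPS = 4
SEQUENCE_LEN = 8192

# Assumption: 7x RTX 3090 + 1x RTX A6000 all ran as data-parallel ranks.
NUM_GPUS = 8


def effective_batch(micro_batch, grad_accum, num_gpus, seq_len):
    """Return (sequences per optimizer step, max tokens per optimizer step)."""
    sequences = micro_batch * grad_accum * num_gpus
    # With sample_packing and pad_to_sequence_len, each sequence holds
    # up to seq_len tokens, so this is an upper bound on tokens per step.
    return sequences, sequences * seq_len


sequences, tokens = effective_batch(
    MICRO_BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS, NUM_GPUS, SEQUENCE_LEN
)
print(f"{sequences} packed sequences per step, up to {tokens:,} tokens")
```

Under that assumption, each optimizer step covers 32 packed sequences, or up to 262,144 tokens.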

# 💬 Prompt Template

You can use this prompt template while using the model:

### ChatML

```
<|im_start|>system
{system}<|im_end|>
<|im_start|>user
{user}<|im_end|>
<|im_start|>assistant
{assistant}<|im_end|>
```
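To make the layout of the template concrete, here is a minimal sketch that renders a message list into a ChatML string by hand. `build_chatml_prompt` is a hypothetical helper written for illustration. The chat template bundled with the tokenizer may differ in small details such as whitespace or a leading BOS token, so `tokenizer.apply_chat_template()` remains the safer choice for real use.

```python
def build_chatml_prompt(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts as a ChatML string."""
    parts = []
    for message in messages:
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Leave the assistant turn open so the model writes the reply.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = build_chatml_prompt(
    [
        {"role": "system", "content": "You are a helpful AI assistant."},
        {"role": "user", "content": "Hello!"},
    ]
)
print(prompt)
```

Generation should stop at `<|im_end|>`, which the training config sets as the EOS token.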

This prompt template is available as a [chat template](https://huggingface.co/docs/transformers/main/chat_templating), which means you can format messages using the
`tokenizer.apply_chat_template()` method:

```python
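from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Weyaxi/Einstein-v4-7B")
model = AutoModelForCausalLM.from_pretrained("Weyaxi/Einstein-v4-7B")

messages = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "Hello!"}
]
# add_generation_prompt opens the assistant turn; return_dict yields kwargs for generate()
gen_input = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_dict=True, return_tensors="pt")
model.generate(**gen_input)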
```

# 🔄 Quantized versions

Quantized versions of this model are available.

## GGUF [@LoneStriker](https://huggingface.co/LoneStriker)

- https://huggingface.co/LoneStriker/Einstein-v4-7B-GGUF

## AWQ [@solidrust](https://huggingface.co/solidrust)

- https://huggingface.co/solidrust/Einstein-v4-7B-AWQ

## Exl2 [@bartowski](https://hf.co/bartowski)

- https://huggingface.co/bartowski/Einstein-v4-7B-exl2

# 🎯 [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_Weyaxi__Einstein-v4-7B)

|             Metric              |Value|
|---------------------------------|----:|
|Avg.                             |66.62|
|AI2 Reasoning Challenge (25-Shot)|64.68|
|HellaSwag (10-Shot)              |83.75|
|MMLU (5-Shot)                    |62.31|
|TruthfulQA (0-shot)              |55.15|
|Winogrande (5-shot)              |76.24|
|GSM8k (5-shot)                   |57.62|

# 🎯 [Open LLM Leaderboard v2 Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_Weyaxi__Einstein-v4-7B)

|      Metric       |Value|
|-------------------|----:|
|Avg.               |16.73|
|IFEval (0-Shot)    |47.08|
|BBH (3-Shot)       |14.30|
|MATH Lvl 5 (4-Shot)| 1.74|
|GPQA (0-shot)      | 4.25|
|MuSR (0-shot)      |19.02|
|MMLU-PRO (5-shot)  |13.99|
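
As a quick sanity check on the `Avg.` rows, the sketch below recomputes them from the per-benchmark scores in the two tables above. It assumes each `Avg.` is the unweighted mean of the six scores in its table; the dictionary names are illustrative.

```python
from statistics import mean

# Scores copied from the two leaderboard tables above.
leaderboard_v1 = {
    "AI2 Reasoning Challenge (25-Shot)": 64.68,
    "HellaSwag (10-Shot)": 83.75,
    "MMLU (5-Shot)": 62.31,
    "TruthfulQA (0-shot)": 55.15,
    "Winogrande (5-shot)": 76.24,
    "GSM8k (5-shot)": 57.62,
}
leaderboard_v2 = {
    "IFEval (0-Shot)": 47.08,
    "BBH (3-Shot)": 14.30,
    "MATH Lvl 5 (4-Shot)": 1.74,
    "GPQA (0-shot)": 4.25,
    "MuSR (0-shot)": 19.02,
    "MMLU-PRO (5-shot)": 13.99,
}

# Assumption: "Avg." is the plain arithmetic mean of the six scores.
avg_v1 = mean(list(leaderboard_v1.values()))
avg_v2 = mean(list(leaderboard_v2.values()))
print(f"v1 mean: {avg_v1:.3f}")
print(f"v2 mean: {avg_v2:.3f}")
```

Both means land within 0.01 of the reported `Avg.` values (66.62 and 16.73), which is consistent with that assumption.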

# 📚 Some resources, discussions and reviews about this model

#### 🐦 Announcement tweet: 

https://twitter.com/Weyaxi/status/1765851433448944125

#### 🔍 Reddit post in r/LocalLLaMA:

-  https://www.reddit.com/r/LocalLLaMA/comments/1b9gmvl/meet_einsteinv47b_mistralbased_sft_model_using/

#### ▶️ YouTube videos

- https://www.youtube.com/watch?v=-3YWgHJIORE&t=18s

- https://www.youtube.com/watch?v=Xo2ySU8gja0

# 🤖 Additional information about training

This model was fully fine-tuned for 1.5 epochs.

Total number of steps was 1562.
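
The config sets `lr_scheduler: cosine`, `warmup_steps: 10` and `learning_rate: 0.000005`, so the learning rate over those 1562 steps can be sketched roughly as follows. This assumes the common linear-warmup-then-cosine-decay-to-zero shape (as in Hugging Face's `get_cosine_schedule_with_warmup`) and that the schedule spanned all 1562 steps despite the resume from a checkpoint; `lr_at` is a hypothetical helper, not code from the training run.

```python
import math

# Values copied from the axolotl config and the step count above.
BASE_LR = 5e-6
WARMUP_STEPS = 10
TOTAL_STEPS = 1562


def lr_at(step, base_lr=BASE_LR, warmup=WARMUP_STEPS, total=TOTAL_STEPS):
    """Learning rate at a given optimizer step (hypothetical helper)."""
    if step < warmup:
        # Linear warmup from 0 to base_lr.
        return base_lr * step / warmup
    # Cosine decay from base_lr to 0 over the remaining steps.
    progress = (step - warmup) / max(1, total - warmup)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


for step in (0, 10, 786, 1562):
    print(step, f"{lr_at(step):.2e}")
```

In this sketch the rate peaks at the configured value right after warmup, falls to half of it midway through the decay, and reaches zero at the final step.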

<details><summary>Loss graph</summary>

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6468ce47e134d050a58aa89c/UO0NJz9VN5NncIXi82Nk2.png)
</details><br>

# 🤝 Acknowledgments

Thanks to [sablo.ai](https://sablo.ai) for sponsoring this model.

Thanks to all the dataset authors mentioned in the datasets section.

Thanks to [axolotl](https://github.com/OpenAccess-AI-Collective/axolotl) for making the repository I used to make this model.

Thanks to the entire open-source AI community.

[<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)

If you would like to support me:

[☕ Buy Me a Coffee](https://www.buymeacoffee.com/weyaxi)