Initialize project; model provided by the ModelHub XC community

Model: bartowski/Einstein-v6.1-Llama3-8B-GGUF Source: Original Platform
2026-04-10 12:41:57 +08:00
commit 72c8efc273
26 changed files with 387 additions and 0 deletions
--- a/.gitattributes
+++ b/.gitattributes
@@ -0,0 +1,58 @@
 *.7z filter=lfs diff=lfs merge=lfs -text
 *.arrow filter=lfs diff=lfs merge=lfs -text
 *.bin filter=lfs diff=lfs merge=lfs -text
 *.bz2 filter=lfs diff=lfs merge=lfs -text
 *.ckpt filter=lfs diff=lfs merge=lfs -text
 *.ftz filter=lfs diff=lfs merge=lfs -text
 *.gz filter=lfs diff=lfs merge=lfs -text
 *.h5 filter=lfs diff=lfs merge=lfs -text
 *.joblib filter=lfs diff=lfs merge=lfs -text
 *.lfs.* filter=lfs diff=lfs merge=lfs -text
 *.mlmodel filter=lfs diff=lfs merge=lfs -text
 *.model filter=lfs diff=lfs merge=lfs -text
 *.msgpack filter=lfs diff=lfs merge=lfs -text
 *.npy filter=lfs diff=lfs merge=lfs -text
 *.npz filter=lfs diff=lfs merge=lfs -text
 *.onnx filter=lfs diff=lfs merge=lfs -text
 *.ot filter=lfs diff=lfs merge=lfs -text
 *.parquet filter=lfs diff=lfs merge=lfs -text
 *.pb filter=lfs diff=lfs merge=lfs -text
 *.pickle filter=lfs diff=lfs merge=lfs -text
 *.pkl filter=lfs diff=lfs merge=lfs -text
 *.pt filter=lfs diff=lfs merge=lfs -text
 *.pth filter=lfs diff=lfs merge=lfs -text
 *.rar filter=lfs diff=lfs merge=lfs -text
 *.safetensors filter=lfs diff=lfs merge=lfs -text
 saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.tar.* filter=lfs diff=lfs merge=lfs -text
 *.tar filter=lfs diff=lfs merge=lfs -text
 *.tflite filter=lfs diff=lfs merge=lfs -text
 *.tgz filter=lfs diff=lfs merge=lfs -text
 *.wasm filter=lfs diff=lfs merge=lfs -text
 *.xz filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
 Einstein-v6.1-Llama3-8B-IQ1_M.gguf filter=lfs diff=lfs merge=lfs -text
 Einstein-v6.1-Llama3-8B-IQ1_S.gguf filter=lfs diff=lfs merge=lfs -text
 Einstein-v6.1-Llama3-8B-IQ2_M.gguf filter=lfs diff=lfs merge=lfs -text
 Einstein-v6.1-Llama3-8B-IQ2_S.gguf filter=lfs diff=lfs merge=lfs -text
 Einstein-v6.1-Llama3-8B-IQ2_XS.gguf filter=lfs diff=lfs merge=lfs -text
 Einstein-v6.1-Llama3-8B-IQ2_XXS.gguf filter=lfs diff=lfs merge=lfs -text
 Einstein-v6.1-Llama3-8B-IQ3_M.gguf filter=lfs diff=lfs merge=lfs -text
 Einstein-v6.1-Llama3-8B-IQ3_S.gguf filter=lfs diff=lfs merge=lfs -text
 Einstein-v6.1-Llama3-8B-IQ3_XS.gguf filter=lfs diff=lfs merge=lfs -text
 Einstein-v6.1-Llama3-8B-IQ3_XXS.gguf filter=lfs diff=lfs merge=lfs -text
 Einstein-v6.1-Llama3-8B-IQ4_NL.gguf filter=lfs diff=lfs merge=lfs -text
 Einstein-v6.1-Llama3-8B-IQ4_XS.gguf filter=lfs diff=lfs merge=lfs -text
 Einstein-v6.1-Llama3-8B-Q2_K.gguf filter=lfs diff=lfs merge=lfs -text
 Einstein-v6.1-Llama3-8B-Q3_K_L.gguf filter=lfs diff=lfs merge=lfs -text
 Einstein-v6.1-Llama3-8B-Q3_K_M.gguf filter=lfs diff=lfs merge=lfs -text
 Einstein-v6.1-Llama3-8B-Q3_K_S.gguf filter=lfs diff=lfs merge=lfs -text
 Einstein-v6.1-Llama3-8B-Q4_K_M.gguf filter=lfs diff=lfs merge=lfs -text
 Einstein-v6.1-Llama3-8B-Q4_K_S.gguf filter=lfs diff=lfs merge=lfs -text
 Einstein-v6.1-Llama3-8B-Q5_K_M.gguf filter=lfs diff=lfs merge=lfs -text
 Einstein-v6.1-Llama3-8B-Q5_K_S.gguf filter=lfs diff=lfs merge=lfs -text
 Einstein-v6.1-Llama3-8B-Q6_K.gguf filter=lfs diff=lfs merge=lfs -text
 Einstein-v6.1-Llama3-8B-Q8_0.gguf filter=lfs diff=lfs merge=lfs -text
 Einstein-v6.1-Llama3-8B.imatrix filter=lfs diff=lfs merge=lfs -text
--- a/Einstein-v6.1-Llama3-8B-IQ1_M.gguf
+++ b/Einstein-v6.1-Llama3-8B-IQ1_M.gguf
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:21f86f83100d4faff8d856dc5b70ac99faf57ea64189a42dd8308b67f8d8f97c
 size 2161988480
--- a/Einstein-v6.1-Llama3-8B-IQ1_S.gguf
+++ b/Einstein-v6.1-Llama3-8B-IQ1_S.gguf
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:2531e4e4ebdc74a686ad87d5d15f4904ada163c15334dbfb50765f99068bac50
 size 2019644288
--- a/Einstein-v6.1-Llama3-8B-IQ2_M.gguf
+++ b/Einstein-v6.1-Llama3-8B-IQ2_M.gguf
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:23b32985e942521894f2110395f7f2ddbdcb1579d12ab37ee73805e993293e4b
 size 2948299264
--- a/Einstein-v6.1-Llama3-8B-IQ2_S.gguf
+++ b/Einstein-v6.1-Llama3-8B-IQ2_S.gguf
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:f391fd4acb522dbeee5747b60ed209a6efa310fe1324f18c205488f8d745f1a3
 size 2758507008
--- a/Einstein-v6.1-Llama3-8B-IQ2_XS.gguf
+++ b/Einstein-v6.1-Llama3-8B-IQ2_XS.gguf
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:9393bbba20984793115ab632ece5e6f66a7b79b287abbdb8a3369a390e3c5439
 size 2605798272
--- a/Einstein-v6.1-Llama3-8B-IQ2_XXS.gguf
+++ b/Einstein-v6.1-Llama3-8B-IQ2_XXS.gguf
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:5f0d41fe152c6ccf26d1a9d33b08af3fcc8e5e48f77da2b225f8fa10eb9838a5
 size 2399228800
--- a/Einstein-v6.1-Llama3-8B-IQ3_M.gguf
+++ b/Einstein-v6.1-Llama3-8B-IQ3_M.gguf
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:f8d0d1b8c316add6a983df354c811155a87731b9f727d41f3122cc953f50aeea
 size 3784843904
--- a/Einstein-v6.1-Llama3-8B-IQ3_S.gguf
+++ b/Einstein-v6.1-Llama3-8B-IQ3_S.gguf
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:82fd93002aa3e62fbc407476cf942c12945f34ead033ba44c71e9ac1e91afeb7
 size 3682345600
--- a/Einstein-v6.1-Llama3-8B-IQ3_XS.gguf
+++ b/Einstein-v6.1-Llama3-8B-IQ3_XS.gguf
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:d285f7591dbbf02a35196bbd551dfcbc64ca4bb96d05e1a844981ad965a076f0
 size 3518767744
--- a/Einstein-v6.1-Llama3-8B-IQ3_XXS.gguf
+++ b/Einstein-v6.1-Llama3-8B-IQ3_XXS.gguf
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:8aa53ced184d36bd7f438d4fb80ebe24e7b27a09b81676316063b49098429f3b
 size 3274930688
--- a/Einstein-v6.1-Llama3-8B-IQ4_NL.gguf
+++ b/Einstein-v6.1-Llama3-8B-IQ4_NL.gguf
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:ff8d91e769c701ae25f66565810d81b868a6f7249223852d79da65032b124cdb
 size 4678011648
--- a/Einstein-v6.1-Llama3-8B-IQ4_XS.gguf
+++ b/Einstein-v6.1-Llama3-8B-IQ4_XS.gguf
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:7a7043679df0c45add3b05e404fa8c9fd36db538deae39f569004ab995f4fcf2
 size 4447684864
--- a/Einstein-v6.1-Llama3-8B-Q2_K.gguf
+++ b/Einstein-v6.1-Llama3-8B-Q2_K.gguf
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:2298db15ff26cadffe2eb15307e2eb6d278b2a56751ddd6db7f061f2996353f5
 size 3179150336
--- a/Einstein-v6.1-Llama3-8B-Q3_K_L.gguf
+++ b/Einstein-v6.1-Llama3-8B-Q3_K_L.gguf
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:4a757b5ed01f279db926ee34e61ab631ead1c5694a36d3da01ae08197af0ee1c
 size 4321976960
--- a/Einstein-v6.1-Llama3-8B-Q3_K_M.gguf
+++ b/Einstein-v6.1-Llama3-8B-Q3_K_M.gguf
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:c010cab23b319e1a20291109a5a99102a27e3b0a4dbd6653caedbe7ea189a320
 size 4018938496
--- a/Einstein-v6.1-Llama3-8B-Q3_K_S.gguf
+++ b/Einstein-v6.1-Llama3-8B-Q3_K_S.gguf
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:9712cf94bf34af86e537ba32bf0ba392e8894cb6e01db232c182c5e139c22c91
 size 3664519808
--- a/Einstein-v6.1-Llama3-8B-Q4_K_M.gguf
+++ b/Einstein-v6.1-Llama3-8B-Q4_K_M.gguf
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:447587bd8f60d9050232148d34fdb2d88b15b2413fd7f8e095a4606ec60b45bf
 size 4920756992
--- a/Einstein-v6.1-Llama3-8B-Q4_K_S.gguf
+++ b/Einstein-v6.1-Llama3-8B-Q4_K_S.gguf
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:827bbbcaa31ac9cca7b49b60fa4092884e4dea878433153fe4f3b2efb59c0c7a
 size 4692691712
--- a/Einstein-v6.1-Llama3-8B-Q5_K_M.gguf
+++ b/Einstein-v6.1-Llama3-8B-Q5_K_M.gguf
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:5d34b6328341946a78e691f0de36b1a595efba37156410a6bd2d7ce4481b6530
 size 5733012224
--- a/Einstein-v6.1-Llama3-8B-Q5_K_S.gguf
+++ b/Einstein-v6.1-Llama3-8B-Q5_K_S.gguf
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:a2242ad8c5152aadfdb1353b1759a0d4a6eaa207c919e781c5707c983e154bed
 size 5599318784
--- a/Einstein-v6.1-Llama3-8B-Q6_K.gguf
+++ b/Einstein-v6.1-Llama3-8B-Q6_K.gguf
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:380964006b5eef42f64ff784398742d28f8c3c978f31ba757df4e9e5cee9a016
 size 6596033408
--- a/Einstein-v6.1-Llama3-8B-Q8_0.gguf
+++ b/Einstein-v6.1-Llama3-8B-Q8_0.gguf
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:b381f0d74fe7793a8551e2bf08de7b83ceb2b3935c260b45b946c9a1fb6a5cb5
 size 8540805760
--- a/Einstein-v6.1-Llama3-8B.imatrix
+++ b/Einstein-v6.1-Llama3-8B.imatrix
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:a2661c59f42b44d840d48534e20209a871d868a2e0fd697711e8171389a40539
 size 4988166
--- a/README.md
+++ b/README.md
@@ -0,0 +1,259 @@
 ---
 language:
 - en
 license: other
 tags:
 - axolotl
 - generated_from_trainer
 - instruct
 - finetune
 - chatml
 - gpt4
 - synthetic data
 - science
 - physics
 - chemistry
 - biology
 - math
 - llama
 - llama3
 base_model: meta-llama/Meta-Llama-3-8B
 datasets:
 - allenai/ai2_arc
 - camel-ai/physics
 - camel-ai/chemistry
 - camel-ai/biology
 - camel-ai/math
 - metaeval/reclor
 - openbookqa
 - mandyyyyii/scibench
 - derek-thomas/ScienceQA
 - TIGER-Lab/ScienceEval
 - jondurbin/airoboros-3.2
 - LDJnr/Capybara
 - Cot-Alpaca-GPT4-From-OpenHermes-2.5
 - STEM-AI-mtl/Electrical-engineering
 - knowrohit07/saraswati-stem
 - sablo/oasst2_curated
 - lmsys/lmsys-chat-1m
 - TIGER-Lab/MathInstruct
 - bigbio/med_qa
 - meta-math/MetaMathQA-40K
 - openbookqa
 - piqa
 - metaeval/reclor
 - derek-thomas/ScienceQA
 - scibench
 - sciq
 - Open-Orca/SlimOrca
 - migtissera/Synthia-v1.3
 - TIGER-Lab/ScienceEval
 - allenai/WildChat
 - microsoft/orca-math-word-problems-200k
 - openchat/openchat_sharegpt4_dataset
 - teknium/GPTeacher-General-Instruct
 - m-a-p/CodeFeedback-Filtered-Instruction
 - totally-not-an-llm/EverythingLM-data-V3
 - HuggingFaceH4/no_robots
 - OpenAssistant/oasst_top1_2023-08-25
 - WizardLM/WizardLM_evol_instruct_70k
 model-index:
 - name: Einstein-v6.1-Llama3-8B
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Challenge
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: acc_norm
      value: 62.46
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Weyaxi/Einstein-v6.1-Llama3-8B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HellaSwag (10-Shot)
      type: hellaswag
      split: validation
      args:
        num_few_shot: 10
    metrics:
    - type: acc_norm
      value: 82.41
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Weyaxi/Einstein-v6.1-Llama3-8B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU (5-Shot)
      type: cais/mmlu
      config: all
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 66.19
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Weyaxi/Einstein-v6.1-Llama3-8B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: TruthfulQA (0-shot)
      type: truthful_qa
      config: multiple_choice
      split: validation
      args:
        num_few_shot: 0
    metrics:
    - type: mc2
      value: 55.1
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Weyaxi/Einstein-v6.1-Llama3-8B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Winogrande (5-shot)
      type: winogrande
      config: winogrande_xl
      split: validation
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 79.32
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Weyaxi/Einstein-v6.1-Llama3-8B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 66.11
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Weyaxi/Einstein-v6.1-Llama3-8B
      name: Open LLM Leaderboard
 quantized_by: bartowski
 pipeline_tag: text-generation
 ---
 ## Llamacpp imatrix Quantizations of Einstein-v6.1-Llama3-8B
 Using <a href="https://github.com/ggerganov/llama.cpp/">llama.cpp</a> release <a href="https://github.com/ggerganov/llama.cpp/releases/tag/b2777">b2777</a> for quantization.
 Original model: https://huggingface.co/Weyaxi/Einstein-v6.1-Llama3-8B
 All quants made using imatrix option with dataset provided by Kalomaze [here](https://github.com/ggerganov/llama.cpp/discussions/5263#discussioncomment-8395384)
 ## Prompt format
 ```
 <|im_start|>system
 {system_prompt}<|im_end|>
 <|im_start|>user
 {prompt}<|im_end|>
 <|im_start|>assistant
 ```
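As a rough illustration of the template above, the following Python sketch assembles a single-turn prompt string. The helper name `build_chatml_prompt` is hypothetical and not part of llama.cpp or any library; it only mirrors the layout shown in this section, and it assumes generation should begin on the line after `<|im_start|>assistant`.

```python
def build_chatml_prompt(system_prompt: str, prompt: str) -> str:
    """Assemble a single-turn prompt in the ChatML layout shown above.

    Hypothetical helper for illustration only. The trailing newline after
    the assistant tag is an assumption about where generation should start.
    """
    return (
        f"<|im_start|>system\n{system_prompt}<|im_end|>\n"
        f"<|im_start|>user\n{prompt}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )


if __name__ == "__main__":
    print(build_chatml_prompt("You are a helpful assistant.", "What is entropy?"))
```

The resulting string can be passed as the raw prompt to whichever inference tool you use, provided that tool does not apply its own chat template on top.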
 ## Download a file (not the whole branch) from below:
 | Filename | Quant type | File Size | Description |
 | -------- | ---------- | --------- | ----------- |
 | [Einstein-v6.1-Llama3-8B-Q8_0.gguf](https://huggingface.co/bartowski/Einstein-v6.1-Llama3-8B-GGUF/blob/main/Einstein-v6.1-Llama3-8B-Q8_0.gguf) | Q8_0 | 8.54GB | Extremely high quality, generally unneeded but max available quant. |
 | [Einstein-v6.1-Llama3-8B-Q6_K.gguf](https://huggingface.co/bartowski/Einstein-v6.1-Llama3-8B-GGUF/blob/main/Einstein-v6.1-Llama3-8B-Q6_K.gguf) | Q6_K | 6.59GB | Very high quality, near perfect, *recommended*. |
 | [Einstein-v6.1-Llama3-8B-Q5_K_M.gguf](https://huggingface.co/bartowski/Einstein-v6.1-Llama3-8B-GGUF/blob/main/Einstein-v6.1-Llama3-8B-Q5_K_M.gguf) | Q5_K_M | 5.73GB | High quality, *recommended*. |
 | [Einstein-v6.1-Llama3-8B-Q5_K_S.gguf](https://huggingface.co/bartowski/Einstein-v6.1-Llama3-8B-GGUF/blob/main/Einstein-v6.1-Llama3-8B-Q5_K_S.gguf) | Q5_K_S | 5.59GB | High quality, *recommended*. |
 | [Einstein-v6.1-Llama3-8B-Q4_K_M.gguf](https://huggingface.co/bartowski/Einstein-v6.1-Llama3-8B-GGUF/blob/main/Einstein-v6.1-Llama3-8B-Q4_K_M.gguf) | Q4_K_M | 4.92GB | Good quality, uses about 4.83 bits per weight, *recommended*. |
 | [Einstein-v6.1-Llama3-8B-Q4_K_S.gguf](https://huggingface.co/bartowski/Einstein-v6.1-Llama3-8B-GGUF/blob/main/Einstein-v6.1-Llama3-8B-Q4_K_S.gguf) | Q4_K_S | 4.69GB | Slightly lower quality with more space savings, *recommended*. |
 | [Einstein-v6.1-Llama3-8B-IQ4_NL.gguf](https://huggingface.co/bartowski/Einstein-v6.1-Llama3-8B-GGUF/blob/main/Einstein-v6.1-Llama3-8B-IQ4_NL.gguf) | IQ4_NL | 4.67GB | Decent quality, slightly smaller than Q4_K_S with similar performance, *recommended*. |
 | [Einstein-v6.1-Llama3-8B-IQ4_XS.gguf](https://huggingface.co/bartowski/Einstein-v6.1-Llama3-8B-GGUF/blob/main/Einstein-v6.1-Llama3-8B-IQ4_XS.gguf) | IQ4_XS | 4.44GB | Decent quality, smaller than Q4_K_S with similar performance, *recommended*. |
 | [Einstein-v6.1-Llama3-8B-Q3_K_L.gguf](https://huggingface.co/bartowski/Einstein-v6.1-Llama3-8B-GGUF/blob/main/Einstein-v6.1-Llama3-8B-Q3_K_L.gguf) | Q3_K_L | 4.32GB | Lower quality but usable, good for low RAM availability. |
 | [Einstein-v6.1-Llama3-8B-Q3_K_M.gguf](https://huggingface.co/bartowski/Einstein-v6.1-Llama3-8B-GGUF/blob/main/Einstein-v6.1-Llama3-8B-Q3_K_M.gguf) | Q3_K_M | 4.01GB | Even lower quality. |
 | [Einstein-v6.1-Llama3-8B-IQ3_M.gguf](https://huggingface.co/bartowski/Einstein-v6.1-Llama3-8B-GGUF/blob/main/Einstein-v6.1-Llama3-8B-IQ3_M.gguf) | IQ3_M | 3.78GB | Medium-low quality, new method with decent performance comparable to Q3_K_M. |
 | [Einstein-v6.1-Llama3-8B-IQ3_S.gguf](https://huggingface.co/bartowski/Einstein-v6.1-Llama3-8B-GGUF/blob/main/Einstein-v6.1-Llama3-8B-IQ3_S.gguf) | IQ3_S | 3.68GB | Lower quality, new method with decent performance, recommended over Q3_K_S quant, same size with better performance. |
 | [Einstein-v6.1-Llama3-8B-Q3_K_S.gguf](https://huggingface.co/bartowski/Einstein-v6.1-Llama3-8B-GGUF/blob/main/Einstein-v6.1-Llama3-8B-Q3_K_S.gguf) | Q3_K_S | 3.66GB | Low quality, not recommended. |
 | [Einstein-v6.1-Llama3-8B-IQ3_XS.gguf](https://huggingface.co/bartowski/Einstein-v6.1-Llama3-8B-GGUF/blob/main/Einstein-v6.1-Llama3-8B-IQ3_XS.gguf) | IQ3_XS | 3.51GB | Lower quality, new method with decent performance, slightly better than Q3_K_S. |
 | [Einstein-v6.1-Llama3-8B-IQ3_XXS.gguf](https://huggingface.co/bartowski/Einstein-v6.1-Llama3-8B-GGUF/blob/main/Einstein-v6.1-Llama3-8B-IQ3_XXS.gguf) | IQ3_XXS | 3.27GB | Lower quality, new method with decent performance, comparable to Q3 quants. |
 | [Einstein-v6.1-Llama3-8B-Q2_K.gguf](https://huggingface.co/bartowski/Einstein-v6.1-Llama3-8B-GGUF/blob/main/Einstein-v6.1-Llama3-8B-Q2_K.gguf) | Q2_K | 3.17GB | Very low quality but surprisingly usable. |
 | [Einstein-v6.1-Llama3-8B-IQ2_M.gguf](https://huggingface.co/bartowski/Einstein-v6.1-Llama3-8B-GGUF/blob/main/Einstein-v6.1-Llama3-8B-IQ2_M.gguf) | IQ2_M | 2.94GB | Very low quality, uses SOTA techniques to also be surprisingly usable. |
 | [Einstein-v6.1-Llama3-8B-IQ2_S.gguf](https://huggingface.co/bartowski/Einstein-v6.1-Llama3-8B-GGUF/blob/main/Einstein-v6.1-Llama3-8B-IQ2_S.gguf) | IQ2_S | 2.75GB | Very low quality, uses SOTA techniques to be usable. |
 | [Einstein-v6.1-Llama3-8B-IQ2_XS.gguf](https://huggingface.co/bartowski/Einstein-v6.1-Llama3-8B-GGUF/blob/main/Einstein-v6.1-Llama3-8B-IQ2_XS.gguf) | IQ2_XS | 2.60GB | Very low quality, uses SOTA techniques to be usable. |
 | [Einstein-v6.1-Llama3-8B-IQ2_XXS.gguf](https://huggingface.co/bartowski/Einstein-v6.1-Llama3-8B-GGUF/blob/main/Einstein-v6.1-Llama3-8B-IQ2_XXS.gguf) | IQ2_XXS | 2.39GB | Lower quality, uses SOTA techniques to be usable. |
 | [Einstein-v6.1-Llama3-8B-IQ1_M.gguf](https://huggingface.co/bartowski/Einstein-v6.1-Llama3-8B-GGUF/blob/main/Einstein-v6.1-Llama3-8B-IQ1_M.gguf) | IQ1_M | 2.16GB | Extremely low quality, *not* recommended. |
 | [Einstein-v6.1-Llama3-8B-IQ1_S.gguf](https://huggingface.co/bartowski/Einstein-v6.1-Llama3-8B-GGUF/blob/main/Einstein-v6.1-Llama3-8B-IQ1_S.gguf) | IQ1_S | 2.01GB | Extremely low quality, *not* recommended. |
 ## Downloading using huggingface-cli
 First, make sure you have huggingface-cli installed:
 ```
 pip install -U "huggingface_hub[cli]"
 ```
 Then, you can target the specific file you want:
 ```
 huggingface-cli download bartowski/Einstein-v6.1-Llama3-8B-GGUF --include "Einstein-v6.1-Llama3-8B-Q4_K_M.gguf" --local-dir ./ --local-dir-use-symlinks False
 ```
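 If a model is bigger than 50GB, it will have been split into multiple files. No file in this repository is that large (the largest, Q8_0, is 8.54GB), so the command below is shown for reference only. To download all parts of a split model to a local folder, run: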
 ```
 huggingface-cli download bartowski/Einstein-v6.1-Llama3-8B-GGUF --include "Einstein-v6.1-Llama3-8B-Q8_0.gguf/*" --local-dir Einstein-v6.1-Llama3-8B-Q8_0 --local-dir-use-symlinks False
 ```
 You can either specify a new local-dir (Einstein-v6.1-Llama3-8B-Q8_0) or download them all in place (./)
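After downloading, you may want to confirm the file arrived intact. The Git LFS pointer files in this commit record a `sha256` oid and a byte size for each quant, so a minimal check, sketched here in Python with the standard library only, is to compare both. The function names are hypothetical, and the values in the usage comment are copied from the Q4_K_M pointer above.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_lfs_pointer(path, expected_oid, expected_size):
    """Check a local file against the oid and size from its Git LFS pointer.

    Hypothetical helper: compares the cheap size check first, then the hash.
    """
    p = Path(path)
    if p.stat().st_size != expected_size:
        return False
    return sha256_of_file(p) == expected_oid.lower()


# Example usage (values taken from the Q4_K_M pointer in this commit):
# matches_lfs_pointer(
#     "Einstein-v6.1-Llama3-8B-Q4_K_M.gguf",
#     "447587bd8f60d9050232148d34fdb2d88b15b2413fd7f8e095a4606ec60b45bf",
#     4920756992,
# )
```

Checking the size first avoids hashing several gigabytes when a download was obviously truncated.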
 ## Which file should I choose?
 A great write-up with charts comparing the performance of the various quants is provided by Artefact2 [here](https://gist.github.com/Artefact2/b5f810600771265fc1e39442288e8ec9).
 The first thing to figure out is how big a model you can run. To do this, you'll need to figure out how much RAM and/or VRAM you have.
 If you want your model running as FAST as possible, you'll want to fit the whole thing on your GPU's VRAM. Aim for a quant with a file size 1-2GB smaller than your GPU's total VRAM.
 If you want the absolute maximum quality, add both your system RAM and your GPU's VRAM together, then similarly grab a quant with a file size 1-2GB smaller than that total.
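The sizing rule above can be sketched as a small Python helper. This is only an illustration: the function name and the default 2GB headroom are assumptions, the sizes are copied from the table in this README, and "largest file that fits" is used as a stand-in for "highest quality that fits", which is only approximately true.

```python
# File sizes in GB, copied from the download table in this README.
QUANT_SIZES_GB = {
    "Q8_0": 8.54, "Q6_K": 6.59, "Q5_K_M": 5.73, "Q5_K_S": 5.59,
    "Q4_K_M": 4.92, "Q4_K_S": 4.69, "IQ4_NL": 4.67, "IQ4_XS": 4.44,
    "Q3_K_L": 4.32, "Q3_K_M": 4.01, "IQ3_M": 3.78, "IQ3_S": 3.68,
    "Q3_K_S": 3.66, "IQ3_XS": 3.51, "IQ3_XXS": 3.27, "Q2_K": 3.17,
    "IQ2_M": 2.94, "IQ2_S": 2.75, "IQ2_XS": 2.60, "IQ2_XXS": 2.39,
    "IQ1_M": 2.16, "IQ1_S": 2.01,
}


def largest_quant_that_fits(memory_gb, headroom_gb=2.0):
    """Return the name of the largest quant whose file fits in memory_gb
    after reserving headroom_gb, or None if nothing fits.

    Pass VRAM alone for maximum speed, or RAM + VRAM for maximum quality.
    """
    budget = memory_gb - headroom_gb
    candidates = [
        (size, name) for name, size in QUANT_SIZES_GB.items() if size <= budget
    ]
    if not candidates:
        return None
    return max(candidates)[1]


if __name__ == "__main__":
    print(largest_quant_that_fits(8.0))        # 8GB of VRAM, 2GB headroom
    print(largest_quant_that_fits(6.0, 1.0))   # 6GB of VRAM, 1GB headroom
```

Treat the result as a starting point; context length and other runtime buffers also consume memory, so the headroom may need to be larger in practice.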
 Next, you'll need to decide if you want to use an 'I-quant' or a 'K-quant'.
 If you don't want to think too much, grab one of the K-quants. These are in format 'QX_K_X', like Q5_K_M.
 If you want to get more into the weeds, you can check out this extremely useful feature chart:
 [llama.cpp feature matrix](https://github.com/ggerganov/llama.cpp/wiki/Feature-matrix)
 But basically, if you're aiming for below Q4, and you're running cuBLAS (Nvidia) or rocBLAS (AMD), you should look towards the I-quants. These are in format IQX_X, like IQ3_M. These are newer and offer better performance for their size.
 These I-quants can also be used on CPU and Apple Metal, but will be slower than their K-quant equivalent, so speed vs performance is a tradeoff you'll have to decide.
 The I-quants are *not* compatible with Vulkan, which is also AMD, so if you have an AMD card double check if you're using the rocBLAS build or the Vulkan build. At the time of writing this, LM Studio has a preview with ROCm support, and other inference engines have specific builds for ROCm.
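The I-quant versus K-quant guidance above can be condensed into a simple decision rule. The Python sketch below is a hedged restatement of those paragraphs, not an official recommendation engine: the function name and the backend labels are assumptions, and it defaults to K-quants on CPU and Metal because the text says I-quants are slower there.

```python
def suggest_quant_family(backend: str, below_q4: bool) -> str:
    """Suggest 'I-quant' or 'K-quant' following the guidance in this README.

    backend: one of 'cublas', 'rocblas', 'vulkan', 'metal', 'cpu'
             (labels are assumptions chosen for this sketch).
    below_q4: True if you are aiming for a quant smaller than Q4.
    """
    backend = backend.lower()
    # The README states I-quants are not compatible with the Vulkan build.
    if backend == "vulkan":
        return "K-quant"
    # Below Q4 on cuBLAS or rocBLAS, I-quants offer better quality for their size.
    if below_q4 and backend in {"cublas", "rocblas"}:
        return "I-quant"
    # Otherwise K-quants are the simple default; I-quants still work on
    # CPU and Metal but run slower than the equivalent K-quant.
    return "K-quant"


if __name__ == "__main__":
    print(suggest_quant_family("cuBLAS", below_q4=True))
    print(suggest_quant_family("vulkan", below_q4=True))
```

On CPU or Apple Metal the choice is a speed versus quality tradeoff, so overriding the default in favor of an I-quant is reasonable if size matters more than throughput.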
 Want to support my work? Visit my ko-fi page here: https://ko-fi.com/bartowski
--- a/configuration.json
+++ b/configuration.json
@@ -0,0 +1 @@
 {"framework": "pytorch", "task": "text-generation", "allow_remote": true}