---
base_model: appvoid/no-prompt-1.3b
inference: false
license: apache-2.0
model_creator: appvoid
model_name: no-prompt-1.3b
pipeline_tag: text-generation
quantized_by: afrideva
---
# appvoid/no-prompt-1.3b-GGUF
Quantized GGUF model files for no-prompt-1.3b from appvoid
| Name | Quant method | Size |
|---|---|---|
| no-prompt-1.3b.fp16.gguf | fp16 | 2.69 GB |
| no-prompt-1.3b.q2_k.gguf | q2_k | 631.52 MB |
| no-prompt-1.3b.q3_k_m.gguf | q3_k_m | 704.72 MB |
| no-prompt-1.3b.q4_k_m.gguf | q4_k_m | 873.27 MB |
| no-prompt-1.3b.q5_k_m.gguf | q5_k_m | 1.00 GB |
| no-prompt-1.3b.q6_k.gguf | q6_k | 1.17 GB |
| no-prompt-1.3b.q8_0.gguf | q8_0 | 1.43 GB |
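As a rough sanity check on the table above, the file sizes imply the approximate bits per weight of each quantization. This sketch assumes ~1.3B parameters (the exact count for a sheared-llama-1.3b derivative is slightly higher, and GGUF metadata overhead is ignored), so treat the figures as estimates:

```python
# Approximate bits per weight implied by each GGUF file size.
# PARAMS is an assumption (~1.3e9); results are rough estimates,
# not the exact bit widths of the quant formats.
PARAMS = 1.3e9

sizes_bytes = {
    "fp16":   2.69e9,
    "q2_k":   631.52e6,
    "q4_k_m": 873.27e6,
    "q8_0":   1.43e9,
}

bpw = {name: size * 8 / PARAMS for name, size in sizes_bytes.items()}
for name, bits in bpw.items():
    print(f"{name}: ~{bits:.1f} bits/weight")
```

The fp16 file comes out near 16 bits per weight, as expected, which suggests the parameter-count assumption is in the right ballpark.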
## Original Model Card:
### no-prompt
a sheared-llama-1.3b fine-tuning
This model uses a 1.3-billion-parameter model as its base and is further fine-tuned on the same data as palmer. It performs well and even surpasses the SOTA model on HellaSwag.
#### evaluation
| Model | ARC_C | HellaSwag | PIQA | Winogrande |
|---|---|---|---|---|
| tinyllama-2t | 0.2807 | 0.5463 | 0.7067 | 0.5683 |
| palmer-001 | 0.2807 | 0.5524 | 0.7106 | 0.5896 |
| sheared-1.3b | 0.2910 | 0.5935 | 0.7339 | 0.5809 |
| no-prompt-1.3b | 0.3157 | 0.6022 | 0.7334 | 0.5864 |
| falcon-rw-1b-instruct-openorca (sota) | 0.3362 | 0.5997 | 0.7394 | 0.6148 |
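One way to read the table above is via the unweighted mean of the four benchmarks. A quick sketch (scores copied from the table; the averaging is my summary, not part of the original card):

```python
# Unweighted mean of the four benchmark scores (ARC_C, HellaSwag,
# PIQA, Winogrande) from the evaluation table above.
scores = {
    "tinyllama-2t":   [0.2807, 0.5463, 0.7067, 0.5683],
    "palmer-001":     [0.2807, 0.5524, 0.7106, 0.5896],
    "sheared-1.3b":   [0.2910, 0.5935, 0.7339, 0.5809],
    "no-prompt-1.3b": [0.3157, 0.6022, 0.7334, 0.5864],
    "falcon-rw-1b-instruct-openorca": [0.3362, 0.5997, 0.7394, 0.6148],
}

averages = {model: sum(v) / len(v) for model, v in scores.items()}
for model, avg in sorted(averages.items(), key=lambda kv: -kv[1]):
    print(f"{model}: {avg:.4f}")
```

On this simple average, the fine-tune edges out its sheared-1.3b base while the falcon SOTA model still leads overall.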
Although trained on less than 25% of the dataset, this model achieves performance competitive with the current SOTA on the Open LLM Leaderboard.
#### training
Training took ~5 P100 GPU hours on 15,000 shuffled GPT-4 samples. no-prompt was fine-tuned with lower learning rates to retain as much general knowledge as possible.
#### prompt
no prompt
#### limitations
Hallucinations are frequent, as with any transformer model of this size.

