---
language:
- en
library_name: transformers
pipeline_tag: text-generation
datasets:
- jondurbin/airoboros-2.2
- Open-Orca/OpenOrca
- garage-bAInd/Open-Platypus
- WizardLM/WizardLM_evol_instruct_V2_196k
- TokenBender/python_eval_instruct_51k
- codefuse-ai/Evol-Instruction-66k
tags:
- code
license: llama2
model-index:
- name: SpeechlessCoder
  results:
  - task:
      type: text-generation
    dataset:
      type: openai_humaneval
      name: HumanEval
    metrics:
    - name: pass@1
      type: pass@1
      value:
      verified: false
---

# speechless-thoughts-mistral-7b

speechless-thoughts-mistral-7b is fine-tuned as a baseline for speechless-sparsetral-16x7b-MoE.

The specific datasets (speechless-thoughts-252k) are as follows:
- jondurbin/airoboros-2.2: Filtered to categories related to coding, reasoning, and planning. 23,462 samples.
- Open-Orca/OpenOrca: Filtered to the 'cot' category of the 1M GPT-4 dataset (see the filtering sketch after this list). 74,440 samples.
- garage-bAInd/Open-Platypus: 100%. 24,926 samples.
- WizardLM/WizardLM_evol_instruct_V2_196k: Coding conversation part. 30,185 samples.
- TokenBender/python_eval_instruct_51k: Samples whose output contains "python". 40,309 samples.
- Spider: 8,659 samples.
- codefuse-ai/Evol-Instruction-66k: 100%. 66,862 samples.
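
As a rough illustration of the 'cot' filtering mentioned above, here is a minimal sketch using the `datasets` library. It assumes the OpenOrca schema, where each sample's `id` field is prefixed with its source subset (e.g. `cot.*`); this is an assumption about the upstream data, not a description of the original pipeline.

```python
from datasets import load_dataset

# Load OpenOrca and keep only the chain-of-thought ("cot") subset.
# Assumption: each sample's "id" is prefixed with its source subset,
# e.g. "cot.64699" -- adjust the predicate if the schema differs.
ds = load_dataset("Open-Orca/OpenOrca", split="train")
cot = ds.filter(lambda sample: sample["id"].startswith("cot."))
print(f"{len(cot):,} cot samples")  # the card reports 74,440
```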
## Alpaca Prompt Format
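
The template itself did not survive in this copy of the card; the heading refers to the standard Alpaca instruction template, reproduced here for reference:

```
Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Response:
```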
## Usage
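
No usage snippet survives in this copy, so the following is a minimal sketch with `transformers`, assuming the Hugging Face repo id `uukuguy/speechless-thoughts-mistral-7b`:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo id for this card; adjust if the model lives elsewhere.
model_id = "uukuguy/speechless-thoughts-mistral-7b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Build an Alpaca-style prompt (see "Alpaca Prompt Format" above).
prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\nWrite a Python function that reverses a string.\n\n"
    "### Response:\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
))
```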
## HumanEval

| Metric           | Value |
| ---------------- | ----- |
| humaneval-python |       |
## lm-evaluation-harness

Detailed results can be found here.
| Metric              | Value |
| ------------------- | ----- |
| Avg.                | 59.72 |
| ARC (25-shot)       | 58.96 |
| HellaSwag (10-shot) | 80.71 |
| MMLU (5-shot)       | 60.11 |
| TruthfulQA (0-shot) | 49.91 |
| Winogrande (5-shot) | 77.82 |
| GSM8K (5-shot)      | 30.78 |
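
For reference, a sketch of reproducing one of these scores with the lm-evaluation-harness Python API (v0.4+). The Open LLM Leaderboard pins its own harness revision and settings, so exact numbers may differ.

```python
import lm_eval

# Hedged sketch, assuming the repo id uukuguy/speechless-thoughts-mistral-7b.
# ARC (25-shot) corresponds to the arc_challenge task with num_fewshot=25.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=uukuguy/speechless-thoughts-mistral-7b,dtype=float16",
    tasks=["arc_challenge"],
    num_fewshot=25,
    batch_size=8,
)
print(results["results"]["arc_challenge"])
```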