diff --git a/README.md b/README.md
index 60bd193..8bc7d1d 100644
--- a/README.md
+++ b/README.md
@@ -2,6 +2,109 @@
 license: llama2
 datasets:
 - 922-CA/MoCha_v1
+model-index:
+- name: monika-ddlc-7b-v1
+  results:
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: AI2 Reasoning Challenge (25-Shot)
+      type: ai2_arc
+      config: ARC-Challenge
+      split: test
+      args:
+        num_few_shot: 25
+    metrics:
+    - type: acc_norm
+      value: 54.95
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=922-CA/monika-ddlc-7b-v1
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: HellaSwag (10-Shot)
+      type: hellaswag
+      split: validation
+      args:
+        num_few_shot: 10
+    metrics:
+    - type: acc_norm
+      value: 76.78
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=922-CA/monika-ddlc-7b-v1
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MMLU (5-Shot)
+      type: cais/mmlu
+      config: all
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 45.61
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=922-CA/monika-ddlc-7b-v1
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: TruthfulQA (0-shot)
+      type: truthful_qa
+      config: multiple_choice
+      split: validation
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: mc2
+      value: 43.94
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=922-CA/monika-ddlc-7b-v1
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: Winogrande (5-shot)
+      type: winogrande
+      config: winogrande_xl
+      split: validation
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 72.85
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=922-CA/monika-ddlc-7b-v1
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: GSM8k (5-shot)
+      type: gsm8k
+      config: main
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 8.79
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=922-CA/monika-ddlc-7b-v1
+      name: Open LLM Leaderboard
 ---
 # monika-ddlc-7b-v1:
 * LLaMA-2 7b chat fine-tuned for Monika character from DDLC (still somewhat experimental)
@@ -33,4 +136,17 @@
 Additionally, being character-focused means that this model may not be the smartest.
 
 Finally, this model is not guaranteed to output aligned or safe outputs, use at your own risk.
-Note: Ideally, would have liked to fine-tune on other models (specifically [Zephyr-7b-alpha](https://huggingface.co/HuggingFaceH4/zephyr-7b-alpha)). May try soon for later versions.
\ No newline at end of file
+Note: Ideally, we would have liked to fine-tune on other models (specifically [Zephyr-7b-alpha](https://huggingface.co/HuggingFaceH4/zephyr-7b-alpha)); we may try this in later versions.
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_922-CA__monika-ddlc-7b-v1)
+
+| Metric                          |Value|
+|---------------------------------|----:|
+|Avg.                             |50.49|
+|AI2 Reasoning Challenge (25-Shot)|54.95|
+|HellaSwag (10-Shot)              |76.78|
+|MMLU (5-Shot)                    |45.61|
+|TruthfulQA (0-shot)              |43.94|
+|Winogrande (5-shot)              |72.85|
+|GSM8k (5-shot)                   | 8.79|
+
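The `Avg.` row added in the table above follows the Open LLM Leaderboard convention of an unweighted arithmetic mean over the six benchmark scores. A minimal sanity check (scores copied from the table):

```python
# Check that the leaderboard "Avg." value is the unweighted mean
# of the six benchmark scores reported in the table.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 54.95,
    "HellaSwag (10-Shot)": 76.78,
    "MMLU (5-Shot)": 45.61,
    "TruthfulQA (0-shot)": 43.94,
    "Winogrande (5-shot)": 72.85,
    "GSM8k (5-shot)": 8.79,
}

avg = round(sum(scores.values()) / len(scores), 2)
print(avg)  # 50.49, matching the table's Avg. row
```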