From 019a9d026e2f0bdd34d3a3d81758b8c719ed4502 Mon Sep 17 00:00:00 2001
From: Saurav Muralidharan
Date: Tue, 23 Jul 2024 10:36:45 -0700
Subject: [PATCH] Add evaluation preview

---
 README.md | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/README.md b/README.md
index e3f0265..4f30bba 100644
--- a/README.md
+++ b/README.md
@@ -53,6 +53,29 @@ print(output_text)
 
 Minitron is released under the [NVIDIA Open Model License Agreement](https://developer.download.nvidia.com/licenses/nvidia-open-model-license-agreement-june-2024.pdf).
 
+## Evaluation Results
+
+*5-shot performance.* Language understanding evaluated using [Massive Multitask Language Understanding](https://arxiv.org/abs/2009.03300):
+
+| Average |
+| :---- |
+| 63.8 |
+
+*Zero-shot performance.* Evaluated using select datasets from the [LM Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness) with additions:
+
+| HellaSwag | Winogrande | GSM8K | ARC-C | XLSum |
+| :------------- | :------------- | :------------- | :------------- | :------------- |
+| 80.7 | 79.0 | 51.3 | 52.6 | 31.2 |
+
+
+*Code generation performance.* Evaluated using [HumanEval](https://github.com/openai/human-eval):
+
+| p@1, 0-Shot |
+| :------------- |
+| 31.6 |
+
+Please refer to our [paper](https://arxiv.org/abs/2407.14679) for the full set of results.
+
 ## Citation
 
 If you find our work helpful, please consider citing our paper: