| model | win_rate | standard_error | n_wins | n_wins_base | n_draws | n_total | mode | avg_length |
|---|---|---|---|---|---|---|---|---|
| gpt4 | 73.7888198757764 | 1.5359801545073597 | 588 | 205 | 12 | 805 | minimal | 1365 |
| SOLAR-10.7B-LMCocktail | 73.44527363184079 | 1.5572150363643398 | 590 | 213 | 1 | 804 | community | 1203 |
| claude | 70.37267080745342 | 1.599519507147828 | 562 | 234 | 9 | 805 | minimal | 1082 |
| chatgpt | 66.08695652173913 | 1.6626479994330317 | 529 | 270 | 6 | 805 | minimal | 811 |
| wizardlm-13b | 65.15527950310559 | 1.670034107787565 | 520 | 276 | 9 | 805 | minimal | 985 |
| vicuna-13b | 64.09937888198758 | 1.6895185863153146 | 515 | 288 | 2 | 805 | minimal | 1037 |
| guanaco-65b | 62.36024844720497 | 1.7086348811605765 | 502 | 303 | 0 | 805 | minimal | 1249 |
| oasst-rlhf-llama-33b | 62.0496894409938 | 1.7080028976103514 | 498 | 304 | 3 | 805 | minimal | 1079 |
| alpaca-farm-ppo-human | 60.24844720496895 | 1.7169496733548772 | 481 | 316 | 8 | 805 | minimal | 803 |
| falcon-40b-instruct | 56.52173913043478 | 1.7438750520312944 | 453 | 348 | 4 | 805 | minimal | 662 |
| phi-2-alpaca-gpt4-dpo | 55.59701492537313 | 1.7533719245384989 | 447 | 357 | 0 | 804 | community | 4532 |
| text_davinci_003 | 50.0 | 0.0 | 0 | 0 | 805 | 805 | minimal | 307 |
| alpaca-7b | 45.21739130434783 | 1.7375846781579476 | 356 | 433 | 16 | 805 | minimal | 396 |
| text_davinci_001 | 28.07453416149068 | 1.5602183426587484 | 216 | 569 | 20 | 805 | minimal | 296 |
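
The file does not say how `win_rate` is derived from the count columns. The rows are consistent with counting each draw as half a win, i.e. `100 * (n_wins + 0.5 * n_draws) / n_total`. The sketch below checks that assumed relationship against a few rows copied from the table; the helper name `win_rate` is hypothetical and is not taken from any library.

```python
# Rows copied from the table: (model, n_wins, n_draws, n_total, reported win_rate).
ROWS = [
    ("gpt4", 588, 12, 805, 73.7888198757764),
    ("claude", 562, 9, 805, 70.37267080745342),
    ("text_davinci_003", 0, 805, 805, 50.0),
    ("alpaca-7b", 356, 16, 805, 45.21739130434783),
]


def win_rate(n_wins: int, n_draws: int, n_total: int) -> float:
    """Percentage win rate, counting a draw as half a win (assumed convention)."""
    return 100.0 * (n_wins + 0.5 * n_draws) / n_total


for model, n_wins, n_draws, n_total, reported in ROWS:
    computed = win_rate(n_wins, n_draws, n_total)
    # Show the recomputed value next to the value reported in the table.
    print(f"{model}: computed={computed:.6f} reported={reported:.6f}")
```

Under this reading, `text_davinci_003` sits at exactly 50.0 because all 805 of its comparisons are recorded as draws.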