| dataset | prompt | metric | value |
|---|---|---|---|
| xnli_ar | GPT-3 style_armt | accuracy | 0.3333333333333333 |
| xnli_ar | MNLI crowdsource_armt | accuracy | 0.42891566265060244 |
| xnli_ar | can we infer_armt | accuracy | 0.3353413654618474 |
| xnli_ar | guaranteed/possible/impossible_armt | accuracy | 0.3755020080321285 |
| xnli_ar | justified in saying_armt | accuracy | 0.3349397590361446 |
| xnli_ar | median | accuracy | 0.3353413654618474 |
| xnli_es | GPT-3 style_esmt | accuracy | 0.5220883534136547 |
| xnli_es | MNLI crowdsource_esmt | accuracy | 0.4847389558232932 |
| xnli_es | can we infer_esmt | accuracy | 0.3333333333333333 |
| xnli_es | guaranteed/possible/impossible_esmt | accuracy | 0.3449799196787149 |
| xnli_es | justified in saying_esmt | accuracy | 0.3333333333333333 |
| xnli_es | median | accuracy | 0.3449799196787149 |
| xnli_fr | GPT-3 style_frmt | accuracy | 0.4791164658634538 |
| xnli_fr | MNLI crowdsource_frmt | accuracy | 0.3333333333333333 |
| xnli_fr | can we infer_frmt | accuracy | 0.42248995983935744 |
| xnli_fr | guaranteed/possible/impossible_frmt | accuracy | 0.41847389558232934 |
| xnli_fr | justified in saying_frmt | accuracy | 0.378714859437751 |
| xnli_fr | median | accuracy | 0.41847389558232934 |
| xnli_hi | GPT-3 style_himt | accuracy | 0.3389558232931727 |
| xnli_hi | MNLI crowdsource_himt | accuracy | 0.3333333333333333 |
| xnli_hi | can we infer_himt | accuracy | 0.3542168674698795 |
| xnli_hi | guaranteed/possible/impossible_himt | accuracy | 0.3353413654618474 |
| xnli_hi | justified in saying_himt | accuracy | 0.39879518072289155 |
| xnli_hi | median | accuracy | 0.3389558232931727 |
| xnli_sw | GPT-3 style_swmt | accuracy | 0.3333333333333333 |
| xnli_sw | MNLI crowdsource_swmt | accuracy | 0.3333333333333333 |
| xnli_sw | can we infer_swmt | accuracy | 0.334136546184739 |
| xnli_sw | guaranteed/possible/impossible_swmt | accuracy | 0.3236947791164659 |
| xnli_sw | justified in saying_swmt | accuracy | 0.3321285140562249 |
| xnli_sw | median | accuracy | 0.3333333333333333 |
| xnli_ur | GPT-3 style_urmt | accuracy | 0.3751004016064257 |
| xnli_ur | MNLI crowdsource_urmt | accuracy | 0.3751004016064257 |
| xnli_ur | can we infer_urmt | accuracy | 0.329718875502008 |
| xnli_ur | guaranteed/possible/impossible_urmt | accuracy | 0.3337349397590361 |
| xnli_ur | justified in saying_urmt | accuracy | 0.3285140562248996 |
| xnli_ur | median | accuracy | 0.3337349397590361 |
| xnli_vi | GPT-3 style_vimt | accuracy | 0.3333333333333333 |
| xnli_vi | MNLI crowdsource_vimt | accuracy | 0.3333333333333333 |
| xnli_vi | can we infer_vimt | accuracy | 0.342570281124498 |
| xnli_vi | guaranteed/possible/impossible_vimt | accuracy | 0.3333333333333333 |
| xnli_vi | justified in saying_vimt | accuracy | 0.3365461847389558 |
| xnli_vi | median | accuracy | 0.3333333333333333 |
| xnli_zh | GPT-3 style_zhmt | accuracy | 0.3606425702811245 |
| xnli_zh | MNLI crowdsource_zhmt | accuracy | 0.39598393574297186 |
| xnli_zh | can we infer_zhmt | accuracy | 0.351004016064257 |
| xnli_zh | guaranteed/possible/impossible_zhmt | accuracy | 0.3473895582329317 |
| xnli_zh | justified in saying_zhmt | accuracy | 0.3409638554216867 |
| xnli_zh | median | accuracy | 0.351004016064257 |
| multiple | average | multiple | 0.348644578313253 |
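
The table does not say how its `median` and `average` rows were produced. The numbers are consistent with each `median` being the median accuracy over a dataset's five prompts, and `average` being the mean of the eight per-dataset medians. The sketch below recomputes both under that assumption; the names `scores`, `medians`, and `average` are illustrative and not part of the original file.

```python
from statistics import mean, median

# Per-prompt accuracies copied from the table, in row order:
# GPT-3 style, MNLI crowdsource, can we infer,
# guaranteed/possible/impossible, justified in saying.
scores = {
    "xnli_ar": [0.3333333333333333, 0.42891566265060244, 0.3353413654618474,
                0.3755020080321285, 0.3349397590361446],
    "xnli_es": [0.5220883534136547, 0.4847389558232932, 0.3333333333333333,
                0.3449799196787149, 0.3333333333333333],
    "xnli_fr": [0.4791164658634538, 0.3333333333333333, 0.42248995983935744,
                0.41847389558232934, 0.378714859437751],
    "xnli_hi": [0.3389558232931727, 0.3333333333333333, 0.3542168674698795,
                0.3353413654618474, 0.39879518072289155],
    "xnli_sw": [0.3333333333333333, 0.3333333333333333, 0.334136546184739,
                0.3236947791164659, 0.3321285140562249],
    "xnli_ur": [0.3751004016064257, 0.3751004016064257, 0.329718875502008,
                0.3337349397590361, 0.3285140562248996],
    "xnli_vi": [0.3333333333333333, 0.3333333333333333, 0.342570281124498,
                0.3333333333333333, 0.3365461847389558],
    "xnli_zh": [0.3606425702811245, 0.39598393574297186, 0.351004016064257,
                0.3473895582329317, 0.3409638554216867],
}

# Assumption: each "median" row is the median over that dataset's five prompts.
medians = {dataset: median(values) for dataset, values in scores.items()}

# Assumption: the final "average" row is the mean of the per-dataset medians.
average = mean(medians.values())

for dataset, value in medians.items():
    print(f"{dataset}: median accuracy = {value}")
print(f"average of medians = {average}")
```

Because XNLI is a three-way classification task, an accuracy of 1/3 is what a constant or uniformly random prediction would score on a class-balanced split, which is worth keeping in mind when reading the many 0.3333 entries.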