ModelHub XC 78a6661ff1 Initialize project; model provided by the ModelHub XC community
Model: bigscience/bloomz-7b1-p3
Source: Original Platform
2026-06-15 07:40:14 +08:00

dataset,prompt,metric,value
anli_dev_r1,GPT-3 style,accuracy,0.351
anli_dev_r1,MNLI crowdsource,accuracy,0.334
anli_dev_r1,can we infer,accuracy,0.351
anli_dev_r1,guaranteed/possible/impossible,accuracy,0.288
anli_dev_r1,justified in saying,accuracy,0.345
anli_dev_r1,median,accuracy,0.345
anli_dev_r2,GPT-3 style,accuracy,0.339
anli_dev_r2,MNLI crowdsource,accuracy,0.335
anli_dev_r2,can we infer,accuracy,0.354
anli_dev_r2,guaranteed/possible/impossible,accuracy,0.297
anli_dev_r2,justified in saying,accuracy,0.345
anli_dev_r2,median,accuracy,0.339
anli_dev_r3,GPT-3 style,accuracy,0.37583333333333335
anli_dev_r3,MNLI crowdsource,accuracy,0.3408333333333333
anli_dev_r3,can we infer,accuracy,0.36333333333333334
anli_dev_r3,guaranteed/possible/impossible,accuracy,0.31083333333333335
anli_dev_r3,justified in saying,accuracy,0.34
anli_dev_r3,median,accuracy,0.3408333333333333
story_cloze_2016,Answer Given options,accuracy,0.8305718866916088
story_cloze_2016,Choose Story Ending,accuracy,0.8706574024585783
story_cloze_2016,Generate Ending,accuracy,0.7183324425440941
story_cloze_2016,Novel Correct Ending,accuracy,0.848743987172635
story_cloze_2016,Story Continuation and Options,accuracy,0.8466060929983966
story_cloze_2016,median,accuracy,0.8466060929983966
super_glue_cb,GPT-3 style,accuracy,0.625
super_glue_cb,MNLI crowdsource,accuracy,0.08928571428571429
super_glue_cb,can we infer,accuracy,0.5892857142857143
super_glue_cb,guaranteed/possible/impossible,accuracy,0.5
super_glue_cb,justified in saying,accuracy,0.5357142857142857
super_glue_cb,median,accuracy,0.5357142857142857
super_glue_copa,"C1 or C2? premise, so/because…",accuracy,0.66
super_glue_copa,best_option,accuracy,0.67
super_glue_copa,cause_effect,accuracy,0.78
super_glue_copa,i_am_hesitating,accuracy,0.8
super_glue_copa,plausible_alternatives,accuracy,0.81
super_glue_copa,median,accuracy,0.78
super_glue_rte,GPT-3 style,accuracy,0.7870036101083032
super_glue_rte,MNLI crowdsource,accuracy,0.7220216606498195
super_glue_rte,does it follow that,accuracy,0.6678700361010831
super_glue_rte,guaranteed true,accuracy,0.6714801444043321
super_glue_rte,should assume,accuracy,0.6678700361010831
super_glue_rte,median,accuracy,0.6714801444043321
winogrande_winogrande_xl,Replace,accuracy,0.5406471981057617
winogrande_winogrande_xl,True or False,accuracy,0.5074980268350434
winogrande_winogrande_xl,does underscore refer to,accuracy,0.5177584846093133
winogrande_winogrande_xl,stand for,accuracy,0.510655090765588
winogrande_winogrande_xl,underscore refer to,accuracy,0.5256511444356748
winogrande_winogrande_xl,median,accuracy,0.5177584846093133
xcopa_id,"C1 or C2? premise, so/because…",accuracy,0.47
xcopa_id,best_option,accuracy,0.51
xcopa_id,cause_effect,accuracy,0.65
xcopa_id,i_am_hesitating,accuracy,0.66
xcopa_id,plausible_alternatives,accuracy,0.67
xcopa_id,median,accuracy,0.65
xcopa_sw,"C1 or C2? premise, so/because…",accuracy,0.58
xcopa_sw,best_option,accuracy,0.57
xcopa_sw,cause_effect,accuracy,0.46
xcopa_sw,i_am_hesitating,accuracy,0.48
xcopa_sw,plausible_alternatives,accuracy,0.45
xcopa_sw,median,accuracy,0.48
xcopa_ta,"C1 or C2? premise, so/because…",accuracy,0.57
xcopa_ta,best_option,accuracy,0.67
xcopa_ta,cause_effect,accuracy,0.71
xcopa_ta,i_am_hesitating,accuracy,0.71
xcopa_ta,plausible_alternatives,accuracy,0.69
xcopa_ta,median,accuracy,0.69
xcopa_vi,"C1 or C2? premise, so/because…",accuracy,0.55
xcopa_vi,best_option,accuracy,0.61
xcopa_vi,cause_effect,accuracy,0.67
xcopa_vi,i_am_hesitating,accuracy,0.66
xcopa_vi,plausible_alternatives,accuracy,0.65
xcopa_vi,median,accuracy,0.65
xcopa_zh,"C1 or C2? premise, so/because…",accuracy,0.62
xcopa_zh,best_option,accuracy,0.61
xcopa_zh,cause_effect,accuracy,0.77
xcopa_zh,i_am_hesitating,accuracy,0.72
xcopa_zh,plausible_alternatives,accuracy,0.74
xcopa_zh,median,accuracy,0.72
xnli_ar,GPT-3 style,accuracy,0.5040160642570282
xnli_ar,MNLI crowdsource,accuracy,0.39879518072289155
xnli_ar,can we infer,accuracy,0.506425702811245
xnli_ar,guaranteed/possible/impossible,accuracy,0.4799196787148594
xnli_ar,justified in saying,accuracy,0.41526104417670684
xnli_ar,median,accuracy,0.4799196787148594
xnli_en,GPT-3 style,accuracy,0.5590361445783133
xnli_en,MNLI crowdsource,accuracy,0.342570281124498
xnli_en,can we infer,accuracy,0.5449799196787148
xnli_en,guaranteed/possible/impossible,accuracy,0.41164658634538154
xnli_en,justified in saying,accuracy,0.4634538152610442
xnli_en,median,accuracy,0.4634538152610442
xnli_es,GPT-3 style,accuracy,0.5373493975903615
xnli_es,MNLI crowdsource,accuracy,0.40441767068273093
xnli_es,can we infer,accuracy,0.5277108433734939
xnli_es,guaranteed/possible/impossible,accuracy,0.44216867469879517
xnli_es,justified in saying,accuracy,0.4534136546184739
xnli_es,median,accuracy,0.4534136546184739
xnli_fr,GPT-3 style,accuracy,0.5248995983935743
xnli_fr,MNLI crowdsource,accuracy,0.3895582329317269
xnli_fr,can we infer,accuracy,0.5337349397590362
xnli_fr,guaranteed/possible/impossible,accuracy,0.42971887550200805
xnli_fr,justified in saying,accuracy,0.4738955823293173
xnli_fr,median,accuracy,0.4738955823293173
xnli_hi,GPT-3 style,accuracy,0.4983935742971888
xnli_hi,MNLI crowdsource,accuracy,0.38714859437751004
xnli_hi,can we infer,accuracy,0.45542168674698796
xnli_hi,guaranteed/possible/impossible,accuracy,0.41405622489959837
xnli_hi,justified in saying,accuracy,0.38795180722891565
xnli_hi,median,accuracy,0.41405622489959837
xnli_sw,GPT-3 style,accuracy,0.43493975903614457
xnli_sw,MNLI crowdsource,accuracy,0.363855421686747
xnli_sw,can we infer,accuracy,0.42891566265060244
xnli_sw,guaranteed/possible/impossible,accuracy,0.3457831325301205
xnli_sw,justified in saying,accuracy,0.3650602409638554
xnli_sw,median,accuracy,0.3650602409638554
xnli_ur,GPT-3 style,accuracy,0.43493975903614457
xnli_ur,MNLI crowdsource,accuracy,0.3895582329317269
xnli_ur,can we infer,accuracy,0.45180722891566266
xnli_ur,guaranteed/possible/impossible,accuracy,0.40120481927710844
xnli_ur,justified in saying,accuracy,0.37630522088353413
xnli_ur,median,accuracy,0.40120481927710844
xnli_vi,GPT-3 style,accuracy,0.5196787148594377
xnli_vi,MNLI crowdsource,accuracy,0.38112449799196785
xnli_vi,can we infer,accuracy,0.5080321285140562
xnli_vi,guaranteed/possible/impossible,accuracy,0.38393574297188754
xnli_vi,justified in saying,accuracy,0.43614457831325304
xnli_vi,median,accuracy,0.43614457831325304
xnli_zh,GPT-3 style,accuracy,0.5052208835341365
xnli_zh,MNLI crowdsource,accuracy,0.4
xnli_zh,can we infer,accuracy,0.5228915662650603
xnli_zh,guaranteed/possible/impossible,accuracy,0.4738955823293173
xnli_zh,justified in saying,accuracy,0.45863453815261046
xnli_zh,median,accuracy,0.4738955823293173
xstory_cloze_ar,Answer Given options,accuracy,0.7518199867637326
xstory_cloze_ar,Choose Story Ending,accuracy,0.7749834546657842
xstory_cloze_ar,Generate Ending,accuracy,0.586366644606221
xstory_cloze_ar,Novel Correct Ending,accuracy,0.7518199867637326
xstory_cloze_ar,Story Continuation and Options,accuracy,0.7438782263401721
xstory_cloze_ar,median,accuracy,0.7518199867637326
xstory_cloze_es,Answer Given options,accuracy,0.7835870284579749
xstory_cloze_es,Choose Story Ending,accuracy,0.8292521508934481
xstory_cloze_es,Generate Ending,accuracy,0.6399735274652548
xstory_cloze_es,Novel Correct Ending,accuracy,0.7935142289874255
xstory_cloze_es,Story Continuation and Options,accuracy,0.7888815354070152
xstory_cloze_es,median,accuracy,0.7888815354070152
xstory_cloze_eu,Answer Given options,accuracy,0.7041694242223693
xstory_cloze_eu,Choose Story Ending,accuracy,0.6823295830575777
xstory_cloze_eu,Generate Ending,accuracy,0.5625413633355394
xstory_cloze_eu,Novel Correct Ending,accuracy,0.6671078755790867
xstory_cloze_eu,Story Continuation and Options,accuracy,0.671740569159497
xstory_cloze_eu,median,accuracy,0.671740569159497
xstory_cloze_hi,Answer Given options,accuracy,0.6915949702183984
xstory_cloze_hi,Choose Story Ending,accuracy,0.7220383851753805
xstory_cloze_hi,Generate Ending,accuracy,0.5883520847121112
xstory_cloze_hi,Novel Correct Ending,accuracy,0.6743878226340172
xstory_cloze_hi,Story Continuation and Options,accuracy,0.6816677696889477
xstory_cloze_hi,median,accuracy,0.6816677696889477
xstory_cloze_id,Answer Given options,accuracy,0.7445400397088021
xstory_cloze_id,Choose Story Ending,accuracy,0.771012574454004
xstory_cloze_id,Generate Ending,accuracy,0.6029119788219722
xstory_cloze_id,Novel Correct Ending,accuracy,0.7485109199205824
xstory_cloze_id,Story Continuation and Options,accuracy,0.7438782263401721
xstory_cloze_id,median,accuracy,0.7445400397088021
xstory_cloze_zh,Answer Given options,accuracy,0.7610853739245532
xstory_cloze_zh,Choose Story Ending,accuracy,0.7961614824619457
xstory_cloze_zh,Generate Ending,accuracy,0.6214427531436135
xstory_cloze_zh,Novel Correct Ending,accuracy,0.7696889477167439
xstory_cloze_zh,Story Continuation and Options,accuracy,0.7670416942422237
xstory_cloze_zh,median,accuracy,0.7670416942422237
xwinograd_en,Replace,accuracy,0.5225806451612903
xwinograd_en,True or False,accuracy,0.48946236559139783
xwinograd_en,does underscore refer to,accuracy,0.5281720430107527
xwinograd_en,stand for,accuracy,0.5062365591397849
xwinograd_en,underscore refer to,accuracy,0.5372043010752688
xwinograd_en,median,accuracy,0.5225806451612903
xwinograd_fr,Replace,accuracy,0.5060240963855421
xwinograd_fr,True or False,accuracy,0.5421686746987951
xwinograd_fr,does underscore refer to,accuracy,0.5542168674698795
xwinograd_fr,stand for,accuracy,0.4819277108433735
xwinograd_fr,underscore refer to,accuracy,0.5301204819277109
xwinograd_fr,median,accuracy,0.5301204819277109
xwinograd_pt,Replace,accuracy,0.5133079847908745
xwinograd_pt,True or False,accuracy,0.4714828897338403
xwinograd_pt,does underscore refer to,accuracy,0.5209125475285171
xwinograd_pt,stand for,accuracy,0.5019011406844106
xwinograd_pt,underscore refer to,accuracy,0.5399239543726235
xwinograd_pt,median,accuracy,0.5133079847908745
xwinograd_zh,Replace,accuracy,0.5257936507936508
xwinograd_zh,True or False,accuracy,0.5297619047619048
xwinograd_zh,does underscore refer to,accuracy,0.5218253968253969
xwinograd_zh,stand for,accuracy,0.4444444444444444
xwinograd_zh,underscore refer to,accuracy,0.5198412698412699
xwinograd_zh,median,accuracy,0.5218253968253969
multiple,average,multiple,0.5631550819200618
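
The `median` rows and the final `average` row can be sanity-checked against the per-prompt values. The sketch below assumes the rows are stored as comma-separated `dataset,prompt,metric,value` records; the `SAMPLE` string and the `summarize` helper are illustrative names, not part of the original file. On the full table, the mean of the 32 `median` rows comes to about 0.5632, which agrees with the reported average, though the source does not state how that figure was computed.

```python
import csv
import io
from collections import defaultdict
from statistics import mean, median

# A few rows copied from the table above; the full file is assumed to
# follow the same four-column layout.
SAMPLE = '''dataset,prompt,metric,value
anli_dev_r1,GPT-3 style,accuracy,0.351
anli_dev_r1,MNLI crowdsource,accuracy,0.334
anli_dev_r1,can we infer,accuracy,0.351
anli_dev_r1,guaranteed/possible/impossible,accuracy,0.288
anli_dev_r1,justified in saying,accuracy,0.345
anli_dev_r1,median,accuracy,0.345
super_glue_copa,"C1 or C2? premise, so/because…",accuracy,0.66
super_glue_copa,best_option,accuracy,0.67
super_glue_copa,cause_effect,accuracy,0.78
super_glue_copa,i_am_hesitating,accuracy,0.8
super_glue_copa,plausible_alternatives,accuracy,0.81
super_glue_copa,median,accuracy,0.78
'''


def summarize(csv_text):
    """Return (recomputed medians, reported medians, mean of reported medians)."""
    scores = defaultdict(list)  # dataset -> per-prompt accuracies
    reported = {}               # dataset -> value of its "median" row
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["prompt"] == "average":
            continue  # skip the final summary row
        value = float(row["value"])
        if row["prompt"] == "median":
            reported[row["dataset"]] = value
        else:
            scores[row["dataset"]].append(value)
    recomputed = {name: median(vals) for name, vals in scores.items()}
    return recomputed, reported, mean(reported.values())


if __name__ == "__main__":
    recomputed, reported, avg = summarize(SAMPLE)
    for name, value in reported.items():
        print(name, "reported:", value, "recomputed:", recomputed[name])
    print("mean of medians (sample only):", avg)
```

The prompt name containing a comma is quoted so that `csv.DictReader` keeps it in a single field. Running `summarize` on the full file instead of `SAMPLE` gives the comparison against the reported average.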