---
tags:
- sentence-transformers
- feature-extraction
- sentence-similarity
- transformers
- mteb
license:
language:
model-index:
- name: bge-base-en-v1.5
  results:
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_counterfactual
      name: MTEB AmazonCounterfactualClassification (en)
      config: en
      split: test
      revision: e8379541af4e31359cca9fbcf4b00f2671dba205
    metrics:
    - type: accuracy
      value: 76.14925373134328
    - type: ap
      value: 39.32336517995478
    - type: f1
      value: 70.16902252611425
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_polarity
      name: MTEB AmazonPolarityClassification
      config: default
      split: test
      revision: e2d317d38cd51312af73b3d32a06d1a08b442046
    metrics:
    - type: accuracy
      value: 93.386825
    - type: ap
      value: 90.21276917991995
    - type: f1
      value: 93.37741030006174
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_reviews_multi
      name: MTEB AmazonReviewsClassification (en)
      config: en
      split: test
      revision: 1399c76144fd37290681b995c656ef9b2e06e26d
    metrics:
    - type: accuracy
      value: 48.846000000000004
    - type: f1
      value: 48.14646269778261
  - task:
      type: Retrieval
    dataset:
      type: arguana
      name: MTEB ArguAna
      config: default
      split: test
      revision: None
    metrics:
    - type: map_at_1
      value: 40.754000000000005
    - type: map_at_10
      value: 55.761
    - type: map_at_100
      value: 56.330999999999996
    - type: map_at_1000
      value: 56.333999999999996
    - type: map_at_3
      value: 51.92
    - type: map_at_5
      value: 54.010999999999996
    - type: mrr_at_1
      value: 41.181
    - type: mrr_at_10
      value: 55.967999999999996
    - type: mrr_at_100
      value: 56.538
    - type: mrr_at_1000
      value: 56.542
    - type: mrr_at_3
      value: 51.980000000000004
    - type: mrr_at_5
      value: 54.208999999999996
    - type: ndcg_at_1
      value: 40.754000000000005
    - type: ndcg_at_10
      value: 63.605000000000004
    - type: ndcg_at_100
      value: 66.05199999999999
    - type: ndcg_at_1000
      value: 66.12
    - type: ndcg_at_3
      value: 55.708
    - type: ndcg_at_5
      value: 59.452000000000005
    - type: precision_at_1
      value: 40.754000000000005
    - type: precision_at_10
      value: 8.841000000000001
    - type: precision_at_100
      value: 0.991
    - type: precision_at_1000
      value: 0.1
    - type: precision_at_3
      value: 22.238
    - type: precision_at_5
      value: 15.149000000000001
    - type: recall_at_1
      value: 40.754000000000005
    - type: recall_at_10
      value: 88.407
    - type: recall_at_100
      value: 99.14699999999999
    - type: recall_at_1000
      value: 99.644
    - type: recall_at_3
      value: 66.714
    - type: recall_at_5
      value: 75.747
  - task:
      type: Clustering
    dataset:
      type: mteb/arxiv-clustering-p2p
      name: MTEB ArxivClusteringP2P
      config: default
      split: test
      revision: a122ad7f3f0291bf49cc6f4d32aa80929df69d5d
    metrics:
    - type: v_measure
      value: 48.74884539679369
  - task:
      type: Clustering
    dataset:
      type: mteb/arxiv-clustering-s2s
      name: MTEB ArxivClusteringS2S
      config: default
      split: test
      revision: f910caf1a6075f7329cdf8c1a6135696f37dbd53
    metrics:
    - type: v_measure
      value: 42.8075893810716
  - task:
      type: Reranking
    dataset:
      type: mteb/askubuntudupquestions-reranking
      name: MTEB AskUbuntuDupQuestions
      config: default
      split: test
      revision: 2000358ca161889fa9c082cb41daa8dcfb161a54
    metrics:
    - type: map
      value: 62.128470519187736
    - type: mrr
      value: 74.28065778481289
  - task:
      type: STS
    dataset:
      type: mteb/biosses-sts
      name: MTEB BIOSSES
      config: default
      split: test
      revision: d3fb88f8f02e40887cd149695127462bbcf29b4a
    metrics:
    - type: cos_sim_pearson
      value: 89.24629081484655
    - type: cos_sim_spearman
      value: 86.93752309911496
    - type: euclidean_pearson
      value: 87.58589628573816
    - type: euclidean_spearman
      value: 88.05622328825284
    - type: manhattan_pearson
      value: 87.5594959805773
    - type: manhattan_spearman
      value: 88.19658793233961
  - task:
      type: Classification
    dataset:
      type: mteb/banking77
      name: MTEB Banking77Classification
      config: default
      split: test
      revision: 0fd18e25b25c072e09e0d92ab615fda904d66300
    metrics:
    - type: accuracy
      value: 86.9512987012987
    - type: f1
      value: 86.92515357973708
  - task:
      type: Clustering
    dataset:
      type: mteb/biorxiv-clustering-p2p
      name: MTEB BiorxivClusteringP2P
      config: default
      split: test
      revision: 65b79d1d13f80053f67aca9498d9402c2d9f1f40
    metrics:
    - type: v_measure
      value: 39.10263762928872
  - task:
      type: Clustering
    dataset:
      type: mteb/biorxiv-clustering-s2s
      name: MTEB BiorxivClusteringS2S
      config: default
      split: test
      revision: 258694dd0231531bc1fd9de6ceb52a0853c6d908
    metrics:
    - type: v_measure
      value: 36.69711517426737
  - task:
      type: Retrieval
    dataset:
      type: BeIR/cqadupstack
      name: MTEB CQADupstackAndroidRetrieval
      config: default
      split: test
      revision: None
    metrics:
    - type: map_at_1
      value: 32.327
    - type: map_at_10
      value: 44.099
    - type: map_at_100
      value: 45.525
    - type: map_at_1000
      value: 45.641999999999996
    - type: map_at_3
      value: 40.47
    - type: map_at_5
      value: 42.36
    - type: mrr_at_1
      value: 39.199
    - type: mrr_at_10
      value: 49.651
    - type: mrr_at_100
      value: 50.29
    - type: mrr_at_1000
      value: 50.329
    - type: mrr_at_3
      value: 46.924
    - type: mrr_at_5
      value: 48.548
    - type: ndcg_at_1
      value: 39.199
    - type: ndcg_at_10
      value: 50.773
    - type: ndcg_at_100
      value: 55.67999999999999
    - type: ndcg_at_1000
      value: 57.495
    - type: ndcg_at_3
      value: 45.513999999999996
    - type: ndcg_at_5
      value: 47.703
    - type: precision_at_1
      value: 39.199
    - type: precision_at_10
      value: 9.914000000000001
    - type: precision_at_100
      value: 1.5310000000000001
    - type: precision_at_1000
      value: 0.198
    - type: precision_at_3
      value: 21.984
    - type: precision_at_5
      value: 15.737000000000002
    - type: recall_at_1
      value: 32.327
    - type: recall_at_10
      value: 63.743
    - type: recall_at_100
      value: 84.538
    - type: recall_at_1000
      value: 96.089
    - type: recall_at_3
      value: 48.065000000000005
    - type: recall_at_5
      value: 54.519
  - task:
      type: Retrieval
    dataset:
      type: BeIR/cqadupstack
      name: MTEB CQADupstackEnglishRetrieval
      config: default
      split: test
      revision: None
    metrics:
    - type: map_at_1
      value: 32.671
    - type: map_at_10
      value: 42.954
    - type: map_at_100
      value: 44.151
    - type: map_at_1000
      value: 44.287
    - type: map_at_3
      value: 39.912
    - type: map_at_5
      value: 41.798
    - type: mrr_at_1
      value: 41.465
    - type: mrr_at_10
      value: 49.351
    - type: mrr_at_100
      value: 49.980000000000004
    - type: mrr_at_1000
      value: 50.016000000000005
    - type: mrr_at_3
      value: 47.144000000000005
    - type: mrr_at_5
      value: 48.592999999999996
    - type: ndcg_at_1
      value: 41.465
    - type: ndcg_at_10
      value: 48.565999999999995
    - type: ndcg_at_100
      value: 52.76499999999999
    - type: ndcg_at_1000
      value: 54.749
    - type: ndcg_at_3
      value: 44.57
    - type: ndcg_at_5
      value: 46.759
    - type: precision_at_1
      value: 41.465
    - type: precision_at_10
      value: 9.107999999999999
    - type: precision_at_100
      value: 1.433
    - type: precision_at_1000
      value: 0.191
    - type: precision_at_3
      value: 21.423000000000002
    - type: precision_at_5
      value: 15.414
    - type: recall_at_1
      value: 32.671
    - type: recall_at_10
      value: 57.738
    - type: recall_at_100
      value: 75.86500000000001
    - type: recall_at_1000
      value: 88.36
    - type: recall_at_3
      value: 45.626
    - type: recall_at_5
      value: 51.812000000000005
  - task:
      type: Retrieval
    dataset:
      type: BeIR/cqadupstack
      name: MTEB CQADupstackGamingRetrieval
      config: default
      split: test
      revision: None
    metrics:
    - type: map_at_1
      value: 41.185
    - type: map_at_10
      value: 53.929
    - type: map_at_100
      value: 54.92
    - type: map_at_1000
      value: 54.967999999999996
    - type: map_at_3
      value: 50.70400000000001
    - type: map_at_5
      value: 52.673
    - type: mrr_at_1
      value: 47.398
    - type: mrr_at_10
      value: 57.303000000000004
    - type: mrr_at_100
      value: 57.959
    - type: mrr_at_1000
      value: 57.985
    - type: mrr_at_3
      value: 54.932
    - type: mrr_at_5
      value: 56.464999999999996
    - type: ndcg_at_1
      value: 47.398
    - type: ndcg_at_10
      value: 59.653
    - type: ndcg_at_100
      value: 63.627
    - type: ndcg_at_1000
      value: 64.596
    - type: ndcg_at_3
      value: 54.455
    - type: ndcg_at_5
      value: 57.245000000000005
    - type: precision_at_1
      value: 47.398
    - type: precision_at_10
      value: 9.524000000000001
    - type: precision_at_100
      value: 1.243
    - type: precision_at_1000
      value: 0.13699999999999998
    - type: precision_at_3
      value: 24.389
    - type: precision_at_5
      value: 16.752
    - type: recall_at_1
      value: 41.185
    - type: recall_at_10
      value: 73.193
    - type: recall_at_100
      value: 90.357
    - type: recall_at_1000
      value: 97.253
    - type: recall_at_3
      value: 59.199999999999996
    - type: recall_at_5
      value: 66.118
  - task:
      type: Retrieval
    dataset:
      type: BeIR/cqadupstack
      name: MTEB CQADupstackGisRetrieval
      config: default
      split: test
      revision: None
    metrics:
    - type: map_at_1
      value: 27.27
    - type: map_at_10
      value: 36.223
    - type: map_at_100
      value: 37.218
    - type: map_at_1000
      value: 37.293
    - type: map_at_3
      value: 33.503
    - type: map_at_5
      value: 35.097
    - type: mrr_at_1
      value: 29.492
    - type: mrr_at_10
      value: 38.352000000000004
    - type: mrr_at_100
      value: 39.188
    - type: mrr_at_1000
      value: 39.247
    - type: mrr_at_3
      value: 35.876000000000005
    - type: mrr_at_5
      value: 37.401
    - type: ndcg_at_1
      value: 29.492
    - type: ndcg_at_10
      value: 41.239
    - type: ndcg_at_100
      value: 46.066
    - type: ndcg_at_1000
      value: 47.992000000000004
    - type: ndcg_at_3
      value: 36.11
    - type: ndcg_at_5
      value: 38.772
    - type: precision_at_1
      value: 29.492
    - type: precision_at_10
      value: 6.260000000000001
    - type: precision_at_100
      value: 0.914
    - type: precision_at_1000
      value: 0.11100000000000002
    - type: precision_at_3
      value: 15.104000000000001
    - type: precision_at_5
      value: 10.644
    - type: recall_at_1
      value: 27.27
    - type: recall_at_10
      value: 54.589
    - type: recall_at_100
      value: 76.70700000000001
    - type: recall_at_1000
      value: 91.158
    - type: recall_at_3
      value: 40.974
    - type: recall_at_5
      value: 47.327000000000005
  - task:
      type: Retrieval
    dataset:
      type: BeIR/cqadupstack
      name: MTEB CQADupstackMathematicaRetrieval
      config: default
      split: test
      revision: None
    metrics:
    - type: map_at_1
      value: 17.848
    - type: map_at_10
      value: 26.207
    - type: map_at_100
      value: 27.478
    - type: map_at_1000
      value: 27.602
    - type: map_at_3
      value: 23.405
    - type: map_at_5
      value: 24.98
    - type: mrr_at_1
      value: 21.891
    - type: mrr_at_10
      value: 31.041999999999998
    - type: mrr_at_100
      value: 32.092
    - type: mrr_at_1000
      value: 32.151999999999994
    - type: mrr_at_3
      value: 28.358
    - type: mrr_at_5
      value: 29.969
    - type: ndcg_at_1
      value: 21.891
    - type: ndcg_at_10
      value: 31.585
    - type: ndcg_at_100
      value: 37.531
    - type: ndcg_at_1000
      value: 40.256
    - type: ndcg_at_3
      value: 26.508
    - type: ndcg_at_5
      value: 28.894
    - type: precision_at_1
      value: 21.891
    - type: precision_at_10
      value: 5.795999999999999
    - type: precision_at_100
      value: 0.9990000000000001
    - type: precision_at_1000
      value: 0.13799999999999998
    - type: precision_at_3
      value: 12.769
    - type: precision_at_5
      value: 9.279
    - type: recall_at_1
      value: 17.848
    - type: recall_at_10
      value: 43.452
    - type: recall_at_100
      value: 69.216
    - type: recall_at_1000
      value: 88.102
    - type: recall_at_3
      value: 29.18
    - type: recall_at_5
      value: 35.347
  - task:
      type: Retrieval
    dataset:
      type: BeIR/cqadupstack
      name: MTEB CQADupstackPhysicsRetrieval
      config: default
      split: test
      revision: None
    metrics:
    - type: map_at_1
      value: 30.94
    - type: map_at_10
      value: 41.248000000000005
    - type: map_at_100
      value: 42.495
    - type: map_at_1000
      value: 42.602000000000004
    - type: map_at_3
      value: 37.939
    - type: map_at_5
      value: 39.924
    - type: mrr_at_1
      value: 37.824999999999996
    - type: mrr_at_10
      value: 47.041
    - type: mrr_at_100
      value: 47.83
    - type: mrr_at_1000
      value: 47.878
    - type: mrr_at_3
      value: 44.466
    - type: mrr_at_5
      value: 46.111999999999995
    - type: ndcg_at_1
      value: 37.824999999999996
    - type: ndcg_at_10
      value: 47.223
    - type: ndcg_at_100
      value: 52.394
    - type: ndcg_at_1000
      value: 54.432
    - type: ndcg_at_3
      value: 42.032000000000004
    - type: ndcg_at_5
      value: 44.772
    - type: precision_at_1
      value: 37.824999999999996
    - type: precision_at_10
      value: 8.393
    - type: precision_at_100
      value: 1.2890000000000001
    - type: precision_at_1000
      value: 0.164
    - type: precision_at_3
      value: 19.698
    - type: precision_at_5
      value: 14.013
    - type: recall_at_1
      value: 30.94
    - type: recall_at_10
      value: 59.316
    - type: recall_at_100
      value: 80.783
    - type: recall_at_1000
      value: 94.15400000000001
    - type: recall_at_3
      value: 44.712
    - type: recall_at_5
      value: 51.932
  - task:
      type: Retrieval
    dataset:
      type: BeIR/cqadupstack
      name: MTEB CQADupstackProgrammersRetrieval
      config: default
      split: test
      revision: None
    metrics:
    - type: map_at_1
      value: 27.104
    - type: map_at_10
      value: 36.675999999999995
    - type: map_at_100
      value: 38.076
    - type: map_at_1000
      value: 38.189
    - type: map_at_3
      value: 33.733999999999995
    - type: map_at_5
      value: 35.287
    - type: mrr_at_1
      value: 33.904
    - type: mrr_at_10
      value: 42.55
    - type: mrr_at_100
      value: 43.434
    - type: mrr_at_1000
      value: 43.494
    - type: mrr_at_3
      value: 40.126
    - type: mrr_at_5
      value: 41.473
    - type: ndcg_at_1
      value: 33.904
    - type: ndcg_at_10
      value: 42.414
    - type: ndcg_at_100
      value: 48.203
    - type: ndcg_at_1000
      value: 50.437
    - type: ndcg_at_3
      value: 37.633
    - type: ndcg_at_5
      value: 39.67
    - type: precision_at_1
      value: 33.904
    - type: precision_at_10
      value: 7.82
    - type: precision_at_100
      value: 1.2409999999999999
    - type: precision_at_1000
      value: 0.159
    - type: precision_at_3
      value: 17.884
    - type: precision_at_5
      value: 12.648000000000001
    - type: recall_at_1
      value: 27.104
    - type: recall_at_10
      value: 53.563
    - type: recall_at_100
      value: 78.557
    - type: recall_at_1000
      value: 93.533
    - type: recall_at_3
      value: 39.92
    - type: recall_at_5
      value: 45.457
  - task:
      type: Retrieval
    dataset:
      type: BeIR/cqadupstack
      name: MTEB CQADupstackRetrieval
      config: default
      split: test
      revision: None
    metrics:
    - type: map_at_1
      value: 27.707749999999997
    - type: map_at_10
      value: 36.961
    - type: map_at_100
      value: 38.158833333333334
    - type: map_at_1000
      value: 38.270333333333326
    - type: map_at_3
      value: 34.07183333333334
    - type: map_at_5
      value: 35.69533333333334
    - type: mrr_at_1
      value: 32.81875
    - type: mrr_at_10
      value: 41.293
    - type: mrr_at_100
      value: 42.116499999999995
    - type: mrr_at_1000
      value: 42.170249999999996
    - type: mrr_at_3
      value: 38.83983333333333
    - type: mrr_at_5
      value: 40.29775
    - type: ndcg_at_1
      value: 32.81875
    - type: ndcg_at_10
      value: 42.355
    - type: ndcg_at_100
      value: 47.41374999999999
    - type: ndcg_at_1000
      value: 49.5805
    - type: ndcg_at_3
      value: 37.52825
    - type: ndcg_at_5
      value: 39.83266666666667
    - type: precision_at_1
      value: 32.81875
    - type: precision_at_10
      value: 7.382416666666666
    - type: precision_at_100
      value: 1.1640833333333334
    - type: precision_at_1000
      value: 0.15383333333333335
    - type: precision_at_3
      value: 17.134166666666665
    - type: precision_at_5
      value: 12.174833333333336
    - type: recall_at_1
      value: 27.707749999999997
    - type: recall_at_10
      value: 53.945
    - type: recall_at_100
      value: 76.191
    - type: recall_at_1000
      value: 91.101
    - type: recall_at_3
      value: 40.39083333333334
    - type: recall_at_5
      value: 46.40083333333333
  - task:
      type: Retrieval
    dataset:
      type: BeIR/cqadupstack
      name: MTEB CQADupstackStatsRetrieval
      config: default
      split: test
      revision: None
    metrics:
    - type: map_at_1
      value: 26.482
    - type: map_at_10
      value: 33.201
    - type: map_at_100
      value: 34.107
    - type: map_at_1000
      value: 34.197
    - type: map_at_3
      value: 31.174000000000003
    - type: map_at_5
      value: 32.279
    - type: mrr_at_1
      value: 29.908
    - type: mrr_at_10
      value: 36.235
    - type: mrr_at_100
      value: 37.04
    - type: mrr_at_1000
      value: 37.105
    - type: mrr_at_3
      value: 34.355999999999995
    - type: mrr_at_5
      value: 35.382999999999996
    - type: ndcg_at_1
      value: 29.908
    - type: ndcg_at_10
      value: 37.325
    - type: ndcg_at_100
      value: 41.795
    - type: ndcg_at_1000
      value: 44.105
    - type: ndcg_at_3
      value: 33.555
    - type: ndcg_at_5
      value: 35.266999999999996
    - type: precision_at_1
      value: 29.908
    - type: precision_at_10
      value: 5.721
    - type: precision_at_100
      value: 0.8630000000000001
    - type: precision_at_1000
      value: 0.11299999999999999
    - type: precision_at_3
      value: 14.008000000000001
    - type: precision_at_5
      value: 9.754999999999999
    - type: recall_at_1
      value: 26.482
    - type: recall_at_10
      value: 47.072
    - type: recall_at_100
      value: 67.27
    - type: recall_at_1000
      value: 84.371
    - type: recall_at_3
      value: 36.65
    - type: recall_at_5
      value: 40.774
  - task:
      type: Retrieval
    dataset:
      type: BeIR/cqadupstack
      name: MTEB CQADupstackTexRetrieval
      config: default
      split: test
      revision: None
    metrics:
    - type: map_at_1
      value: 18.815
    - type: map_at_10
      value: 26.369999999999997
    - type: map_at_100
      value: 27.458
    - type: map_at_1000
      value: 27.588
    - type: map_at_3
      value: 23.990000000000002
    - type: map_at_5
      value: 25.345000000000002
    - type: mrr_at_1
      value: 22.953000000000003
    - type: mrr_at_10
      value: 30.342999999999996
    - type: mrr_at_100
      value: 31.241000000000003
    - type: mrr_at_1000
      value: 31.319000000000003
    - type: mrr_at_3
      value: 28.16
    - type: mrr_at_5
      value: 29.406
    - type: ndcg_at_1
      value: 22.953000000000003
    - type: ndcg_at_10
      value: 31.151
    - type: ndcg_at_100
      value: 36.309000000000005
    - type: ndcg_at_1000
      value: 39.227000000000004
    - type: ndcg_at_3
      value: 26.921
    - type: ndcg_at_5
      value: 28.938000000000002
    - type: precision_at_1
      value: 22.953000000000003
    - type: precision_at_10
      value: 5.602
    - type: precision_at_100
      value: 0.9530000000000001
    - type: precision_at_1000
      value: 0.13899999999999998
    - type: precision_at_3
      value: 12.606
    - type: precision_at_5
      value: 9.119
    - type: recall_at_1
      value: 18.815
    - type: recall_at_10
      value: 41.574
    - type: recall_at_100
      value: 64.84400000000001
    - type: recall_at_1000
      value: 85.406
    - type: recall_at_3
      value: 29.694
    - type: recall_at_5
      value: 34.935
  - task:
      type: Retrieval
    dataset:
      type: BeIR/cqadupstack
      name: MTEB CQADupstackUnixRetrieval
      config: default
      split: test
      revision: None
    metrics:
    - type: map_at_1
      value: 27.840999999999998
    - type: map_at_10
      value: 36.797999999999995
    - type: map_at_100
      value: 37.993
    - type: map_at_1000
      value: 38.086999999999996
    - type: map_at_3
      value: 34.050999999999995
    - type: map_at_5
      value: 35.379
    - type: mrr_at_1
      value: 32.649
    - type: mrr_at_10
      value: 41.025
    - type: mrr_at_100
      value: 41.878
    - type: mrr_at_1000
      value: 41.929
    - type: mrr_at_3
      value: 38.573
    - type: mrr_at_5
      value: 39.715
    - type: ndcg_at_1
      value: 32.649
    - type: ndcg_at_10
      value: 42.142
    - type: ndcg_at_100
      value: 47.558
    - type: ndcg_at_1000
      value: 49.643
    - type: ndcg_at_3
      value: 37.12
    - type: ndcg_at_5
      value: 38.983000000000004
    - type: precision_at_1
      value: 32.649
    - type: precision_at_10
      value: 7.08
    - type: precision_at_100
      value: 1.1039999999999999
    - type: precision_at_1000
      value: 0.13899999999999998
    - type: precision_at_3
      value: 16.698
    - type: precision_at_5
      value: 11.511000000000001
    - type: recall_at_1
      value: 27.840999999999998
    - type: recall_at_10
      value: 54.245
    - type: recall_at_100
      value: 77.947
    - type: recall_at_1000
      value: 92.36999999999999
    - type: recall_at_3
      value: 40.146
    - type: recall_at_5
      value: 44.951
  - task:
      type: Retrieval
    dataset:
      type: BeIR/cqadupstack
      name: MTEB CQADupstackWebmastersRetrieval
      config: default
      split: test
      revision: None
    metrics:
    - type: map_at_1
      value: 26.529000000000003
    - type: map_at_10
      value: 35.010000000000005
    - type: map_at_100
      value: 36.647
    - type: map_at_1000
      value: 36.857
    - type: map_at_3
      value: 31.968000000000004
    - type: map_at_5
      value: 33.554
    - type: mrr_at_1
      value: 31.818
    - type: mrr_at_10
      value: 39.550999999999995
    - type: mrr_at_100
      value: 40.54
    - type: mrr_at_1000
      value: 40.596
    - type: mrr_at_3
      value: 36.726
    - type: mrr_at_5
      value: 38.416
    - type: ndcg_at_1
      value: 31.818
    - type: ndcg_at_10
      value: 40.675
    - type: ndcg_at_100
      value: 46.548
    - type: ndcg_at_1000
      value: 49.126
    - type: ndcg_at_3
      value: 35.829
    - type: ndcg_at_5
      value: 38.0
    - type: precision_at_1
      value: 31.818
    - type: precision_at_10
      value: 7.826
    - type: precision_at_100
      value: 1.538
    - type: precision_at_1000
      value: 0.24
    - type: precision_at_3
      value: 16.601
    - type: precision_at_5
      value: 12.095
    - type: recall_at_1
      value: 26.529000000000003
    - type: recall_at_10
      value: 51.03
    - type: recall_at_100
      value: 77.556
    - type: recall_at_1000
      value: 93.804
    - type: recall_at_3
      value: 36.986000000000004
    - type: recall_at_5
      value: 43.096000000000004
  - task:
      type: Retrieval
    dataset:
      type: BeIR/cqadupstack
      name: MTEB CQADupstackWordpressRetrieval
      config: default
      split: test
      revision: None
    metrics:
    - type: map_at_1
      value: 23.480999999999998
    - type: map_at_10
      value: 30.817
    - type: map_at_100
      value: 31.838
    - type: map_at_1000
      value: 31.932
    - type: map_at_3
      value: 28.011999999999997
    - type: map_at_5
      value: 29.668
    - type: mrr_at_1
      value: 25.323
    - type: mrr_at_10
      value: 33.072
    - type: mrr_at_100
      value: 33.926
    - type: mrr_at_1000
      value: 33.993
    - type: mrr_at_3
      value: 30.436999999999998
    - type: mrr_at_5
      value: 32.092
    - type: ndcg_at_1
      value: 25.323
    - type: ndcg_at_10
      value: 35.514
    - type: ndcg_at_100
      value: 40.489000000000004
    - type: ndcg_at_1000
      value: 42.908
    - type: ndcg_at_3
      value: 30.092000000000002
    - type: ndcg_at_5
      value: 32.989000000000004
    - type: precision_at_1
      value: 25.323
    - type: precision_at_10
      value: 5.545
    - type: precision_at_100
      value: 0.861
    - type: precision_at_1000
      value: 0.117
    - type: precision_at_3
      value: 12.446
    - type: precision_at_5
      value: 9.131
    - type: recall_at_1
      value: 23.480999999999998
    - type: recall_at_10
      value: 47.825
    - type: recall_at_100
      value: 70.652
    - type: recall_at_1000
      value: 88.612
    - type: recall_at_3
      value: 33.537
    - type: recall_at_5
      value: 40.542
  - task:
      type: Retrieval
    dataset:
      type: climate-fever
      name: MTEB ClimateFEVER
      config: default
      split: test
      revision: None
    metrics:
    - type: map_at_1
      value: 13.333999999999998
    - type: map_at_10
      value: 22.524
    - type: map_at_100
      value: 24.506
    - type: map_at_1000
      value: 24.715
    - type: map_at_3
      value: 19.022
    - type: map_at_5
      value: 20.693
    - type: mrr_at_1
      value: 29.186
    - type: mrr_at_10
      value: 41.22
    - type: mrr_at_100
      value: 42.16
    - type: mrr_at_1000
      value: 42.192
    - type: mrr_at_3
      value: 38.013000000000005
    - type: mrr_at_5
      value: 39.704
    - type: ndcg_at_1
      value: 29.186
    - type: ndcg_at_10
      value: 31.167
    - type: ndcg_at_100
      value: 38.879000000000005
    - type: ndcg_at_1000
      value: 42.376000000000005
    - type: ndcg_at_3
      value: 25.817
    - type: ndcg_at_5
      value: 27.377000000000002
    - type: precision_at_1
      value: 29.186
    - type: precision_at_10
      value: 9.693999999999999
    - type: precision_at_100
      value: 1.8030000000000002
    - type: precision_at_1000
      value: 0.246
    - type: precision_at_3
      value: 19.11
    - type: precision_at_5
      value: 14.344999999999999
    - type: recall_at_1
      value: 13.333999999999998
    - type: recall_at_10
      value: 37.092000000000006
    - type: recall_at_100
      value: 63.651
    - type: recall_at_1000
      value: 83.05
    - type: recall_at_3
      value: 23.74
    - type: recall_at_5
      value: 28.655
  - task:
      type: Retrieval
    dataset:
      type: dbpedia-entity
      name: MTEB DBPedia
      config: default
      split: test
      revision: None
    metrics:
    - type: map_at_1
      value: 9.151
    - type: map_at_10
      value: 19.653000000000002
    - type: map_at_100
      value: 28.053
    - type: map_at_1000
      value: 29.709000000000003
    - type: map_at_3
      value: 14.191
    - type: map_at_5
      value: 16.456
    - type: mrr_at_1
      value: 66.25
    - type: mrr_at_10
      value: 74.4
    - type: mrr_at_100
      value: 74.715
    - type: mrr_at_1000
      value: 74.726
    - type: mrr_at_3
      value: 72.417
    - type: mrr_at_5
      value: 73.667
    - type: ndcg_at_1
      value: 54.25
    - type: ndcg_at_10
      value: 40.77
    - type: ndcg_at_100
      value: 46.359
    - type: ndcg_at_1000
      value: 54.193000000000005
    - type: ndcg_at_3
      value: 44.832
    - type: ndcg_at_5
      value: 42.63
    - type: precision_at_1
      value: 66.25
    - type: precision_at_10
      value: 32.175
    - type: precision_at_100
      value: 10.668
    - type: precision_at_1000
      value: 2.067
    - type: precision_at_3
      value: 47.667
    - type: precision_at_5
      value: 41.3
    - type: recall_at_1
      value: 9.151
    - type: recall_at_10
      value: 25.003999999999998
    - type: recall_at_100
      value: 52.976
    - type: recall_at_1000
      value: 78.315
    - type: recall_at_3
      value: 15.487
    - type: recall_at_5
      value: 18.999
  - task:
      type: Classification
    dataset:
      type: mteb/emotion
      name: MTEB EmotionClassification
      config: default
      split: test
      revision: 4f58c6b202a23cf9a4da393831edf4f9183cad37
    metrics:
    - type: accuracy
      value: 51.89999999999999
    - type: f1
      value: 46.47777925067403
  - task:
      type: Retrieval
    dataset:
      type: fever
      name: MTEB FEVER
      config: default
      split: test
      revision: None
    metrics:
    - type: map_at_1
      value: 73.706
    - type: map_at_10
      value: 82.423
    - type: map_at_100
      value: 82.67999999999999
    - type: map_at_1000
      value: 82.694
    - type: map_at_3
      value: 81.328
    - type: map_at_5
      value: 82.001
    - type: mrr_at_1
      value: 79.613
    - type: mrr_at_10
      value: 87.07000000000001
    - type: mrr_at_100
      value: 87.169
    - type: mrr_at_1000
      value: 87.17
    - type: mrr_at_3
      value: 86.404
    - type: mrr_at_5
      value: 86.856
    - type: ndcg_at_1
      value: 79.613
    - type: ndcg_at_10
      value: 86.289
    - type: ndcg_at_100
      value: 87.201
    - type: ndcg_at_1000
      value: 87.428
    - type: ndcg_at_3
      value: 84.625
    - type: ndcg_at_5
      value: 85.53699999999999
    - type: precision_at_1
      value: 79.613
    - type: precision_at_10
      value: 10.399
    - type: precision_at_100
      value: 1.1079999999999999
    - type: precision_at_1000
      value: 0.11499999999999999
    - type: precision_at_3
      value: 32.473
    - type: precision_at_5
      value: 20.132
    - type: recall_at_1
      value: 73.706
    - type: recall_at_10
      value: 93.559
    - type: recall_at_100
      value: 97.188
    - type: recall_at_1000
      value: 98.555
    - type: recall_at_3
      value: 88.98700000000001
    - type: recall_at_5
      value: 91.373
  - task:
      type: Retrieval
    dataset:
      type: fiqa
      name: MTEB FiQA2018
      config: default
      split: test
      revision: None
    metrics:
    - type: map_at_1
      value: 19.841
    - type: map_at_10
      value: 32.643
    - type: map_at_100
      value: 34.575
    - type: map_at_1000
      value: 34.736
    - type: map_at_3
      value: 28.317999999999998
    - type: map_at_5
      value: 30.964000000000002
    - type: mrr_at_1
      value: 39.660000000000004
    - type: mrr_at_10
      value: 48.620000000000005
    - type: mrr_at_100
      value: 49.384
    - type: mrr_at_1000
      value: 49.415
    - type: mrr_at_3
      value: 45.988
    - type: mrr_at_5
      value: 47.361
    - type: ndcg_at_1
      value: 39.660000000000004
    - type: ndcg_at_10
      value: 40.646
    - type: ndcg_at_100
      value: 47.657
    - type: ndcg_at_1000
      value: 50.428
    - type: ndcg_at_3
      value: 36.689
    - type: ndcg_at_5
      value: 38.211
    - type: precision_at_1
      value: 39.660000000000004
    - type: precision_at_10
      value: 11.235000000000001
    - type: precision_at_100
      value: 1.8530000000000002
    - type: precision_at_1000
      value: 0.23600000000000002
    - type: precision_at_3
      value: 24.587999999999997
    - type: precision_at_5
      value: 18.395
    - type: recall_at_1
      value: 19.841
    - type: recall_at_10
      value: 48.135
    - type: recall_at_100
      value: 74.224
    - type: recall_at_1000
      value: 90.826
    - type: recall_at_3
      value: 33.536
    - type: recall_at_5
      value: 40.311
  - task:
      type: Retrieval
    dataset:
      type: hotpotqa
      name: MTEB HotpotQA
      config: default
      split: test
      revision: None
    metrics:
    - type: map_at_1
      value: 40.358
    - type: map_at_10
      value: 64.497
    - type: map_at_100
      value: 65.362
    - type: map_at_1000
      value: 65.41900000000001
    - type: map_at_3
      value: 61.06700000000001
    - type: map_at_5
      value: 63.317
    - type: mrr_at_1
      value: 80.716
    - type: mrr_at_10
      value: 86.10799999999999
    - type: mrr_at_100
      value: 86.265
    - type: mrr_at_1000
      value: 86.27
    - type: mrr_at_3
      value: 85.271
    - type: mrr_at_5
      value: 85.82499999999999
    - type: ndcg_at_1
      value: 80.716
    - type: ndcg_at_10
      value: 72.597
    - type: ndcg_at_100
      value: 75.549
    - type: ndcg_at_1000
      value: 76.61
    - type: ndcg_at_3
      value: 67.874
    - type: ndcg_at_5
      value: 70.655
    - type: precision_at_1
      value: 80.716
    - type: precision_at_10
      value: 15.148
    - type: precision_at_100
      value: 1.745
    - type: precision_at_1000
      value: 0.188
    - type: precision_at_3
      value: 43.597
    - type: precision_at_5
      value: 28.351
    - type: recall_at_1
      value: 40.358
    - type: recall_at_10
      value: 75.739
    - type: recall_at_100
      value: 87.259
    - type: recall_at_1000
      value: 94.234
    - type: recall_at_3
      value: 65.39500000000001
    - type: recall_at_5
      value: 70.878
  - task:
      type: Classification
    dataset:
      type: mteb/imdb
      name: MTEB ImdbClassification
      config: default
      split: test
      revision: 3d86128a09e091d6018b6d26cad27f2739fc2db7
    metrics:
    - type: accuracy
      value: 90.80799999999998
    - type: ap
      value: 86.81350378180757
    - type: f1
      value: 90.79901248314215
  - task:
      type: Retrieval
    dataset:
      type: msmarco
      name: MTEB MSMARCO
      config: default
      split: dev
      revision: None
    metrics:
    - type: map_at_1
      value: 22.096
    - type: map_at_10
      value: 34.384
    - type: map_at_100
      value: 35.541
    - type: map_at_1000
      value: 35.589999999999996
    - type: map_at_3
      value: 30.496000000000002
    - type: map_at_5
      value: 32.718
    - type: mrr_at_1
      value: 22.750999999999998
    - type: mrr_at_10
      value: 35.024
    - type: mrr_at_100
      value: 36.125
    - type: mrr_at_1000
      value: 36.168
    - type: mrr_at_3
      value: 31.225
    - type: mrr_at_5
      value: 33.416000000000004
    - type: ndcg_at_1
      value: 22.750999999999998
    - type: ndcg_at_10
      value: 41.351
    - type: ndcg_at_100
      value: 46.92
    - type: ndcg_at_1000
      value: 48.111
    - type: ndcg_at_3
      value: 33.439
    - type: ndcg_at_5
      value: 37.407000000000004
    - type: precision_at_1
      value: 22.750999999999998
    - type: precision_at_10
      value: 6.564
    - type: precision_at_100
      value: 0.935
    - type: precision_at_1000
      value: 0.104
    - type: precision_at_3
      value: 14.288
    - type: precision_at_5
      value: 10.581999999999999
    - type: recall_at_1
      value: 22.096
    - type: recall_at_10
      value: 62.771
    - type: recall_at_100
      value: 88.529
    - type: recall_at_1000
      value: 97.55
    - type: recall_at_3
      value: 41.245
    - type: recall_at_5
      value: 50.788
  - task:
      type: Classification
    dataset:
      type: mteb/mtop_domain
      name: MTEB MTOPDomainClassification (en)
      config: en
      split: test
      revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf
    metrics:
    - type: accuracy
      value: 94.16780665754673
    - type: f1
      value: 93.96331194859894
  - task:
      type: Classification
    dataset:
      type: mteb/mtop_intent
      name: MTEB MTOPIntentClassification (en)
      config: en
      split: test
      revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba
    metrics:
    - type: accuracy
      value: 76.90606475148198
    - type: f1
      value: 58.58344986604187
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_massive_intent
      name: MTEB MassiveIntentClassification (en)
      config: en
      split: test
      revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7
    metrics:
    - type: accuracy
      value: 76.14660390047075
    - type: f1
      value: 74.31533923533614
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_massive_scenario
      name: MTEB MassiveScenarioClassification (en)
      config: en
      split: test
      revision: 7d571f92784cd94a019292a1f45445077d0ef634
    metrics:
    - type: accuracy
      value: 80.16139878950908
    - type: f1
      value: 80.18532656824924
  - task:
      type: Clustering
    dataset:
      type: mteb/medrxiv-clustering-p2p
      name: MTEB MedrxivClusteringP2P
      config: default
      split: test
      revision: e7a26af6f3ae46b30dde8737f02c07b1505bcc73
    metrics:
    - type: v_measure
      value: 32.949880906135085
  - task:
      type: Clustering
    dataset:
      type: mteb/medrxiv-clustering-s2s
      name: MTEB MedrxivClusteringS2S
      config: default
      split: test
      revision: 35191c8c0dca72d8ff3efcd72aa802307d469663
    metrics:
    - type: v_measure
      value: 31.56300351524862
  - task:
      type: Reranking
    dataset:
      type: mteb/mind_small
      name: MTEB MindSmallReranking
      config: default
      split: test
      revision: 3bdac13927fdc888b903db93b2ffdbd90b295a69
    metrics:
    - type: map
      value: 31.196521894371315
    - type: mrr
      value: 32.22644231694389
  - task:
      type: Retrieval
    dataset:
      type: nfcorpus
      name: MTEB NFCorpus
      config: default
      split: test
      revision: None
    metrics:
    - type: map_at_1
      value: 6.783
    - type: map_at_10
      value: 14.549000000000001
    - type: map_at_100
      value: 18.433
    - type: map_at_1000
      value: 19.949
    - type: map_at_3
      value: 10.936
    - type: map_at_5
      value: 12.514
    - type: mrr_at_1
      value: 47.368
    - type: mrr_at_10
      value: 56.42
    - type: mrr_at_100
      value: 56.908
    - type: mrr_at_1000
      value: 56.95
    - type: mrr_at_3
      value: 54.283
    - type: mrr_at_5
      value: 55.568
    - type: ndcg_at_1
      value: 45.666000000000004
    - type: ndcg_at_10
      value: 37.389
    - type: ndcg_at_100
      value: 34.253
    - type: ndcg_at_1000
      value: 43.059999999999995
    - type: ndcg_at_3
      value: 42.725
    - type: ndcg_at_5
      value: 40.193
    - type: precision_at_1
      value: 47.368
    - type: precision_at_10
      value: 27.988000000000003
    - type: precision_at_100
      value: 8.672
    - type: precision_at_1000
      value: 2.164
    - type: precision_at_3
      value: 40.248
    - type: precision_at_5
      value: 34.737
    - type: recall_at_1
      value: 6.783
    - type: recall_at_10
      value: 17.838
    - type: recall_at_100
      value: 33.672000000000004
    - type: recall_at_1000
      value: 66.166
    - type: recall_at_3
      value: 11.849
    - type: recall_at_5
      value: 14.205000000000002
  - task:
      type: Retrieval
    dataset:
      type: nq
      name: MTEB NQ
      config: default
      split: test
      revision: None
    metrics:
    - type: map_at_1
      value: 31.698999999999998
    - type: map_at_10
      value: 46.556
    - type: map_at_100
      value: 47.652
    - type: map_at_1000
      value: 47.68
    - type: map_at_3
      value: 42.492000000000004
    - type: map_at_5
      value: 44.763999999999996
    - type: mrr_at_1
      value: 35.747
    - type: mrr_at_10
      value: 49.242999999999995
    - type: mrr_at_100
      value: 50.052
    - type: mrr_at_1000
      value: 50.068
    - type: mrr_at_3
      value: 45.867000000000004
    - type: mrr_at_5
      value: 47.778999999999996
    - type: ndcg_at_1
      value: 35.717999999999996
    - type: ndcg_at_10
      value: 54.14600000000001
    - type: ndcg_at_100
      value: 58.672999999999995
    - type: ndcg_at_1000
      value: 59.279
    - type: ndcg_at_3
      value: 46.407
    - type: ndcg_at_5
      value: 50.181
    - type: precision_at_1
      value: 35.717999999999996
    - type: precision_at_10
      value: 8.844000000000001
    - type: precision_at_100
      value: 1.139
    - type: precision_at_1000
      value: 0.12
    - type: precision_at_3
      value: 20.993000000000002
    - type: precision_at_5
      value: 14.791000000000002
    - type: recall_at_1
      value: 31.698999999999998
    - type: recall_at_10
      value: 74.693
    - type: recall_at_100
      value: 94.15299999999999
    - type: recall_at_1000
      value: 98.585
    - type: recall_at_3
      value: 54.388999999999996
    - type: recall_at_5
      value: 63.08200000000001
  - task:
      type: Retrieval
    dataset:
      type: quora
      name: MTEB QuoraRetrieval
      config: default
      split: test
      revision: None
    metrics:
    - type: map_at_1
      value: 71.283
    - type: map_at_10
      value: 85.24000000000001
    - type: map_at_100
      value: 85.882
    - type: map_at_1000
      value: 85.897
    - type: map_at_3
      value: 82.326
    - type: map_at_5
      value: 84.177
    - type: mrr_at_1
      value: 82.21000000000001
    - type: mrr_at_10
      value: 88.228
    - type: mrr_at_100
      value: 88.32
    - type: mrr_at_1000
      value: 88.32
    - type: mrr_at_3
      value: 87.323
    - type: mrr_at_5
      value: 87.94800000000001
    - type: ndcg_at_1
      value: 82.17999999999999
    - type: ndcg_at_10
      value: 88.9
    - type: ndcg_at_100
      value: 90.079
    - type: ndcg_at_1000
      value: 90.158
    - type: ndcg_at_3
      value: 86.18299999999999
    - type: ndcg_at_5
      value: 87.71799999999999
    - type: precision_at_1
      value: 82.17999999999999
    - type: precision_at_10
      value: 13.464
    - type: precision_at_100
      value: 1.533
    - type: precision_at_1000
      value: 0.157
    - type: precision_at_3
      value: 37.693
    - type: precision_at_5
      value: 24.792
    - type: recall_at_1
      value: 71.283
    - type: recall_at_10
      value: 95.742
    - type: recall_at_100
      value: 99.67200000000001
    - type: recall_at_1000
      value: 99.981
    - type: recall_at_3
      value: 87.888
    - type: recall_at_5
      value: 92.24
  - task:
      type: Clustering
    dataset:
      type: mteb/reddit-clustering
      name: MTEB RedditClustering
      config: default
      split: test
      revision: 24640382cdbf8abc73003fb0fa6d111a705499eb
    metrics:
    - type: v_measure
      value: 56.24267063669042
  - task:
      type: Clustering
    dataset:
      type: mteb/reddit-clustering-p2p
      name: MTEB RedditClusteringP2P
      config: default
      split: test
      revision: 282350215ef01743dc01b456c7f5241fa8937f16
    metrics:
    - type: v_measure
      value: 62.88056988932578
  - task:
      type: Retrieval
    dataset:
      type: scidocs
      name: MTEB SCIDOCS
      config: default
      split: test
      revision: None
    metrics:
    - type: map_at_1
      value: 4.903
    - type: map_at_10
      value: 13.202
    - type: map_at_100
      value: 15.5
    - type: map_at_1000
      value: 15.870999999999999
    - type: map_at_3
      value: 9.407
    - type: map_at_5
      value: 11.238
    - type: mrr_at_10
      value: 35.867
    - type: mrr_at_100
      value: 37.001
    - type: mrr_at_1000
      value: 37.043
    - type: mrr_at_5
      value: 34.35
    - type: ndcg_at_1
      value: 24.2
    - type: ndcg_at_10
      value: 21.731
    - type: ndcg_at_100
      value: 30.7
    - type: ndcg_at_1000
      value: 36.618
    - type: ndcg_at_3
      value: 20.72
    - type: ndcg_at_5
      value: 17.954
    - type: precision_at_1
      value: 24.2
    - type: precision_at_10
      value: 11.33
    - type: precision_at_100
      value: 2.4410000000000003
    - type: precision_at_1000
      value: 0.386
    - type: precision_at_3
      value: 19.667
    - type: precision_at_5
      value: 15.86
    - type: recall_at_1
      value: 4.903
    - type: recall_at_10
      value: 22.962
    - type: recall_at_100
      value: 49.563
    - type: recall_at_1000
      value: 78.238
    - type: recall_at_3
      value: 11.953
    - type: recall_at_5
      value: 16.067999999999998
  - task:
      type: STS
    dataset:
      type: mteb/sickr-sts
      name: MTEB SICK-R
      config: default
      split: test
      revision: a6ea5a8cab320b040a23452cc28066d9beae2cee
    metrics:
    - type: cos_sim_pearson
      value: 84.12694254604078
    - type: cos_sim_spearman
      value: 80.30141815181918
    - type: euclidean_pearson
      value: 81.34015449877128
    - type: euclidean_spearman
      value: 80.13984197010849
    - type: manhattan_pearson
      value: 81.31767068124086
    - type: manhattan_spearman
      value: 80.11720513114103
  - task:
      type: STS
    dataset:
      type: mteb/sts12-sts
      name: MTEB STS12
      config: default
      split: test
      revision: a0d554a64d88156834ff5ae9920b964011b16384
    metrics:
    - type: cos_sim_pearson
      value: 86.13112984010417
    - type: cos_sim_spearman
      value: 78.03063573402875
    - type: euclidean_pearson
      value: 83.51928418844804
    - type: euclidean_spearman
      value: 78.4045235411144
    - type: manhattan_pearson
      value: 83.49981637388689
    - type: manhattan_spearman
      value: 78.4042575139372
  - task:
      type: STS
    dataset:
      type: mteb/sts13-sts
      name: MTEB STS13
      config: default
      split: test
      revision: 7e90230a92c190f1bf69ae9002b8cea547a64cca
    metrics:
    - type: cos_sim_pearson
      value: 82.50327987379504
    - type: cos_sim_spearman
      value: 84.18556767756205
    - type: euclidean_pearson
      value: 82.69684424327679
    - type: euclidean_spearman
      value: 83.5368106038335
    - type: manhattan_pearson
      value: 82.57967581007374
    - type: manhattan_spearman
      value: 83.43009053133697
  - task:
      type: STS
    dataset:
      type: mteb/sts14-sts
      name: MTEB STS14
      config: default
      split: test
      revision: 6031580fec1f6af667f0bd2da0a551cf4f0b2375
    metrics:
    - type: cos_sim_pearson
      value: 82.50756863007814
    - type: cos_sim_spearman
      value: 82.27204331279108
    - type: euclidean_pearson
      value: 81.39535251429741
    - type: euclidean_spearman
      value: 81.84386626336239
    - type: manhattan_pearson
      value: 81.34281737280695
    - type: manhattan_spearman
      value: 81.81149375673166
  - task:
      type: STS
    dataset:
      type: mteb/sts15-sts
      name: MTEB STS15
      config: default
      split: test
      revision: ae752c7c21bf194d8b67fd573edf7ae58183cbe3
    metrics:
    - type: cos_sim_pearson
      value: 86.8727714856726
    - type: cos_sim_spearman
      value: 87.95738287792312
    - type: euclidean_pearson
      value: 86.62920602795887
    - type: euclidean_spearman
      value: 87.05207355381243
    - type: manhattan_pearson
      value: 86.53587918472225
    - type: manhattan_spearman
      value: 86.95382961029586
  - task:
      type: STS
    dataset:
      type: mteb/sts16-sts
      name: MTEB STS16
      config: default
      split: test
      revision: 4d8694f8f0e0100860b497b999b3dbed754a0513
    metrics:
    - type: cos_sim_pearson
      value: 83.52240359769479
    - type: cos_sim_spearman
      value: 85.47685776238286
    - type: euclidean_pearson
      value: 84.25815333483058
    - type: euclidean_spearman
      value: 85.27415639683198
    - type: manhattan_pearson
      value: 84.29127757025637
    - type: manhattan_spearman
      value: 85.30226224917351
task
dataset
metrics
type
name
config
split
revision
mteb/sts17-crosslingual-sts
MTEB STS17 (en-en)
en-en
test
af5e6fb845001ecf41f4c1e033ce921939a2a68d
type
value
cos_sim_pearson
86.42501708915708
type
value
cos_sim_spearman
86.42276182795041
type
value
euclidean_pearson
86.5408207354761
type
value
euclidean_spearman
85.46096321750838
type
value
manhattan_pearson
86.54177303026881
type
value
manhattan_spearman
85.50313151916117
task
dataset
metrics
type
name
config
split
revision
mteb/sts22-crosslingual-sts
MTEB STS22 (en)
en
test
6d1ba47164174a496b7fa5d3569dae26a6813b80
type
value
cos_sim_pearson
64.86521089250766
type
value
cos_sim_spearman
65.94868540323003
type
value
euclidean_pearson
67.16569626533084
type
value
euclidean_spearman
66.37667004134917
type
value
manhattan_pearson
67.1482365102333
type
value
manhattan_spearman
66.53240122580029
task
dataset
metrics
type
name
config
split
revision
mteb/stsbenchmark-sts
MTEB STSBenchmark
default
test
b0fddb56ed78048fa8b90373c8a3cfc37b684831
type
value
cos_sim_pearson
84.64746265365318
type
value
cos_sim_spearman
86.41888825906786
type
value
euclidean_pearson
85.27453642725811
type
value
euclidean_spearman
85.94095796602544
type
value
manhattan_pearson
85.28643660505334
type
value
manhattan_spearman
85.95028003260744
task
dataset
metrics
type
name
config
split
revision
mteb/scidocs-reranking
MTEB SciDocsRR
default
test
d3c5e1fc0b855ab6097bf1cda04dd73947d7caab
type
value
map
87.48903153618527
type
value
mrr
96.41081503826601
task
dataset
metrics
type
name
config
split
revision
scifact
MTEB SciFact
default
test
None
type
value
map_at_1
58.594
type
value
map_at_10
69.296
type
value
map_at_100
69.782
type
value
map_at_1000
69.795
type
value
map_at_3
66.23
type
value
map_at_5
68.293
type
value
mrr_at_1
61.667
type
value
mrr_at_10
70.339
type
value
mrr_at_100
70.708
type
value
mrr_at_1000
70.722
type
value
mrr_at_5
69.56700000000001
type
value
ndcg_at_1
61.667
type
value
ndcg_at_10
74.039
type
value
ndcg_at_100
76.103
type
value
ndcg_at_1000
76.47800000000001
type
value
ndcg_at_3
68.967
type
value
ndcg_at_5
71.96900000000001
type
value
precision_at_1
61.667
type
value
precision_at_10
9.866999999999999
type
value
precision_at_100
1.097
type
value
precision_at_1000
0.11299999999999999
type
value
precision_at_3
27.111
type
value
precision_at_5
18.2
type
value
recall_at_1
58.594
type
value
recall_at_10
87.422
type
value
recall_at_100
96.667
type
value
recall_at_1000
99.667
type
value
recall_at_3
74.217
type
value
recall_at_5
81.539
task
dataset
metrics
type
name
config
split
revision
mteb/sprintduplicatequestions-pairclassification
MTEB SprintDuplicateQuestions
default
test
d66bd1f72af766a5cc4b0ca5e00c162f89e8cc46
type
value
cos_sim_accuracy
99.85049504950496
type
value
cos_sim_ap
96.33111544137081
type
value
cos_sim_f1
92.35443037974684
type
value
cos_sim_precision
93.53846153846153
type
value
cos_sim_recall
91.2
type
value
dot_accuracy
99.82376237623762
type
value
dot_ap
95.38082527310888
type
value
dot_f1
90.90909090909092
type
value
dot_precision
92.90187891440502
type
value
dot_recall
89.0
type
value
euclidean_accuracy
99.84851485148515
type
value
euclidean_ap
96.32316003996347
type
value
euclidean_f1
92.2071392659628
type
value
euclidean_precision
92.71991911021233
type
value
euclidean_recall
91.7
type
value
manhattan_accuracy
99.84851485148515
type
value
manhattan_ap
96.3655668249217
type
value
manhattan_f1
92.18356026222895
type
value
manhattan_precision
92.98067141403867
type
value
manhattan_recall
91.4
type
value
max_accuracy
99.85049504950496
type
value
max_ap
96.3655668249217
type
value
max_f1
92.35443037974684
task
dataset
metrics
type
name
config
split
revision
mteb/stackexchange-clustering
MTEB StackExchangeClustering
default
test
6cbc1f7b2bc0622f2e39d2c77fa502909748c259
type
value
v_measure
65.94861371629051
task
dataset
metrics
type
name
config
split
revision
mteb/stackexchange-clustering-p2p
MTEB StackExchangeClusteringP2P
default
test
815ca46b2622cec33ccafc3735d572c266efdb44
type
value
v_measure
35.009430451385
task
dataset
metrics
type
name
config
split
revision
mteb/stackoverflowdupquestions-reranking
MTEB StackOverflowDupQuestions
default
test
e185fbe320c72810689fc5848eb6114e1ef5ec69
type
value
map
54.61164066427969
type
value
mrr
55.49710603938544
task
dataset
metrics
type
name
config
split
revision
mteb/summeval
MTEB SummEval
default
test
cda12ad7615edc362dbf25a00fdd61d3b1eaf93c
type
value
cos_sim_pearson
30.622620124907662
type
value
cos_sim_spearman
31.0678351356163
type
value
dot_pearson
30.863727693306814
type
value
dot_spearman
31.230306567021255
task
dataset
metrics
type
name
config
split
revision
trec-covid
MTEB TRECCOVID
default
test
None
type
value
map_at_10
2.011
type
value
map_at_100
10.974
type
value
map_at_1000
25.819
type
value
map_at_3
0.6649999999999999
type
value
map_at_5
1.076
type
value
mrr_at_10
91.8
type
value
mrr_at_100
91.8
type
value
mrr_at_1000
91.8
type
value
ndcg_at_1
82.0
type
value
ndcg_at_10
78.07300000000001
type
value
ndcg_at_100
58.231
type
value
ndcg_at_1000
51.153000000000006
type
value
ndcg_at_3
81.123
type
value
ndcg_at_5
81.059
type
value
precision_at_1
86.0
type
value
precision_at_10
83.0
type
value
precision_at_100
59.38
type
value
precision_at_1000
22.55
type
value
precision_at_3
87.333
type
value
precision_at_5
86.8
type
value
recall_at_1
0.22
type
value
recall_at_10
2.2079999999999997
type
value
recall_at_100
14.069
type
value
recall_at_1000
47.678
type
value
recall_at_3
0.7040000000000001
type
value
recall_at_5
1.161
task
dataset
metrics
type
name
config
split
revision
webis-touche2020
MTEB Touche2020
default
test
None
type
value
map_at_1
2.809
type
value
map_at_10
10.394
type
value
map_at_100
16.598
type
value
map_at_1000
18.142
type
value
map_at_3
5.572
type
value
map_at_5
7.1370000000000005
type
value
mrr_at_1
32.653
type
value
mrr_at_10
46.564
type
value
mrr_at_100
47.469
type
value
mrr_at_1000
47.469
type
value
mrr_at_3
42.177
type
value
mrr_at_5
44.524
type
value
ndcg_at_1
30.612000000000002
type
value
ndcg_at_10
25.701
type
value
ndcg_at_100
37.532
type
value
ndcg_at_1000
48.757
type
value
ndcg_at_3
28.199999999999996
type
value
ndcg_at_5
25.987
type
value
precision_at_1
32.653
type
value
precision_at_10
23.469
type
value
precision_at_100
7.9799999999999995
type
value
precision_at_1000
1.5350000000000001
type
value
precision_at_3
29.932
type
value
precision_at_5
26.122
type
value
recall_at_1
2.809
type
value
recall_at_10
16.887
type
value
recall_at_100
48.67
type
value
recall_at_1000
82.89699999999999
type
value
recall_at_3
6.521000000000001
type
value
recall_at_5
9.609
task
dataset
metrics
type
name
config
split
revision
mteb/toxic_conversations_50k
MTEB ToxicConversationsClassification
default
test
d7c0de2777da35d6aae2200a62c6e0e5af397c4c
type
value
accuracy
71.57860000000001
type
value
ap
13.82629211536393
type
value
f1
54.59860966183956
task
dataset
metrics
type
name
config
split
revision
mteb/tweet_sentiment_extraction
MTEB TweetSentimentExtractionClassification
default
test
d604517c81ca91fe16a244d1248fc021f9ecee7a
type
value
accuracy
59.38030560271647
type
value
f1
59.69685552567865
task
dataset
metrics
type
name
config
split
revision
mteb/twentynewsgroups-clustering
MTEB TwentyNewsgroupsClustering
default
test
6125ec4e24fa026cec8a478383ee943acfbd5449
type
value
v_measure
51.4736717043405
task
dataset
metrics
type
name
config
split
revision
mteb/twittersemeval2015-pairclassification
MTEB TwitterSemEval2015
default
test
70970daeab8776df92f5ea462b6173c0b46fd2d1
type
value
cos_sim_accuracy
86.92853311080646
type
value
cos_sim_ap
77.67872502591382
type
value
cos_sim_f1
70.33941236068895
type
value
cos_sim_precision
67.63273258645884
type
value
cos_sim_recall
73.27176781002639
type
value
dot_accuracy
85.79603027954938
type
value
dot_ap
73.73786190233379
type
value
dot_f1
67.3437901774235
type
value
dot_precision
65.67201604814443
type
value
dot_recall
69.10290237467018
type
value
euclidean_accuracy
86.94045419324074
type
value
euclidean_ap
77.6687791535167
type
value
euclidean_f1
70.47209214023542
type
value
euclidean_precision
67.7207492094381
type
value
euclidean_recall
73.45646437994723
type
value
manhattan_accuracy
86.87488823985218
type
value
manhattan_ap
77.63373392430728
type
value
manhattan_f1
70.40920716112532
type
value
manhattan_precision
68.31265508684864
type
value
manhattan_recall
72.63852242744063
type
value
max_accuracy
86.94045419324074
type
value
max_ap
77.67872502591382
type
value
max_f1
70.47209214023542
task
dataset
metrics
type
name
config
split
revision
mteb/twitterurlcorpus-pairclassification
MTEB TwitterURLCorpus
default
test
8b6510b0b1fa4e4c4f879467980e9be563ec1cdf
type
value
cos_sim_accuracy
88.67155664221679
type
value
cos_sim_ap
85.64591703003417
type
value
cos_sim_f1
77.59531005352656
type
value
cos_sim_precision
73.60967184801382
type
value
cos_sim_recall
82.03726516784724
type
value
dot_accuracy
88.41541506578181
type
value
dot_ap
84.6482788957769
type
value
dot_f1
77.04748541466657
type
value
dot_precision
74.02440754931176
type
value
dot_recall
80.3279950723745
type
value
euclidean_accuracy
88.63080684596576
type
value
euclidean_ap
85.44570045321562
type
value
euclidean_f1
77.28769403336106
type
value
euclidean_precision
72.90600040958427
type
value
euclidean_recall
82.22975053895904
type
value
manhattan_accuracy
88.59393798269105
type
value
manhattan_ap
85.40271361038187
type
value
manhattan_f1
77.17606419344392
type
value
manhattan_precision
72.4447747078295
type
value
manhattan_recall
82.5685247921158
type
value
max_accuracy
88.67155664221679
type
value
max_ap
85.64591703003417
type
value
max_f1
77.59531005352656
mit
FlagEmbedding
Model List |
FAQ |
Usage |
Evaluation |
Train |
Contact |
Citation |
License
For more details please refer to our Github: FlagEmbedding .
If you are looking for a model that supports more languages, longer texts, and other retrieval methods, you can try using bge-m3 .
English | 中文
FlagEmbedding focuses on retrieval-augmented LLMs and currently consists of the following projects:
News
1/30/2024: Release BGE-M3 , a new member of the BGE model series! M3 stands for Multi-linguality (100+ languages), Multi-granularity (input length up to 8192), and Multi-functionality (unification of dense, lexical, and multi-vector/ColBERT retrieval).
It is the first embedding model that supports all three retrieval methods, achieving new SOTA on multi-lingual (MIRACL) and cross-lingual (MKQA) benchmarks.
Technical Report and Code . 🔥
1/9/2024: Release Activation-Beacon , an effective, efficient, compatible, and low-cost (training) method to extend the context length of LLMs. Technical Report 🔥
12/24/2023: Release LLaRA , a LLaMA-7B based dense retriever that achieves state-of-the-art performance on MS MARCO and BEIR. Model and code will be open-sourced. Please stay tuned. Technical Report 🔥
11/23/2023: Release LM-Cocktail , a method to maintain general capabilities during fine-tuning by merging multiple language models. Technical Report 🔥
10/12/2023: Release LLM-Embedder , a unified embedding model to support diverse retrieval augmentation needs for LLMs. Technical Report
09/15/2023: The technical report and massive training data of BGE have been released
09/12/2023: New models:
New reranker model : release cross-encoder models BAAI/bge-reranker-base and BAAI/bge-reranker-large, which are more powerful than the embedding models. We recommend using or fine-tuning them to re-rank the top-k documents returned by embedding models.
Updated embedding models : release the bge-*-v1.5 embedding models to alleviate the issue of the similarity distribution and enhance retrieval ability without instruction.
More
09/07/2023: Update fine-tuning code : add a script to mine hard negatives, and support adding instructions during fine-tuning.
08/09/2023: BGE Models are integrated into Langchain , you can use it like this ; C-MTEB leaderboard is available .
08/05/2023: Release base-scale and small-scale models, best performance among the models of the same size 🤗
08/02/2023: Release the bge-large-* models (short for BAAI General Embedding), ranking 1st on the MTEB and C-MTEB benchmarks! 🎉 🎉
08/01/2023: We release the Chinese Massive Text Embedding Benchmark (C-MTEB ), consisting of 31 test datasets.
Model List
bge is short for BAAI general embedding.
[1]: If you need to search for passages relevant to a query, we suggest adding the instruction to the query; in other cases, no instruction is needed, just use the original query directly. In all cases, no instruction needs to be added to passages.
[2]: Unlike an embedding model, a reranker takes a question and a document as input and directly outputs a similarity score instead of an embedding. To balance accuracy and time cost, cross-encoders are widely used to re-rank the top-k documents retrieved by simpler models.
For example, use a bge embedding model to retrieve the top 100 relevant documents, and then use a bge reranker to re-rank those 100 documents and obtain the final top-3 results.
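The two-stage pipeline described above can be sketched as follows. This is a toy illustration: `embed_score` and `rerank_score` are hypothetical pure-Python stand-ins for the bge embedding model and bge reranker, kept trivial so the control flow is clear.

```python
# Hypothetical retrieve-then-rerank sketch. The toy scoring functions below
# stand in for a bge embedding model (fast, approximate) and a bge reranker
# (slow, accurate); they are NOT the real models.

def embed_score(query: str, doc: str) -> float:
    # Stand-in for bi-encoder similarity: query-token coverage.
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / max(len(q), 1)

def rerank_score(query: str, doc: str) -> float:
    # Stand-in for the cross-encoder: Jaccard overlap of the pair.
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / max(len(q | d), 1)

def search(query, corpus, recall_k=100, final_k=3):
    # Stage 1: cheap embedding similarity over the whole corpus.
    candidates = sorted(corpus, key=lambda doc: embed_score(query, doc),
                        reverse=True)[:recall_k]
    # Stage 2: expensive reranker only over the recalled candidates.
    return sorted(candidates, key=lambda doc: rerank_score(query, doc),
                  reverse=True)[:final_k]

corpus = ["pandas eat bamboo", "the stock market fell",
          "giant pandas live in China"]
print(search("where do giant pandas live", corpus, recall_k=2, final_k=1))
```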
All models have been uploaded to Huggingface Hub, and you can see them at https://huggingface.co/BAAI .
If you cannot open the Huggingface Hub, you also can download the models at https://model.baai.ac.cn/models .
Frequently asked questions
1. How to fine-tune bge embedding model?
Follow this example to prepare data and fine-tune your model.
Some suggestions:
Mine hard negatives following this example , which can improve retrieval performance.
If you pre-train bge on your own data, the pre-trained model cannot be used to calculate similarity directly; it must be fine-tuned with contrastive learning before computing similarity.
If the accuracy of the fine-tuned model is still not high enough, it is recommended to use or fine-tune the cross-encoder model (bge-reranker) to re-rank the top-k results. Hard negatives are also needed to fine-tune the reranker.
2. The similarity score between two dissimilar sentences is higher than 0.5
We suggest using bge v1.5, which alleviates the issue of the similarity distribution.
Since we fine-tune the models by contrastive learning with a temperature of 0.01,
the similarity distribution of the current BGE models lies roughly in the interval [0.6, 1].
So a similarity score greater than 0.5 does not indicate that the two sentences are similar.
For downstream tasks such as passage retrieval or semantic similarity,
what matters is the relative order of the scores, not their absolute values.
If you need to filter similar sentences based on a similarity threshold,
please select an appropriate threshold based on the similarity distribution on your own data (such as 0.8, 0.85, or even 0.9).
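To make the point concrete, here is a small pure-Python sketch (the embedding vectors are made up for illustration) showing how even a dissimilar pair can score above 0.5, and why a filtering threshold should be calibrated on your own data:

```python
import math

def cosine(u, v):
    # Cosine similarity; equals the dot product once vectors are L2-normalized.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy 3-d "embeddings" (made up): the dissimilar pair still scores > 0.5,
# mirroring BGE's compressed score range of roughly [0.6, 1].
emb = {
    "similar_pair":    ([0.8, 0.6, 0.0], [0.7, 0.7, 0.1]),
    "dissimilar_pair": ([0.8, 0.6, 0.0], [0.1, 0.9, 0.4]),
}
scores = {name: cosine(u, v) for name, (u, v) in emb.items()}

# Only the relative order is meaningful; pick the threshold from the score
# distribution on your data (e.g. 0.85) rather than the naive 0.5.
threshold = 0.85
kept = [name for name, s in scores.items() if s >= threshold]
```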
3. When does the query instruction need to be used?
For bge-*-v1.5, we improved its retrieval ability when no instruction is used;
using no instruction causes only a slight degradation in retrieval performance compared with using one.
So for convenience you can generate embeddings without an instruction in all cases.
For a retrieval task that uses short queries to find long related documents,
it is recommended to add instructions to these short queries.
The best way to decide whether to add instructions to queries is to choose whichever setting achieves better performance on your task.
In all cases, no instruction needs to be added to the documents/passages.
Usage
Usage for Embedding Model
Here are some examples for using bge models with
FlagEmbedding , Sentence-Transformers , Langchain , or Huggingface Transformers .
Using FlagEmbedding
If that doesn't work for you, see FlagEmbedding for other ways to install it.
For the value of the argument query_instruction_for_retrieval, see Model List .
By default, FlagModel uses all available GPUs when encoding. Set os.environ["CUDA_VISIBLE_DEVICES"] to select specific GPUs,
or set os.environ["CUDA_VISIBLE_DEVICES"]="" to make all GPUs unavailable.
Using Sentence-Transformers
You can also use the bge models with sentence-transformers :
For s2p (short query to long passage) retrieval tasks,
each short query should start with an instruction (for instructions, see Model List ).
The instruction is not needed for passages.
Using Langchain
You can use bge in langchain like this:
Using HuggingFace Transformers
With the transformers package, you can use the model like this: first, pass your input through the transformer model, then select the last hidden state of the first token (i.e., [CLS]) as the sentence embedding.
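The CLS-pooling procedure just described can be sketched as follows (assumes `pip install transformers torch`; the sentences are placeholders):

```python
import torch
from transformers import AutoTokenizer, AutoModel

sentences = ["sample sentence 1", "sample sentence 2"]

tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-base-en-v1.5")
model = AutoModel.from_pretrained("BAAI/bge-base-en-v1.5")
model.eval()

encoded_input = tokenizer(sentences, padding=True, truncation=True,
                          return_tensors="pt")
with torch.no_grad():
    model_output = model(**encoded_input)
    # CLS pooling: the last hidden state of the first token.
    sentence_embeddings = model_output[0][:, 0]

# Normalize so that dot products equal cosine similarities.
sentence_embeddings = torch.nn.functional.normalize(sentence_embeddings, p=2, dim=1)
```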
Usage of the ONNX files
Usage via infinity
It's also possible to deploy the ONNX files with the infinity_emb pip package.
Usage for Reranker
Unlike an embedding model, a reranker takes a question and a document as input and directly outputs a similarity score instead of an embedding.
You can get a relevance score by feeding a query and a passage to the reranker.
The reranker is optimized with a cross-entropy loss, so the relevance score is not bounded to a specific range.
Using FlagEmbedding
Get relevance scores (higher scores indicate more relevance):
Using Huggingface transformers
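The same reranker can be run directly with transformers, since it is a sequence-classification cross-encoder with a single relevance logit per pair (assumes `pip install transformers torch`; the pairs are placeholders):

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-reranker-base")
model = AutoModelForSequenceClassification.from_pretrained("BAAI/bge-reranker-base")
model.eval()

pairs = [
    ["what is panda?", "hi"],
    ["what is panda?", "The giant panda is a bear species endemic to China."],
]
with torch.no_grad():
    inputs = tokenizer(pairs, padding=True, truncation=True,
                       return_tensors="pt", max_length=512)
    # One logit per pair; unbounded, higher = more relevant.
    scores = model(**inputs, return_dict=True).logits.view(-1).float()
```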
Evaluation
baai-general-embedding models achieve state-of-the-art performance on both the MTEB and C-MTEB leaderboards!
For more details and evaluation tools see our scripts .
Model Name
Dimension
Sequence Length
Average (56)
Retrieval (15)
Clustering (11)
Pair Classification (3)
Reranking (4)
STS (10)
Summarization (1)
Classification (12)
BAAI/bge-large-en-v1.5
1024
512
64.23
54.29
46.08
87.12
60.03
83.11
31.61
75.97
BAAI/bge-base-en-v1.5
768
512
63.55
53.25
45.77
86.55
58.86
82.4
31.07
75.53
BAAI/bge-small-en-v1.5
384
512
62.17
51.68
43.82
84.92
58.36
81.59
30.12
74.14
bge-large-en
1024
512
63.98
53.9
46.98
85.8
59.48
81.56
32.06
76.21
bge-base-en
768
512
63.36
53.0
46.32
85.86
58.7
81.84
29.27
75.27
gte-large
1024
512
63.13
52.22
46.84
85.00
59.13
83.35
31.66
73.33
gte-base
768
512
62.39
51.14
46.2
84.57
58.61
82.3
31.17
73.01
e5-large-v2
1024
512
62.25
50.56
44.49
86.03
56.61
82.05
30.19
75.24
bge-small-en
384
512
62.11
51.82
44.31
83.78
57.97
80.72
30.53
74.37
instructor-xl
768
512
61.79
49.26
44.74
86.62
57.29
83.06
32.32
61.79
e5-base-v2
768
512
61.5
50.29
43.80
85.73
55.91
81.05
30.28
73.84
gte-small
384
512
61.36
49.46
44.89
83.54
57.7
82.07
30.42
72.31
text-embedding-ada-002
1536
8192
60.99
49.25
45.9
84.89
56.32
80.97
30.8
70.93
e5-small-v2
384
512
59.93
49.04
39.92
84.67
54.32
80.39
31.16
72.94
sentence-t5-xxl
768
512
59.51
42.24
43.72
85.06
56.42
82.63
30.08
73.42
all-mpnet-base-v2
768
514
57.78
43.81
43.69
83.04
59.36
80.28
27.49
65.07
sgpt-bloom-7b1-msmarco
4096
2048
57.59
48.22
38.93
81.9
55.65
77.74
33.6
66.19
C-MTEB :
We created the C-MTEB benchmark for Chinese text embedding, which consists of 31 datasets from 6 tasks.
Please refer to C_MTEB for a detailed introduction.
Reranking :
See C_MTEB for evaluation script.
Model
T2Reranking
T2RerankingZh2En*
T2RerankingEn2Zh*
MMarcoReranking
CMedQAv1
CMedQAv2
Avg
text2vec-base-multilingual
64.66
62.94
62.51
14.37
48.46
48.6
50.26
multilingual-e5-small
65.62
60.94
56.41
29.91
67.26
66.54
57.78
multilingual-e5-large
64.55
61.61
54.28
28.6
67.42
67.92
57.4
multilingual-e5-base
64.21
62.13
54.68
29.5
66.23
66.98
57.29
m3e-base
66.03
62.74
56.07
17.51
77.05
76.76
59.36
m3e-large
66.13
62.72
56.1
16.46
77.76
78.27
59.57
bge-base-zh-v1.5
66.49
63.25
57.02
29.74
80.47
84.88
63.64
bge-large-zh-v1.5
65.74
63.39
57.03
28.74
83.45
85.44
63.97
BAAI/bge-reranker-base
67.28
63.95
60.45
35.46
81.26
84.1
65.42
BAAI/bge-reranker-large
67.6
64.03
61.44
37.16
82.15
84.18
66.09
* : T2RerankingZh2En and T2RerankingEn2Zh are cross-language retrieval tasks
Train
BAAI Embedding
We pre-train the models using RetroMAE and train them on large-scale paired data using contrastive learning.
You can fine-tune the embedding model on your data following our examples .
We also provide a pre-train example .
Note that the goal of pre-training is to reconstruct the text; the pre-trained model cannot be used for similarity calculation directly and needs to be fine-tuned first.
For more training details of bge, see baai_general_embedding .
BGE Reranker
A cross-encoder performs full attention over the input pair,
which is more accurate than an embedding model (i.e., a bi-encoder) but also more time-consuming.
Therefore, it can be used to re-rank the top-k documents returned by an embedding model.
We train the cross-encoder on multilingual pair data.
The data format is the same as for the embedding model, so you can fine-tune it easily following our example .
For more details, please refer to ./FlagEmbedding/reranker/README.md
Contact
If you have any questions or suggestions related to this project, feel free to open an issue or pull request.
You can also email Shitao Xiao (stxiao@baai.ac.cn ) and Zheng Liu (liuzheng@baai.ac.cn ).
Citation
If you find this repository useful, please consider giving it a star ⭐ and a citation.
License
FlagEmbedding is licensed under the MIT License . The released models can be used for commercial purposes free of charge.