ModelHub XC d561db1f48 Initialize project; model provided by the ModelHub XC community
Model: aspire/acge_text_embedding
Source: Original Platform
2026-05-14 14:59:48 +08:00

pipeline_tag, tags, model-index
sentence-similarity
mteb
sentence-transformers
feature-extraction
sentence-similarity
name results
acge_text_embedding
task dataset metrics
type
STS
type name config split revision
C-MTEB/AFQMC MTEB AFQMC default validation b44c3b011063adb25877c13823db83bb193913c4
type value
cos_sim_pearson 54.03434872650919
type value
cos_sim_spearman 58.80730796688325
type value
euclidean_pearson 57.47231387497989
type value
euclidean_spearman 58.80775026351807
type value
manhattan_pearson 57.46332720141574
type value
manhattan_spearman 58.80196022940078
task dataset metrics
type
STS
type name config split revision
C-MTEB/ATEC MTEB ATEC default test 0f319b1142f28d00e055a6770f3f726ae9b7d865
type value
cos_sim_pearson 53.52621290548175
type value
cos_sim_spearman 57.945227768312144
type value
euclidean_pearson 61.17041394151802
type value
euclidean_spearman 57.94553287835657
type value
manhattan_pearson 61.168327500057885
type value
manhattan_spearman 57.94477516925043
task dataset metrics
type
Classification
type name config split revision
mteb/amazon_reviews_multi MTEB AmazonReviewsClassification (zh) zh test 1399c76144fd37290681b995c656ef9b2e06e26d
type value
accuracy 48.538000000000004
type value
f1 46.59920995594044
task dataset metrics
type
STS
type name config split revision
C-MTEB/BQ MTEB BQ default test e3dda5e115e487b39ec7e618c0c6a29137052a55
type value
cos_sim_pearson 68.27529991817154
type value
cos_sim_spearman 70.37095914176643
type value
euclidean_pearson 69.42690712802727
type value
euclidean_spearman 70.37017971889912
type value
manhattan_pearson 69.40264877917839
type value
manhattan_spearman 70.34786744049524
task dataset metrics
type
Clustering
type name config split revision
C-MTEB/CLSClusteringP2P MTEB CLSClusteringP2P default test 4b6227591c6c1a73bc76b1055f3b7f3588e72476
type value
v_measure 47.08027536192709
task dataset metrics
type
Clustering
type name config split revision
C-MTEB/CLSClusteringS2S MTEB CLSClusteringS2S default test e458b3f5414b62b7f9f83499ac1f5497ae2e869f
type value
v_measure 44.0526024940363
task dataset metrics
type
Reranking
type name config split revision
C-MTEB/CMedQAv1-reranking MTEB CMedQAv1 default test 8d7f1e942507dac42dc58017c1a001c3717da7df
type value
map 88.65974993133156
type value
mrr 90.64761904761905
task dataset metrics
type
Reranking
type name config split revision
C-MTEB/CMedQAv2-reranking MTEB CMedQAv2 default test 23d186750531a14a0357ca22cd92d712fd512ea0
type value
map 88.90396838907245
type value
mrr 90.90932539682541
task dataset metrics
type
Retrieval
type name config split revision
C-MTEB/CmedqaRetrieval MTEB CmedqaRetrieval default dev cd540c506dae1cf9e9a59c3e06f42030d54e7301
type value
map_at_1 26.875
type value
map_at_10 39.995999999999995
type value
map_at_100 41.899
type value
map_at_1000 42.0
type value
map_at_3 35.414
type value
map_at_5 38.019
type value
mrr_at_1 40.635
type value
mrr_at_10 48.827
type value
mrr_at_100 49.805
type value
mrr_at_1000 49.845
type value
mrr_at_3 46.145
type value
mrr_at_5 47.693999999999996
type value
ndcg_at_1 40.635
type value
ndcg_at_10 46.78
type value
ndcg_at_100 53.986999999999995
type value
ndcg_at_1000 55.684
type value
ndcg_at_3 41.018
type value
ndcg_at_5 43.559
type value
precision_at_1 40.635
type value
precision_at_10 10.427999999999999
type value
precision_at_100 1.625
type value
precision_at_1000 0.184
type value
precision_at_3 23.139000000000003
type value
precision_at_5 17.004
type value
recall_at_1 26.875
type value
recall_at_10 57.887
type value
recall_at_100 87.408
type value
recall_at_1000 98.721
type value
recall_at_3 40.812
type value
recall_at_5 48.397
task dataset metrics
type
PairClassification
type name config split revision
C-MTEB/CMNLI MTEB Cmnli default validation 41bc36f332156f7adc9e38f53777c959b2ae9766
type value
cos_sim_accuracy 83.43956704750451
type value
cos_sim_ap 90.49172854352659
type value
cos_sim_f1 84.28475486903963
type value
cos_sim_precision 80.84603822203135
type value
cos_sim_recall 88.02899228431144
type value
dot_accuracy 83.43956704750451
type value
dot_ap 90.46317132695233
type value
dot_f1 84.28794294628929
type value
dot_precision 80.51948051948052
type value
dot_recall 88.4264671498714
type value
euclidean_accuracy 83.43956704750451
type value
euclidean_ap 90.49171785256486
type value
euclidean_f1 84.28235820561584
type value
euclidean_precision 80.8022308022308
type value
euclidean_recall 88.07575403320084
type value
manhattan_accuracy 83.55983162958509
type value
manhattan_ap 90.48046779812815
type value
manhattan_f1 84.45354259069714
type value
manhattan_precision 82.21877767936226
type value
manhattan_recall 86.81318681318682
type value
max_accuracy 83.55983162958509
type value
max_ap 90.49172854352659
type value
max_f1 84.45354259069714
task dataset metrics
type
Retrieval
type name config split revision
C-MTEB/CovidRetrieval MTEB CovidRetrieval default dev 1271c7809071a13532e05f25fb53511ffce77117
type value
map_at_1 68.54599999999999
type value
map_at_10 77.62400000000001
type value
map_at_100 77.886
type value
map_at_1000 77.89
type value
map_at_3 75.966
type value
map_at_5 76.995
type value
mrr_at_1 68.915
type value
mrr_at_10 77.703
type value
mrr_at_100 77.958
type value
mrr_at_1000 77.962
type value
mrr_at_3 76.08
type value
mrr_at_5 77.118
type value
ndcg_at_1 68.809
type value
ndcg_at_10 81.563
type value
ndcg_at_100 82.758
type value
ndcg_at_1000 82.864
type value
ndcg_at_3 78.29
type value
ndcg_at_5 80.113
type value
precision_at_1 68.809
type value
precision_at_10 9.463000000000001
type value
precision_at_100 1.001
type value
precision_at_1000 0.101
type value
precision_at_3 28.486
type value
precision_at_5 18.019
type value
recall_at_1 68.54599999999999
type value
recall_at_10 93.625
type value
recall_at_100 99.05199999999999
type value
recall_at_1000 99.895
type value
recall_at_3 84.879
type value
recall_at_5 89.252
task dataset metrics
type
Retrieval
type name config split revision
C-MTEB/DuRetrieval MTEB DuRetrieval default dev a1a333e290fe30b10f3f56498e3a0d911a693ced
type value
map_at_1 25.653
type value
map_at_10 79.105
type value
map_at_100 81.902
type value
map_at_1000 81.947
type value
map_at_3 54.54599999999999
type value
map_at_5 69.226
type value
mrr_at_1 89.35
type value
mrr_at_10 92.69
type value
mrr_at_100 92.77
type value
mrr_at_1000 92.774
type value
mrr_at_3 92.425
type value
mrr_at_5 92.575
type value
ndcg_at_1 89.35
type value
ndcg_at_10 86.55199999999999
type value
ndcg_at_100 89.35300000000001
type value
ndcg_at_1000 89.782
type value
ndcg_at_3 85.392
type value
ndcg_at_5 84.5
type value
precision_at_1 89.35
type value
precision_at_10 41.589999999999996
type value
precision_at_100 4.781
type value
precision_at_1000 0.488
type value
precision_at_3 76.683
type value
precision_at_5 65.06
type value
recall_at_1 25.653
type value
recall_at_10 87.64999999999999
type value
recall_at_100 96.858
type value
recall_at_1000 99.13300000000001
type value
recall_at_3 56.869
type value
recall_at_5 74.024
task dataset metrics
type
Retrieval
type name config split revision
C-MTEB/EcomRetrieval MTEB EcomRetrieval default dev 687de13dc7294d6fd9be10c6945f9e8fec8166b9
type value
map_at_1 52.1
type value
map_at_10 62.629999999999995
type value
map_at_100 63.117000000000004
type value
map_at_1000 63.134
type value
map_at_3 60.267
type value
map_at_5 61.777
type value
mrr_at_1 52.1
type value
mrr_at_10 62.629999999999995
type value
mrr_at_100 63.117000000000004
type value
mrr_at_1000 63.134
type value
mrr_at_3 60.267
type value
mrr_at_5 61.777
type value
ndcg_at_1 52.1
type value
ndcg_at_10 67.596
type value
ndcg_at_100 69.95
type value
ndcg_at_1000 70.33500000000001
type value
ndcg_at_3 62.82600000000001
type value
ndcg_at_5 65.546
type value
precision_at_1 52.1
type value
precision_at_10 8.309999999999999
type value
precision_at_100 0.941
type value
precision_at_1000 0.097
type value
precision_at_3 23.400000000000002
type value
precision_at_5 15.36
type value
recall_at_1 52.1
type value
recall_at_10 83.1
type value
recall_at_100 94.1
type value
recall_at_1000 97.0
type value
recall_at_3 70.19999999999999
type value
recall_at_5 76.8
task dataset metrics
type
Classification
type name config split revision
C-MTEB/IFlyTek-classification MTEB IFlyTek default validation 421605374b29664c5fc098418fe20ada9bd55f8a
type value
accuracy 51.773759138130046
type value
f1 40.341407912920054
task dataset metrics
type
Classification
type name config split revision
C-MTEB/JDReview-classification MTEB JDReview default test b7c64bd89eb87f8ded463478346f76731f07bf8b
type value
accuracy 86.69793621013133
type value
ap 55.46718958939327
type value
f1 81.48228915952436
task dataset metrics
type
STS
type name config split revision
C-MTEB/LCQMC MTEB LCQMC default test 17f9b096f80380fce5ed12a9be8be7784b337daf
type value
cos_sim_pearson 71.1397780205448
type value
cos_sim_spearman 78.17368193033309
type value
euclidean_pearson 77.4849177602368
type value
euclidean_spearman 78.17369079663212
type value
manhattan_pearson 77.47344305182406
type value
manhattan_spearman 78.16454335155387
task dataset metrics
type
Reranking
type name config split revision
C-MTEB/Mmarco-reranking MTEB MMarcoReranking default dev 8e0c766dbe9e16e1d221116a3f36795fbade07f6
type value
map 27.76160559006673
type value
mrr 28.02420634920635
task dataset metrics
type
Retrieval
type name config split revision
C-MTEB/MMarcoRetrieval MTEB MMarcoRetrieval default dev 539bbde593d947e2a124ba72651aafc09eb33fc2
type value
map_at_1 65.661
type value
map_at_10 74.752
type value
map_at_100 75.091
type value
map_at_1000 75.104
type value
map_at_3 72.997
type value
map_at_5 74.119
type value
mrr_at_1 67.923
type value
mrr_at_10 75.376
type value
mrr_at_100 75.673
type value
mrr_at_1000 75.685
type value
mrr_at_3 73.856
type value
mrr_at_5 74.82799999999999
type value
ndcg_at_1 67.923
type value
ndcg_at_10 78.424
type value
ndcg_at_100 79.95100000000001
type value
ndcg_at_1000 80.265
type value
ndcg_at_3 75.101
type value
ndcg_at_5 76.992
type value
precision_at_1 67.923
type value
precision_at_10 9.474
type value
precision_at_100 1.023
type value
precision_at_1000 0.105
type value
precision_at_3 28.319
type value
precision_at_5 17.986
type value
recall_at_1 65.661
type value
recall_at_10 89.09899999999999
type value
recall_at_100 96.023
type value
recall_at_1000 98.455
type value
recall_at_3 80.314
type value
recall_at_5 84.81
task dataset metrics
type
Classification
type name config split revision
mteb/amazon_massive_intent MTEB MassiveIntentClassification (zh-CN) zh-CN test 31efe3c427b0bae9c22cbb560b8f15491cc6bed7
type value
accuracy 75.86751849361131
type value
f1 73.04918450508
task dataset metrics
type
Classification
type name config split revision
mteb/amazon_massive_scenario MTEB MassiveScenarioClassification (zh-CN) zh-CN test 7d571f92784cd94a019292a1f45445077d0ef634
type value
accuracy 78.4364492266308
type value
f1 78.120686034844
task dataset metrics
type
Retrieval
type name config split revision
C-MTEB/MedicalRetrieval MTEB MedicalRetrieval default dev 2039188fb5800a9803ba5048df7b76e6fb151fc6
type value
map_at_1 55.00000000000001
type value
map_at_10 61.06399999999999
type value
map_at_100 61.622
type value
map_at_1000 61.663000000000004
type value
map_at_3 59.583
type value
map_at_5 60.373
type value
mrr_at_1 55.2
type value
mrr_at_10 61.168
type value
mrr_at_100 61.726000000000006
type value
mrr_at_1000 61.767
type value
mrr_at_3 59.683
type value
mrr_at_5 60.492999999999995
type value
ndcg_at_1 55.00000000000001
type value
ndcg_at_10 64.098
type value
ndcg_at_100 67.05
type value
ndcg_at_1000 68.262
type value
ndcg_at_3 61.00600000000001
type value
ndcg_at_5 62.439
type value
precision_at_1 55.00000000000001
type value
precision_at_10 7.37
type value
precision_at_100 0.881
type value
precision_at_1000 0.098
type value
precision_at_3 21.7
type value
precision_at_5 13.719999999999999
type value
recall_at_1 55.00000000000001
type value
recall_at_10 73.7
type value
recall_at_100 88.1
type value
recall_at_1000 97.8
type value
recall_at_3 65.10000000000001
type value
recall_at_5 68.60000000000001
task dataset metrics
type
Classification
type name config split revision
C-MTEB/MultilingualSentiment-classification MTEB MultilingualSentiment default validation 46958b007a63fdbf239b7672c25d0bea67b5ea1a
type value
accuracy 77.52666666666667
type value
f1 77.49784731367215
task dataset metrics
type
PairClassification
type name config split revision
C-MTEB/OCNLI MTEB Ocnli default validation 66e76a618a34d6d565d5538088562851e6daa7ec
type value
cos_sim_accuracy 81.10449377368705
type value
cos_sim_ap 85.17742765935606
type value
cos_sim_f1 83.00094966761633
type value
cos_sim_precision 75.40983606557377
type value
cos_sim_recall 92.29144667370645
type value
dot_accuracy 81.10449377368705
type value
dot_ap 85.17143850809614
type value
dot_f1 83.01707779886148
type value
dot_precision 75.36606373815677
type value
dot_recall 92.39704329461456
type value
euclidean_accuracy 81.10449377368705
type value
euclidean_ap 85.17856775343333
type value
euclidean_f1 83.00094966761633
type value
euclidean_precision 75.40983606557377
type value
euclidean_recall 92.29144667370645
type value
manhattan_accuracy 81.05035192203573
type value
manhattan_ap 85.14464459395809
type value
manhattan_f1 82.96155671570953
type value
manhattan_precision 75.3448275862069
type value
manhattan_recall 92.29144667370645
type value
max_accuracy 81.10449377368705
type value
max_ap 85.17856775343333
type value
max_f1 83.01707779886148
task dataset metrics
type
Classification
type name config split revision
C-MTEB/OnlineShopping-classification MTEB OnlineShopping default test e610f2ebd179a8fda30ae534c3878750a96db120
type value
accuracy 93.71000000000001
type value
ap 91.83202232349356
type value
f1 93.69900560334331
task dataset metrics
type
STS
type name config split revision
C-MTEB/PAWSX MTEB PAWSX default test 9c6a90e430ac22b5779fb019a23e820b11a8b5e1
type value
cos_sim_pearson 39.175047651512415
type value
cos_sim_spearman 45.51434675777896
type value
euclidean_pearson 44.864110004132286
type value
euclidean_spearman 45.516433048896076
type value
manhattan_pearson 44.87153627706517
type value
manhattan_spearman 45.52862617925012
task dataset metrics
type
STS
type name config split revision
C-MTEB/QBQTC MTEB QBQTC default test 790b0510dc52b1553e8c49f3d2afb48c0e5c48b7
type value
cos_sim_pearson 34.249579701429084
type value
cos_sim_spearman 37.30903127368978
type value
euclidean_pearson 35.129438425253355
type value
euclidean_spearman 37.308544018709085
type value
manhattan_pearson 35.08936153503652
type value
manhattan_spearman 37.25582901077839
task dataset metrics
type
STS
type name config split revision
mteb/sts22-crosslingual-sts MTEB STS22 (zh) zh test eea2b4fe26a775864c896887d910b76a8098ad3f
type value
cos_sim_pearson 61.29309637460004
type value
cos_sim_spearman 65.85136090376717
type value
euclidean_pearson 64.04783990953557
type value
euclidean_spearman 65.85036859610366
type value
manhattan_pearson 63.995852552712186
type value
manhattan_spearman 65.86508416749417
task dataset metrics
type
STS
type name config split revision
C-MTEB/STSB MTEB STSB default test 0cde68302b3541bb8b3c340dc0644b0b745b3dc0
type value
cos_sim_pearson 81.5595940455587
type value
cos_sim_spearman 82.72654634579749
type value
euclidean_pearson 82.4892721061365
type value
euclidean_spearman 82.72678504228253
type value
manhattan_pearson 82.4770861422454
type value
manhattan_spearman 82.71137469783162
task dataset metrics
type
Reranking
type name config split revision
C-MTEB/T2Reranking MTEB T2Reranking default dev 76631901a18387f85eaa53e5450019b87ad58ef9
type value
map 66.6159547610527
type value
mrr 76.35739406347057
task dataset metrics
type
Retrieval
type name config split revision
C-MTEB/T2Retrieval MTEB T2Retrieval default dev 8731a845f1bf500a4f111cf1070785c793d10e64
type value
map_at_1 27.878999999999998
type value
map_at_10 77.517
type value
map_at_100 81.139
type value
map_at_1000 81.204
type value
map_at_3 54.728
type value
map_at_5 67.128
type value
mrr_at_1 90.509
type value
mrr_at_10 92.964
type value
mrr_at_100 93.045
type value
mrr_at_1000 93.048
type value
mrr_at_3 92.551
type value
mrr_at_5 92.81099999999999
type value
ndcg_at_1 90.509
type value
ndcg_at_10 85.075
type value
ndcg_at_100 88.656
type value
ndcg_at_1000 89.25699999999999
type value
ndcg_at_3 86.58200000000001
type value
ndcg_at_5 85.138
type value
precision_at_1 90.509
type value
precision_at_10 42.05
type value
precision_at_100 5.013999999999999
type value
precision_at_1000 0.516
type value
precision_at_3 75.551
type value
precision_at_5 63.239999999999995
type value
recall_at_1 27.878999999999998
type value
recall_at_10 83.941
type value
recall_at_100 95.568
type value
recall_at_1000 98.55000000000001
type value
recall_at_3 56.374
type value
recall_at_5 70.435
task dataset metrics
type
Classification
type name config split revision
C-MTEB/TNews-classification MTEB TNews default validation 317f262bf1e6126357bbe89e875451e4b0938fe4
type value
accuracy 53.687
type value
f1 51.86911933364655
task dataset metrics
type
Clustering
type name config split revision
C-MTEB/ThuNewsClusteringP2P MTEB ThuNewsClusteringP2P default test 5798586b105c0434e4f0fe5e767abe619442cf93
type value
v_measure 74.65887489872564
task dataset metrics
type
Clustering
type name config split revision
C-MTEB/ThuNewsClusteringS2S MTEB ThuNewsClusteringS2S default test 8a8b2caeda43f39e13c4bc5bea0f8a667896e10d
type value
v_measure 69.00410995984436
task dataset metrics
type
Retrieval
type name config split revision
C-MTEB/VideoRetrieval MTEB VideoRetrieval default dev 58c2597a5943a2ba48f4668c3b90d796283c5639
type value
map_at_1 59.4
type value
map_at_10 69.214
type value
map_at_100 69.72699999999999
type value
map_at_1000 69.743
type value
map_at_3 67.717
type value
map_at_5 68.782
type value
mrr_at_1 59.4
type value
mrr_at_10 69.214
type value
mrr_at_100 69.72699999999999
type value
mrr_at_1000 69.743
type value
mrr_at_3 67.717
type value
mrr_at_5 68.782
type value
ndcg_at_1 59.4
type value
ndcg_at_10 73.32300000000001
type value
ndcg_at_100 75.591
type value
ndcg_at_1000 75.98700000000001
type value
ndcg_at_3 70.339
type value
ndcg_at_5 72.246
type value
precision_at_1 59.4
type value
precision_at_10 8.59
type value
precision_at_100 0.96
type value
precision_at_1000 0.099
type value
precision_at_3 25.967000000000002
type value
precision_at_5 16.5
type value
recall_at_1 59.4
type value
recall_at_10 85.9
type value
recall_at_100 96.0
type value
recall_at_1000 99.1
type value
recall_at_3 77.9
type value
recall_at_5 82.5
task dataset metrics
type
Classification
type name config split revision
C-MTEB/waimai-classification MTEB Waimai default test 339287def212450dcaa9df8c22bf93e9980c7023
type value
accuracy 88.53
type value
ap 73.56216166534062
type value
f1 87.06093694294485

acge model

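The acge model was developed by the technical team at Intsig Information (合合信息). The team's public technology trial platform is TextIn, and the open-source repository is hosted on GitHub. Intsig is an industry-leading artificial intelligence and big data technology company. Through its core technologies in intelligent text recognition and commercial big data, its consumer and business products, and its industry solutions, it provides innovative digital and intelligent services to enterprises and individual users worldwide.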

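For technical discussion, contact yanhui_he@intsig.net; for business cooperation, contact simon_liu@intsig.net. You can also click the image and scan the QR code to join our WeChat community. If you would like to join Intsig and work on document parsing, document retrieval, or document pre-research, send your résumé to min_du@intsig.net, or add the HR contact on WeChat to discuss the role in detail.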

acge is a general-purpose text embedding model. It produces variable-dimension embeddings using Matryoshka Representation Learning, as illustrated in the figure:

[Figure: matryoshka-small]

The recommended embedding dimension is 1024 or 1792.

| Model Name | Model Size (GB) | Dimension | Sequence Length | Language | Need instruction for retrieval? |
|---|---|---|---|---|---|
| acge-text-embedding | 0.65 | [1024, 1792] | 1024 | Chinese | NO |

Metric

C-MTEB leaderboard (Chinese)

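Results vary from run to run because of randomness in the data, the GPU used, and the inference data type. I ran four tests in total, covering different GPUs (A10, A100) and different data types. The results are stored in the result folder, and the run with the lowest score was taken as the final reported score. Following infgrad's suggestion, I also tested different input lengths; a Sequence Length of 512 gave the best result.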

| Model Name | GPU | tensor-type | Model Size (GB) | Dimension | Sequence Length | Average (35) | Classification (9) | Clustering (4) | Pair Classification (2) | Reranking (4) | Retrieval (8) | STS (8) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| acge_text_embedding | NVIDIA TESLA A10 | bfloat16 | 0.65 | 1792 | 1024 | 68.91 | 72.76 | 58.22 | 87.82 | 67.67 | 72.48 | 62.24 |
| acge_text_embedding | NVIDIA TESLA A100 | bfloat16 | 0.65 | 1792 | 1024 | 68.91 | 72.77 | 58.35 | 87.82 | 67.53 | 72.48 | 62.24 |
| acge_text_embedding | NVIDIA TESLA A100 | float16 | 0.65 | 1792 | 1024 | 68.99 | 72.76 | 58.68 | 87.84 | 67.89 | 72.49 | 62.24 |
| acge_text_embedding | NVIDIA TESLA A100 | float32 | 0.65 | 1792 | 1024 | 68.98 | 72.76 | 58.58 | 87.83 | 67.91 | 72.49 | 62.24 |
| acge_text_embedding | NVIDIA TESLA A100 | float16 | 0.65 | 1792 | 768 | 68.95 | 72.76 | 58.68 | 87.84 | 67.86 | 72.48 | 62.07 |
| acge_text_embedding | NVIDIA TESLA A100 | float16 | 0.65 | 1792 | 512 | 69.07 | 72.75 | 58.7 | 87.84 | 67.99 | 72.93 | 62.09 |
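
As a sanity check on the table, the Average (35) column appears to be the task-count-weighted mean of the six category averages (9 + 4 + 2 + 4 + 8 + 8 = 35 tasks). The sketch below recomputes it for two rows. The dictionary layout and the function name are illustrative only, and because the category scores are already rounded to two decimals, the result is only expected to agree to about that precision.

```python
# Task counts per category, taken from the column headers of the table.
TASK_COUNTS = {
    "Classification": 9,
    "Clustering": 4,
    "Pair Classification": 2,
    "Reranking": 4,
    "Retrieval": 8,
    "STS": 8,
}


def weighted_average(category_scores: dict) -> float:
    """Mean over all tasks, reconstructed from per-category averages."""
    total_tasks = sum(TASK_COUNTS.values())
    weighted_sum = sum(category_scores[name] * n for name, n in TASK_COUNTS.items())
    return weighted_sum / total_tasks


# Row: A10, bfloat16, Sequence Length 1024.
a10_bf16_1024 = {
    "Classification": 72.76,
    "Clustering": 58.22,
    "Pair Classification": 87.82,
    "Reranking": 67.67,
    "Retrieval": 72.48,
    "STS": 62.24,
}

# Row: A100, float16, Sequence Length 512.
a100_fp16_512 = {
    "Classification": 72.75,
    "Clustering": 58.7,
    "Pair Classification": 87.84,
    "Reranking": 67.99,
    "Retrieval": 72.93,
    "STS": 62.09,
}

print(round(weighted_average(a10_bf16_1024), 2))  # 68.91
print(round(weighted_average(a100_fp16_512), 2))  # 69.07
```

Both values match the Average (35) column for those rows, which supports reading that column as a per-task mean rather than a plain mean of the six category scores.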

Reproduce our results

C-MTEB:

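import argparse
import functools
from typing import Dict, List

import numpy as np  # required by the np.ndarray annotations below
import torch
from C_MTEB.tasks import *  # makes the C-MTEB task classes available to MTEB
from mteb import MTEB, DRESModel
from sentence_transformers import SentenceTransformer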


class RetrievalModel(DRESModel):
    def __init__(self, encoder, **kwargs):
        self.encoder = encoder

    def encode_queries(self, queries: List[str], **kwargs) -> np.ndarray:
        input_texts = ['{}'.format(q) for q in queries]
        return self._do_encode(input_texts)

    def encode_corpus(self, corpus: List[Dict[str, str]], **kwargs) -> np.ndarray:
        input_texts = ['{} {}'.format(doc.get('title', ''), doc['text']).strip() for doc in corpus]
        input_texts = ['{}'.format(t) for t in input_texts]
        return self._do_encode(input_texts)

    @torch.no_grad()
    def _do_encode(self, input_texts: List[str]) -> np.ndarray:
        return self.encoder.encode(
            sentences=input_texts,
            batch_size=512,
            normalize_embeddings=True,
            convert_to_numpy=True
        )


def get_args():
    parser = argparse.ArgumentParser()
    parser.add_argument('--model_name_or_path', default="acge_text_embedding", type=str)
    parser.add_argument('--task_type', default=None, type=str)
    parser.add_argument('--pooling_method', default='cls', type=str)
    parser.add_argument('--output_dir', default='zh_results',
                        type=str, help='output directory')
    parser.add_argument('--max_len', default=1024, type=int, help='max length')
    return parser.parse_args()


if __name__ == '__main__':
    args = get_args()
    encoder = SentenceTransformer(args.model_name_or_path).half()
    encoder.encode = functools.partial(encoder.encode, normalize_embeddings=True)
    encoder.max_seq_length = int(args.max_len)

    task_names = [t.description["name"] for t in MTEB(task_types=args.task_type,
                                                      task_langs=['zh', 'zh-CN']).tasks]
    TASKS_WITH_PROMPTS = ["T2Retrieval", "MMarcoRetrieval", "DuRetrieval", "CovidRetrieval", "CmedqaRetrieval",
                          "EcomRetrieval", "MedicalRetrieval", "VideoRetrieval"]
    for task in task_names:
        evaluation = MTEB(tasks=[task], task_langs=['zh', 'zh-CN'])
        if task in TASKS_WITH_PROMPTS:
            evaluation.run(RetrievalModel(encoder), output_folder=args.output_dir, overwrite_results=False)
        else:
            evaluation.run(encoder, output_folder=args.output_dir, overwrite_results=False)
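
To make the corpus handling in the script concrete, the sketch below isolates the string construction used by `encode_corpus`: the optional title and the text are joined with a space, and surrounding whitespace is stripped so that documents without a title do not start with a blank. `format_corpus` and the sample documents are illustrative names and data, not part of the original script.

```python
from typing import Dict, List


def format_corpus(corpus: List[Dict[str, str]]) -> List[str]:
    """Mirror the text construction used in encode_corpus."""
    return ['{} {}'.format(doc.get('title', ''), doc['text']).strip() for doc in corpus]


docs = [
    {'title': 'Title A', 'text': 'Body of the first document.'},
    {'text': 'Body without a title.'},  # no 'title' key at all
]

print(format_corpus(docs))
# ['Title A Body of the first document.', 'Body without a title.']
```

Queries, by contrast, are passed through unchanged, which is consistent with the "Need instruction for retrieval? NO" entry in the model table.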


Usage

acge Chinese model series

Usage with the sentence-transformers library

from sentence_transformers import SentenceTransformer

sentences = ["数据1", "数据2"]
model = SentenceTransformer('acge_text_embedding')
print(model.max_seq_length)
embeddings_1 = model.encode(sentences, normalize_embeddings=True)
embeddings_2 = model.encode(sentences, normalize_embeddings=True)
similarity = embeddings_1 @ embeddings_2.T
print(similarity)
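
Because both sets of embeddings are L2-normalized, the matrix product in the snippet is a cosine-similarity matrix. The following sketch illustrates that equivalence, plus a simple nearest-neighbour lookup, using NumPy only. The random vectors stand in for real model output, and `l2_normalize` and `top_k` are hypothetical helpers, not part of sentence-transformers.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def top_k(query: np.ndarray, corpus: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k corpus rows most similar to one normalized query vector."""
    scores = corpus @ query
    return np.argsort(-scores)[:k]


rng = np.random.default_rng(0)
# Stand-in embeddings; a real run would use model.encode(..., normalize_embeddings=True).
corpus = l2_normalize(rng.normal(size=(5, 1792)))
queries = l2_normalize(rng.normal(size=(2, 1792)))

dot = queries @ corpus.T
cosine = dot / (
    np.linalg.norm(queries, axis=1)[:, None] * np.linalg.norm(corpus, axis=1)[None, :]
)

print(np.allclose(dot, cosine))       # True: the norms are all 1
print(top_k(corpus[3], corpus, k=1))  # [3]: a vector is its own nearest neighbour
```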

Usage with the sentence-transformers library, selecting a different dimension

from sklearn.preprocessing import normalize
from sentence_transformers import SentenceTransformer

sentences = ["数据1", "数据2"]
model = SentenceTransformer('acge_text_embedding')
embeddings = model.encode(sentences, normalize_embeddings=False)
matryoshka_dim = 1024
embeddings = embeddings[..., :matryoshka_dim]  # Shrink the embedding dimensions
embeddings = normalize(embeddings, norm="l2", axis=1)
print(embeddings.shape)
# => (2, 1024)
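
The same truncate-then-renormalize step can be written without scikit-learn. The sketch below is a minimal NumPy version, assuming the input is a 2-D array of unnormalized embeddings. `truncate_embeddings` is an illustrative name, and the random array is a placeholder for model output of width 1792.

```python
import numpy as np


def truncate_embeddings(embeddings: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` components of each row, then L2-normalize the rows."""
    if not 0 < dim <= embeddings.shape[1]:
        raise ValueError(f"dim must be in 1..{embeddings.shape[1]}, got {dim}")
    truncated = embeddings[:, :dim]
    norms = np.linalg.norm(truncated, axis=1, keepdims=True)
    # Guard against division by zero for an all-zero row.
    return truncated / np.clip(norms, 1e-12, None)


rng = np.random.default_rng(0)
# Placeholder for model.encode(sentences, normalize_embeddings=False).
full = rng.normal(size=(2, 1792))
small = truncate_embeddings(full, 1024)
print(small.shape)  # (2, 1024)
```

Normalizing after truncation matters: the leading components of a unit-length vector are generally no longer unit length, so dot products on truncated vectors would otherwise stop being cosine similarities.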
