Initialize the project; model provided by the ModelHub XC community

Model: Muennighoff/SGPT-1.3B-weightedmean-nli-bitfit
Source: Original Platform
ModelHub XC
2026-05-13 15:46:11 +08:00
commit 2463007aee
18 changed files with 150618 additions and 0 deletions

.gitattributes (vendored, new file, 27 lines)

@@ -0,0 +1,27 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bin.* filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zstandard filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text

1_Pooling/config.json (new file, 9 lines)

@@ -0,0 +1,9 @@
{
"word_embedding_dimension": 2048,
"pooling_mode_cls_token": false,
"pooling_mode_mean_tokens": false,
"pooling_mode_max_tokens": false,
"pooling_mode_mean_sqrt_len_tokens": false,
"pooling_mode_weightedmean_tokens": true,
"pooling_mode_lasttoken": false
}
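The pooling config above enables only `pooling_mode_weightedmean_tokens`: SGPT's position-weighted mean, where token *i* gets weight proportional to *i + 1*, so later tokens contribute more. A minimal numpy sketch of that behavior (an illustration, not the sentence-transformers implementation):

```python
import numpy as np

def weighted_mean_pooling(token_embeddings: np.ndarray,
                          attention_mask: np.ndarray) -> np.ndarray:
    """Position-weighted mean: token i gets weight (i + 1), padding
    positions get weight 0, then weights are renormalized per sentence."""
    # token_embeddings: (batch, seq_len, dim); attention_mask: (batch, seq_len)
    seq_len = token_embeddings.shape[1]
    positions = np.arange(1, seq_len + 1, dtype=np.float64)  # 1, 2, ..., n
    weights = positions[None, :] * attention_mask            # zero out padding
    weights = weights / weights.sum(axis=1, keepdims=True)   # normalize
    return (token_embeddings * weights[:, :, None]).sum(axis=1)

# Two tokens -> weights 1/3 and 2/3.
emb = np.array([[[3.0, 0.0], [0.0, 3.0]]])  # (batch=1, seq=2, dim=2)
mask = np.array([[1, 1]])
print(weighted_mean_pooling(emb, mask))     # [[1. 2.]]
```

With a mask of `[[1, 0]]` the second token is ignored and the result is just the first token's embedding, which is what the `attention_mask` handling is for.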

README.md (new file, 72 lines)

@@ -0,0 +1,72 @@
---
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- feature-extraction
- sentence-similarity
---
# SGPT-1.3B-weightedmean-nli-bitfit
## Usage
For usage instructions, refer to our codebase: https://github.com/Muennighoff/sgpt
## Evaluation Results
For eval results, refer to the eval folder or our paper: https://arxiv.org/abs/2202.08904
## Training
The model was trained with the following parameters:
**DataLoader**:
`sentence_transformers.datasets.NoDuplicatesDataLoader.NoDuplicatesDataLoader` of length 93941 with parameters:
```
{'batch_size': 6}
```
**Loss**:
`sentence_transformers.losses.MultipleNegativesRankingLoss.MultipleNegativesRankingLoss` with parameters:
```
{'scale': 20.0, 'similarity_fct': 'cos_sim'}
```
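MultipleNegativesRankingLoss uses the other positives in a batch as negatives: scaled cosine similarities act as logits for a softmax cross-entropy whose correct class for anchor *i* is positive *i*. A minimal numpy sketch (an illustration under these assumptions, not the sentence-transformers implementation):

```python
import numpy as np

def multiple_negatives_ranking_loss(anchors: np.ndarray, positives: np.ndarray,
                                    scale: float = 20.0) -> float:
    """In-batch negatives: anchor i should rank its own positive above
    every other positive. Scaled cosine similarities are softmax logits
    with labels on the diagonal."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = scale * (a @ p.T)                 # (batch, batch) cos_sim * scale
    m = logits.max(axis=1, keepdims=True)      # stabilize the softmax
    log_softmax = logits - m - np.log(np.exp(logits - m).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_softmax)))

# Perfectly separated pairs -> near-zero loss.
pairs = np.eye(4)
print(multiple_negatives_ranking_loss(pairs, pairs))
```

The `scale=20.0` matches the `'scale': 20.0` parameter above; it sharpens the softmax so that a correct pair with cosine similarity near 1 dominates the in-batch negatives.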
Parameters of the `fit()` method:
```
{
"epochs": 1,
"evaluation_steps": 9394,
"evaluator": "sentence_transformers.evaluation.EmbeddingSimilarityEvaluator.EmbeddingSimilarityEvaluator",
"max_grad_norm": 1,
"optimizer_class": "<class 'transformers.optimization.AdamW'>",
"optimizer_params": {
"lr": 0.0001
},
"scheduler": "WarmupLinear",
"steps_per_epoch": null,
"warmup_steps": 9395,
"weight_decay": 0.01
}
```
## Full Model Architecture
```
SentenceTransformer(
(0): Transformer({'max_seq_length': 75, 'do_lower_case': False}) with Transformer model: GPTNeoModel
(1): Pooling({'word_embedding_dimension': 2048, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': True, 'pooling_mode_lasttoken': False})
)
```
## Citing & Authors
```bibtex
@article{muennighoff2022sgpt,
title={SGPT: GPT Sentence Embeddings for Semantic Search},
author={Muennighoff, Niklas},
journal={arXiv preprint arXiv:2202.08904},
year={2022}
}
```

config.json (new file, 74 lines)

@@ -0,0 +1,74 @@
{
"_name_or_path": "EleutherAI/gpt-neo-1.3B",
"activation_function": "gelu_new",
"architectures": [
"GPTNeoModel"
],
"attention_dropout": 0,
"attention_layers": [
"global",
"local",
"global",
"local",
"global",
"local",
"global",
"local",
"global",
"local",
"global",
"local",
"global",
"local",
"global",
"local",
"global",
"local",
"global",
"local",
"global",
"local",
"global",
"local"
],
"attention_types": [
[
[
"global",
"local"
],
12
]
],
"bos_token_id": 50256,
"embed_dropout": 0,
"eos_token_id": 50256,
"gradient_checkpointing": false,
"hidden_size": 2048,
"initializer_range": 0.02,
"intermediate_size": null,
"layer_norm_epsilon": 1e-05,
"max_position_embeddings": 2048,
"model_type": "gpt_neo",
"num_heads": 16,
"num_layers": 24,
"resid_dropout": 0,
"summary_activation": null,
"summary_first_dropout": 0.1,
"summary_proj_to_labels": true,
"summary_type": "cls_index",
"summary_use_proj": true,
"task_specific_params": {
"text-generation": {
"do_sample": true,
"max_length": 50,
"temperature": 0.9
}
},
"tokenizer_class": "GPT2Tokenizer",
"torch_dtype": "float32",
"transformers_version": "4.20.0.dev0",
"use_cache": true,
"vocab_size": 50257,
"window_size": 256
}


@@ -0,0 +1,7 @@
{
"__version__": {
"sentence_transformers": "2.1.0",
"transformers": "4.20.0.dev0",
"pytorch": "1.10.2"
}
}


@@ -0,0 +1,7 @@
{
"askubuntu": 55.97,
"cqadupstack": 13.47,
"twitterpara": 73.06,
"scidocs": 72.77,
"avg": 53.817499999999995
}
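The `avg` field in the eval summary above is the unweighted mean of the four task scores; a quick sanity check:

```python
# Scores copied from the eval summary above.
scores = {"askubuntu": 55.97, "cqadupstack": 13.47,
          "twitterpara": 73.06, "scidocs": 72.77}

avg = sum(scores.values()) / len(scores)
print(avg)  # ~53.8175, matching the stored "avg" value
```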


@@ -0,0 +1,66 @@
{
"askubuntu": {
"map_askubuntu_title": 55.97,
"p@1_askubuntu_title": 52.69,
"p@5_askubuntu_title": 41.94,
"mrr_askubuntu_title": 68.08
},
"cqadupstack": {
"map@100_cqadupstack_unix": 11.88,
"ndcg@10_cqadupstack_unix": 13.74,
"map@100_cqadupstack_gaming": 29.28,
"ndcg@10_cqadupstack_gaming": 31.75,
"map@100_cqadupstack_wordpress": 4.34,
"ndcg@10_cqadupstack_wordpress": 5.12,
"map@100_cqadupstack_stats": 14.68,
"ndcg@10_cqadupstack_stats": 16.12,
"map@100_cqadupstack_tex": 8.04,
"ndcg@10_cqadupstack_tex": 8.43,
"map@100_cqadupstack_english": 14.07,
"ndcg@10_cqadupstack_english": 15.59,
"map@100_cqadupstack_programmers": 10.78,
"ndcg@10_cqadupstack_programmers": 11.23,
"map@100_cqadupstack_mathematica": 10.74,
"ndcg@10_cqadupstack_mathematica": 12.57,
"map@100_cqadupstack_physics": 16.04,
"ndcg@10_cqadupstack_physics": 17.58,
"map@100_cqadupstack_gis": 14.81,
"ndcg@10_cqadupstack_gis": 16.19,
"map@100_cqadupstack_webmasters": 9.84,
"ndcg@10_cqadupstack_webmasters": 10.36,
"map@100_cqadupstack_android": 17.19,
"ndcg@10_cqadupstack_android": 19.08,
"map@100_cqadupstack_avg": 13.47,
"ndcg@10_cqadupstack_avg": 14.81
},
"twitterpara": {
"ap_twitter_twitterurl": 75.43,
"spearman_twitter_twitterurl": 70.6,
"ap_twitter_pit": 70.69,
"spearman_twitter_pit": 55.71,
"ap_twitter_avg": 73.06,
"spearman_twitter_avg": 63.15
},
"scidocs": {
"map_scidocs_cite_euclidean": 70.1,
"ndcg_scidocs_cite_euclidean": 85.17,
"map_scidocs_cite_cosine": 70.1,
"ndcg_scidocs_cite_cosine": 85.17,
"map_scidocs_cocite_euclidean": 72.87,
"ndcg_scidocs_cocite_euclidean": 86.72,
"map_scidocs_cocite_cosine": 72.87,
"ndcg_scidocs_cocite_cosine": 86.72,
"map_scidocs_coview_euclidean": 74.95,
"ndcg_scidocs_coview_euclidean": 87.03,
"map_scidocs_coview_cosine": 74.95,
"ndcg_scidocs_coview_cosine": 87.03,
"map_scidocs_coread_euclidean": 73.15,
"ndcg_scidocs_coread_euclidean": 86.15,
"map_scidocs_coread_cosine": 73.15,
"ndcg_scidocs_coread_cosine": 86.15,
"map_scidocs_euclidean_avg": 72.77,
"ndcg_scidocs_euclidean_avg": 86.27,
"map_scidocs_cosine_avg": 72.77,
"ndcg_scidocs_cosine_avg": 86.27
}
}

eval/quora.json (new file, 1 line)

@@ -0,0 +1 @@
{"SGPT-1.3B-weightedmean-nli-bitfit": {"quora": {"NDCG@1": 0.7423, "NDCG@3": 0.78936, "NDCG@5": 0.80689, "NDCG@10": 0.8233, "NDCG@100": 0.84217, "NDCG@1000": 0.84504}}}


@@ -0,0 +1,12 @@
epoch,steps,cosine_pearson,cosine_spearman,euclidean_pearson,euclidean_spearman,manhattan_pearson,manhattan_spearman,dot_pearson,dot_spearman
0,440,0.8486729726641167,0.8514738189278961,0.8515245319252214,0.8509027260070884,0.8540666757323956,0.8537290162137693,0.7630773568229123,0.7600358216618573
0,880,0.860418177580361,0.8656263802961531,0.859274657391518,0.8610670106016408,0.8614574446998846,0.8635698806563004,0.775971737698288,0.7738330802938131
0,1320,0.8646586808912774,0.8708047895637789,0.8600953037639533,0.8630166525165047,0.8623501857502746,0.8655507432332792,0.774750540444686,0.7725645527279972
0,1760,0.8628892948335536,0.8689022188557769,0.8611001993825963,0.8627626879284295,0.8634677606209161,0.8653777840391851,0.7684411984607661,0.7660388115697093
0,2200,0.8623264709023419,0.8684038857716583,0.8599140937391133,0.8623163529776595,0.8621011652259446,0.8648639001116789,0.7658429961100489,0.7630158219390278
0,2640,0.8633934631804789,0.8700248299507874,0.8583066084846345,0.8610312177479946,0.8604516944303623,0.8632651971720137,0.7708747899267866,0.7695161449303083
0,3080,0.864875512993908,0.8708072415227665,0.8570051556310841,0.8598815222390387,0.8592210627649711,0.8622960844035745,0.7675942152106912,0.765566539001796
0,3520,0.8664358867199037,0.8717878560785026,0.8577414799031283,0.8608674339514554,0.8600366294063135,0.8633069107239323,0.7678182184536243,0.7655516315181986
0,3960,0.8660615852729263,0.8715945872618516,0.8580011946328364,0.8608961297007961,0.8603219261222281,0.8635694785207915,0.7658642808961628,0.7634980873963996
0,4400,0.8660032674381255,0.8715157046451364,0.8576564771305891,0.8606430352200829,0.8599938864592154,0.8633518022139872,0.7663838558727445,0.7645750276413869
0,-1,0.865995009654422,0.8715109608696208,0.857644450885013,0.8606063092160902,0.8599858692389015,0.8633254320890273,0.7663788803033962,0.7645777465044731


@@ -0,0 +1,2 @@
epoch,steps,cosine_pearson,cosine_spearman,euclidean_pearson,euclidean_spearman,manhattan_pearson,manhattan_spearman,dot_pearson,dot_spearman
-1,-1,0.8329852176094534,0.8386309954374512,0.8291196910761947,0.828296436242254,0.8302104318397378,0.8293978465982256,0.7205795699601987,0.7008266718943091

merges.txt (new file, 50001 lines)

File diff suppressed because it is too large.

modules.json (new file, 14 lines)

@@ -0,0 +1,14 @@
[
{
"idx": 0,
"name": "0",
"path": "",
"type": "sentence_transformers.models.Transformer"
},
{
"idx": 1,
"name": "1",
"path": "1_Pooling",
"type": "sentence_transformers.models.Pooling"
}
]

pytorch_model.bin (new file, 3 lines)

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:9e05f11a38a0abb8e97717b109e2d346a74cf1244e419ac5659416d9874487c8
size 5363081601


@@ -0,0 +1,4 @@
{
"max_seq_length": 75,
"do_lower_case": false
}

special_tokens_map.json (new file, 1 line)

@@ -0,0 +1 @@
{"bos_token": "<|endoftext|>", "eos_token": "<|endoftext|>", "unk_token": "<|endoftext|>", "pad_token": "<|endoftext|>"}

tokenizer.json (new file, 100316 lines)

File diff suppressed because it is too large.

tokenizer_config.json (new file, 1 line)

@@ -0,0 +1 @@
{"unk_token": {"content": "<|endoftext|>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true, "__type": "AddedToken"}, "bos_token": {"content": "<|endoftext|>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true, "__type": "AddedToken"}, "eos_token": {"content": "<|endoftext|>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true, "__type": "AddedToken"}, "add_prefix_space": false, "model_max_length": 2048, "special_tokens_map_file": null, "name_or_path": "EleutherAI/gpt-neo-1.3B", "errors": "replace", "pad_token": null, "add_bos_token": false, "tokenizer_class": "GPT2Tokenizer"}

vocab.json (new file)

File diff suppressed because one or more lines are too long