Initialize project; model provided by the ModelHub XC community
Model: dwulff/mpnet-personality
Source: Original Platform
.gitattributes (vendored, new file, 35 lines)
@@ -0,0 +1,35 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
1_Pooling/config.json (new file, 10 lines)
@@ -0,0 +1,10 @@
{
  "word_embedding_dimension": 768,
  "pooling_mode_cls_token": false,
  "pooling_mode_mean_tokens": true,
  "pooling_mode_max_tokens": false,
  "pooling_mode_mean_sqrt_len_tokens": false,
  "pooling_mode_weightedmean_tokens": false,
  "pooling_mode_lasttoken": false,
  "include_prompt": true
}
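For reference, `"pooling_mode_mean_tokens": true` selects mean pooling over token embeddings. A minimal sketch of what that operation computes (an illustration, not the library's actual implementation):

```python
import torch

# Mean pooling: average the token embeddings of each sequence,
# ignoring padding positions via the attention mask.
def mean_pool(token_embeddings: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    mask = attention_mask.unsqueeze(-1).type_as(token_embeddings)  # (batch, seq, 1)
    summed = (token_embeddings * mask).sum(dim=1)                  # (batch, dim)
    counts = mask.sum(dim=1).clamp(min=1e-9)                       # avoid division by zero
    return summed / counts
```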
README.md (new file, 112 lines)
@@ -0,0 +1,112 @@
---
library_name: sentence-transformers
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- feature-extraction
- sentence-similarity
license: cc-by-sa-4.0
language:
- en
base_model:
- sentence-transformers/all-mpnet-base-v2
---

# dwulff/mpnet-personality

This is a [sentence-transformers](https://www.SBERT.net) model that maps personality-related items or texts into a 768-dimensional dense vector space. It can be used for many tasks in personality psychology, such as clustering personality items and scales, mapping personality scales to personality constructs, and more.

The model was generated by fine-tuning [all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2) on unsigned empirical correlations of 200k pairs of personality items. The model therefore encodes the content of personality-related texts independent of direction (e.g., negation).

See [Wulff & Mata (2025)](https://doi.org/10.1038/s41562-024-02089-y) and its [Supplement](https://static-content.springer.com/esm/art%3A10.1038%2Fs41562-024-02089-y/MediaObjects/41562_2024_2089_MOESM1_ESM.pdf) for details.

## Usage

Make sure [sentence-transformers](https://www.SBERT.net) is installed:

```
# latest version
pip install -U sentence-transformers

# latest dev version
pip install git+https://github.com/UKPLab/sentence-transformers.git
```

You can extract embeddings in the following way:

```python
from sentence_transformers import SentenceTransformer

# personality sentences
sentences = ["Rarely think about how I feel.", "Make decisions quickly."]

# load model
model = SentenceTransformer('dwulff/mpnet-personality')

# extract embeddings
embeddings = model.encode(sentences)
print(embeddings)
```
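Because the pipeline ends in a Normalize() module (see the Full Model Architecture section below), the returned embeddings are unit-length, so a plain dot product equals cosine similarity. A small follow-up sketch, continuing the snippet above:

```python
import numpy as np

# Embeddings are unit-normalized, so the dot product is the cosine similarity,
# which the model treats as a predicted unsigned inter-item correlation.
similarity = float(np.dot(embeddings[0], embeddings[1]))
print(similarity)
```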

## Evaluation Results

The model has been evaluated on public personality data. For standard personality inventories, such as the Big Five or HEXACO inventories, the model predicts the empirical correlations between personality items at Pearson r ~ .6 and the empirical correlations between scales at Pearson r ~ .7.

Performance can be higher (r ~ .9) for the many common personality items the model was trained on, due to memorization. Performance will be worse for more specialized personality assessments and for texts beyond personality items, as well as for personality factors, due to the reduced variance in correlations.

See [Wulff & Mata (2025)](https://doi.org/10.1038/s41562-024-02089-y) and its [Supplement](https://static-content.springer.com/esm/art%3A10.1038%2Fs41562-024-02089-y/MediaObjects/41562_2024_2089_MOESM1_ESM.pdf) for details.

## Citing

```
@article{wulff2024taxonomic,
  author  = {Wulff, Dirk U. and Mata, Rui},
  title   = {Semantic embeddings reveal and address taxonomic incommensurability in psychological measurement},
  journal = {Nature Human Behaviour},
  year    = {2025},
  doi     = {10.1038/s41562-024-02089-y}
}
```

## Training

The model was trained with the following parameters:

**DataLoader**:

`torch.utils.data.dataloader.DataLoader` of length 3125 with parameters:
```
{'batch_size': 64, 'sampler': 'torch.utils.data.sampler.RandomSampler', 'batch_sampler': 'torch.utils.data.sampler.BatchSampler'}
```

**Loss**:

`sentence_transformers.losses.CosineSimilarityLoss.CosineSimilarityLoss`

Parameters of the `fit()` method:
```
{
  "epochs": 3,
  "evaluation_steps": 0,
  "evaluator": "NoneType",
  "max_grad_norm": 1,
  "optimizer_class": "<class 'torch.optim.adamw.AdamW'>",
  "optimizer_params": {
    "lr": 2e-05
  },
  "scheduler": "WarmupLinear",
  "steps_per_epoch": null,
  "warmup_steps": 625,
  "weight_decay": 0.01
}
```
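Put together, the configuration above corresponds roughly to the following call with the legacy sentence-transformers `fit()` API (a sketch; the training pairs are placeholders, not the actual 200k item pairs):

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

# Placeholder pairs: two items with their unsigned empirical correlation as the label.
train_examples = [
    InputExample(texts=["Rarely think about how I feel.",
                        "Make decisions quickly."], label=0.12),
    # ... ~200k pairs in the actual training data
]

model = SentenceTransformer('sentence-transformers/all-mpnet-base-v2')
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=64)
train_loss = losses.CosineSimilarityLoss(model)

# Mirrors the fit() parameters listed above.
model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=3,
    scheduler='WarmupLinear',
    warmup_steps=625,
    optimizer_params={'lr': 2e-05},
    weight_decay=0.01,
    max_grad_norm=1,
)
```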

## Full Model Architecture
```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 384, 'do_lower_case': False}) with Transformer model: MPNetModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```
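Since the final module is Normalize(), output embeddings should have unit L2 norm. A quick sanity check, reusing `embeddings` from the usage snippet above:

```python
import numpy as np

# Each norm should be ~1.0 because of the trailing Normalize() module.
print(np.linalg.norm(embeddings, axis=1))
```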
config.json (new file, 24 lines)
@@ -0,0 +1,24 @@
{
  "_name_or_path": "sentence-transformers/all-mpnet-base-v2",
  "architectures": [
    "MPNetModel"
  ],
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": 0,
  "eos_token_id": 2,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-05,
  "max_position_embeddings": 514,
  "model_type": "mpnet",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 1,
  "relative_attention_num_buckets": 32,
  "torch_dtype": "float32",
  "transformers_version": "4.37.2",
  "vocab_size": 30527
}
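Because config.json declares a plain MPNetModel, the weights can also be loaded directly with transformers when only token-level outputs are needed (a sketch; the pooling and normalization stages of the sentence-transformers pipeline are then up to the caller):

```python
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('dwulff/mpnet-personality')
model = AutoModel.from_pretrained('dwulff/mpnet-personality')

inputs = tokenizer("Rarely think about how I feel.", return_tensors='pt')
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, seq_len, 768)
```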
config_sentence_transformers.json (new file, 9 lines)
@@ -0,0 +1,9 @@
{
  "__version__": {
    "sentence_transformers": "2.0.0",
    "transformers": "4.6.1",
    "pytorch": "1.8.1"
  },
  "prompts": {},
  "default_prompt_name": null
}
model.safetensors (new file, LFS pointer, 3 lines)
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:45aaed527d0bb5b9e69e230872e808d2e44f2e648e7f7eca9389845a2851353c
size 437967672
modules.json (new file, 20 lines)
@@ -0,0 +1,20 @@
[
  {
    "idx": 0,
    "name": "0",
    "path": "",
    "type": "sentence_transformers.models.Transformer"
  },
  {
    "idx": 1,
    "name": "1",
    "path": "1_Pooling",
    "type": "sentence_transformers.models.Pooling"
  },
  {
    "idx": 2,
    "name": "2",
    "path": "2_Normalize",
    "type": "sentence_transformers.models.Normalize"
  }
]
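modules.json wires the three-stage pipeline (Transformer → Pooling → Normalize). The same pipeline could be assembled by hand like this (a sketch of the equivalent construction, not required for normal use):

```python
from sentence_transformers import SentenceTransformer, models

# Stage 0: the MPNet transformer (max_seq_length per sentence_bert_config.json).
word = models.Transformer('dwulff/mpnet-personality', max_seq_length=384)
# Stage 1: mean pooling over token embeddings (per 1_Pooling/config.json).
pool = models.Pooling(word.get_word_embedding_dimension(), pooling_mode='mean')
# Stage 2: L2 normalization of the sentence embedding.
norm = models.Normalize()

model = SentenceTransformer(modules=[word, pool, norm])
```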
sentence_bert_config.json (new file, 4 lines)
@@ -0,0 +1,4 @@
{
  "max_seq_length": 384,
  "do_lower_case": false
}
special_tokens_map.json (new file, 51 lines)
@@ -0,0 +1,51 @@
{
  "bos_token": {
    "content": "<s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "cls_token": {
    "content": "<s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "</s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "mask_token": {
    "content": "<mask>",
    "lstrip": true,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "<pad>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "sep_token": {
    "content": "</s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "unk_token": {
    "content": "[UNK]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
tokenizer.json (new file, 30636 lines)
Diff suppressed because the file is too large.
tokenizer_config.json (new file, 72 lines)
@@ -0,0 +1,72 @@
{
  "added_tokens_decoder": {
    "0": {
      "content": "<s>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "1": {
      "content": "<pad>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "2": {
      "content": "</s>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "3": {
      "content": "<unk>",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "104": {
      "content": "[UNK]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "30526": {
      "content": "<mask>",
      "lstrip": true,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "bos_token": "<s>",
  "clean_up_tokenization_spaces": true,
  "cls_token": "<s>",
  "do_lower_case": true,
  "eos_token": "</s>",
  "mask_token": "<mask>",
  "max_length": 128,
  "model_max_length": 512,
  "pad_to_multiple_of": null,
  "pad_token": "<pad>",
  "pad_token_type_id": 0,
  "padding_side": "right",
  "sep_token": "</s>",
  "stride": 0,
  "strip_accents": null,
  "tokenize_chinese_chars": true,
  "tokenizer_class": "MPNetTokenizer",
  "truncation_side": "right",
  "truncation_strategy": "longest_first",
  "unk_token": "[UNK]"
}