ModelHub XC 6f9cde7915 Initial commit: model provided by the ModelHub XC community

Model: SamilPwC-AXNode-GenAI/PwC-Embedding_expr
Source: Original Platform

2026-05-14 18:02:42 +08:00

PwC-Embedding_expr

PwC-Embedding_expr is built on top of the multilingual-e5-large-instruct embedding model.
To improve performance on Korean, we applied our own curated augmentation to STS (semantic textual similarity) datasets and fine-tuned the E5 model with the datasets mixed at a carefully balanced ratio.

⚠️ This is an experimental model and is under continuous development.

To-do

MTEB Leaderboard
Technical Report

MTEB

PwC-Embedding_expr was evaluated on the Korean subset of MTEB.
A leaderboard link will be added once it is published.

Task	PwC-Embedding_expr
KLUE-STS	0.88
KLUE-TC	0.73
Ko-StrategyQA	0.80
KorSTS	0.84
MIRACL-Reranking	0.72
MIRACL-Retrieval	0.65
Average	0.77
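
The Average row appears consistent with an unweighted mean of the six task scores. The short check below reproduces it; the scores are copied from the table, while the assumption that the average is a plain mean (and the helper name `macro_average`) is ours, not stated by the authors.

```python
# Scores copied from the table above (Korean subset of MTEB).
scores = {
    "KLUE-STS": 0.88,
    "KLUE-TC": 0.73,
    "Ko-StrategyQA": 0.80,
    "KorSTS": 0.84,
    "MIRACL-Reranking": 0.72,
    "MIRACL-Retrieval": 0.65,
}


def macro_average(task_scores):
    """Unweighted mean over tasks (assumed to be how the Average row was computed)."""
    return sum(task_scores.values()) / len(task_scores)


average = macro_average(scores)
print(f"Average: {average:.2f}")
```

Rounded to two decimals, the result matches the reported 0.77.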

Model

Base Model: intfloat/multilingual-e5-large-instruct
Model Size: 0.56B
Embedding Dimension: 1024
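Max Input Tokens: 514 (the position-embedding size of the XLM-RoBERTa architecture; the usable sequence length is typically 512 tokens)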
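
A minimal usage sketch follows, assuming the model loads through the sentence-transformers library (the repository ships sentence-transformers configuration files). The instruction-prefixed query format is the one documented for the base model, intfloat/multilingual-e5-large-instruct; whether the fine-tuned model expects the same prefix is an assumption. The helper names `format_query`, `cosine_similarity`, and `embed` are illustrative only.

```python
import math

MODEL_ID = "SamilPwC-AXNode-GenAI/PwC-Embedding_expr"
EMBEDDING_DIM = 1024  # Embedding Dimension from the model card


def format_query(task_description, query):
    """Prefix a query with a task instruction, following the base model's
    documented format (assumed, not confirmed, to carry over to this model)."""
    return f"Instruct: {task_description}\nQuery: {query}"


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def embed(texts):
    """Encode texts into L2-normalized embeddings.

    sentence-transformers is imported lazily so the helpers above work
    without it; the first call downloads the model weights.
    """
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(MODEL_ID)
    return model.encode(texts, normalize_embeddings=True)
```

With the weights available, `embed([format_query("Retrieve relevant passages", "한국의 수도는 어디인가요?"), "서울은 대한민국의 수도이다."])` should return one 1024-dimensional vector per input, and `cosine_similarity` on the two rows gives their similarity score. Inputs longer than the maximum sequence length are truncated by the tokenizer.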

Requirements

The model works with the dependencies installed by the latest version of MTEB.
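
To see what is installed in the current environment, a small check like the one below can help. The package list is an assumption on our part (MTEB plus the libraries it is expected to pull in); the model card does not pin specific versions.

```python
from importlib import metadata


def installed_versions(packages=("mteb", "sentence-transformers", "torch")):
    """Return {package: version string, or None if the package is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions().items():
    print(f"{name}: {version or 'not installed'}")
```

If `mteb` is reported as not installed, installing its latest release should bring in the remaining dependencies.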

Citation

TBD (technical report expected September 2025)