Initialize project; model provided by the ModelHub XC community
Model: SamilPwC-AXNode-GenAI/PwC-Embedding_expr Source: Original Platform
---
language:
- ko
license: apache-2.0
tags:
- sentence-transformers
- sentence-similarity
- transformers
---

## PwC-Embedding_expr

We trained the **PwC-Embedding_expr** model on top of the [multilingual-e5-large-instruct](https://huggingface.co/intfloat/multilingual-e5-large-instruct) embedding model.
To improve performance in Korean, we applied our curated augmentation to STS datasets and fine-tuned the E5 model using a carefully balanced ratio across datasets.

> ⚠️ This is an experimental model and is under continuous development.

### To-do
- [x] MTEB Leaderboard
- [ ] Technical Report

## MTEB
PwC-Embedding_expr was evaluated on the Korean subset of MTEB.
A leaderboard link will be added once it is published.

| Task             | PwC-Embedding_expr |
|------------------|--------------------|
| KLUE-STS         | 0.88               |
| KLUE-TC          | 0.73               |
| Ko-StrategyQA    | 0.80               |
| KorSTS           | 0.84               |
| MIRACL-Reranking | 0.72               |
| MIRACL-Retrieval | 0.65               |
| **Average**      | **0.77**           |

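The reported average is consistent with an unweighted mean of the six task scores. The check below is a minimal sketch: the scores are copied from the table above, and the variable names are illustrative only.

```python
# Scores copied from the MTEB table in this README.
# The dictionary name and layout are illustrative, not part of any API.
scores = {
    "KLUE-STS": 0.88,
    "KLUE-TC": 0.73,
    "Ko-StrategyQA": 0.80,
    "KorSTS": 0.84,
    "MIRACL-Reranking": 0.72,
    "MIRACL-Retrieval": 0.65,
}

# Unweighted mean across tasks, rounded to two decimals like the table.
average = sum(scores.values()) / len(scores)
print(f"{average:.2f}")  # prints 0.77
```
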
## Model
- Base Model: [intfloat/multilingual-e5-large-instruct](https://huggingface.co/intfloat/multilingual-e5-large-instruct)
- Model Size: 0.56B
- Embedding Dimension: 1024
- Max Input Tokens: 514

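The following is a minimal usage sketch, not an official recipe. It assumes the model loads through the `sentence-transformers` library (as the tags suggest) and that it keeps the `Instruct: ...\nQuery: ...` query format documented for the base multilingual-e5-large-instruct model; neither is confirmed by this README. The helper names `format_query`, `cosine_similarity`, `embed`, and `demo`, the task description, and the example sentences are all hypothetical.

```python
import numpy as np

# Repository id as shown on this page.
MODEL_ID = "SamilPwC-AXNode-GenAI/PwC-Embedding_expr"


def format_query(task: str, query: str) -> str:
    """Build a query prompt in the format documented for the base E5 instruct model.

    Assumption: the fine-tuned model expects the same format.
    """
    return f"Instruct: {task}\nQuery: {query}"


def cosine_similarity(a, b) -> np.ndarray:
    """Pairwise cosine similarity between the rows of a and the rows of b."""
    a = np.asarray(a, dtype=np.float32)
    b = np.asarray(b, dtype=np.float32)
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a @ b.T


def embed(texts):
    """Encode texts with the model (downloads the weights on first use)."""
    # Imported lazily so the helpers above work without the library installed.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(MODEL_ID)
    return model.encode(texts, normalize_embeddings=True)


def demo():
    # Illustrative task description; queries get the instruction, documents do not.
    task = "Given a web search query, retrieve relevant passages that answer the query"
    query = format_query(task, "대한민국의 수도는 어디인가요?")
    documents = [
        "서울은 대한민국의 수도입니다.",
        "김치는 한국의 전통 발효 음식입니다.",
    ]
    query_emb = embed([query])
    doc_embs = embed(documents)
    print(cosine_similarity(query_emb, doc_embs))
```

Calling `demo()` downloads the model and prints one similarity score per document. If the embedding dimension listed above holds, each embedding should have 1024 components.
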
## Requirements
The model works with the dependencies installed by the latest version of MTEB.

## Citation

TBD (technical report expected September 2025)