---
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- dense
- generated_from_trainer
- dataset_size:4992
- loss:MultipleNegativesRankingLoss
base_model: google/embeddinggemma-300m
widget:
- source_sentence: Client onboarding, implementation, project management, communication,
    Salesforce, G-suite, Asana, Single Sign-On (SSO), SFTP, data analysis
  sentences:
  - A software engineer uses Python and GitHub to automate testing processes.
  - Setting up clients in Salesforce and G-suite efficiently requires strong project
    management and clear communication.
  - Choosing between cloud storage solutions like Dropbox and Google Drive can be
    challenging.
- source_sentence: SQL, Excel, stakeholder management, product management
  sentences:
  - FastAPI and Flask both enable developers to build robust RESTful APIs efficiently.
  - Data analysis using SQL and Excel for stakeholder updates in product management.
  - Project scheduling and Gantt charts for timeline tracking.
- source_sentence: Power Platform,Robotic Process Automation,Power Automate Cloud
    & Desktop,Automation Anywhere,PL900,SAP ECC,Generative AI,Power BI
  sentences:
  - SAP ECC and PL900 are essential for financial management systems.
  - Automation Anywhere, Power Platform, and Power Automate Cloud & Desktop are key
    tools for streamlining business processes.
  - Guidewire uses a test automation framework to ensure continuous integration and
    security testing.
- source_sentence: Critical Care,ICU
  sentences:
  - Guiding students through Java programming basics is crucial for their computer
    engineering education.
  - Intensive Care, ICU unit
  - Pediatric Clinic, outpatient
- source_sentence: successfactors,algorithms,sap,data analysis,natural language processing,software
    testing,neural networks,development methodologies
  sentences:
  - successfactors offers travel packages and vacation deals through its partnership
    with various hotels.
  - Azure Data Lake and Cosmos DB are key components of the Microsoft data ecosystem.
  - successfactors uses advanced algorithms to enhance sap software testing and improve
    data analysis accuracy.
pipeline_tag: sentence-similarity
library_name: sentence-transformers
---
# SentenceTransformer based on google/embeddinggemma-300m
This is a [sentence-transformers](https://www.SBERT.net) model fine-tuned from [google/embeddinggemma-300m](https://huggingface.co/google/embeddinggemma-300m) on the csv dataset. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
## Model Details
### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [google/embeddinggemma-300m](https://huggingface.co/google/embeddinggemma-300m)
- **Maximum Sequence Length:** 2048 tokens
- **Output Dimensionality:** 768 dimensions
- **Similarity Function:** Cosine Similarity
- **Training Dataset:**
    - csv
### Model Sources
- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/huggingface/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
### Full Model Architecture
```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 2048, 'do_lower_case': False, 'architecture': 'Gemma3TextModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Dense({'in_features': 768, 'out_features': 3072, 'bias': False, 'activation_function': 'torch.nn.modules.linear.Identity'})
  (3): Dense({'in_features': 3072, 'out_features': 768, 'bias': False, 'activation_function': 'torch.nn.modules.linear.Identity'})
  (4): Normalize()
)
```
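The module stack above reads as a simple data flow: mean pooling over the token embeddings, two bias-free linear projections with identity activation, then L2 normalization. The following is a minimal pure-Python sketch of that flow. The toy dimensions (4 and 16 standing in for 768 and 3072), the random weights, and the helper names are illustrative assumptions, not the model's actual parameters or code.

```python
import math
import random


def mean_pool(token_vectors, attention_mask):
    """Average the token vectors whose mask entry is 1 (mean-token pooling)."""
    dim = len(token_vectors[0])
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


def linear(vector, weight):
    """Bias-free linear layer with identity activation; weight is (out x in)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weight]


def l2_normalize(vector):
    """Scale the vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def embed(token_vectors, attention_mask, w_up, w_down):
    """Pooling -> Dense (up) -> Dense (down) -> Normalize."""
    pooled = mean_pool(token_vectors, attention_mask)
    return l2_normalize(linear(linear(pooled, w_up), w_down))


# Toy sizes for illustration only; the card lists 768 and 3072.
HIDDEN, INNER = 4, 16
rng = random.Random(0)
w_up = [[rng.gauss(0, 1) for _ in range(HIDDEN)] for _ in range(INNER)]
w_down = [[rng.gauss(0, 1) for _ in range(INNER)] for _ in range(HIDDEN)]
tokens = [[rng.gauss(0, 1) for _ in range(HIDDEN)] for _ in range(3)]
mask = [1, 1, 0]  # the last position is treated as padding

embedding = embed(tokens, mask, w_up, w_down)
print(len(embedding), sum(x * x for x in embedding))
```

Because the final step normalizes the vector, the cosine similarity between two embeddings equals their dot product.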
## Usage
### Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
```bash
pip install -U sentence-transformers
```
Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("AROY76/Embedding-gemma-300M-skills")

# Run inference
queries = [
    "successfactors,algorithms,sap,data analysis,natural language processing,software testing,neural networks,development methodologies",
]
documents = [
    'successfactors uses advanced algorithms to enhance sap software testing and improve data analysis accuracy.',
    'successfactors offers travel packages and vacation deals through its partnership with various hotels.',
    'Azure Data Lake and Cosmos DB are key components of the Microsoft data ecosystem.',
]
query_embeddings = model.encode_query(queries)
document_embeddings = model.encode_document(documents)
print(query_embeddings.shape, document_embeddings.shape)
# (1, 768) (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(query_embeddings, document_embeddings)
print(similarities)
# tensor([[0.7328, 0.0418, 0.0872]])
```
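The similarity matrix has one row per query and one column per document, so ranking documents for a query amounts to sorting one row. Here is a minimal sketch that reuses the scores printed above as plain floats instead of re-running the model. `rank_documents` is a hypothetical helper, not part of the Sentence Transformers API.

```python
def rank_documents(scores, documents, top_k=None):
    """Return (score, document) pairs sorted by descending similarity."""
    ranked = sorted(zip(scores, documents), key=lambda pair: pair[0], reverse=True)
    return ranked if top_k is None else ranked[:top_k]


# Scores copied from the example output above (one query, three documents).
scores = [0.7328, 0.0418, 0.0872]
documents = [
    "successfactors uses advanced algorithms to enhance sap software testing and improve data analysis accuracy.",
    "successfactors offers travel packages and vacation deals through its partnership with various hotels.",
    "Azure Data Lake and Cosmos DB are key components of the Microsoft data ecosystem.",
]

ranked = rank_documents(scores, documents)
for score, document in ranked:
    print(f"{score:.4f}  {document}")
```

With a real similarity tensor, one row could be converted first, for example with `similarities[0].tolist()`.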
## Training Details
### Training Dataset
#### csv
* Dataset: csv
* Size: 4,992 training samples
* Columns: anchor, positive, and negative
* Approximate statistics based on the first 1000 samples:
  | | anchor | positive | negative |
  |:--------|:-------|:---------|:---------|
  | type | string | string | string |
* Samples:
  | anchor | positive | negative |
  |:-------|:---------|:---------|
  | Statistical analysis, SQL, Scripting (Ruby, Python, etc.), Version control (git), Web design/UX, Monte Carlo simulations, RoR, Front-end JS, Growth hacking, Airflow, Pandas | Data analysis, databases, programming languages like Ruby or Python, software versioning, user interface design, probability modeling, Ruby on Rails, JavaScript for interfaces, customer growth strategies, workflow automation, data manipulation tools | Cloud storage, hardware configuration, network security, project management methodologies, graphic design software, database normalization techniques, agile development practices, server administration, marketing analytics, containerization technologies |
  | Graphic Design, digital design, print design, web design, environmental/experiential design, interaction design, brand design, visual design, communication, user research, illustration, digital design systems | Visual design, graphic design, communication, user research, illustration, digital design systems, web design, brand design, interaction design, print design, environmental/experiential design | Project management, software development, networking, cybersecurity, database administration, IT infrastructure, agile methodologies, cloud computing, hardware engineering, quality assurance |
  | problem solving, customer support, writing, grammar | improving writing skills to enhance clarity and grammar in customer support communications | designing website layouts for better user experience |
* Loss: [MultipleNegativesRankingLoss](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
```json
{
"scale": 20.0,
"similarity_fct": "cos_sim",
"gather_across_devices": false
}
```
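With these parameters, each anchor is scored against every positive and every negative in the batch using cosine similarity multiplied by the scale of 20, and the loss is the cross-entropy of picking the anchor's own positive. The sketch below illustrates that idea in pure Python on toy 2-dimensional vectors. The function names and vectors are assumptions for illustration, and it is not the library's implementation.

```python
import math


def cos_sim(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mnr_loss(anchors, positives, negatives, scale=20.0):
    """Mean cross-entropy where anchor i must select positive i among all
    in-batch positives and negatives."""
    candidates = positives + negatives
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, cand) for cand in candidates]
        # Numerically stable log-sum-exp over all candidates.
        peak = max(logits)
        log_sum_exp = peak + math.log(sum(math.exp(v - peak) for v in logits))
        total += log_sum_exp - logits[i]
    return total / len(anchors)


# Toy batch of two (anchor, positive, negative) triplets.
anchors = [[1.0, 0.0], [0.0, 1.0]]
positives = [[0.9, 0.1], [0.1, 0.9]]
negatives = [[-1.0, 0.0], [0.0, -1.0]]

aligned = mnr_loss(anchors, positives, negatives)
swapped = mnr_loss(anchors, positives[::-1], negatives)
print(aligned, swapped)
```

The aligned batch yields a loss close to zero, while swapping the positives makes each anchor's target one of the less similar candidates and the loss grows accordingly.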
### Training Hyperparameters
#### Non-Default Hyperparameters
- `per_device_train_batch_size`: 16
- `learning_rate`: 2e-05
- `num_train_epochs`: 2
- `warmup_ratio`: 0.1
- `prompts`: `task: sentence similarity | query:`
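These settings, together with the 4,992 training samples, determine the length of the run. The arithmetic below is a sketch under stated assumptions that the card does not confirm: a single device, no gradient accumulation, warmup steps rounded up, and a linear warmup followed by linear decay.

```python
import math

TRAIN_SAMPLES = 4992   # dataset size from the card
BATCH_SIZE = 16        # per_device_train_batch_size
EPOCHS = 2             # num_train_epochs
WARMUP_RATIO = 0.1     # warmup_ratio
PEAK_LR = 2e-05        # learning_rate

# Assumes one device and gradient_accumulation_steps == 1.
steps_per_epoch = math.ceil(TRAIN_SAMPLES / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)


def learning_rate(step):
    """Linear warmup to PEAK_LR, then linear decay to zero (assumed schedule)."""
    if step < warmup_steps:
        return PEAK_LR * step / warmup_steps
    return PEAK_LR * max(0, total_steps - step) / (total_steps - warmup_steps)


print(steps_per_epoch, total_steps, warmup_steps)
# 312 624 63
```

Under these assumptions the learning rate reaches 2e-05 after 63 optimizer steps and falls back to zero at step 624.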
#### All Hyperparameters