IMCatalina-v1.0/README.md

---
language:
- en
base_model:
- microsoft/phi-4
pipeline_tag: text-generation
library_name: transformers
tags:
- phi
- fine-tuned
- full-finetune
- instruction-tuning
- text-generation
- recruitment
- resume-parsing
- job-description-generation
---

# IMCatalina-v1.0

## Model summary
**IMCatalina-v1.0** is a **fully fine-tuned** version of **Phi-4** specialized in **recruitment document processing**.

The model focuses exclusively on:
- Parsing unstructured CVs/resumes
- Converting CV content into structured formats (JSON / YAML)
- Generating professional job descriptions from structured inputs

This model was trained end-to-end (full fine-tuning) and **does not perform candidate scoring, ranking, or hiring decisions**.

---

## Intended use

### Primary use cases
- CV and resume parsing
- Structured CV normalization (JSON / YAML)
- Extraction of skills, roles, education, and experience
- Job description generation for recruitment platforms
- Preprocessing for ATS and HR systems

### Explicitly out-of-scope
- Candidate ranking or scoring
- Hiring recommendations
- Candidate–job matching
- Automated decision-making
- Psychological or behavioral inference

---

## Model details
- **Base model:** microsoft/phi-4
- **Model type:** Decoder-only causal language model
- **Architecture:** Transformer (Phi family)
- **Parameters:** ~14B
- **Context length:** up to 16k tokens
- **Languages:** English
- **Training type:** Full fine-tuning

---

## Training

### Training data
- **Domain:** Recruitment and HR documentation
- **Data type:** Synthetic and curated structured data
- **Formats:**
  - Instruction–response
  - Schema-constrained generation
- **Content includes:**
  - CVs and resumes
  - Job descriptions
  - Skills, roles, education, and experience fields
- **Data processing:**
  - Deduplication
  - Schema validation
  - Removal of malformed samples
  - Consistency and format checks

> No real personal data was intentionally included in the training datasets.