---
library_name: transformers
tags:
- medical
license: apache-2.0
language:
- fr
- en
base_model:
- ik-ram28/BioMistral-CPT-7B
- BioMistral/BioMistral-7B
---

## Model Description

BioMistral-CPT-SFT-7B is a French medical language model based on BioMistral-7B. It was adapted to the French medical domain in two stages: Continual Pre-Training (CPT) followed by Supervised Fine-Tuning (SFT).

## Model Details

- **Model Type**: Causal Language Model
- **Base Model**: BioMistral-7B
- **Language**: French (adapted from an English medical model)
- **Domain**: Medical/Healthcare
- **Parameters**: 7 billion
- **License**: Apache 2.0
- **Paper**: [Adaptation des connaissances médicales pour les grands modèles de langue : Stratégies et analyse comparative](https://github.com/ikram28/medllm-strategies)

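The metadata lists `transformers` as the library, so loading should follow the usual causal-LM pattern. The following is a minimal sketch, assuming the checkpoint is published under the hypothetical repository ID `ik-ram28/BioMistral-CPT-SFT-7B` (inferred from the base-model namespace, not confirmed by this card) and that `torch`, `transformers`, and `accelerate` are installed. The helper name `generate_answer` and the generation settings are illustrative choices, not values from the card.

```python
# Hypothetical repository ID: an assumption, not stated in this card.
MODEL_ID = "ik-ram28/BioMistral-CPT-SFT-7B"


def generate_answer(question: str, max_new_tokens: int = 128) -> str:
    """Load the model and generate a completion for a French medical question."""
    # Imported lazily so that defining this helper does not require a GPU stack.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.bfloat16,  # assumption: bf16 weights fit the target GPU
        device_map="auto",           # requires the `accelerate` package
    )

    inputs = tokenizer(question, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Example call (downloads roughly 7B parameters on first use):
# print(generate_answer("Quels sont les symptômes du diabète de type 2 ?"))
```

The card does not document a prompt template, so the sketch passes the question as plain text. Outputs should be treated as research material only, in line with the ethical considerations of this card.
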
## Training Details

### Continual Pre-Training (CPT)

- **Dataset**: NACHOS corpus (opeN crAwled frenCh Healthcare cOrpuS)
- **Size**: 7.4 GB of French medical texts
- **Word Count**: Over 1 billion words
- **Sources**: 24 French medical websites
- **Training Duration**: 2.8 epochs
- **Hardware**: 32 NVIDIA H100 80GB GPUs
- **Training Time**: 11 hours
- **Optimizer**: AdamW
- **Learning Rate**: 2e-5
- **Weight Decay**: 0.01
- **Batch Size**: 16 with gradient accumulation of 2

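The CPT hyperparameters above can be gathered into a small configuration object, which also makes the effective batch size explicit. This is a sketch only: the class name `CPTConfig` and its method are hypothetical, and the card does not say whether the batch size of 16 is per GPU or global, so the helper computes both readings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CPTConfig:
    """Continual pre-training settings as listed in this card."""
    optimizer: str = "AdamW"
    learning_rate: float = 2e-5
    weight_decay: float = 0.01
    batch_size: int = 16
    gradient_accumulation_steps: int = 2
    num_gpus: int = 32
    epochs: float = 2.8

    def sequences_per_update(self, batch_is_per_gpu: bool) -> int:
        """Sequences consumed per optimizer step.

        Assumption: the card does not state whether `batch_size` is per GPU,
        so the caller chooses the interpretation.
        """
        per_step = self.batch_size * self.gradient_accumulation_steps
        return per_step * self.num_gpus if batch_is_per_gpu else per_step


cfg = CPTConfig()
print(cfg.sequences_per_update(batch_is_per_gpu=False))  # global reading
print(cfg.sequences_per_update(batch_is_per_gpu=True))   # per-GPU reading
```
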
### Supervised Fine-Tuning (SFT)

- **Dataset**: 30K French medical question-answer pairs
  - 10K native French medical questions
  - 10K translated medical questions from English resources
  - 10K generated questions from French medical texts
- **Method**: DoRA (Weight-Decomposed Low-Rank Adaptation)
- **Training Duration**: 10 epochs
- **Hardware**: 1 NVIDIA H100 80GB GPU
- **Training Time**: 42 hours
- **Rank**: 16
- **Alpha**: 16
- **Learning Rate**: 2e-5
- **Batch Size**: 4

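To make the DoRA settings above concrete, here is a toy, pure-Python sketch of the weight decomposition: the adapted weight is a learned per-column magnitude times the normalized direction of the base weight plus a scaled low-rank update. It is not the training code used for this model. The matrix sizes, the random initialization, and the column-wise orientation of the norm are assumptions of the sketch; only the rank (16) and alpha (16) come from the card.

```python
import math
import random

RANK = 16                # from the card
ALPHA = 16               # from the card
SCALING = ALPHA / RANK   # usual LoRA scaling factor; equals 1.0 here


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def column_norms(w):
    """L2 norm of each column of w."""
    return [math.sqrt(sum(v * v for v in col)) for col in zip(*w)]


def dora_weight(w0, lora_b, lora_a, magnitude):
    """Return magnitude * (w0 + scaling * B @ A) / column-wise norm."""
    delta = matmul(lora_b, lora_a)
    directed = [
        [w + SCALING * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(w0, delta)
    ]
    norms = column_norms(directed)
    return [[m * v / n for v, m, n in zip(row, magnitude, norms)] for row in directed]


# Toy shapes (assumption): a 4 x 3 base weight, far smaller than a real layer.
random.seed(0)
d_out, d_in = 4, 3
w0 = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
lora_a = [[random.gauss(0, 0.02) for _ in range(d_in)] for _ in range(RANK)]
lora_b = [[0.0] * RANK for _ in range(d_out)]   # zero init: no change at start
magnitude = column_norms(w0)                    # magnitude starts at the base norms

merged_init = dora_weight(w0, lora_b, lora_a, magnitude)
```

With `lora_b` initialized to zero, `merged_init` reproduces `w0` up to floating-point error, so fine-tuning starts from the CPT checkpoint unchanged. Once `lora_b` is updated, each column of the merged weight keeps a norm equal to its entry in `magnitude`, which is what lets magnitude and direction be learned separately.
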
## Computational Impact

- **Total Training Time**: 53 hours (11h CPT + 42h SFT)
- **Hardware**: 32 NVIDIA H100 GPUs (CPT) + 1 NVIDIA H100 GPU (SFT)
- **Carbon Emissions**: 10.11 kgCO2e (9.04 + 1.07)

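The 53 hours above are wall-clock time. Because the two stages ran on very different GPU counts, the compute cost is easier to compare in GPU-hours. The short calculation below uses only figures from this card; the variable names are illustrative.

```python
# (GPU count, wall-clock hours) per stage, as reported in this card.
stages = {
    "CPT": (32, 11),
    "SFT": (1, 42),
}

gpu_hours = {name: gpus * hours for name, (gpus, hours) in stages.items()}
total_gpu_hours = sum(gpu_hours.values())
wall_clock_hours = sum(hours for _, hours in stages.values())

# Reported emissions in kgCO2e; the card gives the two terms without labels.
total_emissions = 9.04 + 1.07

print(gpu_hours)          # GPU-hours per stage
print(total_gpu_hours)    # total GPU-hours
print(wall_clock_hours)   # total wall-clock hours
print(round(total_emissions / total_gpu_hours, 4))  # kgCO2e per GPU-hour
```

Measured this way, CPT accounts for 352 of the 394 GPU-hours even though it is the shorter stage in wall-clock terms.
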
## Ethical Considerations

- **Medical Accuracy**: This model is intended for research and educational purposes only. Its performance limitations make it unsuitable for critical medical applications.
- **Bias**: The model may contain biases from both English and French medical literature.

## Citation

If you use this model, please cite:

```bibtex

```

## Contact

For questions about this model, please contact: ikram.belmadani@lis-lab.fr