Initialize project; model provided by the ModelHub XC community
Model: ik-ram28/BioMistral-CPT-SFT-7B Source: Original Platform
84
README.md
Normal file
@@ -0,0 +1,84 @@
---
library_name: transformers
tags:
- medical
license: apache-2.0
language:
- fr
- en
base_model:
- ik-ram28/BioMistral-CPT-7B
- BioMistral/BioMistral-7B
---

## Model Description

BioMistral-CPT-SFT-7B is a French medical language model built on BioMistral-7B. It was adapted to the French medical domain in two stages: Continual Pre-Training (CPT) followed by Supervised Fine-Tuning (SFT).

## Model Details

- **Model Type**: Causal Language Model
- **Base Model**: BioMistral-7B
- **Language**: French (adapted from an English medical model)
- **Domain**: Medical/Healthcare
- **Parameters**: 7 billion
- **License**: Apache 2.0
- **Paper**: [Adaptation des connaissances médicales pour les grands modèles de langue : Stratégies et analyse comparative](https://github.com/ikram28/medllm-strategies)

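
Because the card declares `transformers` as its library and describes a causal language model, loading it could look like the sketch below. This is a minimal sketch and not an official usage recipe: it assumes `torch`, `transformers`, and `accelerate` are installed, that the repository ID from the commit header resolves on the hub in use, and that bfloat16 weights fit on the target GPU. The `generate` helper and its French prompt are hypothetical, since the card does not document a prompt template.

```python
"""Loading sketch; assumes `torch`, `transformers`, and `accelerate` are installed."""

# Repository ID taken from the commit header of this page.
MODEL_ID = "ik-ram28/BioMistral-CPT-SFT-7B"


def generate(prompt: str, max_new_tokens: int = 128) -> str:
    """Load the model and return the decoded text (prompt plus continuation)."""
    # Heavy imports stay inside the function so importing this module is cheap.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.bfloat16,  # assumption: bf16 is suitable for the hardware
        device_map="auto",           # needs the `accelerate` package
    )

    # Tokenize the prompt and move the tensors to the model's device.
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Hypothetical call (downloads roughly 7B parameters on first use):
# print(generate("Quels sont les symptômes courants de l'hypertension ?"))
```

The model is reloaded on every call here for brevity; a real application would load the tokenizer and model once and reuse them.
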
## Training Details

### Continual Pre-Training (CPT)

- **Dataset**: NACHOS corpus (opeN crAwled frenCh Healthcare cOrpuS)
- **Size**: 7.4 GB of French medical texts
- **Word Count**: Over 1 billion words
- **Sources**: 24 French medical websites
- **Training Duration**: 2.8 epochs
- **Hardware**: 32 NVIDIA H100 80GB GPUs
- **Training Time**: 11 hours
- **Optimizer**: AdamW
- **Learning Rate**: 2e-5
- **Weight Decay**: 0.01
- **Batch Size**: 16 with gradient accumulation of 2

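
The card does not say whether the batch size of 16 is per GPU or global, so the number of sequences per optimizer step depends on the reading. The sketch below computes both; the two readings are assumptions, and `effective_batch_size` is a hypothetical helper rather than part of any training script.

```python
def effective_batch_size(batch_size: int, grad_accum_steps: int, num_devices: int) -> int:
    """Sequences contributing to one optimizer step under data parallelism."""
    return batch_size * grad_accum_steps * num_devices


# Values from the CPT list above.
BATCH_SIZE = 16
GRAD_ACCUM = 2
NUM_GPUS = 32

# Reading A (assumption): 16 is the per-GPU batch size.
per_device_reading = effective_batch_size(BATCH_SIZE, GRAD_ACCUM, NUM_GPUS)

# Reading B (assumption): 16 is already the global batch size across all GPUs.
global_reading = effective_batch_size(BATCH_SIZE, GRAD_ACCUM, 1)

print(per_device_reading)  # 1024
print(global_reading)      # 32
```
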
### Supervised Fine-Tuning (SFT)

- **Dataset**: 30K French medical question-answer pairs
  - 10K native French medical questions
  - 10K translated medical questions from English resources
  - 10K generated questions from French medical texts
- **Method**: DoRA (Weight-Decomposed Low-Rank Adaptation)
- **Training Duration**: 10 epochs
- **Hardware**: 1 NVIDIA H100 80GB GPU
- **Training Time**: 42 hours
- **Rank**: 16
- **Alpha**: 16
- **Learning Rate**: 2e-5
- **Batch Size**: 4

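
To make the DoRA settings concrete, the following pure-Python sketch shows the reparameterization DoRA applies to a weight matrix: the frozen weight plus a low-rank update gives a direction, which is normalized per column and rescaled by a learned magnitude vector. It is a toy illustration on a 2x2 matrix, not the training code. The rank and alpha come from the list above; the `alpha / rank` scaling is the standard LoRA convention and is assumed here, and all function names are hypothetical.

```python
import math

RANK = 16                # from the card
ALPHA = 16               # from the card
SCALING = ALPHA / RANK   # standard LoRA scaling (assumed); equals 1.0 here


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def column_norms(w):
    """L2 norm of each column of w."""
    return [math.sqrt(sum(v * v for v in col)) for col in zip(*w)]


def dora_weight(w0, b, a, magnitude):
    """Return magnitude * V / ||V|| per column, where V = W0 + SCALING * (B @ A)."""
    delta = matmul(b, a)
    directed = [
        [w + SCALING * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(w0, delta)
    ]
    norms = column_norms(directed)
    return [[m * v / n for v, m, n in zip(row, magnitude, norms)] for row in directed]


# Toy frozen weight (2x2) with low-rank factors B (2 x RANK) and A (RANK x 2).
w0 = [[3.0, 0.0], [4.0, 1.0]]
b = [[0.0] * RANK for _ in range(2)]    # B starts at zero, as in LoRA
a = [[0.01] * 2 for _ in range(RANK)]
magnitude = column_norms(w0)            # magnitude starts at the column norms of W0

merged = dora_weight(w0, b, a, magnitude)
print(merged == w0)  # True: at initialization the adapted weight equals W0
```

In the `peft` library, DoRA is enabled by passing `use_dora=True` to `LoraConfig` together with `r` and `lora_alpha`; which modules were targeted for this model is not stated in the card.
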
## Computational Impact

- **Total Training Time**: 53 hours (11h CPT + 42h SFT)
- **Hardware**: 32 H100 GPUs (CPT) + 1 H100 GPU (SFT)
- **Carbon Emissions**: 10.11 kgCO2e (9.04 + 1.07)

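
The totals above can be checked with a few lines of arithmetic. The sketch also derives GPU-hours, a figure the card does not report; it assumes every GPU was in use for the full duration of its phase.

```python
from decimal import Decimal

# Figures from the training sections of this card.
cpt_hours, sft_hours = 11, 42
cpt_gpus, sft_gpus = 32, 1

# Wall-clock total, as reported in the list above.
total_hours = cpt_hours + sft_hours

# GPU-hours (derived; assumes all GPUs ran for the whole phase).
gpu_hours = cpt_gpus * cpt_hours + sft_gpus * sft_hours

# Decimal avoids binary floating-point rounding in the emissions sum.
total_emissions = Decimal("9.04") + Decimal("1.07")

print(total_hours)      # 53
print(gpu_hours)        # 394
print(total_emissions)  # 10.11
```
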
## Ethical Considerations

- **Medical Accuracy**: This model is intended for research and educational purposes only. Its performance limitations make it unsuitable for critical medical applications.
- **Bias**: The model may contain biases inherited from both English and French medical literature.

## Citation

If you use this model, please cite:

```bibtex

```

## Contact

For questions about this model, please contact: ikram.belmadani@lis-lab.fr