MediLlama-3.2/README.md

---
library_name: transformers
tags:
- unsloth
- trl
- sft
license: apache-2.0
language:
- en
base_model:
- meta-llama/Llama-3.2-3B-Instruct
pipeline_tag: text-generation
metrics:
- accuracy
- bleu
- rouge
---

# Model Card for MediLlama-3.2

A fine-tuned version of Meta's LLaMA 3.2 (3B Instruct) for domain-specific applications in healthcare and medicine. This model is optimized for tasks such as medical Q&A, symptom checking, and patient education.

## Model Details

### Model Description

This model is a domain-adapted version of LLaMA 3.2 3B Instruct. It has been fine-tuned using supervised fine-tuning (SFT) on medical datasets to handle English-language healthcare scenarios including diagnostic queries, treatment suggestions, and general medical advice.

- **Developed by:** InferenceLab  
- **Model type:** Medical Chatbot  
- **Language(s) (NLP):** English  
- **License:** Apache 2.0  
- **Finetuned from model:** meta-llama/Llama-3.2-3B-Instruct  

## Uses

### Direct Use

MediLlama-3.2 can be used directly as a chatbot or virtual assistant in medical and health-related applications. Ideal for educational content, initial symptom triage, and research purposes.

### Downstream Use

Can be integrated into larger telehealth systems, clinical documentation tools, or diagnostic assistants after further task-specific fine-tuning.

### Out-of-Scope Use

- Should not be used for real-time diagnosis or treatment decisions without expert validation.  
- Not suitable for high-risk or life-threatening emergency response.  
- Not trained on pediatric or highly specialized medical domains.  

## Bias, Risks, and Limitations

While the model is trained on medical data, it may still exhibit:  
- Biases from source data  
- Hallucinations or incorrect suggestions  
- Outdated or non-region-specific medical advice  

### Recommendations

Users should validate outputs with certified medical professionals. This model is for research and prototyping only, not for clinical deployment without regulatory compliance.

## How to Get Started with the Model

```python
import torch
from transformers import pipeline

model_id = "InferenceLab/MediLlama-3.2"
pipe = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a helpful Medical assistant."},
    {"role": "user", "content": "Hi! How are you?"},
]
outputs = pipe(
    messages,
    max_new_tokens=256,
)
print(outputs[0]["generated_text"][-1])

````

## Training Details

### Training Data

Model trained using cleaned and preprocessed medical QA datasets, synthetic doctor-patient conversations, and publicly available health forums. Protected health information (PHI) was removed.

### Training Procedure

Supervised fine-tuning (SFT) using TRL and Unsloth libraries.

#### Preprocessing

Tokenization using LLaMA tokenizer with special medical instruction formatting.

#### Training Hyperparameters

* **Training regime:** bf16 mixed precision
* **Learning rate:** 1e-5

#### Speeds, Sizes, Times

* **Training time:** \~12 hours on 4×A100 GPUs

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

Subset of unseen medical QA pairs, synthetic test cases, and MedQA-derived examples.

#### Factors

* Input prompt complexity
* Use of medical terminology
* Chat length

#### Metrics

* **Accuracy:** 81.3%
* **BLEU:** 34.5
* **ROUGE-L:** 62.2

### Results

#### Summary

Model shows good generalization to unseen prompts and performs competitively for general medical dialogue. Further tuning needed for specialty areas like oncology or rare diseases.

## Model Examination

Explainability tools like LLaMA-MedLens (if available) are suggested to interpret model decisions.

## Environmental Impact

* **Hardware Type:** 4×NVIDIA A100 40GB
* **Hours used:** 12
* **Cloud Provider:** AWS
* **Compute Region:** us-west-2
* **Carbon Emitted:** \~35.8 kg CO2eq (estimated)

## Technical Specifications

### Model Architecture and Objective

* Based on Meta LLaMA 3.2 3B Instruct
* Decoder-only transformer
* Objective: Causal Language Modeling (CLM) with instruction fine-tuning

### Compute Infrastructure

#### Hardware

* 4×NVIDIA A100 40GB

#### Software

* Python 3.10
* Transformers (v4.40+)
* TRL
* Unsloth
* PyTorch 2.1


## Glossary

* **SFT**: Supervised Fine-Tuning
* **BLEU**: Bilingual Evaluation Understudy
* **ROUGE**: Recall-Oriented Understudy for Gisting Evaluation

## More Information

For collaborations, deployment help, or fine-tuning extensions, please contact the developers.

## Model Card Authors

* InferenceLab Team