---
license: apache-2.0
language: en
library_name: transformers
tags:
- text-generation
- json
- qol
- hernia
- healthcare
- llama-3
- fine-tuned
- 8k-context
base_model: meta-llama/Meta-Llama-3-8B-Instruct
---

# Llama-3-8B-Hernia-Analyst-600-Patients-8k

This is a specialized, fine-tuned version of `meta-llama/Meta-Llama-3-8B-Instruct`, designed to function as an expert "AI Research Assistant" for analyzing patient narratives related to Abdominal Wall Hernia (AWH).

Compared with previous versions, this model was fine-tuned on a larger dataset of **600 synthetic patients** and trained with the **full 8192-token context window**. It can therefore analyze longer, more complex patient narratives without truncation, which is intended to yield a more accurate and comprehensive analysis.
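The prompt and the generated JSON share the same 8192-token window, so a long narrative leaves less room for the response. The following is a minimal sketch of that budget check, assuming you count prompt tokens with the model's tokenizer (for example `len(tokenizer(prompt)["input_ids"])`). The helper name `remaining_generation_budget` is hypothetical and not part of this repository.

```python
CONTEXT_WINDOW = 8192  # context length used for fine-tuning, per this model card


def remaining_generation_budget(prompt_tokens: int, context_window: int = CONTEXT_WINDOW) -> int:
    """Return how many tokens are left for generation after the prompt."""
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    return max(context_window - prompt_tokens, 0)


# Example: a 5,000-token prompt leaves 3,192 tokens for the JSON output,
# so max_new_tokens should be capped at that value.
budget = remaining_generation_budget(5000)
max_new_tokens = min(4096, budget)
print(budget, max_new_tokens)  # 3192 3192
```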

The model's primary function is to take unstructured, free-text patient stories as input and transform them into a structured, multi-level JSON output. This output adheres to a specific Quality of Life (QoL) framework derived from clinical research, notably the work published in *Hernia (2022) 26:795–808*.

## Model Description

The core objective of this model is to automate and standardize the process of qualitative analysis for patient-reported outcomes. It has been trained to identify and structure information across five key domains:
- Body Image
- Mental Health
- Symptoms and Function
- Interpersonal Relationships
- Employment

The model produces a detailed JSON object that includes an executive summary, a ranked list of the most prominent QoL domains, and a deep-dive analysis for each domain, identifying relevant subthemes and clinical concepts mentioned by the patient.
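Because downstream tools rely on this structure, it can help to sanity-check each output before using it. The sketch below is illustrative only: the five domain names come from the list above, but the key names `executive_summary`, `ranked_domains`, and `domain_analysis` are assumptions, as is the shape of the ranked list (a list of domain names). Inspect a real model output and adjust them to match.

```python
import json

# The five QoL domains listed in this model card.
QOL_DOMAINS = (
    "Body Image",
    "Mental Health",
    "Symptoms and Function",
    "Interpersonal Relationships",
    "Employment",
)

# Hypothetical top-level keys; replace with the keys the model actually emits.
EXPECTED_KEYS = ("executive_summary", "ranked_domains", "domain_analysis")


def check_analysis(raw: str) -> list[str]:
    """Parse a JSON string and return a list of problems (empty if none found)."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    if not isinstance(data, dict):
        return ["top-level value is not a JSON object"]

    problems = [f"missing key: {key}" for key in EXPECTED_KEYS if key not in data]

    ranked = data.get("ranked_domains", [])
    if not isinstance(ranked, list):
        problems.append("ranked_domains is not a list")
    else:
        for name in ranked:
            if name not in QOL_DOMAINS:
                problems.append(f"unknown domain: {name}")
    return problems


# Made-up example output, used only to exercise the checker.
sample = json.dumps({
    "executive_summary": "...",
    "ranked_domains": ["Symptoms and Function", "Body Image"],
    "domain_analysis": {},
})
print(check_analysis(sample))  # []
```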

## Intended Use

This model is intended for **research and prototyping purposes only**. Its primary use case is to process long-form patient narratives (e.g., from detailed interview transcripts or comprehensive questionnaires) and generate a structured, machine-readable analysis. This can be used for large-scale research, data visualization, or to assist clinicians in rapidly understanding the key QoL issues for a patient.

**Disclaimer: This is not a medical device.** The output should not be used for clinical diagnosis, treatment decisions, or any direct patient care without verification and interpretation by a qualified healthcare professional.
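For the large-scale research use case, per-patient outputs can be aggregated once they have been parsed. The sketch below counts how often each domain is ranked first across a set of analyses. It assumes a hypothetical `ranked_domains` key holding a list of domain names, and the sample inputs are made up for illustration.

```python
from collections import Counter


def top_domain_counts(analyses):
    """Count how often each domain appears first in the ranked list."""
    counts = Counter()
    for analysis in analyses:
        # "ranked_domains" is an assumed key name; adjust to the real output.
        ranked = analysis.get("ranked_domains") or []
        if ranked:
            counts[ranked[0]] += 1
    return counts


# Illustrative, made-up parsed outputs (not real model results).
analyses = [
    {"ranked_domains": ["Symptoms and Function", "Body Image"]},
    {"ranked_domains": ["Mental Health"]},
    {"ranked_domains": ["Symptoms and Function"]},
    {},  # an analysis without a ranking is skipped
]
print(top_domain_counts(analyses).most_common())
# [('Symptoms and Function', 2), ('Mental Health', 1)]
```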

## How to Use

The model expects prompts formatted in the Llama 3 Instruct template. The following Python code, written for a Google Colab notebook, demonstrates how to load the model with 4-bit quantization and run inference on a new patient narrative.

```python
# --- Cell 1 (Colab only): install pinned library versions known to work together ---
# Lines starting with "!" are notebook shell commands, not Python. Run them in a
# Colab/Jupyter cell, or drop the "!" and run them in a terminal.

!pip uninstall -y sentence-transformers
!pip install torch==2.3.1+cu121 torchvision==0.18.1+cu121 torchaudio==2.3.1 --index-url https://download.pytorch.org/whl/cu121
!pip install -q "transformers==4.43.2" "datasets==2.18.0" "accelerate==0.29.3" "peft==0.10.0" "bitsandbytes==0.43.1" "trl==0.8.6" "protobuf==3.20.3"
!pip install -q einops scipy sentencepiece tensorboard

# After installation, restart the runtime once so the new versions are picked up.
# The two lines below kill the current Colab process to force that restart, so
# everything after them must be run in a new cell once the runtime is back.
import os
os.kill(os.getpid(), 9)

# --- Cell 2: run after the runtime has restarted ---

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
import json

# --- 1. CONFIGURATION ---
# Hugging Face Hub ID of the fine-tuned 8k-context model
model_name = "Laxmikant17/Llama-3-8B-Hernia-Analyst-600-Patients-8k"

# --- 2. LOAD MODEL AND TOKENIZER ---
print(f"Loading fine-tuned model: {model_name}")

# Use 4-bit quantization for efficient inference on consumer GPUs (like in Colab)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

# Load the model from the Hub with quantization
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto" # Automatically use the GPU if available
)
tokenizer = AutoTokenizer.from_pretrained(model_name)
model.eval() # Set the model to evaluation mode

print("✅ Model loaded successfully!")

# --- 3. PREPARE YOUR INPUT ---
# This can be a very long and detailed patient narrative
test_narrative = """
The pain is the worst part. It's a constant, burning sensation that gets worse when I stand for more than ten minutes. I can't even lift my grocery bags without feeling a sharp pull. I also feel deformed. I avoid looking at myself without a shirt on. I just want to feel normal again. It's been really tough mentally. I feel a sense of dread every morning when I wake up, just knowing the discomfort is waiting for me. I've become irritable and I'm not pleasant to be around, which is straining my relationship with my family.
"""

# Format the input using the exact Llama 3 Instruct template the model was trained on
instruction = "Your sole function is to be a structured data generator. Analyze the patient narrative and produce a single, valid JSON object as your only output. Adhere strictly to the required format and terminology from the provided knowledge base."
prompt = f"<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n{instruction}\n\n**Patient Narrative:**\n{test_narrative}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"

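# The prompt already starts with <|begin_of_text|>, so skip the tokenizer's automatic
# special tokens to avoid a duplicated BOS token.
inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to(model.device)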

# --- 4. GENERATE ANALYSIS ---
print("\n🚀 Generating analysis...")
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=4096, # Give the model plenty of space for its JSON output
        do_sample=False,
        pad_token_id=tokenizer.eos_token_id
    )

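# Extract and print the JSON from the model's response.
# generate() returns a batch of sequences; decode the first (and only) one.
decoded_output = tokenizer.decode(outputs[0], skip_special_tokens=True)
try:
    # The decoded text still contains the prompt; keep only what follows the assistant header.
    assistant_marker = 'assistant\n\n'
    assistant_response_start = decoded_output.find(assistant_marker)
    if assistant_response_start == -1:
        response_part = decoded_output.strip()
    else:
        response_part = decoded_output[assistant_response_start + len(assistant_marker):].strip()
    # Take the span from the first '{' to the last '}' as the JSON candidate.
    json_start = response_part.find('{')
    json_end = response_part.rfind('}') + 1
    if json_start == -1 or json_end == 0:
        raise ValueError("No JSON object found in the model's response.")
    json_string = response_part[json_start:json_end]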

    print("\n--- ✅ MODEL-GENERATED ANALYSIS ---")
    parsed_json = json.loads(json_string)
    print(json.dumps(parsed_json, indent=2))
except Exception as e:
    print("\n--- 🚨 ERROR: Could not parse the model's response. ---")
    print(f"Error: {e}")
    print("\nFull raw output for debugging:")
    print(decoded_output)
```
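To analyze several narratives, the prompt construction from the example above can be wrapped in a small function. This is a minimal sketch: `build_prompt` is a hypothetical helper name, and the template and instruction text are copied from the example above rather than from a separate specification.

```python
# Instruction text copied from the inference example in this README.
INSTRUCTION = (
    "Your sole function is to be a structured data generator. "
    "Analyze the patient narrative and produce a single, valid JSON object as your only output. "
    "Adhere strictly to the required format and terminology from the provided knowledge base."
)


def build_prompt(narrative: str, instruction: str = INSTRUCTION) -> str:
    """Wrap a patient narrative in the Llama 3 Instruct template used above."""
    return (
        "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n"
        f"{instruction}\n\n**Patient Narrative:**\n{narrative}"
        "<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
    )


# Build one prompt per narrative; each can then be tokenized and passed to
# model.generate() exactly as in the single-narrative example.
narratives = [
    "I can't lift my grocery bags without feeling a sharp pull.",
    "I avoid looking at myself without a shirt on.",
]
prompts = [build_prompt(text) for text in narratives]
print(len(prompts))  # 2
```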