---
license: apache-2.0
language: en
library_name: transformers
tags:
- text-generation
- json
- qol
- hernia
- healthcare
- llama-3
- fine-tuned
- 8k-context
base_model: meta-llama/Meta-Llama-3-8B-Instruct
---

# Llama-3-8B-Hernia-Analyst-600-Patients-8k

This is a specialized, fine-tuned version of `meta-llama/Meta-Llama-3-8B-Instruct`, designed to function as an expert "AI Research Assistant" for analyzing patient narratives related to Abdominal Wall Hernia (AWH).

This model represents a significant upgrade over previous versions, as it was fine-tuned on a larger dataset of **600 synthetic patients** and trained using the **full 8192 token context window**. This enables it to analyze longer, more complex patient narratives without truncation, resulting in a more accurate and comprehensive analysis.

The model's primary function is to take unstructured, free-text patient stories as input and transform them into a structured, multi-level JSON output. This output adheres to a specific Quality of Life (QoL) framework derived from clinical research, notably the work published in *Hernia (2022) 26:795–808*.

## Model Description

The core objective of this model is to automate and standardize the process of qualitative analysis for patient-reported outcomes. It has been trained to identify and structure information across five key domains:

- Body Image
- Mental Health
- Symptoms and Function
- Interpersonal Relationships
- Employment

The model produces a detailed JSON object that includes an executive summary, a ranked list of the most prominent QoL domains, and a deep-dive analysis for each domain, identifying relevant subthemes and clinical concepts mentioned by the patient.

## Intended Use

This model is intended for **research and prototyping purposes only**. Its primary use case is to process long-form patient narratives (e.g., from detailed interview transcripts or comprehensive questionnaires) and generate a structured, machine-readable analysis.
This can be used for large-scale research, data visualization, or to assist clinicians in rapidly understanding the key QoL issues for a patient.

**Disclaimer: This is not a medical device.** The output should not be used for clinical diagnosis, treatment decisions, or any direct patient care without verification and interpretation by a qualified healthcare professional.

## How to Use

The model expects prompts formatted in the Llama 3 Instruct template. The following Python code, written for a Colab notebook, demonstrates how to load the model and run inference on a new patient narrative. Run it as two cells: the first installs the dependencies and restarts the runtime, and the second loads the model and generates the analysis.

```python
# --- CELL 1: INSTALL DEPENDENCIES ---
# This installs specific, stable versions of the libraries known to work well together
# in the Colab environment.
!pip uninstall -y sentence-transformers
!pip install torch==2.3.1+cu121 torchvision==0.18.1+cu121 torchaudio==2.3.1 --index-url https://download.pytorch.org/whl/cu121
!pip install -q "transformers==4.43.2" "datasets==2.18.0" "accelerate==0.29.3" "peft==0.10.0" "bitsandbytes==0.43.1" "trl==0.8.6" "protobuf==3.20.3"
!pip install -q einops scipy sentencepiece tensorboard

# After installation, the runtime must be restarted once for the changes to take effect.
# This is a standard procedure in Colab. The call below kills the current process,
# so everything after it must be run in a new cell once the runtime has restarted.
import os
os.kill(os.getpid(), 9)

# --- CELL 2: LOAD THE MODEL AND RUN INFERENCE ---
import json

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

# --- 1. CONFIGURATION ---
# The ID of the fine-tuned 8k-context model on the Hugging Face Hub
model_name = "Laxmikant17/Llama-3-8B-Hernia-Analyst-600-Patients-8k"

# --- 2. LOAD MODEL AND TOKENIZER ---
print(f"Loading fine-tuned model: {model_name}")

# Use 4-bit quantization for efficient inference on consumer GPUs (like in Colab)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

# Load the model from the Hub with quantization
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto"  # Automatically use the GPU if available
)
tokenizer = AutoTokenizer.from_pretrained(model_name)
model.eval()  # Set the model to evaluation mode
print("✅ Model loaded successfully!")

# --- 3. PREPARE YOUR INPUT ---
# This can be a very long and detailed patient narrative
test_narrative = """
The pain is the worst part. It's a constant, burning sensation that gets worse when I stand for more than ten minutes. I can't even lift my grocery bags without feeling a sharp pull. I also feel deformed. I avoid looking at myself without a shirt on. I just want to feel normal again. It's been really tough mentally. I feel a sense of dread every morning when I wake up, just knowing the discomfort is waiting for me. I've become irritable and I'm not pleasant to be around, which is straining my relationship with my family.
"""

# Format the input using the exact Llama 3 Instruct template the model was trained on
instruction = "Your sole function is to be a structured data generator. Analyze the patient narrative and produce a single, valid JSON object as your only output. Adhere strictly to the required format and terminology from the provided knowledge base."
prompt = f"<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n{instruction}\n\n**Patient Narrative:**\n{test_narrative}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# --- 4. GENERATE ANALYSIS ---
print("\n🚀 Generating analysis...")
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=4096,  # Give the model plenty of space for its JSON output
        do_sample=False,
        pad_token_id=tokenizer.eos_token_id
    )

# generate() returns a batch that still contains the prompt tokens, so take the
# first sequence and decode only the tokens produced after the prompt.
prompt_length = inputs["input_ids"].shape[-1]
decoded_output = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True).strip()

# Extract the JSON object (first '{' to last '}') from the response and print it
try:
    json_start = decoded_output.find('{')
    json_end = decoded_output.rfind('}') + 1
    json_string = decoded_output[json_start:json_end]
    parsed_json = json.loads(json_string)
    print("\n--- ✅ MODEL-GENERATED ANALYSIS ---")
    print(json.dumps(parsed_json, indent=2))
except json.JSONDecodeError as e:
    print("\n--- 🚨 ERROR: Could not parse the model's response. ---")
    print(f"Error: {e}")
    print("\nFull raw output for debugging:")
    print(decoded_output)
```
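
When processing many narratives, the prompt construction and JSON extraction steps from the example above can be factored into small helpers that are testable without loading the model. The following is a minimal sketch, not part of the model or of `transformers`: the names `PROMPT_TEMPLATE`, `build_prompt`, and `extract_json` are illustrative assumptions, and the JSON key in the usage lines is a placeholder rather than the model's actual output schema.

```python
import json

# Single-turn Llama 3 Instruct template, copied from the inference example above.
PROMPT_TEMPLATE = (
    "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n"
    "{instruction}\n\n**Patient Narrative:**\n{narrative}"
    "<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
)


def build_prompt(instruction: str, narrative: str) -> str:
    """Fill the template with an instruction and a patient narrative."""
    return PROMPT_TEMPLATE.format(instruction=instruction, narrative=narrative)


def extract_json(text: str):
    """Return the first complete JSON object found in `text`, or None.

    Hypothetical helper: it tries each '{' in turn and lets the JSON decoder
    decide where the object ends, so stray braces after the object are ignored.
    """
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            obj, _end = decoder.raw_decode(text, start)
            return obj
        except json.JSONDecodeError:
            # Not a valid object at this position; try the next opening brace.
            start = text.find("{", start + 1)
    return None


if __name__ == "__main__":
    # Placeholder response text; the key below is NOT the model's real schema.
    raw_response = 'Analysis follows:\n{"placeholder_key": "placeholder value"}\nDone.'
    print(extract_json(raw_response))
    print(build_prompt("Return JSON only.", "Example narrative.")[:40])
```

Compared with slicing from the first `{` to the last `}`, decoding from each candidate brace tolerates extra text or a stray `}` after the object, at the cost of returning only the first object it finds.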