---
license: apache-2.0
datasets:
- OpenMed/Medical-Reasoning-SFT-MiniMax-M2.1
base_model:
- meta-llama/Llama-3.2-1B-Instruct
language:
- en
pipeline_tag: text-generation
tags:
- medical
- clinical
- reasoning
- qlora
- llama
- healthcare
- chain-of-thought
---

# LlamaTron RS1 Nemesis 1B

**Base Model:** meta-llama/Llama-3.2-1B-Instruct

**Dataset:** OpenMed/Medical-Reasoning-SFT-MiniMax-M2.1

---

## Model Overview

LlamaTron RS1 Nemesis is a medical reasoning model produced by fine-tuning meta-llama/Llama-3.2-1B-Instruct on the Medical-Reasoning-SFT-MiniMax-M2.1 dataset using QLoRA. The dataset contains 204,773 clinical reasoning conversations with full chain-of-thought traces covering differential diagnosis, treatment planning, pharmacology, and clinical case analysis. Despite its small size of roughly 1 billion parameters, the model handles complex clinical questions with structured, coherent reasoning.

---

## Demo Screenshots

### Info

![y3msQ](https://cdn-uploads.huggingface.co/production/uploads/66e00ba55e4fd4bfead4a97c/W2xJUdRLD2Y_3RIPTFndV.jpeg)

### Interface

![1](https://cdn-uploads.huggingface.co/production/uploads/66e00ba55e4fd4bfead4a97c/rrn4HJxbXS5wd8FUuHica.png)

### Model Response Example

![2](https://cdn-uploads.huggingface.co/production/uploads/66e00ba55e4fd4bfead4a97c/XQjsOB6hNsBpb01naguAg.png)

---

## Training Setup

| Parameter | Value |
|-----------|-------|
| Base Model | meta-llama/Llama-3.2-1B-Instruct |
| GPU | NVIDIA H200 |
| Method | QLoRA (4-bit NF4 + LoRA) |
| LoRA Rank | r=8, alpha=16 |
| LoRA Target Modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| LoRA Dropout | 0.05 |
| Trainable Parameters | 5.6M out of 1.24B (0.45%) |
| Effective Batch Size | 32 (8 per device × 4 gradient accumulation steps) |
| Learning Rate | 2e-4 |
| LR Scheduler | Cosine |
| Warmup Ratio | 0.05 |
| Optimizer | paged_adamw_8bit |
| Max Sequence Length | 512 |
| Precision | bf16 + tf32 |
| Epochs | 1 |
| Total Steps | 6,271 |
| Training Time | 3 hours 59 minutes |
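
For reference, the sketch below shows how the hyperparameters in the table map onto `BitsAndBytesConfig`, `LoraConfig`, and `TrainingArguments` objects. It is a minimal illustration assuming recent versions of transformers and peft, not the exact training script (that lives in the GitHub repository linked below); the `output_dir` value is hypothetical, and options the table does not mention (such as double quantization) are left at library defaults.

```python
import torch
from transformers import BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig

# 4-bit NF4 quantization of the frozen base weights (the "Q" in QLoRA),
# with bf16 compute as listed under Precision.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# LoRA adapter matching the table: r=8, alpha=16, dropout 0.05,
# applied to every attention and MLP projection.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",
)

# Optimization settings from the table: effective batch size 32
# (8 per device x 4 accumulation steps), cosine schedule, 5% warmup.
training_args = TrainingArguments(
    output_dir="llamatron-rs1-nemesis-1b",  # hypothetical output path
    per_device_train_batch_size=8,
    gradient_accumulation_steps=4,
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
    optim="paged_adamw_8bit",
    num_train_epochs=1,
    bf16=True,
    tf32=True,
)
```

In a TRL-based setup these objects would be passed to `SFTTrainer`, which is also where the 512-token maximum sequence length would be configured.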
---

## Training Results

| Step | Train Loss | Validation Loss |
|------|------------|-----------------|
| 500 | 1.5759 | 1.6126 |
| 1000 | 1.5176 | 1.5538 |
| 1500 | 1.4805 | 1.5256 |
| 2000 | 1.4795 | 1.5060 |
| 2500 | 1.4508 | 1.4939 |
| 3000 | 1.4534 | 1.4815 |
| 3500 | 1.4384 | 1.4739 |
| 4000 | 1.4228 | 1.4663 |
| 4500 | 1.4251 | 1.4605 |
| 5000 | 1.4301 | 1.4567 |
| 5500 | 1.4102 | 1.4545 |
| 6000 | 1.4246 | 1.4538 |
| 6271 | 1.4200 | 1.4500 |

Loss decreased consistently across all checkpoints, with train and validation loss tracking closely; no overfitting was observed.

---

## Dataset

Trained on [Medical-Reasoning-SFT-MiniMax-M2.1](https://huggingface.co/datasets/OpenMed/Medical-Reasoning-SFT-MiniMax-M2.1), released by [Maziyar Panahi](https://huggingface.co/maziarpanahi) under the OpenMed initiative. A minimal snippet for inspecting the dataset is included at the end of this card.

| Property | Value |
|----------|-------|
| Total Samples | 204,773 |
| Estimated Tokens | ~621 million |
| Format | Multi-turn chat with chain-of-thought reasoning |
| License | Apache 2.0 |
| Topics | Differential diagnosis, treatment planning, pharmacology, clinical case analysis |

---

## How to Use

### Load the Model

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_id = "Rumiii/LlamaTron_RS1_Nemesis_1B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

messages = [
    {
        "role": "system",
        "content": "You are LlamaTron RS1 Nemesis, a knowledgeable and compassionate medical AI assistant. Provide accurate, evidence-based medical information clearly and helpfully.",
    },
    {"role": "user", "content": "What are the early symptoms of Type 2 Diabetes?"},
]

output = pipe(
    messages,
    max_new_tokens=400,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)

# The chat pipeline returns the full conversation; the last message
# is the assistant's reply.
print(output[0]["generated_text"][-1]["content"])
```

---

## Repository

The full training code, merging scripts, and inference interface are available on GitHub:
[github.com/sufirumii/LlamaTron-RS1-Nemesis-1B](https://github.com/sufirumii/LlamaTron-RS1-Nemesis-1B)

### GitHub

![Sample](https://cdn-uploads.huggingface.co/production/uploads/66e00ba55e4fd4bfead4a97c/1YP1OHz_S5fvzCwFTkWrm.png)

---

## Limitations

- This model is intended for research and educational purposes only.
- It is not a substitute for professional medical advice, diagnosis, or treatment.
- The model was trained with a maximum sequence length of 512 tokens, which may limit performance on longer clinical texts.
- Always consult a qualified healthcare provider for medical decisions.

---

## Credits

- **Dataset:** [Maziyar Panahi](https://huggingface.co/maziarpanahi) and the [OpenMed](https://huggingface.co/OpenMed) initiative for releasing the Medical-Reasoning-SFT-MiniMax-M2.1 dataset under Apache 2.0
- **Base Model:** Meta AI for releasing Llama-3.2-1B-Instruct
- **Libraries:** Hugging Face Transformers, PEFT, TRL, BitsAndBytes, Accelerate

---

## License

Apache 2.0 — see [LICENSE](LICENSE) for details.
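
As a quick way to inspect the training data referenced in the Dataset section above, the sketch below streams a sample with the `datasets` library. This is a minimal illustration; the column layout mentioned in the comment is an assumption about the schema, so print the raw sample to see the actual fields.

```python
from datasets import load_dataset

# Stream the dataset to avoid downloading all ~621M tokens up front.
ds = load_dataset(
    "OpenMed/Medical-Reasoning-SFT-MiniMax-M2.1",
    split="train",
    streaming=True,
)

# Peek at the first conversation; a "messages" list of role/content
# dicts is an assumed schema, so inspect the printed keys to confirm.
sample = next(iter(ds))
print(sample)
```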