--- license: cc-by-4.0 language: - en library_name: transformers pipeline_tag: text-generation base_model: Qwen/Qwen3-0.6B base_model_relation: finetune datasets: - dhivyeshrk/diseases-and-symptoms-dataset - niyarrbarman/symptom2disease tags: - medical - clinical-screening - symptom-to-disease - disease-classification - lora - qlora - gguf - unsloth - peft - transformers - bertscore --- # Qwen3-0.6B Clinical Screening This repository packages the artifacts generated by the training notebook [`screening_robot.ipynb`](https://github.com/luizaaca/screening_robot/blob/main/screening_robot.ipynb) for a compact clinical screening assistant based on Qwen3-0.6B. ## Artifact layout - Root-level merged model files (`model.safetensors`, `config.json`, `tokenizer.json`, `tokenizer_config.json`, `chat_template.jinja`) for direct `transformers` loading. - `lora/`: PEFT LoRA adapter weights and tokenizer/chat-template files. - `gguf/qwen3-0.6b-clinical-screening.Q4_K_M.gguf`: quantized GGUF export (`Q4_K_M`) for llama.cpp, Ollama, LM Studio, and similar runtimes. - `Modelfile`: Ollama-oriented prompt wrapper aligned with the training prompt and inference settings used in the notebook examples. ## Model details - **Repository**: `https://huggingface.co/luizaaca/qwen3-0.6b-clinical-screening` - **Base model**: `Qwen/Qwen3-0.6B` - **Training runtime base**: `unsloth/Qwen3-0.6B-unsloth-bnb-4bit` - **Training recipe**: QLoRA via Unsloth on a 4-bit loaded base model - **LoRA hyperparameters**: rank 16, alpha 32, dropout 0.0 - **Target modules**: `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj` - **Max sequence length used during training**: 1024 - **Training steps**: 400 - **Target hardware**: Google Colab Free with Tesla T4 (16 GB VRAM) ## Training data The checked-in notebook trains on two Kaggle datasets: - `dhivyeshrk/diseases-and-symptoms-dataset`: binary symptom-matrix data converted to natural-language symptom lists, with a stratified cap of 50 examples per disease before the train/test split. - `niyarrbarman/symptom2disease`: free-text symptom descriptions mapped directly to disease labels. ## Output contract The assistant is fine-tuned to answer using the standardized disclaimer format: ```text Based on the reported symptoms, the clinical indication points to: . Disclaimer: This is an AI auxiliary tool designed for healthcare professionals. It is not 100% precise and does not replace a professional medical diagnosis. ``` This is a plain-text disease-identification assistant. Unlike the 1.7B JSON specialist model, this 0.6B notebook does **not** train on a structured JSON schema. ## Prompting notes The training and inference flow uses a system prompt that frames the model as a clinical AI assistant and user prompts shaped like: ```text Given the symptoms reported, identify the disease. Symptoms: ... /no_think ``` The notebook markdown discusses mixed thinking/no-thinking supervision, but the effective checked-in configuration sets `thinking_ratio = 0`, so the actual supervised examples follow the no-thinking path. ## Validation summary The notebook evaluates the base model against the fine-tuned model on held-out splits from both datasets and reports: - qualitative side-by-side generations; - Accuracy and macro-F1; - Cohen's Kappa; - row-normalized confusion matrices; and - BERTScore. The code also asserts that fine-tuned accuracy on Dataset 2 matches or exceeds the base model. ## Intended use This repository is suitable for: - research experiments on lightweight clinical-screening assistants; - teaching and prototyping around symptom-to-disease prompting; and - local inference with Transformers, PEFT, GGUF-compatible runtimes, or Ollama. ## Out-of-scope use This repository is **not** intended for: - autonomous diagnosis or treatment decisions; - emergency triage without clinician oversight; - prescribing or medication guidance; or - use as a substitute for professional medical judgment. ## Safety notice This is a research artifact for healthcare-support workflows only. Always keep a qualified human clinician in the loop.