---
license: apache-2.0
language:
- en
tags:
- medical
- palmyra
---

**DEPRECATED MODEL NOTICE**
==========================

Please note that this model is no longer maintained or supported by our team. We strongly advise against using it in production or for any critical applications.

Instead, we recommend using our latest and greatest models, which can be found at:

https://huggingface.co/collections/Writer/palmyra-writer-license-66476fa8156169f8720a2c89

==========================

# Palmyra-Med-20b

## Model description

**Palmyra-Med-20b** is a 20-billion-parameter large language model, fine-tuned from **Palmyra-Large** on a custom-curated medical dataset.
The main objective of this model is to improve performance on medical dialogue and question-answering tasks.

- **Developed by:** [https://writer.com/](https://writer.com/);
- **Model type:** Causal decoder-only;
- **Language(s) (NLP):** English;
- **License:** Apache 2.0;
- **Finetuned from model:** [Palmyra-Large](https://huggingface.co/Writer/palmyra-large).

### Model Source

[Palmyra-Med: Instruction-Based Fine-Tuning of LLMs Enhancing Medical Domain Performance](https://dev.writer.com/docs/palmyra-med-instruction-based-fine-tuning-of-llms-enhancing-medical-domain-performance)

## Uses

### Out-of-Scope Use

Production use without adequate assessment of risks and mitigation; any use cases which may be considered irresponsible or harmful.

## Bias, Risks, and Limitations

Palmyra-Med-20b is trained mostly on English data and will not generalize appropriately to other languages. Furthermore, because it is trained on large-scale corpora representative of the web, it will carry the stereotypes and biases commonly encountered online.

### Recommendations

We recommend that users of Palmyra-Med-20b develop guardrails and take appropriate precautions for any production use.

## Usage

The model is compatible with the Hugging Face `AutoModelForCausalLM` class and can be run on a single 40GB A100.

```py
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "Writer/palmyra-med-20b"

tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=False)

# Load the weights in half precision and let Accelerate place them on the GPU.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype=torch.float16,
)

prompt = "Can you explain in simple terms how vaccines help our body fight diseases?"

# Chat-style template: a system preamble followed by USER/ASSISTANT turns.
input_text = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions. "
    "USER: {prompt} "
    "ASSISTANT:"
)

model_inputs = tokenizer(input_text.format(prompt=prompt), return_tensors="pt").to(
    "cuda"
)

gen_conf = {
    "temperature": 0.7,
    "repetition_penalty": 1.0,
    "max_new_tokens": 512,
    "do_sample": True,
}

out_tokens = model.generate(**model_inputs, **gen_conf)

# Drop the prompt tokens so that only the newly generated answer is decoded.
response_ids = out_tokens[0][len(model_inputs.input_ids[0]) :]
output = tokenizer.decode(response_ids, skip_special_tokens=True)

print(output)
## output ##
# Vaccines stimulate the production of antibodies by the body's immune system.
# Antibodies are proteins produced by B lymphocytes in response to foreign substances,such as viruses and bacteria.
# The antibodies produced by the immune system can bind to and neutralize the pathogens, preventing them from invading and damaging the host cells.
# Vaccines work by introducing antigens, which are components of the pathogen, into the body.
# The immune system then produces antibodies against the antigens, which can recognize and neutralize the pathogen if it enters the body in the future.
# The use of vaccines has led to a significant reduction in the incidence and severity of many diseases, including measles, mumps, rubella, and polio.
```

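The prompt template from the example above can be factored into a small helper so that every request uses the same format. This is a minimal sketch: `SYSTEM_PREAMBLE` and `build_prompt` are hypothetical names introduced here, and the template text is copied from the example rather than taken from any official API.

```python
# Hypothetical helper; the preamble text is copied from the usage example.
SYSTEM_PREAMBLE = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions. "
)


def build_prompt(question: str) -> str:
    """Wrap a user question in the USER/ASSISTANT template."""
    return f"{SYSTEM_PREAMBLE}USER: {question.strip()} ASSISTANT:"


print(build_prompt("What is hypertension?"))
```

The returned string can be passed to the tokenizer in place of `input_text.format(prompt=prompt)`.
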
It can also be served with text-generation-inference:

```sh
model=Writer/palmyra-med-20b
volume=$PWD/data

docker run --gpus all --shm-size 1g -p 8080:80 -v "$volume:/data" ghcr.io/huggingface/text-generation-inference --model-id "$model"
```

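Once the container is running, the server can be queried over HTTP. The sketch below assumes the port mapping from the command above (`8080`) and uses the `/generate` route of text-generation-inference; `TGI_URL`, `build_payload`, and `generate` are hypothetical names, and the sampling parameters simply mirror the earlier `transformers` example.

```python
import json
import urllib.request

# Assumption: the server is reachable on the port mapped in the docker command.
TGI_URL = "http://localhost:8080/generate"


def build_payload(question, max_new_tokens=512, temperature=0.7):
    """Build the JSON body for a /generate request using the chat template."""
    prompt = (
        "A chat between a curious user and an artificial intelligence assistant. "
        "The assistant gives helpful, detailed, and polite answers to the user's questions. "
        f"USER: {question} "
        "ASSISTANT:"
    )
    return {
        "inputs": prompt,
        "parameters": {
            "max_new_tokens": max_new_tokens,
            "temperature": temperature,
            "do_sample": True,
        },
    }


def generate(question, url=TGI_URL):
    """POST the payload to the server and return the generated text."""
    data = json.dumps(build_payload(question)).encode("utf-8")
    request = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["generated_text"]
```

Calling `generate("How do vaccines work?")` sends one request; nothing is sent until the function is called, so the module can be imported without a running server.
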
## Dataset

To fine-tune our LLMs, we used a custom-curated medical dataset that combines data from two publicly available sources: PubMedQA (Jin et al. 2019) and MedQA (Zhang et al. 2018). The PubMedQA dataset, which originated from the PubMed abstract database, consists of biomedical articles accompanied by corresponding question-answer pairs. In contrast, the MedQA dataset features medical questions and answers designed to assess the reasoning capabilities of medical question-answering systems.

We prepared our custom dataset by merging and processing data from these sources, maintaining the dataset mixture ratios detailed in the table below. The same ratios were used to fine-tune both the Palmyra-20b and Palmyra-40b models. We refer to the resulting fine-tuned models as Palmyra-Med-20b and Palmyra-Med-40b, respectively.

| Dataset  | Ratio | Count   |
| -------- | ----- | ------- |
| PubMedQA | 75%   | 150,000 |
| MedQA    | 25%   | 10,178  |

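The source does not describe how the mixture ratio was enforced during training. As one illustration only, a 75/25 mixture can be applied by choosing the source dataset of each training example at random with the stated weights; `MIXTURE` and `sample_sources` are hypothetical names for this sketch.

```python
import random

# Mixture ratios from the table; the sampling scheme itself is an assumption.
MIXTURE = {"PubMedQA": 0.75, "MedQA": 0.25}


def sample_sources(n, seed=0):
    """Pick the source dataset for each of n training examples."""
    rng = random.Random(seed)
    names = list(MIXTURE)
    weights = [MIXTURE[name] for name in names]
    return rng.choices(names, weights=weights, k=n)


draws = sample_sources(10_000)
share = draws.count("PubMedQA") / len(draws)
print(f"PubMedQA share: {share:.3f}")
```

With enough draws the observed share approaches the configured ratio regardless of how many examples each source actually contains.
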
## Evaluation

We present the findings of our experiments, beginning with the evaluation results of the fine-tuned models, followed by the base models’ performance on each of the evaluation datasets. Additionally, we report the progressive improvement of the Palmyra-Med-40b model throughout the training process on the PubMedQA dataset.

| Model           | PubMedQA | MedQA |
| --------------- | -------- | ----- |
| Palmyra-20b     | 49.8     | 31.2  |
| Palmyra-40b     | 64.8     | 43.1  |
| Palmyra-Med-20b | 75.6     | 44.6  |
| Palmyra-Med-40b | 81.1     | 72.4  |

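The effect of fine-tuning can be read off the table as the difference between each fine-tuned model and its base model. The short sketch below computes those differences from the reported scores; `SCORES` and `gains` are names introduced here for illustration.

```python
# Scores copied from the evaluation table.
SCORES = {
    "Palmyra-20b": {"PubMedQA": 49.8, "MedQA": 31.2},
    "Palmyra-40b": {"PubMedQA": 64.8, "MedQA": 43.1},
    "Palmyra-Med-20b": {"PubMedQA": 75.6, "MedQA": 44.6},
    "Palmyra-Med-40b": {"PubMedQA": 81.1, "MedQA": 72.4},
}

PAIRS = [("Palmyra-20b", "Palmyra-Med-20b"), ("Palmyra-40b", "Palmyra-Med-40b")]


def gains(base, tuned):
    """Return the per-benchmark score difference, in points."""
    return {
        task: round(SCORES[tuned][task] - SCORES[base][task], 1)
        for task in SCORES[base]
    }


for base, tuned in PAIRS:
    print(tuned, gains(base, tuned))
# Palmyra-Med-20b {'PubMedQA': 25.8, 'MedQA': 13.4}
# Palmyra-Med-40b {'PubMedQA': 16.3, 'MedQA': 29.3}
```
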
## Limitation

The model may not perform well outside the healthcare domain.
Because it has not been tested in real-world settings, its practical efficacy and accuracy remain undetermined.
Under no circumstances should it replace the advice of a medical professional; it must be regarded solely as a tool for research purposes.

## Citation and Related Information

To cite this model:

```
@misc{Palmyra-Med-20B,
  author = {Writer Engineering team},
  title = {{Palmyra-Large Parameter Autoregressive Language Model}},
  howpublished = {\url{https://dev.writer.com}},
  year = 2023,
  month = mar
}
```

## Contact

Hello@writer.com

# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)

Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_Writer__palmyra-med-20b).

| Metric              | Value |
|---------------------|-------|
| Avg.                | 40.02 |
| ARC (25-shot)       | 46.93 |
| HellaSwag (10-shot) | 73.51 |
| MMLU (5-shot)       | 44.34 |
| TruthfulQA (0-shot) | 35.47 |
| Winogrande (5-shot) | 65.35 |
| GSM8K (5-shot)      | 2.65  |
| DROP (3-shot)       | 11.88 |