---
license: apache-2.0
pipeline_tag: text-generation
language:
- en
- he
tags:
- instruction-tuned
base_model: dicta-il/dictalm2.0
inference:
  parameters:
    temperature: 0.7
---

[<img src="https://i.ibb.co/5Lbwyr1/dicta-logo.jpg" width="300px"/>](https://dicta.org.il)

# Adapting LLMs to Hebrew: Unveiling DictaLM 2.0 with Enhanced Vocabulary and Instruction Capabilities

The DictaLM-2.0-Instruct Large Language Model (LLM) is an instruction-tuned version of the [DictaLM-2.0](https://huggingface.co/dicta-il/dictalm2.0) generative model, fine-tuned on a variety of conversation datasets.

For full details on this model, see our [release blog post](https://dicta.org.il/dicta-lm) or the [technical report](https://arxiv.org/abs/2407.07080).

This is the full-precision, instruction-tuned model designed for chat. You can try it out in a [live demo](https://huggingface.co/spaces/dicta-il/dictalm2.0-instruct-demo).

You can browse the full collection of base and instruct models, in both unquantized and quantized versions, in the [`DictaLM-2.0` collection](https://huggingface.co/collections/dicta-il/dicta-lm-20-collection-661bbda397df671e4a430c27).

## Instruction format

To take advantage of the instruction fine-tuning, wrap each prompt in `[INST]` and `[/INST]` tokens. Only the very first instruction should be preceded by the beginning-of-sentence token id; subsequent instructions should not. The assistant's generation ends with the end-of-sentence token id.

For example:
```
text = """<s>[INST] איזה רוטב אהוב עליך? [/INST]
טוב, אני די מחבב כמה טיפות מיץ לימון סחוט טרי. זה מוסיף בדיוק את הכמות הנכונה של טעם חמצמץ לכל מה שאני מבשל במטבח!</s>[INST] האם יש לך מתכונים למיונז? [/INST]"""
```
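
The layout described above can also be assembled by hand. The following is a minimal sketch, assuming the special tokens are spelled `<s>` and `</s>` and that each assistant reply starts on a new line after `[/INST]`, as in the example string. `build_prompt` is a hypothetical helper introduced here for illustration, not part of the model's API, and the tokenizer's own chat template remains the authoritative definition of the format.

```python
BOS, EOS = "<s>", "</s>"  # assumed literal spellings of the special tokens


def build_prompt(messages):
    """Render alternating user/assistant messages in the [INST] format."""
    parts = [BOS]  # only the very first instruction is preceded by BOS
    for i, message in enumerate(messages):
        expected = "user" if i % 2 == 0 else "assistant"
        if message["role"] != expected:
            raise ValueError("roles must alternate user/assistant, starting with user")
        if message["role"] == "user":
            parts.append(f"[INST] {message['content']} [/INST]")
        else:
            # the assistant text follows on a new line and is closed by EOS
            parts.append(f"\n{message['content']}{EOS}")
    return "".join(parts)


# Placeholder conversation (hypothetical content, for illustration only)
prompt = build_prompt([
    {"role": "user", "content": "Q1"},
    {"role": "assistant", "content": "A1"},
    {"role": "user", "content": "Q2"},
])
print(prompt)
```

If a string built this way is later passed to a tokenizer, it may be worth checking that the tokenizer does not prepend a second beginning-of-sentence token on its own.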

This format is also available as a [chat template](https://huggingface.co/docs/transformers/main/chat_templating) via the `apply_chat_template()` method, as shown in the example code below.

## Example Code

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

device = "cuda" # the device to load the model onto

# Load the model in bfloat16 together with its tokenizer
model = AutoModelForCausalLM.from_pretrained("dicta-il/dictalm2.0-instruct", torch_dtype=torch.bfloat16, device_map=device)
tokenizer = AutoTokenizer.from_pretrained("dicta-il/dictalm2.0-instruct")

# Conversation history; the final user message is the one to be answered
messages = [
    {"role": "user", "content": "איזה רוטב אהוב עליך?"},
    {"role": "assistant", "content": "טוב, אני די מחבב כמה טיפות מיץ לימון סחוט טרי. זה מוסיף בדיוק את הכמות הנכונה של טעם חמצמץ לכל מה שאני מבשל במטבח!"},
    {"role": "user", "content": "האם יש לך מתכונים למיונז?"}
]

# Apply the [INST] chat template and tokenize in one step
encoded = tokenizer.apply_chat_template(messages, return_tensors="pt").to(device)

generated_ids = model.generate(encoded, max_new_tokens=50, do_sample=True)
decoded = tokenizer.batch_decode(generated_ids)
print(decoded[0])
# <s> [INST] איזה רוטב אהוב עליך? [/INST]
# טוב, אני די מחבב כמה טיפות מיץ לימון סחוט טרי. זה מוסיף בדיוק את הכמות הנכונה של טעם חמצמץ לכל מה שאני מבשל במטבח!</s> [INST] האם יש לך מתכונים למיונז? [/INST]
# בטח, הנה מתכון בסיסי וקל להכנת מיונז ביתי!
#
# מרכיבים:
# - 2 חלמונים גדולים
# - 1 כף חומץ יין לבן
# (it stopped early because we set max_new_tokens=50)
```
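
The decoded string printed above contains the whole conversation, not just the new reply. As a rough sketch, the last reply can be recovered with plain string handling, assuming the decoded text keeps the `[/INST]` and `</s>` markers as in the output shown. `extract_last_reply` is a hypothetical helper introduced here for illustration.

```python
def extract_last_reply(decoded, inst_close="[/INST]", eos="</s>"):
    """Return the text after the final [/INST], without a trailing EOS marker."""
    # Everything after the last closing instruction tag is the newest reply;
    # if the tag is missing, rpartition leaves the whole string as the reply.
    _, _, reply = decoded.rpartition(inst_close)
    reply = reply.strip()
    if reply.endswith(eos):
        reply = reply[: -len(eos)].rstrip()
    return reply


# Placeholder transcript (hypothetical content, for illustration only)
sample = "<s> [INST] Q1 [/INST]\nA1</s> [INST] Q2 [/INST]\nA2 is cut off"
print(extract_last_reply(sample))
```

An alternative that avoids string parsing is to slice off the prompt tokens before decoding, for example `generated_ids[0][encoded.shape[1]:]`, and then decode that slice with `skip_special_tokens=True`.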

## Model Architecture

DictaLM-2.0-Instruct follows the [Zephyr-7B-beta](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta) recipe for fine-tuning an instruct model, with an extended instruct dataset for Hebrew.

## Limitations

The DictaLM 2.0 Instruct model demonstrates that the base model can be fine-tuned to achieve compelling performance.
It does not have any moderation mechanisms. We look forward to engaging with the community on ways to
make the model respect guardrails more precisely, allowing for deployment in environments that require moderated outputs.

## Citation

If you use this model, please cite:

```bibtex
@misc{shmidman2024adaptingllmshebrewunveiling,
      title={Adapting LLMs to Hebrew: Unveiling DictaLM 2.0 with Enhanced Vocabulary and Instruction Capabilities},
      author={Shaltiel Shmidman and Avi Shmidman and Amir DN Cohen and Moshe Koppel},
      year={2024},
      eprint={2407.07080},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2407.07080},
}
```