Initialize project; model provided by the ModelHub XC community
Model: radlab/pLLama3.2-3B-DPO Source: Original Platform
---
license: llama3.2
language:
- pl
- en
- es
- de
base_model:
- radlab/pLLama3.2-3B
---

### Intro

We have released a collection of radlab/pLLama3.2 models trained for Polish. These models communicate with the user more precisely than the base meta-llama/Meta-Llama-3.2 models. The collection includes models in 1B and 3B sizes.

Each model size is available in two configurations:

- radlab/pLLama3.2-1B, the 1B model after fine-tuning only
- radlab/pLLama3.2-1B-DPO, the 1B model after fine-tuning and DPO (Direct Preference Optimization)
- radlab/pLLama3.2-3B, the 3B model after fine-tuning only
- radlab/pLLama3.2-3B-DPO, the 3B model after fine-tuning and DPO

### Dataset

In addition to the instruction datasets publicly available for Polish, we developed our own dataset of about 650,000 instructions, generated semi-automatically from other publicly available datasets.

We also built a training dataset for the DPO stage containing 100k examples, which teaches the model to prefer correctly written versions of texts over versions containing language errors.
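
The card does not publish the format of its DPO data. As a hypothetical sketch, one example of the kind described above can be stored as a prompt plus a preferred (`chosen`) and a dispreferred (`rejected`) response, which is the column layout TRL's `DPOTrainer` expects for preference data. The `PreferencePair` class, the `make_pair` helper, the Polish instruction (roughly "Correct the language errors in the text.") and the example sentences are illustrative assumptions, not records from the actual dataset.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class PreferencePair:
    """One DPO training example (hypothetical layout, not the authors' schema)."""
    prompt: str    # the instruction shown to the model
    chosen: str    # preferred response: the correctly written text
    rejected: str  # dispreferred response: the same text with language errors


def make_pair(correct: str, erroneous: str,
              prompt: str = "Popraw błędy językowe w tekście.") -> PreferencePair:
    """Build a preference pair that rewards the error-free version of a text."""
    if correct == erroneous:
        raise ValueError("chosen and rejected texts must differ")
    return PreferencePair(prompt=prompt, chosen=correct, rejected=erroneous)


# "do sklep" is a case error; the correct genitive form is "do sklepu".
pair = make_pair("Wczoraj poszedłem do sklepu.", "Wczoraj poszedłem do sklep.")
record = asdict(pair)  # plain dict with "prompt", "chosen" and "rejected" keys
print(record)
```

Keeping the prompt identical across both responses means the only difference the preference signal can attach to is the language error itself.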

### Learning

Training was divided into two stages:

- Post-training (fine-tuning) on the set of 650k Polish instructions for 5 epochs.
- After fine-tuning, further training of the model with DPO on the 100k examples of correct Polish writing, run for 15k steps.
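
For readers unfamiliar with the second stage, the standard DPO objective scores each pair by how much more the trained policy prefers the chosen text over the rejected one, relative to a frozen reference model (in the usual setup, the checkpoint training starts from; here that is presumably the fine-tuned model from stage one, which the card does not state explicitly). The card also does not give the loss variant or β, so this sketch assumes the original sigmoid loss and a hypothetical β = 0.1; the log-probabilities are toy numbers, not values from this model.

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Sigmoid DPO loss for one preference pair.

    Each *_logp argument is the summed log-probability of a whole response
    (chosen = correct text, rejected = text with errors) under either the
    model being trained (policy) or the frozen reference model.
    beta=0.1 is an assumed value; the model card does not report it.
    """
    # Implicit reward margin: how much more the policy, relative to the
    # reference, favours the correct text over the erroneous one.
    margin = (policy_chosen_logp - ref_chosen_logp) - (
        policy_rejected_logp - ref_rejected_logp)
    x = beta * margin
    # -log(sigmoid(x)) == softplus(-x), computed in a numerically stable form.
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))


# Toy numbers: the policy already prefers the correct text more than the
# reference does, so the loss falls below log(2), its value at zero margin.
print(round(dpo_loss(-8.0, -12.0, -10.0, -10.0), 3))  # 0.513
```

Minimizing this loss pushes the margin up, i.e. it raises the relative likelihood of the correctly written text without requiring an explicit reward model.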

### Proposed parameters

* temperature: 0.6
* repetition_penalty: 1.0
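
To show what the two proposed settings do at decode time, here is a minimal pure-Python sketch of one sampling step. It follows the widely used repetition-penalty rule found in Hugging Face `transformers` (`RepetitionPenaltyLogitsProcessor`): positive logits of already generated tokens are divided by the penalty and negative ones multiplied by it, after which the logits are divided by the temperature and sampled. The `sample_next_token` function and the toy logits are hypothetical. In `transformers` itself you would pass `temperature=0.6` and `repetition_penalty=1.0` to `generate` together with `do_sample=True`; note that a penalty of 1.0 disables the penalty entirely.

```python
import math
import random


def sample_next_token(logits, generated_ids, temperature=0.6,
                      repetition_penalty=1.0, rng=random):
    """Apply repetition penalty and temperature, then sample one token id.

    Returns the sampled id and the probability distribution it was drawn from.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    adjusted = list(logits)
    # Repetition penalty: push down tokens that already appeared.
    # With repetition_penalty=1.0 (the proposed value) this is a no-op.
    for tok in set(generated_ids):
        score = adjusted[tok]
        if score < 0:
            adjusted[tok] = score * repetition_penalty
        else:
            adjusted[tok] = score / repetition_penalty
    # Temperature < 1 sharpens the distribution toward the top tokens.
    scaled = [s / temperature for s in adjusted]
    # Softmax with max-subtraction for numerical stability.
    m = max(scaled)
    weights = [math.exp(s - m) for s in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]
    token = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return token, probs


rng = random.Random(0)  # fixed seed so the sketch is reproducible
token, probs = sample_next_token([2.0, 1.0, 0.5], generated_ids=[0], rng=rng)
print(token, [round(p, 3) for p in probs])
```

A temperature of 0.6 keeps sampling but concentrates probability mass on the most likely tokens, which tends to give steadier Polish output than the default of 1.0.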

### Outro

Enjoy!