---
library_name: transformers
license: llama3
datasets:
- VTSNLP/vietnamese_curated_dataset
language:
- vi
- en
base_model:
- meta-llama/Meta-Llama-3-8B
pipeline_tag: text-generation
---

# Model Information

<!-- Provide a quick summary of what the model is/does. -->

## Model Details

### Model Description

<!-- Provide a longer summary of what this model is. -->

Llama3-ViettelSolutions-8B is a variant of the Meta Llama-3-8B model, continually pre-trained on the [Vietnamese curated dataset](https://huggingface.co/datasets/VTSNLP/vietnamese_curated_dataset) and then supervised fine-tuned on 5 million Vietnamese instruction samples.

- **Developed by:** Viettel Solutions
- **Funded by:** NVIDIA
- **Model type:** Autoregressive transformer model
- **Language(s) (NLP):** Vietnamese, English
- **License:** Llama 3 Community License
- **Finetuned from model:** meta-llama/Meta-Llama-3-8B

## Uses

Example snippet for usage with Transformers:

```python
import torch
import transformers

model_id = "VTSNLP/Llama3-ViettelSolutions-8B"

# Load the model in bfloat16 and place it automatically across available devices.
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)
pipeline("Xin chào!")
```
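
The call above uses the library's default generation settings; a minimal sketch of passing explicit sampling parameters through the pipeline (the values below are illustrative, not tuned defaults for this model):

```python
outputs = pipeline(
    "Xin chào!",
    max_new_tokens=256,  # cap the length of the generated continuation
    do_sample=True,      # sample instead of greedy decoding
    temperature=0.7,     # illustrative value, not a recommended setting
    top_p=0.9,
)
print(outputs[0]["generated_text"])
```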
## Training Details

### Training Data

- Dataset for continued pre-training: [Vietnamese curated dataset](https://huggingface.co/datasets/VTSNLP/vietnamese_curated_dataset)
- Dataset for supervised fine-tuning: [Instruct general dataset](https://huggingface.co/datasets/VTSNLP/instruct_general_dataset)

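Both datasets are hosted on the Hugging Face Hub; a minimal sketch for peeking at them with the `datasets` library (the `train` split name is an assumption and may differ on the Hub):

```python
from datasets import load_dataset

# Stream so the full corpus is not downloaded up front.
curated = load_dataset("VTSNLP/vietnamese_curated_dataset", split="train", streaming=True)
print(next(iter(curated)))

instruct = load_dataset("VTSNLP/instruct_general_dataset", split="train", streaming=True)
print(next(iter(instruct)))
```
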

### Training Procedure

<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

#### Preprocessing

[More Information Needed]

#### Training Hyperparameters

- **Training regime:** bf16 mixed precision <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
- **Data sequence length:** 8192
- **Tensor model parallel size:** 4
- **Pipeline model parallel size:** 1
- **Context parallel size:** 1
- **Micro batch size:** 1
- **Global batch size:** 512 (see the breakdown sketch below)

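Under a Megatron-style batch decomposition (an assumption based on the NeMo Framework listed under Technical Specifications), these settings imply the global batch is reached through gradient accumulation; a minimal sketch of the arithmetic:

```python
# Assumed Megatron/NeMo-style decomposition; the numbers mirror this card.
num_gpus = 4                # 4 x A100 80GB (see Technical Specifications)
tensor_parallel = 4
pipeline_parallel = 1
context_parallel = 1
micro_batch_size = 1
global_batch_size = 512

# GPUs left over for data parallelism after model parallelism is applied.
data_parallel = num_gpus // (tensor_parallel * pipeline_parallel * context_parallel)

# Micro-batches accumulated per optimizer step to reach the global batch.
grad_accum_steps = global_batch_size // (micro_batch_size * data_parallel)

print(data_parallel)     # 1
print(grad_accum_steps)  # 512
```
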
## Evaluation

<!-- This section describes the evaluation protocols and provides the results. -->

### Testing Data, Factors & Metrics

#### Testing Data

<!-- This should link to a Dataset Card if possible. -->

[More Information Needed]

#### Factors

<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->

[More Information Needed]

#### Metrics

<!-- These are the evaluation metrics being used, ideally with a description of why. -->

[More Information Needed]

### Results

[More Information Needed]

#### Summary

[More Information Needed]

## Technical Specifications

- Compute Infrastructure: NVIDIA DGX
- Hardware: 4 x A100 80GB
- Software: [NeMo Framework](https://github.com/NVIDIA/NeMo)

## Citation

<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->

**BibTeX:**

[More Information Needed]

**APA:**

[More Information Needed]

## More Information

[More Information Needed]

## Model Card Authors

[More Information Needed]

## Model Card Contact

[More Information Needed]