Initial commit: model provided by the ModelHub XC community

Model: VTSNLP/Llama3-ViettelSolutions-8B Source: Original Platform
2026-04-30 07:50:51 +08:00
commit 1e6e7bad17
15 changed files with 413172 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,141 @@
+---
+library_name: transformers
+license: llama3
+datasets:
+- VTSNLP/vietnamese_curated_dataset
+language:
+- vi
+- en
+base_model:
+- meta-llama/Meta-Llama-3-8B
+pipeline_tag: text-generation
+---
+
+# Model Information
+
+## Model Details
+
+### Model Description
+
+Llama3-ViettelSolutions-8B is a variant of Meta's Llama-3-8B model. It was further pre-trained on the [Vietnamese curated dataset](https://huggingface.co/datasets/VTSNLP/vietnamese_curated_dataset) and then supervised fine-tuned on 5 million samples of Vietnamese instruction data.
+
+- **Developed by:** Viettel Solutions
+- **Funded by:** NVIDIA
+- **Model type:** Autoregressive transformer model
+- **Language(s) (NLP):** Vietnamese, English
+- **License:** Llama 3 Community License
+- **Finetuned from model:** meta-llama/Meta-Llama-3-8B
+
+## Uses
+
+Example snippet for usage with Transformers:
+
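+```python
+import torch
+import transformers
+
+model_id = "VTSNLP/Llama3-ViettelSolutions-8B"
+
+# Load the weights in bfloat16; device_map="auto" (requires the `accelerate`
+# package) places the model on the available GPU(s) or falls back to CPU.
+generator = transformers.pipeline(
+    "text-generation",
+    model=model_id,
+    model_kwargs={"torch_dtype": torch.bfloat16},
+    device_map="auto",
+)
+
+# Continue a Vietnamese prompt ("Xin chào!" means "Hello!").
+outputs = generator("Xin chào!", max_new_tokens=64)
+print(outputs[0]["generated_text"])
+```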
+
+
+## Training Details
+
+### Training Data
+
+- Dataset for continued pre-training: [Vietnamese curated dataset](https://huggingface.co/datasets/VTSNLP/vietnamese_curated_dataset)
+
+- Dataset for supervised fine-tuning: [Instruct general dataset](https://huggingface.co/datasets/VTSNLP/instruct_general_dataset)
+
+
+### Training Procedure
+
+#### Preprocessing
+
+[More Information Needed]
+
+
+#### Training Hyperparameters
+
+- **Training regime:** bf16 mixed precision
+- **Data sequence length:** 8192
+- **Tensor model parallel size:** 4
+- **Pipeline model parallel size:** 1
+- **Context parallel size:** 1
+- **Micro batch size:** 1
+- **Global batch size:** 512
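
The hyperparameters above are linked by a few standard relations. The sketch below rests on two assumptions that this card does not state explicitly. First, it assumes the Megatron-style layout used by NeMo, where the GPU count equals TP × PP × CP × DP. Second, it assumes the run used the 4 GPUs listed under Technical Specifications. `parallelism_summary` is a hypothetical helper for illustration, not a NeMo API.

```python
def parallelism_summary(world_size, tp, pp, cp, micro_batch, global_batch, seq_len):
    """Derive data-parallel size, gradient-accumulation steps and tokens per
    optimizer step, assuming world_size = TP * PP * CP * DP (Megatron-style)."""
    model_parallel = tp * pp * cp
    if world_size % model_parallel != 0:
        raise ValueError("world_size must be divisible by TP * PP * CP")
    dp = world_size // model_parallel

    # Samples processed per forward/backward pass across all data-parallel replicas.
    samples_per_micro_step = micro_batch * dp
    if global_batch % samples_per_micro_step != 0:
        raise ValueError("global batch must be divisible by micro_batch * DP")
    grad_accum = global_batch // samples_per_micro_step

    # Assumes every sample is a full (packed) sequence of seq_len tokens.
    tokens_per_step = global_batch * seq_len
    return {
        "data_parallel": dp,
        "grad_accum_steps": grad_accum,
        "tokens_per_step": tokens_per_step,
    }


# Values from the card; world_size=4 is an assumption based on "4 x A100 80GB".
summary = parallelism_summary(
    world_size=4, tp=4, pp=1, cp=1, micro_batch=1, global_batch=512, seq_len=8192
)
print(summary)
# {'data_parallel': 1, 'grad_accum_steps': 512, 'tokens_per_step': 4194304}
```

Under these assumptions there is a single data-parallel replica. Each optimizer step therefore accumulates gradients over 512 micro-batches and, if every sample fills the 8192-token window, processes about 4.2 million tokens.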
+
+## Evaluation
+
+### Testing Data, Factors & Metrics
+
+#### Testing Data
+
+[More Information Needed]
+
+#### Factors
+
+[More Information Needed]
+
+#### Metrics
+
+[More Information Needed]
+
+### Results
+
+[More Information Needed]
+
+#### Summary
+
+[More Information Needed]
+
+## Technical Specifications
+
+- Compute Infrastructure: NVIDIA DGX 
+
+- Hardware: 4 x A100 80GB
+
+- Software: [NeMo Framework](https://github.com/NVIDIA/NeMo)
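
A back-of-the-envelope estimate shows why a tensor-parallel size of 4 is consistent with 80 GB A100s. The sketch uses the widely cited accounting of 16 bytes per parameter for mixed-precision training with an Adam-style optimizer. The card does not state the optimizer, and NeMo/Megatron may keep gradients or optimizer state in other precisions, so treat the numbers as rough. The 8.0e9 parameter count is an approximation for Llama-3-8B, and `model_state_gb_per_gpu` is a hypothetical helper.

```python
# Assumed accounting (not stated in this card): bf16 weights (2 B) + bf16
# gradients (2 B) + fp32 master weights (4 B) + two fp32 Adam moments (4 B + 4 B).
BYTES_PER_PARAM = 2 + 2 + 4 + 4 + 4  # 16 bytes per parameter


def model_state_gb_per_gpu(num_params, tensor_parallel, pipeline_parallel=1):
    """Approximate weights + gradients + optimizer-state memory per GPU in
    decimal GB, assuming parameters are split evenly across TP * PP ranks and
    nothing is sharded across data-parallel ranks. Activations are excluded."""
    shards = tensor_parallel * pipeline_parallel
    return num_params * BYTES_PER_PARAM / shards / 1e9


# ~8.0e9 is an approximate parameter count for Llama-3-8B.
per_gpu = model_state_gb_per_gpu(8.0e9, tensor_parallel=4)
unsharded = model_state_gb_per_gpu(8.0e9, tensor_parallel=1)
print(f"TP=4: ~{per_gpu:.0f} GB per GPU; TP=1: ~{unsharded:.0f} GB")
# TP=4: ~32 GB per GPU; TP=1: ~128 GB
```

On a single device, the model state alone (about 128 GB) would exceed one 80 GB card. Split four ways, it drops to roughly 32 GB per GPU, which leaves room for the activations of 8192-token sequences.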
+
+## Citation 
+
+**BibTeX:**
+
+[More Information Needed]
+
+**APA:**
+
+[More Information Needed]
+
+## More Information
+
+[More Information Needed]
+
+## Model Card Authors
+
+[More Information Needed]
+
+## Model Card Contact
+
+[More Information Needed]