Initial project import; model provided by the ModelHub XC community.
Model: VSSA-SDSA/LT_AI_DLKVM  Source: Original Platform
.gitattributes (vendored, new file, 35 lines)
@@ -0,0 +1,35 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
README.md (new file, 345 lines)
@@ -0,0 +1,345 @@
---
language:
- lt
library_name: transformers
tags:
- blkt
- causal-lm
- text-generation
- pretraining
- lithuanian
- llama3
pipeline_tag: text-generation
---
# LT_AI_DLKVM modelio kortelė (LT) / Model Card for LT_AI_DLKVM (EN)

## Turinys / Table of Contents

- [Modelio informacija](#modelio-informacija) (LT) / [Model Details](#model-details) (EN)
- [Kaip pradėti naudoti modelį](#kaip-pradėti-naudoti-modelį) (LT) / [How to Get Started with the Model](#how-to-get-started-with-the-model) (EN)
- [Naudojimo sritys](#naudojimo-sritys) (LT) / [Uses](#uses) (EN)
- [Rizikos, šališkumai ir ribotumai](#rizikos-šališkumai-ir-ribotumai) (LT) / [Risks, Biases, and Limitations](#risks-biases-and-limitations) (EN)
- [Mokymo detalės](#mokymo-detalės) (LT) / [Training Details](#training-details) (EN)
- [Įvertinimas](#įvertinimas) (LT) / [Evaluation](#evaluation) (EN)
- [Citavimas](#citavimas) (LT) / [Citation](#citation) (EN)
- [Licencija](#licencija) (LT) / [License](#license) (EN)
## Modelio informacija

**Modelio pavadinimas:** LT_AI_DLKVM

**Projektas:** BLKT-VMS pipeline. Modelis sukurtas kaip tęstinės lietuvių kalbos modelių plėtros dalis, naudojant Lietuvių kalbos tekstyną ir specialiai šiam modeliui parengtą 32 000 tokenų žodyną (tokenizerį).

**Architektūra:** Llama3 principais paremtas kauzalinis kalbos modelis, naudojamas per Hugging Face Transformers bibliotekos `AutoModelForCausalLM` realizaciją.

**Modelio aprašas:** LT_AI_DLKVM – tai lietuvių kalbos kauzalinis kalbos modelis, sukurtas tyrimams, išankstiniam mokymui nuo nulio ir tolesniam pritaikymui lietuvių kalbos generavimo bei kalbos technologijų užduotyse. Modelio svoriai buvo inicializuoti atsitiktine tvarka, o mokymas vykdytas dviem etapais, naudojant Lietuvių kalbos tekstyno apdorotą variantą, parengtą ilgo konteksto mokymui.

Modeliui buvo naudojamas specialiai apmokytas **32 000 tokenų tokenizeris**. Turėdamas apie **1,04 mlrd. parametrų** ir palaikydamas maksimalų **32 768 tokenų** konteksto ilgį, modelis yra pritaikytas efektyviai apdoroti ilgus lietuviškus tekstus ir mišraus domeno turinį. Jis skirtas naudoti kaip bazinis generatyvinis modelis tolesniam papildomam mokymui, domeniniam adaptavimui ir eksperimentams lietuvių kalbos NLP srityje.

Pagal nutylėjimą modelis nėra instruktavimui pritaikytas ar specializuotas konkrečioms užduotims. Norint jį taikyti pokalbių sistemoms, santraukų sudarymui, klasifikavimui ar domeniškai specifiniam generavimui, rekomenduojamas papildomas mokymas ir vertinimas.
## Model Details

**Model name:** LT_AI_DLKVM

**Project:** BLKT-VMS pipeline. The model was developed as part of the continued development of Lithuanian language models using the Lithuanian Text Corpus and a dedicated **32,000-token tokenizer** prepared specifically for this model.

**Architecture:** A causal language model based on Llama3 design principles, used through the `AutoModelForCausalLM` implementation from the Hugging Face Transformers library.

**Model description:** LT_AI_DLKVM is a Lithuanian causal language model developed for research, pretraining from scratch, and downstream adaptation in Lithuanian text generation and language technology tasks. The model weights were initialized randomly, and training was carried out in two stages using the processed variant of the Lithuanian Text Corpus prepared for long-context training.

The model uses a specially trained **32,000-token tokenizer**. With approximately **1.04B parameters** and support for a maximum context length of **32,768 tokens**, the model is designed to process long Lithuanian texts and mixed-domain content efficiently. It is intended as a base generative model for further fine-tuning, domain adaptation, and experimentation in Lithuanian NLP.

The model is not instruction-tuned or task-specialized by default. For downstream applications such as chat, summarization, classification, or domain-specific generation, additional fine-tuning and evaluation are recommended.
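The ~1.04B parameter figure can be cross-checked against the architecture values shipped in this repository's `config.json` (hidden size 2048, 16 layers, 32 attention heads with 8 KV heads, intermediate size 8192, 32,000-token vocabulary, tied embeddings). A rough sanity-check sketch, assuming a standard Llama-style parameterization:

```python
# Rough parameter count from the values in this repo's config.json
# (a sanity-check sketch, assuming a standard Llama-style parameterization
# with tied input/output embeddings and no attention/MLP biases).
vocab_size = 32_000
hidden = 2048
layers = 16
heads, kv_heads, head_dim = 32, 8, 64
intermediate = 8192

embed = vocab_size * hidden                    # token embeddings (tied with lm_head)
attn = hidden * heads * head_dim * 2           # q_proj and o_proj
attn += hidden * kv_heads * head_dim * 2       # k_proj and v_proj (grouped-query attention)
mlp = hidden * intermediate * 3                # gate_proj, up_proj, down_proj
norms = 2 * hidden                             # two RMSNorm weight vectors per layer
per_layer = attn + mlp + norms

total = embed + layers * per_layer + hidden    # plus the final RMSNorm
print(f"~{total / 1e9:.2f}B parameters")       # ~1.04B
```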
## Kaip pradėti naudoti modelį

Modelis naudojamas su Transformers biblioteka per `AutoModelForCausalLM`. Jis priima lietuviškus tekstinius raginimus ir generuoja tekstą autoregresiniu būdu.

Paprastas Python pavyzdys inferencijai:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "VSSA-SDSA/LT_AI_DLKVM"

tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16 if torch.cuda.is_available() else torch.float32,
    device_map="auto" if torch.cuda.is_available() else None,
)

prompt = "Lietuvos technologijų ateitis priklausys nuo"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
inputs.pop("token_type_ids", None)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=256,
        do_sample=True,
        temperature=0.7,
        top_p=0.95,
        repetition_penalty=1.05,
    )

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## How to Get Started with the Model

The model is used with the Transformers library through `AutoModelForCausalLM`. It accepts Lithuanian text prompts and generates text autoregressively.

Simple Python code for inference:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "VSSA-SDSA/LT_AI_DLKVM"

tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16 if torch.cuda.is_available() else torch.float32,
    device_map="auto" if torch.cuda.is_available() else None,
)

prompt = "Lietuvos technologijų ateitis priklausys nuo"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
inputs.pop("token_type_ids", None)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=256,
        do_sample=True,
        temperature=0.7,
        top_p=0.95,
        repetition_penalty=1.05,
    )

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## Naudojimo sritys

**Numatytasis naudojimas ir ribotumai:** LT_AI_DLKVM skirtas tyrimams, plėtrai ir lietuvių kalbos generavimo sistemų diegimui, įskaitant išankstinį mokymą nuo nulio, domeninį adaptavimą, generavimą pagal raginimus ir ilgo konteksto kalbos modeliavimo eksperimentus.

Modelis gali būti naudojamas kaip bazinis modelis papildomam mokymui viešojo ir privataus sektoriaus projektuose, kuriuose reikalingas kokybiškas lietuviškų tekstų generavimas arba generatyvinis kalbos modeliavimas.

Norint modelį patikimai taikyti konkrečioms užduotims, jį rekomenduojama papildomai mokyti ir įvertinti pagal numatytą naudojimo scenarijų.

## Uses

**Intended use & limitations:** LT_AI_DLKVM is intended for research, development, and deployment of Lithuanian language generation systems, including pretraining from scratch, domain adaptation, prompt-based generation, and experimentation with long-context language modelling.

It can serve as a base model for fine-tuning in both public and private sector projects that require robust Lithuanian text generation or generative language modelling capabilities.

To apply the model reliably to specific downstream tasks, it should be fine-tuned and evaluated for the intended use case.
## Rizikos, šališkumai ir ribotumai

Modelis yra bazinis kauzalinis kalbos modelis ir nėra optimizuotas instrukcijų vykdymui, saugai kritinėse sąveikose ar faktiniam patikimumui. Jis gali generuoti sklandų, tačiau neteisingą, neišsamų ar klaidinantį turinį.

Nors modelis mokytas daugiausia su lietuvių kalbai artimais duomenimis, jis gali atspindėti mokymo korpuse esančius šališkumus, disbalansus ir teminius iškraipymus. Generavimo kokybė gali skirtis priklausomai nuo srities, raginimo formuluotės ir konteksto ilgio.

**Saugos, šališkumo ir rizikų aspektai:**
- Modelis gali generuoti haliucinacijas arba faktiškai netikslų turinį.
- Modelis gali atkartoti socialinius, kultūrinius ar kalbinius šališkumus, esančius šaltinio duomenyse.
- Modelis nėra tinkamas didelės svarbos taikymams be papildomų apsaugos priemonių, stebėsenos ir užduočiai specifinio validavimo.
- Už lietuvių kalbos ar susijusio mišraus domeno ribų modelio veikimas gali būti mažiau patikimas.

## Risks, Biases, and Limitations

The model is a base causal language model and is not optimized for instruction-following, safety-critical interaction, or factual reliability. It may generate fluent but incorrect, incomplete, or misleading content.

While trained on Lithuanian-centric data, the model may reflect biases, imbalances, and topical skews present in the training corpus. Output quality may vary depending on domain, prompting style, and context length.

**Safety, bias, and risk considerations:**
- The model may generate hallucinated or factually inaccurate content.
- The model may reproduce social, cultural, or linguistic biases present in the source data.
- The model is not suitable for high-stakes use without additional safeguards, monitoring, and task-specific validation.
- Performance outside Lithuanian-centric or related mixed-domain settings may be less reliable.
## Mokymo detalės

**Mokymo duomenys:** Modelis buvo mokytas naudojant Lietuvių kalbos tekstyną, konkrečiai **Stage 5 processed 32k** variantą, parengtą ilgo konteksto mokymui.

Apdorotas duomenų rinkinys palaiko:
- **Konteksto ilgį:** 32 768 tokenus
- **Apytikslį tokenizuotą dydį:** 6,32 mlrd. tokenų vienai epochai

### Mokymo procedūra

Modelis buvo mokytas naudojant **kauzalinio kalbos modeliavimo** tikslą (kito tokeno prognozavimą) dviem etapais:

1. **Pradinis mokymas nuo nulio** su maksimaliu **8 192 tokenų** konteksto ilgiu
2. **Ilgo konteksto mokymas**, išplečiant palaikymą iki **32 768 tokenų**

Modelio svoriai prieš mokymą buvo inicializuoti atsitiktine tvarka.
### Pradinis mokymas nuo nulio
- **Epochų skaičius:** 6
- **Maksimalus konteksto ilgis:** 8 192 tokenai
- **Mokymosi žingsnio dydis (learning rate):** 2e-4
- **Optimizatorius:** AdamW
- **Mikro paketo dydis vienam įrenginiui:** 4
- **Gradientų kaupimo žingsniai:** 32
- **Gradientų kontrolinių taškų metodas:** Įjungtas
- **GPU skaičius:** 8
- **Aparatinė įranga:** 8 × NVIDIA H100-SXM5-80GB GPUs
### Ilgo konteksto mokymas
- **Epochų skaičius:** 4
- **Maksimalus konteksto ilgis:** 32 768 tokenai
- **Mokymosi žingsnio dydis (learning rate):** 2e-4
- **Optimizatorius:** AdamW
- **Mikro paketo dydis vienam įrenginiui:** 2
- **Gradientų kaupimo žingsniai:** 64
- **Gradientų kontrolinių taškų metodas:** Įjungtas
- **GPU skaičius:** 8
- **Aparatinė įranga:** 8 × NVIDIA H100-SXM5-80GB GPUs

### Efektyvus paketų dydis

Efektyvus globalus paketų dydis abiejuose etapuose buvo:

`micro_batch_size × gradient_accumulation_steps × GPU_count = 1024`

Tai sudaro:
- **Pradinis etapas:** `4 × 32 × 8 = 1024`
- **Ilgo konteksto etapas:** `2 × 64 × 8 = 1024`
### Mokymo santraukos lentelė

| Etapas | Epochos | Konteksto ilgis | Learning rate | Mikro paketo dydis | Gradientų kaupimas | Optimizatorius | Gradientų kontrolinių taškų metodas | GPU | Efektyvus globalus paketų dydis |
| :--- | ---: | ---: | :--- | ---: | ---: | :--- | :--- | ---: | ---: |
| Pradinis mokymas nuo nulio | 6 | 8 192 | 2e-4 | 4 | 32 | AdamW | Įjungtas | 8 | 1 024 |
| Ilgo konteksto mokymas | 4 | 32 768 | 2e-4 | 2 | 64 | AdamW | Įjungtas | 8 | 1 024 |
## Training Details

**Training data:** The model was trained on the Lithuanian Text Corpus, specifically the **Stage 5 processed 32k variant**, prepared for long-context training.

The processed dataset supports:
- **Context length:** 32,768 tokens
- **Approximate tokenized size:** 6.32 billion tokens for a single epoch
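These corpus and context figures imply an approximate number of optimizer steps per epoch. A back-of-the-envelope sketch, assuming sequences are fully packed to the maximum context length (an assumption, not a reported figure) and using the effective global batch size of 1,024 stated in the training details:

```python
# Approximate optimizer steps per epoch, assuming full sequence packing.
# 1,024 is the effective global batch size used in both training stages;
# 8,192 and 32,768 are the per-stage maximum context lengths.
tokens_per_epoch = 6.32e9
global_batch = 1024

steps_per_epoch = {}
for stage, context in [("initial", 8192), ("long-context", 32768)]:
    steps_per_epoch[stage] = tokens_per_epoch / (global_batch * context)
    print(f"{stage}: ~{steps_per_epoch[stage]:.0f} steps/epoch")
# initial: ~753 steps/epoch, long-context: ~188 steps/epoch
```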
### Training procedure

The model was trained using the **causal language modelling** objective (next-token prediction) in two stages:

1. **Initial training from scratch** with a maximum context length of **8,192 tokens**
2. **Long-context training** extending support to **32,768 tokens**

The model weights were initialized randomly before training.
### Initial training from scratch
- **Number of epochs:** 6
- **Maximum context length:** 8,192 tokens
- **Learning rate:** 2e-4
- **Optimizer:** AdamW
- **Per-device micro batch size:** 4
- **Gradient accumulation steps:** 32
- **Gradient checkpointing:** Enabled
- **Number of GPUs:** 8
- **Hardware:** 8 × NVIDIA H100-SXM5-80GB GPUs
### Long-context training
- **Number of epochs:** 4
- **Maximum context length:** 32,768 tokens
- **Learning rate:** 2e-4
- **Optimizer:** AdamW
- **Per-device micro batch size:** 2
- **Gradient accumulation steps:** 64
- **Gradient checkpointing:** Enabled
- **Number of GPUs:** 8
- **Hardware:** 8 × NVIDIA H100-SXM5-80GB GPUs

### Effective batch size

The effective global batch size in both stages was:

`micro_batch_size × gradient_accumulation_steps × GPU_count = 1024`

This gives:
- **Initial stage:** `4 × 32 × 8 = 1024`
- **Long-context stage:** `2 × 64 × 8 = 1024`
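The effective batch size formula can also be written as a one-line helper (a minimal sketch; the function name is illustrative):

```python
# Effective global batch size = micro batch size × gradient accumulation steps × GPUs.
def effective_global_batch(micro_batch: int, grad_accum_steps: int, gpus: int) -> int:
    return micro_batch * grad_accum_steps * gpus

assert effective_global_batch(4, 32, 8) == 1024  # initial stage
assert effective_global_batch(2, 64, 8) == 1024  # long-context stage
```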
### Training summary table

| Stage | Epochs | Context Length | Learning Rate | Micro Batch Size | Gradient Accumulation | Optimizer | Gradient Checkpointing | GPUs | Effective Global Batch Size |
| :--- | ---: | ---: | :--- | ---: | ---: | :--- | :--- | ---: | ---: |
| Initial training from scratch | 6 | 8,192 | 2e-4 | 4 | 32 | AdamW | Enabled | 8 | 1,024 |
| Long-context training | 4 | 32,768 | 2e-4 | 2 | 64 | AdamW | Enabled | 8 | 1,024 |
## Įvertinimas

**Įvertinimo būsena:** Šiuo metu modelio kortelėje nepateikiama jokių konkrečiai užduočiai skirtų įvertinimo rezultatų.

Pilnesnei viešai versijai šį skyrių rekomenduojama išplėsti, įtraukiant:
- perplexity arba nuostolio (loss) rodiklius su atidėtuoju lietuvišku validacijos rinkiniu,
- tolesnių užduočių benchmark rezultatus,
- ilgo konteksto vertinimą,
- saugos ir šališkumo analizę.

## Evaluation

**Evaluation status:** No task-specific evaluation results are currently provided in this model card.

For a fuller release, this section should ideally be expanded with:
- perplexity or loss on held-out Lithuanian validation data,
- downstream benchmark results,
- long-context evaluation,
- safety and bias analysis.
## Citavimas

Jei naudojate LT_AI_DLKVM ar bet kurią šios saugyklos dalį savo tyrimuose ar diegime, cituokite taip (BibTeX):

```bibtex
@misc{SDSA_LT_AI_DLKVM_2025,
  title        = {{LT_AI_DLKVM}: Lithuanian Causal Language Model},
  author       = {{State Digital Solutions Agency (SDSA)}},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/VSSA-SDSA/LT_AI_DLKVM}},
  note         = {Developed by Vytautas Magnus University (VMU), UAB Neurotechnology, UAB Tilde informacinės technologijos, MB Krilas}
}
```

## Citation

If you use LT_AI_DLKVM or any part of this repository in your research or deployment, please cite as follows (BibTeX):

```bibtex
@misc{SDSA_LT_AI_DLKVM_2025,
  title        = {{LT_AI_DLKVM}: Lithuanian Causal Language Model},
  author       = {{State Digital Solutions Agency (SDSA)}},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/VSSA-SDSA/LT_AI_DLKVM}},
  note         = {Developed by Vytautas Magnus University (VMU), UAB Neurotechnology, UAB Tilde informacinės technologijos, MB Krilas}
}
```
## Licencija

Copyright (c) 2025 State Digital Solutions Agency (SDSA)

Sukurta Vytauto Didžiojo universiteto (VDU), UAB „Neurotechnology“, UAB „Tilde informacinės technologijos“, MB „Krilas“

Licencijuota pagal NewGenLTU openRAIL-M

**Pastaba:** Finansuoja Ekonomikos gaivinimo ir atsparumo didinimo priemonės planas „Naujos kartos Lietuva“

## License

Copyright (c) 2025 State Digital Solutions Agency (SDSA)

Developed by Vytautas Magnus University (VMU), UAB Neurotechnology, UAB Tilde informacinės technologijos, MB Krilas

Licensed under NewGenLTU openRAIL-M

**Notice:** Funded by the Economic Recovery and Resilience Facility plan "New Generation Lithuania"
config.json (new file, 36 lines)
@@ -0,0 +1,36 @@
{
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 0,
  "dtype": "bfloat16",
  "eos_token_id": 0,
  "head_dim": 64,
  "hidden_act": "silu",
  "hidden_size": 2048,
  "initializer_range": 0.02,
  "intermediate_size": 8192,
  "max_position_embeddings": 131072,
  "mlp_bias": false,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 16,
  "num_key_value_heads": 8,
  "pad_token_id": 0,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_scaling": {
    "factor": 32.0,
    "high_freq_factor": 4.0,
    "low_freq_factor": 1.0,
    "original_max_position_embeddings": 8192,
    "rope_type": "llama3"
  },
  "rope_theta": 500000.0,
  "tie_word_embeddings": true,
  "transformers_version": "4.57.3",
  "use_cache": true,
  "vocab_size": 32000
}
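One practical implication of these configuration values (16 layers, 8 KV heads, head_dim 64) is the size of the KV cache at inference time. A rough sketch, assuming a bfloat16 cache (2 bytes per value):

```python
# Approximate KV-cache memory per sequence, derived from config.json
# (num_hidden_layers=16, num_key_value_heads=8, head_dim=64), assuming a
# bfloat16 cache, i.e. 2 bytes per value.
layers, kv_heads, head_dim = 16, 8, 64
bytes_per_value = 2          # bfloat16
context = 32_768             # the model card's maximum training context

per_token = 2 * layers * kv_heads * head_dim * bytes_per_value  # K and V planes
total_bytes = per_token * context
print(f"{per_token} bytes/token, ~{total_bytes / 2**30:.1f} GiB at full context")
# 32768 bytes/token, ~1.0 GiB at full context
```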
dvlkm.jsonld (new file, 84 lines)
@@ -0,0 +1,84 @@
{
  "@context": "https://semiceu.github.io/MLDCAT-AP/releases/3.0.0/context/mldcat-ap.jsonld",
  "@type": "it6:MachineLearningModel",
  "@id": "https://example.lt/model/DVLKM",
  "dct:identifier": "DVLKM",
  "dct:title": {
    "@value": "LT generavimo modelis",
    "@language": "lt"
  },
  "dct:description": {
    "@value": "LT generavimo modelis",
    "@language": "lt"
  },
  "dct:created": "2026-04-15 00:00:00",
  "it6:version": "1.0",
  "dct:publisher": {
    "@type": "foaf:Agent",
    "foaf:name": "VSSA"
  },
  "dct:creator": {
    "@type": "foaf:Agent",
    "foaf:name": "UAB Neurotechnology"
  },
  "dct:rightsHolder": {
    "@type": "foaf:Agent",
    "foaf:name": "VSSA"
  },
  "dct:license": {
    "@id": "MIT"
  },
  "it6:intendedUse": [
    {
      "@value": "modelis skirtas teksto generavimui",
      "@language": "lt"
    }
  ],
  "dct:language": [
    {
      "@value": "lt"
    }
  ],
  "it6:modelArchitecture": [
    {
      "@value": "transformer"
    }
  ],
  "it6:hasInputModality": [
    {
      "@value": "text"
    }
  ],
  "it6:hasOutputModality": [
    {
      "@value": "text"
    }
  ],
  "dcat:landingPage": {
    "@id": "Huggingface"
  },
  "it6:trainedOn": [
    {
      "@id": "https://example.lt/dataset/BLKT"
    }
  ],
  "it6:hasFile": [
    {
      "@type": "it6:File",
      "@id": "https://example.lt/file/dvlkm",
      "dct:title": {
        "@value": "DLKVM",
        "@language": "lt"
      },
      "dct:format": {
        "@value": "application/octet-stream"
      },
      "dcat:accessURL": {
        "@id": "Huggingface"
      },
      "it6:url": {
        "@id": "Huggingface"
      }
    }
  ]
}
generation_config.json (new file, 9 lines)
@@ -0,0 +1,9 @@
{
  "_from_model_config": true,
  "bos_token_id": 0,
  "eos_token_id": [
    0
  ],
  "pad_token_id": 0,
  "transformers_version": "4.57.3"
}
model.safetensors (new file, 3 lines, Git LFS pointer)
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:5055d1fbfe560b6c8201304251200da606d7d5ec1cdb4b5c5a195f75e1034671
size 2208453080
special_tokens_map.json (new file, 30 lines)
@@ -0,0 +1,30 @@
{
  "bos_token": {
    "content": "<|endoftext|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "<|endoftext|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "<|endoftext|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "unk_token": {
    "content": "<|endoftext|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
tokenizer.json (new file, 159,031 lines)
File diff suppressed because it is too large.
tokenizer_config.json (new file, 37 lines)
@@ -0,0 +1,37 @@
{
  "add_bos_token": false,
  "added_tokens_decoder": {
    "0": {
      "content": "<|endoftext|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "1": {
      "content": "<|im_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "2": {
      "content": "<|im_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "bos_token": "<|endoftext|>",
  "clean_up_tokenization_spaces": true,
  "eos_token": "<|endoftext|>",
  "extra_special_tokens": {},
  "model_max_length": 1000000000000000019884624838656,
  "pad_token": "<|endoftext|>",
  "tokenizer_class": "PreTrainedTokenizerFast",
  "unk_token": "<|endoftext|>"
}