---
library_name: transformers
pipeline_tag: text-generation
tags:
- llama
- causal-lm
- text-generation
- transformers
---

# Aitana-2B-S-base-IP-1.0

## Table of Contents

- Model description
- Intended uses and limitations
- How to use
- Training
- Technical specifications
- Additional information

## Model description

Aitana-2B-S-base-IP-1.0 is a generative language model with a decoder-only architecture. This repository contains the base checkpoint, intended for causal language modeling and for further adaptation or task-specific fine-tuning.

Based on the files shipped in this repository, the checkpoint uses the Llama architecture and the Transformers ecosystem. The local configuration indicates:

- architecture: `LlamaForCausalLM`
- hidden size: `2048`
- layers: `24`
- attention heads: `16`
- vocabulary size: `256000`
- context length: `8192`
- tensor dtype in config: `bfloat16`

## Intended uses and limitations

Aitana-2B-S-base-IP-1.0 is a base model that can be used for causal language modeling and text generation. As with other base checkpoints, it is generally more useful as a starting point for instruction-tuning, domain adaptation, or downstream fine-tuning than as a final end-user assistant model.

Because this repository currently only exposes the model artifacts and not the full training report, claims about domain coverage, language balance, safety behavior, and benchmark performance should be added only once they are confirmed by the model authors.

## How to use

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gplsi/Aitana-2B-S-base-IP-1.0"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

prompt = "Escriu un breu resum sobre la importància de la llengua."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=128,
    do_sample=True,
    top_p=0.9,
    temperature=0.7,
    eos_token_id=tokenizer.eos_token_id,
    pad_token_id=tokenizer.pad_token_id,
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Training

### Base model

TO-DO: document the original parent checkpoint or initialization source for Aitana-2B-S-base-IP-1.0.

### Training data

TO-DO: document the training corpora, language distribution, preprocessing steps, deduplication policy, anonymization steps, and data filtering criteria.

### Training hyperparameters

TO-DO: document the effective batch size, learning rate schedule, optimizer setup, number of epochs or tokens seen, sequence length used during training, and hardware.

## Technical specifications

### Model architecture and objective

- architecture: decoder-only causal language model
- implementation class: `LlamaForCausalLM`
- hidden size: `2048`
- intermediate size: `5440`
- layers: `24`
- attention heads: `16`
- key/value heads: `16`
- maximum position embeddings: `8192`
- vocabulary size: `256000`
- BOS token id: `1`
- EOS token id: `2`
- PAD token id: `3`

### Tokenizer

The tokenizer files in this repository define:

- BOS token: ``
- EOS token: ``
- PAD token: ``
- UNK token: ``

### Hardware and software

The repository is packaged for the Hugging Face `transformers` library. Specific training hardware and training software details should be documented by the model authors if they are intended to be part of the public model card.

## Additional information

### Author

TO-DO: confirm the author list and institutional attribution to be displayed in the public model card.

### Contact

TO-DO: add a contact email or project contact point.

### License

TO-DO: confirm the license for this checkpoint and add it both here and in `config.json` if desired.

### Funding

TO-DO: add funding information if this checkpoint is part of a funded project.

### Disclaimer

This repository contains a base language model checkpoint. Base models can reflect biases present in their training data and may generate inaccurate, misleading, or unsafe content. Anyone deploying this model, or systems built on top of it, is responsible for evaluating those risks and ensuring compliance with applicable legal, ethical, and operational requirements.
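
### Estimating the parameter count

The values listed under "Model architecture and objective" are enough to make a rough estimate of the checkpoint size. The sketch below is not taken from the repository. It assumes a standard Llama-style layer layout (four attention projections, a gated MLP with three projections, two RMSNorm vectors per layer, no bias terms), and the helper name `estimate_llama_params` is hypothetical. The card does not say whether the input and output embeddings are tied, so both cases are computed.

```python
def estimate_llama_params(hidden_size, intermediate_size, num_layers,
                          num_heads, num_kv_heads, vocab_size,
                          tie_embeddings=False):
    """Rough weight count for a Llama-style decoder (assumes no bias terms)."""
    head_dim = hidden_size // num_heads
    kv_dim = head_dim * num_kv_heads

    # Attention: query and output projections are hidden x hidden;
    # key and value projections are hidden x kv_dim.
    attention = 2 * hidden_size * hidden_size + 2 * hidden_size * kv_dim
    # Gated MLP: gate, up, and down projections.
    mlp = 3 * hidden_size * intermediate_size
    # Two RMSNorm weight vectors per layer.
    norms = 2 * hidden_size

    embeddings = vocab_size * hidden_size
    # Assumption: an untied output head duplicates the embedding matrix shape.
    lm_head = 0 if tie_embeddings else vocab_size * hidden_size
    final_norm = hidden_size

    per_layer = attention + mlp + norms
    return embeddings + num_layers * per_layer + final_norm + lm_head


# Values copied from the "Model architecture and objective" list.
config = dict(hidden_size=2048, intermediate_size=5440, num_layers=24,
              num_heads=16, num_kv_heads=16, vocab_size=256000)

untied = estimate_llama_params(**config)
tied = estimate_llama_params(**config, tie_embeddings=True)
print(f"untied embeddings: {untied / 1e9:.2f}B parameters")
print(f"tied embeddings:   {tied / 1e9:.2f}B parameters")
```

Under these assumptions the untied estimate is about 2.25 billion parameters and the tied estimate about 1.73 billion. At 2 bytes per `bfloat16` weight, the untied case corresponds to roughly 4.5 GB of weights. The authoritative figure is whatever the shipped checkpoint files contain.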