---
library_name: transformers
pipeline_tag: text-generation
tags:
- llama
- causal-lm
- text-generation
- transformers
---

# Aitana-2B-S-base-IP-1.0

## Table of Contents

- Model description
- Intended uses and limitations
- How to use
- Training
- Technical specifications
- Additional information

## Model description

Aitana-2B-S-base-IP-1.0 is a generative language model with a decoder-only
architecture. This repository contains the base checkpoint, intended for causal
language modeling and for further adaptation or task-specific fine-tuning.

Based on the files shipped in this repository, the checkpoint uses the Llama
architecture and the Transformers ecosystem. The local configuration indicates:

- architecture: `LlamaForCausalLM`
- hidden size: `2048`
- layers: `24`
- attention heads: `16`
- vocabulary size: `256000`
- context length: `8192`
- tensor dtype in config: `bfloat16`
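
These values can be verified locally by loading only the configuration, which
does not download any model weights; a minimal sketch, using the repository id
from the usage example below:

```python
from transformers import AutoConfig

# Reads config.json only; no weights are fetched.
config = AutoConfig.from_pretrained("gplsi/Aitana-2B-S-base-IP-1.0")

print(config.architectures)            # ['LlamaForCausalLM']
print(config.hidden_size)              # 2048
print(config.num_hidden_layers)        # 24
print(config.num_attention_heads)      # 16
print(config.vocab_size)               # 256000
print(config.max_position_embeddings)  # 8192
print(config.torch_dtype)              # bfloat16
```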

## Intended uses and limitations

Aitana-2B-S-base-IP-1.0 is a base model that can be used for causal language
modeling and text generation. As with other base checkpoints, it is generally
more useful as a starting point for instruction tuning, domain adaptation, or
downstream fine-tuning than as a final end-user assistant model.

Because this repository currently exposes only the model artifacts and not the
full training report, claims about domain coverage, language balance, safety
behavior, and benchmark performance should be added only once they are
confirmed by the model authors.
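
Since the intended path for this checkpoint is further training, a minimal
fine-tuning sketch is shown below. It assumes a plain-text corpus in
`corpus.txt`; the file name, sequence length, and all hyperparameters are
illustrative placeholders, not recommendations from the model authors.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_id = "gplsi/Aitana-2B-S-base-IP-1.0"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Hypothetical corpus: any dataset with a "text" column works the same way.
dataset = load_dataset("text", data_files={"train": "corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=2048)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="aitana-2b-finetuned",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        num_train_epochs=1,
        bf16=True,
    ),
    train_dataset=tokenized["train"],
    # mlm=False selects the causal (next-token prediction) objective.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```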

## How to use

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gplsi/Aitana-2B-S-base-IP-1.0"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the dtype declared in config.json
    device_map="auto",           # place the model on available GPU(s), else CPU
)

# Catalan prompt: "Write a short summary about the importance of language."
prompt = "Escriu un breu resum sobre la importància de la llengua."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=128,
    do_sample=True,      # nucleus sampling instead of greedy decoding
    top_p=0.9,
    temperature=0.7,
    eos_token_id=tokenizer.eos_token_id,
    pad_token_id=tokenizer.pad_token_id,
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
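
For quick experiments, the same checkpoint can also be driven through the
high-level `pipeline` API; a minimal sketch with the same illustrative
sampling settings as above:

```python
import torch
from transformers import pipeline

# pipeline() wraps tokenizer and model loading in a single call.
generator = pipeline(
    "text-generation",
    model="gplsi/Aitana-2B-S-base-IP-1.0",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

result = generator(
    "Escriu un breu resum sobre la importància de la llengua.",
    max_new_tokens=128,
    do_sample=True,
    top_p=0.9,
    temperature=0.7,
)
print(result[0]["generated_text"])
```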

## Training

### Base model

TO-DO: document the original parent checkpoint or initialization source for
Aitana-2B-S-base-IP-1.0.

### Training data

TO-DO: document the training corpora, language distribution, preprocessing
steps, deduplication policy, anonymization steps, and data filtering criteria.

### Training hyperparameters

TO-DO: document the effective batch size, learning rate schedule, optimizer
setup, number of epochs or tokens seen, sequence length used during training,
and hardware.

## Technical specifications

### Model architecture and objective

- architecture: decoder-only causal language model
- implementation class: `LlamaForCausalLM`
- hidden size: `2048`
- intermediate size: `5440`
- layers: `24`
- attention heads: `16`
- key/value heads: `16`
- maximum position embeddings: `8192`
- vocabulary size: `256000`
- BOS token id: `1`
- EOS token id: `2`
- PAD token id: `3`
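
These dimensions are consistent with the "2B" in the model name; a
back-of-envelope parameter count, assuming standard Llama layer shapes and
ignoring the comparatively tiny norm weights:

```python
hidden, inter, layers, vocab = 2048, 5440, 24, 256000

attn = 4 * hidden * hidden   # q, k, v, o projections (16 KV heads = full MHA)
mlp = 3 * hidden * inter     # gate, up, down projections
embeddings = vocab * hidden  # input embedding matrix

tied = layers * (attn + mlp) + embeddings        # lm_head shares the embedding
untied = layers * (attn + mlp) + 2 * embeddings  # separate lm_head matrix

print(f"{tied / 1e9:.2f}B tied, {untied / 1e9:.2f}B untied")
# ~1.73B tied, ~2.25B untied; either reading rounds to roughly 2B
```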

### Tokenizer

The tokenizer files in this repository define:

- BOS token: `<s>`
- EOS token: `</s>`
- PAD token: `<pad>`
- UNK token: `<unk>`
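
These definitions can be cross-checked against the token ids listed above; a
minimal sketch:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gplsi/Aitana-2B-S-base-IP-1.0")

# Print each special token and its id as declared in the tokenizer files.
for name in ("bos_token", "eos_token", "pad_token", "unk_token"):
    token = getattr(tokenizer, name)
    print(name, token, tokenizer.convert_tokens_to_ids(token))
```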

### Hardware and software

The repository is packaged for the Hugging Face `transformers` library.
Specific training hardware and software details should be documented by the
model authors if they are intended to be part of the public model card.

## Additional information

### Author

TO-DO: confirm the author list and institutional attribution to be displayed
in the public model card.

### Contact

TO-DO: add a contact email or project contact point.

### License

TO-DO: confirm the license for this checkpoint and add it both here and in
`config.json` if desired.

### Funding

TO-DO: add funding information if this checkpoint is part of a funded project.

### Disclaimer

This repository contains a base language model checkpoint. Base models can
reflect biases present in their training data and may generate inaccurate,
misleading, or unsafe content. Anyone deploying this model, or systems built
on top of it, is responsible for evaluating those risks and ensuring
compliance with applicable legal, ethical, and operational requirements.