library_name: transformers
tags: llama, causal-lm, text-generation, nanollama
license: apache-2.0

NanoLlama (Public)

A compact Llama-based language model optimized for efficient inference and deployment. This is the public version with open access.

Model Details

Model Description

NanoLlama is a small-scale language model based on the Llama architecture, designed for lightweight applications and resource-constrained environments. It uses only 4 transformer layers, trading some capability for lower compute cost.

  • Developed by: svc-nai-cci
  • Model type: Causal Language Model
  • Language(s): English
  • License: Apache 2.0
  • Finetuned from: Llama architecture
  • Access: Public (Open Access)

Model Architecture

  • Architecture: LlamaForCausalLM
  • Hidden Size: 4096
  • Number of Layers: 4
  • Number of Attention Heads: 4
  • Number of Key-Value Heads: 2
  • Vocabulary Size: 32000
  • Max Position Embeddings: 4096
  • Hidden Activation: SiLU
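
The dimensions listed above already fix the size of the embedding table and the attention projections. The sketch below works through that arithmetic in plain Python. It assumes bias-free projections and a head dimension of hidden size divided by attention heads, as in the standard Llama layout. It leaves out the feed-forward blocks because this card does not list an intermediate size. `ArchSpec` and the helper names are illustrative and not part of any library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArchSpec:
    """Dimensions copied from the Model Architecture list (illustrative container)."""
    hidden_size: int = 4096
    num_layers: int = 4
    num_heads: int = 4
    num_kv_heads: int = 2
    vocab_size: int = 32000


def head_dim(spec: ArchSpec) -> int:
    # Assumption: head_dim = hidden_size / num_heads (the usual Llama convention).
    return spec.hidden_size // spec.num_heads


def attention_params_per_layer(spec: ArchSpec) -> int:
    # Query and output projections span all attention heads;
    # key and value projections span only the key-value heads (GQA).
    d = head_dim(spec)
    q_proj = spec.hidden_size * spec.num_heads * d
    kv_proj = 2 * spec.hidden_size * spec.num_kv_heads * d
    o_proj = spec.num_heads * d * spec.hidden_size
    return q_proj + kv_proj + o_proj


def embedding_params(spec: ArchSpec) -> int:
    # One vocabulary-sized matrix; an untied output head would add the same again.
    return spec.vocab_size * spec.hidden_size


spec = ArchSpec()
print(head_dim(spec))                    # 1024
print(attention_params_per_layer(spec))  # 50331648
print(embedding_params(spec))            # 131072000
```

Under these assumptions the embedding table alone (about 131M weights) is larger than the attention weights of any single layer (about 50M), which is typical for a shallow model with a full-size vocabulary.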

Uses

Direct Use

This model can be used for:

  • Text generation
  • Conversational AI
  • Code completion
  • Creative writing
  • Question answering

Downstream Use

The model can be fine-tuned for specific tasks such as:

  • Domain-specific text generation
  • Task-specific instruction following
  • Specialized conversational agents

How to Get Started with the Model

from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the model and tokenizer
model_name = "svc-nai-cci/nanollama-public"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

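# Generate text
input_text = "Hello, how are you?"
inputs = tokenizer(input_text, return_tensors="pt")
# do_sample=True is needed for temperature to take effect;
# max_new_tokens bounds the generated continuation rather than prompt + output.
outputs = model.generate(**inputs, max_new_tokens=100, do_sample=True, temperature=0.7)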
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)
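
When sampling is enabled, the `temperature` value passed to `generate` divides the logits before the softmax, so values below 1.0 concentrate probability on the most likely tokens. The following is a minimal, library-free sketch of that rescaling. The logits are made-up numbers and `softmax_with_temperature` is a hypothetical helper, not a function from `transformers`.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Return softmax(logits / temperature) as a list of probabilities."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # illustrative values, not model output
base = softmax_with_temperature(logits, 1.0)
sharp = softmax_with_temperature(logits, 0.7)

# At temperature 0.7 the top token gains probability and the tail loses it.
print(base)
print(sharp)
```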

Technical Specifications

Model Architecture and Objective

The model uses the standard Llama architecture with:

  • RMSNorm for normalization
  • RoPE (Rotary Position Embedding) for positional encoding
  • SwiGLU feed-forward blocks (SiLU-gated)
  • Grouped Query Attention (GQA)
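
With 4 attention heads and 2 key-value heads, Grouped Query Attention lets each key-value head serve a group of 2 query heads. The sketch below shows one way to express that mapping, assuming contiguous groups (query heads 0 and 1 share key-value head 0, and so on). `kv_head_for_query_head` is a hypothetical helper written for illustration.

```python
def kv_head_for_query_head(query_head, num_heads=4, num_kv_heads=2):
    """Map a query head index to the key-value head it attends with."""
    if num_heads % num_kv_heads != 0:
        raise ValueError("num_heads must be a multiple of num_kv_heads")
    if not 0 <= query_head < num_heads:
        raise IndexError("query_head out of range")
    group_size = num_heads // num_kv_heads  # query heads per key-value head
    return query_head // group_size


mapping = [kv_head_for_query_head(h) for h in range(4)]
print(mapping)  # [0, 0, 1, 1]
```

Setting `num_kv_heads` equal to `num_heads` recovers ordinary multi-head attention, and setting it to 1 gives multi-query attention.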

Performance Characteristics

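  • Model Size: 4 transformer layers with a hidden size of 4096 and a 32000-token vocabulary
  • Memory Requirements: Grouped Query Attention with 2 key-value heads halves the key-value cache relative to 4 full attention heads
  • Inference Speed: The shallow 4-layer stack keeps per-token compute low, which suits real-time applications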
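
One memory cost that can be estimated directly from the architecture is the key-value cache used during generation. The sketch below is a back-of-the-envelope calculation, not a measurement. It assumes a batch size of 1, 2 bytes per value (fp16 or bf16), and a head dimension of 1024 (hidden size 4096 divided by 4 attention heads). `kv_cache_bytes` is a hypothetical helper.

```python
def kv_cache_bytes(seq_len, num_layers=4, num_kv_heads=2,
                   head_dim=1024, bytes_per_value=2):
    """Estimate key-value cache size in bytes for one sequence."""
    # Each layer stores one key and one value tensor (the leading factor of 2).
    per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_value
    return per_token * seq_len


print(kv_cache_bytes(1))                  # 32768
print(kv_cache_bytes(4096) / 2**20)       # 128.0
print(kv_cache_bytes(4096, num_kv_heads=4) / 2**20)  # 256.0
```

Under these assumptions a full 4096-token context needs about 128 MiB of cache, half of what the same model would need with 4 key-value heads.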

Limitations

  • Limited context length (4096 tokens)
  • May not perform as well as larger models on complex reasoning tasks
  • Primarily trained/fine-tuned for English text
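
Because the prompt and the generated tokens share the 4096-position window, long inputs have to be trimmed before generation. The following is a minimal sketch of one policy, keeping the most recent tokens. `fit_prompt` is a hypothetical helper that operates on a plain list of token IDs, and other policies (such as keeping the start of the prompt) may suit some applications better.

```python
MAX_POSITIONS = 4096  # Max Position Embeddings from the architecture list


def fit_prompt(token_ids, max_new_tokens, max_positions=MAX_POSITIONS):
    """Drop the oldest tokens so prompt + generation fits in the context window."""
    budget = max_positions - max_new_tokens
    if budget <= 0:
        raise ValueError("max_new_tokens leaves no room for the prompt")
    return token_ids[-budget:]  # shorter prompts are returned unchanged


prompt = list(range(5000))  # stand-in for real token IDs
trimmed = fit_prompt(prompt, max_new_tokens=100)
print(len(trimmed))  # 3996
```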

Citation

If you use this model, please cite:

@misc{nanollama2024,
  title={NanoLlama: A Compact Llama-based Language Model},
  author={svc-nai-cci},
  year={2024},
  url={https://huggingface.co/svc-nai-cci/nanollama-public}
}

Contact

For questions or issues, please contact: svc-nai-cci
