---
library_name: transformers
tags:
- llama
- causal-lm
- text-generation
- nanollama
license: apache-2.0
---

# NanoLlama (Public)

A compact Llama-based language model optimized for efficient inference and deployment. This is the **public** version with open access.

## Model Details

### Model Description

NanoLlama is a small-scale language model based on the Llama architecture, designed for lightweight applications and resource-constrained environments. It trades some output quality for lower compute and memory cost.

- **Developed by:** svc-nai-cci
- **Model type:** Causal Language Model
- **Language(s):** English
- **License:** Apache 2.0
- **Finetuned from:** Llama architecture
- **Access:** Public (Open Access)

### Model Architecture

- **Architecture:** LlamaForCausalLM
- **Hidden Size:** 4096
- **Number of Layers:** 4
- **Number of Attention Heads:** 4
- **Number of Key-Value Heads:** 2
- **Vocabulary Size:** 32000
- **Max Position Embeddings:** 4096
- **Hidden Activation:** SiLU

## Uses

### Direct Use

This model can be used for:

- Text generation
- Conversational AI
- Code completion
- Creative writing
- Question answering

### Downstream Use

The model can be fine-tuned for specific tasks such as:

- Domain-specific text generation
- Task-specific instruction following
- Specialized conversational agents

## How to Get Started with the Model

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the model and tokenizer
model_name = "svc-nai-cci/nanollama-public"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Tokenize the prompt into PyTorch tensors
input_text = "Hello, how are you?"
inputs = tokenizer(input_text, return_tensors="pt")

# Sample a continuation. do_sample=True is required for temperature to take
# effect; max_length counts the prompt tokens as well as the generated ones.
outputs = model.generate(**inputs, max_length=100, do_sample=True, temperature=0.7)

# Decode the generated token IDs back to text
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)
```

## Technical Specifications

### Model Architecture and Objective

The model uses the standard Llama architecture with:

- RMSNorm for normalization
- RoPE (Rotary Position Embedding) for positional encoding
- SwiGLU feed-forward layers (SiLU-gated)
- Grouped Query Attention (GQA), with 4 query heads sharing 2 key-value heads

### Performance Characteristics

- **Model Size:** Compact design for efficient deployment
- **Memory Requirements:** Optimized for low-memory environments
- **Inference Speed:** Fast inference suitable for real-time applications

## Limitations

- Limited context length (4096 tokens)
- May not perform as well as larger models on complex reasoning tasks
- Primarily trained/fine-tuned on English text

## Citation

If you use this model, please cite:

```bibtex
@misc{nanollama2024,
  title={NanoLlama: A Compact Llama-based Language Model},
  author={svc-nai-cci},
  year={2024},
  url={https://huggingface.co/svc-nai-cci/nanollama-public}
}
```

## Contact

For questions or issues, please contact: svc-nai-cci
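
## Appendix: Estimating KV-Cache Memory

The architecture values listed under Model Architecture are enough for a rough estimate of how much memory the key-value cache needs at full context, and of how much Grouped Query Attention saves compared with giving every attention head its own key-value head. The sketch below is plain arithmetic, not a measurement. It assumes that the per-head dimension equals hidden size divided by the number of attention heads (the usual Llama default, not stated in this card) and that cached values are stored in 16-bit precision. The helper `kv_cache_bytes` is a hypothetical name introduced here for illustration.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, bytes_per_value=2):
    """Bytes needed to cache keys and values for a single sequence.

    bytes_per_value=2 assumes 16-bit storage (an assumption, not from the card).
    """
    # Factor of 2: each layer caches one key tensor and one value tensor.
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_value


# Values taken from the Model Architecture section of this card.
HIDDEN_SIZE = 4096
NUM_LAYERS = 4
NUM_ATTENTION_HEADS = 4
NUM_KV_HEADS = 2
MAX_POSITIONS = 4096

# Assumption: head_dim = hidden_size / num_attention_heads.
HEAD_DIM = HIDDEN_SIZE // NUM_ATTENTION_HEADS

# Cache size with the card's 2 key-value heads (GQA).
gqa_bytes = kv_cache_bytes(NUM_LAYERS, NUM_KV_HEADS, HEAD_DIM, MAX_POSITIONS)

# Hypothetical comparison: one key-value head per attention head.
mha_bytes = kv_cache_bytes(NUM_LAYERS, NUM_ATTENTION_HEADS, HEAD_DIM, MAX_POSITIONS)

print(f"GQA cache at {MAX_POSITIONS} tokens: {gqa_bytes / 2**20:.0f} MiB")
print(f"Without GQA:                 {mha_bytes / 2**20:.0f} MiB")
```

Under these assumptions the cache for one 4096-token sequence comes to 128 MiB with 2 key-value heads, half of the 256 MiB that 4 key-value heads would need. The figure scales linearly with batch size and sequence length, and it excludes the model weights themselves.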