---
language: en
license: apache-2.0
library_name: transformers
tags:
- tinyllama
- trl
- merged
- lora
- fine-tuned
- pytorch
- causal-lm
- text-generation
- conversational
base_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
datasets:
- arif-butt/arifbutt_dataset
pipeline_tag: text-generation
inference: false
---

# 🦙 TinyLlama TRL Merged - Complete Fine-tuned Model

## 📋 Model Overview

This is a **fully merged, standalone** version of TinyLlama (1.1B parameters), fine-tuned with LoRA adapters using the **TRL (Transformer Reinforcement Learning)** library. The LoRA weights have been permanently merged into the base model, so the result is a single complete model that loads without any adapter library.

### Key Features

| Feature | Description |
|---------|-------------|
| **Standalone** | No PEFT library required; single model file |
| **Fine-tuned** | Custom trained on an educational Q&A dataset |
| **Optimized** | FP16 precision for memory efficiency |
| **Production Ready** | Single-folder deployment |
| **Chat Optimized** | Fine-tuned for conversational responses |

### Model Architecture

| Component | Specification |
|-----------|---------------|
| **Base Model** | TinyLlama-1.1B-Chat-v1.0 |
| **Architecture** | Llama-based transformer decoder |
| **Total Parameters** | ~1,100,000,000 (1.1B) |
| **Context Length** | 2048 tokens |
| **Hidden Size** | 2048 |
| **Intermediate Size** | 5632 |
| **Number of Layers** | 22 |
| **Number of Attention Heads** | 32 |
| **Number of Key/Value Heads** | 4 (GQA) |
| **Head Dimension** | 64 |
| **Activation Function** | SwiGLU |
| **Positional Encoding** | RoPE (Rotary Position Embedding) |
| **Attention Mechanism** | Grouped-Query Attention (GQA) |
| **Precision** | FP16 (float16) |

## 🚀 Usage Guide

### Installation

```bash
pip install transformers torch accelerate
```

### Method 1: Direct Transformers Loading

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model and tokenizer
model_id = "arif-butt/tinyllama-trl-merged"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
)
model.eval()

# Build a prompt in the Q/A format
prompt = "Q: What is machine learning?\nA:"

# Tokenize and move the tensors to the model's device
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate with sampling; gradients are not needed for inference
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=150,
        temperature=0.7,
        top_p=0.95,
        do_sample=True,
        repetition_penalty=1.1,
        pad_token_id=tokenizer.eos_token_id,
    )

# Decode; the decoded text includes the prompt, so slice it off before printing
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(f"Prompt: {prompt}")
print(f"Response: {response[len(prompt):].strip()}")
```

### Method 2: Pipeline for Simple Inference

```python
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="arif-butt/tinyllama-trl-merged",
    torch_dtype=torch.float16,
    device_map="auto",
)

prompt = "Q: Explain neural networks in simple terms\nA:"
result = pipe(prompt, max_new_tokens=150, temperature=0.7, do_sample=True)

# generated_text contains the prompt followed by the model's continuation
print(result[0]["generated_text"])
```