---
license: mit
datasets:
- roneneldan/TinyStories
language:
- en
pipeline_tag: text-generation
tags:
- text-generation-inference
new_version: GODELEV/Test-1-4000
---

# Test-1-3000: A 190M Parameter Narrative Intelligence Engine
    
---

# Overview

**Test-1-3000** is a compact yet remarkably capable decoder-only Transformer language model built upon the modern **Llama architecture**.

The project explores an important question in language model research:

> *How much narrative reasoning, coherence, and world understanding can emerge inside a small model when trained correctly?*

Despite containing only **190.55 million parameters**, Test-1-3000 demonstrates surprisingly advanced:

- Narrative continuity
- Character persistence
- Long-range memory consistency
- Emotional progression
- Logical event sequencing
- Contextual storytelling stability

The model was trained specifically for **short-form narrative intelligence**, focusing on coherent storytelling rather than broad internet-scale memorization.

Unlike many small models that generate fragmented or repetitive text, Test-1-3000 learns to maintain:

- causal relationships,
- stable story worlds,
- emotional trajectories,
- and meaningful resolutions

across long contexts.

---

# Key Highlights

| Feature | Description |
|---|---|
| Architecture | Llama-based Decoder-only Transformer |
| Parameters | 190.55 Million |
| Context Length | 2048 Tokens |
| Final Training Step | 3000 |
| Final Training Loss | **0.8516** |
| Attention Optimization | Flash Attention 2 |
| Compilation | `torch.compile` |
| Precision | bfloat16 Mixed Precision |
| Positional Encoding | Rotary Positional Embeddings (RoPE) |

---

# What Makes Test-1-3000 Special?

Most compact language models struggle with:

- maintaining consistency,
- remembering earlier events,
- resolving story arcs,
- and avoiding repetition.

Test-1-3000 was trained with a different objective philosophy:

## Narrative Intelligence First

Instead of optimizing for broad factual memorization, the model focuses on:

- temporal continuity,
- event causality,
- emotional logic,
- and narrative closure.

This creates a surprisingly stable storytelling engine capable of generating coherent multi-paragraph narratives with strong thematic flow.

---

# Model Architecture

Test-1-3000 follows a modern efficient Transformer design optimized for both:

- training stability,
- and inference throughput.

The architecture borrows heavily from the proven Llama design philosophy while remaining lightweight enough for experimentation and rapid iteration.

---

# Technical Specifications

| Feature | Specification |
|---|---|
| Model Type | Decoder-only Transformer |
| Hidden Dimension | 768 |
| Layers (Depth) | 12 |
| Attention Heads | 12 |
| Intermediate Size | 3072 |
| Activation Function | SwiGLU |
| Normalization | RMSNorm |
| Vocabulary Size | 50,257 |
| Tokenizer | GPT-2 Tokenizer |
| Context Window | 2048 Tokens |
| Precision | bfloat16 |
| Attention Backend | Flash Attention 2 |

---

# Positional Understanding with RoPE

Test-1-3000 uses **Rotary Positional Embeddings (RoPE)** to maintain precise token relationship awareness throughout long contexts.

This allows the model to:

- track entities across paragraphs,
- preserve story continuity,
- maintain dialogue references,
- and understand long-range dependencies efficiently.

For a model of this scale, the 2048-token context window provides unusually strong narrative memory.

---

# The Evolution of Learning

Training Test-1-3000 revealed clear emergent phases of cognitive development.

The model did not merely memorize text patterns; it progressively developed increasingly sophisticated representations of narrative structure and world dynamics.

---

# The Lexical Phase

## *(Steps 0 → 250)*

At the beginning of training, the model learned the statistical foundations of language.

It discovered:

- common sentence structures,
- punctuation behavior,
- frequent vocabulary patterns,
- and story-opening syntax.

During this phase, phrases such as:

> "Once upon a time"

became strong narrative anchors.

The model began constructing basic grammatical fluency but still lacked deeper logical understanding.

### Characteristics

- High repetition
- Weak memory
- Poor event continuity
- Basic syntax acquisition

---

# The Relational Phase

## *(Steps 250 → 1000)*

The model started connecting concepts into meaningful relationships.

It learned:

- object interactions,
- spatial reasoning,
- basic causality,
- and action consistency.

For example:

- parks imply trees and playing,
- rain implies umbrellas or wetness,
- sadness often precedes comfort or resolution.

The training loss rapidly decreased below **1.5**, signaling major improvements in structural reasoning.

### Emergent Behaviors

- Scene consistency
- Character-action alignment
- Basic emotional logic
- Improved descriptive continuity

---

# The Coherence Phase

## *(Steps 1000 → 2000)*

This phase marked the emergence of true narrative stabilization.

The model learned:

- story pacing,
- setup/payoff relationships,
- conflict resolution,
- and multi-sentence thematic continuity.

Stories no longer collapsed into unrelated fragments.

Instead, the model began maintaining:

- stable goals,
- emotional arcs,
- and logical conclusions.

If a story introduced a problem:

> "Lily was lonely."

the model increasingly learned to produce a meaningful emotional resolution later in the text.

### Major Improvements

- Long-range memory
- Reduced contradiction
- Better endings
- Stronger narrative flow
- Lower hallucination frequency

Loss at the end of this stage:

| Step | Loss |
|---|---|
| 2000 | **1.27** |

---

# The Emergent Narrative Intelligence Phase

## *(Steps 2000 → 3000)*

This final stage represented a major leap in generative sophistication.

Rather than simply maintaining coherence, the model began exhibiting signs of:

- implicit world modeling,
- narrative anticipation,
- emotional persistence,
- and latent planning behavior.

The model increasingly understood that stories possess:

- momentum,
- consequences,
- emotional gravity,
- and thematic closure.

Characters began behaving more consistently across long contexts.

Events earlier in a story influenced later generated text more reliably.

The model also became significantly better at:

- avoiding repetitive loops,
- maintaining tone,
- preserving narrative identity,
- and generating cleaner transitions between scenes.

### Emergent Capabilities

- Multi-event causal chaining
- Persistent emotional tone
- Improved dialogue continuity
- Better conflict resolution
- Reduced topic drift
- More natural pacing
- Stronger thematic stability

Most importantly:

> The model began generating stories that feel intentionally written rather than statistically assembled.

---

# Final Training Statistics

| Metric | Value |
|---|---|
| Final Step | 3000 |
| Final Loss | **0.8516** |
| Training Stability | Excellent |
| Gradient Behavior | Stable |
| Divergence Events | None Observed |

---

# Training Configuration

## Hyperparameters

| Parameter | Value |
|---|---|
| Optimizer | AdamW |
| Betas | β₁=0.9, β₂=0.95 |
| Learning Rate | 5e-4 |
| Scheduler | OneCycleLR |
| Weight Decay | 0.01 |
| Precision | bfloat16 |
| Compilation | `torch.compile` |
| Attention Optimization | Flash Attention 2 |
| Effective Batch Size | ~262,144 Tokens / Step |

---

# Dataset

## TinyStories (2M)

Test-1-3000 was trained on the **TinyStories** dataset.

TinyStories is uniquely valuable because it isolates:

- narrative structure,
- reasoning,
- consistency,
- and causality

without the overwhelming informational noise of the open web.

The stories are synthetically generated and use:

- child-level vocabulary,
- but cleanly structured narrative composition.

This creates an ideal environment for studying emergent reasoning inside small language models.

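
As a rough sanity check, the figures in the Technical Specifications and Hyperparameters tables can be combined into an approximate parameter count and training token budget. The sketch below is only an estimate under assumptions the card does not state: no bias terms, untied input and output embeddings, full multi-head attention (no grouped-query attention), and every step using the full effective batch. Under those assumptions it yields about 190.46M parameters, close to but not exactly the reported 190.55M, and roughly 786M training tokens; the small gap in parameters presumably comes from details not listed here.

```python
# Rough sanity check of the reported model size and training token budget.
# Assumptions (not stated in the card): no bias terms, untied input/output
# embeddings, full multi-head attention, one weight vector per RMSNorm.

hidden = 768
layers = 12
intermediate = 3072
vocab = 50_257
tokens_per_step = 262_144   # approximate effective batch size from the card
steps = 3_000
context = 2_048

attention = 4 * hidden * hidden        # q, k, v, and output projections
mlp = 3 * hidden * intermediate        # gate, up, and down projections (SwiGLU)
norms = 2 * hidden                     # two RMSNorm layers per block
per_layer = attention + mlp + norms

embeddings = vocab * hidden
# Input embeddings + output head + all blocks + final RMSNorm
total_params = 2 * embeddings + layers * per_layer + hidden

total_tokens = tokens_per_step * steps
sequences_per_step = tokens_per_step // context

print(f"parameters: {total_params:,}")
print(f"training tokens: {total_tokens:,}")
print(f"sequences per step: {sequences_per_step}")
```

If the embeddings were tied, the estimate would drop by one embedding matrix (about 38.6M parameters), so the reported size is more consistent with untied embeddings.
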
---

# Training Philosophy

The project intentionally prioritizes:

- coherence over memorization,
- reasoning over factual retrieval,
- and narrative intelligence over benchmark chasing.

The goal is not merely to create a chatbot.

The goal is to study:

> how structured cognition emerges inside compact neural systems.

---

# Usage: Quick Start

Install dependencies:

```bash
pip install transformers torch accelerate
```

---

## Inference Example

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "GODELEV/Test-1-3000"

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# The GPT-2 tokenizer may not define a pad token; fall back to EOS if so
pad_token_id = tokenizer.pad_token_id
if pad_token_id is None:
    pad_token_id = tokenizer.eos_token_id

# Prompt
prompt = "Once upon a time, Tom found a blue car."
inputs = tokenizer(
    prompt,
    return_tensors="pt"
).to(model.device)

# Generate with the recommended sampling settings
output = model.generate(
    **inputs,
    max_new_tokens=200,
    temperature=0.7,
    top_p=0.9,
    repetition_penalty=1.1,
    do_sample=True,
    eos_token_id=tokenizer.eos_token_id,
    pad_token_id=pad_token_id
)

print(tokenizer.decode(output[0], skip_special_tokens=True))
```

---

# Recommended Generation Settings

| Parameter | Recommended |
|---|---|
| Temperature | 0.7 |
| Top-p | 0.9 |
| Repetition Penalty | 1.1 |
| Max Tokens | 128–512 |
| Sampling | Enabled |

---

# Observed Emergent Behaviors

During evaluation, the model demonstrated:

- Character persistence
- Goal-oriented progression
- Emotional continuity
- Environmental consistency
- Contextual callbacks
- Story resolution awareness

These behaviors are especially notable given the model's relatively small parameter count.

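
For readers unfamiliar with the temperature and top-p values in the recommended generation settings, the following is a simplified, pure-Python illustration of what those two parameters do to a next-token distribution. It is not the `transformers` implementation, and the function name `top_p_filter` and the example logits are hypothetical; it only sketches the idea that a lower temperature sharpens the distribution and top-p keeps the smallest set of most likely tokens whose cumulative probability reaches the threshold.

```python
import math


def top_p_filter(logits, temperature=0.7, top_p=0.9):
    """Return {token_index: probability} for the tokens kept by top-p sampling.

    Illustrative helper only (hypothetical name, not a library API).
    """
    # Temperature scaling followed by a numerically stable softmax.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the most likely tokens until their cumulative mass reaches top_p.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in ranked:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break

    # Renormalize so the kept probabilities sum to 1 before sampling.
    return {i: probs[i] / mass for i in kept}


# Made-up logits for a four-token vocabulary.
nucleus = top_p_filter([2.0, 1.0, 0.5, -1.0])
print(nucleus)
```

With these example logits, the two least likely tokens fall outside the nucleus and can never be sampled, which is one reason these settings tend to reduce off-topic continuations.
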
---

# Limitations

Although highly capable for its size, Test-1-3000 still has limitations:

- Limited factual world knowledge
- Occasional repetition in very long generations
- Reduced reasoning performance outside storytelling domains
- Less stable beyond trained narrative styles

The model is optimized specifically for:

> coherent short-form storytelling.

---

# 📜 Citation

```bibtex
@misc{test13000,
  title={Test-1-3000: A 190M Parameter Narrative Intelligence Engine},
  author={GODELEV},
  year={2026},
  note={Compact narrative-focused language model trained on TinyStories}
}
```

---

# License

This project is released under the MIT license and is intended for:

- research,
- experimentation,
- educational use,
- and open exploration of compact language models.

---

# Final Thoughts

Test-1-3000 demonstrates that meaningful narrative intelligence can emerge inside surprisingly small neural systems when training is focused, clean, and structurally optimized.

At only **190M parameters**, the model exhibits behaviors often associated with significantly larger systems:

- narrative planning,
- emotional continuity,
- causal consistency,
- and coherent resolution generation.

The project serves as both:

- a practical storytelling model,
- and an experiment in emergent cognition within compact architectures.

---

### “Small models are not weak models.
### They are compressed intelligence waiting to emerge.”