Initialize project; model provided by the ModelHub XC community

Model: GODELEV/Test-1-3000 Source: Original Platform
2026-06-09 10:06:17 +08:00
commit cd2ba5aef3
7 changed files with 250906 additions and 0 deletions
--- a/.gitattributes
+++ b/.gitattributes
@@ -0,0 +1,35 @@
 *.7z filter=lfs diff=lfs merge=lfs -text
 *.arrow filter=lfs diff=lfs merge=lfs -text
 *.bin filter=lfs diff=lfs merge=lfs -text
 *.bz2 filter=lfs diff=lfs merge=lfs -text
 *.ckpt filter=lfs diff=lfs merge=lfs -text
 *.ftz filter=lfs diff=lfs merge=lfs -text
 *.gz filter=lfs diff=lfs merge=lfs -text
 *.h5 filter=lfs diff=lfs merge=lfs -text
 *.joblib filter=lfs diff=lfs merge=lfs -text
 *.lfs.* filter=lfs diff=lfs merge=lfs -text
 *.mlmodel filter=lfs diff=lfs merge=lfs -text
 *.model filter=lfs diff=lfs merge=lfs -text
 *.msgpack filter=lfs diff=lfs merge=lfs -text
 *.npy filter=lfs diff=lfs merge=lfs -text
 *.npz filter=lfs diff=lfs merge=lfs -text
 *.onnx filter=lfs diff=lfs merge=lfs -text
 *.ot filter=lfs diff=lfs merge=lfs -text
 *.parquet filter=lfs diff=lfs merge=lfs -text
 *.pb filter=lfs diff=lfs merge=lfs -text
 *.pickle filter=lfs diff=lfs merge=lfs -text
 *.pkl filter=lfs diff=lfs merge=lfs -text
 *.pt filter=lfs diff=lfs merge=lfs -text
 *.pth filter=lfs diff=lfs merge=lfs -text
 *.rar filter=lfs diff=lfs merge=lfs -text
 *.safetensors filter=lfs diff=lfs merge=lfs -text
 saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.tar.* filter=lfs diff=lfs merge=lfs -text
 *.tar filter=lfs diff=lfs merge=lfs -text
 *.tflite filter=lfs diff=lfs merge=lfs -text
 *.tgz filter=lfs diff=lfs merge=lfs -text
 *.wasm filter=lfs diff=lfs merge=lfs -text
 *.xz filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
--- a/README.md
+++ b/README.md
@@ -0,0 +1,508 @@
 ---
 license: mit
 datasets:
 - roneneldan/TinyStories
 language:
 - en
 pipeline_tag: text-generation
 tags:
 - text-generation-inference
 new_version: GODELEV/Test-1-4000
 ---
 # Test-1-3000 — A 190M Parameter Narrative Intelligence Engine
 <p align="center">
 ![Architecture](https://img.shields.io/badge/Architecture-Llama-blue)
 ![Parameters](https://img.shields.io/badge/Parameters-190M-green)
 ![Context](https://img.shields.io/badge/Context-2048-orange)
 ![Framework](https://img.shields.io/badge/Framework-PyTorch-red)
 ![Training](https://img.shields.io/badge/Training-Step_3000-purple)
 </p>
 ---
 # Overview
 **Test-1-3000** is a compact yet remarkably capable decoder-only Transformer language model built upon the modern **Llama architecture**.  
 The project explores an important question in language model research:
 > *How much narrative reasoning, coherence, and world understanding can emerge inside a small model when trained correctly?*
 Despite containing only **190.55 million parameters**, Test-1-3000 demonstrates surprisingly advanced:
 - Narrative continuity
 - Character persistence
 - Long-range memory consistency
 - Emotional progression
 - Logical event sequencing
 - Contextual storytelling stability
 The model was trained specifically for **short-form narrative intelligence**, focusing on coherent storytelling rather than broad internet-scale memorization.
 Unlike many small models that generate fragmented or repetitive text, Test-1-3000 learns to maintain:
 - causal relationships,
 - stable story worlds,
 - emotional trajectories,
 - and meaningful resolutions across long contexts.
 ---
 # Key Highlights
 | Feature | Description |
 |---|---|
 | Architecture | Llama-based Decoder-only Transformer |
 | Parameters | 190.55 Million |
 | Context Length | 2048 Tokens |
 | Final Training Step | 3000 |
 | Final Training Loss | **0.8516** |
 | Attention Optimization | Flash Attention 2 |
 | Compilation | `torch.compile` |
 | Precision | bfloat16 Mixed Precision |
 | Positional Encoding | Rotary Positional Embeddings (RoPE) |
 ---
 # What Makes Test-1-3000 Special?
 Most compact language models struggle with:
 - maintaining consistency,
 - remembering earlier events,
 - resolving story arcs,
 - and avoiding repetition.
 Test-1-3000 was trained with a different objective philosophy:
 ## Narrative Intelligence First
 Instead of optimizing for broad factual memorization, the model focuses on:
 - temporal continuity,
 - event causality,
 - emotional logic,
 - and narrative closure.
 This creates a surprisingly stable storytelling engine capable of generating coherent multi-paragraph narratives with strong thematic flow.
 ---
 # Model Architecture
 Test-1-3000 follows a modern efficient Transformer design optimized for both:
 - training stability,
 - and inference throughput.
 The architecture borrows heavily from the proven Llama design philosophy while remaining lightweight enough for experimentation and rapid iteration.
 ---
 # Technical Specifications
 | Feature | Specification |
 |---|---|
 | Model Type | Decoder-only Transformer |
 | Hidden Dimension | 896 |
 | Layers (Depth) | 12 |
 | Attention Heads | 14 query heads, 2 key/value heads (head dimension 64) |
 | Intermediate Size | 2432 |
 | Activation Function | SwiGLU |
 | Normalization | RMSNorm |
 | Vocabulary Size | 50,257 |
 | Tokenizer | GPT-2 Tokenizer |
 | Context Window | 2048 Tokens |
 | Precision | bfloat16 |
 | Attention Backend | Flash Attention 2 |
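
The 190.55M figure can be cross-checked against the repository's `config.json`. The sketch below is a rough count, assuming a standard bias-free Llama layout (separate `q/k/v/o` projections, a three-matrix gated MLP, two RMSNorm weights per layer, and untied input and output embeddings, as `tie_word_embeddings: false` suggests). The function name `llama_param_count` is hypothetical and only serves this illustration.

```python
def llama_param_count(vocab_size, hidden_size, intermediate_size,
                      num_layers, num_heads, num_kv_heads, head_dim,
                      tie_word_embeddings=False):
    """Count the weights of a bias-free Llama-style decoder (illustrative helper)."""
    q_dim = num_heads * head_dim
    kv_dim = num_kv_heads * head_dim
    attention = hidden_size * q_dim            # q_proj
    attention += 2 * hidden_size * kv_dim      # k_proj and v_proj
    attention += q_dim * hidden_size           # o_proj
    mlp = 3 * hidden_size * intermediate_size  # gate_proj, up_proj, down_proj
    norms = 2 * hidden_size                    # two RMSNorm weight vectors per layer
    per_layer = attention + mlp + norms
    embeddings = vocab_size * hidden_size
    lm_head = 0 if tie_word_embeddings else vocab_size * hidden_size
    final_norm = hidden_size
    return embeddings + num_layers * per_layer + final_norm + lm_head


# Values copied from config.json in this repository.
total = llama_param_count(
    vocab_size=50257,
    hidden_size=896,
    intermediate_size=2432,
    num_layers=12,
    num_heads=14,
    num_kv_heads=2,
    head_dim=64,
)
print(f"{total:,} parameters ({total / 1e6:.2f}M)")
```

Under these assumptions the count rounds to 190.55M. At four bytes per float32 weight it also comes within a few kilobytes of the 762,210,848-byte `model.safetensors` file, which suggests the checkpoint is stored in float32.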
 ---
 # Positional Understanding with RoPE
 Test-1-3000 uses **Rotary Positional Embeddings (RoPE)** to maintain precise token relationship awareness throughout long contexts.
 This allows the model to:
 - track entities across paragraphs,
 - preserve story continuity,
 - maintain dialogue references,
 - and understand long-range dependencies efficiently.
 For a model of this scale, the 2048-token context window provides unusually strong narrative memory.
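
As a rough illustration of why RoPE helps with relative position, the sketch below rotates a toy 8-dimensional head vector in pure Python. It assumes the half-split pairing and the `rope_theta` of 10000 listed in `config.json`. The helper names `rope` and `dot` are hypothetical, and this is not the model's actual implementation.

```python
import math


def rope(vec, position, theta=10000.0):
    """Rotate one head vector by position-dependent angles (half-split pairing)."""
    dim = len(vec)
    half = dim // 2
    out = [0.0] * dim
    for i in range(half):
        # Each pair (i, i + half) rotates at its own frequency.
        angle = position * theta ** (-2.0 * i / dim)
        cos_a, sin_a = math.cos(angle), math.sin(angle)
        x1, x2 = vec[i], vec[i + half]
        out[i] = x1 * cos_a - x2 * sin_a
        out[i + half] = x2 * cos_a + x1 * sin_a
    return out


def dot(a, b):
    """Plain dot product, standing in for an attention score."""
    return sum(x * y for x, y in zip(a, b))


# Toy query and key vectors; the real model uses head_dim = 64.
q = [0.1 * (i + 1) for i in range(8)]
k = [0.05 * (8 - i) for i in range(8)]

# Both pairs are three positions apart, at very different absolute offsets.
score_near = dot(rope(q, 5), rope(k, 2))
score_far = dot(rope(q, 105), rope(k, 102))
print(abs(score_near - score_far) < 1e-9)
```

The two scores agree up to floating-point error because the rotated dot product depends only on the distance between the two positions, not on where they sit in the sequence.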
 ---
 # The Evolution of Learning
 Training Test-1-3000 revealed clear emergent phases of cognitive development.
 The model did not merely memorize text patterns — it progressively developed increasingly sophisticated representations of narrative structure and world dynamics.
 ---
 # The Lexical Phase  
 ## *(Steps 0 → 250)*
 At the beginning of training, the model learned the statistical foundations of language.
 It discovered:
 - common sentence structures,
 - punctuation behavior,
 - frequent vocabulary patterns,
 - and story-opening syntax.
 During this phase, phrases such as:
 > "Once upon a time"
 became strong narrative anchors.
 The model began constructing basic grammatical fluency but still lacked deeper logical understanding.
 ### Characteristics
 - High repetition
 - Weak memory
 - Poor event continuity
 - Basic syntax acquisition
 ---
 # The Relational Phase  
 ## *(Steps 250 → 1000)*
 The model started connecting concepts together into meaningful relationships.
 It learned:
 - object interactions,
 - spatial reasoning,
 - basic causality,
 - and action consistency.
 For example:
 - parks imply trees and playing,
 - rain implies umbrellas or wetness,
 - sadness often precedes comfort or resolution.
 The training loss rapidly decreased below **1.5**, signaling major improvements in structural reasoning.
 ### Emergent Behaviors
 - Scene consistency
 - Character-action alignment
 - Basic emotional logic
 - Improved descriptive continuity
 ---
 # The Coherence Phase  
 ## *(Steps 1000 → 2000)*
 This phase marked the emergence of true narrative stabilization.
 The model learned:
 - story pacing,
 - setup/payoff relationships,
 - conflict resolution,
 - and multi-sentence thematic continuity.
 Stories no longer collapsed into unrelated fragments.
 Instead, the model began maintaining:
 - stable goals,
 - emotional arcs,
 - and logical conclusions.
 If a story introduced a problem:
 > "Lily was lonely."
 the model increasingly learned to produce meaningful emotional resolutions later in the text.
 ### Major Improvements
 - Long-range memory
 - Reduced contradiction
 - Better endings
 - Stronger narrative flow
 - Lower hallucination frequency
 Final loss at this stage:
 | Step | Loss |
 |---|---|
 | 2000 | **1.27** |
 ---
 # The Emergent Narrative Intelligence Phase  
 ## *(Steps 2000 → 3000)*
 This final stage represented a major leap in generative sophistication.
 Rather than simply maintaining coherence, the model began exhibiting signs of:
 - implicit world modeling,
 - narrative anticipation,
 - emotional persistence,
 - and latent planning behavior.
 The model increasingly understood that stories possess:
 - momentum,
 - consequences,
 - emotional gravity,
 - and thematic closure.
 Characters began behaving more consistently across long contexts.
 Events earlier in a story influenced later generated text more reliably.
 The model also became significantly better at:
 - avoiding repetitive loops,
 - maintaining tone,
 - preserving narrative identity,
 - and generating cleaner transitions between scenes.
 ### Emergent Capabilities
 - Multi-event causal chaining
 - Persistent emotional tone
 - Improved dialogue continuity
 - Better conflict resolution
 - Reduced topic drift
 - More natural pacing
 - Stronger thematic stability
 Most importantly:
 > The model began generating stories that feel intentionally written rather than statistically assembled.
 ---
 # Final Training Statistics
 | Metric | Value |
 |---|---|
 | Final Step | 3000 |
 | Final Loss | **0.8516** |
 | Training Stability | Excellent |
 | Gradient Behavior | Stable |
 | Divergence Events | None Observed |
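
To put the loss values in more familiar terms, they can be converted to perplexity. This is a minimal sketch, assuming the reported numbers are mean per-token cross-entropy in nats (the card does not say so explicitly). The helper name `perplexity` is hypothetical.

```python
import math


def perplexity(loss_nats):
    """Perplexity is the exponential of the mean per-token cross-entropy (in nats)."""
    return math.exp(loss_nats)


# Loss values reported in this card at steps 2000 and 3000.
reported = [(2000, 1.27), (3000, 0.8516)]
for step, loss in reported:
    print(f"step {step}: loss {loss} -> perplexity {perplexity(loss):.2f}")
```

Under that assumption the perplexity falls from roughly 3.6 at step 2000 to roughly 2.3 at step 3000. These are training-loss figures, so they say nothing on their own about held-out performance.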
 ---
 # Training Configuration
 ## Hyperparameters
 | Parameter | Value |
 |---|---|
 | Optimizer | AdamW |
 | Betas | β₁=0.9, β₂=0.95 |
 | Learning Rate | 5e-4 |
 | Scheduler | OneCycleLR |
 | Weight Decay | 0.01 |
 | Precision | bfloat16 |
 | Compilation | torch.compile |
 | Attention Optimization | Flash Attention 2 |
 | Effective Batch Size | ~262,144 Tokens / Step |
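
The batch size and step count imply an approximate token budget. The arithmetic below is a back-of-the-envelope sketch that assumes every sequence is packed to the full 2048-token context and that all 3000 steps used the same effective batch. The variable names are illustrative only.

```python
# Figures taken from this card.
tokens_per_step = 262_144
context_length = 2048
total_steps = 3000
parameter_count = 190.55e6

# Assumption: each sequence fills the whole context window.
sequences_per_step = tokens_per_step // context_length
total_tokens = tokens_per_step * total_steps
tokens_per_parameter = total_tokens / parameter_count

print(f"sequences per step:   {sequences_per_step}")
print(f"total tokens seen:    {total_tokens:,}")
print(f"tokens per parameter: {tokens_per_parameter:.1f}")
```

That works out to 128 full-length sequences per step and about 786 million tokens over the run, or roughly four tokens per parameter.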
 ---
 # Dataset
 ## TinyStories (2M)
 Test-1-3000 was trained on the **TinyStories** dataset.
 TinyStories is uniquely valuable because it isolates:
 - narrative structure,
 - reasoning,
 - consistency,
 - and causality
 without the overwhelming informational noise of the open web.
 The stories use:
 - child-level vocabulary,
 - but professionally structured narrative composition.
 This creates an ideal environment for studying emergent reasoning inside small language models.
 ---
 # Training Philosophy
 The project intentionally prioritizes:
 - coherence over memorization,
 - reasoning over factual retrieval,
 - and narrative intelligence over benchmark chasing.
 The goal is not merely to create a chatbot.
 The goal is to study:
 > how structured cognition emerges inside compact neural systems.
 ---
 # Usage: Quick Start
 Install dependencies:
 ```bash
 pip install transformers torch accelerate
 ```
 ---
 ## Inference Example
 ```python
 import torch
 from transformers import AutoModelForCausalLM, AutoTokenizer
 model_path = "GODELEV/Test-1-3000"
 # Load Tokenizer and Model
 tokenizer = AutoTokenizer.from_pretrained(model_path)
 model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    device_map="auto"
 )
 # Prompt
 prompt = "Once upon a time, Tom found a blue car."
 inputs = tokenizer(
    prompt,
    return_tensors="pt"
 ).to(model.device)
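 # Generate (config.json ships with use_cache=false, so re-enable the KV cache for faster decoding)
 output = model.generate(
    **inputs,
    max_new_tokens=200,
    temperature=0.7,
    top_p=0.9,
    repetition_penalty=1.1,
    do_sample=True,
    eos_token_id=tokenizer.eos_token_id,
    pad_token_id=tokenizer.pad_token_id,
    use_cache=True
 )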
 print(tokenizer.decode(output[0], skip_special_tokens=True))
 ```
 ---
 # Recommended Generation Settings
 | Parameter | Recommended |
 |---|---|
 | Temperature | 0.7 |
 | Top-p | 0.9 |
 | Repetition Penalty | 1.1 |
 | Max Tokens | 128–512 |
 | Sampling | Enabled |
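
To show how these settings interact, here is a simplified pure-Python sketch of a single sampling step: repetition penalty first, then temperature scaling, then top-p (nucleus) filtering. It is an illustration under stated assumptions, not the code `transformers` runs. The function `sample_next_token` and the toy logits are hypothetical.

```python
import math
import random


def sample_next_token(logits, generated_ids, temperature=0.7, top_p=0.9,
                      repetition_penalty=1.1, rng=random):
    """Pick one token id from raw logits using the recommended settings."""
    logits = list(logits)
    # Repetition penalty: make already generated tokens less likely.
    for tok in set(generated_ids):
        if logits[tok] > 0:
            logits[tok] /= repetition_penalty
        else:
            logits[tok] *= repetition_penalty
    # Temperature below 1 sharpens the distribution.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Nucleus filtering: keep the smallest set of top tokens whose mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    weights = [probs[i] for i in kept]
    return rng.choices(kept, weights=weights, k=1)[0]


# Toy vocabulary of four tokens; token 3 is very unlikely and gets filtered out.
toy_logits = [5.0, 4.0, 0.0, -3.0]
rng = random.Random(0)
samples = [sample_next_token(toy_logits, generated_ids=[], rng=rng) for _ in range(100)]
print(sorted(set(samples)))
```

With these toy logits the two most likely tokens already cover more than 90% of the probability mass, so the low-probability tail is never sampled. Lower `top_p` or `temperature` values narrow the choice further, and higher values make the output more varied.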
 ---
 # Observed Emergent Behaviors
 During evaluation, the model demonstrated:
 - Character persistence
 - Goal-oriented progression
 - Emotional continuity
 - Environmental consistency
 - Contextual callbacks
 - Story resolution awareness
 These behaviors are especially notable given the model's relatively small parameter count.
 ---
 # Limitations
 Although highly capable for its size, Test-1-3000 still has limitations:
 - Limited factual world knowledge
 - Occasional repetition in very long generations
 - Reduced reasoning performance outside storytelling domains
 - Less stable beyond trained narrative styles
 The model is optimized specifically for:
 > coherent short-form storytelling.
 ---
 # 📜 Citation
 ```bibtex
@misc{test13000,
  title={Test-1-3000: A 190M Parameter Narrative Intelligence Engine},
  author={GODELEV},
  year={2026},
  note={Compact narrative-focused language model trained on TinyStories}
 }
 ```
 ---
 # License
 This project is released under the MIT license (see the metadata header) and is intended for:
 - research,
 - experimentation,
 - educational use,
 - and open exploration of compact language models.
 ---
 # Final Thoughts
 Test-1-3000 demonstrates that meaningful narrative intelligence can emerge inside surprisingly small neural systems when training is focused, clean, and structurally optimized.
 At only **190M parameters**, the model exhibits behaviors often associated with significantly larger systems:
 - narrative planning,
 - emotional continuity,
 - causal consistency,
 - and coherent resolution generation.
 The project serves as both:
 - a practical storytelling model,
 - and an experiment in emergent cognition within compact architectures.
 ---
 <p align="center">
 ### “Small models are not weak models.  
 ### They are compressed intelligence waiting to emerge.”
 </p>
--- a/config.json
+++ b/config.json
@@ -0,0 +1,32 @@
 {
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 1,
  "dtype": "float32",
  "eos_token_id": 2,
  "head_dim": 64,
  "hidden_act": "silu",
  "hidden_size": 896,
  "initializer_range": 0.02,
  "intermediate_size": 2432,
  "max_position_embeddings": 2048,
  "mlp_bias": false,
  "model_type": "llama",
  "num_attention_heads": 14,
  "num_hidden_layers": 12,
  "num_key_value_heads": 2,
  "pad_token_id": null,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_parameters": {
    "rope_theta": 10000.0,
    "rope_type": "default"
  },
  "tie_word_embeddings": false,
  "transformers_version": "5.6.2",
  "use_cache": false,
  "vocab_size": 50257
 }
--- a/generation_config.json
+++ b/generation_config.json
@@ -0,0 +1,9 @@
 {
  "_from_model_config": true,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "output_attentions": false,
  "output_hidden_states": false,
  "transformers_version": "5.6.2",
  "use_cache": false
 }
--- a/model.safetensors
+++ b/model.safetensors
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:e1dcdfb41c634038eea67bfbd7c01de8fea575aee2432c89b8eb23cd6ea3d817
 size 762210848
--- a/tokenizer.json
+++ b/tokenizer.json
--- a/tokenizer_config.json
+++ b/tokenizer_config.json
@@ -0,0 +1,13 @@
 {
  "add_prefix_space": false,
  "backend": "tokenizers",
  "bos_token": "<|endoftext|>",
  "eos_token": "<|endoftext|>",
  "errors": "replace",
  "is_local": false,
  "local_files_only": false,
  "model_max_length": 2048,
  "pad_token": "<|endoftext|>",
  "tokenizer_class": "GPT2Tokenizer",
  "unk_token": "<|endoftext|>"
 }