Initialize project; model provided by the ModelHub XC community
Model: GODELEV/Test-1-3000 Source: Original Platform
35
.gitattributes
vendored
Normal file
@@ -0,0 +1,35 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
508
README.md
Normal file
@@ -0,0 +1,508 @@
---
license: mit
datasets:
- roneneldan/TinyStories
language:
- en
pipeline_tag: text-generation
tags:
- text-generation-inference
new_version: GODELEV/Test-1-4000
---

# Test-1-3000: A 190M Parameter Narrative Intelligence Engine


---

# Overview

**Test-1-3000** is a compact, decoder-only Transformer language model built on the **Llama architecture**.

The project explores an important question in language model research:

> *How much narrative reasoning, coherence, and world understanding can emerge inside a small model when it is trained correctly?*

Despite containing only **190.55 million parameters**, Test-1-3000 demonstrates surprisingly advanced abilities in:

- Narrative continuity
- Character persistence
- Long-range memory consistency
- Emotional progression
- Logical event sequencing
- Contextual storytelling stability

The model was trained specifically for **short-form narrative intelligence**, focusing on coherent storytelling rather than broad internet-scale memorization.

Unlike many small models that generate fragmented or repetitive text, Test-1-3000 learns to maintain:

- causal relationships,
- stable story worlds,
- emotional trajectories,
- and meaningful resolutions across long contexts.


---

# Key Highlights

| Feature | Description |
|---|---|
| Architecture | Llama-based Decoder-only Transformer |
| Parameters | 190.55 Million |
| Context Length | 2048 Tokens |
| Final Training Step | 3000 |
| Final Training Loss | **0.8516** |
| Attention Optimization | Flash Attention 2 |
| Compilation | `torch.compile` |
| Precision | bfloat16 Mixed Precision |
| Positional Encoding | Rotary Positional Embeddings (RoPE) |


---

# What Makes Test-1-3000 Special?

Most compact language models struggle with:

- maintaining consistency,
- remembering earlier events,
- resolving story arcs,
- and avoiding repetition.

Test-1-3000 was trained with a different philosophy:

## Narrative Intelligence First

Instead of optimizing for broad factual memorization, the model focuses on:

- temporal continuity,
- event causality,
- emotional logic,
- and narrative closure.

The result is a stable storytelling model capable of generating coherent multi-paragraph narratives with strong thematic flow.


---

# Model Architecture

Test-1-3000 follows a modern, efficient Transformer design optimized for both:

- training stability,
- and inference throughput.

The architecture borrows heavily from the proven Llama design while remaining lightweight enough for experimentation and rapid iteration.
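
One concrete source of inference efficiency is visible in `config.json`: it sets `num_attention_heads` to 14 but `num_key_value_heads` to 2, i.e. grouped-query attention, where groups of 7 query heads share one key/value head. The plain-Python sketch below (no model download) maps each query head to its shared KV head and estimates the key/value cache size for one full-length sequence. The consecutive head grouping follows the usual Hugging Face Llama layout, and the bfloat16 cache dtype is an assumption; real memory use also depends on the runtime.

```python
# Values copied from config.json.
NUM_LAYERS = 12      # num_hidden_layers
NUM_Q_HEADS = 14     # num_attention_heads
NUM_KV_HEADS = 2     # num_key_value_heads
HEAD_DIM = 64        # head_dim
CONTEXT = 2048       # max_position_embeddings

BYTES_PER_VALUE = 2  # assumption: keys/values cached in bfloat16


def kv_head_for(q_head: int) -> int:
    """KV head read by a query head when consecutive query heads share a KV head."""
    group_size = NUM_Q_HEADS // NUM_KV_HEADS  # 7 query heads per KV head
    return q_head // group_size


def kv_cache_bytes(seq_len: int, kv_heads: int) -> int:
    """Bytes for cached keys and values across all layers for one sequence."""
    return 2 * NUM_LAYERS * kv_heads * HEAD_DIM * seq_len * BYTES_PER_VALUE


print([kv_head_for(h) for h in range(NUM_Q_HEADS)])
gqa = kv_cache_bytes(CONTEXT, NUM_KV_HEADS)
mha = kv_cache_bytes(CONTEXT, NUM_Q_HEADS)  # hypothetical: one KV head per query head
print(f"KV cache at {CONTEXT} tokens: {gqa / 2**20:.1f} MiB (GQA) vs {mha / 2**20:.1f} MiB (full MHA)")
```

Under these assumptions the cache is 12 MiB per full-length sequence instead of 84 MiB, a 7× reduction. This only matters when caching is enabled at generation time; `generation_config.json` in this commit sets `use_cache` to false.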

---

# Technical Specifications

The values below follow `config.json` in this repository.

| Feature | Specification |
|---|---|
| Model Type | Decoder-only Transformer (`LlamaForCausalLM`) |
| Hidden Dimension | 896 |
| Layers (Depth) | 12 |
| Attention Heads (Query) | 14 |
| Key/Value Heads | 2 (grouped-query attention) |
| Head Dimension | 64 |
| Intermediate Size | 2432 |
| Activation Function | SwiGLU (SiLU-gated MLP) |
| Normalization | RMSNorm (ε = 1e-5) |
| Vocabulary Size | 50,257 |
| Tokenizer | GPT-2 Tokenizer |
| Context Window | 2048 Tokens |
| RoPE Base (θ) | 10,000 |
| Tied Input/Output Embeddings | No |
| Training Precision | bfloat16 |
| Stored Checkpoint Precision | float32 |
| Attention Backend | Flash Attention 2 |
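
As a cross-check, the 190.55M figure can be reproduced from the `config.json` hyperparameters. The sketch counts the weights of a standard Llama block (bias-free q/k/v/o projections, a three-matrix SwiGLU MLP, two RMSNorm weight vectors per layer) plus a final norm and untied input and output embeddings. That weight layout is the usual `LlamaForCausalLM` one rather than something the README spells out.

```python
# Hyperparameters copied from config.json.
VOCAB = 50_257
HIDDEN = 896
LAYERS = 12
Q_HEADS = 14
KV_HEADS = 2
HEAD_DIM = 64
INTERMEDIATE = 2_432


def llama_param_count() -> int:
    embed = VOCAB * HIDDEN                      # input token embeddings
    lm_head = VOCAB * HIDDEN                    # output projection (tie_word_embeddings=false)
    attn = (HIDDEN * Q_HEADS * HEAD_DIM         # q_proj
            + 2 * HIDDEN * KV_HEADS * HEAD_DIM  # k_proj and v_proj (grouped-query)
            + Q_HEADS * HEAD_DIM * HIDDEN)      # o_proj
    mlp = 3 * HIDDEN * INTERMEDIATE             # gate_proj, up_proj, down_proj
    norms = 2 * HIDDEN                          # input and post-attention RMSNorm weights
    final_norm = HIDDEN                         # RMSNorm before lm_head
    return embed + lm_head + LAYERS * (attn + mlp + norms) + final_norm


total = llama_param_count()
print(f"{total:,} parameters (~{total / 1e6:.2f}M)")
# float32 weights take 4 bytes each; compare with model.safetensors (762,210,848 bytes).
print(f"{total * 4:,} bytes of float32 weights")
```

The count comes out at 190,549,632. The roughly 12 KB gap between the raw float32 weights and the LFS-reported file size is plausibly the safetensors header, though that reading is an inference.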

---

# Positional Understanding with RoPE

Test-1-3000 uses **Rotary Positional Embeddings (RoPE)** to encode the relative positions of tokens throughout long contexts.

This helps the model to:

- track entities across paragraphs,
- preserve story continuity,
- maintain dialogue references,
- and capture long-range dependencies efficiently.

For a model of this scale, the 2048-token context window provides generous narrative memory.
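
The mechanism can be illustrated with a small pure-Python sketch of RoPE as it is commonly applied in Llama-style models: each query/key vector is split into two halves, and each (first-half, second-half) pair is rotated by an angle proportional to the token position. The head size of 64 and base of 10,000 come from `config.json`; the "rotate half" pairing mirrors the usual Llama convention and is an assumption about this checkpoint's exact implementation. The check at the end shows the property that matters: the query/key dot product depends only on the distance between positions.

```python
import math
import random

HEAD_DIM = 64          # head_dim in config.json
ROPE_THETA = 10000.0   # rope_theta in config.json


def apply_rope(vec, position, theta=ROPE_THETA):
    """Rotate each pair (vec[i], vec[i + d/2]) by position * theta**(-2i/d)."""
    d = len(vec)
    half = d // 2
    out = list(vec)
    for i in range(half):
        angle = position * theta ** (-2 * i / d)
        c, s = math.cos(angle), math.sin(angle)
        x1, x2 = vec[i], vec[i + half]
        out[i] = x1 * c - x2 * s
        out[i + half] = x2 * c + x1 * s
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


rng = random.Random(0)
q = [rng.uniform(-1, 1) for _ in range(HEAD_DIM)]
k = [rng.uniform(-1, 1) for _ in range(HEAD_DIM)]

# The same 3-token distance at very different absolute positions gives the same score.
near = dot(apply_rope(q, 5), apply_rope(k, 2))
far = dot(apply_rope(q, 1005), apply_rope(k, 1002))
print(near, far)
```

Because each pair is only rotated, vector norms are preserved, and attention scores see relative offsets rather than absolute positions.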

---

# The Evolution of Learning

Training Test-1-3000 revealed distinct emergent phases of development.

The model did not merely memorize text patterns; it progressively developed more sophisticated representations of narrative structure and world dynamics.


---

# The Lexical Phase
## *(Steps 0 → 250)*

At the beginning of training, the model learned the statistical foundations of language.

It picked up:

- common sentence structures,
- punctuation behavior,
- frequent vocabulary patterns,
- and story-opening syntax.

During this phase, phrases such as

> "Once upon a time"

became strong narrative anchors.

The model developed basic grammatical fluency but still lacked deeper logical understanding.

### Characteristics

- High repetition
- Weak memory
- Poor event continuity
- Basic syntax acquisition


---

# The Relational Phase
## *(Steps 250 → 1000)*

The model started connecting concepts into meaningful relationships.

It learned:

- object interactions,
- spatial reasoning,
- basic causality,
- and action consistency.

For example:

- parks imply trees and playing,
- rain implies umbrellas or wetness,
- sadness often precedes comfort or resolution.

The training loss fell rapidly to below **1.5**, reflecting major improvements in structural reasoning.

### Emergent Behaviors

- Scene consistency
- Character-action alignment
- Basic emotional logic
- Improved descriptive continuity


---

# The Coherence Phase
## *(Steps 1000 → 2000)*

This phase marked the emergence of genuine narrative stability.

The model learned:

- story pacing,
- setup/payoff relationships,
- conflict resolution,
- and multi-sentence thematic continuity.

Stories no longer collapsed into unrelated fragments.

Instead, the model began maintaining:

- stable goals,
- emotional arcs,
- and logical conclusions.

If a story introduced a problem, such as

> "Lily was lonely."

the model increasingly produced a meaningful emotional resolution later in the text.

### Major Improvements

- Long-range memory
- Reduced contradiction
- Better endings
- Stronger narrative flow
- Lower hallucination frequency

Loss at the end of this stage:

| Step | Loss |
|---|---|
| 2000 | **1.27** |


---

# The Emergent Narrative Intelligence Phase
## *(Steps 2000 → 3000)*

This final stage brought a major leap in generative sophistication.

Rather than simply maintaining coherence, the model began exhibiting signs of:

- implicit world modeling,
- narrative anticipation,
- emotional persistence,
- and latent planning behavior.

The model increasingly captured that stories possess:

- momentum,
- consequences,
- emotional gravity,
- and thematic closure.

Characters began behaving more consistently across long contexts.

Events early in a story influenced the text generated later more reliably.

The model also became significantly better at:

- avoiding repetitive loops,
- maintaining tone,
- preserving narrative identity,
- and generating cleaner transitions between scenes.

### Emergent Capabilities

- Multi-event causal chaining
- Persistent emotional tone
- Improved dialogue continuity
- Better conflict resolution
- Reduced topic drift
- More natural pacing
- Stronger thematic stability

Most importantly:

> The model began generating stories that feel intentionally written rather than statistically assembled.


---

# Final Training Statistics

| Metric | Value |
|---|---|
| Final Step | 3000 |
| Final Training Loss | **0.8516** |
| Training Stability | Excellent |
| Gradient Behavior | Stable |
| Divergence Events | None Observed |
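
If the reported losses are mean per-token cross-entropy in nats (the default for PyTorch's cross-entropy loss; the README does not state the unit), they convert to perplexity via PPL = exp(loss). The sketch applies this to the loss values quoted on this page. These are training-loss figures, so they do not by themselves describe held-out performance.

```python
import math

# Loss values quoted in this README (training loss, assumed to be in nats per token).
REPORTED_LOSSES = {
    "relational phase threshold": 1.5,
    "step 2000": 1.27,
    "step 3000 (final)": 0.8516,
}


def perplexity(mean_ce_nats: float) -> float:
    """Perplexity is the exponential of the mean per-token cross-entropy."""
    return math.exp(mean_ce_nats)


for label, loss in REPORTED_LOSSES.items():
    print(f"{label:>28}: loss {loss:.4f} -> perplexity {perplexity(loss):.2f}")
```

A final perplexity of about 2.34 means that, averaged over the training text, the model was roughly as uncertain as a uniform choice among 2.3 tokens at each step.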

---

# Training Configuration

## Hyperparameters

| Parameter | Value |
|---|---|
| Optimizer | AdamW |
| Betas | β₁=0.9, β₂=0.95 |
| Learning Rate | 5e-4 |
| Scheduler | OneCycleLR |
| Weight Decay | 0.01 |
| Precision | bfloat16 |
| Compilation | `torch.compile` |
| Attention Optimization | Flash Attention 2 |
| Effective Batch Size | ~262,144 Tokens / Step |
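
PyTorch's `torch.optim.lr_scheduler.OneCycleLR` warms the learning rate up to a peak and then anneals it far below its starting value. The pure-Python sketch below reproduces the shape of its default two-phase cosine schedule (30% warm-up, `div_factor=25`, `final_div_factor=1e4`) over 3000 steps, treating the table's 5e-4 as `max_lr`. Those defaults, and the assumption that the scheduler ran for exactly 3000 steps, are not stated in the README. The last line multiplies the reported ~262,144 tokens per step by 3000 steps to estimate the total number of training tokens processed.

```python
import math

MAX_LR = 5e-4              # "Learning Rate" row, treated as OneCycleLR's max_lr (assumption)
TOTAL_STEPS = 3000         # final training step
TOKENS_PER_STEP = 262_144  # "~262,144 Tokens / Step"

# PyTorch OneCycleLR defaults (assumed; the README does not list them).
PCT_START = 0.3
DIV_FACTOR = 25.0
FINAL_DIV_FACTOR = 1e4


def _cos_anneal(start: float, end: float, pct: float) -> float:
    """Cosine interpolation from start (pct=0) to end (pct=1)."""
    return end + (start - end) / 2.0 * (math.cos(math.pi * pct) + 1)


def one_cycle_lr(step: int) -> float:
    """Learning rate at a given step of a two-phase cosine one-cycle schedule."""
    initial_lr = MAX_LR / DIV_FACTOR
    min_lr = initial_lr / FINAL_DIV_FACTOR
    warmup_end = PCT_START * TOTAL_STEPS - 1
    last = TOTAL_STEPS - 1
    if step <= warmup_end:
        return _cos_anneal(initial_lr, MAX_LR, step / warmup_end)
    return _cos_anneal(MAX_LR, min_lr, (step - warmup_end) / (last - warmup_end))


for step in (0, 450, 899, 1500, 2999):
    print(f"step {step:>4}: lr = {one_cycle_lr(step):.3e}")

print(f"tokens processed: {TOKENS_PER_STEP * TOTAL_STEPS:,}")
```

One caveat: with its default `cycle_momentum=True`, PyTorch's OneCycleLR also cycles AdamW's β₁ (between 0.85 and 0.95 by default), so the fixed β₁=0.9 in the table would only hold if that option was disabled. The README does not say which setting was used.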

---

# Dataset

## TinyStories (~2M stories)

Test-1-3000 was trained on the **TinyStories** dataset.

TinyStories is uniquely valuable because it isolates

- narrative structure,
- reasoning,
- consistency,
- and causality

without the overwhelming informational noise of the open web.

The stories use child-level vocabulary but carefully structured narrative composition.

This creates an ideal environment for studying emergent reasoning inside small language models.


---

# Training Philosophy

The project intentionally prioritizes:

- coherence over memorization,
- reasoning over factual retrieval,
- and narrative intelligence over benchmark chasing.

The goal is not merely to create a chatbot.

The goal is to study:

> how structured cognition emerges inside compact neural systems.


---

# Usage: Quick Start

Install dependencies:

```bash
pip install transformers torch accelerate
```
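
If you clone this repository with git instead of loading it through `transformers`, note that `.gitattributes` routes `*.safetensors` through Git LFS. Without git-lfs installed, `model.safetensors` arrives as a three-line pointer file rather than the ~762 MB checkpoint. The sketch below detects and parses such a pointer; the pointer text is copied from this commit, and the helper name `parse_lfs_pointer` is hypothetical.

```python
POINTER_TEXT = """version https://git-lfs.github.com/spec/v1
oid sha256:e1dcdfb41c634038eea67bfbd7c01de8fea575aee2432c89b8eb23cd6ea3d817
size 762210848
"""

LFS_SPEC = "https://git-lfs.github.com/spec/v1"


def parse_lfs_pointer(text: str):
    """Return {'oid_algo', 'oid', 'size'} if text is an LFS pointer, else None."""
    fields = {}
    for line in text.strip().splitlines():
        key, sep, value = line.partition(" ")
        if not sep:
            return None  # every pointer line is "key value"
        fields[key] = value
    if fields.get("version") != LFS_SPEC or "oid" not in fields or "size" not in fields:
        return None
    algo, _, digest = fields["oid"].partition(":")
    return {"oid_algo": algo, "oid": digest, "size": int(fields["size"])}


info = parse_lfs_pointer(POINTER_TEXT)
print(info["oid_algo"], info["size"])
```

If this succeeds on your local `model.safetensors`, run `git lfs pull` (or download through the Hugging Face Hub client) to fetch the real file, whose SHA-256 digest should match the `oid`.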

---

## Inference Example

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "GODELEV/Test-1-3000"

# Load tokenizer and model (weights are stored in float32; cast to bfloat16 on load)
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Prompt
prompt = "Once upon a time, Tom found a blue car."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate
output = model.generate(
    **inputs,
    max_new_tokens=200,
    temperature=0.7,
    top_p=0.9,
    repetition_penalty=1.1,
    do_sample=True,
    # config.json lists eos_token_id=2, which is not the GPT-2 <|endoftext|> id (50256),
    # so pass the tokenizer's EOS and pad ids explicitly.
    eos_token_id=tokenizer.eos_token_id,
    pad_token_id=tokenizer.pad_token_id,
    # generation_config.json sets use_cache=false; enabling the KV cache here is an
    # optional change that only speeds up decoding.
    use_cache=True,
)

print(tokenizer.decode(output[0], skip_special_tokens=True))
```
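
The tokenizer's `model_max_length` and the model's `max_position_embeddings` are both 2048 in this repository's config files, so the prompt plus the generated tokens must fit in 2048 positions. A small, hypothetical helper for clamping `max_new_tokens` when a long prompt meets the recommended 128–512 budget might look like this:

```python
CONTEXT_WINDOW = 2048  # model_max_length / max_position_embeddings


def clamp_new_tokens(prompt_tokens: int, requested: int, context: int = CONTEXT_WINDOW) -> int:
    """Largest max_new_tokens that keeps prompt + output inside the context window."""
    if prompt_tokens >= context:
        raise ValueError(f"prompt uses {prompt_tokens} tokens; the window is {context}")
    return min(requested, context - prompt_tokens)


print(clamp_new_tokens(prompt_tokens=1900, requested=512))
```

With the tokenizer from the inference example, `prompt_tokens` would be `inputs["input_ids"].shape[1]`.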

---

# Recommended Generation Settings

| Parameter | Recommended |
|---|---|
| Temperature | 0.7 |
| Top-p | 0.9 |
| Repetition Penalty | 1.1 |
| Max New Tokens | 128–512 |
| Sampling | Enabled |
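
To show what these settings do at a single decoding step, here is a toy pure-Python sketch (not the `transformers` implementation) that applies a repetition penalty, temperature scaling, and top-p (nucleus) filtering to a small made-up logit vector and then samples one token. The penalty rule (divide positive logits and multiply negative ones for already-seen tokens) follows the common CTRL-style convention; the logits, the seen-token set, and the helper names are hypothetical.

```python
import math
import random


def apply_repetition_penalty(logits, seen_ids, penalty=1.1):
    """Make already-generated tokens less likely (CTRL-style penalty)."""
    out = list(logits)
    for i in seen_ids:
        out[i] = out[i] / penalty if out[i] > 0 else out[i] * penalty
    return out


def softmax(logits, temperature=0.7):
    """Temperature below 1 sharpens the distribution; above 1 flattens it."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p=0.9):
    """Keep the smallest set of most likely tokens whose mass reaches top_p, renormalized."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    return {i: probs[i] / mass for i in kept}


def sample_next(logits, seen_ids, rng, temperature=0.7, top_p=0.9, penalty=1.1):
    probs = softmax(apply_repetition_penalty(logits, seen_ids, penalty), temperature)
    nucleus = top_p_filter(probs, top_p)
    ids, weights = zip(*nucleus.items())
    return rng.choices(ids, weights=weights, k=1)[0]


# Hypothetical 5-token vocabulary; token 0 was already generated.
toy_logits = [3.0, 2.5, 1.0, 0.2, -1.0]
print(sample_next(toy_logits, seen_ids={0}, rng=random.Random(42)))
```

With these toy numbers, only tokens 0 and 1 survive the 0.9 nucleus, so the low-probability tail can never be sampled; raising the temperature or `top_p` would let more of it through.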

---

# Observed Emergent Behaviors

During evaluation, the model demonstrated:

- Character persistence
- Goal-oriented progression
- Emotional continuity
- Environmental consistency
- Contextual callbacks
- Story resolution awareness

These behaviors are especially notable given the model's relatively small parameter count.


---

# Limitations

Although highly capable for its size, Test-1-3000 still has limitations:

- Limited factual world knowledge
- Occasional repetition in very long generations
- Reduced reasoning performance outside storytelling domains
- Less stable outside the narrative styles seen in training

The model is optimized specifically for:

> coherent short-form storytelling.


---

# 📜 Citation

```bibtex
@misc{test13000,
  title={Test-1-3000: A 190M Parameter Narrative Intelligence Engine},
  author={GODELEV},
  year={2026},
  note={Compact narrative-focused language model trained on TinyStories}
}
```


---

# License

This project is released under the MIT License (as declared in the model card metadata) and is intended for:

- research,
- experimentation,
- educational use,
- and open exploration of compact language models.


---

# Final Thoughts

Test-1-3000 demonstrates that meaningful narrative intelligence can emerge inside surprisingly small neural systems when training is focused, clean, and structurally optimized.

At only **190M parameters**, the model exhibits behaviors often associated with significantly larger systems:

- narrative planning,
- emotional continuity,
- causal consistency,
- and coherent resolution generation.

The project serves as both:

- a practical storytelling model,
- and an experiment in emergent cognition within compact architectures.


---

<p align="center">

### “Small models are not weak models.
### They are compressed intelligence waiting to emerge.”

</p>
32
config.json
Normal file
@@ -0,0 +1,32 @@
{
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 1,
  "dtype": "float32",
  "eos_token_id": 2,
  "head_dim": 64,
  "hidden_act": "silu",
  "hidden_size": 896,
  "initializer_range": 0.02,
  "intermediate_size": 2432,
  "max_position_embeddings": 2048,
  "mlp_bias": false,
  "model_type": "llama",
  "num_attention_heads": 14,
  "num_hidden_layers": 12,
  "num_key_value_heads": 2,
  "pad_token_id": null,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_parameters": {
    "rope_theta": 10000.0,
    "rope_type": "default"
  },
  "tie_word_embeddings": false,
  "transformers_version": "5.6.2",
  "use_cache": false,
  "vocab_size": 50257
}
9
generation_config.json
Normal file
@@ -0,0 +1,9 @@
{
  "_from_model_config": true,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "output_attentions": false,
  "output_hidden_states": false,
  "transformers_version": "5.6.2",
  "use_cache": false
}
3
model.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:e1dcdfb41c634038eea67bfbd7c01de8fea575aee2432c89b8eb23cd6ea3d817
size 762210848
250306
tokenizer.json
Normal file
File diff suppressed because it is too large
13
tokenizer_config.json
Normal file
@@ -0,0 +1,13 @@
{
  "add_prefix_space": false,
  "backend": "tokenizers",
  "bos_token": "<|endoftext|>",
  "eos_token": "<|endoftext|>",
  "errors": "replace",
  "is_local": false,
  "local_files_only": false,
  "model_max_length": 2048,
  "pad_token": "<|endoftext|>",
  "tokenizer_class": "GPT2Tokenizer",
  "unk_token": "<|endoftext|>"
}