Initialize project; model provided by the ModelHub XC community

Model: StentorLabs/Stentor-30M
Source: Original Platform
ModelHub XC
2026-05-29 17:12:20 +08:00
commit c9fae039cc
10 changed files with 276577 additions and 0 deletions

37
.gitattributes vendored Normal file

@@ -0,0 +1,37 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
training_loss.png filter=lfs diff=lfs merge=lfs -text
training_perplexity.png filter=lfs diff=lfs merge=lfs -text

655
README.md Normal file

@@ -0,0 +1,655 @@
---
language:
- en
license: apache-2.0
library_name: transformers
tags:
- text-generation
- llama
- small-language-model
- efficient
- edge-deployment
- speculative-decoding
- tiny-model
- 30m-parameters
- kaggle-trained
- educational
- research
- low-resource
- cpu-inference
- mobile-deployment
- synthetic-data
- fineweb
- cosmopedia
pipeline_tag: text-generation
datasets:
- HuggingFaceFW/fineweb-edu
- HuggingFaceTB/smollm-corpus
widget:
- text: "Once upon a time"
  example_title: "Story Generation"
- text: "Explain neural networks in simple terms."
  example_title: "Toy Explanation (Often Wrong)"
- text: "def fibonacci(n):"
  example_title: "Code Continuation"
- text: "[INST]What is machine learning?[/INST]"
  example_title: "Instruction-Style Prompt (Not Tuned)"
model_card_authors:
- StentorLabs
model-index:
- name: Stentor-30M
  results:
  - task:
      type: text-generation
    dataset:
      name: FineWeb-Edu + Cosmopedia v2 (validation split)
      type: mixed
    metrics:
    - name: Validation Loss
      type: loss
      value: 3.4971
    - name: Perplexity
      type: perplexity
      value: 33.02
---
# Stentor-30M
![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)
![Model Size](https://img.shields.io/badge/parameters-30M-green.svg)
![Training Time](https://img.shields.io/badge/training-7.88h-orange.svg)
![Hardware](https://img.shields.io/badge/hardware-1x%20Tesla%20T4-red.svg)
![Context Length](https://img.shields.io/badge/context-512%20tokens-purple.svg)
[![Hugging Face](https://img.shields.io/badge/🤗-Hugging%20Face-yellow.svg)](https://huggingface.co/StentorLabs/Stentor-30M)
[![GGUF](https://img.shields.io/badge/GGUF-mradermacher-blue.svg)](https://huggingface.co/mradermacher/Stentor-30M-GGUF)
Stentor-30M is a highly compact, efficient language model built on the Llama architecture. Designed for speed and low-resource environments, this ~30.4M parameter checkpoint utilizes a mixed-precision training pipeline and is best treated as a **base next-token predictor** (not a chat assistant). It does not "understand" text in a human sense and is not trained to reliably follow instructions. While the tokenizer may include special tokens/templates that resemble instruction or tool formats, the model itself is **not instruction-tuned** and will often generate **plausible but off-topic** text. It serves as an accessible entry point for researching attention mechanisms and testing training pipelines on consumer hardware.
> ⚠️ **Important Limitations**
>
> - **Context Window:** Maximum 512 tokens (very short)
> - **Not Instruction-Tuned:** May ignore prompts or respond off-topic
> - **Stopping / EOS:** The model rarely emits an EOS token on its own; always set `max_new_tokens`
> - **Tokenizer ≠ Capability:** "tool/function" tokens do not imply real tool use
> - **No Safety Tuning:** Base model without RLHF or safety alignment
> - **Limited Knowledge:** 30M parameters = limited world knowledge
> - **Proof-of-Concept:** Not suitable for production without fine-tuning
> - **Educational Focus:** Trained on educational web text (FineWeb-Edu) and synthetic textbooks (Cosmopedia v2), not diverse real-world data
Recommended generation settings (based on manual testing):
- **Max new tokens:** 10-60
- **Temperature:** 1.1-1.4
- **Top-p:** 0.35-0.75
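To make the two sampling knobs above concrete, here is a minimal pure-Python sketch of temperature scaling followed by nucleus (top-p) filtering. It is illustrative only: the function name `sample_top_p` is hypothetical, and this is not the Transformers implementation, which operates on tensors and has additional options.

```python
import math
import random

def sample_top_p(logits, temperature=1.2, top_p=0.55, rng=random):
    """Sample one token index using temperature scaling and top-p filtering."""
    # Temperature > 1 flattens the distribution; temperature < 1 sharpens it.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]  # subtract the max for stability
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the smallest set of most-likely tokens whose cumulative
    # probability reaches top_p.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # Sample among the kept tokens in proportion to their probabilities.
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]

# With one dominant logit, the nucleus contains only that token.
print(sample_top_p([5.0, 1.0, 0.0]))
```

A low top-p narrows the candidate set to the few most likely tokens, which may explain why it pairs well with a fairly high temperature on a model this small.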
Real interactions (sampling is non-deterministic; your outputs may vary):
```text
Max New Tokens: 30
Temp: 1.2
Top p: 0.55
User:
The story of my life is
Generated text:
The story of my life is a tale of the story of the man who has been born in Germany. He was the first to learn about his family, and his story of the
```
```text
Max New Tokens: 30
Temp: 1.2
Top p: 0.7
User:
Biology is the understanding of
Generated text:
Biology is the understanding of nature and animals, not only as a model for biological research but also as a tool for understanding human behavior and conservation. Biological research is about understanding
```
```text
Max New Tokens: 30
Temp: 1.2
Top p: 0.7
User:
Everyone is dead
Generated text:
Everyone is dead: 50 percent of our people will be killed in the coming days of our nation. 60 percent of us will live and go in
```
---
## 🚀 Quick Start
Get up and running in 3 simple steps:
### 1. Install
```bash
pip install transformers torch
```
### 2. Load & Generate
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained("StentorLabs/Stentor-30M")
tokenizer = AutoTokenizer.from_pretrained("StentorLabs/Stentor-30M")
prompt = "The future of AI is"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=50,  # always set this; the model may not stop on its own
    do_sample=True,
    temperature=1.1,
    top_p=0.55,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
### 3. Explore!
- Try different prompts
- Adjust `max_new_tokens`, `temperature`, and `top_p`
---
## 📦 Quantized Versions
Pre-quantized versions of Stentor-30M are available for use with llama.cpp, LM Studio, Ollama, and other compatible runtimes — no conversion needed.
| Format | Provider | Link |
|--------|----------|------|
| GGUF (multiple quants) | mradermacher | [mradermacher/Stentor-30M-GGUF](https://huggingface.co/mradermacher/Stentor-30M-GGUF) |
Just download your preferred quantization (e.g. `Q4_K_M` for a good size/quality balance) and run it directly with llama.cpp or load it in LM Studio.
---
## Model Details
### Model Description
Stentor-30M is a lightweight LlamaForCausalLM model designed to bring the architectural benefits of Llama to a fraction of the size. With a hidden size of 256 and a compact parameter budget, this model is optimized for rapid inference and edge-deployment scenarios where memory is at a premium.
The tokenizer configuration may include control tokens commonly used in instruction/tool-call formatting (for experimentation), but **these tokens do not make the base model instruction-following or tool-using**. If you need reliable instruction following or structured tool calls, you will need additional fine-tuning / alignment.
- **Developed by:** Kai Izumoto (StentorLabs)
- **Funded by:** Self-funded
- **Shared by:** StentorLabs
- **Model type:** LlamaForCausalLM (Auto-regressive Language Model)
- **Language(s):** English
- **License:** Apache-2.0
- **Finetuned from model:** None (Base model trained from scratch)
## Uses
### Direct Use
- **Low-Latency Text Generation:** Due to its compact size (approx. 30.4M parameters), Stentor-30M is suitable for real-time applications on CPU or mobile devices.
- **Instruction-Style Prompting (Limited):** You can *format* prompts using tags like `[INST]`, but the model is **not** instruction-tuned and will often fail to follow the request.
- **Tool-Call Formatting Tokens (Limited):** The tokenizer may include tool-related tokens, but the model is **not** trained to reliably emit valid tool calls/JSON or to "use tools".
- **Edge Deployment:** Ideal for resource-constrained environments including mobile devices, IoT, and embedded systems.
### Downstream Use
- **Speculative Decoding (Experimental):** Stentor-30M can be used as a fast draft model for larger Llama-based models, but speedups depend on how often the larger model accepts the draft tokens (quality limits may reduce gains).
- **Educational/Research:** A perfect "petri dish" model for studying attention mechanics (4 attention heads) and training dynamics without requiring massive compute.
- **Prototyping:** Quick, low-cost experiments focused on latency, sampling behavior, and failure modes before scaling up.
### Out-of-Scope Use
- **Complex Reasoning:** As a 30M parameter model, users should not expect high-level reasoning or deep knowledge retrieval comparable to multi-billion parameter models.
- **Instruction-Following Chatbots:** This is a base model and is not reliably conversational or on-task.
- **Long Context:** The model is optimized for short-context tasks with a maximum position embedding of 512 tokens.
- **Production-Critical Applications:** This is a research/proof-of-concept model and should not be used for mission-critical applications without thorough testing.
## Bias, Risks, and Limitations
- **Context Window:** The model has a hard limit of 512 tokens for context length.
- **Prompt Relevance:** Outputs are often generic or unrelated to the prompt, even when they sound fluent.
- **Knowledge Base:** Limited parameter count restricts the amount of world knowledge the model can store.
- **Training Data Bias:** The model inherits any biases present in the FineWeb-Edu and Cosmopedia v2 datasets.
- **Hallucinations:** Like all language models, Stentor-30M may generate plausible-sounding but factually incorrect information.
- **No Safety Tuning:** This is a base model without safety alignment or RLHF.
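Because the 512-token limit covers the prompt and the generated tokens together, it can help to budget the prompt before calling `generate`. The sketch below works on a plain list of token ids; the helper name `fit_prompt` and its defaults are hypothetical and not part of this repository.

```python
def fit_prompt(token_ids, max_new_tokens=50, context_len=512):
    """Trim a prompt so that prompt + generated tokens fit in the context window."""
    if not 0 < max_new_tokens < context_len:
        raise ValueError("max_new_tokens must be between 1 and context_len - 1")
    budget = context_len - max_new_tokens
    # Keep the most recent tokens; the earliest ones are dropped.
    return token_ids[-budget:]

prompt_ids = list(range(600))            # stand-in for tokenizer output
trimmed = fit_prompt(prompt_ids, max_new_tokens=50)
print(len(trimmed))                      # 462 tokens left for the prompt
```

Note that trimming from the left also drops a leading BOS token if one was present; whether that matters for output quality is untested here.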
### Recommendations
Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. This model is best used for specific, narrow tasks or as a component in a larger system (e.g., speculative decoding) rather than a general-purpose assistant.
## How to Get Started with the Model
### Basic Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
model_id = "StentorLabs/Stentor-30M"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
# The repo may provide a chat template, but this is still a base model.
# Do not expect reliable instruction following just because you use chat formatting.
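messages = [
    {"role": "user", "content": "Hello, what are you?"}
]
# return_dict=True yields input_ids and attention_mask, ready to unpack into generate().
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
)
outputs = model.generate(**inputs, max_new_tokens=50)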
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
### Advanced Usage with Tool-Call Formatting (Educational)
```python
# The tokenizer may include tokens that resemble tool/function calling formats.
# The base model is not trained to reliably emit valid tool calls or structured JSON.
# Reuses `tokenizer` and `model` from the Basic Usage example.
messages = [
    {"role": "system", "content": "You are a tiny base language model. You do not have tool access."},
    {"role": "user", "content": "What's the weather like?"}
]
inputs = tokenizer.apply_chat_template(messages, return_dict=True, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
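For plain user/assistant conversations without tools, the string that the repository's `chat_template.jinja` builds can be approximated in a few lines of Python. This is a simplified sketch based on reading that template: the function name `render_chat` is hypothetical, tool handling is omitted, and exact whitespace may differ from the real Jinja rendering.

```python
BOS, EOS = "<s>", "</s>"  # bos_token / eos_token from tokenizer_config.json

def render_chat(messages):
    """Approximate the no-tools path of chat_template.jinja."""
    system = None
    if messages and messages[0]["role"] == "system":
        system, messages = messages[0]["content"], messages[1:]

    out = BOS
    for i, msg in enumerate(messages):
        # Roles must alternate user/assistant, starting with user.
        if (msg["role"] == "user") != (i % 2 == 0):
            raise ValueError("roles must alternate user/assistant/...")
        if msg["role"] == "user":
            is_last = i == len(messages) - 1
            if is_last and system is not None:
                # The system message is folded into the final user turn.
                out += "[INST] " + system + "\n\n" + msg["content"] + "[/INST]"
            else:
                out += "[INST] " + msg["content"] + "[/INST]"
        else:
            out += " " + msg["content"].strip() + EOS
    return out

print(render_chat([
    {"role": "system", "content": "Be brief."},
    {"role": "user", "content": "Hi"},
]))
```

Seeing the raw string makes it clear that the "chat" format is only text wrapping: the base model receives `[INST] ... [/INST]` as ordinary tokens and simply continues from there.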
## Detailed Use Cases
### 1. Speculative Decoding with Llama 3
Potentially speed up larger model inference by using Stentor-30M as a draft model (results vary):
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
# Load draft model (Stentor-30M)
draft_model = AutoModelForCausalLM.from_pretrained("StentorLabs/Stentor-30M")
draft_tokenizer = AutoTokenizer.from_pretrained("StentorLabs/Stentor-30M")
# Load target model
target_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B")
target_tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B")
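# Use speculative decoding (requires a recent Transformers version that supports `assistant_model`).
# Stentor-30M (32,768-token vocab) and Llama 3.2 use different tokenizers, so both
# tokenizers are passed to generate() for assisted decoding across vocabularies.
prompt = "Explain machine learning"
inputs = target_tokenizer(prompt, return_tensors="pt")
outputs = target_model.generate(
    **inputs,
    assistant_model=draft_model,  # Stentor-30M as draft
    tokenizer=target_tokenizer,
    assistant_tokenizer=draft_tokenizer,
    do_sample=True,
    max_new_tokens=100,
)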
print(target_tokenizer.decode(outputs[0], skip_special_tokens=True))
```
### 2. Run with llama.cpp / LM Studio / Ollama (GGUF)
Pre-quantized GGUF files are available at [mradermacher/Stentor-30M-GGUF](https://huggingface.co/mradermacher/Stentor-30M-GGUF) — no conversion required.
```bash
# Download a quantized GGUF (e.g. Q4_K_M) from the link above, then run with llama.cpp:
./llama-cli -m stentor-30m-Q4_K_M.gguf -p "Hello world" -n 50
```
Or simply load the `.gguf` file directly in **LM Studio** or **Ollama** for a GUI/API experience.
### 3. Edge Deployment with ONNX
Convert to ONNX for mobile/edge deployment:
```bash
# Install dependencies
pip install "optimum[exporters,onnxruntime]"
# Export to ONNX
optimum-cli export onnx \
--model StentorLabs/Stentor-30M \
--task text-generation-with-past \
stentor-30m-onnx/
```
```python
# Use with ONNX Runtime
from optimum.onnxruntime import ORTModelForCausalLM
from transformers import AutoTokenizer
model = ORTModelForCausalLM.from_pretrained("stentor-30m-onnx")
tokenizer = AutoTokenizer.from_pretrained("StentorLabs/Stentor-30M")
inputs = tokenizer("Hello world", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0]))
```
### 4. Rapid Prototyping
Quick experimentation before scaling:
```python
# These "tasks" are intentionally broad: this tiny base model will often fail.
# The point is to observe latency, failure modes, and sampling behavior.
from transformers import pipeline
generator = pipeline("text-generation", model="StentorLabs/Stentor-30M")
test_prompts = [
"Summarize this: [long text]",
"Translate to French: Hello",
"Answer: What is 2+2?"
]
for prompt in test_prompts:
    result = generator(prompt, max_new_tokens=30)[0]['generated_text']
    print(f"Prompt: {prompt}\nResult: {result}\n")
```
## Quantize It Yourself
If you want to produce your own quantized versions rather than using the pre-built GGUFs:
### 8-bit Quantization (bitsandbytes)
```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
quantization_config = BitsAndBytesConfig(load_in_8bit=True)
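model = AutoModelForCausalLM.from_pretrained(
    "StentorLabs/Stentor-30M",
    quantization_config=quantization_config,
    device_map="auto",
)
# Weights-only estimate: ~30 MB (~50% of fp16) if every parameter were stored in 8 bits.
# Actual usage is higher: bitsandbytes quantizes linear layers, not the embeddings.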
```
### 4-bit Quantization (bitsandbytes)
```python
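from transformers import AutoModelForCausalLM, BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(load_in_4bit=True)
model = AutoModelForCausalLM.from_pretrained(
    "StentorLabs/Stentor-30M",
    quantization_config=quantization_config,
    device_map="auto",
)
# Weights-only estimate: ~15 MB (~25% of fp16) if every parameter were stored in 4 bits.
# Actual usage is higher: bitsandbytes quantizes linear layers, not the embeddings.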
```
**Note:** Requires `bitsandbytes` library: `pip install bitsandbytes`
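The memory figures quoted in the comments above are weights-only estimates, and they can be reproduced from the parameter count alone. The sketch below assumes every parameter is stored at the given width and ignores quantization metadata, layers kept in higher precision, activations, and the KV cache.

```python
TOTAL_PARAMS = 30_419_712  # "Total Parameters" from the training details

bytes_per_param = {"fp32": 4, "fp16": 2, "int8": 1, "int4": 0.5}

weights_mb = {
    name: TOTAL_PARAMS * width / 1e6
    for name, width in bytes_per_param.items()
}

for name, mb in weights_mb.items():
    print(f"{name}: {mb:.1f} MB")
```

The fp32 figure (about 121.7 MB) lines up with the size of `model.safetensors` in this repository (121,699,864 bytes), which is consistent with the checkpoint being stored in float32.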
### Convert to GGUF Manually
```bash
# Clone llama.cpp
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
# Install dependencies
pip install -r requirements.txt
# Download model
huggingface-cli download StentorLabs/Stentor-30M --local-dir stentor-30m
# Convert to GGUF
python convert_hf_to_gguf.py stentor-30m/ \
--outfile stentor-30m.gguf \
--outtype f16
# Build llama.cpp, then quantize (optional)
cmake -B build && cmake --build build --config Release
./build/bin/llama-quantize stentor-30m.gguf stentor-30m-q4_0.gguf q4_0
```
### Convert to TensorFlow Lite (Mobile)
```bash
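# Note: tf2onnx converts TensorFlow/TFLite models *to* ONNX, not the reverse.
# One option for ONNX -> TFLite is the separate onnx2tf tool. This path is
# untested for this model, and decoder-only LLM graphs may not convert cleanly.
# Install dependencies (see the onnx2tf README for its full dependency list)
pip install tensorflow onnx onnx2tf
# First convert to ONNX (see above)
# Then convert ONNX to TFLite; the .tflite files are written to the output folder
onnx2tf -i stentor-30m-onnx/model.onnx -o stentor-30m-tflite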
```
**Format summary:**
- **GGUF:** C++ applications, llama.cpp, LM Studio, Ollama — [pre-built available](https://huggingface.co/mradermacher/Stentor-30M-GGUF)
- **ONNX:** Cross-platform (Windows/Linux/Mac/Web)
- **TFLite:** Android/iOS mobile apps
---
## Training Details
### Training Data
The model was trained on a high-quality mixed dataset focused on educational content and synthetic textbook data:
- **FineWeb-Edu** ([HuggingFaceFW/fineweb-edu](https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu)): A dataset filtered for educational quality.
- **Cosmopedia v2** ([HuggingFaceTB/smollm-corpus](https://huggingface.co/datasets/HuggingFaceTB/smollm-corpus)): A corpus of synthetic textbooks and stories.
**Total tokens processed:** 600,000,512 tokens
### Training Procedure
The model was trained using a custom script in a Kaggle Jupyter environment, demonstrating the accessibility of training efficient models on free-tier compute.
#### Preprocessing
The training pipeline utilized lightweight but effective preprocessing steps:
- **Cleaning:** Unicode normalization (NFKC) and whitespace stripping/normalization.
- **Formatting:** Optional wrapping for chat formats or `<think>` tokens.
- **Packing:** Sequence packing into fixed block_size chunks to maximize training efficiency.
- **Tokenization:** Standard Llama tokenization with EOS tokens appended.
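The training script itself is not included in this repository, so the following is only a minimal sketch of the cleaning, EOS-appending, and packing steps listed above. The toy tokenizer, the function names, and the choice to drop the trailing partial block are all assumptions made for illustration.

```python
import unicodedata

def clean(text):
    """NFKC-normalize the text and collapse runs of whitespace."""
    return " ".join(unicodedata.normalize("NFKC", text).split())

def pack(documents, tokenize, eos_id, block_size):
    """Tokenize documents, append EOS to each, and cut the stream into full blocks."""
    stream = []
    for doc in documents:
        stream.extend(tokenize(clean(doc)))
        stream.append(eos_id)
    n_full = len(stream) // block_size  # assumption: the trailing partial block is dropped
    return [stream[i * block_size:(i + 1) * block_size] for i in range(n_full)]

# Toy stand-in tokenizer (hypothetical): one integer id per whitespace-separated word.
vocab = {}
def toy_tokenize(text):
    return [vocab.setdefault(word, len(vocab) + 10) for word in text.split()]

# "\ufb01" is the "fi" ligature, which NFKC expands to the two letters "fi".
docs = ["Hello   world", "\ufb01ne day today"]
blocks = pack(docs, toy_tokenize, eos_id=2, block_size=4)
print(blocks)
```

Packing lets documents of different lengths share a block, so no positions are wasted on padding; the EOS token marks where one document ends and the next begins.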
#### Training Hyperparameters
<details>
<summary><b>Click to view full training configuration</b></summary>
| Hyperparameter | Value |
|----------------|-------|
| Precision | fp16 mixed precision |
| Optimizer | AdamW |
| Scheduler | Cosine |
| Learning Rate | 0.0008 |
| Weight Decay | 0.01 |
| Warmup Ratio | 0.02 |
| Stable Ratio | 0.8 |
| Total Batch Size | 256 |
| Max Train Steps | 4,578 |
| Evaluation Steps | 100 |
| Gradient Accumulation | 64 |
</details>
#### Speeds, Sizes, Times
- **Training Time:** 28,367.5 seconds (~7.88 hours)
- **Hardware:** 1x Tesla T4 (`num_processes: 1`)
- **Vocab Size:** 32,768 (padded to multiple of 128)
- **Sequence Length:** 512 tokens
- **Tokens per Second (avg):** ~21,137 TPS
- **Total Parameters:** 30,419,712
- **Embedding Parameters:** 8,388,608 (27.6% of total)
> **Note:** A significant portion of parameters are allocated to embeddings due to the 32K vocabulary size. For future iterations, a smaller vocabulary (8K-16K) could free up capacity for additional model layers.
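The step count and throughput can be cross-checked from the figures reported above. The sketch assumes that "Total Batch Size" counts sequences per optimizer step, which the card does not state explicitly.

```python
import math

total_tokens = 600_000_512   # "Total tokens processed"
seq_len = 512                # "Sequence Length"
total_batch = 256            # "Total Batch Size" (assumed: sequences per optimizer step)
grad_accum = 64              # "Gradient Accumulation"
train_seconds = 28_367.5     # "Training Time"

micro_batch = total_batch // grad_accum            # sequences per forward/backward pass
tokens_per_step = total_batch * seq_len            # tokens consumed per optimizer step
steps = math.ceil(total_tokens / tokens_per_step)  # optimizer steps needed
throughput = total_tokens / train_seconds          # tokens per second, whole run

print(micro_batch, tokens_per_step, steps, round(throughput))
```

Under that assumption the arithmetic reproduces the 4,578 max train steps from the hyperparameter table, with a micro-batch of 4 sequences. The whole-run throughput comes out close to, but not identical to, the reported average of ~21,137 tokens per second.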
---
## Evaluation
### Testing Data, Factors & Metrics
#### Testing Data
Evaluation was performed on a held-out validation split of the mixed FineWeb-Edu and Cosmopedia dataset.
#### Metrics
- **Validation Loss:** Measures how well the model predicts the next token (lower is better).
- **Perplexity (PPL):** The exponential of the loss, indicating how "surprised" the model is by new text (lower is better).
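Since perplexity is the exponential of the cross-entropy loss (in nats), the reported pairs can be checked directly. The loss values below are copied from the results in this card.

```python
import math

best_eval_loss = 3.4971   # best validation loss (step 4500)
final_eval_loss = 3.4975  # final validation loss

best_ppl = math.exp(best_eval_loss)
final_ppl = math.exp(final_eval_loss)

print(f"best PPL:  {best_ppl:.2f}")
print(f"final PPL: {final_ppl:.2f}")
```

A perplexity near 33 can be read loosely as the model being about as uncertain as if it were choosing uniformly among 33 tokens at each position.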
### Results
![Training Loss Curve](training_loss.png)
![Training Perplexity Curve](training_perplexity.png)
| Metric | Value |
|--------|-------|
| **Validation Loss** | 3.4971 (best @ step 4500) |
| **Perplexity** | 33.02 |
#### Training Progress
The model showed steady improvement throughout training:
- Initial train loss (step 25): 9.4245
- Mid-training train loss (step 2300): 3.7579
- Final train loss (step 4575): 3.2368
- Best eval loss: 3.4971 (step 4500)
- Final eval loss / PPL: 3.4975 / 33.03
> **Note:** As a 30M parameter base model, this checkpoint should be treated as a functional proof-of-concept baseline. It has not been evaluated on external benchmarks such as MMLU or GSM8K.
---
## Technical Specifications
### Model Architecture and Objective
<details>
<summary><b>Click to view full architecture specifications</b></summary>
Stentor-30M utilizes the Llama architecture with the following specific configuration:
| Component | Value |
|-----------|-------|
| Hidden Size | 256 |
| Intermediate Size | 1024 |
| Num Hidden Layers | 21 |
| Attention Heads | 4 |
| Key/Value Heads | 4 |
| Hidden Activation | SiLU |
| RoPE Theta | 10000.0 |
| Max Position Embeddings | 512 |
| Vocab Size | 32,768 |
| Tie Word Embeddings | True |
> **Architecture Note:** This configuration is set to 21 layers to keep total parameters in the 30M-31M target range with a 32,768-token vocabulary.
</details>
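The total parameter count can be derived from the architecture table. The sketch below assumes the standard Llama layout (bias-free q/k/v/o projections, a gated MLP with three matrices, two RMSNorm vectors per layer, one final RMSNorm) and takes `head_dim = 64` from `config.json`.

```python
hidden, inter, layers, vocab = 256, 1024, 21, 32_768
heads, kv_heads, head_dim = 4, 4, 64

embed = vocab * hidden                       # token embeddings, tied with the output head
attn = hidden * heads * head_dim             # q_proj
attn += 2 * hidden * kv_heads * head_dim     # k_proj and v_proj
attn += heads * head_dim * hidden            # o_proj
mlp = 3 * hidden * inter                     # gate_proj, up_proj, down_proj
norms = 2 * hidden                           # two RMSNorm weight vectors per layer

per_layer = attn + mlp + norms
total = embed + layers * per_layer + hidden  # + final RMSNorm

print(f"per layer: {per_layer:,}")
print(f"total:     {total:,}")
print(f"embedding share: {embed / total:.1%}")
```

With tied embeddings the output head adds no parameters, and the result matches the 30,419,712 total and the 27.6% embedding share reported in the training details.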
### Compute Infrastructure
The model was trained using standard cloud infrastructure available to researchers and students.
#### Hardware
- **GPUs:** 1x NVIDIA Tesla T4 (16GB)
- **Platform:** Kaggle Notebooks (free tier)
- **Compute Type:** Cloud-based
#### Software
- **Transformers Version:** 5.2.0
- **PyTorch Version:** Latest stable
- **Torch Compile:** False (disabled for notebook stability)
- **Accelerate:** Enabled for training
---
## Environmental Impact
- **Hardware Type:** 1x NVIDIA Tesla T4
- **Hours used:** ~7.88 hours
- **Cloud Provider:** Kaggle
- **Compute Region:** US West
- **Carbon Emitted:** ~160 gCO2e (estimated)
Training on free-tier cloud GPUs demonstrates the accessibility of small language model research to students and independent researchers.
---
## Related Resources
### Official Resources
- 📊 Best model artifact: `results/best_model` (config + tokenizer + weights + metadata)
- 🎓 [Model Card Methodology](https://arxiv.org/abs/1810.03993) - Mitchell et al., 2018
### Quantized Versions
- 🗜️ [mradermacher/Stentor-30M-GGUF](https://huggingface.co/mradermacher/Stentor-30M-GGUF) - GGUF quantizations for llama.cpp, LM Studio, Ollama
### Related Models
- [TinyLlama-1.1B](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0) - Larger alternative (1.1B params)
- [SmolLM-135M](https://huggingface.co/HuggingFaceTB/SmolLM-135M) - Similar size category
- [Llama-3.2-1B](https://huggingface.co/meta-llama/Llama-3.2-1B) - Target model for speculative decoding
### Research Papers
- [Speculative Decoding](https://arxiv.org/abs/2211.17192) - Leviathan et al., 2023
- [Small Language Models Survey](https://arxiv.org/abs/2402.14848) - Survey on efficient LLMs
---
## Citation
```bibtex
@misc{izumoto2026stentor30m,
title={Stentor-30M: A Compact Llama-based Language Model},
author={Kai Izumoto},
year={2026},
publisher={StentorLabs},
howpublished={\url{https://huggingface.co/StentorLabs/Stentor-30M}}
}
```
---
## Glossary
- **NLP (Natural Language Processing):** The field of AI focused on the interaction between computers and human language.
- **PPL (Perplexity):** A measurement of how well a probability model predicts a sample. Lower is generally better.
- **Speculative Decoding:** A technique where a small "draft" model (like Stentor-30M) quickly generates tokens that are then verified by a larger model, speeding up the overall process.
- **SLM (Small Language Model):** Language models with parameters typically under 1B, designed for efficiency and specific tasks.
- **RoPE (Rotary Position Embedding):** A method for encoding position information in transformer models.
- **Edge Deployment:** Running models on resource-constrained devices like mobile phones or IoT devices.
- **GGUF:** A file format used by llama.cpp and compatible runtimes for efficient local inference.
---
## Model Card Contact
For questions, please contact [StentorLabs@gmail.com](mailto:StentorLabs@gmail.com) or open a discussion on the [model repository](https://huggingface.co/StentorLabs/Stentor-30M/discussions).
---
## Acknowledgments
Special thanks to:
- Hugging Face for the transformers library and dataset hosting
- The creators of FineWeb-Edu and Cosmopedia v2 datasets
- Kaggle for providing free GPU compute resources
- [mradermacher](https://huggingface.co/mradermacher) for providing GGUF quantizations
- The open-source community for making accessible AI research possible
---
## Connect & Community
### Stay Updated
- 📧 [Email](mailto:StentorLabs@gmail.com) - Direct contact
- 💬 [HuggingFace Discussions](https://huggingface.co/StentorLabs/Stentor-30M/discussions) - Questions and community chat
### More from StentorLabs
- 🔬 [All Models](https://huggingface.co/StentorLabs) - Browse our model collection
---
<p align="center">
Made with ❤️ by <a href="https://huggingface.co/StentorLabs">StentorLabs</a>
<br>
<i>Democratizing AI through accessible, efficient models</i>
</p>

87
chat_template.jinja Normal file

@@ -0,0 +1,87 @@
{%- if messages[0]["role"] == "system" %}
{%- set system_message = messages[0]["content"] %}
{%- set loop_messages = messages[1:] %}
{%- else %}
{%- set loop_messages = messages %}
{%- endif %}
{%- if not tools is defined %}
{%- set tools = none %}
{%- endif %}
{%- set user_messages = loop_messages | selectattr("role", "equalto", "user") | list %}
{#- This block checks for alternating user/assistant messages, skipping tool calling messages #}
{%- set ns = namespace() %}
{%- set ns.index = 0 %}
{%- for message in loop_messages %}
{%- if not (message.role == "tool" or message.role == "tool_results" or (message.tool_calls is defined and message.tool_calls is not none)) %}
{%- if (message["role"] == "user") != (ns.index % 2 == 0) %}
{{- raise_exception("After the optional system message, conversation roles must alternate user/assistant/user/assistant/...") }}
{%- endif %}
{%- set ns.index = ns.index + 1 %}
{%- endif %}
{%- endfor %}
{{- bos_token }}
{%- for message in loop_messages %}
{%- if message["role"] == "user" %}
{%- if tools is not none and (message == user_messages[-1]) %}
{{- "[AVAILABLE_TOOLS] [" }}
{%- for tool in tools %}
{%- set tool = tool.function %}
{{- '{"type": "function", "function": {' }}
{%- for key, val in tool.items() if key != "return" %}
{%- if val is string %}
{{- '"' + key + '": "' + val + '"' }}
{%- else %}
{{- '"' + key + '": ' + val|tojson }}
{%- endif %}
{%- if not loop.last %}
{{- ", " }}
{%- endif %}
{%- endfor %}
{{- "}}" }}
{%- if not loop.last %}
{{- ", " }}
{%- else %}
{{- "]" }}
{%- endif %}
{%- endfor %}
{{- "[/AVAILABLE_TOOLS]" }}
{%- endif %}
{%- if loop.last and system_message is defined %}
{{- "[INST] " + system_message + "\n\n" + message["content"] + "[/INST]" }}
{%- else %}
{{- "[INST] " + message["content"] + "[/INST]" }}
{%- endif %}
{%- elif message.tool_calls is defined and message.tool_calls is not none %}
{{- "[TOOL_CALLS] [" }}
{%- for tool_call in message.tool_calls %}
{%- set out = tool_call.function|tojson %}
{{- out[:-1] }}
{%- if not tool_call.id is defined or tool_call.id|length != 9 %}
{{- raise_exception("Tool call IDs should be alphanumeric strings with length 9!") }}
{%- endif %}
{{- ', "id": "' + tool_call.id + '"}' }}
{%- if not loop.last %}
{{- ", " }}
{%- else %}
{{- "]" + eos_token }}
{%- endif %}
{%- endfor %}
{%- elif message["role"] == "assistant" %}
{{- " " + message["content"]|trim + eos_token}}
{%- elif message["role"] == "tool_results" or message["role"] == "tool" %}
{%- if message.content is defined and message.content.content is defined %}
{%- set content = message.content.content %}
{%- else %}
{%- set content = message.content %}
{%- endif %}
{{- '[TOOL_RESULTS] {"content": ' + content|string + ", " }}
{%- if not message.tool_call_id is defined or message.tool_call_id|length != 9 %}
{{- raise_exception("Tool call IDs should be alphanumeric strings with length 9!") }}
{%- endif %}
{{- '"call_id": "' + message.tool_call_id + '"}[/TOOL_RESULTS]' }}
{%- else %}
{{- raise_exception("Only user and assistant roles are supported, with the exception of an initial optional system message!") }}
{%- endif %}
{%- endfor %}

32
config.json Normal file

@@ -0,0 +1,32 @@
{
"architectures": [
"LlamaForCausalLM"
],
"attention_bias": false,
"attention_dropout": 0.0,
"bos_token_id": 1,
"dtype": "float32",
"eos_token_id": 2,
"head_dim": 64,
"hidden_act": "silu",
"hidden_size": 256,
"initializer_range": 0.02,
"intermediate_size": 1024,
"max_position_embeddings": 512,
"mlp_bias": false,
"model_type": "llama",
"num_attention_heads": 4,
"num_hidden_layers": 21,
"num_key_value_heads": 4,
"pad_token_id": 2,
"pretraining_tp": 1,
"rms_norm_eps": 1e-05,
"rope_parameters": {
"rope_theta": 10000.0,
"rope_type": "default"
},
"tie_word_embeddings": true,
"transformers_version": "5.2.0",
"use_cache": true,
"vocab_size": 32768
}

10
generation_config.json Normal file

@@ -0,0 +1,10 @@
{
"_from_model_config": true,
"bos_token_id": 1,
"eos_token_id": 2,
"output_attentions": false,
"output_hidden_states": false,
"pad_token_id": 2,
"transformers_version": "5.2.0",
"use_cache": true
}

3
model.safetensors Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:7cd2171cbc7fc7882408d0658847d3c4093b3a7e73e184214b2881d06165d893
size 121699864

275733
tokenizer.json Normal file

File diff suppressed because it is too large

14
tokenizer_config.json Normal file

@@ -0,0 +1,14 @@
{
"add_prefix_space": true,
"bos_token": "<s>",
"clean_up_tokenization_spaces": false,
"eos_token": "</s>",
"legacy": false,
"model_max_length": 512,
"pad_token": "</s>",
"sp_model_kwargs": {},
"spaces_between_special_tokens": false,
"tokenizer_class": "PreTrainedTokenizerFast",
"unk_token": "<unk>",
"use_default_system_prompt": false
}

3
training_loss.png Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:7d0a6448211629c1131dd0ef52ea122a63e4ac083d296327880e58140e332332
size 142350

3
training_perplexity.png Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:b59b7c5cc8e1f438debc8b437df1aec603addb229984150582679ebee60b7c73
size 168440