---
license: apache-2.0
tags:
- text-generation
- llama
- ferrotorch
---

# `ferrotorch/smollm-135m`

SmolLM-135M (HuggingFaceTB/SmolLM-135M). Llama-architecture causal LM, 135M parameters, 30 layers / 9 q-heads / 3 kv-heads (GQA), hidden=576, intermediate=1536, vocab=49152, tie_word_embeddings=true, rope_theta=10000.0. Apache 2.0 license. Pinned as the real-artifact baseline for causal LM parity vs `transformers==4.50.3` (#1147).

## Provenance

* Upstream: `HuggingFaceTB/SmolLM-135M` (apache-2.0).
* Conversion script: [`ferrotorch/scripts/pin_pretrained_llm_weights.py`](https://github.com/dollspace/ferrotorch/blob/main/scripts/pin_pretrained_llm_weights.py).
* Ferrotorch issue: <https://github.com/dollspace/ferrotorch/issues/1147>.
* Number of trainable parameters: **134,515,008**.
* SHA-256 of `model.safetensors` (this file is pinned in
  `ferrotorch-hub/src/registry.rs`): `c7a387d6fe81ca6dd304aeb809bda3932ff1bbef3ca41c9484502f2f448dc093`.
* Config snapshot: hidden=576, layers=30, heads=9, kv_heads=3,
  intermediate=1536, vocab=49152, tie_word_embeddings=True,
  rope_theta=10000.0, rms_norm_eps=1e-05.
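
The parameter count can be cross-checked against the config snapshot. The sketch below is not part of ferrotorch: `LlamaShape`, `smollm_135m`, and `param_count` are hypothetical names, and it assumes the usual Llama layout with no bias terms (q/k/v/o projections, a gated MLP with three matrices, two RMSNorm vectors per layer, one final RMSNorm, and an output head shared with the embedding when `tie_word_embeddings` is set).

```rust
/// Hyperparameters copied from the config snapshot (hypothetical helper type).
struct LlamaShape {
    hidden: u64,
    layers: u64,
    heads: u64,
    kv_heads: u64,
    intermediate: u64,
    vocab: u64,
    tie_word_embeddings: bool,
}

/// Values for SmolLM-135M as listed in this card.
fn smollm_135m() -> LlamaShape {
    LlamaShape {
        hidden: 576,
        layers: 30,
        heads: 9,
        kv_heads: 3,
        intermediate: 1536,
        vocab: 49152,
        tie_word_embeddings: true,
    }
}

/// Counts weights for a Llama-style decoder, assuming no bias terms.
fn param_count(c: &LlamaShape) -> u64 {
    let head_dim = c.hidden / c.heads; // 576 / 9 = 64
    let kv_dim = c.kv_heads * head_dim; // 3 * 64 = 192 (GQA shrinks k/v)
    // q_proj and o_proj are hidden x hidden; k_proj and v_proj are hidden x kv_dim.
    let attn = 2 * c.hidden * c.hidden + 2 * c.hidden * kv_dim;
    // gate_proj, up_proj, and down_proj each hold hidden * intermediate weights.
    let mlp = 3 * c.hidden * c.intermediate;
    // Two RMSNorm weight vectors per layer.
    let norms = 2 * c.hidden;
    let per_layer = attn + mlp + norms;
    let embed = c.vocab * c.hidden;
    // A tied output head reuses the embedding matrix and adds nothing.
    let lm_head = if c.tie_word_embeddings { 0 } else { c.vocab * c.hidden };
    // The trailing c.hidden term is the final RMSNorm.
    embed + c.layers * per_layer + c.hidden + lm_head
}
```

With the values above, each layer contributes 3,540,096 weights and the tied embedding 28,311,552, which sums to the 134,515,008 reported in the list.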

## Value-parity probe

Three extra files are uploaded so the ferrotorch-side harness can
reproduce the parity verdict without re-running the upstream
transformers model:

* `_value_parity_input.txt`: the verbatim prompt string the
  harness tokenizes (`"The quick brown fox jumps over the lazy"`).
* `_value_parity_token_ids.json`: the tokenizer's output for that
  prompt (with the upstream tokenizer's `add_special_tokens=True`).
* `_value_parity_output.bin`: float32 logits dumped from a fresh
  `transformers.AutoModelForCausalLM.from_pretrained(..., torch_dtype=float32)`
  single-prefill forward pass on those token ids (no cache).
  Format: `[u32 ndim][u32 × ndim shape][f32 × prod(shape) data]`,
  little-endian; identical layout to the vision-side dumps.
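
The dump layout described above is simple enough to decode with the standard library alone. The following is a minimal sketch, not the harness's actual reader: `Dump`, `read_u32`, `parse_dump`, and `max_abs_diff` are hypothetical names, rejecting trailing bytes is a choice made here, and maximum absolute difference is only one plausible comparison metric (this card does not state the metric or tolerance the harness uses).

```rust
/// A decoded dump: the shape header and the flat f32 payload.
#[derive(Debug, PartialEq)]
struct Dump {
    shape: Vec<usize>,
    data: Vec<f32>,
}

/// Reads one little-endian u32 at `*pos` and advances the cursor.
fn read_u32(bytes: &[u8], pos: &mut usize) -> Result<u32, String> {
    let end = pos.checked_add(4).ok_or("offset overflow")?;
    let b = bytes.get(*pos..end).ok_or("unexpected end of file")?;
    *pos = end;
    Ok(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

/// Parses `[u32 ndim][u32 x ndim shape][f32 x prod(shape) data]`, little-endian.
fn parse_dump(bytes: &[u8]) -> Result<Dump, String> {
    let mut pos = 0usize;
    let ndim = read_u32(bytes, &mut pos)? as usize;
    let mut shape = Vec::new();
    let mut count = 1usize;
    for _ in 0..ndim {
        let dim = read_u32(bytes, &mut pos)? as usize;
        count = count.checked_mul(dim).ok_or("shape overflow")?;
        shape.push(dim);
    }
    // The payload must hold exactly prod(shape) f32 values (assumption: no trailer).
    let nbytes = count.checked_mul(4).ok_or("shape overflow")?;
    let remaining = bytes.len() - pos;
    if remaining != nbytes {
        return Err(format!("expected {} data bytes, found {}", nbytes, remaining));
    }
    let data = bytes[pos..]
        .chunks_exact(4)
        .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
        .collect();
    Ok(Dump { shape, data })
}

/// Largest element-wise |a - b|, or None when the lengths differ.
/// Note: `f32::max` skips NaN, so NaN differences do not show up here.
fn max_abs_diff(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() {
        return None;
    }
    Some(a.iter().zip(b).map(|(x, y)| (x - y).abs()).fold(0.0f32, f32::max))
}
```

In use, the bytes would come from something like `std::fs::read("_value_parity_output.bin")`, and the decoded `data` would be compared against the flattened logits produced by the ferrotorch model for the same token ids.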

## How to load

```rust
use ferrotorch_hub::{load_pretrained, HfTransformerConfig};
use ferrotorch_llama::{LlamaConfig, LlamaForCausalLM};

// Load the pinned weights as an f32 state dict.
let state = load_pretrained::<f32>("smollm-135m")?;

// Build the model config from the Hugging Face `config.json`.
let hf_cfg = HfTransformerConfig::from_file("config.json")?;
let cfg = LlamaConfig::from_hf(&hf_cfg)?;

// Construct the model and load the weights in strict mode.
let mut model = LlamaForCausalLM::<f32>::new(cfg)?;
model.load_hf_state_dict(&state, /* strict = */ true)?;
```

## Upstream license

```
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    https://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
```