Initialize project; model provided by the ModelHub XC community
Model: ferrotorch/smollm-135m Source: Original Platform
File: README.md (new file, 73 lines)

---
license: apache-2.0
tags:
- text-generation
- llama
- ferrotorch
---

# `ferrotorch/smollm-135m`

SmolLM-135M (`HuggingFaceTB/SmolLM-135M`) is a Llama-architecture causal LM with 135M parameters: 30 layers, 9 query heads and 3 key/value heads (GQA), hidden=576, intermediate=1536, vocab=49152, tie_word_embeddings=true, rope_theta=10000.0. It is released under the Apache 2.0 license and pinned as the real-artifact baseline for causal LM parity against `transformers==4.50.3` (#1147).

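To make the 9 query heads / 3 key-value heads split concrete, the sketch below derives the per-head dimension and the query-to-KV head mapping from the numbers above. It assumes the usual Llama convention (per-head dimension = hidden / query heads, and consecutive query heads sharing one KV head). The function names `head_dim` and `kv_head_for` are hypothetical and not part of ferrotorch.

```rust
/// Per-head dimension, assuming the hidden size is split evenly across query heads.
fn head_dim(hidden: usize, q_heads: usize) -> usize {
    assert_eq!(hidden % q_heads, 0, "hidden must be divisible by q_heads");
    hidden / q_heads
}

/// Index of the KV head used by query head `q`, assuming consecutive
/// query heads are grouped onto the same KV head.
fn kv_head_for(q: usize, q_heads: usize, kv_heads: usize) -> usize {
    assert_eq!(q_heads % kv_heads, 0, "q_heads must be divisible by kv_heads");
    q / (q_heads / kv_heads)
}

fn main() {
    let (hidden, q_heads, kv_heads) = (576, 9, 3);
    println!("head_dim = {}", head_dim(hidden, q_heads));
    for q in 0..q_heads {
        println!("q-head {} -> kv-head {}", q, kv_head_for(q, q_heads, kv_heads));
    }
}
```

Under these assumptions each KV head serves three query heads, so the key and value projections are one third the width of the query projection.
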
## Provenance

* Upstream: `HuggingFaceTB/SmolLM-135M` (apache-2.0).
* Conversion script: [`ferrotorch/scripts/pin_pretrained_llm_weights.py`](https://github.com/dollspace/ferrotorch/blob/main/scripts/pin_pretrained_llm_weights.py).
* Ferrotorch issue: <https://github.com/dollspace/ferrotorch/issues/1147>.
* Number of trainable parameters: **134,515,008**.
* SHA-256 of `model.safetensors` (this file is pinned in
  `ferrotorch-hub/src/registry.rs`): `c7a387d6fe81ca6dd304aeb809bda3932ff1bbef3ca41c9484502f2f448dc093`.
* Config snapshot: hidden=576, layers=30,
  heads=9, kv_heads=3,
  intermediate=1536, vocab=49152,
  tie_word_embeddings=True,
  rope_theta=10000.0,
  rms_norm_eps=1e-05.

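The parameter count can be cross-checked against the config snapshot. The sketch below assumes the standard bias-free Llama layout (q/k/v/o projections, a gate/up/down MLP, two RMSNorm weights per layer, one final RMSNorm, and an output head that reuses the embedding matrix when weights are tied). The `Config` struct and `param_count` function are hypothetical names for illustration, not ferrotorch types.

```rust
/// Subset of the config snapshot needed to count parameters.
struct Config {
    hidden: u64,
    layers: u64,
    heads: u64,
    kv_heads: u64,
    intermediate: u64,
    vocab: u64,
    tie_word_embeddings: bool,
}

/// Parameter count for a bias-free Llama-style decoder (assumed layout).
fn param_count(c: &Config) -> u64 {
    let head_dim = c.hidden / c.heads;
    let kv_dim = c.kv_heads * head_dim;
    // Attention: q_proj and o_proj are hidden x hidden,
    // k_proj and v_proj are hidden x kv_dim.
    let attn = 2 * c.hidden * c.hidden + 2 * c.hidden * kv_dim;
    // MLP: gate, up and down projections.
    let mlp = 3 * c.hidden * c.intermediate;
    // Two RMSNorm weight vectors per layer.
    let norms = 2 * c.hidden;
    let embed = c.vocab * c.hidden;
    // With tied embeddings the output head adds no extra parameters.
    let lm_head = if c.tie_word_embeddings { 0 } else { embed };
    // The lone `c.hidden` term is the final RMSNorm.
    embed + c.layers * (attn + mlp + norms) + c.hidden + lm_head
}

/// Values copied from the config snapshot above.
fn smollm_135m() -> Config {
    Config {
        hidden: 576,
        layers: 30,
        heads: 9,
        kv_heads: 3,
        intermediate: 1536,
        vocab: 49152,
        tie_word_embeddings: true,
    }
}

fn main() {
    let total = param_count(&smollm_135m());
    println!("total parameters: {}", total);
    assert_eq!(total, 134_515_008);
}
```

Under these assumptions the embedding contributes 28,311,552 parameters, each of the 30 layers contributes 3,540,096, and the final norm adds 576, which matches the figure listed above.
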
## Value-parity probe

Three extra files are uploaded so the ferrotorch-side harness can
reproduce the parity verdict without re-running the upstream
transformers model:

* `_value_parity_input.txt`: the verbatim prompt string the
  harness tokenizes (`"The quick brown fox jumps over the lazy"`).
* `_value_parity_token_ids.json`: the tokenizer's output for that
  prompt (with the upstream tokenizer's `add_special_tokens=True`).
* `_value_parity_output.bin`: float32 logits dumped from a fresh
  `transformers.AutoModelForCausalLM.from_pretrained(..., torch_dtype=float32)`
  single-prefill forward pass on those token ids (no cache).
  Format: `[u32 ndim][u32 × ndim shape][f32 × prod(shape) data]`,
  little-endian; identical layout to the vision-side dumps.

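The binary layout described above is simple enough to decode with the standard library alone. The following is a minimal sketch of a reader for that layout, plus a helper that reports the largest absolute difference between two logit vectors. `Dump`, `parse_dump`, and `max_abs_diff` are hypothetical names, not the ferrotorch harness API, and the tolerance the harness applies is not stated here, so none is assumed.

```rust
/// Decoded tensor: shape plus flat f32 data in file order.
#[derive(Debug, PartialEq)]
struct Dump {
    shape: Vec<usize>,
    data: Vec<f32>,
}

/// Read one little-endian u32 at `*pos`, advancing the cursor.
fn read_u32(bytes: &[u8], pos: &mut usize) -> Result<u32, String> {
    let end = pos.checked_add(4).ok_or("offset overflow")?;
    let c = bytes.get(*pos..end).ok_or("unexpected end of input")?;
    *pos = end;
    Ok(u32::from_le_bytes([c[0], c[1], c[2], c[3]]))
}

/// Parse `[u32 ndim][u32 x ndim shape][f32 x prod(shape) data]`, little-endian.
fn parse_dump(bytes: &[u8]) -> Result<Dump, String> {
    let mut pos = 0;
    let ndim = read_u32(bytes, &mut pos)? as usize;
    let mut shape = Vec::new();
    let mut count: usize = 1;
    for _ in 0..ndim {
        let dim = read_u32(bytes, &mut pos)? as usize;
        count = count.checked_mul(dim).ok_or("shape overflows usize")?;
        shape.push(dim);
    }
    // The payload must hold exactly prod(shape) f32 values.
    let need = count.checked_mul(4).ok_or("shape overflows usize")?;
    let payload = &bytes[pos..];
    if payload.len() != need {
        return Err(format!("expected {} data bytes, found {}", need, payload.len()));
    }
    let data = payload
        .chunks_exact(4)
        .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
        .collect();
    Ok(Dump { shape, data })
}

/// Largest absolute element-wise difference, or `None` if lengths differ.
fn max_abs_diff(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() {
        return None;
    }
    Some(a.iter().zip(b).map(|(x, y)| (x - y).abs()).fold(0.0, f32::max))
}

/// Synthetic dump with shape [1, 3], used only to exercise the parser.
fn sample_bytes() -> Vec<u8> {
    let mut bytes = Vec::new();
    for v in [2u32, 1, 3].iter() {
        bytes.extend_from_slice(&v.to_le_bytes());
    }
    for v in [0.5f32, -1.0, 2.0].iter() {
        bytes.extend_from_slice(&v.to_le_bytes());
    }
    bytes
}

fn main() {
    let dump = parse_dump(&sample_bytes()).expect("well-formed dump");
    println!("shape = {:?}, data = {:?}", dump.shape, dump.data);
}
```

For the real artifact, the bytes would come from something like `std::fs::read("_value_parity_output.bin")`, and the decoded `data` could be compared against the ferrotorch logits with `max_abs_diff`.
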
## How to load

```rust
use ferrotorch_hub::{load_pretrained, HfTransformerConfig};
use ferrotorch_llama::{LlamaConfig, LlamaForCausalLM};

let state = load_pretrained::<f32>("smollm-135m")?;
let hf_cfg = HfTransformerConfig::from_file("config.json")?;
let cfg = LlamaConfig::from_hf(&hf_cfg)?;
let mut model = LlamaForCausalLM::<f32>::new(cfg)?;
model.load_hf_state_dict(&state, /* strict = */ true)?;
```

## Upstream license

```
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    https://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

```