Initialize project; model provided by the ModelHub XC community

Model: ferrotorch/smollm-135m Source: Original Platform
2026-06-04 17:40:17 +08:00
commit fe9de6e7f1
10 changed files with 98603 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,73 @@
+---
+license: apache-2.0
+tags:
+  - text-generation
+  - llama
+  - ferrotorch
+---
+
+# `ferrotorch/smollm-135m`
+
+SmolLM-135M (`HuggingFaceTB/SmolLM-135M`) is a Llama-architecture causal LM with 135M parameters: 30 layers, 9 query heads and 3 KV heads (GQA), hidden=576, intermediate=1536, vocab=49152, tie_word_embeddings=true, rope_theta=10000.0. It is released under the Apache 2.0 license and pinned as the real-artifact baseline for causal LM parity against `transformers==4.50.3` (#1147).
+
+## Provenance
+
+* Upstream: `HuggingFaceTB/SmolLM-135M` (apache-2.0).
+* Conversion script: [`ferrotorch/scripts/pin_pretrained_llm_weights.py`](https://github.com/dollspace/ferrotorch/blob/main/scripts/pin_pretrained_llm_weights.py).
+* Ferrotorch issue: <https://github.com/dollspace/ferrotorch/issues/1147>.
+* Number of trainable parameters: **134,515,008**.
+* SHA-256 of `model.safetensors` (this file is pinned in
+  `ferrotorch-hub/src/registry.rs`): `c7a387d6fe81ca6dd304aeb809bda3932ff1bbef3ca41c9484502f2f448dc093`.
+* Config snapshot: hidden=576, layers=30,
+  heads=9, kv_heads=3,
+  intermediate=1536, vocab=49152,
+  tie_word_embeddings=True,
+  rope_theta=10000.0,
+  rms_norm_eps=1e-05.
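+
+The trainable-parameter count above can be re-derived from the config snapshot. The sketch below is only a sanity check and not part of ferrotorch: the helper name `llama_param_count` is hypothetical, and it assumes the standard Llama layout (bias-free projections, `head_dim = hidden / heads`, two RMSNorm weight vectors per layer, and a tied embedding that is counted once).
+
+```rust
+/// Parameter count for a Llama-style decoder with tied input/output embeddings.
+/// Assumes bias-free linear layers and head_dim = hidden / heads.
+fn llama_param_count(
+    hidden: u64,
+    layers: u64,
+    heads: u64,
+    kv_heads: u64,
+    intermediate: u64,
+    vocab: u64,
+) -> u64 {
+    let head_dim = hidden / heads;
+    let kv_dim = kv_heads * head_dim;
+    // q_proj and o_proj are hidden x hidden; k_proj and v_proj are hidden x kv_dim.
+    let attention = 2 * hidden * hidden + 2 * hidden * kv_dim;
+    // gate_proj, up_proj and down_proj.
+    let mlp = 3 * hidden * intermediate;
+    // Two RMSNorm weight vectors per layer.
+    let norms = 2 * hidden;
+    let per_layer = attention + mlp + norms;
+    // Token embedding (shared with the LM head) plus the final RMSNorm.
+    vocab * hidden + layers * per_layer + hidden
+}
+
+fn main() {
+    let total = llama_param_count(576, 30, 9, 3, 1536, 49152);
+    println!("{total}"); // prints 134515008
+}
+```
+
+Under those assumptions each layer contributes 3,540,096 parameters and the embedding 28,311,552, which together with the final norm matches the count listed above.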
+
+## Value-parity probe
+
+Three extra files are uploaded so the ferrotorch-side harness can
+reproduce the parity verdict without re-running the upstream
+transformers model:
+
+* `_value_parity_input.txt` — the verbatim prompt string the
+  harness tokenizes (`"The quick brown fox jumps over the lazy"`).
+* `_value_parity_token_ids.json` — the tokenizer's output for that
+  prompt (with the upstream tokenizer's `add_special_tokens=True`).
+* `_value_parity_output.bin` — float32 logits dumped from a fresh
+  `transformers.AutoModelForCausalLM.from_pretrained(..., torch_dtype=float32)`
+  single-prefill forward pass on those token ids (no cache).
+  Format: `[u32 ndim][u32 × ndim shape][f32 × prod(shape) data]`
+  little-endian; identical layout to the vision-side dumps.
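+
+The dump layout described above can be decoded with the standard library alone. This is a minimal sketch under the stated format, not the harness's actual reader: `parse_dump`, `read_u32_le` and `invalid` are hypothetical names, and the example parses a small synthetic buffer rather than the real file.
+
+```rust
+use std::io::{Error, ErrorKind, Result};
+
+fn invalid(msg: &str) -> Error {
+    Error::new(ErrorKind::InvalidData, msg.to_string())
+}
+
+/// Read one little-endian u32 at `pos` and advance `pos` by four bytes.
+fn read_u32_le(bytes: &[u8], pos: &mut usize) -> Result<u32> {
+    let end = *pos + 4;
+    let chunk = bytes
+        .get(*pos..end)
+        .ok_or_else(|| invalid("truncated header"))?;
+    *pos = end;
+    Ok(u32::from_le_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]))
+}
+
+/// Decode `[u32 ndim][u32 x ndim shape][f32 x prod(shape) data]`, little-endian.
+fn parse_dump(bytes: &[u8]) -> Result<(Vec<usize>, Vec<f32>)> {
+    let mut pos = 0;
+    let ndim = read_u32_le(bytes, &mut pos)? as usize;
+    let mut shape = Vec::new();
+    for _ in 0..ndim {
+        shape.push(read_u32_le(bytes, &mut pos)? as usize);
+    }
+    // The element count is the product of the shape, checked for overflow.
+    let count = shape
+        .iter()
+        .try_fold(1usize, |acc, &d| acc.checked_mul(d))
+        .ok_or_else(|| invalid("shape overflow"))?;
+    let body = &bytes[pos..];
+    if count.checked_mul(4) != Some(body.len()) {
+        return Err(invalid("payload length does not match shape"));
+    }
+    let data = body
+        .chunks_exact(4)
+        .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
+        .collect();
+    Ok((shape, data))
+}
+
+fn main() -> Result<()> {
+    // Synthetic 2 x 3 tensor standing in for the real dump.
+    let mut buf = Vec::new();
+    for v in [2u32, 2, 3] {
+        buf.extend_from_slice(&v.to_le_bytes());
+    }
+    for i in 0..6 {
+        buf.extend_from_slice(&(i as f32).to_le_bytes());
+    }
+    let (shape, data) = parse_dump(&buf)?;
+    println!("shape = {:?}, elements = {}", shape, data.len());
+    Ok(())
+}
+```
+
+To inspect the uploaded dump, the bytes from `std::fs::read("_value_parity_output.bin")` can be passed to `parse_dump` in place of the synthetic buffer.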
+
+## How to load
+
+```rust
+use ferrotorch_hub::{load_pretrained, HfTransformerConfig};
+use ferrotorch_llama::{LlamaConfig, LlamaForCausalLM};
+
+// Load the pinned `smollm-135m` weights as f32.
+let state = load_pretrained::<f32>("smollm-135m")?;
+// Translate the Hugging Face `config.json` into a ferrotorch Llama config.
+let hf_cfg = HfTransformerConfig::from_file("config.json")?;
+let cfg = LlamaConfig::from_hf(&hf_cfg)?;
+// Build the model, then load the Hugging Face state dict into it.
+let mut model = LlamaForCausalLM::<f32>::new(cfg)?;
+model.load_hf_state_dict(&state, /* strict = */ true)?;
+```
+
+## Upstream license
+
+```
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+    https://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+
+```