初始化项目，由ModelHub XC社区提供模型

Model: open-machine/SmolLM2-135M-FlashNorm Source: Original Platform
2026-05-04 18:37:03 +08:00
commit 9126a0f000
9 changed files with 147508 additions and 0 deletions
--- a/.gitattributes
+++ b/.gitattributes
@@ -0,0 +1,35 @@
+*.7z filter=lfs diff=lfs merge=lfs -text
+*.arrow filter=lfs diff=lfs merge=lfs -text
+*.bin filter=lfs diff=lfs merge=lfs -text
+*.bz2 filter=lfs diff=lfs merge=lfs -text
+*.ckpt filter=lfs diff=lfs merge=lfs -text
+*.ftz filter=lfs diff=lfs merge=lfs -text
+*.gz filter=lfs diff=lfs merge=lfs -text
+*.h5 filter=lfs diff=lfs merge=lfs -text
+*.joblib filter=lfs diff=lfs merge=lfs -text
+*.lfs.* filter=lfs diff=lfs merge=lfs -text
+*.mlmodel filter=lfs diff=lfs merge=lfs -text
+*.model filter=lfs diff=lfs merge=lfs -text
+*.msgpack filter=lfs diff=lfs merge=lfs -text
+*.npy filter=lfs diff=lfs merge=lfs -text
+*.npz filter=lfs diff=lfs merge=lfs -text
+*.onnx filter=lfs diff=lfs merge=lfs -text
+*.ot filter=lfs diff=lfs merge=lfs -text
+*.parquet filter=lfs diff=lfs merge=lfs -text
+*.pb filter=lfs diff=lfs merge=lfs -text
+*.pickle filter=lfs diff=lfs merge=lfs -text
+*.pkl filter=lfs diff=lfs merge=lfs -text
+*.pt filter=lfs diff=lfs merge=lfs -text
+*.pth filter=lfs diff=lfs merge=lfs -text
+*.rar filter=lfs diff=lfs merge=lfs -text
+*.safetensors filter=lfs diff=lfs merge=lfs -text
+saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+*.tar.* filter=lfs diff=lfs merge=lfs -text
+*.tar filter=lfs diff=lfs merge=lfs -text
+*.tflite filter=lfs diff=lfs merge=lfs -text
+*.tgz filter=lfs diff=lfs merge=lfs -text
+*.wasm filter=lfs diff=lfs merge=lfs -text
+*.xz filter=lfs diff=lfs merge=lfs -text
+*.zip filter=lfs diff=lfs merge=lfs -text
+*.zst filter=lfs diff=lfs merge=lfs -text
+*tfevents* filter=lfs diff=lfs merge=lfs -text
--- a/README.md
+++ b/README.md
@@ -0,0 +1,76 @@
+---
+license: apache-2.0
+base_model: HuggingFaceTB/SmolLM2-135M
+tags:
+  - flashnorm
+  - transformer-tricks
+  - efficient-inference
+  - weightless-rmsnorm
+pipeline_tag: text-generation
+---
+
+# SmolLM2-135M-FlashNorm
+
+FlashNorm-prepared checkpoint of [HuggingFaceTB/SmolLM2-135M](https://huggingface.co/HuggingFaceTB/SmolLM2-135M). Mathematically equivalent to the source model. The per-channel RMSNorm weight tensors (`input_layernorm.weight`, `post_attention_layernorm.weight`, `model.norm.weight`) are folded into the following linear layers and then removed from the state dict entirely.
+
+> **Framework support note.** Stock vLLM currently does not load this checkpoint because the norm weight tensors are absent. The upstream patch to accept missing tensors is tracked at: **TBD (vLLM issue link)**. Until the patch lands, use HuggingFace Transformers; it loads this with a warning that norm weights were not initialized and defaults them to ones, which is the correct behavior for FlashNorm.
+>
+> Two additional Llama-family verification checkpoints are published as [Llama-3.2-1B-FlashNorm-test](https://huggingface.co/open-machine/Llama-3.2-1B-FlashNorm-test) and [Llama-3.1-8B-FlashNorm-test](https://huggingface.co/open-machine/Llama-3.1-8B-FlashNorm-test). These retain the norm tensors as all-ones (compatibility layout) so they load in stock vLLM today and are intended for experimentation. They will be republished as weightless variants once vLLM's loader supports absent norm tensors.
+
+## What FlashNorm does
+
+An exact reformulation of `RMSNorm -> Linear`:
+
+- Fold the per-channel normalization weight `g` into the following linear layer: `W_star = W @ diag(g)`, computed once at checkpoint conversion.
+- After folding, the RMSNorm layer has no learnable per-channel scale. At runtime it simply divides by `rms(x)`.
+- The resulting model computes the same output as the original, by Proposition 1 of the FlashNorm paper.
+
+See the [paper](https://github.com/OpenMachine-ai/transformer-tricks/blob/main/tex/flashNorm.tex) (Section 3.1 and Proposition 1) and the [transformer-tricks](https://github.com/OpenMachine-ai/transformer-tricks) repo for details.
+
+## What's different from the source checkpoint
+
+| Tensor | Source | This FlashNorm checkpoint |
+|---|---|---|
+| `model.layers.*.input_layernorm.weight` | learned per-channel `g` | **absent** |
+| `model.layers.*.self_attn.{q,k,v}_proj.weight` | `W` | `W @ diag(g_input_layernorm)` |
+| `model.layers.*.post_attention_layernorm.weight` | learned per-channel `g` | **absent** |
+| `model.layers.*.mlp.{gate,up}_proj.weight` | `W` | `W @ diag(g_post_attention_layernorm)` |
+| `model.norm.weight` | learned per-channel `g` | **absent** |
+
+All dtype conventions match the source (`bfloat16`). Mathematical identity to the source holds by construction.
+
+## Usage
+
+### Regenerate locally with `transformer_tricks`
+
+```python
+import transformer_tricks as tt
+tt.flashify_repo('HuggingFaceTB/SmolLM2-135M', strict=True)
+```
+
+### Via HuggingFace Transformers
+
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+
+tok = AutoTokenizer.from_pretrained('open-machine/SmolLM2-135M-FlashNorm')
+model = AutoModelForCausalLM.from_pretrained('open-machine/SmolLM2-135M-FlashNorm')
+
+ids = tok('Once upon a time there was', return_tensors='pt').input_ids
+out = model.generate(ids, max_new_tokens=50, do_sample=False)
+print(tok.decode(out[0], skip_special_tokens=True))
+```
+
+A warning about missing norm weights is expected; Transformers defaults those to ones, which is the correct value for a FlashNorm checkpoint.
+
+### Via vLLM
+
+Not yet supported. See the tracking issue linked above.
+
+## Verification
+
+Under fp32 inference, greedy generation from this checkpoint is bit-identical to the source SmolLM2-135M model. Under fp16 inference the output is within benchmark noise (see the Quality table in Section 5 of the paper).
+
+## License
+
+Apache-2.0, inherited from the source model.
--- a/config.json
+++ b/config.json
@@ -0,0 +1,34 @@
+{
+  "architectures": [
+    "LlamaForCausalLM"
+  ],
+  "attention_bias": false,
+  "attention_dropout": 0.0,
+  "bos_token_id": 0,
+  "dtype": "bfloat16",
+  "eos_token_id": 0,
+  "head_dim": 64,
+  "hidden_act": "silu",
+  "hidden_size": 576,
+  "initializer_range": 0.041666666666666664,
+  "intermediate_size": 1536,
+  "is_llama_config": true,
+  "max_position_embeddings": 8192,
+  "mlp_bias": false,
+  "model_type": "llama",
+  "num_attention_heads": 9,
+  "num_hidden_layers": 30,
+  "num_key_value_heads": 3,
+  "pad_token_id": null,
+  "pretraining_tp": 1,
+  "rms_norm_eps": 1e-05,
+  "rope_interleaved": false,
+  "rope_parameters": {
+    "rope_theta": 100000,
+    "rope_type": "default"
+  },
+  "tie_word_embeddings": true,
+  "transformers_version": "5.5.4",
+  "use_cache": true,
+  "vocab_size": 49152
+}
--- a/merges.txt
+++ b/merges.txt
--- a/model.safetensors
+++ b/model.safetensors
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:1049412dc1bfef8ad396b910fb2243375aee9b9ad346790d95fd69a0e09a3afb
+size 268983424
--- a/special_tokens_map.json
+++ b/special_tokens_map.json
@@ -0,0 +1,42 @@
+{
+  "additional_special_tokens": [
+    "<|endoftext|>",
+    "<|im_start|>",
+    "<|im_end|>",
+    "<repo_name>",
+    "<reponame>",
+    "<file_sep>",
+    "<filename>",
+    "<gh_stars>",
+    "<issue_start>",
+    "<issue_comment>",
+    "<issue_closed>",
+    "<jupyter_start>",
+    "<jupyter_text>",
+    "<jupyter_code>",
+    "<jupyter_output>",
+    "<jupyter_script>",
+    "<empty_output>"
+  ],
+  "bos_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eos_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "unk_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}
--- a/tokenizer.json
+++ b/tokenizer.json
--- a/tokenizer_config.json
+++ b/tokenizer_config.json
@@ -0,0 +1,167 @@
+{
+  "add_prefix_space": false,
+  "added_tokens_decoder": {
+    "0": {
+      "content": "<|endoftext|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "1": {
+      "content": "<|im_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "2": {
+      "content": "<|im_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "3": {
+      "content": "<repo_name>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "4": {
+      "content": "<reponame>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "5": {
+      "content": "<file_sep>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "6": {
+      "content": "<filename>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "7": {
+      "content": "<gh_stars>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "8": {
+      "content": "<issue_start>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "9": {
+      "content": "<issue_comment>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "10": {
+      "content": "<issue_closed>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "11": {
+      "content": "<jupyter_start>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "12": {
+      "content": "<jupyter_text>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "13": {
+      "content": "<jupyter_code>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "14": {
+      "content": "<jupyter_output>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "15": {
+      "content": "<jupyter_script>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "16": {
+      "content": "<empty_output>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "additional_special_tokens": [
+    "<|endoftext|>",
+    "<|im_start|>",
+    "<|im_end|>",
+    "<repo_name>",
+    "<reponame>",
+    "<file_sep>",
+    "<filename>",
+    "<gh_stars>",
+    "<issue_start>",
+    "<issue_comment>",
+    "<issue_closed>",
+    "<jupyter_start>",
+    "<jupyter_text>",
+    "<jupyter_code>",
+    "<jupyter_output>",
+    "<jupyter_script>",
+    "<empty_output>"
+  ],
+  "bos_token": "<|endoftext|>",
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "<|endoftext|>",
+  "model_max_length": 8192,
+  "tokenizer_class": "GPT2Tokenizer",
+  "unk_token": "<|endoftext|>",
+  "vocab_size": 49152
+}
--- a/vocab.json
+++ b/vocab.json