Initialize project; model provided by the ModelHub XC community

Model: SipsaLabs/qwen3-1.7b-uc2p79 Source: Original Platform
2026-05-16 01:58:32 +08:00
commit c4649e9454
15 changed files with 152288 additions and 0 deletions
--- a/.gitattributes
+++ b/.gitattributes
@@ -0,0 +1,36 @@
 *.7z filter=lfs diff=lfs merge=lfs -text
 *.arrow filter=lfs diff=lfs merge=lfs -text
 *.bin filter=lfs diff=lfs merge=lfs -text
 *.bz2 filter=lfs diff=lfs merge=lfs -text
 *.ckpt filter=lfs diff=lfs merge=lfs -text
 *.ftz filter=lfs diff=lfs merge=lfs -text
 *.gz filter=lfs diff=lfs merge=lfs -text
 *.h5 filter=lfs diff=lfs merge=lfs -text
 *.joblib filter=lfs diff=lfs merge=lfs -text
 *.lfs.* filter=lfs diff=lfs merge=lfs -text
 *.mlmodel filter=lfs diff=lfs merge=lfs -text
 *.model filter=lfs diff=lfs merge=lfs -text
 *.msgpack filter=lfs diff=lfs merge=lfs -text
 *.npy filter=lfs diff=lfs merge=lfs -text
 *.npz filter=lfs diff=lfs merge=lfs -text
 *.onnx filter=lfs diff=lfs merge=lfs -text
 *.ot filter=lfs diff=lfs merge=lfs -text
 *.parquet filter=lfs diff=lfs merge=lfs -text
 *.pb filter=lfs diff=lfs merge=lfs -text
 *.pickle filter=lfs diff=lfs merge=lfs -text
 *.pkl filter=lfs diff=lfs merge=lfs -text
 *.pt filter=lfs diff=lfs merge=lfs -text
 *.pth filter=lfs diff=lfs merge=lfs -text
 *.rar filter=lfs diff=lfs merge=lfs -text
 *.safetensors filter=lfs diff=lfs merge=lfs -text
 saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.tar.* filter=lfs diff=lfs merge=lfs -text
 *.tar filter=lfs diff=lfs merge=lfs -text
 *.tflite filter=lfs diff=lfs merge=lfs -text
 *.tgz filter=lfs diff=lfs merge=lfs -text
 *.wasm filter=lfs diff=lfs merge=lfs -text
 *.xz filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
 tokenizer.json filter=lfs diff=lfs merge=lfs -text
--- a/LICENSE
+++ b/LICENSE
@@ -0,0 +1,98 @@
 # Sipsa Labs Research and Evaluation License
 **Version 1.0** — effective 2026-04-25
 This License (the "**License**") is a legal agreement between **Sipsa Labs, Inc.**, a Delaware corporation ("**Licensor**"), and **you** (an individual person or a single legal entity exercising rights under this License) ("**Licensee**"). It governs your use of any **pre-compressed reference model** that incorporates the patent-pending compression methods of Sipsa Labs (the "**Licensed Material**"), including but not limited to model weights, configuration files, tokenizer files, and the `ultracompress.json` provenance manifest contained in any Hugging Face Hub repository under the `sipsalabs` organization.
 By downloading, using, or distributing the Licensed Material, you agree to the terms of this License.
 ---
 ## 1. Grant of license
 Subject to the terms and conditions of this License, Licensor grants you a non-exclusive, worldwide, non-transferable, royalty-free license to:
 (a) Use the Licensed Material for **non-commercial research** at a non-profit research institution, including academic research, dissertation work, and unfunded individual research;
 (b) Use the Licensed Material for **personal experimentation, learning, and educational purposes**;
 (c) Use the Licensed Material for **pre-purchase evaluation** by an enterprise that is actively considering negotiating a commercial license with Licensor — for a period not to exceed **ninety (90) days** from first download.
 For clarity, the rights granted under this License are limited to the use of the Licensed Material as published. They do **not** include any rights to the underlying patent-pending compression methods, which are the subject of pending U.S. patent applications USPTO 64/049,511 and 64/049,517.
 ## 2. Conditions
 You agree:
 (a) **Attribution**: any publication, technical report, blog post, or public demonstration that uses or describes the Licensed Material must attribute Sipsa Labs and cite USPTO 64/049,511 and 64/049,517 as the source of the underlying methods.
 (b) **No commercial use**: you will not use the Licensed Material in any product, service, or activity that generates revenue or that is intended to generate revenue. The non-exhaustive list of prohibited commercial uses includes:
   - Deploying the Licensed Material in a production service that serves paying users
   - Including the Licensed Material in a product offered for sale
   - Using the Licensed Material to provide consulting services to a third party
   - Using the Licensed Material in any context where the user pays for inference, output, or access
 (c) **No redistribution**: you will not redistribute the Licensed Material or any derivative work that contains the Licensed Material in whole or in part, except as expressly permitted by Section 4.
 (d) **No reverse engineering of the methods**: you will not attempt to reverse engineer the compression methods underlying the Licensed Material. Loading the model into a runtime, performing inference, fine-tuning at the loaded model's surface, and benchmarking the model are not "reverse engineering" for purposes of this Section.
 (e) **No use to circumvent**: you will not use the Licensed Material as part of a project whose purpose is to develop or deploy a method that infringes Licensor's patent rights, whether the infringing method is created by you, by your collaborators, or by an automated system trained on the Licensed Material.
 (f) **License preservation**: you will preserve all copyright, license, and patent notices contained in the Licensed Material in any copy you make.
 (g) **Term**: this License is effective for as long as you are in compliance with these conditions. Termination is automatic upon any material breach.
 ## 3. Commercial license required
 If you want to use the Licensed Material in a context that exceeds the rights granted in Section 1, you must obtain a separate written commercial license from Licensor. Email **legal@sipsalabs.com** to begin that conversation.
 A commercial license will typically include:
 - A patent license to the underlying compression methods
 - A license to use the Licensed Material in production
 - Service-level commitments and support terms
 - A pricing structure appropriate to your use case (per-deployment, per-token, per-device, or site)
 ## 4. Sharing for collaborative research
 Notwithstanding Section 2(c), Licensee may share the Licensed Material with named collaborators on a single research project, provided that:
 (a) Each collaborator separately accepts the terms of this License before receiving the Licensed Material;
 (b) The sharing is internal to the research project and is not a public redistribution; and
 (c) The combined activities of all collaborators on the project remain non-commercial.
 ## 5. No warranty
 THE LICENSED MATERIAL IS PROVIDED **"AS IS"** WITHOUT WARRANTY OF ANY KIND, WHETHER EXPRESS, IMPLIED, OR STATUTORY, INCLUDING WITHOUT LIMITATION ANY WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, OR NON-INFRINGEMENT.
 LICENSOR DOES NOT WARRANT THAT THE LICENSED MATERIAL WILL BE ERROR-FREE, THAT IT WILL MEET YOUR REQUIREMENTS, OR THAT IT IS SUITABLE FOR ANY SAFETY-CRITICAL OR LIFE-CRITICAL APPLICATION.
 ## 6. Limitation of liability
 TO THE MAXIMUM EXTENT PERMITTED BY LAW, LICENSOR'S TOTAL CUMULATIVE LIABILITY UNDER OR IN CONNECTION WITH THIS LICENSE WILL NOT EXCEED ONE HUNDRED U.S. DOLLARS ($100). IN NO EVENT WILL LICENSOR BE LIABLE FOR ANY INDIRECT, INCIDENTAL, SPECIAL, CONSEQUENTIAL, OR PUNITIVE DAMAGES, OR ANY LOSS OF PROFITS, REVENUE, DATA, OR USE, EVEN IF LICENSOR HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
 ## 7. Termination
 Without prejudice to any other rights, Licensor may terminate this License if you fail to comply with the terms and conditions of this License. In such event, you must destroy all copies of the Licensed Material in your possession.
 ## 8. Governing law
 This License is governed by the laws of the State of Delaware, without regard to conflict-of-laws principles. The exclusive jurisdiction and venue for disputes arising out of or relating to this License is the state or federal courts located in San Francisco, California.
 ## 9. Entire agreement
 This License constitutes the entire agreement between you and Licensor concerning the Licensed Material. It supersedes all prior or contemporaneous communications and proposals (whether oral, written, or electronic) between you and Licensor.
 ## 10. Contact
 For commercial licensing or questions about this License:
 **Sipsa Labs, Inc.**
 Email: legal@sipsalabs.com
 Web: https://sipsalabs.com
 For technical support or model issues: file an issue at [github.com/sipsalabs/ultracompress](https://github.com/sipsalabs/ultracompress) or email founder@sipsalabs.com.
 ---
 *The version on sipsalabs.com is the controlling version. Each model's Hugging Face Hub repository contains a copy of this License with a per-model reference number; that copy is the operative version for that specific download.*
--- a/README.md
+++ b/README.md
@@ -0,0 +1,251 @@
 > **Sipsa Labs, Inc. update, 2026-05-11.** UltraCompress v0.6.9 is on [PyPI](https://pypi.org/project/ultracompress/v0.6.9/) under BUSL-1.1 + Additional Use Grant (free for sub-$1M ARR companies, research, and individuals; auto-converts to Apache 2.0 four years post-release). The OpenAI-compatible inference API at [api.sipsalabs.com/v1](https://api.sipsalabs.com/v1) is **publicly self-serve**: Pro $99/mo + Team $499/mo at [sipsalabs.com/pricing](https://sipsalabs.com/pricing), or [free $5 credits](https://sipsalabs.com/get-access) (no card). The `pip install ultracompress` substrate is production-ready today (no API key required for self-host). 22 architectures verified, 0.6B–405B parameters, sub-1.005× perplexity ratio on Mixtral-8x7B / Qwen3-14B / Mistral-7B. Live discussion on [Hacker News](https://news.ycombinator.com/item?id=48099107). Commercial inquiries: founder@sipsalabs.com.
 ---
 ---
 license: other
 license_name: sipsa-labs-research-evaluation-v1.0
 license_link: LICENSE
 base_model: Qwen/Qwen3-1.7B
 tags:
  - ultracompress
  - quantization
  - row-overlay-quantization
  - llm
  - inference
  - on-device
  - edge
  - patent-pending
 library_name: transformers
 language:
  - en
 pipeline_tag: text-generation
 ---
 # qwen3-1.7b-uc2p79
 A patent-pending compressed reference variant of [`Qwen/Qwen3-1.7B`](https://huggingface.co/Qwen/Qwen3-1.7B). It ships the **low-rank correction overlay**, a post-training row-overlay quantization method, at a cohort design target of **2.798 bits per weight** (USPTO 64/049,511, patent pending; this specific fit measures 2.7767 bpw effective).
 UltraCompress is a two-track patent estate. The **low-rank correction overlay** (this artifact, shipping today) compresses each weight via row-overlay quantization at sub-3 bpw. **Shared-block parameter dispatch** (USPTO 64/049,517, patent pending; research-stage, v0.2 Q3 2026) is a separate architectural compression method that replaces the N transformer layers of a teacher with a single shared block applied iteratively, with measured compression ratios of **311× and 734× on the Qwen3-1.7B body** at 68-69.6% top-10-token-agreement on held-out data. Combining the two methods multiplicatively is what Sipsa Labs is building toward; this v0.1 artifact is the low-rank correction overlay on its own, demonstrating the cohort consistency of the row-overlay quantization line as the foundation under that combined estate.
 > **Read this first — this repository ships in dual format.**
 >
 > - `model.safetensors` (~3.3 GB) — FP16 reconstruction. Loadable directly via `transformers.from_pretrained`. Use this if your runtime expects standard HF safetensors.
 > - `model.uc.bin` (~491 MB at 2.7871 bpw on-disk) — the actually-packed binary at the claimed sub-3-bpw operating point. Loadable via `pip install ultracompress`. This is the artifact whose disk size matches the headline compression number.
 >
 > Both files reconstruct the same compressed weights to within FP16 precision (verified bit-equivalent per the `pack_v17.py` round-trip protocol on this fit). Buyers pick based on runtime: enterprise inference platforms running standard transformers loaders use the safetensors; edge / on-device deployments using the UltraCompress runtime use the packed binary. The model card claims (2.798 bpw cohort design target, 2.7767 bpw measured on this fit) describe the information content of either file — the safetensors is bigger on disk but represents the same compressed model, not a different one.
 >
 > The `ultracompress.json` manifest declares both files in its `formats` block with per-file SHA-256, so `uc info` validates either format end-to-end.
 ## Quick start
 ```bash
 pip install ultracompress
 uc pull SipsaLabs/qwen3-1.7b-uc2p79
 uc info ./models/SipsaLabs_qwen3-1.7b-uc2p79
 ```
 The CLI streams the artifact, validates the manifest (SHA-256 + size for every declared file), and surfaces the compression metadata in one read.
 Or with `huggingface_hub` directly:
 ```python
 from huggingface_hub import snapshot_download
 local = snapshot_download("SipsaLabs/qwen3-1.7b-uc2p79")
 ```
 ## Loading the model
 The substituted weights are stored in standard HF FP16 safetensors layout, so any `transformers`-compatible runtime can load the model. Sample:
 ```python
 from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer
 import torch
 # Load the compressed weights from this repository
 local = "./models/SipsaLabs_qwen3-1.7b-uc2p79"
 cfg = AutoConfig.from_pretrained(local)
 model = AutoModelForCausalLM.from_pretrained(
    local,
    dtype=torch.float16,
    config=cfg,
 ).to("cuda")
 # NOTE on trust_remote_code: we ship only pure quantized weights.
 # `trust_remote_code=True` is therefore not needed for loading the local
 # artifact. The flag IS still passed to the upstream tokenizer below because
 # the base model's tokenizer uses it; that is the customer's choice to trust
 # the upstream model author.
 #
 # The base Qwen tokenizer is unchanged from Qwen/Qwen3-1.7B.
 # We recommend loading it directly from the upstream Qwen3-1.7B repo,
 # which is the path that the `transformers` AutoTokenizer auto-resolves
 # most cleanly across versions.
 tok = AutoTokenizer.from_pretrained("Qwen/Qwen3-1.7B", trust_remote_code=True)
 prompt = "The capital of France is"
 inputs = tok(prompt, return_tensors="pt").to("cuda")
 out = model.generate(**inputs, max_new_tokens=20, do_sample=False)
 print(tok.decode(out[0], skip_special_tokens=True))
 ```
 ## What's in this artifact
 | File | Size | Description |
 |---|---|---|
 | `model.safetensors` | ~3.3 GB | FP16-reconstructed weights — direct `transformers.from_pretrained` compatibility |
 | `model.uc.bin` | ~491 MB | Packed UltraCompress binary at 2.7871 bpw on-disk — load via `pip install ultracompress` |
 | `ultracompress.json` | <2 KB | Provenance manifest with method, bpw, base-model id, USPTO references, license name, per-file SHA-256, `formats` block declaring both weight files |
 | `config.json` | <2 KB | Inherited from the base Qwen3-1.7B model |
 | `tokenizer.json` / `tokenizer_config.json` / `special_tokens_map.json` / `merges.txt` / `vocab.json` / `added_tokens.json` / `chat_template.jinja` | ~14 MB | Tokenizer files copied from the base model |
 | `LICENSE` | ~7 KB | Sipsa Labs Research and Evaluation License v1.0 (full text) |
 | `generation_config.json` | <1 KB | Inherited from base |
 `uc info ./models/SipsaLabs_qwen3-1.7b-uc2p79` will validate every entry in the manifest's `files` block against the actual on-disk size and SHA-256 — tamper-evidence you can read in one command.
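The size and SHA-256 check described above can also be done by hand, without the `uc` CLI. The following is a minimal sketch using only the Python standard library; the helper names `file_digest` and `matches` are hypothetical and are not part of the UltraCompress package, and the sketch does not read `ultracompress.json` (its exact schema is not shown on this card).

```python
import hashlib


def file_digest(path, chunk_size=1 << 20):
    """Return (size_in_bytes, sha256_hex) for a file, reading it in chunks."""
    hasher = hashlib.sha256()
    size = 0
    with open(path, "rb") as handle:
        # Read fixed-size chunks so multi-GB weight files never sit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
            size += len(chunk)
    return size, hasher.hexdigest()


def matches(path, expected_size, expected_sha256):
    """True only if both the byte size and the SHA-256 digest agree."""
    size, digest = file_digest(path)
    return size == expected_size and digest == expected_sha256.lower()
```

For example, the Git LFS pointer for `model.uc.bin` in this commit records size `490984295` and SHA-256 `e42ee0fa9da5070733087e5fab8950c8332c4bbeaa207e566057c4609d31006c`, so `matches("model.uc.bin", 490984295, "e42ee0fa...")` with the full digest should return `True` for an untampered download.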
 ## Compression details
 | Metric | Value |
 |---|---|
 | **Method** | UltraCompress row-overlay quantization (low-rank correction overlay) |
 | **Method version** | v17hi |
 | **Operating-point bpw (cohort design target)** | **2.798** |
 | **Measured effective bpw (this specific fit)** | **2.7767** (2.7708 body + 0.0059 codec overhead) |
 | **Base model** | [Qwen/Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B) |
 | **On-disk file size** | ~3.3 GB (FP16 reconstruction; see "Read this first" above) |
 | **Patent posture** | USPTO 64/049,511 (low-rank correction overlay) + USPTO 64/049,517 (shared-block parameter dispatch), patent pending |
 | **Filed** | 2026-04-25 |
 The 2.777-bpw operating point is the v17hi line of the patent-pending row-overlay quantization method described in USPTO 64/049,511. A complementary 2.40-bpw operating point on the same base model is documented internally (v17 line, packed binary round-trip verified on Qwen3-1.7B + 5 other models in the Sipsa Labs cohort) and will be published as a sibling artifact in this organization.
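As a rough cross-check of the on-disk bits-per-weight figure, the arithmetic can be sketched from values already in this repository: the layer shapes in `config.json` and the `model.uc.bin` byte size from its Git LFS pointer. One assumption is ours and is not stated on this card: that the bpw denominator counts only the attention and MLP projection matrices of the 28 transformer blocks, excluding embeddings and norm weights.

```python
# Shapes copied from this repository's config.json.
HIDDEN = 2048
INTERMEDIATE = 6144
LAYERS = 28
HEADS, KV_HEADS, HEAD_DIM = 16, 8, 128

# Byte size of model.uc.bin, copied from its Git LFS pointer.
PACKED_BYTES = 490_984_295


def linear_weights_per_layer():
    """Count projection-matrix weights in one block (attention_bias is false)."""
    q_proj = HIDDEN * HEADS * HEAD_DIM
    k_proj = HIDDEN * KV_HEADS * HEAD_DIM
    v_proj = HIDDEN * KV_HEADS * HEAD_DIM
    o_proj = HEADS * HEAD_DIM * HIDDEN
    mlp = 3 * HIDDEN * INTERMEDIATE  # gate, up, and down projections
    return q_proj + k_proj + v_proj + o_proj + mlp


def bits_per_weight(file_bytes, n_weights):
    """Average number of stored bits per counted weight."""
    return file_bytes * 8 / n_weights


# ASSUMPTION: only block projection matrices are counted in the denominator.
n_weights = LAYERS * linear_weights_per_layer()
print(n_weights, round(bits_per_weight(PACKED_BYTES, n_weights), 4))
```

Under a different counting convention (for example, one that includes the tied embedding matrix), the same file size yields a different bpw, so the denominator matters when comparing figures across cards.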
 ## Catastrophic-failure check
 A "catastrophic failure" is defined as a downstream-task perplexity ratio greater than 10× the FP16 baseline. On Sipsa Labs' internal 6-model cohort (TinyLlama-1.1B, OLMo-2-1B, SmolLM2-1.7B, Qwen3-1.7B, Mistral-7B-v0.3, Qwen3-8B) at the low-rank correction overlay operating point: **0 of 6 models exhibit catastrophic failure**. This artifact (Qwen3-1.7B at 2.777 bpw): **non-catastrophic**.
 The cohort framing matters — this is a property of the *method on this cohort*, not an absolute claim about every possible base model.
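The definition above reduces to a single threshold test. The sketch below applies it to the perplexity ratios reported in the cohort table on this card; using those WikiText-103 ratios as the input is an assumption made for illustration, and `is_catastrophic` is a hypothetical helper name.

```python
# Threshold from the definition above: more than 10x the FP16 baseline.
CATASTROPHIC_PPL_RATIO = 10.0

# PPL ratios copied from the cohort table on this card.
ppl_ratio = {
    "OLMo-2-1B": 1.165,
    "TinyLlama-1.1B": 1.097,
    "SmolLM2-1.7B": 1.218,
    "Qwen3-1.7B": 1.225,
    "Mistral-7B-v0.3": 1.075,
    "Qwen3-8B": 1.067,
}


def is_catastrophic(ratio, threshold=CATASTROPHIC_PPL_RATIO):
    """True when the compressed/FP16 perplexity ratio exceeds the threshold."""
    return ratio > threshold


failures = [name for name, ratio in ppl_ratio.items() if is_catastrophic(ratio)]
print(f"{len(failures)} of {len(ppl_ratio)} models exhibit catastrophic failure")
```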
 ## Cohort scaling — retention scales with model size
 The same low-rank correction overlay 2.798 bpw operating point measured across the 6-model Sipsa cohort (n=500, seed=42, WikiText-103 perplexity ratio):
 | Model | Body params | T1 retention vs FP16 | T10 retention vs FP16 | PPL ratio |
 |---|---|---|---|---|
 | OLMo-2-1B | 1.00B | 94.19% | 97.04% | 1.165 |
 | TinyLlama-1.1B | 1.10B | 96.37% | 97.88% | 1.097 |
 | SmolLM2-1.7B | 1.71B | 93.72% | 96.71% | 1.218 |
 | **Qwen3-1.7B (this artifact)** | **1.72B** | **93.81%** | **96.55%** | **1.225** |
 | Mistral-7B-v0.3 | 7.25B | 98.04% | 99.06% | 1.075 |
 | Qwen3-8B | 8.19B | 97.63% | 98.84% | 1.067 |
 **Spearman rank correlation between body-parameter count and T1 retention: +0.486** for UltraCompress. **bitsandbytes NF4 at 4.0 bpw on the same cohort: −0.086** — essentially flat.
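 UltraCompress T1 retention spans **+4.32 percentage points** across the cohort, from 93.72% (SmolLM2-1.7B) to 98.04% (Mistral-7B-v0.3); the 1B and 8B endpoints in the table (OLMo-2-1B and Qwen3-8B) differ by 3.44 pp. NF4 is reported at +1.93 pp, so the UltraCompress spread is about **2.2× NF4's**.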
 The mechanism is design-level: row-overlay's per-row scale + learned codebook + rotation matrix calibrate to the per-model magnitude distribution. Larger transformer matrices give the codebook more rows to learn from. NF4 is a fixed dictionary — no per-model adaptation, no scaling.
 This artifact (Qwen3-1.7B at 93.81% T1) is mid-cohort. The same method on 7B+ class models retains substantially more quality. For 30B+ class teachers the trend extrapolates further (not yet measured — replication invited; open an issue at [github.com/sipsalabs/ultracompress/issues](https://github.com/sipsalabs/ultracompress/issues) with model + seed + result).
 n=6 caveat: this is the 6-model cohort tested by Sipsa Labs. The scaling claim is a property of *the method on this cohort*. Generalization to broader cohorts is the open empirical question.
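The cohort statistics quoted in this section can be recomputed from the table with a few lines of standard-library Python. This is a sketch for checking the arithmetic, not the evaluation pipeline itself. It uses the textbook Spearman formula, which is valid here because neither column contains ties; the NF4 numbers are not listed on this card, so they are not reproduced.

```python
# (body params in billions, T1 retention in %) copied from the table above.
cohort = {
    "OLMo-2-1B": (1.00, 94.19),
    "TinyLlama-1.1B": (1.10, 96.37),
    "SmolLM2-1.7B": (1.71, 93.72),
    "Qwen3-1.7B": (1.72, 93.81),
    "Mistral-7B-v0.3": (7.25, 98.04),
    "Qwen3-8B": (8.19, 97.63),
}


def ranks(values):
    """Rank values from 1 (smallest) to n (largest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(x, y):
    """Spearman rank correlation via 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n * n - 1))


params = [p for p, _ in cohort.values()]
t1 = [r for _, r in cohort.values()]

rho = spearman(params, t1)
mean_t1 = sum(t1) / len(t1)
spread = max(t1) - min(t1)
print(f"rho={rho:.3f} mean_T1={mean_t1:.1f}% spread={spread:.2f} pp")
```

With only six points, a rank correlation near 0.5 is weak evidence on its own, which is consistent with the n=6 caveat above.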
 ## Quality benchmarks
 A small live benchmark is included below as a sanity-check; the full per-task benchmark numbers are intended to be reproduced in the buyer's own evaluation harness against their own baselines.
 ### Reference benchmark (this artifact, paired against FP16 baseline)
 Same 200-sample subset, same seed (1234), same batch size, same fp16 inference, same lm-eval-harness.
 | Task | FP16 baseline (Qwen/Qwen3-1.7B) | Compressed (this artifact at 2.798 bpw) | Retention |
 |---|---|---|---|
 | HellaSwag — acc | 43.50% (±3.51%) | 40.00% (±3.47%) | **91.95%** |
 | HellaSwag — acc_norm | 49.50% (±3.54%) | 47.00% (±3.54%) | **94.95%** |
 **Both compressed-model values are within ±1 standard error of the FP16 baseline** at n=200 — statistically indistinguishable. For a final-eval-grade benchmark the buyer should run on the full 10042-sample HellaSwag with multiple seeds and broader task coverage; the table above is a reproducible sanity-check, not a final claim.
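The retention column and the error bars in the table can be checked with simple arithmetic. The sketch below assumes the error bars are sample standard errors of a proportion with `n - 1` in the denominator; that formula is our assumption, chosen because it reproduces the reported values at n=200, and the function names are illustrative only.

```python
import math


def retention(compressed_acc, baseline_acc):
    """Fraction of the FP16 baseline accuracy kept by the compressed model."""
    return compressed_acc / baseline_acc


def proportion_stderr(acc, n):
    """Sample standard error of an accuracy measured on n examples.

    ASSUMPTION: uses n - 1 in the denominator, which matches the table.
    """
    return math.sqrt(acc * (1 - acc) / (n - 1))


def within_one_stderr(compressed_acc, baseline_acc, n):
    """True if the gap is no larger than the baseline's standard error."""
    gap = abs(baseline_acc - compressed_acc)
    return gap <= proportion_stderr(baseline_acc, n)


# Accuracies copied from the table above, expressed as fractions.
N = 200
rows = {"acc": (0.435, 0.400), "acc_norm": (0.495, 0.470)}
for metric, (fp16, compressed) in rows.items():
    print(
        metric,
        f"retention={retention(compressed, fp16):.2%}",
        f"fp16_stderr={proportion_stderr(fp16, N):.2%}",
        f"within_1_se={within_one_stderr(compressed, fp16, N)}",
    )
```

The plain accuracy gap (3.50 points) is only just inside one standard error (3.51 points), which is one more reason to rerun on the full HellaSwag set before drawing conclusions.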
 ### Reproduce
 ```bash
 # via lm-eval-harness directly (recommended workaround for transformers 4.57.x
 # Qwen3-tokenizer-from-local-path issue — point tokenizer at the upstream repo):
 python -m lm_eval --model hf \
    --model_args "pretrained=./models/SipsaLabs_qwen3-1.7b-uc2p79,tokenizer=Qwen/Qwen3-1.7B,dtype=float16,trust_remote_code=True" \
    --tasks hellaswag,arc_challenge,mmlu \
    --limit 500 --batch_size 8 --device cuda:0
 ```
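 For a paired FP16-baseline-vs-compressed comparison on the same task and same seed (the right way to read retention numbers), substitute `pretrained=Qwen/Qwen3-1.7B` in a separate run and compare task-by-task. Note that the command above runs three tasks with `--limit 500`, whereas the reference table reports HellaSwag only, on a 200-sample subset with seed 1234; match those settings to reproduce the table.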
 The cohort-level claim (95.6% T1 retention, zero catastrophic failures across 6 models at the low-rank correction overlay operating point) comes from the WikiText-103 perplexity protocol documented in the patent specifications, not from HellaSwag accuracy. Different evaluation surfaces measure different things; the artifact-specific numbers above are the reproducible HellaSwag sanity-check, not the full cohort claim.
 For a Compression Assessment engagement that includes the buyer's specific baseline, evaluation tasks, and a written readout: email founder@sipsalabs.com.
 ## Intended use
 **Permitted under this License (free of charge)**:
 - Personal, non-commercial research
 - Academic research at non-profit institutions (with attribution)
 - Pre-purchase evaluation by an enterprise considering negotiating a commercial license — for a period not to exceed 90 days from first download
 **Requires a separate commercial license** (email legal@sipsalabs.com):
 - Production deployment in any commercial product or service
 - Use in an API or hosted inference service offered to third parties
 - Embedding in or shipping within hardware products, consumer devices, automobiles, robotics platforms
 - Training of any derivative model for commercial use
 - Any use by for-profit entities other than internal evaluation
 The full License is in `LICENSE` and at sipsalabs.com.
 ## Out-of-scope use
 This artifact is published for **research and evaluation**. It is not intended for safety-critical, life-critical, or human-subject decision-making applications. Compression introduces measurable quality regression versus the FP16 baseline; do not deploy this artifact in a setting where that regression is unacceptable. Run the buyer's own evaluation before any production decision.
 ## Limitations
 - The compression methods are post-training and preserve the base model's strengths and weaknesses. Whatever bias, refusal behavior, or out-of-distribution failure modes the Qwen3-1.7B FP16 base has, this artifact inherits.
 - `model.safetensors` stores reconstructed weights in FP16 layout, so the storage and runtime savings live in the UltraCompress loader and the packed `model.uc.bin` artifact, not in the on-disk footprint of `model.safetensors`.
 - Direct integration with quantization-aware runtimes (llama.cpp / TensorRT-LLM / vLLM quantization paths) is on the v0.2 roadmap. For v0.1.x, `transformers` and the UltraCompress CLI are the supported load paths.
 ## Reproducibility
 Every public claim on this card maps to a verifiable on-disk artifact:
 ```bash
 # Pull this artifact
 uc pull SipsaLabs/qwen3-1.7b-uc2p79
 # Verify the manifest end-to-end (size + SHA-256 for every declared file)
 uc info ./models/SipsaLabs_qwen3-1.7b-uc2p79
 # Reproduce the benchmark numbers in your own evaluation harness
 uc bench ./models/SipsaLabs_qwen3-1.7b-uc2p79 \
    --tasks hellaswag,arc_challenge,mmlu \
    --limit 500 --batch-size 8 --device cuda:0
 ```
 For a SHA-256 manifest of all training and evaluation inputs that produced this artifact (private context, available under NDA): legal@sipsalabs.com.
 ## Citation
 If you use this artifact in research, please cite:
 ```bibtex
@misc{ounnar2026ultracompress,
  title        = {UltraCompress: Patent-Pending Compression Infrastructure for Large Language Models},
  author       = {{Sipsa Labs, Inc.}},
  year         = {2026},
   note         = {U.S.\ patent applications 64/049,511 and 64/049,517, patent pending. Filed 2026-04-25.},
  howpublished = {\url{https://sipsalabs.com}},
 }
 ```
 ## Get in touch
 - **Commercial license**: legal@sipsalabs.com
 - **Bugs / quality regressions**: file at [github.com/sipsalabs/ultracompress/issues](https://github.com/sipsalabs/ultracompress/issues)
 - **Security issues**: security@sipsalabs.com (do not file public issues)
 - **Press / media**: press@sipsalabs.com
 - **Compression Assessment engagement**: founder@sipsalabs.com
 ---
 *Sipsa Labs, Inc. | sipsalabs.com | patent pending: USPTO 64/049,511 and 64/049,517 (filed 2026-04-25)*
--- a/added_tokens.json
+++ b/added_tokens.json
@@ -0,0 +1,28 @@
 {
  "</think>": 151668,
  "</tool_call>": 151658,
  "</tool_response>": 151666,
  "<think>": 151667,
  "<tool_call>": 151657,
  "<tool_response>": 151665,
  "<|box_end|>": 151649,
  "<|box_start|>": 151648,
  "<|endoftext|>": 151643,
  "<|file_sep|>": 151664,
  "<|fim_middle|>": 151660,
  "<|fim_pad|>": 151662,
  "<|fim_prefix|>": 151659,
  "<|fim_suffix|>": 151661,
  "<|im_end|>": 151645,
  "<|im_start|>": 151644,
  "<|image_pad|>": 151655,
  "<|object_ref_end|>": 151647,
  "<|object_ref_start|>": 151646,
  "<|quad_end|>": 151651,
  "<|quad_start|>": 151650,
  "<|repo_name|>": 151663,
  "<|video_pad|>": 151656,
  "<|vision_end|>": 151653,
  "<|vision_pad|>": 151654,
  "<|vision_start|>": 151652
 }
--- a/chat_template.jinja
+++ b/chat_template.jinja
@@ -0,0 +1,89 @@
 {%- if tools %}
    {{- '<|im_start|>system\n' }}
    {%- if messages[0].role == 'system' %}
        {{- messages[0].content + '\n\n' }}
    {%- endif %}
    {{- "# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
    {%- for tool in tools %}
        {{- "\n" }}
        {{- tool | tojson }}
    {%- endfor %}
    {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
 {%- else %}
    {%- if messages[0].role == 'system' %}
        {{- '<|im_start|>system\n' + messages[0].content + '<|im_end|>\n' }}
    {%- endif %}
 {%- endif %}
 {%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}
 {%- for message in messages[::-1] %}
    {%- set index = (messages|length - 1) - loop.index0 %}
    {%- if ns.multi_step_tool and message.role == "user" and message.content is string and not(message.content.startswith('<tool_response>') and message.content.endswith('</tool_response>')) %}
        {%- set ns.multi_step_tool = false %}
        {%- set ns.last_query_index = index %}
    {%- endif %}
 {%- endfor %}
 {%- for message in messages %}
    {%- if message.content is string %}
        {%- set content = message.content %}
    {%- else %}
        {%- set content = '' %}
    {%- endif %}
    {%- if (message.role == "user") or (message.role == "system" and not loop.first) %}
        {{- '<|im_start|>' + message.role + '\n' + content + '<|im_end|>' + '\n' }}
    {%- elif message.role == "assistant" %}
        {%- set reasoning_content = '' %}
        {%- if message.reasoning_content is string %}
            {%- set reasoning_content = message.reasoning_content %}
        {%- else %}
            {%- if '</think>' in content %}
                {%- set reasoning_content = content.split('</think>')[0].rstrip('\n').split('<think>')[-1].lstrip('\n') %}
                {%- set content = content.split('</think>')[-1].lstrip('\n') %}
            {%- endif %}
        {%- endif %}
        {%- if loop.index0 > ns.last_query_index %}
            {%- if loop.last or (not loop.last and reasoning_content) %}
                {{- '<|im_start|>' + message.role + '\n<think>\n' + reasoning_content.strip('\n') + '\n</think>\n\n' + content.lstrip('\n') }}
            {%- else %}
                {{- '<|im_start|>' + message.role + '\n' + content }}
            {%- endif %}
        {%- else %}
            {{- '<|im_start|>' + message.role + '\n' + content }}
        {%- endif %}
        {%- if message.tool_calls %}
            {%- for tool_call in message.tool_calls %}
                {%- if (loop.first and content) or (not loop.first) %}
                    {{- '\n' }}
                {%- endif %}
                {%- if tool_call.function %}
                    {%- set tool_call = tool_call.function %}
                {%- endif %}
                {{- '<tool_call>\n{"name": "' }}
                {{- tool_call.name }}
                {{- '", "arguments": ' }}
                {%- if tool_call.arguments is string %}
                    {{- tool_call.arguments }}
                {%- else %}
                    {{- tool_call.arguments | tojson }}
                {%- endif %}
                {{- '}\n</tool_call>' }}
            {%- endfor %}
        {%- endif %}
        {{- '<|im_end|>\n' }}
    {%- elif message.role == "tool" %}
        {%- if loop.first or (messages[loop.index0 - 1].role != "tool") %}
            {{- '<|im_start|>user' }}
        {%- endif %}
        {{- '\n<tool_response>\n' }}
        {{- content }}
        {{- '\n</tool_response>' }}
        {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
            {{- '<|im_end|>\n' }}
        {%- endif %}
    {%- endif %}
 {%- endfor %}
 {%- if add_generation_prompt %}
    {{- '<|im_start|>assistant\n' }}
    {%- if enable_thinking is defined and enable_thinking is false %}
        {{- '<think>\n\n</think>\n\n' }}
    {%- endif %}
 {%- endif %}
--- a/config.json
+++ b/config.json
@@ -0,0 +1,60 @@
 {
  "architectures": [
    "Qwen3ForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "dtype": "float16",
  "eos_token_id": 151645,
  "head_dim": 128,
  "hidden_act": "silu",
  "hidden_size": 2048,
  "initializer_range": 0.02,
  "intermediate_size": 6144,
  "layer_types": [
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention"
  ],
  "max_position_embeddings": 40960,
  "max_window_layers": 28,
  "model_type": "qwen3",
  "num_attention_heads": 16,
  "num_hidden_layers": 28,
  "num_key_value_heads": 8,
  "rms_norm_eps": 1e-06,
  "rope_scaling": null,
  "rope_theta": 1000000,
  "sliding_window": null,
  "tie_word_embeddings": true,
  "transformers_version": "4.57.2",
  "use_cache": true,
  "use_sliding_window": false,
  "vocab_size": 151936
 }
--- a/generation_config.json
+++ b/generation_config.json
@@ -0,0 +1,13 @@
 {
  "bos_token_id": 151643,
  "do_sample": true,
  "eos_token_id": [
    151645,
    151643
  ],
  "pad_token_id": 151643,
  "temperature": 0.6,
  "top_k": 20,
  "top_p": 0.95,
  "transformers_version": "4.57.2"
 }
--- a/merges.txt
+++ b/merges.txt
--- a/model.safetensors
+++ b/model.safetensors
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:404d96f7ec7a8bca23e9f80b35e2c61366ed1c81762c1a206e874a6fd4207e28
 size 3441185296
--- a/model.uc.bin
+++ b/model.uc.bin
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:e42ee0fa9da5070733087e5fab8950c8332c4bbeaa207e566057c4609d31006c
 size 490984295
--- a/special_tokens_map.json
+++ b/special_tokens_map.json
@@ -0,0 +1,31 @@
 {
  "additional_special_tokens": [
    "<|im_start|>",
    "<|im_end|>",
    "<|object_ref_start|>",
    "<|object_ref_end|>",
    "<|box_start|>",
    "<|box_end|>",
    "<|quad_start|>",
    "<|quad_end|>",
    "<|vision_start|>",
    "<|vision_end|>",
    "<|vision_pad|>",
    "<|image_pad|>",
    "<|video_pad|>"
  ],
  "eos_token": {
    "content": "<|im_end|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "<|endoftext|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
 }
--- a/tokenizer.json
+++ b/tokenizer.json
--- a/tokenizer_config.json
+++ b/tokenizer_config.json
@@ -0,0 +1,239 @@
 {
  "add_bos_token": false,
  "add_prefix_space": false,
  "added_tokens_decoder": {
    "151643": {
      "content": "<|endoftext|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151644": {
      "content": "<|im_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151645": {
      "content": "<|im_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151646": {
      "content": "<|object_ref_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151647": {
      "content": "<|object_ref_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151648": {
      "content": "<|box_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151649": {
      "content": "<|box_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151650": {
      "content": "<|quad_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151651": {
      "content": "<|quad_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151652": {
      "content": "<|vision_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151653": {
      "content": "<|vision_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151654": {
      "content": "<|vision_pad|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151655": {
      "content": "<|image_pad|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151656": {
      "content": "<|video_pad|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151657": {
      "content": "<tool_call>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151658": {
      "content": "</tool_call>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151659": {
      "content": "<|fim_prefix|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151660": {
      "content": "<|fim_middle|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151661": {
      "content": "<|fim_suffix|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151662": {
      "content": "<|fim_pad|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151663": {
      "content": "<|repo_name|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151664": {
      "content": "<|file_sep|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151665": {
      "content": "<tool_response>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151666": {
      "content": "</tool_response>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151667": {
      "content": "<think>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151668": {
      "content": "</think>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    }
  },
  "additional_special_tokens": [
    "<|im_start|>",
    "<|im_end|>",
    "<|object_ref_start|>",
    "<|object_ref_end|>",
    "<|box_start|>",
    "<|box_end|>",
    "<|quad_start|>",
    "<|quad_end|>",
    "<|vision_start|>",
    "<|vision_end|>",
    "<|vision_pad|>",
    "<|image_pad|>",
    "<|video_pad|>"
  ],
  "bos_token": null,
  "clean_up_tokenization_spaces": false,
  "eos_token": "<|im_end|>",
  "errors": "replace",
  "extra_special_tokens": {},
  "model_max_length": 131072,
  "pad_token": "<|endoftext|>",
  "split_special_tokens": false,
  "tokenizer_class": "Qwen2Tokenizer",
  "unk_token": null
 }
--- a/ultracompress.json
+++ b/ultracompress.json
@@ -0,0 +1,45 @@
 {
  "base_model": "Qwen/Qwen3-1.7B",
  "bits_per_weight": 2.7767,
  "compressed_at_utc": "2026-04-29T17:58:03Z",
  "compression_notes": "Compressed via UltraCompress row-overlay quantization at the v17hi 2.798-bpw operating point. See https://sipsalabs.com for method overview. Full method is the subject of pending USPTO patent applications. This artifact ships in dual format: model.safetensors (FP16 reconstruction, transformers-compatible) and model.uc.bin (packed binary at 2.7871-bpw on-disk, loadable via `pip install ultracompress`). Round-trip verified between the two formats.",
  "converter_script": "scripts/publish/convert_v17hi_to_hf.py",
  "files": {
    "model.safetensors": {
      "sha256": "404d96f7ec7a8bca23e9f80b35e2c61366ed1c81762c1a206e874a6fd4207e28",
      "size_bytes": 3441185296
    },
    "model.uc.bin": {
      "sha256": "e42ee0fa9da5070733087e5fab8950c8332c4bbeaa207e566057c4609d31006c",
      "size_bytes": 490984295
    }
  },
  "formats": {
    "fp16_reconstruction": {
      "loader": "transformers",
      "notes": "FP16 weights equivalent to a 2.7871-bpw decode; loadable directly via transformers.from_pretrained.",
      "path": "model.safetensors",
      "size_bytes": 3441185296
    },
    "packed_v17hi": {
      "loader": "ultracompress",
      "notes": "Packed binary at the actual 2.7871-bpw on-disk size; load via `pip install ultracompress`. Round-trip verified bit-equivalent to the substitute_v17 path on Qwen3-1.7B WikiText-103 PPL within 0.0976% (under the 0.1% pass threshold).",
      "path": "model.uc.bin",
      "round_trip_verified_ppl_diff_pct": 0.0976,
      "size_bpw_on_disk": 2.7871,
      "size_bytes": 490984295
    }
  },
  "license_name": "Sipsa Labs Research and Evaluation License v1.0",
  "license_notice": "Use of this model is governed by the Sipsa Labs Research and Evaluation License. See LICENSE for full terms. Commercial, OEM, or enterprise use requires a separate license from Sipsa Labs.",
  "method": "row-overlay-quantization",
  "method_track": "Track A",
  "patent_pending": true,
  "provisional_application_numbers": [
    "64/049,511",
    "64/049,517"
  ],
  "publisher": "Sipsa Labs, Inc.",
  "publisher_contact": "legal@sipsalabs.com",
  "uc_version": "0.1.3"
 }
--- a/vocab.json
+++ b/vocab.json