Initialize project; model provided by the ModelHub XC community

Model: kollecter/Mamba_Hermes-3B-GPU-CPU-v0.2-GGUF
Source: Original Platform
ModelHub XC
2026-04-21 15:39:16 +08:00
commit b047a11c8f
7 changed files with 74 additions and 0 deletions

.gitattributes vendored Normal file

@@ -0,0 +1,40 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
ggml-mambahermes-3b-q8_0.gguf filter=lfs diff=lfs merge=lfs -text
ggml-mambahermes-3b-f16.gguf filter=lfs diff=lfs merge=lfs -text
ggml-mambahermes-3b-q6_k.gguf filter=lfs diff=lfs merge=lfs -text
ggml-mambahermes-3b-q5_k.gguf filter=lfs diff=lfs merge=lfs -text
ggml-mambahermes-3b-q4_k.gguf filter=lfs diff=lfs merge=lfs -text
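
The attribute list above routes matching paths through Git LFS. A rough way to see which paths it would catch is sketched below. This is an approximation, not Git's own matcher: it only handles slash-free patterns by comparing them against the file's basename, it does not cover `saved_model/**/*`, and the helper name `is_lfs_tracked` and the pattern subset are chosen here for illustration.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Subset of the patterns listed in the .gitattributes above (illustrative only).
LFS_PATTERNS = [
    "*.bin",
    "*.safetensors",
    "*.tar.*",
    "*tfevents*",
    "ggml-mambahermes-3b-q4_k.gguf",
]


def is_lfs_tracked(path: str, patterns=LFS_PATTERNS) -> bool:
    """Approximate check: match slash-free patterns against the basename.

    Hypothetical helper; Git's real attribute matching follows gitignore-style
    rules and is richer than fnmatch.
    """
    name = PurePosixPath(path).name
    return any(fnmatchcase(name, pattern) for pattern in patterns)


if __name__ == "__main__":
    for candidate in ["ggml-mambahermes-3b-q4_k.gguf", "other.gguf", "README.md"]:
        print(candidate, is_lfs_tracked(candidate))
```

Note that the list contains no generic `*.gguf` pattern: each GGUF file is named explicitly, so a GGUF file with a different name would not be picked up by these rules.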

README.md Normal file

@@ -0,0 +1,19 @@
---
license: wtfpl
library_name: gguf
pipeline_tag: text-generation
base_model: Q-bert/MambaHermes-3B
---
**NOTE**: These weights require a build of llama.cpp that includes Mamba support (i.e. commit [`c2101a2`](https://github.com/ggerganov/llama.cpp/commit/c2101a2e909ac7c08976d414e64e96c90ee5fa9e)). More details are in [the pull request](https://github.com/ggerganov/llama.cpp/pull/5328).

From the author of the llama.cpp patch:

> I started working on this as an experiment and because I wanted to try Mamba models with llama.cpp (also, there have been quite a few finetunes already).
> Turns out that implementing support for a novel model architecture is quite fun (well, at least when it finally works).
> The most powerful machine on which I try LLMs is a low-power laptop with 8GB of RAM and an Intel CPU (no discrete GPU), so I can't try Mamba-3B in its full f32 glory (the full weights take 11GB), but at least now it's possible to use it quantized.
> Constant memory usage is a big advantage of Mamba models, but this also means that previous states are not all kept in memory (at least in the current implementation, only the last one is kept), which means there might be more prompt re-processing than necessary in the server example, especially if your client trims the end of the output (it's also problematic that the stop token(s) are not included in the server's responses). The main example has no such problem.
> Currently, the initial text generation speed for Mamba is a bit slower than for Transformer-based models (with empty context), but unlike them, Mamba's speed does not degrade with the number of tokens processed.
> Also note that quantization may make the state unstable (making the output gibberish), but this needs more testing to figure out how much this happens (because I only saw it happen with very small models (130M), and not yet with bigger ones (3B)).
> For testing, I recommend converting from https://huggingface.co/state-spaces/mamba-130m-hf since it's small, the config.json doesn't require modification, the tokenizer is already next to the model files, and the token_embd weight is shared with the output weight, so the download is smaller.
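
The point about constant memory and prompt re-processing can be illustrated with a toy recurrence. The sketch below is not Mamba's selective scan and not llama.cpp's implementation; the class `ToyRecurrentModel`, its scalar state, and the `decay` parameter are assumptions made purely for illustration. It shows that a model keeping only its last state uses the same memory however many tokens it has seen, and that returning to an earlier point means feeding the prefix again.

```python
class ToyRecurrentModel:
    """Toy scalar recurrence h_t = decay * h_{t-1} + x_t (illustrative only)."""

    def __init__(self, decay: float = 0.5):
        self.decay = decay
        self.state = 0.0       # the only memory kept, regardless of sequence length
        self.n_processed = 0   # how many tokens were fed since the last reset

    def feed(self, tokens):
        """Advance the state one token at a time; earlier states are overwritten."""
        for x in tokens:
            self.state = self.decay * self.state + x
            self.n_processed += 1

    def rewind_to(self, prefix):
        """Only the last state is kept, so going back means re-processing the prefix."""
        self.state = 0.0
        self.n_processed = 0
        self.feed(prefix)


if __name__ == "__main__":
    model = ToyRecurrentModel(decay=0.5)
    model.feed([1.0, 2.0, 4.0])
    print(model.state, model.n_processed)

    # A client that trims the last token forces the whole prefix to be re-run.
    model.rewind_to([1.0, 2.0])
    print(model.state, model.n_processed)
```

A cache that stores one entry per token, as attention-based models do, could instead drop the trimmed entries and continue, at the cost of memory that grows with the context.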


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:22e72d6fdce91d9aae18e73039becb18c099e376f87936c90393aeda1c86d6a5
size 5784587968


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ad0bdc089f6fd3314c421fabaf9d35b58765a51bc1191e6e4a79c1c929740dca
size 2015155232


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d7ae6105643c2e1e2dcb5a86443e889e3105b4dc90710aa310a1eca088010eae
size 2329728032


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:0029844a791402f9f710672803604b66cb04a29e577feae2d5876a94f6320826
size 2663961632


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:50cf18c6263bb59f27ffe9aa5330c8abce5df6705552557c690d454c19ca917c
size 3304620032
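
Each three-line hunk above is a Git LFS pointer: the spec version, the SHA-256 of the real file, and its size in bytes. A minimal sketch of how a downloaded file could be checked against such a pointer follows. The function names `parse_lfs_pointer` and `verify_file` are hypothetical helpers, not part of any Git LFS tooling, and the local file path in the usage comment is an assumption.

```python
import hashlib


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer into a small dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def verify_file(path: str, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Stream the file in chunks and compare its size and hash with the pointer."""
    hasher = hashlib.new(pointer["algo"])
    size = 0
    with open(path, "rb") as handle:
        while chunk := handle.read(chunk_size):
            hasher.update(chunk)
            size += len(chunk)
    return size == pointer["size"] and hasher.hexdigest() == pointer["digest"]


if __name__ == "__main__":
    pointer_text = """\
version https://git-lfs.github.com/spec/v1
oid sha256:50cf18c6263bb59f27ffe9aa5330c8abce5df6705552557c690d454c19ca917c
size 3304620032
"""
    pointer = parse_lfs_pointer(pointer_text)
    print(pointer["algo"], pointer["size"])
    # verify_file("path/to/downloaded.gguf", pointer)  # path is a placeholder
```

Reading in chunks keeps memory use flat, which matters for multi-gigabyte weight files like the ones referenced here.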