---
license: apache-2.0
library_name: gguf
base_model:
- chromadb/context-1
pipeline_tag: text-generation
language: en
tags:
- gguf
- llama.cpp
- gpt-oss
- chromadb
- chroma
- moe
- text-generation
- quantized
---
# Chroma Context-1 — GGUF (llama.cpp)
**GGUF weights for [Chroma Context-1](https://huggingface.co/chromadb/context-1),** converted for **[llama.cpp](https://github.com/ggml-org/llama.cpp)** and any runtime that loads GGUF (LM Studio, Ollama with compatible import paths, local servers, etc.).
This repository exists because **the upstream model is distributed in PyTorch / safetensors form only**. These files are the same weights in **GGUF**, with a range of **llama-quantize** presets so you can trade quality for VRAM and disk.
---
## Upstream (source of truth)
| | Details |
|---|------|
| **Original weights & model card** | [**`chromadb/context-1`**](https://huggingface.co/chromadb/context-1) |
| **Architecture family** | gpt-oss MoE (see upstream card; base traceable to OpenAI **[`gpt-oss-20b`](https://huggingface.co/openai/gpt-oss-20b)**) |
| **License** | **Apache 2.0** (unchanged; you must comply with upstream terms) |
**Attribution:** All tensors are derived from **[chromadb/context-1](https://huggingface.co/chromadb/context-1)**. This repo is a **community conversion** and is **not** affiliated with or endorsed by Chroma. For behavior, safety, and intended use, read the **official** model card first.
---
## Quick start
**1. Install** a recent [llama.cpp](https://github.com/ggml-org/llama.cpp) build (or use a GUI that bundles it).
**2. Download** this repository:
```bash
huggingface-cli download ryancook/chromadb-context-1-gguf --local-dir ./chromadb-context-1-gguf
```
**3. Run** (example — adjust paths and context length to your hardware):
```bash
llama-cli -m ./chromadb-context-1-gguf/chromadb-context-1-Q4_0.gguf -cnv --color -ngl 99
```
Swap the filename for any published `chromadb-context-1-*.gguf` from the **Files** tab (for example `Q4_K_M` or `MXFP4_MOE` when available).
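If you prefer an HTTP endpoint over the interactive CLI, llama.cpp also ships `llama-server`; a minimal sketch (the quant file, port, and context size are example choices, not recommendations):

```bash
# Serve the model over llama.cpp's built-in HTTP server.
# Swap the filename for whichever quant you downloaded; adjust -c and -ngl to your hardware.
llama-server \
  -m ./chromadb-context-1-gguf/chromadb-context-1-Q4_K_M.gguf \
  -c 8192 \
  -ngl 99 \
  --port 8080
```

Once running, the server exposes an OpenAI-style `/v1/chat/completions` endpoint on the chosen port.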
---
## Choosing a file
**Start here (good defaults for most people):**
| Priority | File pattern | When to use |
|----------|----------------|-------------|
| 1 | **`…-Q4_K_M.gguf`** or **`…-Q5_K_M.gguf`** | Best general-purpose balance of quality and size (if present in this repo). |
| 2 | **`…-MXFP4_MOE.gguf`** | Smaller MoE-oriented layout; strong choice when supported by your llama.cpp build/GPU stack. |
| 3 | **`…-Q4_0.gguf`** / **`…-Q5_0.gguf`** | Simpler legacy-style quants; predictable tradeoffs. |
| 4 | **`…-bf16.gguf`** | Full **BF16** fidelity (~40GiB class); for reference or maximum quality when you have RAM/VRAM. |
**Other presets** (IQ*, TQ*, Q2_K, Q3_K*, Q6_K, Q8_0, F16, …) may appear in the **Files** tab as they are published. Lower-bit and ternary formats are **experimental** for quality; profile on your workload before relying on them.
> **Tip:** The **Files and versions** view on Hugging Face is authoritative for what is available in each commit. Filenames follow `chromadb-context-1-<PRESET>.gguf`.
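If you only need a single preset, `huggingface-cli download` can filter by filename pattern; a sketch, assuming a `Q4_K_M` file has been published to the Files tab:

```bash
# Fetch one quant instead of the whole repository.
# The pattern is an example; swap it for any published preset.
huggingface-cli download ryancook/chromadb-context-1-gguf \
  --include "*Q4_K_M.gguf" \
  --local-dir ./chromadb-context-1-gguf
```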
---
## Conversion pipeline
Reproducible high-level steps:
1. **Obtain** weights from [**chromadb/context-1**](https://huggingface.co/chromadb/context-1) (Apache 2.0).
2. **Convert** to GGUF with llama.cpp **`convert_hf_to_gguf.py`** (BF16 output from upstream bf16 checkpoint).
3. **Quantize** with **`llama-quantize`** using the preset named in each filename (`Q4_0`, `Q4_K_M`, `MXFP4_MOE`, etc.); a command sketch follows this list.
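A hedged command-level sketch of steps 2–3, assuming a local clone of the upstream checkpoint in `./context-1` (directory and output names are placeholders):

```bash
# Step 2: convert the upstream safetensors checkpoint to a BF16 GGUF.
# ./context-1 is a placeholder for a local clone of chromadb/context-1.
python convert_hf_to_gguf.py ./context-1 \
  --outtype bf16 \
  --outfile chromadb-context-1-bf16.gguf

# Step 3: quantize the BF16 GGUF to the preset named in the published filename.
llama-quantize chromadb-context-1-bf16.gguf chromadb-context-1-Q4_K_M.gguf Q4_K_M
```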
### Reproducibility
Conversions for this collection were produced with **[ggml-org/llama.cpp](https://github.com/ggml-org/llama.cpp)** at commit **`07ba6d275`** (short SHA; matches upstream `convert_hf_to_gguf.py` / `llama-quantize` from that tree). Newer llama.cpp revisions are generally backward compatible for GGUF loading, but you may see small numerical differences if you re-quantize.
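To rebuild the same tooling, check out that commit and build llama.cpp with CMake; a minimal sketch (backend-specific flags such as CUDA or Metal are left to you):

```bash
# Pin llama.cpp to the commit used for these conversions and build the CLI tools.
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
git checkout 07ba6d275
cmake -B build
cmake --build build --config Release -j
```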
---
## Hardware & context
- **VRAM / RAM:** MoE models activate only a subset of experts per token, but every expert's weights still have to be resident in memory; treat the published file sizes as a guide and monitor peak usage at your target context length.
- **Context length:** Upstream supports a very long context window; practical limits depend on **KV cache size** and quant. Start with a smaller **`-c`** / context setting and increase only after you confirm stability (see the sketch below this list).
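As an illustration, the context values here are arbitrary starting points, not upstream recommendations:

```bash
# Start with a conservative context window, then raise it once peak memory usage looks stable.
llama-cli -m ./chromadb-context-1-gguf/chromadb-context-1-Q4_K_M.gguf -cnv --color -ngl 99 -c 8192
# If RAM/VRAM headroom remains, try a larger KV cache, e.g. -c 32768.
```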
---
## License
Same as upstream: **Apache 2.0**. Keep **[chromadb/context-1](https://huggingface.co/chromadb/context-1)** attribution visible when you redistribute or ship products built on these files.
---
## More from Chroma
- **Official model (safetensors):** [chromadb/context-1](https://huggingface.co/chromadb/context-1)
- **Chroma:** [trychroma.com](https://www.trychroma.com/)