---
license: apache-2.0
library_name: gguf
base_model:
- chromadb/context-1
pipeline_tag: text-generation
language: en
tags:
- gguf
- llama.cpp
- gpt-oss
- chromadb
- chroma
- moe
- text-generation
- quantized
---

# Chroma Context-1 — GGUF (llama.cpp)

**GGUF weights for [Chroma Context-1](https://huggingface.co/chromadb/context-1)**, converted for **[llama.cpp](https://github.com/ggml-org/llama.cpp)** and any runtime that loads GGUF (LM Studio, Ollama with compatible import paths, local servers, etc.).

This repository exists because **the upstream model is distributed in PyTorch / safetensors form only**. These files are the same weights in **GGUF**, with a range of **llama-quantize** presets so you can trade quality for VRAM and disk.

---

## Upstream (source of truth)

| | Details |
|---|---------|
| **Original weights & model card** | [**`chromadb/context-1`**](https://huggingface.co/chromadb/context-1) |
| **Architecture family** | gpt-oss MoE (see upstream card; base traceable to OpenAI **[`gpt-oss-20b`](https://huggingface.co/openai/gpt-oss-20b)**) |
| **License** | **Apache 2.0** (unchanged; you must comply with upstream terms) |

**Attribution:** All tensors are derived from **[chromadb/context-1](https://huggingface.co/chromadb/context-1)**. This repo is a **community conversion** and is **not** affiliated with or endorsed by Chroma. For behavior, safety, and intended use, read the **official** model card first.

---

## Quick start

**1. Install** a recent [llama.cpp](https://github.com/ggml-org/llama.cpp) build (or use a GUI that bundles it).
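
One possible route on macOS or Linux (Homebrew packages llama.cpp; prebuilt binaries from the llama.cpp GitHub releases or a source build work just as well):

```bash
# Install the llama.cpp CLI tools via Homebrew (one of several options).
brew install llama.cpp

# Confirm the binary is on your PATH before continuing.
llama-cli --version
```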

**2. Download** this repository:

```bash
huggingface-cli download ryancook/chromadb-context-1-gguf --local-dir ./chromadb-context-1-gguf
```

**3. Run** (example — adjust paths and context length to your hardware):

```bash
llama-cli -m ./chromadb-context-1-gguf/chromadb-context-1-Q4_0.gguf -cnv --color -ngl 99
```

Swap the filename for any published `chromadb-context-1-*.gguf` from the **Files** tab (for example `Q4_K_M` or `MXFP4_MOE` when available).
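
If you prefer an HTTP endpoint over the interactive CLI, llama.cpp's `llama-server` can expose the same file through an OpenAI-compatible API (the `Q4_K_M` filename below is illustrative; pick any preset that is actually published):

```bash
# Serve the model locally; flags mirror llama-cli (-m, -c, -ngl).
llama-server -m ./chromadb-context-1-gguf/chromadb-context-1-Q4_K_M.gguf -c 8192 -ngl 99 --port 8080

# Query the OpenAI-compatible chat endpoint.
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Say hello."}]}'
```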

---

## Choosing a file

**Start here (good defaults for most people):**

| Priority | File pattern | When to use |
|----------|--------------|-------------|
| 1 | **`…-Q4_K_M.gguf`** or **`…-Q5_K_M.gguf`** | Best general-purpose balance of quality and size (if present in this repo). |
| 2 | **`…-MXFP4_MOE.gguf`** | Smaller MoE-oriented layout; strong choice when supported by your llama.cpp build/GPU stack. |
| 3 | **`…-Q4_0.gguf`** / **`…-Q5_0.gguf`** | Simpler legacy-style quants; predictable tradeoffs. |
| 4 | **`…-bf16.gguf`** | Full **BF16** fidelity (~40 GiB class); for reference or maximum quality when you have the RAM/VRAM. |

**Other presets** (IQ*, TQ*, Q2_K, Q3_K*, Q6_K, Q8_0, F16, …) may appear in the **Files** tab as they are published. Lower-bit and ternary formats are **experimental** in terms of quality; profile them on your workload before relying on them.
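
One way to do that profiling is a quick perplexity comparison with llama.cpp's `llama-perplexity` tool on a text sample representative of your workload (the filenames and `eval.txt` below are placeholders; use whatever presets the Files tab actually lists):

```bash
# Lower perplexity is better; compare a low-bit preset against a higher-bit reference.
llama-perplexity -m ./chromadb-context-1-gguf/chromadb-context-1-Q2_K.gguf   -f eval.txt -ngl 99
llama-perplexity -m ./chromadb-context-1-gguf/chromadb-context-1-Q4_K_M.gguf -f eval.txt -ngl 99
```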

> **Tip:** The **Files and versions** view on Hugging Face is authoritative for what is available in each commit. Filenames follow `chromadb-context-1-<PRESET>.gguf`.
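
To pull a single preset rather than the whole repository, `huggingface-cli` accepts an explicit filename (the `Q4_K_M` name here is only an example; match it to a file that exists in the current commit):

```bash
# Download one specific GGUF file instead of every preset.
huggingface-cli download ryancook/chromadb-context-1-gguf \
  chromadb-context-1-Q4_K_M.gguf \
  --local-dir ./chromadb-context-1-gguf
```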

---

## Conversion pipeline

Reproducible high-level steps (a command sketch follows the list):
|
|||
|
|
|
|||
|
|
1. **Obtain** weights from [**chromadb/context-1**](https://huggingface.co/chromadb/context-1) (Apache 2.0).
|
|||
|
|
2. **Convert** to GGUF with llama.cpp **`convert_hf_to_gguf.py`** (BF16 output from upstream bf16 checkpoint).
|
|||
|
|
3. **Quantize** with **`llama-quantize`** using the preset named in each filename (`Q4_0`, `Q4_K_M`, `MXFP4_MOE`, etc.).
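
A hedged sketch of those three steps as shell commands, assuming a llama.cpp checkout with its Python requirements installed and the `llama-quantize` binary already built; every path and the chosen preset are placeholders:

```bash
# 1. Obtain the upstream safetensors checkpoint (Apache 2.0).
huggingface-cli download chromadb/context-1 --local-dir ./context-1

# 2. Convert the Hugging Face checkpoint to a BF16 GGUF.
python llama.cpp/convert_hf_to_gguf.py ./context-1 \
  --outtype bf16 \
  --outfile ./chromadb-context-1-bf16.gguf

# 3. Quantize the BF16 GGUF to a named preset (Q4_K_M shown as an example).
llama-quantize ./chromadb-context-1-bf16.gguf ./chromadb-context-1-Q4_K_M.gguf Q4_K_M
```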

### Reproducibility

Conversions for this collection were produced with **[ggml-org/llama.cpp](https://github.com/ggml-org/llama.cpp)** at commit **`07ba6d275`** (short SHA; matches upstream `convert_hf_to_gguf.py` / `llama-quantize` from that tree). Newer llama.cpp revisions are generally backward compatible for GGUF loading, but you may see small numerical differences if you re-quantize.
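
To reproduce with the same tools, one approach is to check out that commit and build llama.cpp with its standard CMake flow (backend flags such as `-DGGML_CUDA=ON` are optional and hardware-dependent):

```bash
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
git checkout 07ba6d275
cmake -B build
cmake --build build --config Release -j
# The resulting binaries (llama-quantize, llama-cli, ...) land under build/bin/.
```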

---

## Hardware & context

- **VRAM / RAM:** MoE models route only a subset of experts per token; still treat published sizes as a guide and monitor peak usage at your target context length.
- **Context length:** Upstream supports a very long context window; practical limits depend on **KV cache size** and quant. Start with a smaller **`-c`** / context setting and increase only after you confirm stability (see the sketch below).
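
For example, a conservative starting point might look like this, growing `-c` in steps while watching memory (the path, preset, and numbers are illustrative, not measured):

```bash
# Begin with a modest context window, then re-run with larger -c values once stable.
llama-cli -m ./chromadb-context-1-gguf/chromadb-context-1-Q4_K_M.gguf -cnv -ngl 99 -c 8192

# Some builds also support a quantized KV cache (e.g. --cache-type-k q8_0) to trim memory further.
```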

---

## License

Same as upstream: **Apache 2.0**. Keep **[chromadb/context-1](https://huggingface.co/chromadb/context-1)** attribution visible when you redistribute or ship products built on these files.

---

## More from Chroma

- **Official model (safetensors):** [chromadb/context-1](https://huggingface.co/chromadb/context-1)
- **Chroma:** [trychroma.com](https://www.trychroma.com/)