---
license: apache-2.0
language:
- en
base_model: Ayansk11/FinSenti-Qwen3-8B
datasets:
- Ayansk11/FinSenti-Dataset
pipeline_tag: text-generation
library_name: gguf
tags:
- finance
- financial-sentiment
- chain-of-thought
- reasoning
- gguf
- llama-cpp
- ollama
- quantized
- finsenti
---
# FinSenti-Qwen3-8B - GGUF
GGUF builds of [FinSenti-Qwen3-8B](https://huggingface.co/Ayansk11/FinSenti-Qwen3-8B)
for use with [Ollama](https://ollama.com), [llama.cpp](https://github.com/ggerganov/llama.cpp),
LM Studio, KoboldCpp, and other GGUF-compatible runtimes.
This is the same model as the SafeTensors repo, just converted and
quantized so you can run it on a CPU or a small GPU without pulling in
PyTorch.
## Files in this repo
| File | Quant | Size | Notes |
|------|-------|------|-------|
| `FinSenti-Qwen3-8B.Q4_K_M.gguf` | Q4_K_M | 4.70 GB | Smallest, mild quality dip. Default pick for laptops. |
| `FinSenti-Qwen3-8B.Q5_K_M.gguf` | Q5_K_M | 5.40 GB | Balanced quality and size. |
| `FinSenti-Qwen3-8B.Q8_0.gguf` | Q8_0 | 8.20 GB | Closest to bf16, biggest file. |
If you're not sure which to pick: **start with Q4_K_M**. It's the smallest
file, it runs everywhere, and the quality drop versus the original bf16
weights is small for a model this size.
## Quick start (llama.cpp)
```bash
# Download the Q4_K_M file (or pick a different quant from the table above)
huggingface-cli download Ayansk11/FinSenti-Qwen3-8B-GGUF FinSenti-Qwen3-8B.Q4_K_M.gguf --local-dir .
# Run it
./llama-cli -m FinSenti-Qwen3-8B.Q4_K_M.gguf \
  -sys "You are a financial sentiment analyst. For each headline you receive, write a short reasoning chain inside <think>...</think> tags, then give a single label inside <answer>...</answer> tags. The label must be exactly one of: positive, negative, neutral." \
  -p "Apple beats Q4 estimates as iPhone sales jump 12% year over year." \
  -n 256
```
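The same llama.cpp build also ships `llama-server`, which exposes an OpenAI-compatible HTTP endpoint. A minimal sketch, in case you'd rather hit the model over HTTP (the port and context size here are arbitrary picks, not requirements):
```bash
# Serve the model over HTTP (OpenAI-compatible API)
./llama-server -m FinSenti-Qwen3-8B.Q4_K_M.gguf -c 2048 --port 8080

# From another shell, query /v1/chat/completions like any OpenAI-style API
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "system", "content": "You are a financial sentiment analyst. For each headline you receive, write a short reasoning chain inside <think>...</think> tags, then give a single label inside <answer>...</answer> tags. The label must be exactly one of: positive, negative, neutral."},
      {"role": "user", "content": "Apple beats Q4 estimates as iPhone sales jump 12% year over year."}
    ],
    "max_tokens": 256,
    "temperature": 0
  }'
```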
## Quick start (Ollama)
This repo ships a `Modelfile` for each quant. To register the Q4_K_M build
under the name `finsenti-qwen3-8b`:
```bash
huggingface-cli download Ayansk11/FinSenti-Qwen3-8B-GGUF \
FinSenti-Qwen3-8B.Q4_K_M.gguf Modelfile.Q4_K_M --local-dir ./finsenti-tmp
cd finsenti-tmp
ollama create finsenti-qwen3-8b -f Modelfile.Q4_K_M
# Then chat with it
ollama run finsenti-qwen3-8b "Apple beats Q4 estimates as iPhone sales jump 12% year over year."
```
You should see output like:
```
<think>Beating estimates is a positive earnings surprise. A 12% YoY iPhone sales jump in the company's biggest product line points to demand strength. Both signals push the read positive.</think>
<answer>positive</answer>
```
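If you'd rather write the Modelfile yourself instead of downloading the shipped one, a minimal sketch looks like the following. The `SYSTEM` text mirrors the prompt used elsewhere in this card; the shipped Modelfiles may set different parameters:
```
FROM ./FinSenti-Qwen3-8B.Q4_K_M.gguf

SYSTEM """You are a financial sentiment analyst. For each headline you receive, write a short reasoning chain inside <think>...</think> tags, then give a single label inside <answer>...</answer> tags. The label must be exactly one of: positive, negative, neutral."""

PARAMETER temperature 0
```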
## Quick start (Python via llama-cpp-python)
```python
from llama_cpp import Llama
llm = Llama(
    model_path="./FinSenti-Qwen3-8B.Q4_K_M.gguf",
    n_ctx=2048,
    n_threads=8,
)

system = (
    "You are a financial sentiment analyst. For each headline you receive, "
    "write a short reasoning chain inside <think>...</think> tags, "
    "then give a single label inside <answer>...</answer> tags. The label "
    "must be exactly one of: positive, negative, neutral."
)

resp = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": system},
        {"role": "user", "content": "Apple beats Q4 estimates as iPhone sales jump 12% year over year."},
    ],
    max_tokens=256,
    temperature=0.0,
)
print(resp["choices"][0]["message"]["content"])
```
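llama-cpp-python can also stream tokens as they're generated, which is handy if you want to watch the reasoning chain appear live. A sketch reusing the `llm` and `system` objects from above (the headline is just sample input):
```python
# Stream the completion chunk by chunk instead of waiting for the full answer
stream = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": system},
        {"role": "user", "content": "Retailer cuts full-year outlook citing weak holiday demand."},
    ],
    max_tokens=256,
    temperature=0.0,
    stream=True,
)
for chunk in stream:
    delta = chunk["choices"][0]["delta"]
    if "content" in delta:
        print(delta["content"], end="", flush=True)
print()
```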
## Hardware
The Q4_K_M build is about 4.70 GB on disk and needs
roughly 6 GB of free RAM at runtime. On a modern laptop
CPU you should see 15-40 tokens per second depending on the quant and your
core count. Running it on a small GPU (Apple Silicon, or a
6-8 GB NVIDIA card) gets you considerably faster generation.
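Offloading layers is how you get that speedup. In llama.cpp it's the `-ngl` flag (requires a CUDA, Metal, or Vulkan build); in llama-cpp-python the equivalent is `n_gpu_layers=-1` in the `Llama(...)` constructor. For example:
```bash
# Offload all layers to the GPU; 99 is simply more than the model's layer count
./llama-cli -m FinSenti-Qwen3-8B.Q4_K_M.gguf -ngl 99 \
  -p "Apple beats Q4 estimates as iPhone sales jump 12% year over year." \
  -n 256
```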
If you need more headroom, the Q5_K_M and Q8_0 files are progressively
closer to the original bf16 quality at the cost of size.
## Picking a quant
- **Q4_K_M** (4.70 GB): the default for laptops
and small servers. Mild quality dip versus full precision but fits
almost anywhere.
- **Q5_K_M** (5.40 GB): a step up if you have
the RAM. Most people won't notice the difference from Q8.
- **Q8_0** (8.20 GB): closest to the bf16 weights.
Use this if you want the cleanest output and have the disk space.
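If you'd rather measure than guess, llama.cpp ships a `llama-bench` tool that reports prompt-processing and generation speed per file. A quick sketch over whichever quants you have on disk:
```bash
# Print a tokens/sec table for each downloaded quant
for f in FinSenti-Qwen3-8B.Q4_K_M.gguf FinSenti-Qwen3-8B.Q5_K_M.gguf FinSenti-Qwen3-8B.Q8_0.gguf; do
  ./llama-bench -m "$f"
done
```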
## Prompt format
Same as the base model. Use the system prompt verbatim, put the headline
or short snippet in the user turn, and parse the `<answer>...</answer>`
block for the label.
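A minimal way to do that parsing in Python (regex-based, with a conservative fallback in case the model ever emits a malformed label):
```python
import re

def parse_finsenti(text: str) -> tuple[str, str]:
    """Split a FinSenti completion into (reasoning, label)."""
    think = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    answer = re.search(r"<answer>(.*?)</answer>", text, re.DOTALL)
    reasoning = think.group(1).strip() if think else ""
    label = answer.group(1).strip().lower() if answer else ""
    if label not in {"positive", "negative", "neutral"}:
        label = "neutral"  # fall back rather than propagate a malformed label
    return reasoning, label
```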
## Limitations
These builds are a faithful conversion of the base model, so the same caveats apply:
- English only
- Short text only (training context was 2048 tokens)
- Three labels: positive, negative, neutral
- It explains its read but it isn't doing finance research; don't use the
reasoning chain as investment advice
Quantization adds a small extra error on top of the base model. For
Q4_K_M on a model this size you'll see occasional disagreement with the
bf16 model on borderline headlines, usually neutral-vs-positive flips.
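If you want to gauge that drift on your own headlines, a quick A/B harness along these lines works (a hypothetical sketch; it loads each quant in turn, so expect it to be slow and memory-hungry):
```python
import re
from llama_cpp import Llama

SYSTEM = (
    "You are a financial sentiment analyst. For each headline you receive, "
    "write a short reasoning chain inside <think>...</think> tags, "
    "then give a single label inside <answer>...</answer> tags. The label "
    "must be exactly one of: positive, negative, neutral."
)
HEADLINES = [
    "Apple beats Q4 estimates as iPhone sales jump 12% year over year.",
    "Company reiterates full-year guidance after a mixed quarter.",
]

def labels_for(model_path: str) -> list[str]:
    # Loading is the slow part; the model is freed when llm goes out of scope
    llm = Llama(model_path=model_path, n_ctx=2048, verbose=False)
    out = []
    for h in HEADLINES:
        resp = llm.create_chat_completion(
            messages=[{"role": "system", "content": SYSTEM},
                      {"role": "user", "content": h}],
            max_tokens=256,
            temperature=0.0,
        )
        text = resp["choices"][0]["message"]["content"]
        m = re.search(r"<answer>(.*?)</answer>", text, re.DOTALL)
        out.append(m.group(1).strip() if m else "unparsed")
    return out

q4 = labels_for("./FinSenti-Qwen3-8B.Q4_K_M.gguf")
q8 = labels_for("./FinSenti-Qwen3-8B.Q8_0.gguf")
for headline, a, b in zip(HEADLINES, q4, q8):
    if a != b:
        print(f"flip: Q4={a} Q8={b} :: {headline}")
```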
## Related FinSenti models
Other sizes and bases trained with the same recipe:
- **Qwen3**: [Qwen3-0.6B](https://huggingface.co/Ayansk11/FinSenti-Qwen3-0.6B), [Qwen3-1.7B](https://huggingface.co/Ayansk11/FinSenti-Qwen3-1.7B), [Qwen3-4B](https://huggingface.co/Ayansk11/FinSenti-Qwen3-4B)
- **Qwen3.5**: [Qwen3.5-0.8B](https://huggingface.co/Ayansk11/FinSenti-Qwen3.5-0.8B), [Qwen3.5-2B](https://huggingface.co/Ayansk11/FinSenti-Qwen3.5-2B), [Qwen3.5-4B](https://huggingface.co/Ayansk11/FinSenti-Qwen3.5-4B), [Qwen3.5-9B](https://huggingface.co/Ayansk11/FinSenti-Qwen3.5-9B)
- **DeepSeek**: [DeepSeek-R1-1.5B](https://huggingface.co/Ayansk11/FinSenti-DeepSeek-R1-1.5B)
- **MobileLLM**: [MobileLLM-R1-950M](https://huggingface.co/Ayansk11/FinSenti-MobileLLM-R1-950M)
- **Tiny-LLM**: [Tiny-LLM-10M](https://huggingface.co/Ayansk11/FinSenti-Tiny-LLM-10M)
- **Llama-3**: [Llama-3.2-1B](https://huggingface.co/Ayansk11/FinSenti-Llama-3.2-1B)
- **SmolLM**: [SmolLM-1.7B](https://huggingface.co/Ayansk11/FinSenti-SmolLM-1.7B)
The full-precision SafeTensors version of this model is at
[Ayansk11/FinSenti-Qwen3-8B](https://huggingface.co/Ayansk11/FinSenti-Qwen3-8B), and the
training data is at
[Ayansk11/FinSenti-Dataset](https://huggingface.co/datasets/Ayansk11/FinSenti-Dataset).
## Citation
```bibtex
@misc{shaikh2026finsenti,
title = {FinSenti: Small Language Models for Financial Sentiment with Chain-of-Thought Reasoning},
author = {Shaikh, Ayan},
year = {2026},
url = {https://huggingface.co/collections/Ayansk11/finsenti},
note = {Indiana University}
}
```
## License
Apache 2.0.