---
license: apache-2.0
language:
  - en
base_model: Ayansk11/FinSenti-DeepSeek-R1-1.5B
datasets:
  - Ayansk11/FinSenti-Dataset
pipeline_tag: text-generation
library_name: gguf
tags:
  - finance
  - financial-sentiment
  - chain-of-thought
  - reasoning
  - gguf
  - llama-cpp
  - ollama
  - quantized
  - finsenti
---
# FinSenti-DeepSeek-R1-1.5B - GGUF

GGUF builds of [FinSenti-DeepSeek-R1-1.5B](https://huggingface.co/Ayansk11/FinSenti-DeepSeek-R1-1.5B)
for use with [Ollama](https://ollama.com), [llama.cpp](https://github.com/ggerganov/llama.cpp),
LM Studio, KoboldCpp, and other GGUF-compatible runtimes.

This is the same model as the SafeTensors repo, just converted and
quantized so you can run it on a CPU or a small GPU without pulling in
PyTorch.

## Files in this repo

| File | Quant | Size | Notes |
|------|-------|------|-------|
| `FinSenti-DeepSeek-R1-1.5B.Q4_K_M.gguf` | Q4_K_M | 1.00 GB | Smallest, mild quality dip. Default pick for laptops. |
| `FinSenti-DeepSeek-R1-1.5B.Q5_K_M.gguf` | Q5_K_M | 1.16 GB | Balanced quality and size. |
| `FinSenti-DeepSeek-R1-1.5B.Q8_0.gguf` | Q8_0 | 1.70 GB | Closest to bf16, biggest file. |

If you're not sure which to pick: **start with Q4_K_M**. It's the smallest
file, it runs everywhere, and the quality drop versus the original bf16
weights is small for a model this size.

## Quick start (llama.cpp)

```bash
# Download the Q4_K_M file (or pick a different quant from the table above)
huggingface-cli download Ayansk11/FinSenti-DeepSeek-R1-1.5B-GGUF FinSenti-DeepSeek-R1-1.5B.Q4_K_M.gguf --local-dir .

# Run it
./llama-cli -m FinSenti-DeepSeek-R1-1.5B.Q4_K_M.gguf \
  --system-prompt "You are a financial sentiment analyst. For each headline you receive, write a short reasoning chain inside <reasoning>...</reasoning> tags, then give a single label inside <answer>...</answer> tags. The label must be exactly one of: positive, negative, neutral." \
  -p "Apple beats Q4 estimates as iPhone sales jump 12% year over year." \
  -n 256
```
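If you'd rather script the download from Python, the `huggingface_hub` library provides `hf_hub_download`, which fetches a single file from a repo and returns its local path. This is a minimal sketch that assumes `huggingface_hub` is installed. `gguf_filename` and `download` are hypothetical helpers that build the file names listed in the table above.

```python
"""Sketch: fetch one quant of this repo from Python instead of the CLI."""

REPO_ID = "Ayansk11/FinSenti-DeepSeek-R1-1.5B-GGUF"
QUANTS = ("Q4_K_M", "Q5_K_M", "Q8_0")  # the three GGUF files in this repo


def gguf_filename(quant: str) -> str:
    """Hypothetical helper: map a quant name to this repo's file name."""
    if quant not in QUANTS:
        raise ValueError(f"unknown quant {quant!r}; expected one of {QUANTS}")
    return f"FinSenti-DeepSeek-R1-1.5B.{quant}.gguf"


def download(quant: str = "Q4_K_M", local_dir: str = ".") -> str:
    """Download one GGUF file and return its local path."""
    # Imported lazily so gguf_filename() works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(
        repo_id=REPO_ID,
        filename=gguf_filename(quant),
        local_dir=local_dir,
    )
```

`download("Q5_K_M")` returns a path you can pass to `llama-cli -m` or to `Llama(model_path=...)`.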

## Quick start (Ollama)

This repo ships a `Modelfile` for each quant. To register the Q4_K_M build
under the name `finsenti-deepseek-r1-1-5b`:

```bash
huggingface-cli download Ayansk11/FinSenti-DeepSeek-R1-1.5B-GGUF \
  FinSenti-DeepSeek-R1-1.5B.Q4_K_M.gguf Modelfile.Q4_K_M --local-dir ./finsenti-tmp
cd finsenti-tmp
ollama create finsenti-deepseek-r1-1-5b -f Modelfile.Q4_K_M

# Then chat with it
ollama run finsenti-deepseek-r1-1-5b "Apple beats Q4 estimates as iPhone sales jump 12% year over year."
```

You should see output like:

```
<reasoning>
Beating estimates is a positive earnings surprise. A 12% YoY iPhone sales jump in the company's biggest product line points to demand strength. Both signals push the read positive.
</reasoning>
<answer>positive</answer>
```
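After `ollama create` registers the model, the Ollama server also makes it available through its local HTTP API, which listens on `http://localhost:11434` by default. The sketch below calls the `/api/generate` endpoint with streaming turned off and uses only the standard library. It assumes the repo's `Modelfile` already sets the system prompt. If yours does not, add a `system` field to the payload. `build_payload` and `classify` are hypothetical helper names.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL = "finsenti-deepseek-r1-1-5b"  # the name registered with `ollama create` above


def build_payload(headline: str, model: str = MODEL) -> bytes:
    """Encode a non-streaming /api/generate request body."""
    body = {
        "model": model,
        "prompt": headline,
        "stream": False,  # return one JSON object instead of a token stream
        "options": {"temperature": 0, "num_predict": 256},  # deterministic, capped output
    }
    return json.dumps(body).encode("utf-8")


def classify(headline: str, url: str = OLLAMA_URL) -> str:
    """Send one headline to a running Ollama server and return the raw model text."""
    req = urllib.request.Request(
        url,
        data=build_payload(headline),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.loads(resp.read())["response"]
```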

## Quick start (Python via llama-cpp-python)

```python
from llama_cpp import Llama

llm = Llama(
    model_path="./FinSenti-DeepSeek-R1-1.5B.Q4_K_M.gguf",
    n_ctx=2048,   # matches the 2048-token training context
    n_threads=8,  # set this to your number of physical CPU cores
)

system = (
    "You are a financial sentiment analyst. For each headline you receive, "
    "write a short reasoning chain inside <reasoning>...</reasoning> tags, "
    "then give a single label inside <answer>...</answer> tags. The label "
    "must be exactly one of: positive, negative, neutral."
)

resp = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": system},
        {"role": "user", "content": "Apple beats Q4 estimates as iPhone sales jump 12% year over year."},
    ],
    max_tokens=256,
    temperature=0.0,  # greedy decoding, so the same headline gets the same label
)
print(resp["choices"][0]["message"]["content"])
```
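If you have more than a handful of headlines, load the model once and reuse the same `Llama` object for every call, because loading the GGUF is the slow part. The sketch below is a simple batch loop. `classify_all` is a hypothetical helper. It accepts any object that has a `create_chat_completion` method, so you can test it without the model file. It also keeps each response's `finish_reason`, which is `"length"` when `max_tokens` cut the output short.

```python
from typing import Any, Dict, List

SYSTEM = (
    "You are a financial sentiment analyst. For each headline you receive, "
    "write a short reasoning chain inside <reasoning>...</reasoning> tags, "
    "then give a single label inside <answer>...</answer> tags. The label "
    "must be exactly one of: positive, negative, neutral."
)


def classify_all(llm: Any, headlines: List[str], max_tokens: int = 256) -> List[Dict[str, str]]:
    """Run each headline through an already-loaded model, one chat call per headline."""
    results = []
    for headline in headlines:
        resp = llm.create_chat_completion(
            messages=[
                {"role": "system", "content": SYSTEM},
                {"role": "user", "content": headline},
            ],
            max_tokens=max_tokens,
            temperature=0.0,
        )
        choice = resp["choices"][0]
        results.append({
            "headline": headline,
            "text": choice["message"]["content"],
            # "length" means the output was truncated by max_tokens.
            "finish_reason": choice.get("finish_reason") or "",
        })
    return results
```

With the `llm` object from the snippet above, `classify_all(llm, ["headline one", "headline two"])` returns one result dict per headline.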

## Hardware

The Q4_K_M build is about 1.00 GB on disk and needs
roughly 2 GB of free RAM at runtime. On a modern laptop
CPU you should see 15-40 tokens per second, depending on the chip
and your core count. Offloading it to a small GPU (Apple Silicon, or a
6-8 GB NVIDIA card) gives considerably faster generation.
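To see where runtime RAM goes, split it into two parts: the weights, which take up about the file size, and the KV cache, which grows with context length. The back-of-the-envelope sketch below makes assumptions that this card does not state. It assumes the base model has the same shape as DeepSeek-R1-Distill-Qwen-1.5B (28 layers, 2 KV heads, head dimension 128) and that the KV cache is stored in f16. It does not model compute buffers or runtime overhead.

```python
GIB = 1024 ** 3


def kv_cache_bytes(n_ctx: int, n_layers: int = 28, n_kv_heads: int = 2,
                   head_dim: int = 128, bytes_per_elem: int = 2) -> int:
    """K and V caches: 2 tensors x layers x positions x KV heads x head dim x bytes.

    Layer/head defaults are ASSUMED (DeepSeek-R1-Distill-Qwen-1.5B shape, f16 cache).
    """
    return 2 * n_layers * n_ctx * n_kv_heads * head_dim * bytes_per_elem


def estimate_gib(file_gb: float, n_ctx: int = 2048) -> float:
    """Weights (~file size) plus KV cache, in GiB; excludes compute buffers and app overhead."""
    weights = file_gb * 1e9  # assumes the table's GB figures are decimal gigabytes
    return (weights + kv_cache_bytes(n_ctx)) / GIB
```

Under these assumptions, the KV cache at the full 2048-token context is only 56 MiB. Most of the gap between the 1.00 GB file and the roughly 2 GB runtime figure would therefore come from compute buffers and the runtime itself.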

If you have RAM to spare, the Q5_K_M and Q8_0 files get progressively
closer to the original bf16 quality at the cost of size.

## Picking a quant

- **Q4_K_M** (1.00 GB): the default for laptops and small servers. Mild
  quality dip versus full precision, but it fits almost anywhere.
- **Q5_K_M** (1.16 GB): a step up if you have the RAM. Most people won't
  notice the difference from Q8_0.
- **Q8_0** (1.70 GB): closest to the bf16 weights. Use this if you want the
  cleanest output and have the disk space.
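You can express the rule of thumb above as a small helper: pick the largest quant whose file fits in your free RAM after reserving some headroom for the runtime and OS. `pick_quant` is a hypothetical helper. Its 1 GB default headroom is an illustrative assumption, not a measured number, though it matches the roughly 2 GB figure given for Q4_K_M.

```python
from typing import Optional

# File sizes in GB from the table in this README, largest first.
QUANT_SIZES_GB = {"Q8_0": 1.70, "Q5_K_M": 1.16, "Q4_K_M": 1.00}


def pick_quant(free_ram_gb: float, headroom_gb: float = 1.0) -> Optional[str]:
    """Return the highest-precision quant that fits, or None if even Q4_K_M does not.

    headroom_gb is an ASSUMED allowance for runtime buffers and the OS.
    """
    for quant, size_gb in QUANT_SIZES_GB.items():  # dicts preserve insertion order
        if size_gb + headroom_gb <= free_ram_gb:
            return quant
    return None
```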

## Run it on your phone

This model is small enough to run entirely on-device. The Q4_K_M build is
1.00 GB on disk and needs roughly 1.6 GB of free RAM
during inference, so it fits on most phones with 4 GB+ RAM (roughly any
Android flagship from 2020 onward, or iPhone 11 and newer).

### iOS

The easiest path is [PocketPal AI](https://apps.apple.com/app/id6502579498)
(free, App Store):

1. Install PocketPal AI from the App Store.
2. Open the app and go to **Models** -> **+** -> **Add from Hugging Face**.
3. Search for `Ayansk11/FinSenti-DeepSeek-R1-1.5B-GGUF` and select `FinSenti-DeepSeek-R1-1.5B.Q4_K_M.gguf`.
4. Tap download; the file is 1.00 GB.
5. Once downloaded, tap the model to load it. Open the chat tab.
6. Set the system prompt (gear icon) to:
   > You are a financial sentiment analyst. For each headline you receive,
   > write a short reasoning chain inside `<reasoning>...</reasoning>` tags,
   > then give a single label inside `<answer>...</answer>` tags. The label
   > must be exactly one of: positive, negative, neutral.
7. Send a headline like *"Apple beats Q4 estimates as iPhone sales jump 12% YoY"*
   and you'll get back the reasoning chain plus the label.

[LLMFarm](https://apps.apple.com/app/id6443968971) and
[Private LLM](https://privatellm.app/) work too if you already use them.

### Android

PocketPal AI is on
[Google Play](https://play.google.com/store/apps/details?id=com.pocketpalai)
as well, with the same flow as the iOS version.

If you'd rather avoid the Play Store,
[ChatterUI](https://github.com/Vali-98/ChatterUI) is a free, open-source
client. Install the APK from the GitHub Releases page, then add the model
from Hugging Face inside the app.

### Tips for phone usage

- **Keep max output tokens around 256.** A reasoning chain plus an answer
  rarely needs more than that.
- **Inference is fully offline** once the model is downloaded. No data
  leaves your phone.
- **Heat and battery:** one classification finishes in a few seconds, but
  running hundreds in a loop will warm the device up. Charge while batching.
- **Stick with Q4_K_M on phones.** The quality difference vs Q5/Q8 for
  sentiment labels is small, and the smaller file leaves more headroom for
  the OS.

## Prompt format

Same as the base model. Use the system prompt verbatim, put the headline
or short snippet in the user turn, and parse the `<answer>...</answer>`
block for the label.
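Here is a minimal parser for that format. It assumes the output follows the tags set out in the system prompt. It returns `None` when no valid `<answer>` block is found, for example when generation hit the token limit before closing the tag, so callers can retry or flag the headline. `parse_label` is a hypothetical helper, not part of any library.

```python
import re
from typing import Optional

LABELS = {"positive", "negative", "neutral"}
# Non-greedy, case-insensitive match on the contents of an answer block.
_ANSWER_RE = re.compile(r"<answer>\s*(.*?)\s*</answer>", re.IGNORECASE | re.DOTALL)


def parse_label(text: str) -> Optional[str]:
    """Return the label from the last <answer>...</answer> block, or None if missing or invalid."""
    matches = _ANSWER_RE.findall(text)
    if not matches:
        return None
    label = matches[-1].strip().lower()
    return label if label in LABELS else None
```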

## Limitations

GGUF is a faithful conversion of the base model, so the same caveats apply:

- English only
- Short text only (training context was 2048 tokens)
- Three labels: positive, negative, neutral
- It explains its read but it isn't doing finance research; don't use the
  reasoning chain as investment advice
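Training used a 2048-token context, so it can be worth checking longer inputs before you send them. The sketch below uses llama-cpp-python's `Llama.tokenize`, which takes bytes. It reserves 256 tokens for the reasoning and answer, matching the `max_tokens` used in the quick start. `fits_context` is a hypothetical helper that does not count chat-template tokens, so treat the result as an approximation and leave a little slack.

```python
from typing import Any

TRAIN_CTX = 2048  # training context length stated in this README


def fits_context(llm: Any, system: str, text: str, max_new_tokens: int = 256) -> bool:
    """Approximate check that system prompt + input + generation budget fit in TRAIN_CTX.

    Chat-template control tokens are not counted, so this slightly underestimates usage.
    """
    n_system = len(llm.tokenize(system.encode("utf-8")))
    n_text = len(llm.tokenize(text.encode("utf-8"), add_bos=False))
    return n_system + n_text + max_new_tokens <= TRAIN_CTX
```

With `llm` and `system` from the Python quick start, `fits_context(llm, system, snippet)` returns `False` for inputs that are likely too long.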

Quantization adds a small extra error on top of the base model. For
Q4_K_M on a model this size you'll see occasional disagreement with the
bf16 model on borderline headlines, usually neutral-vs-positive flips.
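To measure how much this matters on your own data, run the same headlines through two quants (for example Q4_K_M and Q8_0) and compare the parsed labels. The sketch below does that comparison. It only counts headlines where both runs produced a label. `compare_labels` is a hypothetical helper name.

```python
from collections import Counter
from typing import List, Optional, Tuple


def compare_labels(a: List[Optional[str]], b: List[Optional[str]]) -> Tuple[float, Counter]:
    """Agreement rate between two label lists, plus a count of each (a, b) disagreement."""
    if len(a) != len(b):
        raise ValueError("label lists must be the same length")
    # Skip headlines where either run failed to produce a label.
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        return 0.0, Counter()
    flips = Counter((x, y) for x, y in pairs if x != y)
    agree = len(pairs) - sum(flips.values())
    return agree / len(pairs), flips
```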

## Related FinSenti models

Other sizes and bases trained with the same recipe:

- **Qwen3**: [Qwen3-0.6B](https://huggingface.co/Ayansk11/FinSenti-Qwen3-0.6B), [Qwen3-1.7B](https://huggingface.co/Ayansk11/FinSenti-Qwen3-1.7B), [Qwen3-4B](https://huggingface.co/Ayansk11/FinSenti-Qwen3-4B), [Qwen3-8B](https://huggingface.co/Ayansk11/FinSenti-Qwen3-8B)
- **Qwen3.5**: [Qwen3.5-0.8B](https://huggingface.co/Ayansk11/FinSenti-Qwen3.5-0.8B), [Qwen3.5-2B](https://huggingface.co/Ayansk11/FinSenti-Qwen3.5-2B), [Qwen3.5-4B](https://huggingface.co/Ayansk11/FinSenti-Qwen3.5-4B), [Qwen3.5-9B](https://huggingface.co/Ayansk11/FinSenti-Qwen3.5-9B)

The full-precision SafeTensors version of this model is at
[Ayansk11/FinSenti-DeepSeek-R1-1.5B](https://huggingface.co/Ayansk11/FinSenti-DeepSeek-R1-1.5B), and the
training data is at
[Ayansk11/FinSenti-Dataset](https://huggingface.co/datasets/Ayansk11/FinSenti-Dataset).

## Citation

```bibtex
@misc{shaikh2026finsenti,
  title  = {FinSenti: Small Language Models for Financial Sentiment with Chain-of-Thought Reasoning},
  author = {Shaikh, Ayan},
  year   = {2026},
  url    = {https://huggingface.co/collections/Ayansk11/finsenti},
  note   = {Indiana University}
}
```

## License

Apache 2.0.