---
license: apache-2.0
language:
- en
base_model: Ayansk11/FinSenti-DeepSeek-R1-1.5B
datasets:
- Ayansk11/FinSenti-Dataset
pipeline_tag: text-generation
library_name: gguf
tags:
- finance
- financial-sentiment
- chain-of-thought
- reasoning
- gguf
- llama-cpp
- ollama
- quantized
- finsenti
---

# FinSenti-DeepSeek-R1-1.5B - GGUF

GGUF builds of [FinSenti-DeepSeek-R1-1.5B](https://huggingface.co/Ayansk11/FinSenti-DeepSeek-R1-1.5B) for use with [Ollama](https://ollama.com), [llama.cpp](https://github.com/ggerganov/llama.cpp), LM Studio, KoboldCpp, and other GGUF-compatible runtimes.

This is the same model as the SafeTensors repo, just converted and quantized so you can run it on a CPU or a small GPU without pulling in PyTorch.

## Files in this repo

| File | Quant | Size | Notes |
|------|-------|------|-------|
| `FinSenti-DeepSeek-R1-1.5B.Q4_K_M.gguf` | Q4_K_M | 1.00 GB | Smallest, mild quality dip. Default pick for laptops. |
| `FinSenti-DeepSeek-R1-1.5B.Q5_K_M.gguf` | Q5_K_M | 1.16 GB | Balanced quality and size. |
| `FinSenti-DeepSeek-R1-1.5B.Q8_0.gguf` | Q8_0 | 1.70 GB | Closest to bf16, biggest file. |

If you're not sure which to pick: **start with Q4_K_M**. It's the smallest file, it runs everywhere, and the quality drop versus the original bf16 weights is small for a model this size.

## Quick start (llama.cpp)

```bash
# Download the Q4_K_M file (or pick a different quant from the table above)
huggingface-cli download Ayansk11/FinSenti-DeepSeek-R1-1.5B-GGUF FinSenti-DeepSeek-R1-1.5B.Q4_K_M.gguf --local-dir .

# Run it (the --system-prompt flag needs a recent llama.cpp build)
./llama-cli -m FinSenti-DeepSeek-R1-1.5B.Q4_K_M.gguf \
  --system-prompt "You are a financial sentiment analyst. For each headline you receive, write a short reasoning chain inside ... tags, then give a single label inside ... tags. The label must be exactly one of: positive, negative, neutral." \
  -p "Apple beats Q4 estimates as iPhone sales jump 12% year over year." \
  -n 256
```

## Quick start (Ollama)

This repo ships a `Modelfile` for each quant.
To register the Q4_K_M build under the name `finsenti-deepseek-r1-1-5b`:

```bash
huggingface-cli download Ayansk11/FinSenti-DeepSeek-R1-1.5B-GGUF \
  FinSenti-DeepSeek-R1-1.5B.Q4_K_M.gguf Modelfile.Q4_K_M --local-dir ./finsenti-tmp
cd finsenti-tmp
ollama create finsenti-deepseek-r1-1-5b -f Modelfile.Q4_K_M

# Then chat with it
ollama run finsenti-deepseek-r1-1-5b "Apple beats Q4 estimates as iPhone sales jump 12% year over year."
```

You should see output like:

```
Beating estimates is a positive earnings surprise. A 12% YoY iPhone sales jump in the company's biggest product line points to demand strength. Both signals push the read positive.
positive
```

## Quick start (Python via llama-cpp-python)

```python
from llama_cpp import Llama

# Load the quantized model; adjust n_threads to your CPU core count.
llm = Llama(
    model_path="./FinSenti-DeepSeek-R1-1.5B.Q4_K_M.gguf",
    n_ctx=2048,
    n_threads=8,
)

system = (
    "You are a financial sentiment analyst. For each headline you receive, "
    "write a short reasoning chain inside ... tags, "
    "then give a single label inside ... tags. The label "
    "must be exactly one of: positive, negative, neutral."
)

# temperature=0.0 keeps the label deterministic for a given headline.
resp = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": system},
        {"role": "user", "content": "Apple beats Q4 estimates as iPhone sales jump 12% year over year."},
    ],
    max_tokens=256,
    temperature=0.0,
)
print(resp["choices"][0]["message"]["content"])
```

## Hardware

The Q4_K_M build is about 1.00 GB on disk and needs roughly 2 GB of free RAM at runtime. On a modern laptop CPU you should see 15-40 tokens per second depending on the quant you pick and your core count. Running it on a small GPU (Apple Silicon, a 6-8 GB NVIDIA card) gets you considerably faster generation.

If you want higher fidelity, the Q5_K_M and Q8_0 files are progressively closer to the original bf16 quality at the cost of size.

## Picking a quant

- **Q4_K_M** (1.00 GB): the default for laptops and small servers. Mild quality dip versus full precision but fits almost anywhere.
- **Q5_K_M** (1.16 GB): a step up if you have the RAM. Most people won't notice the difference from Q8.
- **Q8_0** (1.70 GB): closest to the bf16 weights. Use this if you want the cleanest output and have the disk space.

## Run it on your phone

This model is small enough to run entirely on-device. The Q4_K_M build is 1.00 GB on disk and needs roughly 1.6 GB of free RAM during inference, so it fits on most phones with 4 GB+ RAM (roughly any Android flagship from 2020 onward, or iPhone 11 and newer).

### iOS

The easiest path is [PocketPal AI](https://apps.apple.com/app/id6502579498) (free, App Store):

1. Install PocketPal AI from the App Store.
2. Open the app and go to **Models** -> **+** -> **Add from Hugging Face**.
3. Search for `Ayansk11/FinSenti-DeepSeek-R1-1.5B-GGUF` and select `FinSenti-DeepSeek-R1-1.5B.Q4_K_M.gguf`.
4. Tap download; the file is 1.00 GB.
5. Once downloaded, tap the model to load it. Open the chat tab.
6. Set the system prompt (gear icon) to:

   > You are a financial sentiment analyst. For each headline you receive,
   > write a short reasoning chain inside `...` tags,
   > then give a single label inside `...` tags. The label
   > must be exactly one of: positive, negative, neutral.

7. Send a headline like *"Apple beats Q4 estimates as iPhone sales jump 12% YoY"* and you'll get back the reasoning chain plus the label.

[LLMFarm](https://apps.apple.com/app/id6443968971) and [Private LLM](https://privatellm.app/) work too if you already use them.

### Android

PocketPal AI is on [Google Play](https://play.google.com/store/apps/details?id=com.pocketpalai) as well, with the same flow as the iOS version.

If you'd rather avoid the Play Store, [ChatterUI](https://github.com/Vali-98/ChatterUI) is a free, open-source client. Install the APK from the GitHub Releases page, then add the model from Hugging Face inside the app.

### Tips for phone usage

- **Keep max output tokens around 256.** A reasoning chain plus an answer rarely needs more than that.
- **Inference is fully offline** once the model is downloaded. No data leaves your phone.
- **Heat and battery:** one classification finishes in a few seconds, but running hundreds in a loop will warm the device up. Charge while batching.
- **Stick with Q4_K_M on phones.** The quality difference vs Q5/Q8 for sentiment labels is small, and the smaller file leaves more headroom for the OS.

## Prompt format

Same as the base model. Use the system prompt verbatim, put the headline or short snippet in the user turn, and parse the `...` block for the label.

## Limitations

These GGUF files are a faithful conversion of the base model, so the same caveats apply:

- English only
- Short text only (training context was 2048 tokens)
- Three labels: positive, negative, neutral
- It explains its read but it isn't doing finance research; don't use the reasoning chain as investment advice

Quantization adds a small extra error on top of the base model. For Q4_K_M on a model this size you'll see occasional disagreement with the bf16 model on borderline headlines, usually neutral-vs-positive flips.

## Related FinSenti models

Other sizes and bases trained with the same recipe:

- **Qwen3**: [Qwen3-0.6B](https://huggingface.co/Ayansk11/FinSenti-Qwen3-0.6B), [Qwen3-1.7B](https://huggingface.co/Ayansk11/FinSenti-Qwen3-1.7B), [Qwen3-4B](https://huggingface.co/Ayansk11/FinSenti-Qwen3-4B), [Qwen3-8B](https://huggingface.co/Ayansk11/FinSenti-Qwen3-8B)
- **Qwen3.5**: [Qwen3.5-0.8B](https://huggingface.co/Ayansk11/FinSenti-Qwen3.5-0.8B), [Qwen3.5-2B](https://huggingface.co/Ayansk11/FinSenti-Qwen3.5-2B), [Qwen3.5-4B](https://huggingface.co/Ayansk11/FinSenti-Qwen3.5-4B), [Qwen3.5-9B](https://huggingface.co/Ayansk11/FinSenti-Qwen3.5-9B)

The full-precision SafeTensors version of this model is at [Ayansk11/FinSenti-DeepSeek-R1-1.5B](https://huggingface.co/Ayansk11/FinSenti-DeepSeek-R1-1.5B), and the training data is at [Ayansk11/FinSenti-Dataset](https://huggingface.co/datasets/Ayansk11/FinSenti-Dataset).
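## Parsing the label (sketch)

The Prompt format section says to pull the label out of the model's reply. The snippet below is one rough way to do that in Python, not the author's parser. The helper name `extract_label` is hypothetical. It does not rely on the tag names; it assumes only that the reply ends with the label, so it returns the last whole-word occurrence of `positive`, `negative`, or `neutral`.

```python
import re
from typing import Optional

# The three labels the model is trained to emit.
_LABEL_RE = re.compile(r"\b(positive|negative|neutral)\b", re.IGNORECASE)


def extract_label(text: str) -> Optional[str]:
    """Return the last sentiment label found in `text`, lowercased, or None.

    Assumption: the reply is a reasoning chain followed by the label,
    so the final label word in the text is the model's answer.
    """
    matches = _LABEL_RE.findall(text)
    return matches[-1].lower() if matches else None


if __name__ == "__main__":
    reply = (
        "Beating estimates is a positive earnings surprise. "
        "Both signals push the read positive.\npositive"
    )
    print(extract_label(reply))  # prints: positive
```

If generation is cut off by the token limit before the label is emitted, this heuristic can return a label word from the reasoning text instead, so treat a truncated reply as unparsed. When you know the exact tags the model emits, a stricter tag-based parse is the safer choice.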
## Citation

```bibtex
@misc{shaikh2026finsenti,
  title  = {FinSenti: Small Language Models for Financial Sentiment with Chain-of-Thought Reasoning},
  author = {Shaikh, Ayan},
  year   = {2026},
  url    = {https://huggingface.co/collections/Ayansk11/finsenti},
  note   = {Indiana University}
}
```

## License

Apache 2.0.