114 lines
4.4 KiB
Markdown
114 lines
4.4 KiB
Markdown
|
|
---
|
||
|
|
license: apache-2.0
|
||
|
|
library_name: transformers
|
||
|
|
datasets:
|
||
|
|
- rLLM/rLLM-FinQA-Dataset
|
||
|
|
language:
|
||
|
|
- en
|
||
|
|
base_model:
|
||
|
|
- Qwen/Qwen3-4B-Instruct-2507
|
||
|
|
pipeline_tag: text-generation
|
||
|
|
tags:
|
||
|
|
- finance
|
||
|
|
- tool-use
|
||
|
|
- agent
|
||
|
|
---
|
||
|
|
<div align="center">
|
||
|
|
<span style="font-family: default; font-size: 1.5em;">FinQA</span>
|
||
|
|
<div>
|
||
|
|
Training Financial Agents with Reinforcement Learning
|
||
|
|
</div>
|
||
|
|
</div>
|
||
|
|
<br>
|
||
|
|
<div align="center" style="line-height: 1;">
|
||
|
|
<a href="https://github.com/rllm-org/rllm" style="margin: 2px;">
|
||
|
|
<img alt="Code" src="https://img.shields.io/badge/FinQA-000000?style=for-the-badge&logo=github&logoColor=000&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
|
||
|
|
</a>
|
||
|
|
<a href="https://rllm-project.com/post.html?post=finqa.md" target="_blank" style="margin: 2px;">
|
||
|
|
<img alt="Blog" src="https://img.shields.io/badge/Blog-%23000000.svg?style=for-the-badge&logo=notion&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
|
||
|
|
</a>
|
||
|
|
<a href="https://x.com/rllm_project" style="margin: 2px;">
|
||
|
|
<img alt="X.ai" src="https://img.shields.io/badge/rLLM-white?style=for-the-badge&logo=X&logoColor=000&color=000&labelColor=white" style="display: inline-block; vertical-align: middle;"/>
|
||
|
|
</a>
|
||
|
|
<a href="https://huggingface.co/rLLM" style="margin: 2px;">
|
||
|
|
<img alt="Hugging Face" src="https://img.shields.io/badge/rLLM-fcd022?style=for-the-badge&logo=huggingface&logoColor=000&labelColor" style="display: inline-block; vertical-align: middle;"/>
|
||
|
|
</a>
|
||
|
|
</div>
|
||
|
|
</div>
|
||
|
|
</div>
|
||
|
|
|
||
|
|
## FinQA Overview
|
||
|
|
|
||
|
|
FinQA is a financial question-answering agent fine-tuned from Qwen3-4B-Instruct-2507 using reinforcement learning (RL). The model answers questions about SEC 10-K financial statements using specialized tools (SQL queries, table lookup, calculators), achieving 59.70% accuracy on Snorkel Finance Benchmark and 26.6% on Snorkel Finance Reasoning.
|
||
|
|
|
||
|
|
## Data
|
||
|
|
|
||
|
|
Our training dataset is built from SEC 10-K filings and consists of 5,110 question-answer pairs across:
|
||
|
|
- **207 companies** spanning multiple sectors
|
||
|
|
- **6,923 financial tables** extracted from 10-K filings
|
||
|
|
- **Single-table questions**: Direct lookups and calculations from individual tables
|
||
|
|
- **Multi-table questions**: Cross-table reasoning requiring data from multiple sources
|
||
|
|
|
||
|
|
The dataset is available on [HuggingFace](https://huggingface.co/datasets/rLLM/rLLM-FinQA-Dataset).
|
||
|
|
|
||
|
|
## Tools
|
||
|
|
|
||
|
|
The agent uses 4 specialized tools for financial analysis:
|
||
|
|
|
||
|
|
| Tool | Description |
|
||
|
|
|------|-------------|
|
||
|
|
| `get_table_names` | List available tables for a given company |
|
||
|
|
| `get_table_info` | Get table metadata, columns, dtypes, and sample values |
|
||
|
|
| `sql_query` | Execute SQL queries on financial tables (SQLite) |
|
||
|
|
| `calculator` | Evaluate mathematical expressions |
|
||
|
|
|
||
|
|
## Training
|
||
|
|
|
||
|
|
We fine-tune Qwen3-4B-Instruct-2507 using GRPO with LLM-as-judge rewards for correctness evaluation. A more detailed description of the training recipe can be found in our [documentation](https://rllm-project.readthedocs.io/en/latest/projects/finqa/).
|
||
|
|
|
||
|
|
## Evaluation
|
||
|
|
|
||
|
|
| Model | FinQA | FinQA Reasoning |
|
||
|
|
|-------|-------|-----------------|
|
||
|
|
| Qwen3-4B-Instruct-2507 (Base) | 27.90% | 13.90% |
|
||
|
|
| gpt-5-nano-2025-08-07 | 50.00% | 26.60% |
|
||
|
|
| Qwen3-235B-A22B | 51.37% | 18.90% |
|
||
|
|
| **rLLM-FinQA-4B (Ours)** | **59.70%** | **26.60%** |
|
||
|
|
| Gemini-2.5-Pro-Preview | 60.60% | 34.60% |
|
||
|
|
| GPT-4.1-2025-04-14 | 62.70% | 37.90% |
|
||
|
|
| o3-mini-2025-01-31 | 63.79% | 30.37% |
|
||
|
|
|
||
|
|
|
||
|
|
## Serving FinQA
|
||
|
|
|
||
|
|
Start a vLLM server and run the agent:
|
||
|
|
|
||
|
|
```bash
|
||
|
|
python -m vllm.entrypoints.openai.api_server \
|
||
|
|
--model rLLM/rLLM-FinQA-4B \
|
||
|
|
--host 0.0.0.0 \
|
||
|
|
--port 30000 \
|
||
|
|
--dtype bfloat16
|
||
|
|
|
||
|
|
python -m projects.finqa.run_finqa
|
||
|
|
```
|
||
|
|
|
||
|
|
For detailed setup instructions, see the [project README](https://github.com/rllm-org/rllm/tree/main/projects/finqa).
|
||
|
|
|
||
|
|
## Acknowledgement
|
||
|
|
|
||
|
|
- This is a joint collaboration between the [rLLM](https://github.com/rllm-org/rllm) team at UC Berkeley and [Snorkel AI](https://snorkel.ai/).
|
||
|
|
- Our model is trained on top of [`Qwen3-4B-Instruct-2507`](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507).
|
||
|
|
- Our work is done as part of [Berkeley Sky Computing Lab](https://skycomputing.berkeley.edu/).
|
||
|
|
|
||
|
|
## Citation
|
||
|
|
|
||
|
|
```bibtex
|
||
|
|
@misc{rllm2026finqa,
|
||
|
|
title={FinQA: Training Financial Agents with Reinforcement Learning},
|
||
|
|
author={Manan Roongta and Sijun Tan and Bhavishya Pohani and Charles Dickens and Christopher Glaze},
|
||
|
|
year={2026},
|
||
|
|
howpublished={\url{https://rllm-project.com/post.html?post=finqa.md}},
|
||
|
|
note={Blog Post}
|
||
|
|
}
|
||
|
|
```
|