---
license: apache-2.0
base_model:
- ThaiLLM/ThaiLLM-8B
- Qwen/Qwen3-8B
- Qwen/Qwen3-8B-Base
pipeline_tag: text-generation
language:
- en
- th
tags:
- finance
- mergekit
- merge
---
# THaLLE-ThaiLLM: Domain-Specialized Small LLMs for Finance and Thai

## Model Overview

This 8B language model is developed as an extension of ThaiLLM-8B, with a focus on enhanced instruction following and financial knowledge. The model is constructed with [mergekit](https://github.com/arcee-ai/mergekit), which integrates ThaiLLM-8B with Qwen3-8B and THaLLE, the latter of which was trained on 80 CFA examination sets (an illustrative merge configuration is sketched after the feature list below).

**THaLLE-0.2-ThaiLLM-8B-fa** has the following features:

- **Supports switching between thinking and non-thinking modes**, similar to [Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B).
- **Offers enhanced Thai language understanding** from [ThaiLLM-8B](https://huggingface.co/ThaiLLM/ThaiLLM-8B).
- **Incorporates the financial knowledge and understanding** expected from THaLLE fine-tuning.

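The exact merge recipe is not published in this card. Purely as an illustration of how such a mergekit merge is specified, the sketch below assumes a `dare_ties` merge over the base model listed in this card's metadata; the method, densities, and weights are assumptions, not the actual recipe.

```yaml
# Hypothetical merge recipe -- the method, densities, and weights below are
# illustrative assumptions, not the configuration actually used.
merge_method: dare_ties
base_model: Qwen/Qwen3-8B-Base
models:
  - model: ThaiLLM/ThaiLLM-8B
    parameters:
      density: 0.6   # assumed fraction of delta weights kept by DARE
      weight: 0.5    # assumed mixing weight
  - model: Qwen/Qwen3-8B
    parameters:
      density: 0.6
      weight: 0.5
dtype: bfloat16
```

A config like this would be materialized with mergekit's `mergekit-yaml` CLI.
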
## Usage

### Requirements

Since `KBTG-Labs/THaLLE-0.2-ThaiLLM-8B-fa` is a fine-tune of Qwen3-8B, you will need `transformers>=4.51.0`.

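A minimal runtime check of that requirement (`packaging` is already a dependency of `transformers`):

```python
# Sanity-check the installed transformers version against the requirement.
import transformers
from packaging.version import Version

assert Version(transformers.__version__) >= Version("4.51.0"), (
    f"transformers {transformers.__version__} found; >=4.51.0 is required"
)
```
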
### Running with Transformers

The script below generates a response for the given input messages.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID: str = "KBTG-Labs/THaLLE-0.2-ThaiLLM-8B-fa"


def inference(messages: list[dict[str, str]], model, tokenizer) -> str:
    # Build the prompt from the chat template.
    text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
        enable_thinking=False,  # Set to True to switch to thinking mode.
    )
    model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

    # Greedy decoding: the sampling parameters are unset on purpose.
    generated_ids = model.generate(
        model_inputs.input_ids,
        max_new_tokens=768,
        do_sample=False,
        temperature=None,
        top_p=None,
        top_k=None,
        pad_token_id=tokenizer.eos_token_id,
    )
    # Strip the prompt tokens, keeping only the newly generated ones.
    generated_ids = [
        output_ids[len(input_ids):]
        for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
    ]

    response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
    return response


if __name__ == "__main__":
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.bfloat16,
        device_map="auto",
    )

    messages = [{"role": "user", "content": "สวัสดี!"}]
    print(inference(messages, model, tokenizer))
```

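The script above uses greedy decoding with thinking disabled. To try thinking mode, flip the template flag and enable sampling. The snippet below reuses `model`, `tokenizer`, and `messages` from the script above; the sampling values follow Qwen3's general recommendation for thinking mode and are not something this card prescribes.

```python
# Thinking-mode variant, reusing `model`, `tokenizer`, and `messages` from the
# script above. Sampling values follow Qwen3's general recommendation for
# thinking mode (temperature 0.6, top_p 0.95); assumed, not specified here.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,  # reasoning is emitted inside <think>...</think> tags
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
generated_ids = model.generate(
    model_inputs.input_ids,
    max_new_tokens=4096,  # leave room for the thinking trace
    do_sample=True,
    temperature=0.6,
    top_p=0.95,
)
new_tokens = generated_ids[0][model_inputs.input_ids.shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```
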
## Results

For more details, see our [Technical Report](https://arxiv.org/abs/2601.04597).

| Model                      | M3 Exam   | M6 Exam   | Flare CFA* | IC        |
| -------------------------- | --------- | --------- | ---------- | --------- |
| **Non-Thinking**           |           |           |            |           |
| `Qwen3-8B`                 | 0.660     | 0.545     | 0.753      | 0.640     |
| `ThaiLLM-8B-Instruct`**    | 0.707     | **0.623** | 0.762      | **0.720** |
| `THaLLE-0.2-ThaiLLM-8B-fa` | **0.725** | 0.572     | **0.771**  | **0.720** |
| **Thinking**               |           |           |            |           |
| `Qwen3-8B`                 | 0.706     | 0.590     | 0.806      | 0.600     |
| `ThaiLLM-8B-Instruct`**    | 0.720     | 0.661     | 0.820      | 0.720     |
| `THaLLE-0.2-ThaiLLM-8B-fa` | **0.779** | **0.678** | **0.852**  | **0.840** |

[*] Flare CFA is `"TheFinAI/flare-cfa"`

[**] `"ThaiLLM-8B-Instruct"` is [KBTG-Labs/ThaiLLM-8B-Instruct](https://huggingface.co/KBTG-Labs/ThaiLLM-8B-Instruct)

[vLLM](https://github.com/vllm-project/vllm) was used for the evaluations; results may vary under other inference setups.

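Since the reported numbers were produced with vLLM, a minimal offline-inference sketch follows for anyone approximating the setup; the dtype and sampling parameters here are assumptions, not the evaluation settings.

```python
# Minimal vLLM offline-inference sketch; dtype and sampling parameters are
# illustrative assumptions, not the settings used for the reported results.
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

MODEL_ID = "KBTG-Labs/THaLLE-0.2-ThaiLLM-8B-fa"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
messages = [{"role": "user", "content": "สวัสดี!"}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=False
)

llm = LLM(model=MODEL_ID, dtype="bfloat16")
outputs = llm.generate([prompt], SamplingParams(temperature=0.0, max_tokens=768))
print(outputs[0].outputs[0].text)
```
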
## Citation

If you find our work useful, please cite:

```
@misc{labs2026thallethaillmdomainspecializedsmallllms,
      title={THaLLE-ThaiLLM: Domain-Specialized Small LLMs for Finance and Thai -- Technical Report},
      author={KBTG Labs and : and Anuruth Lertpiya and Danupat Khamnuansin and Kantapong Sucharitpongpan and Pornchanan Balee and Tawunrat Chalothorn and Thadpong Pongthawornkamol and Monchai Lertsutthiwong},
      year={2026},
      eprint={2601.04597},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2601.04597},
}
```