---
license: apache-2.0
base_model:
- ThaiLLM/ThaiLLM-8B
- Qwen/Qwen3-8B
- Qwen/Qwen3-8B-Base
pipeline_tag: text-generation
language:
- en
- th
tags:
- finance
- mergekit
- merge
---

# THaLLE-ThaiLLM: Domain-Specialized Small LLMs for Finance and Thai

## Model Overview

This 8B language model extends ThaiLLM-8B, with a focus on enhancing instruction-following capabilities and financial knowledge. The model is constructed with [mergekit](https://github.com/arcee-ai/mergekit), which integrates ThaiLLM-8B with Qwen3-8B and THaLLE, the latter of which was trained on 80 CFA examination sets.

**THaLLE-0.2-ThaiLLM-8B-fa** has the following features:

- **Supports switching between thinking and non-thinking modes**, similar to [Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B).
- **Offers enhanced Thai language understanding** from [ThaiLLM-8B](https://huggingface.co/ThaiLLM/ThaiLLM-8B).
- **Incorporates the financial knowledge and understanding** expected of THaLLE fine-tuning.

## Usage

### Requirements

Since `KBTG-Labs/THaLLE-0.2-ThaiLLM-8B-fa` is fine-tuned from Qwen3-8B, you will need `transformers>=4.51.0`.

### Running using Transformers

Running the script below generates a response to the given input messages.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID: str = "KBTG-Labs/THaLLE-0.2-ThaiLLM-8B-fa"


def inference(messages: list[dict[str, str]], model, tokenizer) -> str:
    text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
        enable_thinking=False,  # Switch between thinking and non-thinking modes.
    )
    model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
    generated_ids = model.generate(
        model_inputs.input_ids,
        max_new_tokens=768,
        do_sample=False,  # Greedy decoding; sampling parameters are unset.
        temperature=None,
        top_p=None,
        top_k=None,
        pad_token_id=tokenizer.eos_token_id,
    )
    # Strip the prompt tokens, keeping only the newly generated tokens.
    generated_ids = [
        output_ids[len(input_ids):]
        for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
    ]
    response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
    return response


if __name__ == "__main__":
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.bfloat16,
        device_map="auto",
    )
    messages = [{"role": "user", "content": "สวัสดี!"}]
    print(inference(messages, model, tokenizer))
```

## Results

For more details, see our [Technical Report](https://arxiv.org/abs/2601.04597).

| Model                      | M3 Exam   | M6 Exam   | Flare CFA* | IC        |
| -------------------------- | --------- | --------- | ---------- | --------- |
| Non-Thinking               |           |           |            |           |
| `Qwen3-8B`                 | 0.660     | 0.545     | 0.753      | 0.640     |
| `ThaiLLM-8B-Instruct`**    | 0.707     | **0.623** | 0.762      | **0.720** |
| `THaLLE-0.2-ThaiLLM-8B-fa` | **0.725** | 0.572     | **0.771**  | **0.720** |
| Thinking                   |           |           |            |           |
| `Qwen3-8B`                 | 0.706     | 0.590     | 0.806      | 0.600     |
| `ThaiLLM-8B-Instruct`**    | 0.720     | 0.661     | 0.820      | 0.720     |
| `THaLLE-0.2-ThaiLLM-8B-fa` | **0.779** | **0.678** | **0.852**  | **0.840** |

[*] Flare CFA is `"TheFinAI/flare-cfa"`

[**] `"ThaiLLM-8B-Instruct"` is [KBTG-Labs/ThaiLLM-8B-Instruct](https://huggingface.co/KBTG-Labs/ThaiLLM-8B-Instruct)

[vLLM](https://github.com/vllm-project/vllm) was used for evaluations; results might vary.
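When `enable_thinking=True`, Qwen3-style models emit their reasoning inside `<think>…</think>` tags before the final answer. Below is a minimal sketch of a post-processing helper for separating the two; the `split_thinking` name and the plain string handling are our own assumptions, not part of the model's API (Qwen3's official examples locate the closing `</think>` token by token id instead).

```python
def split_thinking(response: str) -> tuple[str, str]:
    """Split a thinking-mode response into (reasoning, final answer).

    Assumes the Qwen3 convention of wrapping reasoning in <think>...</think>.
    In non-thinking mode the tags are absent, so the reasoning part is empty
    and the whole response is returned as the answer.
    """
    marker = "</think>"
    if marker in response:
        thinking, answer = response.split(marker, 1)
        return thinking.replace("<think>", "", 1).strip(), answer.strip()
    return "", response.strip()
```

For example, `split_thinking("<think>2+2=4</think>The answer is 4.")` returns `("2+2=4", "The answer is 4.")`, while a non-thinking response passes through unchanged.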
## Citation

If you find our work useful, please cite:

```
@misc{labs2026thallethaillmdomainspecializedsmallllms,
      title={THaLLE-ThaiLLM: Domain-Specialized Small LLMs for Finance and Thai -- Technical Report},
      author={KBTG Labs and : and Anuruth Lertpiya and Danupat Khamnuansin and Kantapong Sucharitpongpan and Pornchanan Balee and Tawunrat Chalothorn and Thadpong Pongthawornkamol and Monchai Lertsutthiwong},
      year={2026},
      eprint={2601.04597},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2601.04597},
}
```