---
license: llama3.2
language:
- en
- zh
base_model:
- meta-llama/Llama-3.2-3B
library_name: transformers
tags:
- Taiwan
- R.O.C
- zhtw
- SLM
- Llama-32
datasets:
- lianghsun/tw-reasoning-instruct
- minyichen/tw-instruct-R1-200k
- minyichen/tw_mm_R1
model-index:
- name: Llama-3.2-3B-F1-Instruct
  results:
  - task:
      type: question-answering
      name: Single Choice Question
    dataset:
      type: ikala/tmmluplus
      name: tmmlu+
      config: all
      split: test
      revision: c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
    metrics:
    - name: single choice
      type: accuracy
      value: 44.11
  - task:
      type: question-answering
      name: Single Choice Question
    dataset:
      type: cais/mmlu
      name: mmlu
      config: all
      split: test
      revision: c30699e
    metrics:
    - name: single choice
      type: accuracy
      value: 50.64
  - task:
      type: question-answering
      name: Single Choice Question
    dataset:
      type: lianghsun/tw-legal-benchmark-v1
      name: tw-legal-benchmark-v1
      config: all
      split: test
      revision: 66c3a5f
    metrics:
    - name: single choice
      type: accuracy
      value: 35.24
metrics:
- accuracy
---

# Model Card for Llama-3.2-3B-F1-Instruct (a.k.a __Formosa-1__ or __F1__)

<div align="center" style="line-height: 1;">
  <a href="https://discord.gg/Cx737yw4ed" target="_blank" style="margin: 2px;">
    <img alt="Discord" src="https://img.shields.io/badge/Discord-Twinkle%20AI-7289da?logo=discord&logoColor=white&color=7289da" style="display: inline-block; vertical-align: middle;"/>
  </a>
  <a href="https://huggingface.co/twinkle-ai" target="_blank" style="margin: 2px;">
    <img alt="Hugging Face" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Twinkle%20AI-ffc107?color=ffc107&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
  </a>
</div>

<div align="center" style="line-height: 1;">
  <a href="https://huggingface.co/meta-llama/Llama-3.2-1B/blob/main/LICENSE.txt" style="margin: 2px;">
    <img alt="License" src="https://img.shields.io/badge/License-llama3.2-f5de53?&color=0081fb" style="display: inline-block; vertical-align: middle;"/>
  </a>
</div>

<!-- Provide a quick summary of what the model is/does. -->
**Llama-3.2-3B-F1-Instruct** (a.k.a **Formosa-1** or **F1**) is a Traditional Chinese language model developed jointly by **[Twinkle AI](https://huggingface.co/twinkle-ai)** and **[APMIC](https://www.apmic.ai/)**, under the technical guidance of the [National Center for High-performance Computing (國家高速網路與計算中心)](https://www.nchc.org.tw/). It is fine-tuned for the language context and task requirements of Taiwan (R.O.C.), covers diverse scenarios such as law, education, and everyday applications, and is strengthened with strong instruction following as the goal.

## Model Details

### Model Description

<!-- Provide a longer summary of what this model is. -->

- **Developed by:** [Liang Hsun Huang](https://huggingface.co/lianghsun), [Min Yi Chen](https://huggingface.co/minyichen), [Wen Bin Lin](https://huggingface.co/tedslin), [Chao Chun Chuang](https://huggingface.co/c00cjz00) & [Dave Sung](https://huggingface.co/k1dave6412) (All authors have contributed equally to this work.)
- **Funded by:** [APMIC](https://www.apmic.ai/)
- **Model type:** LlamaForCausalLM
- **Language(s) (NLP):** Traditional Chinese & English
- **License:** [llama3.2](https://huggingface.co/meta-llama/Llama-3.2-1B/blob/main/LICENSE.txt)

### Model Sources

<!-- Provide the basic links for the model. -->

- **Repository:** [twinkle-ai/Llama-3.2-3B-F1-Instruct](https://huggingface.co/twinkle-ai/Llama-3.2-3B-F1-Instruct)
- **Paper:** (TBA)

## Evaluation

### Results

The table below was produced with the [🌟 Twinkle Eval](https://github.com/ai-twinkle/Eval) evaluation framework.

| Model | Eval mode | TMMLU+ (%) | Taiwan Legal (%) | MMLU (%) | Runs | Option order |
|------------------------------------|---------|----------------|----------------|----------------|---------|---------|
| [mistralai/Mistral-Small-24B-Instruct-2501](https://huggingface.co/mistralai/Mistral-Small-24B-Instruct-2501) | box | 56.15 (±0.0172) | 37.48 (±0.0098) | 74.61 (±0.0154) | 3 | Random |
| [meta-llama/Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct) | box | 15.49 (±0.0104) | 25.68 (±0.0200) | 6.90 (±0.0096) | 3 | Random |
| [meta-llama/Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct) | pattern | 35.85 (±0.0174) | 32.22 (±0.0023) | 59.33 (±0.0168) | 3 | Random |
| [MediaTek-Research/Llama-Breeze2-3B-Instruct](https://huggingface.co/MediaTek-Research/Llama-Breeze2-3B-Instruct) | pattern | 40.32 (±0.0181) | 38.92 (±0.0193) | 55.37 (±0.0180) | 3 | Random |
| 🌟[twinkle-ai/Llama-3.2-3B-F1-Instruct](https://huggingface.co/twinkle-ai/Llama-3.2-3B-F1-Instruct) (ours) | box | 44.11 (±0.0179) | 35.24 (±0.0119) | 50.64 (±0.0189) | 3 | Random |

### Function Calling Benchmark

We use **BFCL (Berkeley Function Calling Leaderboard)** to evaluate the model on function calling tasks.

The metric used in the test is:

- **AST Accuracy**:
  Compares the function call generated by the model with the target answer by their structural similarity at the abstract syntax tree (AST) level. It covers four question types:
  - Simple Function
  - Multiple Function
  - Parallel Function
  - Parallel Multiple Function

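
The idea behind AST-based matching can be illustrated with a minimal sketch. This is not BFCL's actual checker, which is more elaborate; the helper names `parse_call` and `ast_match` are hypothetical, and the sketch assumes calls are written in Python syntax with keyword arguments only.

```python
import ast


def parse_call(call_src: str) -> tuple[str, dict]:
    """Parse a call such as get_weather(location='Taipei') into (name, kwargs)."""
    node = ast.parse(call_src, mode="eval").body
    if not isinstance(node, ast.Call):
        raise ValueError("not a function call")
    # ast.unparse also handles dotted names such as module.func
    name = ast.unparse(node.func)
    # literal_eval only accepts literals, so no generated code is executed
    kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
    return name, kwargs


def ast_match(generated: str, expected: str) -> bool:
    """True when both calls share the same function name and keyword arguments."""
    try:
        return parse_call(generated) == parse_call(expected)
    except (SyntaxError, ValueError):
        # Unparseable or non-literal output counts as a miss
        return False


print(ast_match(
    "get_weather(location='Taipei', unit='celsius')",
    'get_weather(unit="celsius", location="Taipei")',
))
```

Because the comparison happens on the parsed structure rather than on the raw string, differences in quoting or argument order do not count as errors, while a wrong function name or argument value does.
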
| Model | Overall Accuracy | AST Accuracy (S.) | AST Accuracy (M.) | AST Accuracy (P.) | AST Accuracy (P.M.) |
|-----------------|------------------|-------------------|-------------------|-------------------|---------------------|
| [meta-llama/Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct) | 84 | 92 | 92 | 80 | 74 |
| [MediaTek-Research/Llama-Breeze2-3B-Instruct](https://huggingface.co/MediaTek-Research/Llama-Breeze2-3B-Instruct) | 85 | 92 | 92 | 84 | 81 |
| [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) | 57 | 56 | 54 | 49 | 35 |
| [MediaTek-Research/Llama-Breeze2-8B-Instruct](https://huggingface.co/MediaTek-Research/Llama-Breeze2-8B-Instruct) | 87 | 91 | 93 | 86 | 81 |
| GPT-4o-mini (2024-07-18) | 87 | 91 | 93 | 90 | 84 |
| 🌟[twinkle-ai/Llama-3.2-3B-F1-Instruct](https://huggingface.co/twinkle-ai/Llama-3.2-3B-F1-Instruct) (ours) | **91** | 93 | 95 | 91 | 87 |

_**Note:** Some of the figures are taken from the [Breeze paper](https://arxiv.org/pdf/2501.13921)._

---

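## 🔧 Tool Calling

This model was trained with the Hermes tool-calling format and supports parallel calling. The steps below walk through a complete example.
The tool-call template is already built into the chat template, so no extra prompt setup is needed. Enjoy it!
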
### 1️⃣ Start the vLLM backend

```bash
vllm serve twinkle-ai/Llama-3.2-3B-F1-Instruct \
  --port 8001 \
  --enable-auto-tool-choice \
  --tool-call-parser hermes
```

### 2️⃣ Define the tools (Functions)

```python
def get_weather(location: str, unit: str):
    # Mock implementation: always returns the same canned weather report.
    return f"{location}的氣溫是{unit}26度,晴朗無風"


def search(query: str):
    # Mock implementation: always returns the same canned news text, whatever the query.
    return "川普終於宣布對等關稅政策,針對 18 個經濟體課徵一半的對等關稅,並從 4/5 起對所有進口產品徵收10%的基準關稅!美國將針對被認定為不當貿易行為(不公平貿易) 的國家,於 4/9 起課徵報復型對等關稅 (Discounted Reciprocal Tariff),例如:日本將被課徵 24% 的關稅,歐盟則為 20%,以取代普遍性的 10% 關稅。\n針對中國則開啟新一波 34% 關稅,並疊加於先前已實施的關稅上,這將使中國進口商品的基本關稅稅率達到 54%,而且這尚未包含拜登總統任內或川普第一任期所施加的額外關稅。加拿大與墨西哥則不適用這套對等關稅制度,但川普認為這些國家在芬太尼危機與非法移民問題尚未完全解決,因此計畫對這兩國的大多數進口商品施加 25% 關稅。另外原本針對汽車與多數其他商品的關稅豁免將於 4/2 到期。\n台灣的部分,美國擬向台灣課徵32%的對等關稅,雖然並未針對晶片特別課徵關稅,但仍在記者會中提到台灣搶奪所有的電腦與半導體晶片,最終促成台積電對美國投資計劃額外加碼 1,000 億美元的歷史性投資;歐盟則課徵20%的對等關稅。最後是汽車關稅將於 4/2 起,對所有外國製造的汽車課徵25% 關稅。"


# Tool schemas in the OpenAI function-calling format; the descriptions are sent to the model as-is.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather in a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string", "description": "國家或城市名, e.g., 'Taipei'、'Jaipei'"},
                    "unit": {"type": "string", "description": "氣溫單位,亞洲城市使用攝氏;歐美城市使用華氏", "enum": ["celsius", "fahrenheit"]}
                },
                "required": ["location", "unit"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "search",
            "description": "這是一個類似 Google 的搜尋引擎,關於知識、天氣、股票、電影、小說、百科等等問題,如果你不確定答案就搜尋一下。",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "should be a search query, e.g., '2024 南韓 戒嚴'"}
                },
                "required": ["query"]
            }
        }
    }
]
```

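
Before executing a tool, it can be useful to check the model's arguments against the declared schema. The following is a minimal sketch under stated assumptions: `validate_arguments` is a hypothetical helper, it only checks `required` and `enum`, and `weather_tool` is a trimmed copy of the `get_weather` schema so that the block runs on its own. A full JSON Schema validator would cover far more.

```python
def validate_arguments(tool: dict, arguments: dict) -> list[str]:
    """Return a list of problems; an empty list means the arguments look valid."""
    params = tool["function"]["parameters"]
    properties = params.get("properties", {})
    problems = []
    # Every name listed under "required" must be present
    for name in params.get("required", []):
        if name not in arguments:
            problems.append(f"missing required argument: {name}")
    # Every supplied argument must be declared and respect its enum, if any
    for name, value in arguments.items():
        spec = properties.get(name)
        if spec is None:
            problems.append(f"unexpected argument: {name}")
        elif "enum" in spec and value not in spec["enum"]:
            problems.append(f"invalid value for {name}: {value!r}")
    return problems


# Trimmed copy of the get_weather schema, for illustration only
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["location", "unit"],
        },
    },
}

print(validate_arguments(weather_tool, {"location": "Taipei", "unit": "kelvin"}))
```
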
### 3️⃣ Run the tool calls

> **⚠️ Note: the system prompt can be omitted unless a tool needs a reference date.**
```python
from openai import OpenAI

# Assumption: the vLLM server from step 1 is running locally on port 8001.
# The api_key value is a placeholder; adjust it if your server enforces one.
client = OpenAI(base_url="http://localhost:8001/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model=client.models.list().data[0].id,
    messages=[
        {"role": "system", "content": "記住你的知識截止於 2024/12,今天是 2025/4/7"},
        {"role": "user", "content": "台北氣溫如何? 另外,告訴我川普最新關稅政策"},
    ],
    max_tokens=1500,
    temperature=0.6,
    top_p=0.95,
    tools=tools,
    tool_choice="auto",
    extra_body={"skip_special_tokens": False}
)

print(response.choices[0].message.tool_calls)
```

#### ⚙️ Tool Calls List:

```text
[ChatCompletionMessageToolCall(id='chatcmpl-tool-35e74420119349999913a10133b84bd3', function=Function(arguments='{"location": "Taipei", "unit": "celsius"}', name='get_weather'), type='function'), ChatCompletionMessageToolCall(id='chatcmpl-tool-7ffdcb98e59f4134a6171defe7f2e31b', function=Function(arguments='{"query": "Donald Trump latest tariffs policy"}', name='search'), type='function')]
```

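
With `--tool-call-parser hermes`, the server turns the model's raw text into the structured list above. As a rough illustration of what that parsing involves, here is a minimal sketch that assumes the raw output wraps each call in `<tool_call>` tags containing a JSON object with `name` and `arguments`, as in the Hermes convention. `parse_hermes_tool_calls` is a hypothetical helper, not vLLM's implementation, and the `raw` string is made-up sample text.

```python
import json
import re

# One JSON object per <tool_call>...</tool_call> pair (assumed Hermes-style tags)
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)


def parse_hermes_tool_calls(text: str) -> list[dict]:
    """Extract every well-formed tool call found in the raw model output."""
    calls = []
    for match in TOOL_CALL_RE.finditer(text):
        try:
            payload = json.loads(match.group(1))
        except json.JSONDecodeError:
            continue  # skip malformed JSON instead of failing the whole response
        if "name" in payload:
            calls.append({"name": payload["name"], "arguments": payload.get("arguments", {})})
    return calls


# Made-up raw output containing two parallel calls
raw = (
    '<tool_call>\n{"name": "get_weather", "arguments": {"location": "Taipei", "unit": "celsius"}}\n</tool_call>\n'
    '<tool_call>\n{"name": "search", "arguments": {"query": "Donald Trump latest tariffs policy"}}\n</tool_call>'
)
print(parse_hermes_tool_calls(raw))
```

In normal use there is no need to do this by hand, since the OpenAI-compatible endpoint already returns `tool_calls`.
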
### 4️⃣ Generate the final answer

```python
import json

# Tool calls returned by the request in step 3.
tool_calls = response.choices[0].message.tool_calls

response = client.chat.completions.create(
    model=client.models.list().data[0].id,
    messages=[
        {"role": "system", "content": "記住你的知識截止於 2024/12,今天是 2025/4/7"},
        {"role": "user", "content": "台北氣溫如何? 另外,告訴我川普最新關稅政策"},
        {
            # Echo the assistant turn that requested the tools.
            "role": "assistant",
            "content": "",
            "tool_calls": [
                {
                    "id": tool_calls[0].id,
                    "type": "function",
                    "function": {
                        "name": tool_calls[0].function.name,
                        "arguments": tool_calls[0].function.arguments
                    }
                },
                {
                    "id": tool_calls[1].id,
                    "type": "function",
                    "function": {
                        "name": tool_calls[1].function.name,
                        "arguments": tool_calls[1].function.arguments
                    }
                }
            ]
        },
        {
            "role": "tool",
            # In the tool-call list shown above, index 0 is the get_weather call.
            "content": get_weather(**json.loads(tool_calls[0].function.arguments)),
            "tool_call_id": tool_calls[0].id  # required so each result is paired with its tool call
        },
        {
            "role": "tool",
            # Index 1 is the search call.
            "content": search(**json.loads(tool_calls[1].function.arguments)),
            "tool_call_id": tool_calls[1].id  # required so each result is paired with its tool call
        }
    ],
    max_tokens=1500,
    temperature=0.6,
    top_p=0.95,
    tools=tools,
    tool_choice="auto",
    extra_body={"skip_special_tokens": False}
)

print(response.choices[0].message.content)
```

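
Indexing the tool calls by position only works when the model returns them in a known order. A more general approach, sketched below, dispatches each call by its function name. `run_tool_calls` and the registry are hypothetical helpers, and the stand-in objects merely mimic the shape of the OpenAI client's tool calls so that the block runs without a server.

```python
import json
from types import SimpleNamespace


def run_tool_calls(tool_calls, registry):
    """Execute each tool call by name and build the matching role="tool" messages."""
    messages = []
    for call in tool_calls:
        func = registry.get(call.function.name)
        if func is None:
            # Report unknown tools back to the model instead of raising
            content = f"Unknown tool: {call.function.name}"
        else:
            content = func(**json.loads(call.function.arguments))
        messages.append({"role": "tool", "content": str(content), "tool_call_id": call.id})
    return messages


# Stand-in objects shaped like the client's tool calls (hypothetical data)
fake_calls = [
    SimpleNamespace(id="call-1", function=SimpleNamespace(name="echo", arguments='{"text": "hi"}')),
]
tool_messages = run_tool_calls(fake_calls, {"echo": lambda text: text.upper()})
print(tool_messages)
```

In the walkthrough above, the registry would be `{"get_weather": get_weather, "search": search}`, and the returned messages would be appended after the assistant turn that carries the `tool_calls`.
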
#### 📋 Final output:

```text
以下是您請求的資訊:

**臺北市氣溫**
- 目前的氣溫為 **26°C**(攝氏)
- 天候狀況:晴朗無風

**川普最新關稅政策概述**
1. **對等關稅政策**
- 對 18 個經濟體課徵 50% 的對等關稅
- 自 4 月 5 日起,所有進口產品全面徵收 10% 基本關稅

2. **報復型對等關稅**
- 日本 24%、歐盟 20%

3. **對中國的高額關稅**
- 增加至 54%(原有關稅 + 新增 34%)

4. **特殊案例**
- 加拿大與墨西哥不適用,但其他商品課徵 25%
- 汽車與部分商品的免稅即將到期

5. **對台灣的影響**
- 美國計畫對台灣課徵 32% 關稅,但晶片暫無額外課稅

6. **全球視角**
- 歐盟與日本關稅比例相對較高
```

## Citation

<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
```bibtex
@misc{twinkleai2025llama3.2f1,
  title        = {Llama-3.2-3B-F1-Instruct: A Traditional Chinese Instruction-Tuned Language Model for Taiwan},
  author       = {Huang, Liang Hsun and Chen, Min Yi and Lin, Wen Bin and Chuang, Chao Chun and Sung, Dave},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/twinkle-ai/Llama-3.2-3B-F1-Instruct}},
  note         = {Twinkle AI and APMIC. All authors contributed equally.}
}
```

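## Acknowledgements
- Special thanks to the [National Center for High-performance Computing (國家高速網路與計算中心)](https://www.nchc.org.tw/) for its guidance and to [APMIC](https://www.apmic.ai/) for the compute support that allowed this project to be completed smoothly.
- Our thanks also go to 黃啟聖 (teacher), 許武龍 (哈爸), 陳姿燁 (physics teacher at Taipei First Girls High School, 臺北市立第一女子高級中學), Howard (CTO of [奈視科技](https://nanoseex.com/)), [AIPLUX Technology](https://aiplux.com/), 郭家嘉 (teacher), and everyone who provided valuable help during the creation of the datasets.

## Model Card Authors

[Twinkle AI](https://huggingface.co/twinkle-ai)

## Model Card Contact

[Twinkle AI](https://huggingface.co/twinkle-ai)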