Files
Llama-3.2-Taiwan-3B-Instruct/README.md
ModelHub XC 5197abcc1a 初始化项目,由ModelHub XC社区提供模型
Model: lianghsun/Llama-3.2-Taiwan-3B-Instruct
Source: Original Platform
2026-05-22 22:02:30 +08:00

48 KiB
Raw Blame History

license, language, base_model, datasets, pipeline_tag, library_name, tags, model-index, widget, metrics
license language base_model datasets pipeline_tag library_name tags model-index widget metrics
llama3.2
zh
en
it
de
fr
ja
ko
meta-llama/Llama-3.2-3B
lianghsun/Llama-3.2-Taiwan-3B
lianghsun/tw-emergency-medicine-bench
lianghsun/tw-legal-nlp
lianghsun/tw-legal-synthetic-qa
lianghsun/tw-law-article-qa
lianghsun/tw-judgment-qa
lianghsun/tw-judgment-gist-chat
lianghsun/tw-bar-examination-2020-chat
lianghsun/tw-structured-law-article
lianghsun/tw-judgment-gist-chat
lianghsun/tw-contract-review-chat
lianghsun/reasoning-base-20k-chat
lianghsun/vulnerability-mitigation-qa-zh_tw
lianghsun/tw-instruct
rombodawg/Everything_Instruct_Multilingual
xzuyn/manythings-translations-alpaca
neural-bridge/rag-dataset-12000
minyichen/glaive_toolcall_zh_tw
text-generation transformers
Taiwan
ROC
zh-tw
instruct
chat
llama3.2
SLM
name results
Llama-3.2-Taiwan-3B-Instruct
task dataset metrics
type name
text-generation Single Choice Question
type name
lianghsun/tw-legal-benchmark-v1 tw-legal-benchmark-v1
name type value
single choice accuracy 31.1
task dataset metrics
type name
text-generation Single Choice Question
type name config split revision
lianghsun/Formosa-bench (Society) Formosa Taiwan Knowledge Bench society test v2024.11.27
name type value
single choice accuracy 60.42
task dataset metrics
type name
text-generation Single Choice Question
type name config split revision
lianghsun/Formosa-bench (Governmnt) Formosa Taiwan Knowledge Bench governmnt test v2024.11.27
name type value
single choice accuracy 44.25
task dataset metrics
type name
text-generation Single Choice Question
type name config split revision
lianghsun/Formosa-bench (Geography) Formosa Taiwan Knowledge Bench geography test v2024.11.27
name type value
single choice accuracy 47.54
task dataset metrics
type name
text-generation Single Choice Question
type name config split revision
lianghsun/Formosa-bench (History) Formosa Taiwan Knowledge Bench history test v2024.11.27
name type value
single choice accuracy 60
task dataset metrics
type name
question-answering Single Choice Question
type name config split revision
ikala/tmmluplus (geography_of_taiwan) tmmlu++ geography_of_taiwan test c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
name type value
single choice accuracy 36.2
task dataset metrics
type name
question-answering Single Choice Question
type name config split revision
ikala/tmmluplus (dentistry) tmmlu++ dentistry test c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
name type value
single choice accuracy 33.83
task dataset metrics
type name
question-answering Single Choice Question
type name config split revision
ikala/tmmluplus (technical) tmmlu++ technical test c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
name type value
single choice accuracy 35.07
task dataset metrics
type name
question-answering Single Choice Question
type name config split revision
ikala/tmmluplus (statistics_and_machine_learning) tmmlu++ statistics_and_machine_learning test c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
name type value
single choice accuracy 28.57
task dataset metrics
type name
question-answering Single Choice Question
type name config split revision
ikala/tmmluplus (clinical_psychology) tmmlu++ clinical_psychology test c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
name type value
single choice accuracy 29.6
task dataset metrics
type name
question-answering Single Choice Question
type name config split revision
ikala/tmmluplus (tve_design) tmmlu++ tve_design test c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
name type value
single choice accuracy 38.54
task dataset metrics
type name
question-answering Single Choice Question
type name config split revision
ikala/tmmluplus (three_principles_of_people) tmmlu++ three_principles_of_people test c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
name type value
single choice accuracy 48.2
task dataset metrics
type name
question-answering Single Choice Question
type name config split revision
ikala/tmmluplus (introduction_to_law) tmmlu++ introduction_to_law test c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
name type value
single choice accuracy 29.96
task dataset metrics
type name
question-answering Single Choice Question
type name config split revision
ikala/tmmluplus (linear_algebra) tmmlu++ linear_algebra test c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
name type value
single choice accuracy 21.43
task dataset metrics
type name
question-answering Single Choice Question
type name config split revision
ikala/tmmluplus (agriculture) tmmlu++ agriculture test c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
name type value
single choice accuracy 24.5
task dataset metrics
type name
question-answering Single Choice Question
type name config split revision
ikala/tmmluplus (jce_humanities) tmmlu++ jce_humanities test c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
name type value
single choice accuracy 38.89
task dataset metrics
type name
question-answering Single Choice Question
type name config split revision
ikala/tmmluplus (music) tmmlu++ music test c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
name type value
single choice accuracy 25.9
task dataset metrics
type name
question-answering Single Choice Question
type name config split revision
ikala/tmmluplus (secondary_physics) tmmlu++ secondary_physics test c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
name type value
single choice accuracy 33.04
task dataset metrics
type name
question-answering Single Choice Question
type name config split revision
ikala/tmmluplus (physics) tmmlu++ physics test c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
name type value
single choice accuracy 27.84
task dataset metrics
type name
question-answering Single Choice Question
type name config split revision
ikala/tmmluplus (advance_chemistry) tmmlu++ advance_chemistry test c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
name type value
single choice accuracy 27.64
task dataset metrics
type name
question-answering Single Choice Question
type name config split revision
ikala/tmmluplus (junior_science_exam) tmmlu++ junior_science_exam test c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
name type value
single choice accuracy 30.05
task dataset metrics
type name
question-answering Single Choice Question
type name config split revision
ikala/tmmluplus (veterinary_pathology) tmmlu++ veterinary_pathology test c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
name type value
single choice accuracy 25.09
task dataset metrics
type name
question-answering Single Choice Question
type name config split revision
ikala/tmmluplus (financial_analysis) tmmlu++ financial_analysis test c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
name type value
single choice accuracy 25.13
task dataset metrics
type name
question-answering Single Choice Question
type name config split revision
ikala/tmmluplus (national_protection) tmmlu++ national_protection test c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
name type value
single choice accuracy 42.65
task dataset metrics
type name
question-answering Single Choice Question
type name config split revision
ikala/tmmluplus (macroeconomics) tmmlu++ macroeconomics test c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
name type value
single choice accuracy 26.76
task dataset metrics
type name
question-answering Single Choice Question
type name config split revision
ikala/tmmluplus (politic_science) tmmlu++ politic_science test c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
name type value
single choice accuracy 27.44
task dataset metrics
type name
question-answering Single Choice Question
type name config split revision
ikala/tmmluplus (ttqav2) tmmlu++ ttqav2 test c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
name type value
single choice accuracy 61.06
task dataset metrics
type name
question-answering Single Choice Question
type name config split revision
ikala/tmmluplus (junior_chinese_exam) tmmlu++ junior_chinese_exam test c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
name type value
single choice accuracy 30.86
task dataset metrics
type name
question-answering Single Choice Question
type name config split revision
ikala/tmmluplus (traditional_chinese_medicine_clinical_medicine) tmmlu++ traditional_chinese_medicine_clinical_medicine test c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
name type value
single choice accuracy 25.9
task dataset metrics
type name
question-answering Single Choice Question
type name config split revision
ikala/tmmluplus (junior_math_exam) tmmlu++ junior_math_exam test c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
name type value
single choice accuracy 21.71
task dataset metrics
type name
question-answering Single Choice Question
type name config split revision
ikala/tmmluplus (auditing) tmmlu++ auditing test c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
name type value
single choice accuracy 21.82
task dataset metrics
type name
question-answering Single Choice Question
type name config split revision
ikala/tmmluplus (anti_money_laundering) tmmlu++ anti_money_laundering test c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
name type value
single choice accuracy 37.31
task dataset metrics
type name
question-answering Single Choice Question
type name config split revision
ikala/tmmluplus (pharmacology) tmmlu++ pharmacology test c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
name type value
single choice accuracy 30.68
task dataset metrics
type name
question-answering Single Choice Question
type name config split revision
ikala/tmmluplus (trust_practice) tmmlu++ trust_practice test c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
name type value
single choice accuracy 28.18
task dataset metrics
type name
question-answering Single Choice Question
type name config split revision
ikala/tmmluplus (tve_mathematics) tmmlu++ tve_mathematics test c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
name type value
single choice accuracy 18.67
task dataset metrics
type name
question-answering Single Choice Question
type name config split revision
ikala/tmmluplus (human_behavior) tmmlu++ human_behavior test c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
name type value
single choice accuracy 32.04
task dataset metrics
type name
question-answering Single Choice Question
type name config split revision
ikala/tmmluplus (pharmacy) tmmlu++ pharmacy test c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
name type value
single choice accuracy 22.76
task dataset metrics
type name
question-answering Single Choice Question
type name config split revision
ikala/tmmluplus (tve_chinese_language) tmmlu++ tve_chinese_language test c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
name type value
single choice accuracy 36.65
task dataset metrics
type name
question-answering Single Choice Question
type name config split revision
ikala/tmmluplus (optometry) tmmlu++ optometry test c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
name type value
single choice accuracy 25.11
task dataset metrics
type name
question-answering Single Choice Question
type name config split revision
ikala/tmmluplus (physical_education) tmmlu++ physical_education test c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
name type value
single choice accuracy 30.73
task dataset metrics
type name
question-answering Single Choice Question
type name config split revision
ikala/tmmluplus (organic_chemistry) tmmlu++ organic_chemistry test c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
name type value
single choice accuracy 35.78
task dataset metrics
type name
question-answering Single Choice Question
type name config split revision
ikala/tmmluplus (tve_natural_sciences) tmmlu++ tve_natural_sciences test c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
name type value
single choice accuracy 33.73
task dataset metrics
type name
question-answering Single Choice Question
type name config split revision
ikala/tmmluplus (education) tmmlu++ education test c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
name type value
single choice accuracy 37.9
task dataset metrics
type name
question-answering Single Choice Question
type name config split revision
ikala/tmmluplus (mechanical) tmmlu++ mechanical test c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
name type value
single choice accuracy 42.37
task dataset metrics
type name
question-answering Single Choice Question
type name config split revision
ikala/tmmluplus (taiwanese_hokkien) tmmlu++ taiwanese_hokkien test c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
name type value
single choice accuracy 14.73
task dataset metrics
type name
question-answering Single Choice Question
type name config split revision
ikala/tmmluplus (nautical_science) tmmlu++ nautical_science test c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
name type value
single choice accuracy 30.49
task dataset metrics
type name
question-answering Single Choice Question
type name config split revision
ikala/tmmluplus (business_management) tmmlu++ business_management test c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
name type value
single choice accuracy 39.57
task dataset metrics
type name
question-answering Single Choice Question
type name config split revision
ikala/tmmluplus (logic_reasoning) tmmlu++ logic_reasoning test c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
name type value
single choice accuracy 27.34
task dataset metrics
type name
question-answering Single Choice Question
type name config split revision
ikala/tmmluplus (marketing_management) tmmlu++ marketing_management test c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
name type value
single choice accuracy 39.78
task dataset metrics
type name
question-answering Single Choice Question
type name config split revision
ikala/tmmluplus (economics) tmmlu++ economics test c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
name type value
single choice accuracy 25.95
task dataset metrics
type name
question-answering Single Choice Question
type name config split revision
ikala/tmmluplus (basic_medical_science) tmmlu++ basic_medical_science test c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
name type value
single choice accuracy 28.41
task dataset metrics
type name
question-answering Single Choice Question
type name config split revision
ikala/tmmluplus (occupational_therapy_for_psychological_disorders) tmmlu++ occupational_therapy_for_psychological_disorders test c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
name type value
single choice accuracy 35.73
task dataset metrics
type name
question-answering Single Choice Question
type name config split revision
ikala/tmmluplus (general_principles_of_law) tmmlu++ general_principles_of_law test c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
name type value
single choice accuracy 31.13
task dataset metrics
type name
question-answering Single Choice Question
type name config split revision
ikala/tmmluplus (junior_chemistry) tmmlu++ junior_chemistry test c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
name type value
single choice accuracy 24.88
task dataset metrics
type name
question-answering Single Choice Question
type name config split revision
ikala/tmmluplus (veterinary_pharmacology) tmmlu++ veterinary_pharmacology test c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
name type value
single choice accuracy 36.3
task dataset metrics
type name
question-answering Single Choice Question
type name config split revision
ikala/tmmluplus (educational_psychology) tmmlu++ educational_psychology test c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
name type value
single choice accuracy 33.52
task dataset metrics
type name
question-answering Single Choice Question
type name config split revision
ikala/tmmluplus (finance_banking) tmmlu++ finance_banking test c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
name type value
single choice accuracy 32.59
task dataset metrics
type name
question-answering Single Choice Question
type name config split revision
ikala/tmmluplus (official_document_management) tmmlu++ official_document_management test c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
name type value
single choice accuracy 32.43
task dataset metrics
type name
question-answering Single Choice Question
type name config split revision
ikala/tmmluplus (fire_science) tmmlu++ fire_science test c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
name type value
single choice accuracy 30.65
task dataset metrics
type name
question-answering Single Choice Question
type name config split revision
ikala/tmmluplus (junior_social_studies) tmmlu++ junior_social_studies test c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
name type value
single choice accuracy 47.62
task dataset metrics
type name
question-answering Single Choice Question
type name config split revision
ikala/tmmluplus (accounting) tmmlu++ accounting test c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
name type value
single choice accuracy 20.94
task dataset metrics
type name
question-answering Single Choice Question
type name config split revision
ikala/tmmluplus (engineering_math) tmmlu++ engineering_math test c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
name type value
single choice accuracy 27.18
task dataset metrics
type name
question-answering Single Choice Question
type name config split revision
ikala/tmmluplus (education_(profession_level)) tmmlu++ education_(profession_level) test c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
name type value
single choice accuracy 24.07
task dataset metrics
type name
question-answering Single Choice Question
type name config split revision
ikala/tmmluplus (chinese_language_and_literature) tmmlu++ chinese_language_and_literature test c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
name type value
single choice accuracy 27.64
task dataset metrics
type name
question-answering Single Choice Question
type name config split revision
ikala/tmmluplus (management_accounting) tmmlu++ management_accounting test c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
name type value
single choice accuracy 24.19
task dataset metrics
type name
question-answering Single Choice Question
type name config split revision
ikala/tmmluplus (culinary_skills) tmmlu++ culinary_skills test c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
name type value
single choice accuracy 39.38
task dataset metrics
type name
question-answering Single Choice Question
type name config split revision
ikala/tmmluplus (administrative_law) tmmlu++ administrative_law test c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
name type value
single choice accuracy 25.71
task dataset metrics
type name
question-answering Single Choice Question
type name config split revision
ikala/tmmluplus (insurance_studies) tmmlu++ insurance_studies test c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
name type value
single choice accuracy 33.42
task dataset metrics
type name
question-answering Single Choice Question
type name config split revision
ikala/tmmluplus (real_estate) tmmlu++ real_estate test c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
name type value
single choice accuracy 22.83
task dataset metrics
type name
question-answering Single Choice Question
type name config split revision
ikala/tmmluplus (computer_science) tmmlu++ computer_science test c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
name type value
single choice accuracy 31.61
task dataset metrics
type name
question-answering Single Choice Question
type name config split revision
ikala/tmmluplus (taxation) tmmlu++ taxation test c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
name type value
single choice accuracy 27.47
task dataset metrics
type name
question-answering Single Choice Question
type name config split revision
ikala/tmmluplus (trade) tmmlu++ trade test c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
name type value
single choice accuracy 20.32
text
中華民國憲法第一條
accuracy

Model Card for lianghsun/Llama-3.2-Taiwan-3B-Instruct

[👋 歡迎加入 Discord 討論 🎉]

image/png 圖像生成來自 OpenArtAn anime-style 🦙 standing proudly atop the summit of Taiwans Yushan (Jade Mountain), gazing forward.

採用 lianghsun/Llama-3.2-Taiwan-3B基礎模型foundation model,使用大量中華民國台灣的繁體中文對話集和多國語言對話集進行模型指令微調instruction fine-tuning和多輪迭代直接偏好優化direct preference optimization, DPO,旨在訓練出具有中華民國台灣知識及風格的小語言模型small langugae model, SLM之對話模型。

Model Change Log
Update Date Model Version Key Changes
2025/01/01 v2025.01.01 Fine-tuning is based on the foundation model version v2024.12.28, and it uses self-prepared instruction datasets for this round of fine-tuning.
2024/12/13 v2024.12.13 Completed 1st round DPO training (10/10 epochs). Preparing for next round DPO training.
2024/11/27 v2024.11.27 Completed SFT training (5/5 epochs). Preparing for multi-round DPO training.
2024/11/25 v2024.11.25 Updated model version to v2024.11.25, training progressed to (3/5) epochs. Still in SFT stage, DPO training remains pending.
2024/11/22 v2024.11.22 Initial upload: Model version v2024.11.22, training completed up to (1/5) epochs. Currently trained only on SFT, DPO training not yet performed.

Model Details

Model Description

Model Sources

user: 請介紹台灣
assistant: 台灣位於亞洲東部地處太平洋與菲律賓海之間面積約36,000平方公里人口約2,300萬是民主自由的國家經濟實力強勁擁有世界第10大經濟體。台灣以美食、文化、自然美景著稱還有豐富的歷史與傳統吸引全球遊客。台灣語為官方語言但中文也廣為使用英語也常用於國際交流。台灣政治多元執政黨為民進黨台灣是全球科技產業的重鎮擁有先進的製造業與服務業。台灣氣候溫暖潮濕四季分明夏季炎熱冬季涼爽雨季則在5月至10月。台灣的美食以小吃為主如滷肉飯、珍珠

Uses

Direct Use

本模型已經具備有繁體中文對話能力,使用者可以直接部署推論端點使用。

Downstream Use

若需強化模型在特定領域的知識,可透過微調進一步提升其性能與專業能力。

Out-of-Scope Use

本模型旨在提供資訊,不參與任何政治或法律問題的評斷或立場表達。

Bias, Risks, and Limitations

語言模型的生成內容可能因訓練集的多樣性而帶有偏見、特定立場,或包含與事實不符的言論,請使用者務必在使用過程中仔細確認內容的準確性與中立性。

Recommendations

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.

How to Get Started with the Model

要使用 vLLM Docker image 來啟動此模型,您可以按照以下操作:

docker run --runtime nvidia --gpus all \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HUGGING_FACE_HUB_TOKEN=<secret>" \
    -p 8000:8000 \
    --ipc=host \
    vllm/vllm-openai:latest \
    --model lianghsun/Llama-3.2-Taiwan-3B-Instruct

請注意,如果想要使用不同版本的 checkpoint請加上 --revision <tag_name>

docker run --runtime nvidia --gpus all \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HUGGING_FACE_HUB_TOKEN=<secret>" \
    -p 8000:8000 \
    --ipc=host \
    vllm/vllm-openai:latest \
    --model lianghsun/Llama-3.2-Taiwan-3B-Instruct --revision <tag_name>

Training Details

Training Data

繁體中文對話資料集
多國語系對話資料集

Training Procedure

Preprocessing

(WIP)

Training Hyperparameters

SFT stage for v2024.11.27

Note: 以下包含 v2024.11.22v2025.11.25 的超參數設定

  • learning_rate: 5e-05
  • min_learning_rate: 5e-07
  • train_batch_size: 105
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 50
  • total_train_batch_size: 21,000
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.01
  • num_epochs: 5.0
  • global_step: 590

Speeds, Sizes, Times

SFT stage for v2024.11.27

Note: 以下包含 v2024.11.22v2025.11.25 的超參數設定

  • Duration: 5 days, 16:15:11.17
  • Train runtime: 490,511.1789
  • Train samples per second: 25.37
  • Train steps per second: 0.001
  • Total training FLOPs: 26,658,386,120,540,160
  • Train loss: 0.8533

Evaluation

Testing Data, Factors & Metrics

Formosa Taiwan Knowledge Bench

Testing Data

lianghsun/Formosa-bench

Factors

[More Information Needed]

Metrics

[More Information Needed]

Results

[More Information Needed]

Summary

lianghsun/tw-legal-benchmark-v1

Testing Data

Factors

[More Information Needed]

Metrics

Accuracy

Results

  • Model Revision: v2024.11.27
Subset Split Score
lianghsun/tw-legal-benchmark-v1 train 31.1

Summary

tmmlu++

Testing Data

  • Dataset: ikala/tmmluplus
  • Revision: c0e8ae955997300d5dbf0e382bf0ba5115f85e8c

Factors

[More Information Needed]

Metrics

Accuracy

Results

  • Model Revision: v2024.11.27
Subset Split Score
geography_of_taiwan test 36.2
dentistry test 33.83
technical test 35.07
statistics_and_machine_learning test 28.57
clinical_psychology test 29.6
tve_design test 38.54
three_principles_of_people test 48.2
introduction_to_law test 29.96
linear_algebra test 21.43
agriculture test 24.5
jce_humanities test 38.89
music test 25.9
secondary_physics test 33.04
physics test 27.84
advance_chemistry test 27.64
junior_science_exam test 30.05
veterinary_pathology test 25.09
financial_analysis test 25.13
national_protection test 42.65
macroeconomics test 26.76
politic_science test 27.44
ttqav2 test 61.06
junior_chinese_exam test 30.86
traditional_chinese_medicine_clinical_medicine test 25.9
junior_math_exam test 21.71
auditing test 21.82
anti_money_laundering test 37.31
pharmacology test 30.68
trust_practice test 28.18
tve_mathematics test 18.67
human_behavior test 32.04
pharmacy test 22.76
tve_chinese_language test 36.65
optometry test 25.11
physical_education test 30.73
organic_chemistry test 35.78
tve_natural_sciences test 33.73
education test 37.9
mechanical test 42.37
taiwanese_hokkien test 14.73
nautical_science test 30.49
business_management test 39.57
logic_reasoning test 27.34
marketing_management test 39.78
economics test 25.95
basic_medical_science test 28.41
occupational_therapy_for_psychological_disorders test 35.73
general_principles_of_law test 31.13
junior_chemistry test 24.88
veterinary_pharmacology test 36.3
educational_psychology test 33.52
finance_banking test 32.59
official_document_management test 32.43
fire_science test 30.65
junior_social_studies test 47.62
accounting test 20.94
engineering_math test 27.18
education_(profession_level) test 24.07
chinese_language_and_literature test 27.64
management_accounting test 24.19
culinary_skills test 39.38
administrative_law test 25.71
insurance_studies test 33.42
real_estate test 22.83
computer_science test 31.61
taxation test 27.47
trade test 20.32

Summary

模型版號 v2024.11.27,無論是基礎模型(lianghsun/Llama-3.2-Taiwan-3B)還是指令微調模型(lianghsun/Llama-3.2-Taiwan-3B-Instruct),均未接受過 tmmlu++ 資料集的訓練,以確保測試的公平性。經測試,目前該模型在 tmmlu++ 上表現普遍不佳,未達及格分數,可能需要加入專業領域的資料集來強化基礎模型能力。

Model Examination [optional]

[More Information Needed]

Environmental Impact

Technical Specifications

Model Architecture and Objective

[More Information Needed]

Compute Infrastructure

[More Information Needed]

Hardware

  • CPU count: 32
  • Logical CPU count: 64
  • GPU count: 4
  • GPU type: NVIDIA H100 NVL

Software

  • OS version: Linux-5.15.0-124-generic-x86_64-with-glibc2.35
  • Python version: 3.12.7

Citation

@misc{lianghsun2024llama32taiwan3binstruct,
  author = {Huang, Liang Hsun},
  title = {Llama-3.2-Taiwan-3B-Instruct},
  year = {2024},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/lianghsun/Llama-3.2-Taiwan-3B-Instruct}},
  note = {Accessed: 2024-11-25}
}

Glossary [optional]

N/A

More Information

Acknowledge

image/png 在此致謝鴻鵠國際股份有限公司蔡長明先生無償地贊助算力以及曾經幫忙過廖振翔、chweng、Ben、kevin、Maxxchu、Lam 和陳林彥…等朋友們,才能讓這個模型得以訓練完成,提供算力者乃人生父母。

Usage

如果你基於此指令模型進行微調,希望能不吝嗇在 模型卡片(model card) 裡標註 基礎模型 為:

base_model: lianghsun/Llama-3.2-Taiwan-3B-Instruct

標註和 ❤️ 是給予我們最大的鼓勵,謝謝。😀

Model Card Authors

Huang Liang Hsun

Model Card Contact

Huang Liang Hsun

Framework versions

  • Transformers 4.45.2
  • Pytorch 2.4.1+cu121
  • Datasets 2.21.0
  • Tokenizers 0.20.0