Initialize project; model provided by the ModelHub XC community
Model: Cartik/BastiAI-2-Instruct Source: Original Platform
251
llmtf_eval/evaluation_log.txt
Normal file
@@ -0,0 +1,251 @@
INFO: 2024-10-17 21:30:14,019: llmtf.base.evaluator: Starting eval on ['darumeru/multiq']
INFO: 2024-10-17 21:30:14,019: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147077]
INFO: 2024-10-17 21:30:14,019: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-10-17 21:30:20,481: llmtf.base.darumeru/MultiQ: Loading Dataset: 6.46s
INFO: 2024-10-17 21:35:59,593: llmtf.base.darumeru/MultiQ: Processing Dataset: 339.11s
INFO: 2024-10-17 21:35:59,593: llmtf.base.darumeru/MultiQ: Results for darumeru/MultiQ:
INFO: 2024-10-17 21:35:59,594: llmtf.base.darumeru/MultiQ: {'f1': 0.3346248767848689, 'em': 0.22275334608030592}
INFO: 2024-10-17 21:35:59,599: llmtf.base.evaluator: Ended eval
INFO: 2024-10-17 21:35:59,599: llmtf.base.evaluator:
mean darumeru/MultiQ
0.279 0.279
INFO: 2024-10-17 21:36:08,809: llmtf.base.evaluator: Starting eval on ['darumeru/parus']
INFO: 2024-10-17 21:36:08,810: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147077]
INFO: 2024-10-17 21:36:08,810: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-10-17 21:36:12,969: llmtf.base.darumeru/PARus: Loading Dataset: 4.16s
INFO: 2024-10-17 21:36:18,316: llmtf.base.darumeru/PARus: Processing Dataset: 5.35s
INFO: 2024-10-17 21:36:18,317: llmtf.base.darumeru/PARus: Results for darumeru/PARus:
INFO: 2024-10-17 21:36:18,327: llmtf.base.darumeru/PARus: {'acc': 0.7}
INFO: 2024-10-17 21:36:18,327: llmtf.base.evaluator: Ended eval
INFO: 2024-10-17 21:36:18,328: llmtf.base.evaluator:
mean darumeru/MultiQ darumeru/PARus
0.489 0.279 0.700
INFO: 2024-10-17 21:36:27,550: llmtf.base.evaluator: Starting eval on ['darumeru/rcb']
INFO: 2024-10-17 21:36:27,550: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147077]
INFO: 2024-10-17 21:36:27,551: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-10-17 21:36:31,450: llmtf.base.darumeru/RCB: Loading Dataset: 3.90s
INFO: 2024-10-17 21:36:38,683: llmtf.base.darumeru/RCB: Processing Dataset: 7.23s
INFO: 2024-10-17 21:36:38,683: llmtf.base.darumeru/RCB: Results for darumeru/RCB:
INFO: 2024-10-17 21:36:38,686: llmtf.base.darumeru/RCB: {'acc': 0.5454545454545454, 'f1_macro': 0.49090309951702227}
INFO: 2024-10-17 21:36:38,687: llmtf.base.evaluator: Ended eval
INFO: 2024-10-17 21:36:38,688: llmtf.base.evaluator:
mean darumeru/MultiQ darumeru/PARus darumeru/RCB
0.499 0.279 0.700 0.518
INFO: 2024-10-17 21:36:48,734: llmtf.base.evaluator: Starting eval on ['darumeru/ruopenbookqa']
INFO: 2024-10-17 21:36:48,735: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147077]
INFO: 2024-10-17 21:36:48,735: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-10-17 21:36:54,900: llmtf.base.darumeru/ruOpenBookQA: Loading Dataset: 6.17s
INFO: 2024-10-17 21:38:00,519: llmtf.base.darumeru/ruOpenBookQA: Processing Dataset: 65.62s
INFO: 2024-10-17 21:38:00,520: llmtf.base.darumeru/ruOpenBookQA: Results for darumeru/ruOpenBookQA:
INFO: 2024-10-17 21:38:00,532: llmtf.base.darumeru/ruOpenBookQA: {'acc': 0.7302405498281787, 'f1_macro': 0.7304546157096631}
INFO: 2024-10-17 21:38:00,541: llmtf.base.evaluator: Ended eval
INFO: 2024-10-17 21:38:00,542: llmtf.base.evaluator:
mean darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/ruOpenBookQA
0.557 0.279 0.700 0.518 0.730
INFO: 2024-10-17 21:38:09,745: llmtf.base.evaluator: Starting eval on ['darumeru/ruworldtree']
INFO: 2024-10-17 21:38:09,745: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147077]
INFO: 2024-10-17 21:38:09,745: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-10-17 21:38:14,102: llmtf.base.darumeru/ruWorldTree: Loading Dataset: 4.36s
INFO: 2024-10-17 21:38:16,932: llmtf.base.darumeru/ruWorldTree: Processing Dataset: 2.83s
INFO: 2024-10-17 21:38:16,933: llmtf.base.darumeru/ruWorldTree: Results for darumeru/ruWorldTree:
INFO: 2024-10-17 21:38:16,936: llmtf.base.darumeru/ruWorldTree: {'acc': 0.9047619047619048, 'f1_macro': 0.9043404138496471}
INFO: 2024-10-17 21:38:16,936: llmtf.base.evaluator: Ended eval
INFO: 2024-10-17 21:38:16,937: llmtf.base.evaluator:
mean darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/ruOpenBookQA darumeru/ruWorldTree
0.626 0.279 0.700 0.518 0.730 0.905
INFO: 2024-10-17 21:38:26,077: llmtf.base.evaluator: Starting eval on ['darumeru/rwsd']
INFO: 2024-10-17 21:38:26,077: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147077]
INFO: 2024-10-17 21:38:26,077: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-10-17 21:38:30,781: llmtf.base.darumeru/RWSD: Loading Dataset: 4.70s
INFO: 2024-10-17 21:38:36,497: llmtf.base.darumeru/RWSD: Processing Dataset: 5.72s
INFO: 2024-10-17 21:38:36,497: llmtf.base.darumeru/RWSD: Results for darumeru/RWSD:
INFO: 2024-10-17 21:38:36,498: llmtf.base.darumeru/RWSD: {'acc': 0.6029411764705882}
INFO: 2024-10-17 21:38:36,499: llmtf.base.evaluator: Ended eval
INFO: 2024-10-17 21:38:36,500: llmtf.base.evaluator:
mean darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/ruOpenBookQA darumeru/ruWorldTree
0.622 0.279 0.700 0.518 0.603 0.730 0.905
INFO: 2024-10-17 21:38:45,688: llmtf.base.evaluator: Starting eval on ['daru/treewayextractive']
INFO: 2024-10-17 21:38:45,688: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147077]
INFO: 2024-10-17 21:38:45,688: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-10-17 21:39:02,002: llmtf.base.daru/treewayextractive: Loading Dataset: 16.31s
INFO: 2024-10-17 21:42:05,777: llmtf.base.daru/treewayextractive: Processing Dataset: 183.77s
INFO: 2024-10-17 21:42:05,777: llmtf.base.daru/treewayextractive: Results for daru/treewayextractive:
INFO: 2024-10-17 21:42:06,010: llmtf.base.daru/treewayextractive: {'r-prec': 0.3917218614718615}
INFO: 2024-10-17 21:42:06,052: llmtf.base.evaluator: Ended eval
INFO: 2024-10-17 21:42:06,054: llmtf.base.evaluator:
mean daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/ruOpenBookQA darumeru/ruWorldTree
0.589 0.392 0.279 0.700 0.518 0.603 0.730 0.905
INFO: 2024-10-17 21:42:15,170: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/rummlu']
INFO: 2024-10-17 21:42:15,170: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147077]
INFO: 2024-10-17 21:42:15,170: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-10-17 21:46:47,282: llmtf.base.nlpcoreteam/ruMMLU: Loading Dataset: 272.11s
INFO: 2024-10-17 21:56:29,398: llmtf.base.nlpcoreteam/ruMMLU: Processing Dataset: 582.12s
INFO: 2024-10-17 21:56:29,399: llmtf.base.nlpcoreteam/ruMMLU: Results for nlpcoreteam/ruMMLU:
INFO: 2024-10-17 21:56:29,464: llmtf.base.nlpcoreteam/ruMMLU: metric
subject
abstract_algebra 0.340000
anatomy 0.414815
astronomy 0.611842
business_ethics 0.610000
clinical_knowledge 0.554717
college_biology 0.548611
college_chemistry 0.380000
college_computer_science 0.450000
college_mathematics 0.400000
college_medicine 0.526012
college_physics 0.470588
computer_security 0.620000
conceptual_physics 0.565957
econometrics 0.377193
electrical_engineering 0.537931
elementary_mathematics 0.529101
formal_logic 0.365079
global_facts 0.360000
high_school_biology 0.664516
high_school_chemistry 0.487685
high_school_computer_science 0.700000
high_school_european_history 0.751515
high_school_geography 0.722222
high_school_government_and_politics 0.564767
high_school_macroeconomics 0.528205
high_school_mathematics 0.433333
high_school_microeconomics 0.533613
high_school_physics 0.403974
high_school_psychology 0.713761
high_school_statistics 0.523148
high_school_us_history 0.661765
high_school_world_history 0.717300
human_aging 0.587444
human_sexuality 0.618321
international_law 0.735537
jurisprudence 0.666667
logical_fallacies 0.564417
machine_learning 0.392857
management 0.650485
marketing 0.752137
medical_genetics 0.580000
miscellaneous 0.632184
moral_disputes 0.583815
moral_scenarios 0.299441
nutrition 0.637255
philosophy 0.617363
prehistory 0.561728
professional_accounting 0.386525
professional_law 0.377445
professional_medicine 0.481618
professional_psychology 0.516340
public_relations 0.500000
security_studies 0.648980
sociology 0.756219
us_foreign_policy 0.720000
virology 0.439759
world_religions 0.719298
INFO: 2024-10-17 21:56:29,473: llmtf.base.nlpcoreteam/ruMMLU: metric
subject
STEM 0.503308
humanities 0.586259
other (business, health, misc.) 0.543782
social sciences 0.599968
INFO: 2024-10-17 21:56:29,478: llmtf.base.nlpcoreteam/ruMMLU: {'acc': 0.5583294528508019}
INFO: 2024-10-17 21:56:29,516: llmtf.base.evaluator: Ended eval
INFO: 2024-10-17 21:56:29,518: llmtf.base.evaluator:
mean daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/ruOpenBookQA darumeru/ruWorldTree nlpcoreteam/ruMMLU
0.586 0.392 0.279 0.700 0.518 0.603 0.730 0.905 0.558
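The reported ruMMLU `acc` of 0.5583 appears consistent with an unweighted mean of the four category scores logged just before it (STEM, humanities, other, social sciences). The sketch below checks that arithmetic. The aggregation rule is inferred from the logged numbers, not taken from the llmtf source, and the names `ru_mmlu_categories` and `macro_average` are hypothetical.

```python
# Category-level ruMMLU scores copied from the log above.
ru_mmlu_categories = {
    "STEM": 0.503308,
    "humanities": 0.586259,
    "other (business, health, misc.)": 0.543782,
    "social sciences": 0.599968,
}


def macro_average(scores):
    """Unweighted mean over categories (assumed aggregation rule)."""
    return sum(scores.values()) / len(scores)


ru_mmlu_acc = macro_average(ru_mmlu_categories)

# The log reports acc = 0.5583294528508019; the category values are
# printed to six decimals, so only approximate agreement is expected.
print(f"{ru_mmlu_acc:.4f}")
```

The same check can be applied to the enMMLU category table by substituting its four values.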
INFO: 2024-10-17 21:56:39,535: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/enmmlu']
INFO: 2024-10-17 21:56:39,536: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147077]
INFO: 2024-10-17 21:56:39,536: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-10-17 21:58:54,966: llmtf.base.nlpcoreteam/enMMLU: Loading Dataset: 135.43s
INFO: 2024-10-17 22:08:04,419: llmtf.base.nlpcoreteam/enMMLU: Processing Dataset: 549.45s
INFO: 2024-10-17 22:08:04,426: llmtf.base.nlpcoreteam/enMMLU: Results for nlpcoreteam/enMMLU:
INFO: 2024-10-17 22:08:04,492: llmtf.base.nlpcoreteam/enMMLU: metric
subject
abstract_algebra 0.380000
anatomy 0.637037
astronomy 0.717105
business_ethics 0.700000
clinical_knowledge 0.705660
college_biology 0.715278
college_chemistry 0.470000
college_computer_science 0.580000
college_mathematics 0.330000
college_medicine 0.664740
college_physics 0.509804
computer_security 0.740000
conceptual_physics 0.642553
econometrics 0.508772
electrical_engineering 0.600000
elementary_mathematics 0.547619
formal_logic 0.412698
global_facts 0.360000
high_school_biology 0.783871
high_school_chemistry 0.581281
high_school_computer_science 0.710000
high_school_european_history 0.800000
high_school_geography 0.757576
high_school_government_and_politics 0.854922
high_school_macroeconomics 0.679487
high_school_mathematics 0.455556
high_school_microeconomics 0.773109
high_school_physics 0.437086
high_school_psychology 0.844037
high_school_statistics 0.652778
high_school_us_history 0.833333
high_school_world_history 0.843882
human_aging 0.677130
human_sexuality 0.786260
international_law 0.768595
jurisprudence 0.814815
logical_fallacies 0.803681
machine_learning 0.446429
management 0.786408
marketing 0.858974
medical_genetics 0.760000
miscellaneous 0.795658
moral_disputes 0.667630
moral_scenarios 0.311732
nutrition 0.732026
philosophy 0.704180
prehistory 0.712963
professional_accounting 0.503546
professional_law 0.457627
professional_medicine 0.658088
professional_psychology 0.668301
public_relations 0.709091
security_studies 0.697959
sociology 0.800995
us_foreign_policy 0.800000
virology 0.506024
world_religions 0.801170
INFO: 2024-10-17 22:08:04,506: llmtf.base.nlpcoreteam/enMMLU: metric
subject
STEM 0.572187
humanities 0.687100
other (business, health, misc.) 0.667521
social sciences 0.740042
INFO: 2024-10-17 22:08:04,511: llmtf.base.nlpcoreteam/enMMLU: {'acc': 0.6667125709237595}
INFO: 2024-10-17 22:08:04,554: llmtf.base.evaluator: Ended eval
INFO: 2024-10-17 22:08:04,556: llmtf.base.evaluator:
mean daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/ruOpenBookQA darumeru/ruWorldTree nlpcoreteam/enMMLU nlpcoreteam/ruMMLU
0.595 0.392 0.279 0.700 0.518 0.603 0.730 0.905 0.667 0.558
INFO: 2024-10-17 22:08:14,512: llmtf.base.evaluator: Starting eval on ['daru/treewayabstractive']
INFO: 2024-10-17 22:08:14,513: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147077]
INFO: 2024-10-17 22:08:14,513: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-10-17 22:08:18,791: llmtf.base.daru/treewayabstractive: Loading Dataset: 4.28s
INFO: 2024-10-17 22:11:46,260: llmtf.base.daru/treewayabstractive: Processing Dataset: 207.47s
INFO: 2024-10-17 22:11:46,260: llmtf.base.daru/treewayabstractive: Results for daru/treewayabstractive:
INFO: 2024-10-17 22:11:46,261: llmtf.base.daru/treewayabstractive: {'rouge1': 0.33109987599556284, 'rouge2': 0.11202889150257295}
INFO: 2024-10-17 22:11:46,262: llmtf.base.evaluator: Ended eval
INFO: 2024-10-17 22:11:46,263: llmtf.base.evaluator:
mean daru/treewayabstractive daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/ruOpenBookQA darumeru/ruWorldTree nlpcoreteam/enMMLU nlpcoreteam/ruMMLU
0.557 0.222 0.392 0.279 0.700 0.518 0.603 0.730 0.905 0.667 0.558
INFO: 2024-10-17 22:11:55,717: llmtf.base.evaluator: Starting eval on ['darumeru/cp_para_ru']
INFO: 2024-10-17 22:11:55,717: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147077]
INFO: 2024-10-17 22:11:55,717: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-10-17 22:11:59,846: llmtf.base.darumeru/cp_para_ru: Loading Dataset: 4.13s
INFO: 2024-10-17 22:14:29,975: llmtf.base.darumeru/cp_para_ru: Processing Dataset: 150.13s
INFO: 2024-10-17 22:14:29,975: llmtf.base.darumeru/cp_para_ru: Results for darumeru/cp_para_ru:
INFO: 2024-10-17 22:14:29,976: llmtf.base.darumeru/cp_para_ru: {'symbol_per_token': 3.993754090002875, 'len': 0.9986883734384026, 'lcs': 0.98}
INFO: 2024-10-17 22:14:29,977: llmtf.base.evaluator: Ended eval
INFO: 2024-10-17 22:14:29,977: llmtf.base.evaluator:
mean daru/treewayabstractive daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/cp_para_ru darumeru/ruOpenBookQA darumeru/ruWorldTree nlpcoreteam/enMMLU nlpcoreteam/ruMMLU
0.596 0.222 0.392 0.279 0.700 0.518 0.603 0.980 0.730 0.905 0.667 0.558
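The summary rows in this log can be reproduced from the per-task result dicts. The sketch below assumes each task's summary score is the unweighted mean of its reported metrics, and that the overall `mean` is the unweighted mean of the task scores. That rule is inferred from the logged numbers rather than from the llmtf source. For `darumeru/cp_para_ru` only `lcs` is used, because the summary shows 0.980 for that task; `task_metrics`, `task_scores`, and `overall` are hypothetical names.

```python
from statistics import mean

# Metric dicts copied from the "Results for ..." lines in the log above.
task_metrics = {
    "daru/treewayabstractive": {"rouge1": 0.33109987599556284,
                                "rouge2": 0.11202889150257295},
    "daru/treewayextractive": {"r-prec": 0.3917218614718615},
    "darumeru/MultiQ": {"f1": 0.3346248767848689, "em": 0.22275334608030592},
    "darumeru/PARus": {"acc": 0.7},
    "darumeru/RCB": {"acc": 0.5454545454545454,
                     "f1_macro": 0.49090309951702227},
    "darumeru/RWSD": {"acc": 0.6029411764705882},
    # Assumption: only 'lcs' feeds the summary (the table shows 0.980).
    "darumeru/cp_para_ru": {"lcs": 0.98},
    "darumeru/ruOpenBookQA": {"acc": 0.7302405498281787,
                              "f1_macro": 0.7304546157096631},
    "darumeru/ruWorldTree": {"acc": 0.9047619047619048,
                             "f1_macro": 0.9043404138496471},
    "nlpcoreteam/enMMLU": {"acc": 0.6667125709237595},
    "nlpcoreteam/ruMMLU": {"acc": 0.5583294528508019},
}

# Per-task score: unweighted mean of that task's metrics (assumed rule).
task_scores = {task: mean(m.values()) for task, m in task_metrics.items()}

# Overall score: unweighted mean across the task scores (assumed rule).
overall = mean(task_scores.values())

print(f"mean {overall:.3f}")
for task, score in task_scores.items():
    print(f"{task} {score:.3f}")
```

Under these assumptions the rounded values agree with the final summary row, including the overall mean of 0.596.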