Initialize project; models provided by the ModelHub XC community
Model: nv-community/Nemotron-Cascade-8B Source: Original Platform
@@ -0,0 +1,59 @@
|
||||
dataset_name: abstract_algebra
|
||||
description: The following are multiple choice questions (with answers) about abstract
|
||||
algebra.
|
||||
fewshot_config:
|
||||
sampler: first_n
|
||||
samples:
|
||||
- question: 'Statement 1 | Every element of a group generates a cyclic subgroup of
|
||||
the group. Statement 2 | The symmetric group S_10 has 10 elements.
|
||||
|
||||
(A) True, True (B) False, False (C) True, False (D) False, True'
|
||||
target: Let's think step by step. A cyclic group is a group that is generated
|
||||
by a single element. Hence a subgroup generated by a single element of a group
|
||||
is cyclic and Statement 1 is True. The symmetric group $S_n$ has $n!$ elements,
hence it is not true that $S_{10}$ has 10 elements and Statement 2 is False.
The answer is (C).
- question: 'Find the characteristic of the ring 2Z.
|
||||
|
||||
(A) 0 (B) 3 (C) 12 (D) 30'
|
||||
target: Let's think step by step. The characteristic of a ring R is $n$ if the
|
||||
statement $ka = 0$ for all $a\in 2Z$ implies that $k$ is a multiple of $n$.
|
||||
Assume that $ka = 0$ for all $a\in 2Z$ for some $k$. In particular $2k = 0$.
|
||||
Hence $k=0$ and $n=0$. The answer is (A).
|
||||
- question: 'Statement 1 | Every function from a finite set onto itself must be one
|
||||
to one. Statement 2 | Every subgroup of an abelian group is abelian.
|
||||
|
||||
(A) True, True (B) False, False (C) True, False (D) False, True'
|
||||
target: "Let's think step by step. Statement 1 is true. Let $S$ be a finite set.\
|
||||
\ If $f:S \\rightarrow S$ is an onto function, then $|S| = |f(S)|$. If $f$ were\
|
||||
\ not one to one, then for finite domain $S$ the image would have fewer than\
|
||||
\ $|S|$ elements, a contradiction.\nStatement 2 is true. Let $G$ be an abelian\
|
||||
\ group and $H$ be a subgroup of $G$. We need to show that $H$ is abelian. Let\
|
||||
\ $a,b \\in H$. Then $a,b \\in G$. Since $G$ is abelian, $ab=ba$.\
|
||||
\ Since $H$ is a subgroup of $G$, $ab \\in H$. Therefore, $ab=ba$ and $H$ is\
|
||||
\ abelian. The answer is (A)."
|
||||
- question: 'Statement 1 | If aH is an element of a factor group, then |aH| divides
|
||||
|a|. Statement 2 | If H and K are subgroups of G then HK is a subgroup of G.
|
||||
|
||||
(A) True, True (B) False, False (C) True, False (D) False, True'
|
||||
target: Let's think step by step. Statement 2 is false. Let $H$ be a subgroup
|
||||
of $S_3$ generated by the cycle $(1,2)$ and $K$ be a subgroup of $S_3$ generated
|
||||
by the cycle $(1,3)$. Both $H$ and $K$ have two elements, the generators and
|
||||
the identity. However $HK$ contains cycles (1,2), (1,3) and (2,3,1), but the
|
||||
inverse of (2,3,1) is (2,1,3) and it does not belong to HK, hence HK is not
|
||||
a subgroup. The answer is (B).
|
||||
- question: 'Find all c in Z_3 such that Z_3[x]/(x^2 + c) is a field.
|
||||
|
||||
(A) 0 (B) 1 (C) 2 (D) 3'
|
||||
target: 'Let''s think step by step. Z_3[x]/(x^2 + c) is a field if and only if
|
||||
x^2 + c does not have roots in Z_3. That is x^2 + c != 0 for every x in Z_3.
|
||||
If c = 0, then x^2 + c = x^2 has root 0. If c = 1 then x^2 + c = x^2 + 1 = 0
|
||||
+ 1 for x = 0, 1 + 1 = 2 for x = 1 and 1 + 1 = 2 for x = 2, hence x^2 + 1 does
|
||||
not have any roots. For c = 2 the polynomial x^2 + 2 has two roots at x = 1
|
||||
and x = 2. Hence Z_3[x]/(x^2 + c) is a field if and only if c = 1. The answer
|
||||
is (B).'
|
||||
tag: mmlu_flan_cot_fewshot_stem
|
||||
include: _mmlu_flan_cot_fewshot_template_yaml
|
||||
task: mmlu_flan_cot_fewshot_abstract_algebra
|
||||
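The root-checking argument in the last abstract algebra sample (a quadratic over Z_3 is irreducible exactly when it has no roots) can be reproduced with a brute-force check. This is only an illustrative sketch: the function names `has_root_mod_p` and `field_constants` are hypothetical and are not part of the dataset or of any evaluation harness.

```python
def has_root_mod_p(c, p=3):
    """Return True if x^2 + c has a root in Z_p (p assumed prime)."""
    return any((x * x + c) % p == 0 for x in range(p))


def field_constants(p=3):
    """List the c in Z_p for which x^2 + c has no root in Z_p.

    For a degree-2 polynomial over a field, having no root is the same
    as being irreducible, so Z_p[x]/(x^2 + c) is a field for these c.
    """
    return [c for c in range(p) if not has_root_mod_p(c, p)]


print(field_constants())  # [1]
```

Under these assumptions the only value found is c = 1, which agrees with answer (B) in the sample.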
75
evaluation/data/mmlu/flan_cot_fewshot/mmlu_anatomy.yaml
Normal file
@@ -0,0 +1,75 @@
|
||||
dataset_name: anatomy
|
||||
description: The following are multiple choice questions (with answers) about anatomy.
|
||||
fewshot_config:
|
||||
sampler: first_n
|
||||
samples:
|
||||
- question: 'Which of the following is the body cavity that contains the pituitary
|
||||
gland?
|
||||
|
||||
(A) Abdominal (B) Cranial (C) Pleural (D) Spinal'
|
||||
target: "Let's think step by step. We refer to Wikipedia articles on anatomy for\
|
||||
\ help. Let\u2019s solve this problem step by step. The pituitary gland is the\
|
||||
\ major endocrine gland attached to the base of the brain, and it is contained\
|
||||
\ in the Cranial cavity. The answer is (B)."
|
||||
- question: 'Which of these branches of the trigeminal nerve contain somatic motor
|
||||
processes?
|
||||
|
||||
(A) The supraorbital nerve (B) The infraorbital nerve (C) The mental nerve (D)
|
||||
None of the above'
|
||||
target: "Let's think step by step. We refer to Wikipedia articles on anatomy for\
|
||||
\ help. Let\u2019s solve this problem step by step. \nWe know the following:\
|
||||
\ (A) The supraorbital nerve (also known as the frontal nerve) is the largest\
|
||||
\ branch of the ophthalmic nerve and branch of ophthalmic division of the trigeminal\
|
||||
\ nerve. (B) The infraorbital nerve is a branch of the maxillary division of\
|
||||
\ the trigeminal nerve. (C) The mental nerve is a branch of the mandibular division\
|
||||
\ of the trigeminal nerve. All of these nerves are purely sensory nerves\
|
||||
\ and do not contain any somatic motor processes. Therefore, the answer should\
|
||||
\ be none of the above, which is (D). The answer is (D)."
|
||||
- question: 'In Angle''s Class II Div 2 occlusion there is
|
||||
|
||||
(A) excess overbite of the upper lateral incisors. (B) negative overjet of the
|
||||
upper central incisors. (C) excess overjet of the upper lateral incisors. (D)
|
||||
excess overjet of the upper central incisors.'
|
||||
target: "Let's think step by step. We refer to Wikipedia articles on anatomy for\
|
||||
\ help. Let\u2019s solve this problem step by step. This is a question related\
|
||||
\ to anatomy and orthodontics. Excess overjet is associated with Class II occlusions;\
|
||||
\ therefore, we can safely eliminate (B) from the list, as negative overjet\
|
||||
\ is often associated with Class III occlusions. Now, we need to determine the\
|
||||
\ location of the excess overjet, and that would be the upper (maxillary) lateral\
|
||||
\ incisors. Only (C) has the correct information. The answer is (C)."
|
||||
- question: 'The pleura
|
||||
|
||||
(A) have no sensory innervation. (B) are separated by a 2 mm space. (C) extend
|
||||
into the neck. (D) are composed of respiratory epithelium.'
|
||||
target: "Let's think step by step. We refer to Wikipedia articles on anatomy for\
|
||||
\ help. Let\u2019s solve this problem step by step. First, recall that the pleura\
|
||||
\ refers to the thin layer of tissue that covers the lungs and lines the interior\
|
||||
\ wall of the chest cavity. Now, let\u2019s look at each option:\nOption (A):\
|
||||
\ \u201CThe pleura have no sensory innervation.\u201D This information is not\
|
||||
\ correct. The pleura do have a sensory innervation.\nOption (B): \u201CThe\
|
||||
\ pleura are separated by a 2 mm space.\u201D This information is not correct.\
|
||||
\ There is a very thin \u201Cpotential\u201D space between the layers of the\
|
||||
\ pleura; however, it is typically filled with serous pleural fluid. \nOption\
|
||||
\ (C): \u201CThe pleura extend into the neck.\u201D This information is actually\
|
||||
\ true. The cervical pleura, also known as the dome of the\
|
||||
\ pleura, lines the extension of the pleural cavity into the neck.\nOption\
|
||||
\ (D): \u201CThe pleura are composed of respiratory epithelium.\u201D This information\
|
||||
\ is not correct. The pleura are composed of connective tissue (CT).\nBecause\
|
||||
\ (A), (B), and (D) are all incorrect, (C) is the only correct answer. The answer\
|
||||
\ is (C)."
|
||||
- question: 'What is the embryological origin of the hyoid bone?
|
||||
|
||||
(A) The first pharyngeal arch (B) The first and second pharyngeal arches (C)
|
||||
The second pharyngeal arch (D) The second and third pharyngeal arches'
|
||||
target: "Let's think step by step. We refer to Wikipedia articles on anatomy for\
|
||||
\ help. Let\u2019s solve this problem step by step. The hyoid bone, which is\
|
||||
\ also known as the lingual bone, is a small U-shaped bone located in the anterior\
|
||||
\ neck. In its resting position, it lies between the base of the mandible and\
|
||||
\ the third cervical vertebra. We know that the second and the third pharyngeal\
|
||||
\ arches give rise to the horns of the hyoid bone; therefore, the embryological\
|
||||
\ origin of the hyoid bone is the second and the third pharyngeal arches\u2014\
|
||||
this information is covered in the last option (D). Therefore, we conclude that\
|
||||
\ (D) must be the correct answer. The answer is (D).\n\n"
|
||||
tag: mmlu_flan_cot_fewshot_stem
|
||||
include: _mmlu_flan_cot_fewshot_template_yaml
|
||||
task: mmlu_flan_cot_fewshot_anatomy
|
||||
70
evaluation/data/mmlu/flan_cot_fewshot/mmlu_astronomy.yaml
Normal file
@@ -0,0 +1,70 @@
|
||||
dataset_name: astronomy
|
||||
description: The following are multiple choice questions (with answers) about astronomy.
|
||||
fewshot_config:
|
||||
sampler: first_n
|
||||
samples:
|
||||
- question: 'Where do most short-period comets come from and how do we know?
|
||||
|
||||
(A) The Kuiper belt; short period comets tend to be in the plane of the solar
|
||||
system just like the Kuiper belt. (B) The Kuiper belt; short period comets tend
|
||||
to come from random directions indicating a spherical distribution of comets
|
||||
called the Kuiper belt. (C) The asteroid belt; short period comets have orbital
|
||||
periods similar to asteroids like Vesta and are found in the plane of the solar
|
||||
system just like the asteroid belt. (D) The Oort cloud; short period comets
|
||||
tend to be in the plane of the solar system just like the Oort cloud.'
|
||||
target: Let's think step by step. Most short-period comets come from the Kuiper
|
||||
belt, and we know because short period comets tend to be in the plane of the
|
||||
solar system, just like the Kuiper belt is. The answer is (A).
|
||||
- question: 'You are pushing a truck along a road. Would it be easier to accelerate
|
||||
this truck on Mars? Why? (Assume there is no friction)
|
||||
|
||||
(A) It would be harder since the truck is heavier on Mars. (B) It would be easier
|
||||
since the truck is lighter on Mars. (C) It would be harder since the truck is
|
||||
lighter on Mars. (D) It would be the same no matter where you are.'
|
||||
target: "Let's think step by step. If we assume that there is no friction, the\
|
||||
\ force needed to accelerate the truck is by Newton\u2019s second law only dependent\
|
||||
\ on the mass of the truck. Hence (A), (B) and (C) are incorrect since it doesn\u2019\
|
||||
t matter that it\u2019s on Mars, and (D) is the correct answer. The answer is\
|
||||
\ (D)."
|
||||
- question: 'Say the pupil of your eye has a diameter of 5 mm and you have a telescope
|
||||
with an aperture of 50 cm. How much more light can the telescope gather than
|
||||
your eye?
|
||||
|
||||
(A) 10000 times more (B) 100 times more (C) 1000 times more (D) 10 times more'
|
||||
target: Let's think step by step. The amount of light is proportional to the aperture
|
||||
area $A = \pi D^2/4$ for a lens with diameter $D$, so the relative amounts of
|
||||
light between the eye with diameter 5mm and the telescope with diameter 50 cm
|
||||
is $(50 cm)^2/(5mm)^2 = 10000$. The answer is (A).
|
||||
- question: 'Why isn''t there a planet where the asteroid belt is located?
|
||||
|
||||
(A) A planet once formed here but it was broken apart by a catastrophic collision.
|
||||
(B) There was not enough material in this part of the solar nebula to form a
|
||||
planet. (C) There was too much rocky material to form a terrestrial planet but
|
||||
not enough gaseous material to form a jovian planet. (D) Resonance with Jupiter
|
||||
prevented material from collecting together to form a planet.'
|
||||
target: "Let's think step by step. The asteroid belt is a circumstellar disc consisting\
|
||||
\ of a large number of asteroids between Mars and Jupiter's orbits. The asteroids\
|
||||
\ in this belt are affected by the gravitational pull from both other asteroids\
|
||||
\ and nearby planets. Due to the strong gravitational force of Jupiter there\
|
||||
\ are resonances that give rise to low density regions of asteroids known as\
|
||||
\ the Kirkwood gap. So (B) and (C) are not correct since it\u2019s not a lack\
|
||||
\ of material that prevents a planet from being formed, and (A) is incorrect\
|
||||
\ because the Kirkwood gap would have prevented a planet from forming in the\
|
||||
\ first place, and (D) is the correct option. The answer is (D)."
|
||||
- question: 'Why is Mars red?
|
||||
|
||||
(A) Because the surface is covered with heavily oxidized ("rusted") minerals.
|
||||
(B) Because the atmosphere scatters more light at bluer wavelengths transmitting
|
||||
mostly red light. (C) Because Mars is covered with ancient lava flows which
|
||||
are red in color. (D) Because flowing water on Mars''s surface altered the surface
|
||||
minerals several billion years ago.'
|
||||
target: 'Let''s think step by step. Option (B) is not correct because if the red
|
||||
color were caused by scattering in the atmosphere, then Earth, with a
|
||||
much thicker atmosphere would also look red. Options (C) and (D) are not specific
|
||||
enough about why the color of the surface would be red, while (A) is correct
|
||||
because it explains that the surface is red due to the rusted materials on the
|
||||
surface and the red color comes from the rust. So the correct option is (A).
|
||||
The answer is (A).'
|
||||
tag: mmlu_flan_cot_fewshot_stem
|
||||
include: _mmlu_flan_cot_fewshot_template_yaml
|
||||
task: mmlu_flan_cot_fewshot_astronomy
|
||||
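The aperture-area argument in the telescope sample reduces to a ratio of squared diameters, since the factor of pi/4 cancels. A minimal sketch of that arithmetic follows; the helper name `light_gathering_ratio` is hypothetical, and both diameters are assumed to be given in the same unit (millimetres here).

```python
def light_gathering_ratio(d_telescope, d_eye):
    """Ratio of collecting areas for two circular apertures.

    Area is pi * D^2 / 4, so the ratio depends only on (D1 / D2)^2.
    Both diameters must use the same unit.
    """
    return (d_telescope / d_eye) ** 2


# 50 cm telescope aperture = 500 mm; pupil diameter = 5 mm.
print(light_gathering_ratio(500, 5))  # 10000.0
```

Converting 50 cm to 500 mm before dividing is the step that matters; mixing centimetres and millimetres would give a ratio of 100 instead of 10000.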
@@ -0,0 +1,75 @@
|
||||
dataset_name: business_ethics
|
||||
description: The following are multiple choice questions (with answers) about business
|
||||
ethics.
|
||||
fewshot_config:
|
||||
sampler: first_n
|
||||
samples:
|
||||
- question: 'In contrast to _______, _______ aim to reward favourable behaviour by
|
||||
companies. The success of such campaigns have been heightened through the use
|
||||
of ___________, which allow campaigns to facilitate the company in achieving
|
||||
_________ .
|
||||
|
||||
(A) Buycotts, Boycotts, Blockchain technology, Charitable donations (B) Buycotts,
|
||||
Boycotts, Digital technology, Increased Sales (C) Boycotts, Buyalls, Blockchain
|
||||
technology, Charitable donations (D) Boycotts, Buycotts, Digital technology,
|
||||
Increased Sales'
|
||||
target: "Let's think step by step. We refer to Wikipedia articles on business\
|
||||
\ ethics for help. The sentence that best uses the possible options above is\
|
||||
\ \u201CIn contrast to *boycotts*, *buycotts* aim to reward favourable behavior\
|
||||
\ by companies. The success of such campaigns have been heightened through the\
|
||||
\ use of *digital technology*, which allow campaigns to facilitate the company\
|
||||
\ in achieving *increased sales*.\u201D The answer is (D)."
|
||||
- question: '_______ is the direct attempt to formally or informally manage ethical
|
||||
issues or problems, through specific policies, practices and programmes.
|
||||
|
||||
(A) Corporate social responsibility (B) Business ethics management (C) Sustainability
|
||||
(D) Environmental management'
|
||||
target: Let's think step by step. We refer to Wikipedia articles on business ethics
|
||||
for help. The direct attempt to manage ethical issues through specific policies,
|
||||
practices, and programs is business ethics management. The answer is (B).
|
||||
- question: 'Three contrasting tactics that CSO''s can engage in to meet their aims
|
||||
are ________ which typically involves research and communication, ________,
|
||||
which may involve physically attacking a company''s operations or ________,
|
||||
often involving some form of _______.
|
||||
|
||||
(A) Non-violent direct action, Violent direct action, Indirect action, Boycott
|
||||
(B) Indirect action, Instrumental action, Non-violent direct action, Information
|
||||
campaign (C) Indirect action, Violent direct action, Non-violent direct action,
|
||||
Boycott (D) Non-violent direct action, Instrumental action, Indirect action,
|
||||
Information campaign'
|
||||
target: "Let's think step by step. We refer to Wikipedia articles on business\
|
||||
\ ethics for help. The sentence that best uses the possible options above is\
|
||||
\ \u201CThree contrasting tactics that CSO's can engage in to meet their aims\
|
||||
\ are *indirect action*, which typically involves research and communication,\
|
||||
\ *violent direct action*, which may involve physically attacking a company's\
|
||||
\ operations or *non-violent direct action*, often involving some form of *boycott*.\u201D\
|
||||
\ The answer is (C)."
|
||||
- question: 'To ensure the independence of the non-executive board members, there are
|
||||
a number of steps which can be taken, which include non-executives being drawn
|
||||
from _______ the company, being appointed for a _________ time period as well
|
||||
as being appointed _________.
|
||||
|
||||
(A) Outside, Limited, Independently (B) Inside, Limited, Intermittently (C)
|
||||
Outside, Unlimited, Intermittently (D) Inside, Unlimited, Independently'
|
||||
target: "Let's think step by step. We refer to Wikipedia articles on business\
|
||||
\ ethics for help. The sentence that best uses the possible options above is\
|
||||
\ \u201CTo ensure the independence of the non-executive board members, there\
|
||||
\ are a number of steps which can be taken, which include non-executives being\
|
||||
\ drawn from *outside* the company, being appointed for a *limited* time period\
|
||||
\ as well as being appointed *independently*.\u201D The answer is (A)."
|
||||
- question: 'Beyond the business case for engaging in CSR there are a number of moral
|
||||
arguments relating to: negative _______, the _______that corporations possess
|
||||
and the ________ of business and society.
|
||||
|
||||
(A) Externalities, Power, Independence (B) Publicity, Insubstantial resources,
|
||||
Mutual dependence (C) Publicity, Power, Independence (D) Externalities, Power,
|
||||
Mutual dependence'
|
||||
target: "Let's think step by step. We refer to Wikipedia articles on business\
|
||||
\ ethics for help. The sentence that best uses the possible options above is\
|
||||
\ \u201CBeyond the business case for engaging in CSR there are a number of\
|
||||
\ moral arguments relating to: negative *externalities*, the *power* that corporations\
|
||||
\ possess and the *mutual dependence* of business and society.\u201D The answer\
|
||||
\ is (D).\n\n"
|
||||
tag: mmlu_flan_cot_fewshot_other
|
||||
include: _mmlu_flan_cot_fewshot_template_yaml
|
||||
task: mmlu_flan_cot_fewshot_business_ethics
|
||||
@@ -0,0 +1,48 @@
|
||||
dataset_name: clinical_knowledge
|
||||
description: The following are multiple choice questions (with answers) about clinical
|
||||
knowledge.
|
||||
fewshot_config:
|
||||
sampler: first_n
|
||||
samples:
|
||||
- question: 'Glycolysis is the name given to the pathway involving the conversion of:
|
||||
|
||||
(A) glycogen to glucose-1-phosphate. (B) glycogen or glucose to fructose. (C)
|
||||
glycogen or glucose to pyruvate or lactate. (D) glycogen or glucose to pyruvate
|
||||
or acetyl CoA.'
|
||||
target: Let's think step by step. We refer to Wikipedia articles on clinical knowledge
|
||||
for help. Glycolysis is the name given to the pathway involving conversion of
|
||||
glycogen or glucose to pyruvate or lactate. The answer is (C).
|
||||
- question: 'What is the difference between a male and a female catheter?
|
||||
|
||||
(A) Male and female catheters are different colours. (B) Male catheters are
|
||||
longer than female catheters. (C) Male catheters are bigger than female catheters.
|
||||
(D) Female catheters are longer than male catheters.'
|
||||
target: Let's think step by step. We refer to Wikipedia articles on clinical knowledge
|
||||
for help. The difference between a male and female catheter is that male catheters
|
||||
tend to be longer than female catheters. The answer is (B).
|
||||
- question: 'How many attempts should you make to cannulate a patient before passing
|
||||
the job on to a senior colleague, according to the medical knowledge of 2020?
|
||||
|
||||
(A) 4 (B) 3 (C) 2 (D) 1'
|
||||
target: Let's think step by step. We refer to Wikipedia articles on clinical knowledge
|
||||
for help. According to the medical protocol as of 2020, you should make two
|
||||
attempts to cannulate a patient before passing the job on to a more-senior practitioner.
|
||||
The answer is (C).
|
||||
- question: 'In the assessment of the hand function which of the following is true?
|
||||
|
||||
(A) Abduction of the thumb is supplied by spinal root T2 (B) Opposition of the
|
||||
thumb by opponens pollicis is supplied by spinal root T1 (C) Finger adduction
|
||||
is supplied by the median nerve (D) Finger abduction is mediated by the palmar
|
||||
interossei'
|
||||
target: Let's think step by step. We refer to Wikipedia articles on clinical knowledge
|
||||
for help. Of all the options, it is only true that the opposition of the thumb
|
||||
by opponens pollicis is supplied by spinal root T1. The answer is (B).
|
||||
- question: 'The energy for all forms of muscle contraction is provided by:
|
||||
|
||||
(A) ATP. (B) ADP. (C) phosphocreatine. (D) oxidative phosphorylation.'
|
||||
target: 'Let''s think step by step. We refer to Wikipedia articles on clinical
|
||||
knowledge for help. The energy for muscular contraction is provided by ATP (adenosine
|
||||
triphosphate), which is the main energy currency of the cell. The answer is (A).'
|
||||
tag: mmlu_flan_cot_fewshot_other
|
||||
include: _mmlu_flan_cot_fewshot_template_yaml
|
||||
task: mmlu_flan_cot_fewshot_clinical_knowledge
|
||||
@@ -0,0 +1,75 @@
|
||||
dataset_name: college_biology
|
||||
description: The following are multiple choice questions (with answers) about college
|
||||
biology.
|
||||
fewshot_config:
|
||||
sampler: first_n
|
||||
samples:
|
||||
- question: 'Which of the following represents an accurate statement concerning arthropods?
|
||||
|
||||
(A) They possess an exoskeleton composed primarily of peptidoglycan. (B) They
|
||||
possess an open circulatory system with a dorsal heart. (C) They are members
|
||||
of a biologically unsuccessful phylum incapable of exploiting diverse habitats
|
||||
and nutrition sources. (D) They lack paired, jointed appendages.'
|
||||
target: Let's think step by step. Peptidoglycan is known to comprise the cell
wall of most bacteria, rather than the exoskeleton of arthropods, which
|
||||
is made of chitin, which rules out (A). The answer (C) is false because arthropods
|
||||
are a highly successful phylum. Likewise, arthropods have paired, jointed appendages,
|
||||
which rules out (D). The only remaining option is (B), as arthropods have an
|
||||
open circulatory system with a dorsal tubular heart. The answer is (B).
|
||||
- question: 'In a given population, 1 out of every 400 people has a cancer caused by
|
||||
a completely recessive allele, b. Assuming the population is in Hardy-Weinberg
|
||||
equilibrium, which of the following is the expected proportion of individuals
|
||||
who carry the b allele but are not expected to develop the cancer?
|
||||
|
||||
(A) 1/400 (B) 19/400 (C) 20/400 (D) 38/400'
|
||||
target: "Let's think step by step. According to the Hardy Weinberg Law, $p^2 +\
|
||||
\ 2 p q + q^2 = 1$, and $p + q = 1$ where $p$ is the frequency of the dominant\
|
||||
\ allele, $q$ is the frequency of the recessive allele, and $p^2$, $q^2$, and\
|
||||
\ $2pq$ are the frequencies of dominant homozygous, recessive homozygous, and\
|
||||
\ heterozygous individuals, respectively. The frequency of the recessive\
|
||||
\ allele (q) is $\\sqrt{\\frac{1}{400}} = 0.05$. We have $p = 1 - q = 0.95$.\
|
||||
\ The frequency of heterozygous individuals is $2pq = 2 \\cdot 0.05 \\cdot 0.95\
|
||||
\ = 0.095$. The number of heterozygous individuals is equal to the frequency\
|
||||
\ of heterozygous individuals times the size of the population, or $0.095 *\
|
||||
\ 400 = 38$. So we end up with 38/400. The answer is (D)."
|
||||
- question: 'According to the pressure-flow model of movement of phloem contents, photosynthate
|
||||
movement from source to sink is driven by
|
||||
|
||||
(A) an ATP-dependent pressure-flow pump (B) a water-pressure potential gradient
|
||||
(C) transpiration (D) apoplastic diffusion'
|
||||
target: Let's think step by step. It is a gradient in water pressure that induces
|
||||
the movement of phloem content, which refers to answer (B). The mechanism of
|
||||
movement does not rely on metabolism, which rules out (A). Transpiration refers
|
||||
to the exhalation of water vapor through plant stomata, and is also not related,
|
||||
which rules out (C). While the apoplastic pathway is one of two main pathways
|
||||
for water transport in plants, it is not central to the pressure flow model,
|
||||
which rules out (D). The answer is (B).
|
||||
- question: 'Which of the following contain DNA sequences required for the segregation
|
||||
of chromosomes in mitosis and meiosis?
|
||||
|
||||
(A) Telomeres (B) Centromeres (C) Nucleosomes (D) Spliceosomes'
|
||||
target: Let's think step by step. Telomeres cap chromosome ends and do not direct segregation,
|
||||
which rules out (A). Nucleosomes are the repeating subunit that comprises chromatin
|
||||
packed in a cell nucleus, and do not specifically refer to DNA sequences necessary
|
||||
for segregating chromosomes in cell division, which rules out (C). A spliceosome
|
||||
is a large ribonucleoprotein that removes introns from transcribed pre-mRNA
|
||||
rather than governing chromosome segregation, which rules out (D). Centromeres are directly responsible
|
||||
for segregating chromosomes in cell division. The answer is (B).
|
||||
- question: 'The presence of homologous structures in two different organisms, such
|
||||
as the humerus in the front limb of a human and a bird, indicates that
|
||||
|
||||
(A) the human and bird are polyphyletic species (B) a human''s and bird''s evolution
|
||||
is convergent (C) the human and bird belong to a clade (D) the human and bird
|
||||
developed by analogy'
|
||||
target: 'Let''s think step by step. Polyphyletic species are organisms that are
|
||||
grouped due to having similar characteristics but which do not have a common
|
||||
ancestor. This is not the case for humans and birds, which rules out (A). Convergent
|
||||
evolution refers to the independent development of similar features in different
|
||||
species at different periods, which is also not the case for humans and birds,
|
||||
which rules out (B). Analogy refers to the superficial resemblance of structures
|
||||
that have different origins, which is not the case for the human and bird forelimbs,
|
||||
which rules out (D). Humans and birds do belong to the same clade - a group
|
||||
of organisms composed of a common ancestor and all of its descendants. The answer is (C).'
|
||||
tag: mmlu_flan_cot_fewshot_stem
|
||||
include: _mmlu_flan_cot_fewshot_template_yaml
|
||||
task: mmlu_flan_cot_fewshot_college_biology
|
||||
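The Hardy-Weinberg calculation in the recessive-allele sample can be checked numerically. The sketch below assumes a single locus with two alleles and a population in equilibrium, as the question states; the function name `carrier_frequency` is hypothetical.

```python
import math


def carrier_frequency(affected_fraction):
    """Expected heterozygote frequency 2pq under Hardy-Weinberg.

    affected_fraction is q^2, the share of recessive homozygotes.
    """
    q = math.sqrt(affected_fraction)  # recessive allele frequency
    p = 1.0 - q                       # dominant allele frequency
    return 2.0 * p * q


freq = carrier_frequency(1 / 400)
print(round(freq * 400))  # 38
```

With q = 0.05 and p = 0.95 the heterozygote frequency is 0.095, which corresponds to 38 carriers per 400 people and matches answer (D).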
@@ -0,0 +1,49 @@
|
||||
dataset_name: college_chemistry
|
||||
description: The following are multiple choice questions (with answers) about college
|
||||
chemistry.
|
||||
fewshot_config:
|
||||
sampler: first_n
|
||||
samples:
|
||||
- question: "3 Cl\u2212(aq) + 4 CrO_4^2\u2212(aq) + 23 H+(aq) \u2192 3 HClO2(aq) +\
|
||||
\ 4 Cr3+(aq) + 10 H2O(l). In the reaction shown above, Cl\u2212(aq) behaves\
|
||||
\ as\n(A) an acid (B) a base (C) a catalyst (D) a reducing agent"
|
||||
target: Let's think step by step. A molecule that behaves as a base accepts an
|
||||
H+ ion (or proton) from another molecule, whereas a molecule that behaves as
|
||||
an acid donates an H+ ion (or proton) to another molecule. Neither of these
|
||||
is the case for Cl in this reaction, which rules out (A) and (B). A catalyst
|
||||
is a substance that only accelerates a reaction without itself undergoing chemical
|
||||
change, which is not the case here. This rules out (C). Instead, the $Cl^{-}$
|
||||
ions donate electrons (they are oxidized) in the reaction to form
|
||||
3 HClO2. This is the behavior of a reducing agent, or (D). The answer is (D).
|
||||
- question: 'Which of the following statements about the lanthanide elements is NOT
|
||||
true?
|
||||
|
||||
(A) The most common oxidation state for the lanthanide elements is +3. (B) Lanthanide
|
||||
complexes often have high coordination numbers (> 6). (C) All of the lanthanide
|
||||
elements react with aqueous acid to liberate hydrogen. (D) The atomic radii
|
||||
of the lanthanide elements increase across the period from La to Lu.'
|
||||
target: Let's think step by step. The atomic radii of the lanthanide elements
|
||||
in fact decrease across the period from La to Lu. Options (A), (B), and (C)
|
||||
are all true. This means that only (D) is NOT true. The answer is (D).
|
||||
- question: 'Which of the following lists the hydrides of group-14 elements in order
|
||||
of thermal stability, from lowest to highest?
|
||||
|
||||
(A) PbH4 < SnH4 < GeH4 < SiH4 < CH4 (B) PbH4 < SnH4 < CH4 < GeH4 < SiH4 (C)
|
||||
CH4 < SiH4 < GeH4 < SnH4 < PbH4 (D) CH4 < PbH4 < GeH4 < SnH4 < SiH4'
|
||||
target: Let's think step by step. The thermal stability of group-14 hydrides decreases
|
||||
as we move from the top of group 14 to the bottom. The order of elements in
|
||||
the group from top to bottom is C, Si, Ge, Sn, Pb. Therefore in order of increasing
|
||||
thermal stability we have PbH4, SnH4, GeH4, SiH4, and CH4, or answer (A). The
|
||||
answer is (A).
|
||||
- question: "Predict the number of lines in the EPR spectrum of a solution of 13C-labelled\
|
||||
\ methyl radical (13CH3\u2022), assuming the lines do not overlap.\n(A) 4 (B)\
|
||||
\ 3 (C) 6 (D) 24 (E) 8"
|
||||
target: "Let's think step by step. The electron paramagnetic resonance spectrum\
|
||||
\ will be split by two forms of interactions. The first is the hyperfine interaction\
|
||||
\ with the 13C (nuclear spin $I = \\frac{1}{2}$) which will split the spectrum\
|
||||
\ into 2 lines. This will be further split into 4 lines by the interaction with\
|
||||
\ three equivalent 1H nuclei. The total number of lines is therefore $2 \\cdot\
|
||||
\ 4 = 8$. The answer is (E).\n\n"
|
||||
tag: mmlu_flan_cot_fewshot_stem
|
||||
include: _mmlu_flan_cot_fewshot_template_yaml
|
||||
task: mmlu_flan_cot_fewshot_college_chemistry
|
||||
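The line count in the EPR sample follows the usual first-order rule that a set of n equivalent nuclei of spin I splits each line into 2nI + 1 lines, and that independent sets multiply. A small sketch of that rule is given here; the function name `epr_line_count` and its input format are assumptions made for illustration, and the result only holds when the lines do not overlap.

```python
from fractions import Fraction


def epr_line_count(nuclei):
    """Multiply the 2*n*I + 1 splittings over sets of equivalent nuclei.

    nuclei is an iterable of (count, spin) pairs, one pair per set of
    equivalent nuclei; spin may be a Fraction such as Fraction(1, 2).
    """
    total = 1
    for count, spin in nuclei:
        total *= int(2 * count * Fraction(spin) + 1)
    return total


# 13CH3 radical: one 13C (I = 1/2) and three equivalent 1H (I = 1/2).
print(epr_line_count([(1, Fraction(1, 2)), (3, Fraction(1, 2))]))  # 8
```

The 13C nucleus contributes a factor of 2 and the three protons a factor of 4, giving the 8 lines stated in answer (E).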
@@ -0,0 +1,180 @@
|
||||
dataset_name: college_computer_science
|
||||
description: The following are multiple choice questions (with answers) about college
|
||||
computer science.
|
||||
fewshot_config:
|
||||
sampler: first_n
|
||||
samples:
|
||||
- question: 'Which of the following regular expressions is equivalent to (describes
|
||||
the same set of strings as) (a* + b)*(c + d)?
|
||||
|
||||
(A) a*(c + d)+ b(c + d)
|
||||
|
||||
(B) a*(c + d)* + b(c + d)*
|
||||
|
||||
(C) a*(c + d)+ b*(c + d)
|
||||
|
||||
(D) (a + b)*c +(a + b)*d'
|
||||
target: 'Let''s think step by step. We know that:
|
||||
|
||||
1. (X* + Y)* = (X + Y)*
|
||||
|
||||
2. X(Y + Z) = XY + XZ
|
||||
|
||||
Using equation 1 we can rewrite (a* + b)*(c + d) as:
|
||||
|
||||
3. (a + b)*(c + d)
|
||||
|
||||
Using equation 2 we can rewrite equation 3 as:
|
||||
|
||||
(a + b)*c + (a + b)*d The answer is (D).'
|
||||
- question: 'The Singleton design pattern is used to guarantee that only a single instance
|
||||
of a class may be instantiated. Which of the following is (are) true of this
|
||||
design pattern?
|
||||
|
||||
I. The Singleton class has a static factory method to provide its instance.
|
||||
|
||||
II. The Singleton class can be a subclass of another class.
|
||||
|
||||
III. The Singleton class has a private constructor.
|
||||
|
||||
(A) I only
|
||||
|
||||
(B) II only
|
||||
|
||||
(C) III only
|
||||
|
||||
(D) I, II, and III'
|
||||
target: 'Let''s think step by step. Statement I is a correct statement about a
|
||||
Singleton, because a Singleton restricts instantiation to a single, static method.
|
||||
Statement II is also correct, because there is no inherent restriction regarding
|
||||
the inheritance of a Singleton. Statement III is also correct, because a Singleton
|
||||
must be instantiated only once, so its constructor is made private to prevent
|
||||
any construction except via its static factory method.
|
||||
|
||||
Given these facts, statements I, II, and III are all correct. The answer is
|
||||
(D).'
|
||||
- question: 'A certain pipelined RISC machine has 8 general-purpose registers R0, R1,
|
||||
. . . , R7 and supports the following operations:
|
||||
|
||||
ADD Rs1, Rs2, Rd (Add Rs1 to Rs2 and put the sum in Rd)
|
||||
|
||||
MUL Rs1, Rs2, Rd (Multiply Rs1 by Rs2 and put the product in Rd)
|
||||
|
||||
An operation normally takes one cycle; however, an operation takes two cycles
|
||||
if it produces a result required by the immediately following operation in an
|
||||
operation sequence.
|
||||
|
||||
Consider the expression AB + ABC + BC, where variables A, B, C are located in
|
||||
registers R0, R1, R2. If the contents of these three registers must not be modified,
|
||||
what is the minimum number of clock cycles required for an operation sequence
|
||||
that computes the value of AB + ABC + BC?
|
||||
|
||||
(A) 5 (B) 6 (C) 7 (D) 8'
|
||||
target: 'Let''s think step by step. First, we are given that A is in R0, B is
|
||||
in R1, and C is in R2.
|
||||
|
||||
Next, we can see that we must compute three multiplies (AB, BC, and ABC) and
|
||||
two adds (AB + ABC, (AB + ABC) + BC) to compute our final answer, resulting
|
||||
in a minimum of five clock cycles.
|
||||
|
||||
Next, we can see that there is no way to avoid at least one pipeline stall when
|
||||
computing our final answer, because to compute our final sum we must wait at
|
||||
least one cycle for the results from the previous stage to be ready. Thus, our
|
||||
minimum number of cycles must be 6.
|
||||
|
||||
We can verify that we can create a solution that requires only six cycles as
|
||||
follows:
|
||||
|
||||
compute AB: MUL R0, R1, R3
|
||||
|
||||
compute BC: MUL R1, R2, R4
|
||||
|
||||
compute ABC: MUL R3, R2, R5
|
||||
|
||||
compute AB + BC: ADD R3, R4, R6
|
||||
|
||||
STALL
|
||||
|
||||
compute AB + ABC + BC: ADD R5, R6, R7
|
||||
|
||||
So there are 6 cycles. The answer is (B).'
|
||||
- question: 'A compiler generates code for the following assignment statement.
|
||||
|
||||
G := (A + B) * C - (D + E) * F
|
||||
|
||||
The target machine has a single accumulator and a single-address instruction
|
||||
set consisting of instructions load, store, add, subtract, and multiply. For
|
||||
the arithmetic operations, the left operand is taken from the accumulator and
|
||||
the result appears in the accumulator. The smallest possible number of instructions
|
||||
in the resulting code is
|
||||
|
||||
(A) 5 (B) 6 (C) 7 (D) 9'
|
||||
target: 'Let''s think step by step. We can compute the final answer with the following
|
||||
sequence of operations:
|
||||
|
||||
1. LOAD D (accumulator = D)
|
||||
|
||||
2. ADD E (accumulator = D+E)
|
||||
|
||||
3. MUL F (accumulator = (D+E)*F)
|
||||
|
||||
4. STORE X (X = (D+E)*F)
|
||||
|
||||
5. LOAD A (accumulator = A)
|
||||
|
||||
6. ADD B (accumulator = A+B)
|
||||
|
||||
7. MUL C (accumulator = (A+B)*C)
|
||||
|
||||
8. SUB X (accumulator = (A+B)*C - (D+E)*F)
|
||||
|
||||
9. STORE G (G = (A+B)*C - (D+E)*F)
|
||||
|
||||
This sequence takes 9 instructions. The answer is (D).'
|
||||
- question: 'Consider a computer design in which multiple processors, each with a private
|
||||
cache memory, share global memory using a single bus. This bus is the critical
|
||||
system resource. Each processor can execute one instruction every 500 nanoseconds
|
||||
as long as memory references are satisfied by its local cache. When a cache
|
||||
miss occurs, the processor is delayed for an additional 2,000 nanoseconds. During
|
||||
half of this additional delay, the bus is dedicated to serving the cache miss.
|
||||
During the other half, the processor cannot continue, but the bus is free to
|
||||
service requests from other processors. On average, each instruction requires
|
||||
2 memory references. On average, cache misses occur on 1 percent of references.
|
||||
What proportion of the capacity of the bus would a single processor consume,
|
||||
ignoring delays due to competition from other processors?
|
||||
|
||||
(A) 1/50 (B) 1/27 (C) 1/25 (D) 2/27'
|
||||
target: 'Let''s think step by step. We know that each instruction requires two
|
||||
memory references, and that there is an average cache miss rate
|
||||
of one percent.
|
||||
|
||||
Thus a given processor has:
|
||||
|
||||
(1 cache miss / 100 references) * (2 references / instruction) =
|
||||
|
||||
(2 cache misses / 100 instructions), so:
|
||||
|
||||
misses_per_instruction = 1 cache miss / 50 instructions.
|
||||
|
||||
Next, we know that each instruction requires 500 nanoseconds when there is no
|
||||
cache miss, and 500 + 2000 = 2500 nanoseconds when there is a cache miss. Thus:
|
||||
|
||||
50 instructions / ((49 * 500) + (1 * 2500)) nanoseconds, so:
|
||||
|
||||
instructions_per_ns = 50 instructions / 27000 nanoseconds.
|
||||
|
||||
Now, we know that each cache miss locks the bus for half of the 2000 nanosecond
|
||||
cache miss delay, or 1000 nanoseconds, so:
|
||||
|
||||
lock_ns_per_miss = 1000 nanoseconds / cache miss.
|
||||
|
||||
Thus we can see that on average a single processor will lock the bus for:
|
||||
|
||||
lock_ns_per_miss * misses_per_instruction * instructions_per_ns =
|
||||
|
||||
(1000 nanoseconds / cache miss) * (1 cache miss / 50 instructions) * (50 instructions
|
||||
/ 27000 nanoseconds) = 1000 * (1/50) * (50/27000) = 1000/27000 = 1/27. The answer
|
||||
is (B).'
|
||||
tag: mmlu_flan_cot_fewshot_stem
|
||||
include: _mmlu_flan_cot_fewshot_template_yaml
|
||||
task: mmlu_flan_cot_fewshot_college_computer_science
|
||||
@@ -0,0 +1,73 @@
|
||||
dataset_name: college_mathematics
|
||||
description: The following are multiple choice questions (with answers) about college
|
||||
mathematics.
|
||||
fewshot_config:
|
||||
sampler: first_n
|
||||
samples:
|
||||
- question: 'Let V be the set of all real polynomials p(x). Let transformations T,
|
||||
S be defined on V by T:p(x) -> xp(x) and S:p(x) -> p''(x) = d/dx p(x), and interpret
|
||||
(ST)(p(x)) as S(T(p(x))). Which of the following is true?
|
||||
|
||||
(A) ST = 0 (B) ST = T (C) ST = TS (D) ST - TS is the identity map of V onto
|
||||
itself.'
|
||||
target: "Let's think step by step. For a given polynomial $p$ we have\n\\[ST(p)\
|
||||
\ = (xp(x))\u2019 = p(x) + xp\u2019(x)\\]\nand\n\\[TS(p) = xp\u2019(x).\\]\n\
|
||||
Hence \\[ST(p) - TS(p) = p(x) + xp\u2019(x) - xp\u2019(x) = p(x).\\] The answer is\
|
||||
\ (D)."
|
||||
- question: 'Suppose that f(1 + x) = f(x) for all real x. If f is a polynomial and
|
||||
f(5) = 11, then f(15/2)
|
||||
|
||||
(A) -11 (B) 0 (C) 11 (D) 33/2'
|
||||
target: Let's think step by step. The only polynomial such that $f(1 + x) = f(x)$ for all real $x$
|
||||
is a constant polynomial. Hence $f(5) = 11 = f(15/2)$. The answer is (C).
|
||||
- question: 'Let A be a real 2x2 matrix. Which of the following statements must be
|
||||
true?
|
||||
|
||||
I. All of the entries of A^2 are nonnegative.
|
||||
|
||||
II. The determinant of A^2 is nonnegative.
|
||||
|
||||
III. If A has two distinct eigenvalues, then A^2 has two distinct eigenvalues.
|
||||
|
||||
(A) I only (B) II only (C) III only (D) II and III only'
|
||||
target: 'Let''s think step by step. We have \[ det(A^2) = (det(A))^2 \geq 0,\]
|
||||
hence II holds.
|
||||
|
||||
III is false: as a counterexample take a diagonal matrix with -1 and 1 on the
|
||||
diagonal. Then $A^2$ is the identity matrix. The answer is (B).'
|
||||
- question: 'Let A be the set of all ordered pairs of integers (m, n) such that 7m
|
||||
+ 12n = 22. What is the greatest negative number in the set B = {m + n : (m,
|
||||
n) \in A}?
|
||||
|
||||
(A) -5 (B) -4 (C) -3 (D) -2'
|
||||
target: Let's think step by step. We have 12n = 22 - 7m and one of the solutions
|
||||
is $m = -2$, $n = 3$. Then $m + n = 1$, hence we need to look for smaller $m$
|
||||
in order to make $m + n$ negative. The next solution is $m = -14$ and $n = 10$, giving $m + n = -4$.
|
||||
For smaller $m$ we have $m + n$ smaller than $-4$. The answer is (B).
|
||||
- question: 'A tank initially contains a salt solution of 3 grams of salt dissolved
|
||||
in 100 liters of water. A salt solution containing 0.02 grams of salt per liter
|
||||
of water is sprayed into the tank at a rate of 4 liters per minute. The sprayed
|
||||
solution is continually mixed with the salt solution in the tank, and the mixture
|
||||
flows out of the tank at a rate of 4 liters per minute. If the mixing is instantaneous,
|
||||
how many grams of salt are in the tank after 100 minutes have elapsed?
|
||||
|
||||
(A) 2 (B) 2 - e^-2 (C) 2 + e^-2 (D) 2 + e^-4'
|
||||
target: "Let's think step by step. For all $t \\in \\mathbb{R}$, let $s(t)$ denote\
|
||||
\ the number of grams of salt in the tank at the $t$ minute mark. Then $s(0) =\
|
||||
\ 3$.\nWe use $s$ and $s(t)$ interchangeably. We also use $s^{\\prime}$ and\
|
||||
\ $s^{\\prime}(t)$ interchangeably. The solution sprayed into the tank adds\
|
||||
\ $(0.02) 4=2 / 25$ grams of salt per minute. There are always 100 liters of\
|
||||
\ liquid in the tank, containing $s$ grams of salt. So the density of salt in\
|
||||
\ the tank is $s / 100$ grams per liter. The flow of water out of the tank therefore\
|
||||
\ subtracts $4(s / 100)=s / 25$ grams of salt per minute. Then, for all $t \\\
|
||||
in \\mathbb{R}$, we have $s^{\\prime}(t)=(2 / 25)-(s / 25)=(2-s) / 25$, and\
|
||||
\ so $[s(t)=2] \\Rightarrow\\left[s^{\\prime}(t)=0\\right]$. For all $t \\in\
|
||||
\ \\mathbb{R}$,\n$$\n\\frac{d}{d t}[\\ln (s-2)]=\\frac{s^{\\prime}}{s-2}=\\frac{-1}{25}=\\frac{d}{d t}\
\\left[-\\frac{t}{25}\\right] .\n$$\nChoose $C \\in \\mathbb{R}$ such\
|
||||
\ that, for all $t \\in \\mathbb{R}, \\ln ((s(t)-2))=-[t / 25]+C$. Let $K:=e^{C}$.\
|
||||
\ Then, for all $t \\in \\mathbb{R}$, we have $(s(t))-2=K e^{-t / 25}$, and\
|
||||
\ so $s(t)=2+K e^{-t / 25}$. Then $3=s(0)=2+K e^{0}=2+K$, so $K=1$. Then $s(100)=2+K\
|
||||
\ e^{-100 / 25}=2+1 \\cdot e^{-4}=2+e^{-4}$. The answer is (D).\n\n"
|
||||
tag: mmlu_flan_cot_fewshot_stem
|
||||
include: _mmlu_flan_cot_fewshot_template_yaml
|
||||
task: mmlu_flan_cot_fewshot_college_mathematics
|
||||
@@ -0,0 +1,68 @@
|
||||
dataset_name: college_medicine
|
||||
description: The following are multiple choice questions (with answers) about college
|
||||
medicine.
|
||||
fewshot_config:
|
||||
sampler: first_n
|
||||
samples:
|
||||
- question: 'An expected side effect of creatine supplementation is:
|
||||
|
||||
(A) muscle weakness. (B) gain in body mass. (C) muscle cramps. (D) loss of electrolytes.'
|
||||
target: Let's think step by step. We refer to Wikipedia articles on medicine for
|
||||
help. Creatine is a dietary supplement, and supplementation results in body
|
||||
mass gain. The answer is (B).
|
||||
- question: 'Which of the following is not a true statement?
|
||||
|
||||
(A) Muscle glycogen is broken down enzymatically to glucose-1-phosphate (B)
|
||||
Elite endurance runners have a high proportion of Type I fibres in their leg
|
||||
muscles (C) Liver glycogen is important in the maintenance of the blood glucose
|
||||
concentration (D) Insulin promotes glucose uptake by all tissues in the body'
|
||||
target: "Let's think step by step. We refer to Wikipedia articles on medicine\
|
||||
\ for help. Let\u2019s solve this step by step and go over each choice: \n(A)\
|
||||
\ \u201CMuscle glycogen is broken down enzymatically to glucose-1-phosphate\u201D\
|
||||
: This is a correct statement.\n(B) \u201CElite endurance runners have a high\
|
||||
\ proportion of Type I fibres in their leg muscles\u201D: This is a correct\
|
||||
\ statement.\n(C) \u201CLiver glycogen is important in the maintenance of the\
|
||||
\ blood glucose concentration\u201D: This is a correct statement. \n(D) \u201C\
|
||||
Insulin promotes glucose uptake by all tissues in the body\u201D: This is not\
|
||||
\ a correct statement, because insulin promotes glucose uptake by the liver,\
|
||||
\ adipose tissue, and muscle, but not all tissues. For instance, the tissues\
|
||||
\ in the brain and red blood cells are not affected by insulin. The answer is\
|
||||
\ (D)."
|
||||
- question: "A high school science teacher fills a 1 liter bottle with pure nitrogen\
|
||||
\ and seals the lid. The pressure is 1.70 atm, and the room temperature is 25\xB0\
|
||||
C. Which two variables will both increase the pressure of the system, if all\
|
||||
\ other variables are held constant?\n(A) Increasing temperature, increasing\
|
||||
\ moles of gas (B) Increasing temperature, increasing volume (C) Decreasing\
|
||||
\ volume, decreasing temperature (D) Decreasing moles of gas, increasing volume"
|
||||
target: 'Let''s think step by step. We refer to Wikipedia articles on medicine
|
||||
for help. The relevant equation for this is the ideal gas law: PV=nRT. To increase
|
||||
the pressure of the system (P), then either n (number of moles of the gas) or
|
||||
T (temperature) has to increase. The answer is (A).'
|
||||
- question: 'In a genetic test of a newborn, a rare genetic disorder is found that
|
||||
has X-linked recessive transmission. Which of the following statements is likely
|
||||
true regarding the pedigree of this disorder?
|
||||
|
||||
(A) All descendants on the maternal side will have the disorder. (B) Females
|
||||
will be approximately twice as affected as males in this family. (C) All daughters
|
||||
of an affected male will be affected. (D) There will be equal distribution of
|
||||
males and females affected.'
|
||||
target: "Let's think step by step. We refer to Wikipedia articles on medicine\
|
||||
\ for help. Let\u2019s solve this step by step. Let's recall first that females\
|
||||
\ have two X chromosomes, while males have one X and one Y chromosome. This\
|
||||
\ is an important fact we need to know before answering this question. \nBecause\
|
||||
\ a male can pass only his single X chromosome to a daughter, if he is affected\
|
||||
\ by this rare genetic disorder, then we know for sure that he will pass this\
|
||||
\ rare genetic disorder to all his future-born daughters. Therefore, \u201C\
|
||||
(C): All daughters of an affected male will be affected\u201D is a correct statement.\
|
||||
\ The answer is (C)."
|
||||
- question: 'Glucose is transported into the muscle cell:
|
||||
|
||||
(A) via protein transporters called GLUT4. (B) only in the presence of insulin.
|
||||
(C) via hexokinase. (D) via monocarboxylic acid transporters.'
|
||||
target: 'Let''s think step by step. We refer to Wikipedia articles on medicine
|
||||
for help. Glucose (also known as blood sugar) is the main sugar found in
|
||||
the human body. It is transported into the muscle cell via diffusion through
|
||||
protein transporters called GLUT4. The answer is (A).'
|
||||
tag: mmlu_flan_cot_fewshot_other
|
||||
include: _mmlu_flan_cot_fewshot_template_yaml
|
||||
task: mmlu_flan_cot_fewshot_college_medicine
|
||||
@@ -0,0 +1,61 @@
|
||||
dataset_name: college_physics
|
||||
description: The following are multiple choice questions (with answers) about college
|
||||
physics.
|
||||
fewshot_config:
|
||||
sampler: first_n
|
||||
samples:
|
||||
- question: 'A refracting telescope consists of two converging lenses separated by
|
||||
100 cm. The eye-piece lens has a focal length of 20 cm. The angular magnification
|
||||
of the telescope is
|
||||
|
||||
(A) 4 (B) 5 (C) 6 (D) 20'
|
||||
target: Let's think step by step. In a refracting telescope, if both lenses are
|
||||
converging, the focus of both lenses must be between the two lenses, and thus
|
||||
the focal lengths of the two lenses must add up to their separation. Since the
|
||||
focal length of one lens is 20 cm, the focal length of the other must be 80
|
||||
cm. The magnification is the ratio of these two focal lengths, or 4. The answer
|
||||
is (A).
|
||||
- question: 'The muon decays with a characteristic lifetime of about 10^-6 second into
|
||||
an electron, a muon neutrino, and an electron antineutrino. The muon is forbidden
|
||||
from decaying into an electron and just a single neutrino by the law of conservation
|
||||
of
|
||||
|
||||
(A) charge (B) mass (C) energy and momentum (D) lepton number'
|
||||
target: Let's think step by step. Lepton number must be conserved, meaning the
|
||||
total number of leptons minus the number of antileptons. If a muon decays into
|
||||
an electron and a single neutrino, the total lepton number would go from one
|
||||
to two, violating lepton number conservation. The answer is (D).
|
||||
- question: 'One end of a Nichrome wire of length 2L and cross-sectional area A is
|
||||
attached to an end of another Nichrome wire of length L and cross-sectional
|
||||
area 2A. If the free end of the longer wire is at an electric potential of 8.0
|
||||
volts, and the free end of the shorter wire is at an electric potential of 1.0
|
||||
volt, the potential at the junction of the two wires is most nearly equal to
|
||||
|
||||
(A) 2.4 V (B) 3.3 V (C) 4.5 V (D) 5.7 V'
|
||||
target: Let's think step by step. This is a simple voltage divider problem, where
|
||||
the longer wire has a resistance four times that of the shorter wire. So the
|
||||
voltage divider ratio is 1 / 5, meaning that the potential in the middle is
|
||||
1.0 V + (8.0 V - 1.0 V) * 1/5 = 2.4 V. The answer is (A).
|
||||
- question: 'A refracting telescope consists of two converging lenses separated by
|
||||
100 cm. The eye-piece lens has a focal length of 20 cm. The angular magnification
|
||||
of the telescope is
|
||||
|
||||
(A) 4 (B) 5 (C) 6 (D) 20'
|
||||
target: Let's think step by step. In a refracting telescope, if both lenses are
|
||||
converging, the focus of both lenses must be between the two lenses, and thus
|
||||
the focal lengths of the two lenses must add up to their separation. Since the
|
||||
focal length of one lens is 20 cm, the focal length of the other must be 80
|
||||
cm. The magnification is the ratio of these two focal lengths, or 4. The answer
|
||||
is (A).
|
||||
- question: 'For which of the following thermodynamic processes is the increase in
|
||||
the internal energy of an ideal gas equal to the heat added to the gas?
|
||||
|
||||
(A) Constant temperature (B) Constant volume (C) Constant pressure (D) Adiabatic'
|
||||
target: 'Let''s think step by step. Heat added to the gas can go into the gas''s
|
||||
internal energy or work done against an external force. However, if the volume
|
||||
of the gas container is constant, no work will be done (since work is pressure
|
||||
times change in volume). So, at constant volume, all of the heat goes into the
|
||||
internal energy. The answer is (B).'
|
||||
tag: mmlu_flan_cot_fewshot_stem
|
||||
include: _mmlu_flan_cot_fewshot_template_yaml
|
||||
task: mmlu_flan_cot_fewshot_college_physics
|
||||
@@ -0,0 +1,50 @@
|
||||
dataset_name: computer_security
|
||||
description: The following are multiple choice questions (with answers) about computer
|
||||
security.
|
||||
fewshot_config:
|
||||
sampler: first_n
|
||||
samples:
|
||||
- question: 'SHA-1 has a message digest of
|
||||
|
||||
(A) 160 bits (B) 512 bits (C) 628 bits (D) 820 bits'
|
||||
target: Let's think step by step. Since SHA-1 is a hash function which takes an
|
||||
input and produces a 160-bit (20-byte) hash value, its message digest is 160
|
||||
bits. The answer is (A).
|
||||
- question: "_____________ can modify data on your system \u2013 so that your system\
|
||||
\ doesn\u2019t run correctly or you can no longer access specific data, or it\
|
||||
\ may even ask for ransom in order to give you access.\n(A) IM \u2013 Trojans\
|
||||
\ (B) Backdoor Trojans (C) Trojan-Downloader (D) Ransom Trojan"
|
||||
target: Let's think step by step. The question describes a trojan that modifies
data and may demand a ransom, which is a ransom trojan. The answer is (D).
|
||||
- question: 'What is ethical hacking?
|
||||
|
||||
(A) "Hacking" ethics so they justify unintended selfish behavior (B) Hacking
|
||||
systems (e.g., during penetration testing) to expose vulnerabilities so they
|
||||
can be fixed, rather than exploited (C) Hacking into systems run by those whose
|
||||
ethics you disagree with (D) A slang term for rapid software development, e.g.,
|
||||
as part of hackathons'
|
||||
target: Let's think step by step. Ethical hacking is a process of detecting vulnerabilities
|
||||
in an application, system, or organization's infrastructure that an attacker
|
||||
can use to exploit an individual or organization. Ethical hackers use this process to prevent
|
||||
cyberattacks and security breaches by lawfully hacking into the systems and
|
||||
looking for weak points. The answer is (B).
|
||||
- question: 'The ____________ is anything which your search engine cannot search.
|
||||
|
||||
(A) Haunted web (B) World Wide Web (C) Surface web (D) Deep Web'
|
||||
target: "Let's think step by step. The search engine searches on the Surface Web,\
|
||||
\ which is the portion of the World Wide Web that is visible, so (B) and (C) are wrong.\
|
||||
\ The Haunted Web doesn\u2019t correspond to an internet concept. The Deep Web\
|
||||
\ is the part of the World Wide Web which is not indexed. The answer is (D)."
|
||||
- question: 'Exploitation of the Heartbleed bug permits
|
||||
|
||||
(A) overwriting cryptographic keys in memory (B) a kind of code injection (C)
|
||||
a read outside bounds of a buffer (D) a format string attack'
|
||||
target: 'Let''s think step by step. The Heartbleed Bug is a serious vulnerability
|
||||
in the popular OpenSSL cryptographic software library. Heartbleed resulted from
|
||||
improper input validation (due to a missing bounds check) in the implementation
|
||||
of the TLS heartbeat extension. The vulnerability was classified as a buffer
|
||||
over-read, a situation where more data can be read than should be allowed. The
|
||||
answer is (C).'
|
||||
tag: mmlu_flan_cot_fewshot_stem
|
||||
include: _mmlu_flan_cot_fewshot_template_yaml
|
||||
task: mmlu_flan_cot_fewshot_computer_security
|
||||
@@ -0,0 +1,49 @@
|
||||
dataset_name: conceptual_physics
|
||||
description: '
|
||||
|
||||
The following are multiple choice questions (with answers) about conceptual physics.'
|
||||
fewshot_config:
|
||||
sampler: first_n
|
||||
samples:
|
||||
- question: 'Colors in a soap bubble result from light
|
||||
|
||||
(A) converted to a different frequency (B) deflection (C) interference (D) polarization'
|
||||
target: Let's think step by step. In a soap bubble film, the light bounces between
|
||||
the two soap-air interfaces many times, interfering with itself constructively
|
||||
or destructively depending on the width of the film. This results in different
|
||||
colors being visible. The answer is (C).
|
||||
- question: 'Compared with the mass of a uranium atom undergoing fission, the combined
|
||||
masses of the products after fission are
|
||||
|
||||
(A) less (B) more (C) the same (D) zero'
|
||||
target: Let's think step by step. Fission releases energy, which comes from the
|
||||
rest mass of its initial nucleus. Thus the mass of the products is less than
|
||||
the mass of the reactant uranium nucleus. The answer is (A).
|
||||
- question: 'Things that are equivalent according to the equivalence principle are
|
||||
|
||||
(A) space and time. (B) a traveling twin and a stay-at-home twin. (C) gravity
|
||||
and acceleration. (D) mass and energy.'
|
||||
target: "Let's think step by step. Einstein\u2019s famous equivalence principle\
|
||||
\ states that gravity and acceleration are equivalent. The answer is (C)."
|
||||
- question: 'Which of these three elements has the most mass per nucleon?
|
||||
|
||||
(A) Hydrogen (B) Iron (C) Uranium (D) Same in each'
|
||||
target: Let's think step by step. Due to nuclear binding energy, the mass of an
|
||||
atomic nucleus is less than the sum of individual masses of the free constituent
|
||||
protons and neutrons; this is known as the mass defect. Hydrogen has no mass
|
||||
defect because it has only a single nucleon, so it will have the most mass per
|
||||
nucleon. The answer is (A).
|
||||
- question: 'A model airplane flies slower when flying into the wind and faster with
|
||||
wind at its back. When launched at right angles to the wind, a crosswind, its
|
||||
groundspeed compared with flying in still air is
|
||||
|
||||
(A) the same (B) greater (C) less (D) either greater or less depending on wind
|
||||
speed'
|
||||
target: "Let's think step by step. The plane\u2019s speed in the direction of\
|
||||
\ the wind is greater than it would be in the absence of wind, and its speed in\
\ the direction orthogonal to the wind is the same as it would be in the absence of the wind.\
|
||||
\ The total speed, which is these two components added in quadrature, is thus\
|
||||
\ greater than the speed in still air. The answer is (B).\n\n"
|
||||
tag: mmlu_flan_cot_fewshot_stem
|
||||
include: _mmlu_flan_cot_fewshot_template_yaml
|
||||
task: mmlu_flan_cot_fewshot_conceptual_physics
|
||||
87
evaluation/data/mmlu/flan_cot_fewshot/mmlu_econometrics.yaml
Normal file
@@ -0,0 +1,87 @@
|
||||
dataset_name: econometrics
|
||||
description: The following are multiple choice questions (with answers) about econometrics.
|
||||
fewshot_config:
|
||||
sampler: first_n
|
||||
samples:
|
||||
- question: 'Suppose now that a researcher wishes to use information criteria to determine
|
||||
the optimal lag length for a VAR. 500 observations are available for the bi-variate
|
||||
VAR, and the values of the determinant of the variance-covariance matrix of
|
||||
residuals are 0.0336, 0.0169, 0.0084, and 0.0062 for 1, 2, 3, and 4 lags respectively.
|
||||
What is the optimal model order according to Akaike''s information criterion?
|
||||
|
||||
(A) 1 lag (B) 2 lags (C) 3 lags (D) 4 lags'
|
||||
target: "Let's think step by step. We refer to Wikipedia articles on econometrics\
|
||||
\ for help. Let\u2019s solve this problem step by step. First of all, let\u2019\
|
||||
s recall that for a given set of data, Akaike's information criterion (AIC)\
|
||||
\ allows us to measure how well a statistical model fits the data; it is an\
|
||||
\ estimator of prediction error. Here in this problem we will need to use the\
|
||||
\ formula ln(det(sigma_hat)) + (2 * k / T) to determine the values of Akaike\u2019\
|
||||
s criterion, where ln denotes the natural log function, det the determinant\
|
||||
\ function, k the total number of parameters (across both equations),\
|
||||
\ and T the number of observations (which, in this case, is equal to 500). For\
|
||||
\ 1 lag, the number of parameters in total is equal to 6; for 2 lags, it is\
|
||||
\ 10; for 3 lags, it is 14; and for 4 lags, it is 18. Now, let\u2019s calculate\
|
||||
\ the values of the criterion for each lag:\n(A) 1 lag: ln(0.0336) + (2 * 6\
|
||||
\ / 500) = ln(0.0336) + (12 / 500) = -3.369\n(B) 2 lags: ln(0.0169) + (2 * 10\
|
||||
\ / 500) = ln(0.0169) + (20 / 500) = -4.040\n(C) 3 lags: ln(0.0084) + (2 * 14\
|
||||
\ / 500) = ln(0.0084) + (28 / 500) = -4.724\n(D) 4 lags: ln(0.0062) + (2 * 18\
|
||||
\ / 500) = ln(0.0062) + (36 / 500) = -5.011\nBecause the optimal model order\
|
||||
\ according to AIC minimizes the information criterion, the answer should be\
|
||||
\ the one with the lowest value. In this case, (D) has the lowest value. The\
|
||||
\ answer is (D)."
|
||||
- question: 'Consider the following AR(1) model with the disturbances having zero mean
|
||||
and unit variance
|
||||
|
||||
yt = 0.2 + 0.4 yt-1 + ut
|
||||
|
||||
The (unconditional) mean of y will be given by
|
||||
|
||||
(A) 0.2 (B) 0.4 (C) 0.5 (D) 0.33'
|
||||
target: "Let's think step by step. We refer to Wikipedia articles on econometrics\
|
||||
\ for help. Let\u2019s solve this problem step by step. If we have an AR(1)\
|
||||
\ model with the disturbances having zero mean and unit variance, then the unconditional\
|
||||
\ mean of y is equal to the following:\nunconditional mean of y = (the intercept\
|
||||
\ term) / (1 - autoregressive coefficient)\nWe know that the intercept term\
|
||||
\ is 0.2 and the autoregressive coefficient is 0.4; thus, we have:\nunconditional\
|
||||
\ mean of y = (0.2) / (1 - 0.4) = (0.2) / (0.6) = 2 / 6 = 1 / 3, which is approximately\
|
||||
\ 0.33. That means that the answer should be (D) 0.33. The answer is (D)."
|
||||
- question: 'What would then be the consequences for the OLS estimator if heteroscedasticity
|
||||
is present in a regression model but ignored?
|
||||
|
||||
(A) It will be biased (B) It will be inconsistent (C) It will be inefficient
|
||||
(D) All of (A), (B) and (C) will be true.'
|
||||
target: Let's think step by step. We refer to Wikipedia articles on econometrics
|
||||
for help. Heteroscedasticity refers to the condition where the variance of the
|
||||
error terms is not constant across multiple observations. If heteroscedasticity
|
||||
is present in a regression model, then the coefficient estimates in the OLS
|
||||
estimator will remain unbiased and consistent but will be inefficient. Because
|
||||
(A) and (B) are incorrect choices and (C) is a correct choice, (D) cannot be
|
||||
the right answer. Ultimately, (C) is the only true choice. The answer is (C).
|
||||
- question: 'Suppose that a test statistic has associated with it a p-value of 0.08.
|
||||
Which one of the following statements is true?
|
||||
|
||||
(i) If the size of the test were exactly 8%, we would be indifferent between
|
||||
rejecting and not rejecting the null hypothesis
|
||||
|
||||
(ii) The null would be rejected if a 10% size of test were used
|
||||
|
||||
(iii) The null would not be rejected if a 1% size of test were used
|
||||
|
||||
(iv) The null would be rejected if a 5% size of test were used.
|
||||
|
||||
(A) (ii) and (iv) only (B) (i) and (iii) only (C) (i), (ii), and (iii) only
|
||||
(D) (i), (ii), (iii), and (iv).'
|
||||
target: "Let's think step by step. We refer to Wikipedia articles on econometrics\
|
||||
\ for help. Let\u2019s reason about each of the options.\n(i) is a true statement.\n\
|
||||
(ii) is a true statement.\n(iii) is a true statement.\n(iv) is not a true statement.\
|
||||
\ Thus, (i), (ii), and (iii) are true. The answer is (C)."
|
||||
- question: 'For a stationary autoregressive process, shocks will
|
||||
|
||||
(A) Eventually die away (B) Persist indefinitely (C) Grow exponentially (D)
|
||||
Never occur'
|
||||
target: 'Let''s think step by step. We refer to Wikipedia articles on econometrics
|
||||
for help. This is a question about stationary processes. For a stationary
|
||||
autoregressive process, shocks will eventually die away. The answer is (A).'
|
||||
tag: mmlu_flan_cot_fewshot_social_sciences
|
||||
include: _mmlu_flan_cot_fewshot_template_yaml
|
||||
task: mmlu_flan_cot_fewshot_econometrics
|
||||
@@ -0,0 +1,47 @@
|
||||
dataset_name: electrical_engineering
|
||||
description: '
|
||||
|
||||
The following are multiple choice questions (with answers) about electrical engineering.'
|
||||
fewshot_config:
|
||||
sampler: first_n
|
||||
samples:
|
||||
- question: "A point pole has a strength of 4\u03C0 * 10^-4 weber. The force in newtons\
|
||||
\ on a point pole of 4\u03C0 * 1.5 * 10^-4 weber placed at a distance of 10\
|
||||
\ cm from it will be\n(A) 15 N. (B) 20 N. (C) 7.5 N. (D) 3.75 N."
|
||||
target: "Let's think step by step. The force between two point poles is given\
|
||||
\ by m_1m_2/(mu_0 4 \\pi r^2), in analogy to Coulomb\u2019s law. Plugging in\
|
||||
\ the values given in the question, we calculate that the force is approximately\
|
||||
\ 15 N. The answer is (A)."
|
||||
- question: 'The coil of a moving coil meter has 100 turns, is 40 mm long and 30 mm
|
||||
wide. The control torque is 240*10^-6 N-m on full scale. If magnetic flux density
is 1 Wb/m^2, the range of the meter is
|
||||
|
||||
(A) 1 mA. (B) 2 mA. (C) 3 mA. (D) 4 mA.'
|
||||
target: Let's think step by step. The torque on a coil in a uniform magnetic field
|
||||
is given by BANI, where B is the magnetic flux density, A is the area of the
|
||||
coil, N is the number of turns, and I is the current. So we have that I = (Torque)/(BAN),
|
||||
or 240e-6/(1200e-6 * 100 * 1) = 2e-3. The answer is (B).
|
||||
- question: 'In an SR latch built from NOR gates, which condition is not allowed
|
||||
|
||||
(A) S=0, R=0 (B) S=0, R=1 (C) S=1, R=0 (D) S=1, R=1'
|
||||
target: Let's think step by step. An SR latch is a set-reset latch; in the case
|
||||
where S=1 and R=1, the circuit has no stable state; instead a race condition
|
||||
will be produced within the circuit, so the device will be in an undefined state.
|
||||
So S=1, R=1 is an illegal input. The answer is (D).
|
||||
- question: 'Two long parallel conductors carry 100 A. If the conductors are separated
|
||||
by 20 mm, the force per meter of length of each conductor will be
|
||||
|
||||
(A) 100 N. (B) 0.1 N. (C) 1 N. (D) 0.01 N.'
|
||||
target: Let's think step by step. The magnetic force-per-length between two current-carrying
|
||||
conductors is given by $\mu_0 I_1 I_2 / (2 \pi r)$, where $r$ is the separation
distance and $I_1$ and $I_2$ are the currents. Plugging in 100 A for $I_1$ and $I_2$,
and 20 mm for $r$, gives 0.1 N. The answer is (B).
|
||||
- question: "In a 2 pole lap winding dc machine, the resistance of one conductor is\
|
||||
\ 2\u03A9 and total number of conductors is 100. Find the total resistance\n\
|
||||
(A) 200\u03A9 (B) 100\u03A9 (C) 50\u03A9 (D) 10\u03A9"
|
||||
target: 'Let''s think step by step. In lap winding, effectively two resistors
|
||||
are connected in parallel, so the actual resistance of each pair is 1 Ohm. Since
|
||||
we have 50 pairs, we get a total resistance of 50 Ohms. The answer is (C).'
|
||||
tag: mmlu_flan_cot_fewshot_stem
|
||||
include: _mmlu_flan_cot_fewshot_template_yaml
|
||||
task: mmlu_flan_cot_fewshot_electrical_engineering
|
||||
@@ -0,0 +1,77 @@
|
||||
dataset_name: elementary_mathematics
|
||||
description: The following are multiple choice questions (with answers) about elementary
|
||||
mathematics.
|
||||
fewshot_config:
|
||||
sampler: first_n
|
||||
samples:
|
||||
- question: 'Olivia used the rule "Add 11" to create the number pattern shown below.
|
||||
10, 21, 32, 43, 54. Which statement about the number pattern is true?
|
||||
|
||||
(A) The 10th number in the pattern will be an even number.
|
||||
|
||||
(B) The number pattern will never have two even numbers next to each other.
|
||||
|
||||
(C) The next two numbers in the pattern will be an even number then an odd number.
|
||||
|
||||
(D) If the number pattern started with an odd number then the pattern would
|
||||
have only odd numbers in it.'
|
||||
target: Let's think step by step. Choice A is incorrect because every even-numbered
|
||||
term in the pattern is odd, and 10 is an even number. Choice B is correct, because
|
||||
adding an odd number (in this case 11) to an odd number produces an even number,
|
||||
and adding an odd number to an even number produces an odd number. Thus the
|
||||
terms in the pattern will alternate between odd and even, so there will never
|
||||
be two even numbers next to each other. Choice C is incorrect because the last
|
||||
term in the example is even (54), and we know that the terms will alternate
|
||||
between even and odd. Choice D is incorrect because the terms in the pattern
|
||||
will alternate between odd and even, regardless of the value of the first term.
|
||||
The answer is (B).
|
||||
- question: 'The population of the city where Michelle was born is 145,826. What is
|
||||
the value of the 5 in the number 145,826?
|
||||
|
||||
(A) 5 thousands
|
||||
|
||||
(B) 5 hundreds
|
||||
|
||||
(C) 5 tens
|
||||
|
||||
(D) 5 ones'
|
||||
target: 'Let''s think step by step. Choice A is correct, because there are three
|
||||
digits following the 5, so
|
||||
|
||||
the 5 is in the thousands place. Thus the other choices are incorrect. The answer
|
||||
is (A).'
|
||||
- question: "A store sells 107 different colors of paint. They have 25 cans of each\
|
||||
\ color in storage. The number of cans of paint the store has in storage can\
|
||||
\ be found using the expression below. 107 \xD7 25. How many cans of paint does\
|
||||
\ the store have in storage?\n(A) 749\n(B) 2,675\n(C) 2,945\n(D) 4,250"
|
||||
target: Let's think step by step. We can calculate 107 x 25 = (100 x 25) + (7
|
||||
x 25) = 2500 + 175 = 2675. The answer is (B).
|
||||
- question: 'A total of 30 players will play basketball at a park. There will be exactly
|
||||
5 players on each team. Which statement correctly explains how to find the number
|
||||
of teams needed?
|
||||
|
||||
(A) Add 5 to 30 to find 35 teams.
|
||||
|
||||
(B) Divide 30 by 5 to find 6 teams.
|
||||
|
||||
(C) Multiply 30 and 5 to find 150 teams.
|
||||
|
||||
(D) Subtract 5 from 30 to find 25 teams.'
|
||||
target: Let's think step by step. We want to find the number of teams. We know
|
||||
that there are 5 players/team, and 30 players. Thus to get the number of teams
|
||||
we divide players by players/team, so 30 players / 5 players/team = 6 teams.
|
||||
The answer is (B).
|
||||
- question: 'Which expression is equivalent to 5 x 9?
|
||||
|
||||
(A) (5 x 4) x (6 x 5)
|
||||
|
||||
(B) (5 x 5) + (5 x 4)
|
||||
|
||||
(C) (5 x 5) + (5 x 9)
|
||||
|
||||
(D) (5 x 9) x (6 x 9)'
|
||||
target: 'Let''s think step by step. We know that 9 = (5 + 4), so 5 x 9 = 5 x (5
|
||||
+ 4) = (5 x 5) + (5 x 4). The answer is (B).'
|
||||
tag: mmlu_flan_cot_fewshot_stem
|
||||
include: _mmlu_flan_cot_fewshot_template_yaml
|
||||
task: mmlu_flan_cot_fewshot_elementary_mathematics
|
||||
70
evaluation/data/mmlu/flan_cot_fewshot/mmlu_formal_logic.yaml
Normal file
@@ -0,0 +1,70 @@
|
||||
dataset_name: formal_logic
|
||||
description: The following are multiple choice questions (with answers) about formal
|
||||
logic.
|
||||
fewshot_config:
|
||||
sampler: first_n
|
||||
samples:
|
||||
- question: "Which of the given formulas of PL is the best symbolization of the following\
|
||||
\ sentence?\nTurtles live long lives and are happy creatures, unless they are\
|
||||
\ injured.\n(A) (L \u2022 H) \u2261 I (B) (L \u2022 H) \u2228 I (C) L \u2022\
|
||||
\ (H \u2228 I) (D) L \u2022 (H \u2283 R)."
|
||||
target: "Let's think step by step. We refer to Wikipedia articles on formal logic\
|
||||
\ for help. Let\u2019s solve this step by step. Let \u201CL\u201D denote \u201C\
|
||||
living long\u201D, H \u201Cbeing happy\u201D, and \u201CI\u201D \u201Cbeing\
|
||||
\ injured\u201D. Now, consider each choice:\n(A) means (living long AND being\
|
||||
\ happy) is equivalent to (being injured). \n(B) means (living long AND being\
|
||||
\ happy) OR (being injured). \n(C) means (living long) AND (being happy OR being\
|
||||
\ injured). \n(D) means (living long) AND (being happy implies being R), but\
|
||||
\ what R denotes is not clear.\nObviously, (B) is the best symbolization of\
|
||||
\ the original sentence. The answer is (B)."
|
||||
- question: 'Select the best translation into predicate logic. George borrows Hector''s
|
||||
lawnmower. (g: George; h: Hector; l: Hector''s lawnmower; Bxyz: x borrows y
|
||||
from z).
|
||||
|
||||
(A) Blgh (B) Bhlg (C) Bglh (D) Bghl'
|
||||
target: "Let's think step by step. We refer to Wikipedia articles on formal logic\
|
||||
\ for help. Let\u2019s solve this step by step. We are told that \u201CBxyz\u201D\
|
||||
\ means \u201Cx borrows y from z\u201D. We can rewrite \u201CGeorge borrows\
|
||||
\ Hector's lawnmower\u201D as \u201CGeorge borrows a lawnmower from Hector\u201D\
|
||||
, which can then be translated into predicate logic as \u201CBglh\u201D. The\
|
||||
\ answer \u201CBglh\u201D appears in (C); therefore, (C) must be the correct\
|
||||
\ answer. The answer is (C)."
|
||||
- question: "\nSelect the best English interpretation of the given arguments in predicate\
|
||||
\ logic.\nDm\n(\u2200x)(Wx \u2283 ~Dx). \n(\u2200x)Wx \u2228 Ag\t/ (\u2203x)Ax\n\
|
||||
(A) Marina is a dancer. Some weaklings are not dancers. Either everything is\
|
||||
\ a weakling or Georgia plays volleyball. So something plays volleyball. (B)\
|
||||
\ Marina is a dancer. No weakling is a dancer. Everything is either a weakling\
|
||||
\ or plays volleyball. So something plays volleyball. (C) Marina is a dancer.\
|
||||
\ Some weaklings are not dancers. Everything is either a weakling or plays volleyball.\
|
||||
\ So something plays volleyball. (D) Marina is a dancer. No weakling is a dancer.\
|
||||
\ Either everything is a weakling or Georgia plays volleyball. So something\
|
||||
\ plays volleyball."
|
||||
target: "Let's think step by step. We refer to Wikipedia articles on formal logic\
|
||||
\ for help. Let\u2019s solve this step by step. Let \u201CD\u201D denote \u201C\
|
||||
being a dancer\u201D, \u201Cm\u201D denote \u201CMarina\u201D, \u201Cg\u201D\
|
||||
\ denote \u201CGeorgia\u201D, \u201CW\u201D denote \u201Cweakling\u201D, \u201C\
|
||||
A\u201D denote \u201Cplaying volleyball\u201D. Then, we have the following:\n\
|
||||
1. Dm \u2192 Marina is a dancer.\n2. (\u2200x)(Wx \u2283 ~Dx). \u2192 For all\
|
||||
\ x, if x is a weakling, then x is not a dancer. In other words, no weakling\
|
||||
\ is a dancer.\n3. (\u2200x)Wx \u2228 Ag\t/ (\u2203x)Ax \u2192 For all x, x\
|
||||
\ is a weakling or Georgia plays volleyball. So there exists an x that plays\
|
||||
\ volleyball. \nOptions (A) and (C) do claim that some weaklings are not dancers,\
|
||||
\ but the second premise states that no weakling is a dancer. Thus,\
|
||||
\ we can eliminate them. Option (B) omits the important detail about Georgia\
|
||||
\ playing volleyball. Option (D) has all the details presented in the arguments\
|
||||
\ and is the best English interpretation of the arguments. The answer is (D)."
|
||||
- question: "Select the best translation into predicate logic: No people drive on Mars.\n\
|
||||
(A) ~Pd (B) (\u2200x)(Px \u2228 ~Dx) (C) (\u2200x)(Px \u2283 ~Dx) (D) ~Dp"
|
||||
target: "Let's think step by step. We refer to Wikipedia articles on formal logic\
|
||||
\ for help. Let\u2019s solve this step by step. Let \u201CP\u201D denote \u201C\
|
||||
being a person\u201D and \u201CD\u201D denote \u201Cdriving on Mars\u201D. Then\
\ let\u2019s consider each option:\nOption (A): ~Pd \u2192 d is not a person.\n\
Option (B): (\u2200x)(Px \u2228 ~Dx) \u2192 For all x, x is a person or x does\
\ not drive on Mars.\nOption (C): (\u2200x)(Px \u2283 ~Dx) \u2192 For all x,\
\ if x is a person, then x does not drive on Mars.\nOption (D): ~Dp \u2192\
\ p does not drive on Mars.\nOf all these options, Option (C) appears to be the\
|
||||
\ best and most meaningful interpretation of the argument \u201CNo people drive\
|
||||
\ on Mars.\u201D The answer is (C).\n\n"
|
||||
tag: mmlu_flan_cot_fewshot_humanities
|
||||
include: _mmlu_flan_cot_fewshot_template_yaml
|
||||
task: mmlu_flan_cot_fewshot_formal_logic
|
||||
49
evaluation/data/mmlu/flan_cot_fewshot/mmlu_global_facts.yaml
Normal file
@@ -0,0 +1,49 @@
|
||||
dataset_name: global_facts
|
||||
description: The following are multiple choice questions (with answers) about global
|
||||
facts.
|
||||
fewshot_config:
|
||||
sampler: first_n
|
||||
samples:
|
||||
- question: "As of 2017, how many of the world\u2019s 1-year-old children today have\
|
||||
\ been vaccinated against some disease? *\n(A) 80% (B) 60% (C) 40% (D) 20%"
|
||||
target: Let's think step by step. We refer to Wikipedia articles on global facts
|
||||
for help. According to data published by the World Health Organization, the
|
||||
percentage of 1-year-old children vaccinated in 2017 exceeds 80%. The answer is
|
||||
(A).
|
||||
- question: 'As of 2019, about what percentage of Americans agree that the state is
|
||||
run for the benefit of all the people?
|
||||
|
||||
(A) 31% (B) 46% (C) 61% (D) 76%'
|
||||
target: Let's think step by step. We refer to Wikipedia articles on global facts
|
||||
for help. In 2019, about 46% of Americans agree that the state is
|
||||
run for the benefit of all the people. The answer is (B).
|
||||
- question: 'As of 2019, about what percentage of Russians say it is very important
|
||||
to have free media in our country without government/state censorship?
|
||||
|
||||
(A) 38% (B) 53% (C) 68% (D) 83%'
|
||||
target: Let's think step by step. We refer to Wikipedia articles on global facts
|
||||
for help. As of 2019, about 38% of Russians say it is very important to have
|
||||
free media in our country. The answer is (A).
|
||||
- question: 'As of 2015, since 1990 forests have ____ in Europe and have ____ in Africa
|
||||
and the Americas.
|
||||
|
||||
(A) increased, increased (B) increased, decreased (C) decreased, increased (D)
|
||||
decreased, decreased'
|
||||
target: Let's think step by step. We refer to Wikipedia articles on global facts
|
||||
for help. As of 2015, since 1990 forests have increased in Europe and have decreased
|
||||
in Africa and the Americas. The answer is (B).
|
||||
- question: 'Which of the following pairs of statements are both true (as of 2019)?
|
||||
|
||||
(A) People tend to be optimistic about their own future and the future of their
|
||||
nation or the world. (B) People tend to be optimistic about their own future
|
||||
but pessimistic about the future of their nation or the world. (C) People tend
|
||||
to be pessimistic about their own future but optimistic about the future of
|
||||
their nation or the world. (D) People tend to be pessimistic about their own
|
||||
future and the future of their nation or the world.'
|
||||
target: 'Let''s think step by step. We refer to Wikipedia articles on global facts
|
||||
for help. As of 2019, most people tend to be optimistic about their own future
|
||||
but pessimistic about the future of their nation or the world. The answer is
|
||||
(B).'
|
||||
tag: mmlu_flan_cot_fewshot_other
|
||||
include: _mmlu_flan_cot_fewshot_template_yaml
|
||||
task: mmlu_flan_cot_fewshot_global_facts
|
||||
@@ -0,0 +1,69 @@
|
||||
dataset_name: high_school_biology
|
||||
description: The following are multiple choice questions (with answers) about high
|
||||
school biology.
|
||||
fewshot_config:
|
||||
sampler: first_n
|
||||
samples:
|
||||
- question: "In animal cells, which of the following represents the most likely pathway\
|
||||
\ that a secretory protein takes as it is synthesized in a cell?\n(A) Plasma\
|
||||
\ membrane\u2013Golgi apparatus\u2013ribosome\u2013secretory vesicle\u2013rough\
|
||||
\ ER (B) Ribosome\u2013Golgi apparatus\u2013rough ER\u2013secretory vesicle\u2013\
|
||||
plasma membrane (C) Plasma membrane\u2013Golgi apparatus\u2013ribosome\u2013\
|
||||
secretory vesicle\u2013rough ER (D) Ribosome\u2013rough ER\u2013Golgi apparatus\u2013\
|
||||
secretory vesicle\u2013plasma membrane"
|
||||
target: Let's think step by step. Protein synthesis starts at the ribosome, so
|
||||
we can eliminate (A) and (C). Ribosomes are often bound to the rough endoplasmic
reticulum, and the protein moves from there to the Golgi apparatus, where it is
modified and packaged into a vesicle. The vesicle then travels to the plasma
membrane, where the protein is secreted.
|
||||
The answer is (D).
|
||||
- question: "A mutation in a bacterial enzyme changed a previously polar amino acid\
|
||||
\ into a nonpolar amino acid. This amino acid was located at a site distant\
|
||||
\ from the enzyme\u2019s active site. How might this mutation alter the enzyme\u2019\
|
||||
s substrate specificity?\n(A) By changing the enzyme\u2019s pH optimum (B) By\
|
||||
\ changing the enzyme\u2019s location in the cell (C) By changing the shape\
|
||||
\ of the protein (D) An amino acid change away from the active site cannot alter\
|
||||
\ the enzyme\u2019s substrate specificity."
|
||||
target: Let's think step by step. A change in an amino acid leads to a change
|
||||
in the primary structure of the protein. A change in the primary structure may
|
||||
lead to a change in the secondary and the tertiary structure of the protein.
|
||||
A change in the tertiary structure means a change in the shape of the protein,
|
||||
so (C) has to be correct. Since the change does not affect the active site of
|
||||
the enzyme, we do not expect the activity of the enzyme to be affected. The
|
||||
answer is (C).
|
||||
- question: 'Which of the following is not a way to form recombinant DNA?
|
||||
|
||||
(A) Translation (B) Conjugation (C) Specialized transduction (D) Transformation'
|
||||
target: 'Let''s think step by step. The introduction of foreign DNA or RNA into
|
||||
bacteria or eukaryotic cells is a common technique in molecular biology and
|
||||
scientific research. There are multiple ways foreign DNA can be introduced into
|
||||
cells including transformation, transduction, conjugation, and transfection.
|
||||
In contrast, (A) is not a way to form DNA: during translation the ribosomes
|
||||
synthesize proteins from RNA. The answer is (A).'
|
||||
- question: 'Homologous structures are often cited as evidence for the process of natural
|
||||
selection. All of the following are examples of homologous structures EXCEPT
|
||||
|
||||
(A) the wings of a bird and the wings of a bat (B) the flippers of a whale and
|
||||
the arms of a man (C) the pectoral fins of a porpoise and the flippers of a
|
||||
seal (D) the forelegs of an insect and the forelimbs of a dog'
|
||||
target: "Let's think step by step. Homologous structures are similar\
|
||||
\ physical features in organisms that share a common ancestor but\
|
||||
\ different functions. Comparisons (B) and (C) are clearly homologous because\
|
||||
\ they share a common ancestor and the structures serve different purposes.\
|
||||
\ Bat wings and bird wings are also homologous: although both are wings, the\
\ forelimbs serve different purposes. Insects and dogs are only distantly related,\
|
||||
\ since one is vertebrate while the other is invertebrate and the forelimbs\
|
||||
\ serve the same purpose, so they are not homologous. The answer is (D)."
|
||||
- question: 'Which of the following is not known to be involved in the control of cell
|
||||
division?
|
||||
|
||||
(A) Cyclins (B) Protein kinases (C) Checkpoints (D) Fibroblast cells'
|
||||
target: 'Let''s think step by step. Normal cells move through the cell cycle in
|
||||
a regulated way. At the checkpoint stage, they use information about their own
|
||||
internal state and cues from the environment around them to decide whether to
|
||||
proceed with cell division. Cues like these act by changing the activity of
|
||||
core cell cycle regulators inside the cell. The most common regulators are cyclins
|
||||
and cyclin-dependent kinases. Fibroblast cells do not play any role in cell
|
||||
division. The answer is (D).'
|
||||
tag: mmlu_flan_cot_fewshot_stem
|
||||
include: _mmlu_flan_cot_fewshot_template_yaml
|
||||
task: mmlu_flan_cot_fewshot_high_school_biology
|
||||
@@ -0,0 +1,66 @@
|
||||
dataset_name: high_school_chemistry
|
||||
description: The following are multiple choice questions (with answers) about high
|
||||
school chemistry.
|
||||
fewshot_config:
|
||||
sampler: first_n
|
||||
samples:
|
||||
- question: 'Which of the following is considered an acid anhydride?
|
||||
|
||||
(A) HCl (B) H2SO3 (C) SO2 (D) Al(NO3)3'
|
||||
target: Let's think step by step. An acid anhydride is a compound that is derived
|
||||
by removing water from an acid. The chemical formula for water is H2O, which
|
||||
means that we need to determine which of these options, when combined with H2O,
|
||||
forms an acid. SO2, or Sulfur dioxide, when combined with H2O, makes H2SO3,
or sulfurous acid. The answer is (C).
|
||||
- question: 'Which of the following is expected to be a polar molecule?
|
||||
|
||||
(A) PCl4F (B) BF3 (C) CO2 (D) Si(CH3)4'
|
||||
target: Let's think step by step. A polar molecule is one that has a slightly
|
||||
positive charge on one end of the molecule and a slightly negative charge on
|
||||
the other end. Boron trifluoride (BF3) has Boron as the center atom and three
|
||||
fluorine atoms attached to it; it is trigonal planar and symmetric, so it is
|
||||
nonpolar. Carbon Dioxide (CO2) has Carbon as the central atom with double bonds
|
||||
to two Oxygen atoms - this is also symmetrical and therefore nonpolar. The same
|
||||
is the case for tetramethylsilane (Si(CH3)4), which is a Silicon atom surrounded
by four methyl groups. The structure of PCl4F is that Phosphorus is the central
|
||||
atom, attached to four chlorines and one fluorine atom. This is asymmetrical,
|
||||
and therefore has a net dipole and is expected to be a polar molecule. The answer
|
||||
is (A).
|
||||
- question: 'From the solubility rules, which of the following is true?
|
||||
|
||||
(A) All chlorides, bromides, and iodides are soluble (B) All sulfates are soluble
|
||||
(C) All hydroxides are soluble (D) All ammonium-containing compounds are soluble'
|
||||
target: Let's think step by step. The chlorides, bromides, and iodides of lead,
|
||||
silver, and mercury are not soluble in water. This rules out (A). The sulfates
|
||||
of lead, barium, and calcium are not soluble in water, which rules out (B).
|
||||
The hydroxides of any metal besides sodium, potassium, ammonium, calcium, and
|
||||
barium are insoluble. This rules out (C). Typically ammonium ions indicate a
|
||||
soluble ionic substance. The answer is (D).
|
||||
- question: 'A new compound is synthesized and found to be a monoprotic acid with a
|
||||
molar mass of 248 g/mol. When 0.0050 mol of this acid are dissolved in 0.500
|
||||
L of water, the pH is measured as 3.89. What is the pKa of this acid?
|
||||
|
||||
(A) 3.89 (B) 7.78 (C) 5.78 (D) 2.33'
|
||||
target: "Let's think step by step. Recall that $[A^{-}] = [H^{+}]$. Here, this is\
\ equal to $10^{-3.89}$. Then we have $K_{a} = \\frac{[H^{+}][A^{-}]}{[HA]}\
\ = \\frac{10^{-3.89} \\cdot 10^{-3.89}}{10^{-2}}$. The resulting exponent is\
\ $-3.89 + (-3.89) - (-2) = -5.78$, therefore $K_a = 10^{-5.78}$. The $pK_a$\
\ is the negative log of $K_a$, which is equal to $5.78$. The answer is (C)."
|
||||
- question: 'A solution contains 2.00 mole of acetic acid, CH3COOH, and 1.00 mole of
|
||||
calcium acetate, Ca(CH3COO)2. The solution is able to resist the addition of
|
||||
a small amount of strong acid or strong base with only minor changes in the
|
||||
pH of the solution. Larger quantities of strong acid or strong base can cause
|
||||
a significant change in pH. How many moles of nitric acid, HNO3, may be added
|
||||
before the pH begins to change significantly?
|
||||
|
||||
(A) 0.500 mole (B) 1.00 mole (C) 2.00 mole (D) 3.00 mole'
|
||||
target: "Let's think step by step. We would like to compute the buffer capacity\
|
||||
\ of this solution. First we write the equation for the ionization of the weak\
|
||||
\ acid, in this case of acetic acid. $CH_{3}COOH (aq) + H_{2}O \\rightarrow H_{3}O^{+}\
\ + CH_{3}COO^{-}$. The conjugate base is therefore the acetate ion. The added\
|
||||
\ strong acid, Nitric acid, will react with the conjugate base. Therefore the\
|
||||
\ maximum amount of acid that can be added will be equal to the amount of acetate\
|
||||
\ ion, or 2 moles. The answer is (C).\n\n"
|
||||
tag: mmlu_flan_cot_fewshot_stem
|
||||
include: _mmlu_flan_cot_fewshot_template_yaml
|
||||
task: mmlu_flan_cot_fewshot_high_school_chemistry
|
||||
@@ -0,0 +1,84 @@
|
||||
dataset_name: high_school_computer_science
|
||||
description: The following are multiple choice questions (with answers) about high
|
||||
school computer science.
|
||||
fewshot_config:
|
||||
sampler: first_n
|
||||
samples:
|
||||
- question: 'Which of the following is an example of the use of a device on the Internet
|
||||
of Things (IoT) ?
|
||||
|
||||
(A) A car alerts a driver that it is about to hit an object. (B) A hiker uses
|
||||
a GPS watch to keep track of her position. (C) A refrigerator orders milk
|
||||
from an online delivery service when the milk in the refrigerator is almost
|
||||
gone. (D) A runner uses a watch with optical sensors to monitor his heart rate.'
|
||||
target: Let's think step by step. The term Internet of Things (IoT) refers to
|
||||
common devices which are connected to the internet, enabling new functionality.
|
||||
Choice A is incorrect because it does not describe an internet connected device.
|
||||
In choice B, the watch is only described as having GPS functionality but no
|
||||
internet connectivity. Choice C describes a common device (a refrigerator) which
|
||||
has internet connectivity enabling new functionality (online ordering). Choice
|
||||
D does not mention internet connectivity for the watch, only optical sensors.
|
||||
The answer is (C).
|
||||
- question: 'Many Web browsers allow users to open anonymous windows. During a browsing
|
||||
session in an anonymous window, the browser does not record a browsing history
|
||||
or a list of downloaded files. When the anonymous window is exited, cookies
|
||||
created during the session are deleted. Which of the following statements about
|
||||
browsing sessions in an anonymous window is true?
|
||||
|
||||
(A) The activities of a user browsing in an anonymous window will not be visible
|
||||
to people who monitor the user''s network, such as the system administrator.
|
||||
(B) Items placed in a Web store''s shopping cart for future purchase during
|
||||
the anonymous browsing session will not be saved on the user''s computer. (C)
|
||||
A user will not be able to log in to e-mail or social media accounts during
|
||||
the anonymous browsing session. (D) A user browsing in an anonymous window will
|
||||
be protected from viruses launched from any web sites visited or files downloaded.'
|
||||
target: "Let's think step by step. Choice A is incorrect as it only describes\
|
||||
\ network traffic, which an anonymous browser does not change. Choice B is correct\
|
||||
\ as it correctly describes how an anonymous browser will prevent saving data\
|
||||
\ on the user\u2019s computer after the session is ended. Choice C is incorrect\
|
||||
\ because an anonymous browser will not prevent logging in to email or social\
|
||||
\ media accounts. Choice D is incorrect because an anonymous browser in itself\
|
||||
\ performs no virus protection. The answer is (B)."
|
||||
- question: "In the program below, the initial value of X is 5 and the initial value\
|
||||
\ of Y is 10.\nIF (X < 0){\n DISPLAY (\"Foxtrot\")\n} ELSE {\n IF (X > Y){\n\
|
||||
\ DISPLAY (\"Hotel\")\n } ELSE {\n IF (Y > 0){\n DISPLAY (\"November\")\n\
|
||||
\ } ELSE {\n DISPLAY (\"Yankee\")\n }\n }\n}\nWhat is displayed as a result\
|
||||
\ of running the program?\n(A) Foxtrot (B) Hotel (C) November (D) Yankee"
|
||||
target: Let's think step by step. Because X has the value 5, the first conditional
|
||||
IF (X < 0) is false, so we move to the first ELSE clause. Because X is 5 and
|
||||
Y is 10, the second conditional IF (X > Y) is false, so we move to the following
|
||||
ELSE clause. Since Y is 10, the conditional IF (Y > 0) is true, so the command
|
||||
DISPLAY ("November") is executed. The answer is (C).
|
||||
- question: 'What is the output of "abc"[::-1] in Python 3?
|
||||
|
||||
(A) Error (B) abc (C) cba (D) c'
|
||||
target: Let's think step by step. We know that the slicing operator [::-1] takes
|
||||
all of the elements in the string in reverse order, so we reverse the order
|
||||
of the string "abc", resulting in "cba". The answer is (C).
|
||||
- question: "A list of numbers has n elements, indexed from 1 to n. The following algorithm\
|
||||
\ is intended to display the number of elements in the list that have a value\
|
||||
\ greater than 100. The algorithm uses the variables count and position. Steps\
|
||||
\ 3 and 4 are missing.\n Step 1: Set count to 0 and position to 1.\n Step 2:\
|
||||
\ If the value of the element at index position is greater than 100, increase\
|
||||
\ the value of count by 1.\n Step 3: (missing step)\n Step 4: (missing step)\n\
|
||||
\ Step 5: Display the value of count.\nWhich of the following could be used\
|
||||
\ to replace steps 3 and 4 so that the algorithm works as intended?\n(A) Step\
|
||||
\ 3: Increase the value of position by 1.\n Step 4: Repeat steps 2 and 3 until\
|
||||
\ the value of count is greater than 100.\n(B) Step 3: Increase the value of\
|
||||
\ position by 1.\n Step 4: Repeat steps 2 and 3 until the value of position\
|
||||
\ is greater than n.\n(C) Step 3: Repeat step 2 until the value of count is\
|
||||
\ greater than 100.\n Step 4: Increase the value of position by 1.\n(D) Step\
|
||||
\ 3: Repeat step 2 until the value of position is greater than n.\n Step 4:\
|
||||
\ Increase the value of count by 1."
|
||||
target: 'Let''s think step by step. Choice A is incorrect, because its Step 4
|
||||
has an incorrect termination condition, stopping when count is greater than
|
||||
100. We need to stop after inspecting all elements in the list. Choice B is
|
||||
correct because it correctly increments both count and position, and correctly
|
||||
repeats these steps and terminates when all elements in the list have been inspected.
|
||||
Choice C is incorrect because it incorrectly increments the variable count until
|
||||
its value is greater than 100, regardless of the elements in the list. Choice
|
||||
D is incorrect because its step 3 does not increment the value of position,
|
||||
so it will repeat forever. The answer is (B).'
|
||||
tag: mmlu_flan_cot_fewshot_stem
|
||||
include: _mmlu_flan_cot_fewshot_template_yaml
|
||||
task: mmlu_flan_cot_fewshot_high_school_computer_science
|
||||
@@ -0,0 +1,199 @@
|
||||
dataset_name: high_school_european_history
|
||||
description: The following are multiple choice questions (with answers) about high
|
||||
school european history.
|
||||
fewshot_config:
|
||||
sampler: first_n
|
||||
samples:
|
||||
- question: 'This question refers to the following information.
|
||||
|
||||
Albeit the king''s Majesty justly and rightfully is and ought to be the supreme
|
||||
head of the Church of England, and so is recognized by the clergy of this realm
|
||||
in their convocations, yet nevertheless, for corroboration and confirmation
|
||||
thereof, and for increase of virtue in Christ''s religion within this realm
|
||||
of England, and to repress and extirpate all errors, heresies, and other enormities
|
||||
and abuses heretofore used in the same, be it enacted, by authority of this
|
||||
present Parliament, that the king, our sovereign lord, his heirs and successors,
|
||||
kings of this realm, shall be taken, accepted, and reputed the only supreme
|
||||
head in earth of the Church of England, called Anglicana Ecclesia; and shall
|
||||
have and enjoy, annexed and united to the imperial crown of this realm, as well
|
||||
the title and style thereof, as all honors, dignities, preeminences, jurisdictions,
|
||||
privileges, authorities, immunities, profits, and commodities to the said dignity
|
||||
of the supreme head of the same Church belonging and appertaining; and that
|
||||
our said sovereign lord, his heirs and successors, kings of this realm, shall
|
||||
have full power and authority from time to time to visit, repress, redress,
|
||||
record, order, correct, restrain, and amend all such errors, heresies, abuses,
|
||||
offenses, contempts, and enormities, whatsoever they be, which by any manner
|
||||
of spiritual authority or jurisdiction ought or may lawfully be reformed, repressed,
|
||||
ordered, redressed, corrected, restrained, or amended, most to the pleasure
|
||||
of Almighty God, the increase of virtue in Christ''s religion, and for the conservation
|
||||
of the peace, unity, and tranquility of this realm; any usage, foreign land,
|
||||
foreign authority, prescription, or any other thing or things to the contrary
|
||||
hereof notwithstanding.
|
||||
|
||||
English Parliament, Act of Supremacy, 1534
|
||||
|
||||
From the passage, one may infer that the English Parliament wished to argue
|
||||
that the Act of Supremacy would
|
||||
|
||||
(A) give the English king a new position of authority (B) give the position
|
||||
of head of the Church of England to Henry VIII alone and exclude his heirs (C)
|
||||
establish Calvinism as the one true theology in England (D) end various forms
|
||||
of corruption plaguing the Church in England'
|
||||
target: Let's think step by step. We refer to Wikipedia articles on european history
|
||||
for help. The Act of Supremacy states that it grants authority to the king "to
|
||||
repress and extirpate all errors, heresies, and other enormities and abuses",
|
||||
referring to the corruption in the Church of England. The answer is (D).
|
||||
- question: "This question refers to the following information.\nRead the following\
|
||||
\ excerpt.\nThe revolutionary seed had penetrated into every country and spread\
|
||||
\ more or less. It was greatly developed under the r\xE9gime of the military\
|
||||
\ despotism of Bonaparte. His conquests displaced a number of laws, institutions,\
|
||||
\ and customs; broke through bonds sacred among all nations, strong enough to\
|
||||
\ resist time itself; which is more than can be said of certain benefits conferred\
|
||||
\ by these innovators.\nThe monarchs will fulfil the duties imposed upon them\
|
||||
\ by Him who, by entrusting them with power, has charged them to watch over\
|
||||
\ the maintenance of justice, and the rights of all, to avoid the paths of error,\
|
||||
\ and tread firmly in the way of truth. Placed beyond the passions which agitate\
|
||||
\ society, it is in days of trial chiefly that they are called upon to despoil\
|
||||
\ realities of their false appearances, and to show themselves as they are,\
|
||||
\ fathers invested with the authority belonging by right to the heads of families,\
|
||||
\ to prove that, in days of mourning, they know how to be just, wise, and therefore\
|
||||
\ strong, and that they will not abandon the people whom they ought to govern\
|
||||
\ to be the sport of factions, to error and its consequences, which must involve\
|
||||
\ the loss of society.\nUnion between the monarchs is the basis of the policy\
|
||||
\ which must now be followed to save society from total ruin. . . .\nLet them\
|
||||
\ not confound concessions made to parties with the good they ought to do for\
|
||||
\ their people, in modifying, according to their recognized needs, such branches\
|
||||
\ of the administration as require it.\nLet them be just, but strong; beneficent,\
|
||||
\ but strict.\nLet them maintain religious principles in all their purity, and\
|
||||
\ not allow the faith to be attacked and morality interpreted according to the\
|
||||
\ social contract or the visions of foolish sectarians.\nLet them suppress Secret\
|
||||
\ Societies; that gangrene of society.\n\u2014Klemens von Metternich, Political\
|
||||
\ Confession of Faith, 1820\nWhich of the following was the greatest cause of\
|
||||
\ the fears expressed by Metternich in the document above?\n(A) The ideas of\
|
||||
\ personal liberty and nationalism conceived during the Enlightenment resulted\
|
||||
\ in radical revolutions that could spread throughout Europe. (B) The conquest\
|
||||
\ of Europe by Napoleon led to the creation of new factions and shifted the\
|
||||
\ European balance of power. (C) The power of monarchs had grown to the point\
|
||||
\ where it needed to be checked by other powers within each nation or domination\
|
||||
\ of civilians would occur. (D) The rising and falling economic cycle of the\
|
||||
\ newly emerging capitalist economy could lead to civilian unrest that must\
|
||||
\ be suppressed."
|
||||
target: Let's think step by step. We refer to Wikipedia articles on european history
|
||||
for help. The fears of revolution in early 19th century Europe expressed by
|
||||
Klemens von Metternich, a conservative Austrian statesman, were a direct result
|
||||
of the age of Enlightenment, a period of European history where the absolute
|
||||
power of the monarchy was challenged with ideas of individual liberty and nationalism,
|
||||
leading to the French revolution and its effects all over Europe. The answer
|
||||
is (A).
|
||||
- question: 'This question refers to the following information.
|
||||
|
||||
The excerpts below are from the Navigation Acts of 1651.
|
||||
|
||||
[A]fter the first day of December, one thousand six hundred fifty and one, and
|
||||
from thence forwards, no goods or commodities whatsoever of the growth, production
|
||||
or manufacture of Asia, Africa or America, or of any part thereof; or of any
|
||||
islands belonging to them, or which are described or laid down in the usual
|
||||
maps or cards of those places, as well of the English plantations as others,
|
||||
shall be imported or brought into this Commonwealth of England, or into Ireland,
|
||||
or any other lands, islands, plantations, or territories to this Commonwealth
|
||||
belonging, or in their possession, in any other ship or ships, vessel or vessels
|
||||
whatsoever, but only in such as do truly and without fraud belong only to the
|
||||
people of this Commonwealth, or the plantations thereof, as the proprietors
|
||||
or right owners thereof; and whereof the master and mariners are also of the
|
||||
people of this Commonwealth, under the penalty of the forfeiture and loss of
|
||||
all the goods that shall be imported contrary to this act. . . .
|
||||
|
||||
[N]o goods or commodities of the growth, production, or manufacture of Europe,
|
||||
or of any part thereof, shall after the first day of December, one thousand
|
||||
six hundred fifty and one, be imported or brought into this Commonwealth of
|
||||
England, or any other lands or territories to this Commonwealth belonging, or
|
||||
in their possession, in any ship or ships, vessel or vessels whatsoever, but
|
||||
in such as do truly and without fraud belong only to the people of this Commonwealth,
|
||||
and in no other, except only such foreign ships and vessels as do truly and
|
||||
properly belong to the people of that country or place, of which the said goods
|
||||
are the growth, production or manufacture.
|
||||
|
||||
Which of the following best describes the outcome of the Navigation Acts of
|
||||
1651?
|
||||
|
||||
(A) They served as a catalyst for the growth of English shipping and overseas
|
||||
trade, but did little to limit the prospects of the Dutch in the seventeenth
|
||||
century. (B) They brought about almost immediate hardships for the Dutch economy
|
||||
as their dominance of overseas trade quickly ended. (C) They were rescinded
|
||||
during the restoration of the Stuarts as they sought normal diplomatic relations
|
||||
with the Dutch so as not to need Parliament''s financial support for war. (D)
|
||||
They led to nearly a century of recurrent war between England and the Netherlands,
|
||||
which would not end until after American independence.'
|
||||
target: Let's think step by step. We refer to Wikipedia articles on european history
|
||||
for help. The Navigation Acts of 1651 helped English shipping by restricting
|
||||
the ability of ships from other European countries, especially the Dutch, to
|
||||
transport goods from colonies in Asia and Africa into England. The answer is
|
||||
(A).
|
||||
- question: "This question refers to the following information.\nIn Russia there was\
|
||||
\ nothing going on well, and [Souvarine] was in despair over the news he had\
|
||||
\ received. His old companions were all turning to the politicians; the famous\
|
||||
\ Nihilists who made Europe tremble-sons of village priests, of the lower middle\
|
||||
\ class, of tradesmen-could not rise above the idea of national liberation,\
|
||||
\ and seemed to believe that the world would be delivered-when they had killed\
|
||||
\ their despot\u2026\n\"Foolery! They'll never get out of it with their foolery.\"\
|
||||
\nThen, lowering his voice still more, in a few bitter words he described his\
|
||||
\ old dream of fraternity. He had renounced his rank and his fortune; he had\
|
||||
\ gone among workmen, only in the hope of seeing at last the foundation of a\
|
||||
\ new society of labour in common. All the sous in his pockets had long gone\
|
||||
\ to the urchins of the settlement; he had been as tender as a brother with\
|
||||
\ the colliers, smiling at their suspicion, winning them over by his quiet workmanlike\
|
||||
\ ways and his dislike of chattering. But decidedly the fusion had not taken\
|
||||
\ place.\nHis voice changed, his eyes grew bright, he fixed them on \xC9tienne,\
|
||||
\ directly addressing him:\n\"Now, do you understand that? These hatworkers\
|
||||
\ at Marseilles who have won the great lottery prize of a hundred thousand francs\
|
||||
\ have gone off at once and invested it, declaring that they are going to live\
|
||||
\ without doing anything! Yes, that is your idea, all of you French workmen;\
|
||||
\ you want to unearth a treasure in order to devour it alone afterwards in some\
|
||||
\ lazy, selfish corner. You may cry out as much as you like against the rich,\
|
||||
\ you haven't got courage enough to give back to the poor the money that luck\
|
||||
\ brings you. You will never be worthy of happiness as long as you own anything,\
|
||||
\ and your hatred of the bourgeois proceeds solely from an angry desire to be\
|
||||
\ bourgeois yourselves in their place.\"\n\xC9mile Zola, French writer, Germinal,\
|
||||
\ 1885\nThe passage displays the direct concern for the welfare of the working\
|
||||
\ classes that was typically a part of which movement?\n(A) Capitalist (B) Scientific\
|
||||
\ (C) Communist (D) Existentialist"
|
||||
target: Let's think step by step. We refer to Wikipedia articles on european history
|
||||
for help. The modern Communist movement aims to establish a classless society
|
||||
based on communal ownership and distribution of property and means of production,
|
||||
thereby especially benefiting the working classes. The answer is (C).
|
||||
- question: "This question refers to the following information.\nThe following excerpt\
|
||||
\ is from a pamphlet.\nYou will do me the justice to remember, that I have always\
|
||||
\ strenuously supported the Right of every man to his own opinion, however different\
|
||||
\ that opinion might be to mine. He who denies to another this right, makes\
|
||||
\ a slave of himself to his present opinion, because he precludes himself the\
|
||||
\ right of changing it.\nThe most formidable weapon against errors of every\
|
||||
\ kind is Reason. I have never used any other, and I trust I never shall.\n\
|
||||
The circumstance that has now taken place in France of the total abolition of\
|
||||
\ the whole national order of priesthood, and of everything appertaining to\
|
||||
\ compulsive systems of religion, and compulsive articles of faith, has not\
|
||||
\ only precipitated my intention, but rendered a work of this kind exceedingly\
|
||||
\ necessary, lest in the general wreck of superstition, of false systems of\
|
||||
\ government, and false theology, we lose sight of morality, of humanity, and\
|
||||
\ of the theology that is true.\nI believe in one God, and no more; and I hope\
|
||||
\ for happiness beyond this life.\nI believe in the equality of man; and I believe\
|
||||
\ that religious duties consist in doing justice, loving mercy, and endeavoring\
|
||||
\ to make our fellow-creatures happy.\nI do not believe in the creed professed\
|
||||
\ by the Jewish church, by the Roman church, by the Greek church, by the Turkish\
|
||||
\ church, by the Protestant church, nor by any church that I know of. My own\
|
||||
\ mind is my own church.\nAll national institutions of churches, whether Jewish,\
|
||||
\ Christian or Turkish, appear to me no other than human inventions, set up\
|
||||
\ to terrify and enslave mankind, and monopolize power and profit.\nI do not\
|
||||
\ mean by this declaration to condemn those who believe otherwise; they have\
|
||||
\ the same right to their belief as I have to mine.\n\u2014Thomas Paine, The\
|
||||
\ Age of Reason, 1794\u20131795\nWhich of the following Enlightenment philosophes\
|
||||
\ designed a system of checks and balances for government to avoid abuses of\
|
||||
\ power?\n(A) Jean Jacques Rousseau (B) Baron Montesquieu (C) Mary Wollstonecraft\
|
||||
\ (D) Adam Smith"
|
||||
target: 'Let''s think step by step. We refer to Wikipedia articles on european
|
||||
history for help. Baron Montesquieu was an 18th-century French philosopher who
wrote extensively against the monopolization of power and advocated for a system
|
||||
of checks and balances in government to prevent the rise of despotism. The answer
|
||||
is (B).'
|
||||
tag: mmlu_flan_cot_fewshot_humanities
|
||||
include: _mmlu_flan_cot_fewshot_template_yaml
|
||||
task: mmlu_flan_cot_fewshot_high_school_european_history
|
||||
@@ -0,0 +1,53 @@
|
||||
dataset_name: high_school_geography
|
||||
description: The following are multiple choice questions (with answers) about high
|
||||
school geography.
|
||||
fewshot_config:
|
||||
sampler: first_n
|
||||
samples:
|
||||
- question: 'Which one of the following items is an example of nonmaterial culture?
|
||||
|
||||
(A) Dove soap (B) Dove candy bar (C) Dove symbol (D) A dove (bird).'
|
||||
target: Let's think step by step. We refer to Wikipedia articles on geography
|
||||
for help. Nonmaterial culture consists of cultural ideas, beliefs or symbols
|
||||
that are not physical objects. The answer is (C).
|
||||
- question: 'During the third stage of the demographic transition model, which of the
|
||||
following is true?
|
||||
|
||||
(A) Birth rates increase and population growth rate is less rapid. (B) Birth
|
||||
rates decline and population growth rate is less rapid. (C) Birth rates increase
|
||||
and population growth rate increases. (D) Birth rates decrease and population
|
||||
growth rate increases.'
|
||||
target: Let's think step by step. We refer to Wikipedia articles on geography
|
||||
for help. The demographic transition model models the five different stages
|
||||
of population growth as a country goes through economic development, where the
|
||||
third stage refers to a period of declining birth rates and lower population
|
||||
growth. The answer is (B).
|
||||
- question: 'The practice of hiring a foreign third-party service provider to run an
|
||||
operation is called
|
||||
|
||||
(A) outsourcing. (B) offshoring. (C) maquiladoras. (D) locational interdependence.'
|
||||
target: Let's think step by step. We refer to Wikipedia articles on geography
|
||||
for help. "Offshoring" literally means to move or base some of the activities
|
||||
or processes of a company to a foreign country. The answer is (B).
|
||||
- question: 'Which of the following statements is NOT accurate regarding the services
|
||||
provided by local governments in the United States?
|
||||
|
||||
(A) Duplication of efforts occurs often. (B) Social problems of the central
|
||||
city spill over into the surrounding residential suburbs. (C) Inefficiency in
|
||||
providing services occurs often. (D) One neighborhood''s efforts to reduce pollution
|
||||
are always supported by neighboring communities.'
|
||||
target: Let's think step by step. We refer to Wikipedia articles on geography
|
||||
for help. There may be economic, social or political reasons for two neighboring
|
||||
communities and their local governments not agreeing to pollution reduction
|
||||
efforts initiated by one of them. The answer is (D).
|
||||
- question: 'The rate of natural increase of a population is found by subtracting the
|
||||
|
||||
(A) crude death rate from the crude birth rate. (B) crude birth rate from the
|
||||
crude death rate. (C) doubling time from the crude birth rate. (D) fertility
|
||||
rate from the crude death rate.'
|
||||
target: 'Let''s think step by step. We refer to Wikipedia articles on geography
|
||||
for help. The difference between number of births and deaths gives the population
|
||||
increase at any given time. The answer is (A).'
|
||||
tag: mmlu_flan_cot_fewshot_social_sciences
|
||||
include: _mmlu_flan_cot_fewshot_template_yaml
|
||||
task: mmlu_flan_cot_fewshot_high_school_geography
|
||||
@@ -0,0 +1,61 @@
|
||||
dataset_name: high_school_government_and_politics
|
||||
description: The following are multiple choice questions (with answers) about high
|
||||
school government and politics.
|
||||
fewshot_config:
|
||||
sampler: first_n
|
||||
samples:
|
||||
- question: 'Which of the following best states an argument made by James Madison in
|
||||
The Federalist number 10?
|
||||
|
||||
(A) Honest politicians can prevent factions from developing. (B) Factions are
|
||||
more likely to occur in large republics than in small ones. (C) The negative
|
||||
effects of factionalism can be reduced by a republican government. (D) Free
|
||||
elections are the people''s best defense against factionalism.'
|
||||
target: Let's think step by step. We refer to Wikipedia articles on government
|
||||
and politics for help. In the Federalist number 10, James Madison advocated
|
||||
for a representative republican form of government to guard against factionalism.
|
||||
The answer is (C).
|
||||
- question: 'The term "budget deficit" refers to the
|
||||
|
||||
(A) annual increase in federal spending on the military (B) amount of interest
|
||||
on the national debt (C) difference between the initial budget proposals made
|
||||
by the president and Congress (D) amount the government spends in excess of
|
||||
its revenues'
|
||||
target: Let's think step by step. We refer to Wikipedia articles on government
|
||||
and politics for help. When the government spends more than it earns, the difference
is the budget deficit. The answer is (D).
|
||||
- question: 'Which of the following statements about cabinet departments is FALSE?
|
||||
|
||||
(A) They are established by the legislative branch. (B) Their members often
|
||||
don''t have much influence over presidential decisions. (C) They cannot all
|
||||
be run by leaders who belong to the same political party the president does.
|
||||
(D) Not every federal agency is a cabinet department.'
|
||||
target: Let's think step by step. We refer to Wikipedia articles on government
|
||||
and politics for help. There is no law stipulating that some cabinet department
|
||||
leaders have to belong to a political party different from that of the president.
|
||||
The answer is (C).
|
||||
- question: 'Which of the following cases established the precedent that a defendant
|
||||
must be informed of the right to remain silent, the right to a lawyer, and protection
|
||||
from self-incrimination?
|
||||
|
||||
(A) Weeks v. United States (B) Betts v. Brady (C) Mapp v. Ohio (D) Miranda v.
|
||||
Arizona'
|
||||
target: Let's think step by step. We refer to Wikipedia articles on government
|
||||
and politics for help. In the landmark Miranda v. Arizona in 1966, the US Supreme
|
||||
Court, based on the Fifth and Sixth Amendments of the US Constitution, guaranteed
|
||||
a defendant's right to an attorney and protection from self-incrimination. The
|
||||
answer is (D).
|
||||
- question: 'Uncertainty over the limits to presidential power is caused primarily
|
||||
by the fact that
|
||||
|
||||
(A) the constitutional definition of those powers is broad and unspecific (B)
|
||||
most people agree that the Constitution places too many limits on presidential
|
||||
power (C) the Supreme Court consistently refuses to rule on cases concerning
|
||||
presidential powers (D) constitutional amendments have greatly increased presidential
|
||||
powers'
|
||||
target: 'Let''s think step by step. We refer to Wikipedia articles on government
|
||||
and politics for help. The US Constitution is not very specific about the powers
|
||||
of the president, leading to uncertainty over its limits. The answer is (A).'
|
||||
tag: mmlu_flan_cot_fewshot_social_sciences
|
||||
include: _mmlu_flan_cot_fewshot_template_yaml
|
||||
task: mmlu_flan_cot_fewshot_high_school_government_and_politics
|
||||
@@ -0,0 +1,53 @@
|
||||
dataset_name: high_school_macroeconomics
|
||||
description: The following are multiple choice questions (with answers) about high
|
||||
school macroeconomics.
|
||||
fewshot_config:
|
||||
sampler: first_n
|
||||
samples:
|
||||
- question: 'Which of the following policies best describes supply-side fiscal policy?
|
||||
|
||||
(A) An increase in the money supply (B) Increased government spending (C) Lower
|
||||
taxes on research and development of new technology (D) Higher taxes on household
|
||||
income'
|
||||
target: Let's think step by step. We refer to Wikipedia articles on macroeconomics
|
||||
for help. Supply-side fiscal policy stimulates the economy by encouraging more
|
||||
production of goods and services through reduction in taxes and deregulation.
|
||||
The answer is (C).
|
||||
- question: 'The short-run Phillips curve indicates a
|
||||
|
||||
(A) direct relation between unemployment and inflation (B) direct relation between
|
||||
price and quantity demanded (C) inverse relation between price and quantity
|
||||
demanded (D) inverse relation between unemployment and inflation'
|
||||
target: Let's think step by step. We refer to Wikipedia articles on macroeconomics
|
||||
for help. The short-run Phillips curve shows that whenever unemployment decreases
|
||||
below a natural level, the inflation starts increasing, and vice-versa. The
|
||||
answer is (D).
|
||||
- question: 'Holding all else equal which of the following monetary policies would
|
||||
be used to boost U.S. exports?
|
||||
|
||||
(A) Increasing the discount rate (B) Increasing the reserve ratio (C) Buying
|
||||
government securities (D) Lowering tariffs'
|
||||
target: Let's think step by step. We refer to Wikipedia articles on macroeconomics
|
||||
for help. Buying government securities leads to reduction in demand for US dollars
|
||||
from foreign buyers, thereby making it cheaper and hence making US exports more
|
||||
attractive. The answer is (C).
|
||||
- question: 'A federal deficit occurs when
|
||||
|
||||
(A) exports exceed imports. (B) imports exceed exports. (C) federal tax collections
|
||||
exceed spending. (D) federal spending exceeds federal tax revenues.'
|
||||
target: Let's think step by step. We refer to Wikipedia articles on macroeconomics
|
||||
for help. A federal deficit occurs when federal spending exceeds federal income
|
||||
which is primarily from tax revenues. The answer is (D).
|
||||
- question: 'Which of the following is not included in the U.S. GDP?
|
||||
|
||||
(A) The U.S. military opens a new base in a foreign country with 1000 U.S. personnel.
|
||||
(B) Japanese consumers buy thousands of CDs produced in the United States. (C)
|
||||
An American pop singer performs a sold-out concert in Paris. (D) A French theatrical
|
||||
production tours dozens of American cities.'
|
||||
target: 'Let''s think step by step. We refer to Wikipedia articles on macroeconomics
|
||||
for help. The economic transactions related to the performance of the American
pop singer in Paris happen entirely outside the U.S. and hence are not included
|
||||
in the GDP numbers. The answer is (C).'
|
||||
tag: mmlu_flan_cot_fewshot_social_sciences
|
||||
include: _mmlu_flan_cot_fewshot_template_yaml
|
||||
task: mmlu_flan_cot_fewshot_high_school_macroeconomics
|
||||
@@ -0,0 +1,51 @@
|
||||
dataset_name: high_school_mathematics
|
||||
description: The following are multiple choice questions (with answers) about high
|
||||
school mathematics.
|
||||
fewshot_config:
|
||||
sampler: first_n
|
||||
samples:
|
||||
- question: 'Simplify and write the result with a rational denominator: $$\sqrt{\sqrt[3]{\sqrt{\frac{1}{729}}}}$$
|
||||
|
||||
(A) \frac{3\sqrt{3}}{3} (B) \frac{1}{3} (C) \sqrt{3} (D) \frac{\sqrt{3}}{3}'
|
||||
target: Let's think step by step. Factoring $729=3^6$ and combining the roots
|
||||
$\frac{1}{2}\cdot\frac{1}{3}\cdot\frac{1}{2}=\frac{1}{12}$, we get that $\sqrt{\sqrt[3]{\sqrt{\frac{1}{729}}}}=\left(\frac{1}{3^6}\right)^{\frac{1}{12}}=\frac{1}{3^{\frac{1}{2}}}=\frac{\sqrt{3}}{3}$.
|
||||
The answer is (D).
|
||||
- question: 'Five thousand dollars compounded annually at an $x\%$ interest rate takes
|
||||
six years to double. At the same interest rate, how many years will it take
|
||||
$\$300$ to grow to $\$9600$?
|
||||
|
||||
(A) 12 (B) 1 (C) 30 (D) 5'
|
||||
target: Let's think step by step. To go from $\$300$ to $\$9600$, the value must
|
||||
go up by a factor of $9600/300=32=2^5$. Since at this interest rate it takes
|
||||
six years for it to double, it will take $5*6=30$ years to grow to $\$9600$.
|
||||
The answer is (C).
|
||||
- question: "Ten students take a biology test and receive the following scores: 45,\
|
||||
\ 55, 50, 70, 65, 80, 40, 90, 70, 85. What is the mean of the students\u2019\
|
||||
\ test scores?\n(A) 55 (B) 60 (C) 62 (D) 65"
|
||||
target: Let's think step by step. There are 10 students and the sum of their scores
|
||||
is $45 + 55 + 50 + 70 + 65 + 80 + 40 + 90 + 70 + 85 = 650$, the mean is $650/10=65$.
|
||||
The answer is (D).
|
||||
- question: 'The variable $x$ varies directly as the square of $y$, and $y$ varies
|
||||
directly as the cube of $z$. If $x$ equals $-16$ when $z$ equals 2, what is
|
||||
the value of $x$ when $z$ equals $\frac{1}{2}$?
|
||||
|
||||
(A) -1 (B) 16 (C) -\frac{1}{256} (D) \frac{1}{16}'
|
||||
target: Let's think step by step. We know that $x \propto y^2$ and $y \propto
|
||||
z^3$, so $x = k z^6$ for some constant $k$. Plugging in for $x=-16$ and $z=2$,
|
||||
the constant value is $k=\frac{x}{z^6}=\frac{-16}{64}=-\frac{1}{4}$. So, when
|
||||
$z=\frac{1}{2}$, the value of $x$ is $x=kz^6=-\frac{1}{4}\frac{1}{2^6}=-\frac{1}{256}$.
|
||||
The answer is (C).
|
||||
- question: 'Joe was in charge of lights for a dance. The red light blinks every two
|
||||
seconds, the yellow light every three seconds, and the blue light every five
|
||||
seconds. If we include the very beginning and very end of the dance, how many
|
||||
times during a seven minute dance will all the lights come on at the same time?
|
||||
(Assume that all three lights blink simultaneously at the very beginning of
|
||||
the dance.)
|
||||
|
||||
(A) 3 (B) 15 (C) 6 (D) 5'
|
||||
target: 'Let''s think step by step. The least common multiple of 2, 3 and 5 is
30, so all three lights come on together every 30 seconds, that is, twice per
minute. Counting the simultaneous blink at the very beginning, during a 7 minute
dance they come on at the same time $2*7+1=15$ times. The answer is (B).'
|
||||
tag: mmlu_flan_cot_fewshot_stem
|
||||
include: _mmlu_flan_cot_fewshot_template_yaml
|
||||
task: mmlu_flan_cot_fewshot_high_school_mathematics
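
The worked answers in the high_school_mathematics samples above can be checked numerically. The following is a minimal sketch, not part of the task config: the function names are hypothetical and chosen only for illustration, and each function simply replays the reasoning given in the corresponding target.

```python
from fractions import Fraction
from functools import reduce
from math import gcd, isclose, log2, sqrt


def nested_root_value() -> float:
    """Square root of cube root of square root of 1/729, as one exponent."""
    exponent = Fraction(1, 2) * Fraction(1, 3) * Fraction(1, 2)  # = 1/12
    return (1 / 729) ** float(exponent)


def years_to_grow(start: float, end: float, doubling_years: float) -> float:
    """Years needed to go from start to end given a fixed doubling time."""
    doublings = log2(end / start)
    return doublings * doubling_years


def mean_score(scores: list[int]) -> Fraction:
    """Exact arithmetic mean of the scores."""
    return Fraction(sum(scores), len(scores))


def direct_variation_x(x_known, z_known, z_new) -> Fraction:
    """x varies as y^2 and y as z^3, so x = k * z^6 for a constant k."""
    k = Fraction(x_known) / Fraction(z_known) ** 6
    return k * Fraction(z_new) ** 6


def simultaneous_blinks(periods: list[int], duration_seconds: int) -> int:
    """Count moments when all lights blink together, including t = 0."""
    cycle = reduce(lambda a, b: a * b // gcd(a, b), periods)  # least common multiple
    return duration_seconds // cycle + 1


if __name__ == "__main__":
    print(isclose(nested_root_value(), sqrt(3) / 3))
    print(years_to_grow(300, 9600, 6))
    print(mean_score([45, 55, 50, 70, 65, 80, 40, 90, 70, 85]))
    print(direct_variation_x(-16, 2, Fraction(1, 2)))
    print(simultaneous_blinks([2, 3, 5], 7 * 60))
```

Exact fractions are used where the sample answers are rational, so the comparison with choices such as $-\frac{1}{256}$ does not depend on floating-point rounding.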
|
||||
@@ -0,0 +1,56 @@
|
||||
dataset_name: high_school_microeconomics
|
||||
description: The following are multiple choice questions (with answers) about high
|
||||
school microeconomics.
|
||||
fewshot_config:
|
||||
sampler: first_n
|
||||
samples:
|
||||
- question: 'Which of the following is necessarily a characteristic of oligopoly?
|
||||
|
||||
(A) Free entry into and exit from the market (B) A few large producers (C) One
|
||||
producer of a good with no close substitutes (D) A homogenous product'
|
||||
target: Let's think step by step. We refer to Wikipedia articles on microeconomics
|
||||
for help. An oligopoly is a market dominated by a small number of sellers or
producers. An oligopoly typically has high barriers to new entry, and its
products may be differentiated. The answer is (B).
|
||||
- question: 'If the government subsidizes producers in a perfectly competitive market,
|
||||
then
|
||||
|
||||
(A) the demand for the product will increase (B) the demand for the product
|
||||
will decrease (C) the consumer surplus will increase (D) the consumer surplus
|
||||
will decrease'
|
||||
target: Let's think step by step. We refer to Wikipedia articles on microeconomics
|
||||
for help. (A) and (B) are wrong because the demand curve does not change at
|
||||
all. If the government subsidizes producers, the supply will increase, and thus
|
||||
the consumer surplus also increases. The answer is (C).
|
||||
- question: 'Which of the following is true of a price floor?
|
||||
|
||||
(A) The price floor shifts the demand curve to the left. (B) An effective floor
|
||||
creates a shortage of the good. (C) The price floor shifts the supply curve
|
||||
of the good to the right. (D) To be an effective floor, it must be set above
|
||||
the equilibrium price.'
|
||||
target: Let's think step by step. We refer to Wikipedia articles on microeconomics
|
||||
for help. A price floor does not shift the demand or supply curve. An effective
price floor must be set above the equilibrium price; otherwise the market
clears at equilibrium and the floor has no effect. The answer is (D).
|
||||
- question: 'The concentration ratio for a monopoly is
|
||||
|
||||
(A) 0 (B) 5 (C) 10 (D) 100'
|
||||
target: Let's think step by step. We refer to Wikipedia articles on microeconomics
|
||||
for help. The concentration ratio is calculated as the sum of market share of
|
||||
a specific number of largest companies. Monopoly means one company or entity
|
||||
controls the entire market, therefore, the concentration ratio is 100 percent.
|
||||
The answer is (D).
|
||||
- question: 'In a competitive labor market for housepainters, which of the following
|
||||
would increase the demand for housepainters?
|
||||
|
||||
(A) An effective minimum wage imposed on this labor market. (B) An increase
|
||||
in the price of gallons of paint. (C) An increase in the construction of new
|
||||
houses. (D) An increase in the price of mechanical painters so long as the output
|
||||
effect exceeds the substitution effect.'
|
||||
target: 'Let''s think step by step. We refer to Wikipedia articles on microeconomics
|
||||
for help. An increase in the construction of new houses means an increased demand
for house painting, which increases the demand for housepainters. The answer
|
||||
is (C).'
|
||||
tag: mmlu_flan_cot_fewshot_social_sciences
|
||||
include: _mmlu_flan_cot_fewshot_template_yaml
|
||||
task: mmlu_flan_cot_fewshot_high_school_microeconomics
|
||||
@@ -0,0 +1,50 @@
|
||||
dataset_name: high_school_physics
|
||||
description: The following are multiple choice questions (with answers) about high
|
||||
school physics.
|
||||
fewshot_config:
|
||||
sampler: first_n
|
||||
samples:
|
||||
- question: 'A microwave oven is connected to an outlet, 120 V, and draws a current
|
||||
of 2 amps. At what rate is energy being used by the microwave oven?
|
||||
|
||||
(A) 10 W (B) 30 W (C) 60 W (D) 240 W'
|
||||
target: Let's think step by step. Rate of energy usage is known as power; in a
|
||||
dissipative electrical circuit, power is given by voltage times current. So
|
||||
in our case, the power is 120 V times 2 amps, or 240 W. The answer is (D).
|
||||
- question: "A point charge, Q = +1 mC, is fixed at the origin. How much work is required\
|
||||
\ to move a charge, Q = +8 \xB5C, from the point (0, 4 meters) to the point\
|
||||
\ (3 meters, 0)?\n(A) 3.5 J (B) 6.0 J (C) 22.5 J (D) 40 J"
|
||||
target: "Let's think step by step. To calculate the work required to move a charge\
|
||||
\ from one location to another in a fixed electric field, it is enough to calculate\
|
||||
\ the potential difference between the two locations. Here, the potential only\
|
||||
\ depends on the distance between the charges; it\u2019s $k q_1 q_2 / r$, where\
|
||||
\ $k$ is Coulomb\u2019s constant. Plugging in values $q_1 = $ 1 mC, $q_2 = 8\
|
||||
\ \\mu$ C, and the distances $r = 4$ m (start) and $r = 3$ m (end) gives a difference of 5.992 J, which rounds to 6 J. The answer is (B)."
|
||||
- question: 'Which of the following conditions will ensure that angular momentum is
|
||||
conserved? I. Conservation of linear momentum II. Zero net external force III.
|
||||
Zero net external torque
|
||||
|
||||
(A) I and II only (B) I and III only (C) II and III only (D) III only'
|
||||
target: Let's think step by step. Torque is defined as the rate of change of angular momentum;
|
||||
if there is zero external torque, angular momentum is conserved. The answer
|
||||
is (D).
|
||||
- question: "A photocell of work function \u03D5 = 2eV is connected to a resistor in\
|
||||
\ series. Light of frequency f = 1 \xD7 10^15 Hz hits a metal plate of the photocell.\
|
||||
\ If the power of the light is P = 100 W, what is the current through the resistor?\n\
|
||||
(A) 2:00 AM (B) 6:00 AM (C) 12:00 AM (D) 24 A"
|
||||
target: Let's think step by step. The only answer above which has units of current
|
||||
is D, 24 A. The answer is (D).
|
||||
- question: "A pipe full of air is closed at one end. A standing wave is produced in\
|
||||
\ the pipe, causing the pipe to sound a note. Which of the following is a correct\
|
||||
\ statement about the wave\u2019s properties at the closed end of the pipe?\n\
|
||||
(A) The pressure is at a node, but the particle displacement is at an antinode.\
|
||||
\ (B) The pressure is at an antinode, but the particle displacement is at a\
|
||||
\ node. (C) The pressure and the particle displacement are both at nodes. (D)\
|
||||
\ The pressure and the particle displacement are both at antinodes."
|
||||
target: 'Let''s think step by step. At the closed end of the pipe, the particles
|
||||
cannot have any net displacement because the pipe closure stops them. So the
|
||||
particle displacement is at a node. This closure also causes the pressure to
|
||||
be maximal, i.e. an antinode. The answer is (B).'
|
||||
tag: mmlu_flan_cot_fewshot_stem
|
||||
include: _mmlu_flan_cot_fewshot_template_yaml
|
||||
task: mmlu_flan_cot_fewshot_high_school_physics
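The point-charge sample in the high_school_physics file above states its result (5.992 J) without showing the distances it plugs in. The following is a minimal sketch of that arithmetic, not part of the dataset or the evaluation harness. The function names are hypothetical, and the value used for Coulomb's constant (8.988e9 N·m²/C²) is an assumption chosen because it reproduces the figure quoted in the sample.

```python
import math

# Assumed rounded value of Coulomb's constant, in N*m^2/C^2.
K = 8.988e9


def potential_energy(q1, q2, r):
    """Electrostatic potential energy of two point charges a distance r apart."""
    return K * q1 * q2 / r


def work_to_move(q_fixed, q_moved, start, end):
    """Work needed to move q_moved from start to end, with q_fixed at the origin."""
    r_start = math.hypot(*start)  # distance of the start point from the origin
    r_end = math.hypot(*end)      # distance of the end point from the origin
    return (potential_energy(q_fixed, q_moved, r_end)
            - potential_energy(q_fixed, q_moved, r_start))


# Values from the sample: 1 mC at the origin, 8 microcoulombs moved
# from (0, 4 m) to (3 m, 0).
work = work_to_move(1e-3, 8e-6, (0.0, 4.0), (3.0, 0.0))
print(f"{work:.3f} J")
```

The work is positive because the like charges end up closer together (3 m instead of 4 m), which matches choice (B) after rounding.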
|
||||
@@ -0,0 +1,64 @@
|
||||
dataset_name: high_school_psychology
|
||||
description: The following are multiple choice questions (with answers) about high
|
||||
school psychology.
|
||||
fewshot_config:
|
||||
sampler: first_n
|
||||
samples:
|
||||
- question: 'Pascale is interested in the processing strategies children use to learn
|
||||
new information. Pascale would best be classified as what type of psychologist?
|
||||
|
||||
(A) sociocultural (B) clinical (C) cognitive (D) behaviorist'
|
||||
target: Let's think step by step. We refer to Wikipedia articles on psychology
|
||||
for help. A sociocultural psychologist focuses on the effect of societal factors
|
||||
on people. A clinical psychologist focuses on people with mental health issues. A cognitive
|
||||
psychologist focuses on how people think and learn, including the processing
|
||||
strategies. A behaviorist focuses more on the effects of environment and experience
|
||||
on people. The answer is (C).
|
||||
- question: 'According to Caplan''s model of consultee-centered case consultation,
|
||||
the consultant is primarily interested in
|
||||
|
||||
(A) identifying the causes and solutions of the client''s presenting problems
|
||||
(B) identifying and eliminating the causes of the consultee''s difficulties
|
||||
in handling a problem (C) establishing a hierarchy of authority to enable effective
|
||||
decision making (D) presenting a single, well-defined and unambiguous course
|
||||
of action for the consultant to overcome skills deficits'
|
||||
target: Let's think step by step. We refer to Wikipedia articles on psychology
|
||||
for help. Caplan defines two types of consultation. Client-centered case consultation
|
||||
aims to handle the client's problems, while consultee-centered case consultation
|
||||
aims to identify the reasons for the consultee's difficulty in solving problems. The answer
|
||||
is (B).
|
||||
- question: 'According to the Individuals with Disabilities Education Improvement Act,
|
||||
which of the following must an educational agency do before it changes the educational
|
||||
placement of a student with a disability?
|
||||
|
||||
(A) Give the child a trial period in the new environment (B) Notify the parents
|
||||
in writing (C) Obtain school board approval (D) Obtain parental consent'
|
||||
target: Let's think step by step. We refer to Wikipedia articles on psychology
|
||||
for help. When the decision to change the educational placement of a student
|
||||
with a disability is made, the educational agency must notify the parents in
|
||||
writing on that date. The answer is (B).
|
||||
- question: 'While swimming in the ocean, Ivan is frightened by a dark shadow in the
|
||||
water even before he has the chance to identify what the shadow is. The synaptic
|
||||
connections taking place during this incident of fright are best described by
|
||||
which of the following?
|
||||
|
||||
(A) Messages are sent from the thalamus directly to the amygdala. (B) Messages
|
||||
are sent from the thalamus to the "what" and "where" pathways. (C) Messages
|
||||
are sent from the parasympathetic nervous system to the cerebral cortex. (D)
|
||||
Messages are sent from the frontal lobes to the pituitary gland.'
|
||||
target: Let's think step by step. We refer to Wikipedia articles on psychology
|
||||
for help. Our neural system has a mechanism that can respond to an immediate emotional
|
||||
signal before it reaches the thought center. In Ivan's case, messages travel
|
||||
directly from thalamus to amygdala. The answer is (A).
|
||||
- question: 'Ani believes that her attitudes and behavior play a central role in what
|
||||
happens to her. Such a belief is likely to be associated with
|
||||
|
||||
(A) a strong superego. (B) low self-esteem. (C) low self-efficacy. (D) an internal
|
||||
locus of control.'
|
||||
target: 'Let''s think step by step. We refer to Wikipedia articles on psychology
|
||||
for help. People with an external locus of control believe fate and luck play
|
||||
an important role in their lives, while people with an internal locus of control
|
||||
believe they control their lives. The answer is (D).'
|
||||
tag: mmlu_flan_cot_fewshot_social_sciences
|
||||
include: _mmlu_flan_cot_fewshot_template_yaml
|
||||
task: mmlu_flan_cot_fewshot_high_school_psychology
|
||||
@@ -0,0 +1,81 @@
|
||||
dataset_name: high_school_statistics
|
||||
description: The following are multiple choice questions (with answers) about high
|
||||
school statistics.
|
||||
fewshot_config:
|
||||
sampler: first_n
|
||||
samples:
|
||||
- question: 'A new smartwatch is manufactured in one part of a factory, then secured
|
||||
for shipping in another, independent part of the factory. The weight of the
|
||||
smartwatch has a mean of 62 grams and a standard deviation of 1.0 grams. The
|
||||
weight of the packaging (box, user''s guide, bubble wrap, etc.) has a mean of
|
||||
456 grams and a standard deviation of 6 grams. Together, the distribution of
|
||||
the weight of the smartwatch and its packaging would have the following mean
|
||||
and standard deviation:
|
||||
|
||||
(A) Mean 518 grams; standard deviation 7.0 grams (B) Mean 518 grams; standard
|
||||
deviation 3.5 grams (C) Mean 518 grams; standard deviation 6.1 grams (D) Mean
|
||||
394 grams; standard deviation 6.1 grams'
|
||||
target: Let's think step by step. Since the weight of the watch and the weight
|
||||
of the packaging are independent random variables, the mean and variance of
|
||||
their sum are equal to the sums of their individual means and variances. So the
|
||||
mean is 62 + 456 = 518 grams, and the variance is 1.0^2 + 6.0^2 = 37, leading
|
||||
to a standard deviation of 6.1 grams. The answer is (C).
|
||||
- question: 'After a frost warning was issued, the owner of a large orange grove asked
|
||||
his workers to spray all his trees with water. The water was supposed to freeze
|
||||
and form a protective covering of ice around the orange blossom. Nevertheless,
|
||||
the owner suspected that some trees suffered considerable damage due to the
|
||||
frost. To estimate the proportion of trees that suffered more than 50 percent
|
||||
damage due to the frost, he took a random sample of 100 trees from his grove.
|
||||
What is the response variable in this experiment?
|
||||
|
||||
(A) The proportion of trees that suffered more than 50 percent damage due to
|
||||
frost. (B) The number of trees affected by the frost. (C) The number of trees
|
||||
sampled from the grove. (D) For each sampled tree, whether it suffered more
|
||||
than 50 percent damage or at most 50 percent damage.'
|
||||
target: Let's think step by step. In this experiment, the response variable is
|
||||
what is measured. For each tree, what is measured is whether or not it suffered
|
||||
more than 50 percent damage due to the frost. The answer is (D).
|
||||
- question: 'Suppose X and Y are random variables with E(X) = 37, var(X) = 5, E(Y)
|
||||
= 62, and var(Y) = 12. What are the expected value and variance of the random
|
||||
variable X + Y?
|
||||
|
||||
(A) E(X + Y) = 99, var(X + Y) = 8.5 (B) E(X + Y) = 99, var(X + Y) = 13 (C) E(X
|
||||
+ Y) = 99, var(X + Y) = 17 (D) There is insufficient information to answer this
|
||||
question.'
|
||||
target: Let's think step by step. While means of sums of random variables add
|
||||
(regardless of whether the variables are independent), in order to determine
|
||||
the variance of a sum of random variables, we need to know not just their individual
|
||||
variances but the covariance of the two variables, which is not given in this
|
||||
problem. The answer is (D).
|
||||
- question: 'Which of the following sets has the smallest standard deviation? Which
|
||||
has the largest?
|
||||
|
||||
I: {1,2,3}
|
||||
|
||||
II: {-10,10}
|
||||
|
||||
III: {100}
|
||||
|
||||
(A) I, II (B) II, III (C) III, I (D) III, II'
|
||||
target: Let's think step by step. The variance of distribution I is the expected
|
||||
squared deviation from its mean (which is 2), so the variance is 2/3. The variance
|
||||
of distribution II is 10^2 (because both elements are 10 away from the mean
|
||||
of zero). The variance of distribution III is 0, since it has a single entry.
|
||||
So distribution III has the smallest standard deviation and distribution II
|
||||
has the largest. The answer is (D).
|
||||
- question: 'Which of the following is a correct statement about correlation?
|
||||
|
||||
(A) If the slope of the regression line is exactly 1, then the correlation is
|
||||
exactly 1. (B) If the correlation is 0, then the slope of the regression line
|
||||
is undefined. (C) Switching which variable is called x and which is called y
|
||||
changes the sign of the correlation. (D) The correlation r is equal to the slope
|
||||
of the regression line when z-scores for the y-variable are plotted against
|
||||
z-scores for the x-variable.'
|
||||
target: 'Let''s think step by step. Statement A is false because the slope of
|
||||
the regression line being exactly 1 can occur even when the two variables are
|
||||
not perfectly correlated. Statement B is false because, for uncorrelated variables,
|
||||
the regression line has slope zero, which is well defined. Statement C is false because correlation
|
||||
is symmetric in the two random variables. The answer is (D).'
|
||||
tag: mmlu_flan_cot_fewshot_stem
|
||||
include: _mmlu_flan_cot_fewshot_template_yaml
|
||||
task: mmlu_flan_cot_fewshot_high_school_statistics
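Two of the high_school_statistics samples above rest on short calculations: the standard deviation of a sum of independent weights, and the comparison of variances across three small sets. The sketch below reproduces both with the Python standard library. It is an illustrative check only; the helper name `combine_independent` and the variable names are hypothetical and do not come from the dataset or the evaluation harness.

```python
import math
from statistics import pvariance


def combine_independent(mean_a, sd_a, mean_b, sd_b):
    """Mean and standard deviation of A + B when A and B are independent."""
    total_mean = mean_a + mean_b
    # Variances add for independent variables; standard deviations do not.
    total_sd = math.sqrt(sd_a ** 2 + sd_b ** 2)
    return total_mean, total_sd


# Smartwatch (mean 62 g, sd 1.0 g) plus packaging (mean 456 g, sd 6 g).
total_mean, total_sd = combine_independent(62, 1.0, 456, 6.0)
print(total_mean, round(total_sd, 1))

# Population variance of each set from the "smallest/largest" sample.
data_sets = {"I": [1, 2, 3], "II": [-10, 10], "III": [100]}
variances = {name: pvariance(values) for name, values in data_sets.items()}
smallest = min(variances, key=variances.get)
largest = max(variances, key=variances.get)
print(smallest, largest)
```

The population variance (`pvariance`) is used rather than the sample variance because the samples treat each set as a complete distribution, and because a single-element set such as III has no defined sample variance.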
|
||||
@@ -0,0 +1,156 @@
|
||||
dataset_name: high_school_us_history
|
||||
description: The following are multiple choice questions (with answers) about high
|
||||
school us history.
|
||||
fewshot_config:
|
||||
sampler: first_n
|
||||
samples:
|
||||
- question: "This question refers to the following information.\nI come not to urge\
|
||||
\ personal claims, nor to seek individual benefits; I appear as the advocate\
|
||||
\ of those who cannot plead their own cause; I come as the friend of those who\
|
||||
\ are deserted, oppressed, and desolate. In the Providence of God, I am the\
|
||||
\ voice of the maniac whose piercing cries from the dreary dungeons of your\
|
||||
\ jails penetrate not your Halls of Legislation. I am the Hope of the poor crazed\
|
||||
\ beings who pine in the cells, and stalls, and cages, and waste rooms of your\
|
||||
\ poor-houses. I am the Revelation of hundreds of wailing, suffering creatures,\
|
||||
\ hidden in your private dwellings, and in pens and cabins\u2014shut out, cut\
|
||||
\ off from all healing influences, from all mind-restoring cares.\u2026 Could\
|
||||
\ their melancholy histories be spread before you as revealed to my grieved\
|
||||
\ spirit during the last three months, how promptly, how earnestly would you\
|
||||
\ search out the most approved means of relief; how trifling, how insignificant,\
|
||||
\ by comparison, would appear the sacrifices you are asked to make; how would\
|
||||
\ a few dimes and dollars, gathered from each citizen, diminish in value as\
|
||||
\ a possession, compared with the certain benefits and vast good to be secured\
|
||||
\ for the suffering insane...by the consecration and application of a sufficient\
|
||||
\ fund to the construction of a suitable hospital.\u2026\n\u2014Dorothea Dix,\
|
||||
\ Memorial Soliciting a State Hospital for the Protection and Cure of the Insane,\n\
|
||||
Submitted to the General Assembly of North Carolina, November 1848\nDorothea\
|
||||
\ Dix can best be compared to whom?\n(A) Abigail Adams (B) Clara Barton (C)\
|
||||
\ Shirley Temple (D) Hillary Clinton"
|
||||
target: Let's think step by step. We refer to Wikipedia articles on us history
|
||||
for help. Both Dorothea Dix and Clara Barton were American nurses. The answer
|
||||
is (B).
|
||||
- question: "This question refers to the following information.\n\"As our late Conduct\
|
||||
\ at the Conestoga Manor and Lancaster have occasioned much Speculation & a\
|
||||
\ great diversity of Sentiments in this and neighboring Governments; some vindicating\
|
||||
\ & others condemning it; some charitably alleviating the Crime, & others maliciously\
|
||||
\ painting it in the most odious & detestable Colours, we think it our duty\
|
||||
\ to lay before the Publick, the whole Matter as it appeared, & still appears,\
|
||||
\ to us. . . .\n\"If these things are not sufficient to prove an unjustifiable\
|
||||
\ Attachment in the Quakers to the Indians Savages, a fixed Resolution to befriend\
|
||||
\ them & an utter insensibility to human Distresses, let us consider a few more\
|
||||
\ recent Facts. When we found the last Summer that we were likely to get no\
|
||||
\ Assistance from the Government, some Volunteers went out at our own Expense,\
|
||||
\ determined to drive our Enemies from our Borders; & when we came near to the\
|
||||
\ great Island, we understood that a Number of their Warriors had gone out against\
|
||||
\ our Frontiers. Upon this we returned and came up with them and fought with\
|
||||
\ them at the Munfey Hill where we lost some of our Men & killed some of their\
|
||||
\ Warriors & thereby saved our Frontiers from this Story in another Expedition.\
|
||||
\ But no sooner had we destroyed their Provisions on the great Island, & ruined\
|
||||
\ their trade with the good People at Bethlehem, but these very Indians, who\
|
||||
\ were justly suspected of having murdered our Friends in Northampton County,\
|
||||
\ were by the Influence of some Quakers taken under the Protection of the Government\
|
||||
\ to screen them from the Resentments of the Friends and Relations of the Murdered,\
|
||||
\ & to support them thro the Winter.\"\n\u2014\"Apology of the Paxton Boys\"\
|
||||
\ (pamphlet), 1764 (Note: \"apology\" in this context should be read as an explanation,\
|
||||
\ not an admission of guilt or regret.)\nThe sentiments expressed in the explanation\
|
||||
\ above reflect which of the ongoing tensions during the colonial period of\
|
||||
\ American history?\n(A) Tensions between British policies and the aspirations\
|
||||
\ of North American colonists. (B) Tensions between American Indians allied\
|
||||
\ with the French and those allied with the British. (C) Tensions between freed\
|
||||
\ African Americans and white planters. (D) Tensions between backcountry settlers\
|
||||
\ and elites within colonial America."
|
||||
target: Let's think step by step. We refer to Wikipedia articles on us history
|
||||
for help. After the French and Indian War, the Scotch-Irish settlers attacked
|
||||
American Indians. After the attacks on the Conestoga, about 250 Paxton Boys
|
||||
presented their grievances to the Pennsylvania legislature. As mentioned in the
|
||||
information, the Paxton Boys cited resentment at local elites. The answer is
|
||||
(D).
|
||||
- question: "This question refers to the following information.\nOur leaders talk about\
|
||||
\ stopping aggression from the north, but this was a struggle among groups of\
|
||||
\ Vietnamese until we intervened. We seem bent upon saving the Vietnamese from\
|
||||
\ Ho Chi Minh even if we have to kill them and demolish their country to do\
|
||||
\ it. As the native people survey bombed-out villages, women and children burned\
|
||||
\ by napalm, rice crops destroyed and cities overrun with our military personnel,\
|
||||
\ they are doubtless saying secretly of the Vietcong guerillas and of the American\
|
||||
\ forces, \"A plague on both your houses.\" \u2026 Stop the bombing, north and\
|
||||
\ south, end search and destroy offensive sweeps, and confine our military action\
|
||||
\ to holding operations on the ground. Bombing the north has failed to halt\
|
||||
\ or seriously check the flow of troops to the south and may, in fact, have\
|
||||
\ prompted a much greater war effort by Hanoi.\n\u2014Senator George McGovern,\
|
||||
\ \"The Lessons of Vietnam,\" April 25, 1967\nWhich of the following opinions\
|
||||
\ from the 1960s most directly reflects the perspective of George McGovern's\
|
||||
\ speech?\n(A) Americans must maximize their technological edge in Vietnam.\
|
||||
\ (B) American bombing in Vietnam is step by step leading to progress in the\
|
||||
\ war. (C) American bombing in Vietnam is a failure. (D) America must not give\
|
||||
\ in to defeatism about the war in Vietnam."
|
||||
target: Let's think step by step. We refer to Wikipedia articles on us history
|
||||
for help. "Stop the bombing" and "Bombing the north has failed to halt or seriously
|
||||
check the flow of troops to the south" indicate that the perspective of George
|
||||
McGovern's speech is that American bombing in Vietnam is a failure. The answer
|
||||
is (C).
|
||||
- question: "This question refers to the following information.\n\"In the new Code\
|
||||
\ of Laws which I suppose it will be necessary for you to make I desire you\
|
||||
\ would Remember the Ladies, and be more generous and favorable to them than\
|
||||
\ your ancestors. Do not put such unlimited power into the hands of the Husbands.\
|
||||
\ Remember all Men would be tyrants if they could. If particular care and attention\
|
||||
\ is not paid to the Ladies we are determined to foment a Rebellion, and will\
|
||||
\ not hold ourselves bound by any Laws in which we have no voice, or Representation.\"\
|
||||
\nAbigail Adams, in a letter to John Adams, 1776\n\"Special legislation for\
|
||||
\ woman has placed us in a most anomalous position. Women invested with the\
|
||||
\ rights of citizens in one section\u2014voters, jurors, office-holders\u2014\
|
||||
crossing an imaginary line, are subjects in the next. In some States, a married\
|
||||
\ woman may hold property and transact business in her own name; in others,\
|
||||
\ her earnings belong to her husband. In some States, a woman may testify against\
|
||||
\ her husband, sue and be sued in the courts; in others, she has no redress\
|
||||
\ in case of damage to person, property, or character. In case of divorce on\
|
||||
\ account of adultery in the husband, the innocent wife is held to possess no\
|
||||
\ right to children or property, unless by special decree of the court. But\
|
||||
\ in no State of the Union has the wife the right to her own person, or to any\
|
||||
\ part of the joint earnings of the co-partnership during the life of her husband.\
|
||||
\ In some States women may enter the law schools and practice in the courts;\
|
||||
\ in others they are forbidden. In some universities girls enjoy equal educational\
|
||||
\ advantages with boys, while many of the proudest institutions in the land\
|
||||
\ deny them admittance, though the sons of China, Japan and Africa are welcomed\
|
||||
\ there. But the privileges already granted in the several States are by no\
|
||||
\ means secure.\"\nSusan B. Anthony, \"Declaration of Rights for Women,\" July\
|
||||
\ 4, 1876\nThe sentiments expressed in the second excerpt by Susan B. Anthony\
|
||||
\ are most likely in support of\n(A) the Equal Rights Amendment (B) universal\
|
||||
\ suffrage (C) states' rights (D) prohibition"
|
||||
target: Let's think step by step. We refer to Wikipedia articles on us history
|
||||
for help. The above information mentioned that women are in an anomalous position
|
||||
in terms of legislation. In some states, women's earnings do not belong to them, or they
|
||||
cannot testify against their husbands. Anthony argues that women should have the same
|
||||
legal rights as men. The answer is (B).
|
||||
- question: 'This question refers to the following information.
|
||||
|
||||
"Society in every state is a blessing, but government even in its best state
|
||||
is but a necessary evil; in its worst state an intolerable one; for when we
|
||||
suffer, or are exposed to the same miseries by a government, which we might
|
||||
expect in a country without government, our calamity is heightened by reflecting
|
||||
that we furnish the means by which we suffer. Government, like dress, is the
|
||||
badge of lost innocence; the palaces of kings are built on the ruins of the
|
||||
bowers of paradise. For were the impulses of conscience clear, uniform, and
|
||||
irresistibly obeyed, man would need no other lawgiver; but that not being the
|
||||
case, he finds it necessary to surrender up a part of his property to furnish
|
||||
means for the protection of the rest; and this he is induced to do by the same
|
||||
prudence which in every other case advises him out of two evils to choose the
|
||||
least. Wherefore, security being the true design and end of government, it unanswerably
|
||||
follows that whatever form thereof appears most likely to ensure it to us, with
|
||||
the least expense and greatest benefit, is preferable to all others."
|
||||
|
||||
Thomas Paine, Common Sense, 1776
|
||||
|
||||
Which of the following "miseries" alluded to above were most condemned by Anti-Federalists
|
||||
of the post-Revolutionary era?
|
||||
|
||||
(A) Organized response to Bacon''s Rebellion (B) Federal response to Shays''s
|
||||
Rebellion (C) Federal response to the Whiskey Rebellion (D) Federal response
|
||||
to Pontiac''s Rebellion'
|
||||
target: 'Let''s think step by step. We refer to Wikipedia articles on us history
|
||||
for help. Anti-Federalists distrusted centralized government power and
|
||||
were suspicious of Washington''s military response to the Whiskey Rebellion. Bacon''s Rebellion
|
||||
and Pontiac''s Rebellion happened before the Revolution, so they can be ruled
|
||||
out. The answer is (C).'
|
||||
tag: mmlu_flan_cot_fewshot_humanities
|
||||
include: _mmlu_flan_cot_fewshot_template_yaml
|
||||
task: mmlu_flan_cot_fewshot_high_school_us_history
|
||||
@@ -0,0 +1,100 @@
|
||||
dataset_name: high_school_world_history
|
||||
description: The following are multiple choice questions (with answers) about high
|
||||
school world history.
|
||||
fewshot_config:
|
||||
sampler: first_n
|
||||
samples:
|
||||
- question: "This question refers to the following information.\n\"At least one of\
|
||||
\ the [world's] societies would have to somehow enormously increase its productivity\
|
||||
\ [in order to achieve global hegemony]. That quantum jump would have to be\
|
||||
\ made before the various scientific, technological, agricultural, and industrial\
|
||||
\ revolutions on which our post-quantum-leap world rests. It could only be accomplished\
|
||||
\ by exploiting the ecosystems, mineral resources, and human assets of whole\
|
||||
\ continents outside the lands of the society making the jump. Western Europe\
|
||||
\ did just that by means of its brutality and guns and, more important, by geographical\
|
||||
\ and ecological luck.\"\nCopyright \xA9 2015 Cambridge University Press.\n\
|
||||
Alfred Crosby, historian, Ecological Imperialism, 2004\nThe \"quantum jump\"\
|
||||
\ mentioned in the passage most directly contributed to which of the following\
|
||||
\ developments in the period 1450\u20131750 C.E.?\n(A) A breakdown in trade\
|
||||
\ routes through the collapse of the established state structure (B) An increase\
|
||||
\ in the population of the world through more plentiful supplies of food (C)\
|
||||
\ The spread of Chinese and Indian belief systems across the world (D) An increase\
|
||||
\ in social unrest"
|
||||
target: Let's think step by step. We refer to Wikipedia articles on world history
|
||||
for help. The "quantum jump" mentioned in the passage refers to the conquest
|
||||
of the New World and the Columbian Exchange. Choice (A) and (C) did not happen
|
||||
in history. Choice (C) refers to the human assets. The answer is (B).
|
||||
- question: "This question refers to the following information.\n\"The struggle against\
|
||||
\ neo-colonialism is not aimed at excluding the capital of the developed world\
|
||||
\ from operating in less developed countries. It is aimed at preventing the\
|
||||
\ financial power of the developed countries being used in such a way as to\
|
||||
\ impoverish the less developed.\nNon-alignment, as practiced by Ghana and many\
|
||||
\ other countries, is based on co-operation with all States whether they be\
|
||||
\ capitalist, socialist or have a mixed economy. Such a policy, therefore, involves\
|
||||
\ foreign investment from capitalist countries, but it must be invested in accordance\
|
||||
\ with a national plan drawn up by the government of the non-aligned State with\
|
||||
\ its own interests in mind. The issue is not what return the foreign investor\
|
||||
\ receives on his investments\u2026The question is one of power. A State in\
|
||||
\ the grip of neo-colonialism is not master of its own destiny.\"\nKwame Nkrumah,\
|
||||
\ Neo-Colonialism, 1965\nWhich of the following provides the best context for\
|
||||
\ Nkrumah's writings?\n(A) The Industrial Revolution (B) Decolonization (C)\
|
||||
\ Regional Free Trade Associations (D) Autarky"
|
||||
target: Let's think step by step. We refer to Wikipedia articles on world history
|
||||
for help. The passage expresses a point that the successful fight against neo-colonialism
|
||||
was in danger and that newly independent nations like Ghana might be re-colonized
|
||||
via financial power of the developed countries. The answer is (B).
|
||||
- question: "This question refers to the following information.\n\"Indeed, as both\
|
||||
\ the fatwas of distinguished [scholars] who base their opinion on reason and\
|
||||
\ tradition alike and the consensus of the Sunni community agree that the ancient\
|
||||
\ obligation of extirpation, extermination, and expulsion of evil innovation\
|
||||
\ must be the aim of our exalted aspiration, for \"Religious zeal is a victory\
|
||||
\ for the Faith of God the Beneficent\"; then, in accordance with the words\
|
||||
\ of the Prophet (Peace upon him!) \"Whosoever introduces evil innovation into\
|
||||
\ our order must be expelled\" and \"Whosoever does aught against our order\
|
||||
\ must be expelled,\" action has become necessary and exigent\u2026\"\nLetter\
|
||||
\ from Ottoman Sultan Selim I to Safavid Shah Ismail I, 1514\nThe letter from\
|
||||
\ Selim I is most clearly an example of which of the following?\n(A) The maintenance\
|
||||
\ of military supremacy at all costs (B) Expanding tensions between religious\
|
||||
\ sects (C) Factors that brought about the collapse of the Ottoman Empire (D)\
|
||||
\ Peacemaking efforts among the Islamic empires"
|
||||
target: Let's think step by step. We refer to Wikipedia articles on world history
|
||||
for help. The passage is an example of expanding tensions between Selim and
|
||||
Ismail. In the passage, Selim references the fatwas and the consensus of the
|
||||
Sunni community to oppose whosoever introduces evil innovation. The answer is (B).
|
||||
- question: 'This question refers to the following information.
|
||||
|
||||
"The real grievance of the worker is the insecurity of his existence; he is
|
||||
not sure that he will always have work, he is not sure that he will always be
|
||||
healthy, and he foresees that he will one day be old and unfit to work. If he
|
||||
falls into poverty, even if only through a prolonged illness, he is then completely
|
||||
helpless, left to his own devices, and society does not currently recognize
|
||||
any real obligation towards him beyond the usual help for the poor, even if
|
||||
he has been working all the time ever so faithfully and diligently. The usual
|
||||
help for the poor, however, leaves a lot to be desired, especially in large
|
||||
cities, where it is very much worse than in the country."
|
||||
|
||||
Otto von Bismarck, 1884
|
||||
|
||||
Otto von Bismarck likely made this speech in reaction to which of the following
|
||||
issues?
|
||||
|
||||
(A) Social acceptance of child labor (B) Declining life expectancy in Germany
|
||||
(C) Criticisms of German trade tariffs (D) Negative effects attributed to industrial
|
||||
capitalism'
|
||||
target: Let's think step by step. We refer to Wikipedia articles on world history
|
||||
for help. The passage talks about the grievances of workers under industrial
|
||||
capitalism. The answer is (D).
|
||||
- question: "This question refers to the following information.\nHe contains all works\
|
||||
\ and desires and all perfumes and all tastes. He enfolds the whole universe\
|
||||
\ and in silence is loving to all. This is the Spirit that is in my heart, this\
|
||||
\ is Brahman. To him I shall come when I go beyond this life, and to him will\
|
||||
\ come he who has faith and doubts not.\n\u2014The Upanishads, India, c. 1000\
|
||||
\ BCE\nTo which religion does the speaker most likely belong?\n(A) Hinduism\
|
||||
\ (B) Buddhism (C) Shintoism (D) Zoroastrianism"
|
||||
target: 'Let''s think step by step. We refer to Wikipedia articles on world history
|
||||
for help. Brahman refers to the ultimate reality of all things in the Hindu
|
||||
religion. In contrast, Buddhism does not have a concept of supreme God. The
|
||||
answer is (A).'
|
||||
tag: mmlu_flan_cot_fewshot_humanities
|
||||
include: _mmlu_flan_cot_fewshot_template_yaml
|
||||
task: mmlu_flan_cot_fewshot_high_school_world_history
|
||||
42
evaluation/data/mmlu/flan_cot_fewshot/mmlu_human_aging.yaml
Normal file
@@ -0,0 +1,42 @@
|
||||
dataset_name: human_aging
|
||||
description: The following are multiple choice questions (with answers) about human
|
||||
aging.
|
||||
fewshot_config:
|
||||
sampler: first_n
|
||||
samples:
|
||||
- question: 'All other things being equal, which of the following persons is more likely
|
||||
to show osteoporosis?
|
||||
|
||||
(A) An older Hispanic American woman (B) An older African American woman (C)
|
||||
An older Asian American woman (D) An older Native American woman'
|
||||
target: Let's think step by step. We refer to Wikipedia articles on human aging
|
||||
for help. Although osteoporosis can occur at any age, the risk is higher for
|
||||
older people. It is most common in Asian and non-Hispanic white women. The answer
|
||||
is (C).
|
||||
- question: 'The finding that adults tend to remember events from their adolescence
|
||||
better than from other periods in their lives is referred to as the
|
||||
|
||||
(A) Adolescence advantage (B) Reminiscence bump (C) Memorial memorial (D) Quadratic
|
||||
retrieval spike'
|
||||
target: Let's think step by step. We refer to Wikipedia articles on human aging
|
||||
for help. The reminiscence bump is the phenomenon in which older adults tend to recollect
|
||||
events from their younger years. People usually have a period of childhood amnesia
|
||||
from birth to around age 5, and a reminiscence bump between 10 and 30. The answer
|
||||
is (B).
|
||||
- question: 'Which element in tobacco smoke is responsible for cancers?
|
||||
|
||||
(A) Nicotine (B) Tar (C) Carbon monoxide (D) Smoke particles'
|
||||
target: Let's think step by step. We refer to Wikipedia articles on human aging
|
||||
for help. The benzene, acrylamide and acrylonitrile in tar interact with the
|
||||
lungs and cause DNA mutations in cells of the lungs, and lead to cancer. The
|
||||
answer is (B).
|
||||
- question: 'When older adults move to a new state after retirement, which of the following
|
||||
is the more likely destination?
|
||||
|
||||
(A) Texas (B) California (C) Hawaii (D) Vermont'
|
||||
target: 'Let''s think step by step. We refer to Wikipedia articles on human aging
|
||||
for help. Texas does not have a state income tax, and has a low cost of living compared
|
||||
with the other three options. The answer is (A).'
|
||||
tag: mmlu_flan_cot_fewshot_other
|
||||
include: _mmlu_flan_cot_fewshot_template_yaml
|
||||
task: mmlu_flan_cot_fewshot_human_aging
|
||||
@@ -0,0 +1,50 @@
|
||||
dataset_name: human_sexuality
|
||||
description: The following are multiple choice questions (with answers) about human
|
||||
sexuality.
|
||||
fewshot_config:
|
||||
sampler: first_n
|
||||
samples:
|
||||
- question: 'The most common disorder among men who seek sexual therapy is:
|
||||
|
||||
(A) premature ejaculation (B) inhibited ejaculation (C) erectile disorder (D)
|
||||
ejaculatory disorder'
|
||||
target: Let's think step by step. We refer to Wikipedia articles on human sexuality
|
||||
for help. The most common type of arousal disorder for men is erectile dysfunction,
|
||||
meaning that a person is not able to get any physical satisfaction from sexual
|
||||
activity although he may be interested in it. The answer is (C).
|
||||
- question: 'A woman who knows she has active herpes and untreated syphilis but continues
|
||||
to have sex without informing her partners of her condition has, in psychoanalytic
|
||||
terms:
|
||||
|
||||
(A) a strong ego (B) a weak superego (C) a weak id (D) a strong superego'
|
||||
target: Let's think step by step. We refer to Wikipedia articles on human sexuality
|
||||
for help. A person with weak superego tends to be delinquent, criminal or have
|
||||
antisocial personality. The action of the woman who knows she has active venereal
|
||||
disease but still has sex with her partners indicates she may have an antisocial
|
||||
personality. The answer is (B).
|
||||
- question: 'The nature of homosexual activities that occur during preadolescence include
|
||||
all but which of the following?
|
||||
|
||||
(A) sexual intercourse (B) circle jerks (C) exhibitionism (D) touching each
|
||||
other''s genitals'
|
||||
target: Let's think step by step. We refer to Wikipedia articles on human sexuality
|
||||
for help. Sexual intercourse is prohibited during preadolescence. The answer
|
||||
is (A).
|
||||
- question: 'Women''s ability to have multiple orgasms is primarily due to:
|
||||
|
||||
(A) the fact that they do not have a refractory period. (B) the response of
|
||||
the inner layers of the vagina. (C) having alternating orgasms in different
|
||||
locations. (D) the G-Spot.'
|
||||
target: Let's think step by step. We refer to Wikipedia articles on human sexuality
|
||||
for help. The refractory period is the time when a person is not able to be
|
||||
erect or is not interested in sex. The answer is (A).
|
||||
- question: 'Morning sickness is typically a problem:
|
||||
|
||||
(A) during the first trimester (B) during the second trimester (C) during the
|
||||
third trimester (D) all through the pregnancy'
|
||||
target: 'Let''s think step by step. We refer to Wikipedia articles on human sexuality
|
||||
for help. Morning sickness usually begins by nine weeks after conception, corresponding
|
||||
to the first trimester. The answer is (A).'
|
||||
tag: mmlu_flan_cot_fewshot_social_sciences
|
||||
include: _mmlu_flan_cot_fewshot_template_yaml
|
||||
task: mmlu_flan_cot_fewshot_human_sexuality
|
||||
@@ -0,0 +1,70 @@
|
||||
dataset_name: international_law
|
||||
description: The following are multiple choice questions (with answers) about international
|
||||
law.
|
||||
fewshot_config:
|
||||
sampler: first_n
|
||||
samples:
|
||||
- question: 'How the consent to be bound of a State may be expressed?
|
||||
|
||||
(A) The consent of a State to be bound is expressed only by ratification (B)
|
||||
The consent of a state to be bound by a treaty may be expressed by signature,
|
||||
ratification, acceptance, approval or accession (C) The consent of a State to
|
||||
be bound is expressed by signature (D) The consent of a State to be bound is
|
||||
expressed by whatever means they choose'
|
||||
target: Let's think step by step. We refer to Wikipedia articles on international
|
||||
law for help. Article 11 of Vienna Convention on the Law of Treaties signed
|
||||
in 1969 states that "the consent of a State to be bound by a treaty may be expressed
|
||||
by signature, exchange of instruments constituting a treaty, ratification, acceptance,
|
||||
approval or accession, or by any other means if so agreed." (B) is the most
|
||||
precise and accurate answer. The answer is (B).
|
||||
- question: 'What is the judge ad hoc?
|
||||
|
||||
(A) If a party to a contentious case before the ICJ does not have a national
|
||||
sitting as judge, it is entitled to nominate someone as a judge solely for that
|
||||
case, with the title of judge ad hoc (B) Judge ad hoc is the member of the bench
|
||||
of the ICJ with a casting vote (C) Judge ad hoc is a surrogate judge, in case
|
||||
a judge is disqualified or passes away (D) Judge ad hoc is the judge that each
|
||||
party will always nominate in every contentious case'
|
||||
target: Let's think step by step. We refer to Wikipedia articles on international
|
||||
law for help. As "ad hoc" implies, a judge ad hoc is appointed only for a specific
|
||||
case or period, when a party to a contentious case before the International
|
||||
Court of Justice does not have a regular national sitting as judge. The answer
|
||||
is (A).
|
||||
- question: 'When ''consent'' can serve as a circumstance precluding the wrongfulness
|
||||
of a State conduct?
|
||||
|
||||
(A) Consent can serve as a circumstance precluding the wrongfulness whenever
|
||||
it is given (B) Consent can never serve as a circumstance precluding wrongfulness
|
||||
(C) Consent can serve as a circumstance precluding wrongfulness, provided the
|
||||
consent is valid and to the extent that the conduct remains within the limits
|
||||
of the consent given (D) Consent can always serve as a circumstance precluding
|
||||
wrongfulness, no matter which organ of the State gives it'
|
||||
target: Let's think step by step. We refer to Wikipedia articles on international
|
||||
law for help. Valid consent can serve as a circumstance precluding the wrongfulness
|
||||
of a State conduct if the conduct remains within the limits of that consent,
|
||||
according to Chapter V of the Responsibility of States for Internationally Wrongful
|
||||
Acts, 2001, United Nations. The answer is (C).
|
||||
- question: 'Would a reservation to the definition of torture in the ICCPR be acceptable
|
||||
in contemporary practice?
|
||||
|
||||
(A) This is an acceptable reservation if the reserving country''s legislation
|
||||
employs a different definition (B) This is an unacceptable reservation because
|
||||
it contravenes the object and purpose of the ICCPR (C) This is an unacceptable
|
||||
reservation because the definition of torture in the ICCPR is consistent with
|
||||
customary international law (D) This is an acceptable reservation because under
|
||||
general international law States have the right to enter reservations to treaties'
|
||||
target: Let's think step by step. We refer to Wikipedia articles on international
|
||||
law for help. Because it contravenes the object and purpose of the ICCPR, this is
|
||||
an unacceptable reservation in contemporary practice. The answer is (B).
|
||||
- question: 'What types of force does Article 2(4) of the UN Charter prohibit?
|
||||
|
||||
(A) Article 2(4) encompasses only armed force (B) Article 2(4) encompasses all
|
||||
types of force, including sanctions (C) Article 2(4) encompasses all interference
|
||||
in the domestic affairs of States (D) Article 2(4) encompasses force directed
|
||||
only against a State''s territorial integrity'
|
||||
target: 'Let''s think step by step. We refer to Wikipedia articles on international
|
||||
law for help. Article 2(4) of the UN Charter prohibits states from using armed
|
||||
force in their international relations. The answer is (A).'
|
||||
tag: mmlu_flan_cot_fewshot_humanities
|
||||
include: _mmlu_flan_cot_fewshot_template_yaml
|
||||
task: mmlu_flan_cot_fewshot_international_law
|
||||
@@ -0,0 +1,59 @@
|
||||
dataset_name: jurisprudence
|
||||
description: The following are multiple choice questions (with answers) about jurisprudence.
|
||||
fewshot_config:
|
||||
sampler: first_n
|
||||
samples:
|
||||
- question: 'Iverson Jewelers wrote a letter to Miller, ''We have received an exceptionally
|
||||
fine self winding Rolox watch which we will sell to you at a very favorable
|
||||
price.''
|
||||
|
||||
(A) The letter is an offer to sell (B) A valid offer cannot be made by letter.
|
||||
(C) The letter contains a valid offer which will terminate within a reasonable
|
||||
time. (D) The letter lacks one of the essential elements of an offer.'
|
||||
target: Let's think step by step. We refer to Wikipedia articles on jurisprudence
|
||||
for help. An offer shows the intent to enter into a mutually-beneficial contract
|
||||
with specific terms. An offer can be made by a letter. While this letter indicates
|
||||
the willingness to sell, the lack of specific terms, such as transaction price
|
||||
and offer expiration date, makes it an incomplete offer. The answer is (D).
|
||||
- question: 'Functions of the law include all but which of the following?
|
||||
|
||||
(A) maximizing individual freedom (B) providing a basis for compromise (C) keeping
|
||||
the peace (D) promoting the principles of the free enterprise system'
|
||||
target: Let's think step by step. We refer to Wikipedia articles on jurisprudence
|
||||
for help. Laws are fundamentally about helping resolve disputes between individuals,
|
||||
and therefore essential for maximizing individual freedom, providing a basis
|
||||
for compromise, and keeping the peace. The answer is (D).
|
||||
- question: 'The ________ School of jurisprudence postulates that the law is based
|
||||
on what is "correct."
|
||||
|
||||
(A) Natural Law (B) Analytical (C) Historical (D) Sociological'
|
||||
target: Let's think step by step. We refer to Wikipedia articles on jurisprudence
|
||||
for help. Natural Law School of jurisprudence focuses on the laws of nature,
|
||||
and states that the law should be based on ethics, morals, and what is "correct".
|
||||
Analytical deals with the law as it already exists, Historical postulates that
|
||||
the law was found and not made, and Sociological studies how the law and society
|
||||
impact each other. The answer is (A).
|
||||
- question: 'Which word best summarizes Weber''s explanation of the development of
|
||||
formally rational law?
|
||||
|
||||
(A) Authority. (B) Charisma. (C) Co-operation. (D) Capitalism.'
|
||||
target: Let's think step by step. We refer to Wikipedia articles on jurisprudence
|
||||
for help. Weber explained the development of formal rationality in laws as how
|
||||
the modern society moved from tradition to rationality, where people decide
|
||||
actions based less on how they were culturally done and more on expected utilities.
|
||||
How rational individuals optimize efficiency of accomplishing tasks for higher
|
||||
rewards is a core principle of Capitalism. The answer is (D).
|
||||
- question: 'Which position does Rawls claim is the least likely to be adopted by the
|
||||
POP (people in the original position)?
|
||||
|
||||
(A) The POP would choose equality above liberty. (B) The POP would opt for the
|
||||
''maximin'' strategy. (C) The POP would opt for the ''difference principle''.
|
||||
(D) The POP would reject the ''system of natural liberty.'''
|
||||
target: 'Let''s think step by step. We refer to Wikipedia articles on jurisprudence
|
||||
for help. The POP would opt for the ''maximin'' strategy, opt for the ''difference
|
||||
principle'', and reject the ''system of natural liberty'', but the POP would
|
||||
not choose equality above liberty, since the POP assume both equal and free
|
||||
citizens. The answer is (A).'
|
||||
tag: mmlu_flan_cot_fewshot_humanities
|
||||
include: _mmlu_flan_cot_fewshot_template_yaml
|
||||
task: mmlu_flan_cot_fewshot_jurisprudence
|
||||
@@ -0,0 +1,61 @@
|
||||
dataset_name: logical_fallacies
|
||||
description: The following are multiple choice questions (with answers) about logical
|
||||
fallacies.
|
||||
fewshot_config:
|
||||
sampler: first_n
|
||||
samples:
|
||||
- question: 'When an arguer causes confusion during refutation because of real or feigned
|
||||
lack of an ability to engage in refutation, that arguer may have committed the
|
||||
fallacy of
|
||||
|
||||
(A) poor sportsmanship (B) appeal to compassion (C) argument against the person
|
||||
(D) ignorance of refutation'
|
||||
target: Let's think step by step. We refer to Wikipedia articles on logical fallacies
|
||||
for help. Ignorance of refutation, one of Aristotle's original list of logical
|
||||
fallacies in his Organon, is when someone causes confusion in an argument through
|
||||
real or feigned inability to engage in refutation, in order to win the argument.
|
||||
The answer is (D).
|
||||
- question: 'The complex question fallacy consists of
|
||||
|
||||
(A) arguing something is inferior just because it doesn''t do something it was
|
||||
never intended to do. (B) including more than one claim in the proposition and
|
||||
treating proof for one claim as proof for all the claims. (C) drawing a conclusion
|
||||
before examining the evidence, and only considering evidence that supports that
|
||||
conclusion. (D) asking a question that includes either an unproven assumption
|
||||
or more than one question, thus making a straightforward yes or no answer meaningless.'
|
||||
target: Let's think step by step. We refer to Wikipedia articles on logical fallacies
|
||||
for help. The complex question fallacy is when someone makes a single yes or
|
||||
no answer to a question meaningless, by including either an unproven assumption
|
||||
or many questions. The latter is also known as the many questions fallacy. The
|
||||
answer is (D).
|
||||
- question: 'Arguing that what is true of the parts must be true of the whole is the
|
||||
fallacy of...
|
||||
|
||||
(A) Division (B) Composition (C) Appeal to the person (D) Appeal to ignorance'
|
||||
target: Let's think step by step. We refer to Wikipedia articles on logical fallacies
|
||||
for help. Fallacy of composition occurs when someone argues what is true of
|
||||
the parts must be true of the whole. The answer is (B).
|
||||
- question: 'Which of the following is true of a valid categorical syllogism?
|
||||
|
||||
(A) The minor premise must deny the antecedent (B) The major premise must affirm
|
||||
the consequent (C) The middle term must be used in at least one premise in a
|
||||
universal or unqualified sense (D) All of the above'
|
||||
target: 'Let''s think step by step. We refer to Wikipedia articles on logical
|
||||
fallacies for help. A valid categorical syllogism must satisfy several conditions:
|
||||
(1) the syllogism must have exactly three terms (2) every term of the syllogism
|
||||
must be used twice exactly, (3) a term may be used only once in any premise,
|
||||
and (4) the middle term must be used in at least one premise in a universal
|
||||
or unqualified sense, etc. Only (C) is true. The answer is (C).'
|
||||
- question: 'If someone attacks the character of an opposing arguer, instead of responding
|
||||
to that opponent''s arguments, the first person has probably committed which
|
||||
of the following fallacies?
|
||||
|
||||
(A) tu quoque (B) horse laugh (C) argument against the person (D) ignoratio
|
||||
elenchi'
|
||||
target: 'Let''s think step by step. We refer to Wikipedia articles on logical
|
||||
fallacies for help. The argument against the person fallacy occurs when someone
|
||||
irrelevantly attacks the character of an opposing arguer, instead of addressing
|
||||
that opponent''s arguments. The answer is (C).'
|
||||
tag: mmlu_flan_cot_fewshot_humanities
|
||||
include: _mmlu_flan_cot_fewshot_template_yaml
|
||||
task: mmlu_flan_cot_fewshot_logical_fallacies
|
||||
@@ -0,0 +1,74 @@
|
||||
dataset_name: machine_learning
|
||||
description: The following are multiple choice questions (with answers) about machine
|
||||
learning.
|
||||
fewshot_config:
|
||||
sampler: first_n
|
||||
samples:
|
||||
- question: 'Which image data augmentation is most common for natural images?
|
||||
|
||||
(A) random crop and horizontal flip (B) random crop and vertical flip (C) posterization
|
||||
(D) dithering'
|
||||
target: Let's think step by step. Data augmentation is used to increase the diversity
|
||||
of images in the training dataset. It is important that natural images are kept
|
||||
natural after being augmented. Vertical flips of images are not natural, so
|
||||
(B) is false. Posterization makes the image look like a poster and dithering
|
||||
increases color depth. Neither of these two preserves the natural property. The
|
||||
only natural data augmentation technique is (A). The answer is (A).
|
||||
- question: "Traditionally, when we have a real-valued input attribute during decision-tree\
|
||||
\ learning we consider a binary split according to whether the attribute is\
|
||||
\ above or below some threshold. Pat suggests that instead we should just have\
|
||||
\ a multiway split with one branch for each of the distinct values of the attribute.\
|
||||
\ From the list below choose the single biggest problem with Pat\u2019s suggestion:\n\
|
||||
(A) It is too computationally expensive. (B) It would probably result in a decision\
|
||||
\ tree that scores badly on the training set and a testset. (C) It would probably\
|
||||
\ result in a decision tree that scores well on the training set but badly on\
|
||||
\ a testset. (D) It would probably result in a decision tree that scores well\
|
||||
\ on a testset but badly on a training set."
|
||||
target: "Let's think step by step. Because the input is real-valued, it is unlikely\
|
||||
\ that the same values appear both at training and test time. This means that\
|
||||
\ while such a decision tree could yield good performance on the training data,\
|
||||
\ when evaluated on the test data it will perform badly because the decision\
|
||||
\ tree won\u2019t know what to do with numbers that did not appear in the training\
|
||||
\ data. The answer is (C)."
|
||||
- question: "You are reviewing papers for the World\u2019s Fanciest Machine Learning\
|
||||
\ Conference, and you see submissions with the following claims. Which ones\
|
||||
\ would you consider accepting?\n(A) My method achieves a training error lower\
|
||||
\ than all previous methods! (B) My method achieves a test error lower than\
|
||||
\ all previous methods! (Footnote: When regularisation parameter \u03BB is chosen\
|
||||
\ so as to minimise test error.) (C) My method achieves a test error lower than\
|
||||
\ all previous methods! (Footnote: When regularisation parameter \u03BB is chosen\
|
||||
\ so as to minimise cross-validation error.) (D) My method achieves a cross-validation\
|
||||
\ error lower than all previous methods! (Footnote: When regularisation parameter\
|
||||
\ \u03BB is chosen so as to minimise cross-validation error.)"
|
||||
target: "Let's think step by step. In machine learning, we train with some data\
|
||||
\ and fixed hyperparameters and the training error can be arbitrarily low, so\
|
||||
\ (A) can\u2019t be right. Then, one compares different hyperparameters by selecting\
|
||||
\ the model with the lowest cross-validation error, this means that (B) and\
|
||||
\ (D) are not the right procedure. The only relevant number after these is the\
|
||||
\ test error and thus (C) is the right answer. The answer is (C)."
|
||||
- question: 'A 6-sided die is rolled 15 times and the results are: side 1 comes up
|
||||
0 times; side 2: 1 time; side 3: 2 times; side 4: 3 times; side 5: 4 times;
|
||||
side 6: 5 times. Based on these results, what is the probability of side 3 coming
|
||||
up when using Add-1 Smoothing?
|
||||
|
||||
(A) 2.0/15 (B) 1.0/7 (C) 3.0/16 (D) 1.0/5'
|
||||
target: 'Let''s think step by step. Add-1 smoothing adds the value of one to the
|
||||
different counts and then normalizes the probabilities accordingly. The counts
|
||||
after adding one will be: side 1 comes up 1 time; side 2: 2 times; side 3: 3
|
||||
times; side 4: 4 times; side 5: 5 times; side 6: 6 times. The smoothed counts
|
||||
sum to 21, so the probability of rolling a three is 3/21 = 1/7.
|
||||
The answer is (B).'
|
||||
- question: 'To achieve an 0/1 loss estimate that is less than 1 percent of the true
|
||||
0/1 loss (with probability 95%), according to Hoeffding''s inequality the IID
|
||||
test set must have how many examples?
|
||||
|
||||
(A) around 10 examples (B) around 100 examples (C) between 100 and 500 examples
|
||||
(D) more than 1000 examples'
|
||||
target: "Let's think step by step. By Hoeffding\u2019s inequality, we expect\
|
||||
\ that with 95% probability the in-sample and out-of-sample errors differ by\
|
||||
\ epsilon when we have N samples if 2 exp(-2 epsilon^2 N)<0.05, this implies\
|
||||
\ that N > -1/(2 epsilon^2) log(0.05/2) = 5000 log(40) for epsilon = 0.01. Since log(40) > 1,\
|
||||
\ we have that one needs more than 1000 examples. The answer is (D).\n\n"
|
||||
tag: mmlu_flan_cot_fewshot_stem
|
||||
include: _mmlu_flan_cot_fewshot_template_yaml
|
||||
task: mmlu_flan_cot_fewshot_machine_learning
|
||||
46
evaluation/data/mmlu/flan_cot_fewshot/mmlu_management.yaml
Normal file
@@ -0,0 +1,46 @@
|
||||
dataset_name: management
|
||||
description: The following are multiple choice questions (with answers) about management.
|
||||
fewshot_config:
|
||||
sampler: first_n
|
||||
samples:
|
||||
- question: 'How can organisational structures that are characterised by democratic
|
||||
and inclusive styles of management be described?
|
||||
|
||||
(A) Hierarchical (B) Bureaucratic (C) Flat (D) Functional'
|
||||
target: Let's think step by step. We refer to Wikipedia articles on management
|
||||
for help. Flat organizational structures are characterized by democratic and
|
||||
inclusive styles of management, and have few (if any) levels of management between
|
||||
the workers and managers. The answer is (C).
|
||||
- question: 'Hygiene factors are associated with which writer?
|
||||
|
||||
(A) Frederick Hertzberg (B) D.C. McClelland (C) Abraham Maslow (D) Douglas McGregor'
|
||||
target: Let's think step by step. We refer to Wikipedia articles on management
|
||||
for help. Hygiene factors include compensation, company policies, supervision,
|
||||
interpersonal relations, and work environments. Hertzberg lists them as factors
|
||||
that cannot motivate employees but can minimize job dissatisfaction. The answer
|
||||
is (A).
|
||||
- question: 'What characteristic is not a key feature of the ''open systems'' model
|
||||
of management?
|
||||
|
||||
(A) Morale (B) Innovation (C) Growth resource (D) Adaptation'
|
||||
target: Let's think step by step. We refer to Wikipedia articles on management
|
||||
for help. The key characteristics of an open system in management include innovation,
|
||||
growth resource, and adaptation, but do not include morale. The answer is (A).
|
||||
- question: 'Which element of the cultural web forms regalia?
|
||||
|
||||
(A) Symbols (B) Rituals and routines (C) Power structures (D) Control systems'
|
||||
target: Let's think step by step. We refer to Wikipedia articles on management
|
||||
for help. The cultural web is a tool for mapping an organization's culture,
|
||||
where symbols form the regalia that visually expresses the values that the organization
|
||||
holds as important. The answer is (A).
|
||||
- question: 'What are the two main dimensions of the Ohio Studies into leadership?
|
||||
|
||||
(A) Starting position and end position (B) Initial environment and changed environment
|
||||
(C) Organisational structure and conditioning (D) Initiating structure and considerations'
|
||||
target: 'Let''s think step by step. We refer to Wikipedia articles on management
|
||||
for help. The Ohio State Leadership Studies conducted in the 1940s identified
|
||||
initiating structure and consideration as the two main dimensions of leader
|
||||
behavior. The answer is (D).'
|
||||
tag: mmlu_flan_cot_fewshot_other
|
||||
include: _mmlu_flan_cot_fewshot_template_yaml
|
||||
task: mmlu_flan_cot_fewshot_management
|
||||
56
evaluation/data/mmlu/flan_cot_fewshot/mmlu_marketing.yaml
Normal file
@@ -0,0 +1,56 @@
|
||||
dataset_name: marketing
|
||||
description: The following are multiple choice questions (with answers) about marketing.
|
||||
fewshot_config:
|
||||
sampler: first_n
|
||||
samples:
|
||||
- question: 'Although the content and quality can be as controlled as direct mail,
|
||||
response rates of this medium are lower because of the lack of a personal address
|
||||
mechanism. This media format is known as:
|
||||
|
||||
(A) Care lines. (B) Direct mail. (C) Inserts. (D) Door to door.'
|
||||
target: Let's think step by step. We refer to Wikipedia articles on marketing
|
||||
for help. Door to door marketing delivers non-addressed items within all buildings
|
||||
within a geographic area. While it can control the content and quality as well
|
||||
as direct mail marketing, its response rate is lower because of the lack of
|
||||
a personal address mechanism. The answer is (D).
|
||||
- question: 'In an organization, the group of people tasked with buying decisions is
|
||||
referred to as the _______________.
|
||||
|
||||
(A) Outsourcing unit. (B) Procurement centre. (C) Chief executive unit. (D)
|
||||
Decision-making unit.'
|
||||
target: Let's think step by step. We refer to Wikipedia articles on marketing
|
||||
for help. In an organization, the group of people tasked with buying decisions
|
||||
is referred to as the decision-making unit. The answer is (D).
|
||||
- question: 'The single group within society that is most vulnerable to reference group
|
||||
influence is:
|
||||
|
||||
(A) The older consumer who feels somewhat left out of things. (B) The married
|
||||
women, many of whom feel a need for stability in their lives. (C) New immigrants
|
||||
who really want to assimilate into their new culture. (D) Children, who base
|
||||
most of their buying decisions on outside influences.'
|
||||
target: Let's think step by step. We refer to Wikipedia articles on marketing
|
||||
for help. Children, who mostly base their buying decisions on outside influences,
|
||||
are the single group within society that is most vulnerable to reference group
|
||||
influence. The answer is (D).
|
||||
- question: 'Which of the following is an assumption in Maslow''s hierarchy of needs?
|
||||
|
||||
(A) Needs are dependent on culture and also on social class. (B) Lower-level
|
||||
needs must be at least partially satisfied before higher needs can affect behaviour.
|
||||
(C) Needs are not prioritized or arranged in any particular order. (D) Satisfied
|
||||
needs are motivators, and new needs emerge when current needs remain unmet.'
|
||||
target: Let's think step by step. We refer to Wikipedia articles on marketing
|
||||
for help. Maslow's hierarchy of needs, from the bottom upwards, are physiological
|
||||
(food and clothing), safety, love and belonging needs, esteem, and self-actualization.
|
||||
Lower-level needs must be at least partially satisfied before higher ones can
|
||||
affect behavior. The answer is (B).
|
||||
- question: '_____________ is a natural outcome when combining demographic and geographic
|
||||
variables.
|
||||
|
||||
(A) Geodemographics (B) Product differentiation. (C) ANSOFF matrix. (D) Brand
|
||||
management.'
|
||||
target: 'Let''s think step by step. We refer to Wikipedia articles on marketing
|
||||
for help. Geodemographics is a natural outcome when combining demographic and
|
||||
geographic variables. The answer is (A).'
|
||||
tag: mmlu_flan_cot_fewshot_other
|
||||
include: _mmlu_flan_cot_fewshot_template_yaml
|
||||
task: mmlu_flan_cot_fewshot_marketing
|
||||
@@ -0,0 +1,51 @@
|
||||
dataset_name: medical_genetics
|
||||
description: The following are multiple choice questions (with answers) about medical
|
||||
genetics.
|
||||
fewshot_config:
|
||||
sampler: first_n
|
||||
samples:
|
||||
- question: 'The stage of meiosis in which chromosomes pair and cross over is:
|
||||
|
||||
(A) prophase I (B) metaphase I (C) prophase II (D) metaphase II'
|
||||
target: Let's think step by step. We refer to Wikipedia articles on medical genetics
|
||||
for help. Prophase I is the stage of meiosis where homologous chromosomes pair
|
||||
with each other and exchange genetic material. The answer is (A).
|
||||
- question: 'DNA ligase is
|
||||
|
||||
(A) an enzyme that joins fragments in normal DNA replication (B) an enzyme of
|
||||
bacterial origin which cuts DNA at defined base sequences (C) an enzyme that
|
||||
facilitates transcription of specific genes (D) an enzyme which limits the level
|
||||
to which a particular nutrient reaches'
|
||||
target: Let's think step by step. We refer to Wikipedia articles on medical genetics
|
||||
for help. DNA ligase is a type of enzyme (EC 6.5.1.1) responsible for joining
|
||||
DNA strands together by catalyzing a phosphodiester bond. The answer is (A).
|
||||
- question: 'Which of the following conditions does not show multifactorial inheritance?
|
||||
|
||||
(A) Pyloric stenosis (B) Schizophrenia (C) Spina bifida (neural tube defects)
|
||||
(D) Marfan syndrome'
|
||||
target: Let's think step by step. We refer to Wikipedia articles on medical genetics
|
||||
for help. Multifactorial inheritance is when more than a single factor is responsible
|
||||
for causing a given trait or health problem. Genes cannot be the only factor.
|
||||
Marfan syndrome, on the other hand, requires only one abnormal copy of
|
||||
the Marfan gene, from one parent, to inherit the trait. The answer is (D).
|
||||
- question: 'A gene showing codominance
|
||||
|
||||
(A) has both alleles independently expressed in the heterozygote (B) has one
|
||||
allele dominant to the other (C) has alleles tightly linked on the same chromosome
|
||||
(D) has alleles expressed at the same time in development'
|
||||
target: Let's think step by step. We refer to Wikipedia articles on medical genetics
|
||||
for help. Codominance, as it relates to genetics, refers to a type of genetic
|
||||
inheritance where the phenotype of both the parents is easily observed in the
|
||||
offspring. A heterozygote is an individual having two different alleles of a
|
||||
gene. The answer is (A).
|
||||
- question: 'Large triplet repeat expansions can be detected by:
|
||||
|
||||
(A) polymerase chain reaction. (B) single strand conformational polymorphism
|
||||
analysis. (C) Southern blotting. (D) Western blotting.'
|
||||
target: 'Let''s think step by step. We refer to Wikipedia articles on medical
|
||||
genetics for help. A Southern blot is a method in molecular biology for detecting
|
||||
specific DNA sequences in a sample. Large triplet repeat expansions are usually
|
||||
detected with this method. The answer is (C).'
|
||||
tag: mmlu_flan_cot_fewshot_other
|
||||
include: _mmlu_flan_cot_fewshot_template_yaml
|
||||
task: mmlu_flan_cot_fewshot_medical_genetics
|
||||
@@ -0,0 +1,43 @@
|
||||
dataset_name: miscellaneous
|
||||
description: The following are multiple choice questions (with answers) about miscellaneous.
|
||||
fewshot_config:
|
||||
sampler: first_n
|
||||
samples:
|
||||
- question: 'Which of these songs was a Top 10 hit for the rock band The Police?
|
||||
|
||||
(A) ''Radio Ga-Ga'' (B) ''Ob-la-di Ob-la-da'' (C) ''De Do Do Do De Da Da Da''
|
||||
(D) ''In-a-Gadda-Da-Vida'''
|
||||
target: Let's think step by step. We refer to Wikipedia for help. Radio Ga-Ga
|
||||
is by Queen. Ob-la-di Ob-la-da is by The Beatles. And In-a-Gadda-Da-Vida is
|
||||
by Iron Butterfly. Leaving 'De Do Do Do De Da Da Da' as the only song by The
|
||||
Police, and also a Top 10 hit. The answer is (C).
|
||||
- question: 'What place is named in the title of the 1979 live album by rock legends
|
||||
Cheap Trick?
|
||||
|
||||
(A) Budapest (B) Budokan (C) Bhutan (D) Britain'
|
||||
target: Let's think step by step. We refer to Wikipedia for help. Nippon Budokan
|
||||
is an indoor arena in Tokyo, Japan renowned for hosting rock music concerts
|
||||
including Cheap Trick in 1978. 'Cheap Trick at Budokan' became the name of their
|
||||
album. The answer is (B).
|
||||
- question: 'What is produced during photosynthesis?
|
||||
|
||||
(A) hydrogen (B) nylon (C) oxygen (D) light'
|
||||
target: Let's think step by step. We refer to Wikipedia for help. Photosynthesis
|
||||
is the process in which green plants use the green pigment chlorophyll to synthesize
|
||||
foods with water and carbon dioxide. Oxygen is the byproduct of this process.
|
||||
The answer is (C).
|
||||
- question: 'Who is the shortest man to ever win an NBA slam dunk competition?
|
||||
|
||||
(A) Anthony ''Spud'' Webb (B) Michael ''Air'' Jordan (C) Tyrone ''Muggsy'' Bogues
|
||||
(D) Julius ''Dr J'' Erving'
|
||||
target: Let's think step by step. We refer to Wikipedia for help. In 1986, Spud
|
||||
Webb, standing only 5'7" became the shortest NBA player in history to win an
|
||||
official slam dunk contest. The answer is (A).
|
||||
- question: 'How many axles does a standard automobile have?
|
||||
|
||||
(A) one (B) two (C) four (D) eight'
|
||||
target: 'Let''s think step by step. We refer to Wikipedia for help. Most cars
|
||||
have two axles to rotate the wheels. The answer is (B).'
|
||||
tag: mmlu_flan_cot_fewshot_other
|
||||
include: _mmlu_flan_cot_fewshot_template_yaml
|
||||
task: mmlu_flan_cot_fewshot_miscellaneous
|
||||
@@ -0,0 +1,64 @@
|
||||
dataset_name: moral_disputes
|
||||
description: The following are multiple choice questions (with answers) about moral
|
||||
disputes.
|
||||
fewshot_config:
|
||||
sampler: first_n
|
||||
samples:
|
||||
- question: 'Baron admits that the versions of the ticking bomb hypothetical she discusses
|
||||
are "stunningly stupid," but she claims this is actually evidence of
|
||||
|
||||
(A) the stupidity of most traditional philosophical examples. (B) a general
|
||||
lack of intelligence among people with advanced degrees. (C) the wrongness of
|
||||
torture. (D) the readiness on the part of many intelligent people to see torture
|
||||
as the best solution to deal with terrorism.'
|
||||
target: Let's think step by step. We refer to Wikipedia articles on moral disputes
|
||||
for help. The ticking bomb hypothetical poses a problem where many people will
|
||||
die to an exploding bomb, if the hypothetical terrorist does not disclose how
|
||||
to defuse it. Baron sees this hypothetical as silly, but its prevalence does
|
||||
suggest intelligent people, particularly utilitarians, see torture as justifiable
|
||||
to save the lives in this scenario. The answer is (D).
|
||||
- question: 'A fertilized ovum is also known as
|
||||
|
||||
(A) a zygote. (B) an embryo. (C) a viability. (D) a blastocyst.'
|
||||
target: Let's think step by step. We refer to Wikipedia articles on moral disputes
|
||||
for help. Once a single sperm penetrates the layers of an egg to form a new
|
||||
cell, that cell is called a zygote. The answer is (A).
|
||||
- question: 'Pence compares six different cases of reproduction, from natural twinning
|
||||
to SCNT. What conclusion does he draw from this comparison?
|
||||
|
||||
(A) SCNT is not a different kind of reproduction because there are no morally
|
||||
relevant differences between it and other permissible means of reproduction.
|
||||
(B) Because there is a low risk of harm for natural twinning, there will be
|
||||
a low risk of harm for SCNT. (C) both A and B (D) neither A nor B'
|
||||
target: Let's think step by step. We refer to Wikipedia articles on moral disputes
|
||||
for help. SCNT, known as 'somatic cell nuclear transfer', is a technique in
|
||||
which the nucleus of a somatic (body) cell is transferred to the cytoplasm of
|
||||
an enucleated egg. This is used in farm animal cloning. Pence may compare this
|
||||
to other processes of reproduction, and see little moral separation. The answer
|
||||
is (A).
|
||||
- question: 'Which of the following is an intrinsic good, according to Singer?
|
||||
|
||||
(A) being in the upper class (B) the ability to learn and solve problems (C)
|
||||
being more beautiful than most of the population (D) being shorter than average'
|
||||
target: Let's think step by step. We refer to Wikipedia articles on moral disputes
|
||||
for help. Bioethicist Peter Singer sees intrinsic value as innate values conferred
|
||||
by oneself, for oneself. Inanimate objects can be beautiful, short, or have
|
||||
some valuable criteria, but capabilities are intrinsically good. The answer
|
||||
is (B).
|
||||
- question: 'According to Metz, what is wrong with consequentialist arguments against
|
||||
capital punishment based on African values?
|
||||
|
||||
(A) It is unclear as of yet whether or not capital punishment deters harm to
|
||||
the community. (B) It is unclear as of yet whether or not capital punishment
|
||||
deters harm to any individuals. (C) Consequentialism is not supported by African
|
||||
values. (D) Even though consequentialism is supported by African values, no
|
||||
consequentialist arguments framed in terms of African values have been offered.'
|
||||
target: 'Let''s think step by step. We refer to Wikipedia articles on moral disputes
|
||||
for help. Thaddeus Metz is a humanities research professor focusing on ethical
|
||||
philosophy in South Africa. Metz has written the death penalty is unjustified
|
||||
as it treats individuals as incapable of communal relations. It is unclear that
|
||||
capital punishment is to the benefit of, or a deterrent of harm to the community.
|
||||
The answer is (A).'
|
||||
tag: mmlu_flan_cot_fewshot_humanities
|
||||
include: _mmlu_flan_cot_fewshot_template_yaml
|
||||
task: mmlu_flan_cot_fewshot_moral_disputes
|
||||
@@ -0,0 +1,62 @@
|
||||
dataset_name: moral_scenarios
|
||||
description: The following are multiple choice questions (with answers) about moral
|
||||
scenarios.
|
||||
fewshot_config:
|
||||
sampler: first_n
|
||||
samples:
|
||||
- question: 'For which of these two scenarios does the main character (who uses I/me/my)
|
||||
do something clearly morally wrong, according to ordinary moral standards in
|
||||
the US as of 2020? Scenario 1 | I took his wallet after he harassed me at the
|
||||
bar. Scenario 2 | I made love to my newborn baby boy.
|
||||
|
||||
(A) Wrong, Wrong (B) Wrong, Not wrong (C) Not wrong, Wrong (D) Not wrong, Not
|
||||
wrong'
|
||||
target: Let's think step by step. We refer to Wikipedia articles on moral scenarios
|
||||
for help. Stealing from someone is not considered an ethical response to harassment.
|
||||
And having sex with a minor is egregiously unethical. The answer is (A).
|
||||
- question: 'For which of these two scenarios does the main character (who uses I/me/my)
|
||||
do something clearly morally wrong, according to ordinary moral standards in
|
||||
the US as of 2020? Scenario 1 | I asked for a cigarette from the guy outside
|
||||
the store. Scenario 2 | When the security guard left his post, I pocketed a
|
||||
rare coin.
|
||||
|
||||
(A) Wrong, Wrong (B) Wrong, Not wrong (C) Not wrong, Wrong (D) Not wrong, Not
|
||||
wrong'
|
||||
target: Let's think step by step. We refer to Wikipedia articles on moral scenarios
|
||||
for help. Asking for something is not morally wrong. Stealing is morally wrong.
|
||||
The answer is (C).
|
||||
- question: 'For which of these two scenarios does the main character (who uses I/me/my)
|
||||
do something clearly morally wrong, according to ordinary moral standards in
|
||||
the US as of 2020? Scenario 1 | During the house''s demolition, I pushed him
|
||||
into a wall. Scenario 2 | When I choose a babysitter for my twin nieces, I don''t
|
||||
care about her racial background.
|
||||
|
||||
(A) Wrong, Wrong (B) Wrong, Not wrong (C) Not wrong, Wrong (D) Not wrong, Not
|
||||
wrong'
|
||||
target: Let's think step by step. We refer to Wikipedia articles on moral scenarios
|
||||
for help. Physical violence is morally wrong. Being open to hiring people from
|
||||
different racial backgrounds is an expected moral stance. The answer is (B).
|
||||
- question: 'For which of these two scenarios does the main character (who uses I/me/my)
|
||||
do something clearly morally wrong, according to ordinary moral standards in
|
||||
the US as of 2020? Scenario 1 | I bought an old pick-up truck for work. Scenario
|
||||
2 | I looked up at the gorgeous waterfall and threw a rock at it.
|
||||
|
||||
(A) Wrong, Wrong (B) Wrong, Not wrong (C) Not wrong, Wrong (D) Not wrong, Not
|
||||
wrong'
|
||||
target: Let's think step by step. We refer to Wikipedia articles on moral scenarios
|
||||
for help. Neither of these is wrong as no one is wronged by buying a pick-up
|
||||
truck or throwing a rock into a stream. The answer is (D).
|
||||
- question: 'For which of these two scenarios does the main character (who uses I/me/my)
|
||||
do something clearly morally wrong, according to ordinary moral standards in
|
||||
the US as of 2020? Scenario 1 | I really loved my newborn baby boy. Scenario
|
||||
2 | The new girl is embarrassed to live in a trailer park, so I invite the class
|
||||
to her house.
|
||||
|
||||
(A) Wrong, Wrong (B) Wrong, Not wrong (C) Not wrong, Wrong (D) Not wrong, Not
|
||||
wrong'
|
||||
target: 'Let''s think step by step. We refer to Wikipedia articles on moral scenarios
|
||||
for help. Loving someone is not wrong. However, exposing something that someone
|
||||
is embarrassed about could be considered quite mean. The answer is (C).'
|
||||
tag: mmlu_flan_cot_fewshot_humanities
|
||||
include: _mmlu_flan_cot_fewshot_template_yaml
|
||||
task: mmlu_flan_cot_fewshot_moral_scenarios
|
||||
63
evaluation/data/mmlu/flan_cot_fewshot/mmlu_nutrition.yaml
Normal file
@@ -0,0 +1,63 @@
|
||||
dataset_name: nutrition
|
||||
description: The following are multiple choice questions (with answers) about nutrition.
|
||||
fewshot_config:
|
||||
sampler: first_n
|
||||
samples:
|
||||
- question: 'What is the first-line drug for patients with type 2 diabetes and obesity,
|
||||
as of 2020?
|
||||
|
||||
(A) Acarbose (B) Metformin (C) Sulphonylureas (D) Insulin'
|
||||
target: Let's think step by step. We refer to Wikipedia articles on nutrition
|
||||
for help. Metformin (Fortamet, Glumetza, or others) is usually the first medication
|
||||
prescribed for type 2 diabetes, as well as obesity. It works by lowering glucose
|
||||
production in the liver and improving the body's sensitivity to insulin. The
|
||||
answer is (B).
|
||||
- question: 'Which of the following statements is correct (according to knowledge in
|
||||
2020)?
|
||||
|
||||
(A) Consumers with phenylketonuria must avoid the consumption of the sweetener
|
||||
aspartame (B) Consumers with phenylketonuria must avoid the consumption of the
|
||||
sweetener saccharin (C) Consumers with phenylketonuria must avoid the consumption
|
||||
of the sweetener sucralose (D) Consumers with phenylketonuria must avoid the
|
||||
consumption of the sweetener acesulfame K'
|
||||
target: Let's think step by step. We refer to Wikipedia articles on nutrition
|
||||
for help. People with phenylketonuria (PKU) cannot break down the amino acid
|
||||
phenylalanine. As it builds up in the blood and brain it can lead to brain damage.
|
||||
People with PKU should avoid foods that are converted to phenylalanine in the
|
||||
body, such as aspartame. The answer is (A).
|
||||
- question: 'Which of the following statements about iodine is correct, as of 2020?
|
||||
|
||||
(A) 50% of adults consume iodine at levels below the RNI (B) Dairy products
|
||||
are a poor source of iodine (C) The iodine content of organic milk is generally
|
||||
lower than the level in non-organic milk (D) UK dietary reference values recommend
|
||||
an increase in iodine intake in pregnancy'
|
||||
target: Let's think step by step. We refer to Wikipedia articles on nutrition
|
||||
for help. Organic milk usually has less iodine content than non-organic milk.
|
||||
The answer is (C).
|
||||
- question: 'Which of the following is the most plausible explanation for the protective
|
||||
effect of dietary fibre against cancer of the colon, as of 2020?
|
||||
|
||||
(A) Propionic acid, formed during colonic fibre fermentation inhibits liver
|
||||
fatty acid synthesis (B) Butyric acid, formed during colonic fibre fermentation
|
||||
stimulates "silencing" of the SLC5A8 tumour suppressor gene (C) None of these
|
||||
options are correct (D) Butyric acid, formed during colonic fibre fermentation
|
||||
stimulates anti-oxidant defences in the colon'
|
||||
target: Let's think step by step. We refer to Wikipedia articles on nutrition
|
||||
for help. Dietary fibre is inversely proportional to the risk of colorectal
|
||||
cancer. This is presumed because butyric acid (BA) stimulates antioxidants which
|
||||
help protect the colon from cancerous tumors. The answer is (D).
|
||||
- question: 'In a cohort study, the risk ratio of developing diabetes was 0.86 when
|
||||
comparing consumers of tea (the exposed) to those who did not drink tea (the
|
||||
unexposed). Which one statement is correct (according to knowledge in 2020)?
|
||||
|
||||
(A) The tea drinkers have lower risk of developing diabetes. (B) The tea drinkers
|
||||
have higher risk of developing diabetes. (C) Based on the information given
|
||||
we cannot tell if the observed difference in disease risk is the result of chance.
|
||||
(D) The risk ratio is close to the value one, so there is no difference in disease
|
||||
risk between the two groups.'
|
||||
target: 'Let''s think step by step. We refer to Wikipedia articles on nutrition
|
||||
for help. The risk ratio is not sufficiently reduced that it could not be explained
|
||||
by random chance given the study's sample size. The answer is (C).'
|
||||
tag: mmlu_flan_cot_fewshot_other
|
||||
include: _mmlu_flan_cot_fewshot_template_yaml
|
||||
task: mmlu_flan_cot_fewshot_nutrition
|
||||
44
evaluation/data/mmlu/flan_cot_fewshot/mmlu_philosophy.yaml
Normal file
@@ -0,0 +1,44 @@
|
||||
dataset_name: philosophy
|
||||
description: The following are multiple choice questions (with answers) about philosophy.
|
||||
fewshot_config:
|
||||
sampler: first_n
|
||||
samples:
|
||||
- question: 'The study of reality in the broadest sense, an inquiry into the elemental
|
||||
nature of the universe and the things in it, is known as _____.
|
||||
|
||||
(A) metaphysics (B) epistemology (C) quantum physics (D) axiology'
|
||||
target: Let's think step by step. We refer to Wikipedia articles on philosophy
|
||||
for help. Among the options, only metaphysics studies the nature of reality
|
||||
and existence. The answer is (A).
|
||||
- question: "According to Moore\u2019s \u201Cideal utilitarianism,\u201D the right\
|
||||
\ action is the one that brings about the greatest amount of:\n(A) pleasure.\
|
||||
\ (B) happiness. (C) good. (D) virtue."
|
||||
target: Let's think step by step. We refer to Wikipedia articles on philosophy
|
||||
for help. Moore's "ideal utilitarianism" states that one's actions should maximize
|
||||
intrinsic goods. The answer is (C).
|
||||
- question: 'Before Tolstoy''s Christian conversion, what was his perspective on the
|
||||
meaning of life?
|
||||
|
||||
(A) optimist (B) satisfied (C) nominally religious (D) pessimist'
|
||||
target: Let's think step by step. We refer to Wikipedia articles on philosophy
|
||||
for help. Before his conversion, Tolstoy felt that life was uncertain, which
|
||||
is a pessimist's point of view. The answer is (D).
|
||||
- question: 'According to d''Holbach, people always act according to _____.
|
||||
|
||||
(A) free choices (B) dictates of the soul (C) necessary natural laws (D) undetermined
|
||||
will'
|
||||
target: Let's think step by step. We refer to Wikipedia articles on philosophy
|
||||
for help. d'Holbach believes that people act according to necessary laws, and
|
||||
it proves nothing about people's free will. The answer is (C).
|
||||
- question: 'Psychological egoism is:
|
||||
|
||||
(A) an ethical theory about how we ought to behave. (B) a generalization concerning
|
||||
the way people tend to behave. (C) a claim about human nature and the ways people
|
||||
are capable of behaving. (D) none of the above.'
|
||||
target: 'Let''s think step by step. We refer to Wikipedia articles on philosophy
|
||||
for help. Psychological egoism suggests that one behaves based on what makes
|
||||
one feel good, hence it is a claim about human nature and how humans are capable
|
||||
of behaving. The answer is (C).'
|
||||
tag: mmlu_flan_cot_fewshot_humanities
|
||||
include: _mmlu_flan_cot_fewshot_template_yaml
|
||||
task: mmlu_flan_cot_fewshot_philosophy
|
||||
59
evaluation/data/mmlu/flan_cot_fewshot/mmlu_prehistory.yaml
Normal file
@@ -0,0 +1,59 @@
|
||||
dataset_name: prehistory
|
||||
description: The following are multiple choice questions (with answers) about prehistory.
|
||||
fewshot_config:
|
||||
sampler: first_n
|
||||
samples:
|
||||
- question: 'What is the approximate mean cranial capacity of Homo erectus?
|
||||
|
||||
(A) under 650 cc (B) about 800 cc (C) just under 1000 cc (D) 1200 cc'
|
||||
target: Let's think step by step. We refer to Wikipedia articles on prehistory
|
||||
for help. The average cranium capacity of Homo erectus is less than 1000 cubic
|
||||
cm. The answer is (C).
|
||||
- question: 'According to Timothy Pauketat, the evidence for social stratification
|
||||
and political power at Cahokia suggests:
|
||||
|
||||
(A) a center of Mississippian civilization with conditions similar to the rise
|
||||
of early states. (B) the limitations of authority in a Native American society
|
||||
of egalitarian foragers. (C) a simple chiefdom or perhaps a complex chiefdom
|
||||
had evolved by A.D. 1500. (D) a center of Mississippian civilization with conditions
|
||||
similar to societies on the Northwest Coast of North America.'
|
||||
target: Let's think step by step. We refer to Wikipedia articles on prehistory
|
||||
for help. Timothy Pauketat is known for his research on Cahokia, the center
|
||||
of the Mississippian culture, where he found similar conditions to the rise
|
||||
of early states. The answer is (A).
|
||||
- question: 'Recent research on hominid species dating from the Middle Pliocene indicates
|
||||
there was (as of 2020):
|
||||
|
||||
(A) a great amount of species diversity, or a single species that exhibited
|
||||
a lot of diversity. (B) very little species diversity during this period and
|
||||
very few hominids. (C) decreased species diversity due to a prolonged ice age
|
||||
followed by a severe drought. (D) decreased species diversity but increased
|
||||
numbers of hammerstones and flakes, indicating stone tool manufacture.'
|
||||
target: Let's think step by step. We refer to Wikipedia articles on prehistory
|
||||
for help. Recent research has recognized multiple hominid species from the Middle
|
||||
Pliocene, meaning that there is a great amount of species diversity or diversity
|
||||
in a single species. The answer is (A).
|
||||
- question: 'Researchers now believe that the decline of the Maya was caused chiefly
|
||||
by:
|
||||
|
||||
(A) a cataclysm of some kind, such as an earthquake, volcano, or tsunami. (B)
|
||||
ecological degradation resulting from slash-and-burn farming techniques. (C)
|
||||
endless wars between neighboring Mayan city-states. (D) practices of interbreeding
|
||||
that led to a steep rise in congenital disorders.'
|
||||
target: Let's think step by step. We refer to Wikipedia articles on prehistory
|
||||
for help. Researchers believe that the Maya collapse was mainly caused by over-exploitation
|
||||
of natural resources like the slash-and-burn farming techniques. The answer
|
||||
is (B).
|
||||
- question: 'The great Mayan king Pacal built temples in the city of Palenque in order
|
||||
to:
|
||||
|
||||
(A) satisfy the powerful Mayan astronomer priests. (B) display his generosity
|
||||
to the common people, since they were allowed to live in the temples. (C) frighten
|
||||
away enemies, in particular the Spaniards. (D) legitimize his kingship, since
|
||||
his father was not royal.'
|
||||
target: 'Let''s think step by step. We refer to Wikipedia articles on prehistory
|
||||
for help. Pacal built the temples as the funerary monument to legitimize his
|
||||
kingship. The answer is (D).'
|
||||
tag: mmlu_flan_cot_fewshot_humanities
|
||||
include: _mmlu_flan_cot_fewshot_template_yaml
|
||||
task: mmlu_flan_cot_fewshot_prehistory
|
||||
@@ -0,0 +1,63 @@
|
||||
dataset_name: professional_accounting
|
||||
description: The following are multiple choice questions (with answers) about professional
|
||||
accounting.
|
||||
fewshot_config:
|
||||
sampler: first_n
|
||||
samples:
|
||||
- question: "An auditor traces the serial numbers on equipment to a nonissuer\u2019\
|
||||
s subledger. Which of the following management assertions is supported by this\
|
||||
\ test?\n(A) Valuation and allocation (B) Completeness (C) Rights and obligations\
|
||||
\ (D) Presentation and disclosure"
|
||||
target: Let's think step by step. We refer to Wikipedia articles on accounting
|
||||
for help. The completeness assertion is tested by tracing supporting documents
|
||||
to the record entries. The answer is (B).
|
||||
- question: 'One hundred years ago, your great-great-grandmother invested $100 at 5%
|
||||
yearly interest. What is the investment worth today?
|
||||
|
||||
(A) $13,000 (B) $600 (C) $15,000 (D) $28,000'
|
||||
target: Let's think step by step. We refer to Wikipedia articles on accounting
|
||||
for help. A $100 investment at 5% yearly interest is worth 100*(1.05)^100=13150
|
||||
after 100 years, which is around $13,000. The answer is (A).
|
||||
- question: 'On January 1, year 1, Alpha Co. signed an annual maintenance agreement
|
||||
with a software provider for $15,000 and the maintenance period begins on March
|
||||
1, year 1. Alpha also incurred $5,000 of costs on January 1, year 1, related
|
||||
to software modification requests that will increase the functionality of the
|
||||
software. Alpha depreciates and amortizes its computer and software assets over
|
||||
five years using the straight-line method. What amount is the total expense
|
||||
that Alpha should recognize related to the maintenance agreement and the software
|
||||
modifications for the year ended December 31, year 1?
|
||||
|
||||
(A) $5,000 (B) $13,500 (C) $16,000 (D) $20,000'
|
||||
target: Let's think step by step. We refer to Wikipedia articles on accounting
|
||||
for help. The maintenance period begins on March 1, so only 10 months of expenses
|
||||
should be recognized, which is $15,000/12*10=$12,500. The software modification
|
||||
cost is amortized over 5 years, so each year is $5,000/5=$1,000. So the total
|
||||
expense is $12,500+$1,000=$13,500. The answer is (B).
|
||||
- question: 'Krete is an unmarried taxpayer with income exclusively from wages. By
|
||||
December 31, year 1, Krete''s employer has withheld $16,000 in federal income
|
||||
taxes and Krete has made no estimated tax payments. On April 15, year 2, Krete
|
||||
timely filed for an extension request to file her individual tax return, and
|
||||
paid $300 of additional taxes. Krete''s year 1 tax liability was $16,500 when
|
||||
she timely filed her return on April 30, year 2, and paid the remaining tax
|
||||
liability balance. What amount would be subject to the penalty for underpayment
|
||||
of estimated taxes?
|
||||
|
||||
(A) $0 (B) $500 (C) $1,650 (D) $16,500'
|
||||
target: Let's think step by step. We refer to Wikipedia articles on accounting
|
||||
for help. The tax due after withholding is $16,500-$16,000=$500, which is less
|
||||
than $1000, hence there is no underpayment penalty of estimated taxes. The answer
|
||||
is (A).
|
||||
- question: 'Box a nongovernmental not-for-profit organization had the following transactions
|
||||
during the year: Proceeds from sale of investments $80000 Purchase of property
|
||||
plant and equipment $10000 Proceeds from long-term debt $100000 Loss on sale
|
||||
of investment $5000 What amount should be reported as net cash provided by financing
|
||||
activities in Box''s statement of cash flows?
|
||||
|
||||
(A) $70,000 (B) $75,000 (C) $80,000 (D) 100000'
|
||||
target: 'Let''s think step by step. We refer to Wikipedia articles on accounting
|
||||
for help. Among the four transactions, only Proceeds from long-term debt belongs
|
||||
to the financing activities section of cashflow, hence the amount reported should
|
||||
be $100000. The answer is (D).'
|
||||
tag: mmlu_flan_cot_fewshot_other
|
||||
include: _mmlu_flan_cot_fewshot_template_yaml
|
||||
task: mmlu_flan_cot_fewshot_professional_accounting
|
||||
122
evaluation/data/mmlu/flan_cot_fewshot/mmlu_professional_law.yaml
Normal file
@@ -0,0 +1,122 @@
|
||||
dataset_name: professional_law
|
||||
description: The following are multiple choice questions (with answers) about professional
|
||||
law.
|
||||
fewshot_config:
|
||||
sampler: first_n
|
||||
samples:
|
||||
- question: 'A son owed a creditor $5,000. The son''s father contacted the creditor
|
||||
and told him that he wanted to pay the son''s debt. The father signed a document
|
||||
that stated the father would pay the son''s debt at a rate of $500 a month for
|
||||
10 months. The creditor made no written or oral commitment to forbear to sue
|
||||
the son to collect the $5,000 debt, and the father made no oral or written request
|
||||
for any such forbearance. For the next five months, the father made and the
|
||||
creditor accepted the $500 monthly payments as agreed. During that period, the
|
||||
creditor, in fact, did forbear to take any legal action against the son. However,
|
||||
the father then informed the creditor that he would make no further payments
|
||||
on the debt. Which of the following is the most persuasive argument that the
|
||||
father is liable to the creditor under the terms of their agreement?
|
||||
|
||||
(A) The father''s promise and the creditor''s reliance thereon, if proved, gave
|
||||
rise to a valid claim by the creditor against the father based on the doctrine
|
||||
of promissory estoppel. (B) Because it was foreseeable that the father''s promise
|
||||
would induce the creditor to forbear taking any action against the son, such
|
||||
forbearance was, as a matter of law, a bargained-for consideration for the father''s
|
||||
promise. (C) The father''s five payments to the creditor totaling $2,500 manifested
|
||||
a serious intent on the father''s part to be contractually bound, and such manifestation
|
||||
is generally recognized as an effective substitute for consideration. (D) By
|
||||
assuming the antecedent debt obligation that the son owed to the creditor, the
|
||||
father became a surety whose promise to the creditor was enforceable, since
|
||||
it was in writing and supported by adequate consideration. '
|
||||
target: Let's think step by step. We refer to Wikipedia articles on law for help.
|
||||
The doctrine of promissory estoppel stops a person from going back on a promise
|
||||
in contract law, hence option (A) should be the most persuasive argument. The
|
||||
answer is (A).
|
||||
- question: 'A state has recently enacted a statute prohibiting the disposal of any
|
||||
nuclear wastes within the state. This law does not contravene or conflict with
|
||||
any federal statutes. A man operates a company in the state that is engaged
|
||||
in the disposal of nuclear wastes. Subsequent to the passage of the state statute,
|
||||
the man, not yet aware of the new law, entered into contracts with many out-of-state
|
||||
firms to dispose of their nuclear wastes in the state. On account of this new
|
||||
law, however, the man will be unable to perform these contracts. Assume that
|
||||
the man has standing to challenge this state law. Which of the following presents
|
||||
his strongest constitutional grounds to challenge the state law prohibiting
|
||||
the disposal of nuclear wastes within the state?
|
||||
|
||||
(A) The commerce clause. (B) The equal protection clause of the Fourteenth Amendment.
|
||||
(C) The privileges and immunities clause of Article IV, Section 2. (D) The contract
|
||||
clause.'
|
||||
target: Let's think step by step. We refer to Wikipedia articles on law for help.
|
||||
The commerce clause states that Congress shall have the power to regulate commerce
|
||||
with foreign Nations, and among the several States, and with the Indian Tribes.
|
||||
The statute affects inter-state commerce which puts it into question. Hence
|
||||
the man's strongest argument should be the commerce clause. The answer is (A).
|
||||
- question: 'On October 1, 1980, a developer, owner of several hundred acres in a rural
|
||||
county, drafted a general development plan for the area. The duly recorded plan
|
||||
imposed elaborate limitations and restrictions upon the land in the plan, which
|
||||
was to be developed as a residential district. The restrictions were to extend
|
||||
to all persons acquiring any of the lots and to their heirs, assigns, and lessees.
|
||||
It was further provided that all subsequent owners would be charged with due
|
||||
notice of the restrictions. Among those restrictions in the general plan were
|
||||
the following: (22) A franchise right is created in a strip of land 10 feet in
|
||||
width along the rear of each lot for the use of public utility companies with
|
||||
right of ingress and egress. (23) No house or structure of any kind shall be
|
||||
built on the aforementioned strip of land running through the said blocks. In
|
||||
2000, a retiree purchased one of the lots, built a house, and erected a fence
|
||||
in the rear of his property within the restricted area. In 2004, a teacher purchased
|
||||
a lot adjacent to the retiree''s property and built a new house. Two years later,
|
||||
a librarian purchased the lot that adjoined the teacher''s property. The three
|
||||
deeds to those properties each contained references to the deed book where the
|
||||
general plan was recorded. In 2008, the librarian began the construction of
|
||||
a seven-foot post-and-rail fence along the line dividing his lot with the teacher''s,
|
||||
and along the center of the area subject to the franchise right. Although the
|
||||
teacher objected to its construction, the fence was completed. If the teacher
|
||||
seeks a mandatory injunction to compel removal of the librarian''s fence, the
|
||||
court will most likely
|
||||
|
||||
(A) grant relief, because the fence was in violation of the easement restriction.
|
||||
(B) grant relief, because the encroachment of the fence violated the restriction
|
||||
in the original plan. (C) deny relief, because the teacher failed to enforce
|
||||
the restriction against the retiree. (D) deny relief, because the fence would
|
||||
not be construed as "a structure" within the terms of the restriction. '
|
||||
target: Let's think step by step. We refer to Wikipedia articles on law for help.
|
||||
The restrictions in the original plan say no house or structure of any kind
|
||||
shall be built on the aforementioned strip of land running through the said
|
||||
blocks. Hence the court will most likely grant relief because the fence violated
|
||||
the restriction in the original plan. The answer is (B).
|
||||
- question: 'Judge took judicial notice of some facts at the beginning of the trial.
|
||||
Which of the following is not an appropriate kind of fact for judicial notice?
|
||||
|
||||
(A) Indisputable facts. (B) Facts that have been asserted by individual political
|
||||
organizations. (C) Facts recognized to be true by common knowledge. (D) Facts
|
||||
capable of scientific verification.'
|
||||
target: Let's think step by step. We refer to Wikipedia articles on law for help.
|
||||
Among the options, facts that have been asserted by individual political organizations
|
||||
are not an appropriate kind of fact for judicial notice. The answer is (B).
|
||||
- question: 'A state legislature has recently enacted a statute making it a misdemeanor
|
||||
to curse or revile or use obscene or opprobrious language toward or in reference
|
||||
to a police officer performing his duties. A student at a state university organized
|
||||
a demonstration on campus to protest the war. The rally was attended by a group
|
||||
of 50 students who shouted anti-war messages at cars passing by. To show his
|
||||
contempt for the United States, the student sewed the American flag to the rear
|
||||
of his jeans. When a police officer saw the flag sewn on the student''s jeans,
|
||||
he approached and told him to remove the flag or he would be placed under arrest.
|
||||
The student became angered and shouted at the police officer, "Listen, you bastard,
|
||||
I''ll wear this rag anywhere I please. " The student was subsequently placed
|
||||
under arrest and charged with violating the state statute. The student subsequently
|
||||
brings suit in state court challenging the constitutionality of the statute.
|
||||
The strongest constitutional argument for the student is that
|
||||
|
||||
(A) the statute is void for vagueness under the Fourteenth Amendment''s due
|
||||
process clause. (B) the statute is invalid because it violates the petitioner''s
|
||||
freedom of speech under the First Amendment. (C) the statute is an abridgment
|
||||
of freedom of speech under the First Amendment because less restrictive means
|
||||
are available for achieving the same purpose. (D) the statute is overbroad and
|
||||
consequently invalid under the First and Fourteenth Amendments.'
|
||||
target: 'Let''s think step by step. We refer to Wikipedia articles on law for
|
||||
help. The Fourteenth Amendment further supports the First Amendment by establishing
|
||||
a due process clause. Hence the strongest argument should be the statute is
|
||||
overbroad and consequently invalid under the First and Fourteenth Amendments.
|
||||
The answer is (D).'
|
||||
tag: mmlu_flan_cot_fewshot_humanities
|
||||
include: _mmlu_flan_cot_fewshot_template_yaml
|
||||
task: mmlu_flan_cot_fewshot_professional_law
|
||||
@@ -0,0 +1,82 @@
|
||||
dataset_name: professional_medicine
|
||||
description: The following are multiple choice questions (with answers) about professional
|
||||
medicine.
|
||||
fewshot_config:
|
||||
sampler: first_n
|
||||
samples:
|
||||
- question: "A 22-year-old male marathon runner presents to the office with the complaint\
|
||||
\ of right-sided rib pain when he runs long distances. Physical examination\
|
||||
\ reveals normal heart and lung findings and an exhalation dysfunction at ribs\_\
|
||||
4-5 on the right. Which of the following muscles or muscle groups will be most\
|
||||
\ useful in correcting this dysfunction utilizing a direct method?\n(A) anterior\
|
||||
\ scalene (B) latissimus dorsi (C) pectoralis minor (D) quadratus lumborum"
|
||||
target: Let's think step by step. We refer to Wikipedia articles on medicine for
|
||||
help. Among the options, only the pectoralis minor muscle originates from the outer
|
||||
surfaces of the 3rd to 5th ribs. The answer is (C).
|
||||
- question: "A 36-year-old male presents to the office with a\_3-week\_history of low\
|
||||
\ back pain. He denies any recent trauma but says that he climbs in and out\
|
||||
\ of his truck numerous times a day for his job. Examination of the patient\
|
||||
\ in the prone position reveals a deep sacral sulcus on the left, a posterior\
|
||||
\ inferior lateral angle on the right, and a lumbosacral junction that springs\
|
||||
\ freely on compression. The most likely diagnosis is\n(A) left-on-left sacral\
|
||||
\ torsion (B) left-on-right sacral torsion (C) right unilateral sacral flexion\
|
||||
\ (D) right-on-right sacral torsion"
|
||||
target: Let's think step by step. We refer to Wikipedia articles on medicine for
|
||||
help. The deep sulcus on the left, a posterior ILA on the right, with a negative
|
||||
spring test suggests a right-on-right sacral torsion. All other options have
|
||||
a deep sulcus on the right. The answer is (D).
|
||||
- question: "A 44-year-old man comes to the office because of a 3-day history of sore\
|
||||
\ throat, nonproductive cough, runny nose, and frontal headache. He says the\
|
||||
\ headache is worse in the morning and ibuprofen does provide some relief. He\
|
||||
\ has not had shortness of breath. Medical history is unremarkable. He takes\
|
||||
\ no medications other than the ibuprofen for pain. Vital signs are temperature\
|
||||
\ 37.4\xB0C (99.4\xB0F), pulse 88/min, respirations 18/min, and blood pressure\
|
||||
\ 120/84 mm Hg. Examination of the nares shows erythematous mucous membranes.\
|
||||
\ Examination of the throat shows erythema and follicular lymphoid hyperplasia\
|
||||
\ on the posterior oropharynx. There is no palpable cervical adenopathy. Lungs\
|
||||
\ are clear to auscultation. Which of the following is the most likely cause\
|
||||
\ of this patient's symptoms?\n(A) Allergic rhinitis (B) Epstein-Barr virus\
|
||||
\ (C) Mycoplasma pneumonia (D) Rhinovirus"
|
||||
target: Let's think step by step. We refer to Wikipedia articles on medicine for
|
||||
help. The symptoms, especially the headache, suggest that the most likely cause
|
||||
is Rhinovirus. Epstein-Barr virus will cause swollen lymph nodes but there is
|
||||
no palpable cervical adenopathy. Clear lungs on auscultation suggest it's
|
||||
not Mycoplasma pneumonia. The answer is (D).
|
||||
- question: 'A previously healthy 32-year-old woman comes to the physician 8 months
|
||||
after her husband was killed in a car crash. Since that time, she has had a
|
||||
decreased appetite and difficulty falling asleep. She states that she is often
|
||||
sad and cries frequently. She has been rechecking the door lock five times before
|
||||
leaving her house and has to count exactly five pieces of toilet paper before
|
||||
she uses it. She says that she has always been a perfectionist but these urges
|
||||
and rituals are new. Pharmacotherapy should be targeted to which of the following
|
||||
neurotransmitters?
|
||||
|
||||
(A) Dopamine (B) Glutamate (C) Norepinephrine (D) Serotonin'
|
||||
target: Let's think step by step. We refer to Wikipedia articles on medicine for
|
||||
help. The patient feels sad and among the options, only Dopamine and Serotonin
|
||||
can help increase positive emotions. Serotonin also affects digestion and metabolism,
|
||||
which can help the patient's decreased appetite and sleep difficulty. The answer
|
||||
is (D).
|
||||
- question: "A 42-year-old man comes to the office for preoperative evaluation prior\
|
||||
\ to undergoing adrenalectomy scheduled in 2 weeks. One month ago, he received\
|
||||
\ care in the emergency department for pain over his right flank following a\
|
||||
\ motor vehicle collision. At that time, blood pressure was 160/100 mm Hg and\
|
||||
\ CT scan of the abdomen showed an incidental 10-cm left adrenal mass. Results\
|
||||
\ of laboratory studies, including complete blood count, serum electrolyte concentrations,\
|
||||
\ and liver function tests, were within the reference ranges. The patient otherwise\
|
||||
\ had been healthy and had never been told that he had elevated blood pressure.\
|
||||
\ He takes no medications. A follow-up visit in the office 2 weeks ago disclosed\
|
||||
\ elevated urinary normetanephrine and metanephrine and plasma aldosterone concentrations.\
|
||||
\ The patient was referred to a surgeon, who recommended the adrenalectomy.\
|
||||
\ Today, vital signs are temperature 36.6\xB0C (97.9\xB0F), pulse 100/min, respirations\
|
||||
\ 14/min, and blood pressure 170/95 mm Hg. Physical examination discloses no\
|
||||
\ significant findings. Initial preoperative preparation should include treatment\
|
||||
\ with which of the following?\n(A) Labetalol (B) A loading dose of potassium\
|
||||
\ chloride (C) Nifedipine (D) Phenoxybenzamine"
|
||||
target: 'Let''s think step by step. We refer to Wikipedia articles on medicine
|
||||
for help. The symptoms and the adrenal mass suggested pheochromocytoma, and
|
||||
the blood pressure indicates hypertension. Phenoxybenzamine is used to treat
|
||||
hypertension caused by pheochromocytoma. The answer is (D).'
|
||||
tag: mmlu_flan_cot_fewshot_other
|
||||
include: _mmlu_flan_cot_fewshot_template_yaml
|
||||
task: mmlu_flan_cot_fewshot_professional_medicine
|
||||
@@ -0,0 +1,62 @@
|
||||
dataset_name: professional_psychology
|
||||
description: The following are multiple choice questions (with answers) about professional
|
||||
psychology.
|
||||
fewshot_config:
|
||||
sampler: first_n
|
||||
samples:
|
||||
- question: 'In the construction of a multiple regression equation for purposes of
|
||||
prediction, the optimal combination of measures is one in which the predictors
|
||||
|
||||
(A) are uncorrelated with each other but are moderately correlated with the
|
||||
criterion (B) have low correlations with each other and low correlations with
|
||||
the criterion (C) are highly intercorrelated with each other and moderately
|
||||
correlated with the criterion (D) have low correlations with the criterion but
|
||||
are moderately correlated with each other'
|
||||
target: Let's think step by step. We refer to Wikipedia articles on psychology
|
||||
for help. The basis of multiple regression is to assess the relationship between
|
||||
one continuous variable and a set of independent variables. So the predictors
|
||||
should be uncorrelated with each other but moderately correlated with the
|
||||
criterion. The answer is (A).
|
||||
- question: 'There are three ways to measure the Central Tendency: the Mean, the Median
|
||||
and the Mode. From your knowledge about them, what is the mode?
|
||||
|
||||
(A) less sensitive to extreme scores than the mean (B) more useful for skewed
|
||||
distributions (C) sensitive to extreme values and highly skewed distributions
|
||||
(D) the most frequently occurring number'
|
||||
target: Let's think step by step. We refer to Wikipedia articles on psychology
|
||||
for help. The definition of mode is the most frequently occurring number. The
|
||||
answer is (D).
|
||||
- question: "Carl Jung believed that a client's transference:\n(A) is a fantasy that\
|
||||
\ distracts the client from reality. (B) represents \u201Cmixed feelings\u201D\
|
||||
\ toward the therapist. (C) \"is a form of \"\"acting out.\"\"\" (D) reflects\
|
||||
\ the client\u2019s personal and collective unconscious."
|
||||
target: Let's think step by step. We refer to Wikipedia articles on psychology
|
||||
for help. Transference is a phenomenon in which a person's feelings are unconsciously
|
||||
redirected, so it reflects the client's personal and collective unconscious.
|
||||
The answer is (D).
|
||||
- question: "In terms of Hofstede\u2019s (1980) five cultural dimensions, the United\
|
||||
\ States scores at the top of the scale on:\n(A) individualism. (B) individualism\
|
||||
\ and power distance. (C) power distance and masculinity. (D) uncertainty avoidance."
|
||||
target: Let's think step by step. We refer to Wikipedia articles on psychology
|
||||
for help. US scores highest on individualism among the five cultural dimensions.
|
||||
The answer is (A).
|
||||
- question: 'One of your therapy clients asks your advice about a good weight-reduction
|
||||
program. You have investigated the programs in the community and are enrolled
|
||||
in the one you consider the best. This program offers a $50 bonus to its patrons
|
||||
for each new person they bring into the program. Under these circumstances,
|
||||
your most appropriate response would be to
|
||||
|
||||
(A) tell your client the pros and cons of each program you know about except
|
||||
for the one in which you are enrolled (B) recommend to your client the program
|
||||
in which you are enrolled and explain the $50 bonus you will receive (C) recommend
|
||||
to your client the program in which you are enrolled and offer to have the $50
|
||||
bonus credited to your client''s account in the program (D) tell your client
|
||||
the pros and cons of each program you know about, but do not claim the $50 bonus
|
||||
if your client enrolls in your program'
|
||||
target: 'Let''s think step by step. We refer to Wikipedia articles on psychology
|
||||
for help. Based on the circumstances, you should tell your client about the
|
||||
pros and cons of each program, but it would be inappropriate to receive the
|
||||
bonus, so you should not claim the $50 bonus. The answer is (D).'
|
||||
tag: mmlu_flan_cot_fewshot_social_sciences
|
||||
include: _mmlu_flan_cot_fewshot_template_yaml
|
||||
task: mmlu_flan_cot_fewshot_professional_psychology
|
||||
@@ -0,0 +1,55 @@
|
||||
dataset_name: public_relations
|
||||
description: The following are multiple choice questions (with answers) about public
|
||||
relations.
|
||||
fewshot_config:
|
||||
sampler: first_n
|
||||
samples:
|
||||
- question: 'Earth Hour was a campaign launched by which organization?
|
||||
|
||||
(A) Greenpeace (B) The UN (C) Oxfam (D) World Wildlife Fund'
|
||||
target: Let's think step by step. We refer to Wikipedia articles on public relations
|
||||
for help. Earth Hour is a worldwide movement organized and launched by the World
|
||||
Wildlife Fund. The answer is (D).
|
||||
- question: 'In issues management, what is the most proactive approach to addressing
|
||||
negative or misleading information posted online about your organization?
|
||||
|
||||
(A) Buy domain names that could be used by opposition groups. (B) Post anonymous
|
||||
comments on blogs to combat this information. (C) Prepare a news release that
|
||||
discredits the inaccurate information. (D) Make policy changes to address complaints
|
||||
highlighted on these sites.'
|
||||
target: Let's think step by step. We refer to Wikipedia articles on public relations
|
||||
for help. In issues management, the most proactive approach to addressing negative
|
||||
or misleading information posted online is to make policy changes to address
|
||||
complaints highlighted on those sites. The answer is (D).
|
||||
- question: 'At which stage in the planning process would a situation analysis be carried
|
||||
out?
|
||||
|
||||
(A) Defining the program (B) Planning the program (C) Taking action and implementing
|
||||
ideas (D) Evaluation of the program'
|
||||
target: Let's think step by step. We refer to Wikipedia articles on public relations
|
||||
for help. Situation analyses are typically carried out during the planning process
|
||||
stage of defining the program. The answer is (A).
|
||||
- question: 'Which of these statements is true of the Vatican in 2010 at the time of
|
||||
the accusations of child abuse cover-ups?
|
||||
|
||||
(A) There was a coordinated media response. (B) Consistent messages were communicated.
|
||||
(C) Criticisms were taken as attacks on the Catholic Church. (D) The credibility
|
||||
of the Vatican was upheld.'
|
||||
target: Let's think step by step. We refer to Wikipedia articles on public relations
|
||||
for help. In 2010 when there were accusations of child abuse cover-ups, the
|
||||
Vatican took those criticisms as attacks on the Catholic Church. The answer
|
||||
is (C).
|
||||
- question: 'What should a public relations media practitioner do if she does not know
|
||||
the answer to a reporter''s question?
|
||||
|
||||
(A) Give the reporter other information she is certain is correct. (B) Say that
|
||||
the information is ''off the record'' and will be disseminated later. (C) Say
|
||||
''I don''t know'' and promise to provide the information later. (D) Say ''no
|
||||
comment,'' rather than appear uninformed.'
|
||||
target: 'Let''s think step by step. We refer to Wikipedia articles on public relations
|
||||
for help. If a public relations media practitioner does not know the answer
|
||||
to a reporter''s question, they should say ''I don''t know'' and offer to provide
|
||||
the information later. The answer is (C).'
|
||||
tag: mmlu_flan_cot_fewshot_social_sciences
|
||||
include: _mmlu_flan_cot_fewshot_template_yaml
|
||||
task: mmlu_flan_cot_fewshot_public_relations
|
||||
104
evaluation/data/mmlu/flan_cot_fewshot/mmlu_security_studies.yaml
Normal file
@@ -0,0 +1,104 @@
|
||||
dataset_name: security_studies
|
||||
description: The following are multiple choice questions (with answers) about security
|
||||
studies.
|
||||
fewshot_config:
|
||||
sampler: first_n
|
||||
samples:
|
||||
- question: 'What are the frameworks of analysis within which terrorism has been considered
|
||||
(as of 2020)?
|
||||
|
||||
(A) Competition between larger nations has resulted in some countries actively
|
||||
supporting terrorist groups to undermine the strength of rival states. Terrorist
|
||||
networks are extended patronage clubs maintained and paid for by their donor
|
||||
states and are conceptualised as being like state actors, to be dealt with using
|
||||
military force. (B) Globalization has enabled the internationalization of terrorist
|
||||
activities by opening up their operational space, although coordination is still
|
||||
managed from a geographical base. This suggests that terrorist groups are nationally
|
||||
structured which means that terrorism cannot be considered in terms of a war
|
||||
to be defeated militarily without having serious implications on the indigenous
|
||||
population. (C) Terrorism can be viewed as a problem to be resolved by military
|
||||
means (war on terrorism), by normal police techniques (terrorism as crime),
|
||||
or as a medical problem with underlying causes and symptoms (terrorism as disease).
|
||||
(D) Terrorism is viewed as a criminal problem. The criminalization of terrorism
|
||||
has two important implications. Firstly, it suggests that terrorism can be eradicated
|
||||
- terrorists can be caught and brought to trial by normal judicial proceedings
|
||||
thereby removing the threat from society - and secondly, it suggests that preventative
|
||||
crime techniques are applicable to prevent its development.'
|
||||
target: "Let's think step by step. We refer to Wikipedia articles on security\
|
||||
\ studies for help. (A) is wrong because it is not competition between larger\
|
||||
\ nations that causes terrorism. \n(B) is wrong because globalization is not\
|
||||
\ the cause of terrorism.\n(C) is correct because the US undertook the war on\
|
||||
\ terrorism. \n(D) is wrong because preventative crime techniques will likely\
|
||||
\ not end terrorism. The answer is (C)."
|
||||
- question: 'Which of the following is the best lens through which to investigate the
|
||||
role of child soldiers?
|
||||
|
||||
(A) Child soldiers are victims of combat that need re-education and rehabilitation.
|
||||
(B) Children and their mothers are not active subjects in warfare and are best
|
||||
considered as subjects in the private sphere. (C) Children are most often innocent
|
||||
bystanders in war and are best used as signifiers of peace. (D) Children have
|
||||
political subjecthood that is missed when they are considered as passive victims
|
||||
of warfare.'
|
||||
target: Let's think step by step. We refer to Wikipedia articles on security studies
|
||||
for help. Child soldiers as a political topic can be missed when they are considered
|
||||
passive victims of warfare. The answer is (D).
|
||||
- question: 'How can we best describe the relationship between the state-centric approach
|
||||
and the concept of human security?
|
||||
|
||||
(A) There are such wide divisions within the human security framework regarding
|
||||
the nature of threats and referent objects that no widely applicable comparisons
|
||||
between state-centric approaches and human security can be drawn. (B) By adopting
|
||||
the framework of human security, the limitations of the realist state-centric
|
||||
approach become evident. Whilst human security defines the referent object as
|
||||
the person or population, state-centric approaches prioritise the security of
|
||||
the state, de-prioritizing the pursuit of human security. (C) The state-centric
|
||||
approach to security is a faction of human security, usually defined within
|
||||
the broad school of human security. By being state-centric this approach prioritises
|
||||
the individual as the referent object in security studies. (D) Both the state-centric
|
||||
and human-centric approaches to security are mutually exclusive and offer a
|
||||
sufficient analytic framework with which to understand the international security
|
||||
system. It is therefore the role of security analysts to determine which of
|
||||
these substantial concepts is correct, and which should be discarded.'
|
||||
target: Let's think step by step. We refer to Wikipedia articles on security studies
|
||||
for help. Human security focuses on a person or population whereas state-centric
|
||||
approaches focus on the state while deprioritizing human security. The answer
|
||||
is (B).
|
||||
- question: 'In order to become securitized, a threat must be presented in which of
|
||||
these ways?
|
||||
|
||||
(A) As an existential threat that requires immediate and extraordinary action,
|
||||
posing a threat to the survival of the state or to societal security. (B) As
|
||||
requiring immediate and extraordinary action by the state, threatening the survival
|
||||
of a referent object and therefore warranting the use of measures not normally
|
||||
employed in the political realm. (C) As an urgent threat to the survival of
|
||||
the referent object, so serious that it legitimises the employment of extraordinary
|
||||
action in response. (D) As an urgent threat to the survival of the audience
|
||||
that requires extraordinary or emergency measures.'
|
||||
target: Let's think step by step. We refer to Wikipedia articles on security studies
|
||||
for help. To be securitized, a threat must be an urgent threat to the survival
|
||||
of the referent object. The answer is (C).
|
||||
- question: 'What distinguishes coercive diplomacy from military force?
|
||||
|
||||
(A) Compellence is another term for coercive diplomacy, but covering a narrower
|
||||
set of criteria; compellence covers those threats aimed at initiating adversary
|
||||
action. A threat to coerce a state to give up part of its territory would count
|
||||
as coercive diplomacy, as long as that threat proactively initiates action before
|
||||
reactive diplomacy is taken. (B) Coercive diplomacy constitutes the threats
|
||||
of limited force to induce adversary''s incentive to comply with the coercer''s
|
||||
demands. It is an influence strategy that is intended to obtain compliance:
|
||||
the use of force to defeat an opponent first does not count. It leaves an element
|
||||
of choice with the target to comply, or to continue. (C) Military force, or
|
||||
the threat of military force, utilises fear to achieve strategic objectives.
|
||||
Coercive diplomacy is differentiated from this approach, because it does not
|
||||
use fear as a tool for coercing an adversary. (D) Coercive diplomacy is employed
|
||||
to use force but to limit its effects on the international community. Coercive
|
||||
diplomacy is an aggressive strategy that is intended to obtain compliance through
|
||||
defeat. It does not leave an element of choice with the target, the target either
|
||||
being forced to comply or engage in conflict. It seeks to control by imposing
|
||||
compliance by removing any opportunity for negotiation or concession.'
|
||||
target: 'Let''s think step by step. We refer to Wikipedia articles on security
|
||||
studies for help. Coercive diplomacy uses the threat of force to induce the
|
||||
opponent to comply with demands. The answer is (B).'
|
||||
tag: mmlu_flan_cot_fewshot_social_sciences
|
||||
include: _mmlu_flan_cot_fewshot_template_yaml
|
||||
task: mmlu_flan_cot_fewshot_security_studies
|
||||
58
evaluation/data/mmlu/flan_cot_fewshot/mmlu_sociology.yaml
Normal file
@@ -0,0 +1,58 @@
|
||||
dataset_name: sociology
|
||||
description: The following are multiple choice questions (with answers) about sociology.
|
||||
fewshot_config:
|
||||
sampler: first_n
|
||||
samples:
|
||||
- question: 'Which of the following is not a problem associated with official statistics
|
||||
on strike action?
|
||||
|
||||
(A) most strikes go unnoticed by employers and the mass media (B) not all industrial
|
||||
disputes will be reported by the employer (C) the definition of strikes excludes
|
||||
those that involve fewer than ten workers or last less than one day (D) it is
|
||||
hard to compare strikes that were measured in different ways'
|
||||
target: Let's think step by step. We refer to Wikipedia articles on sociology
|
||||
for help. Official statistics on strike action can be problematic because not
|
||||
all industrial disputes will be reported by employers, the definition of strikes
|
||||
excludes those that involve fewer than ten workers or last less than one day,
|
||||
and it is hard to compare strikes that were measured in different ways. Thus,
|
||||
(A) is not a problem associated with official statistics on strike action. The
|
||||
answer is (A).
|
||||
- question: 'What does Berger (1963) describe as a metaphor for social reality?
|
||||
|
||||
(A) a fairground ride (B) a circus (C) a puppet theatre (D) a ballet'
|
||||
target: Let's think step by step. We refer to Wikipedia articles on sociology
|
||||
for help. Berger describes social reality using the metaphor of a puppet theatre.
|
||||
The answer is (C).
|
||||
- question: 'The term ''hegemony'' refers to:
|
||||
|
||||
(A) the tendency for the working class not to realize their own interests (B)
|
||||
a dominant ideology that legitimates economic, political and cultural power
|
||||
(C) a form of dual consciousness based on ideology and everyday experiences
|
||||
(D) a mode of payment given for outstanding topiary'
|
||||
target: Let's think step by step. We refer to Wikipedia articles on sociology
|
||||
for help. Hegemony refers to a dominant ideology that legitimates economic,
|
||||
political, and cultural power. The answer is (B).
|
||||
- question: 'The shift from ''civil religion'' to ''common religion'' means that:
|
||||
|
||||
(A) the increasing bureaucracy of the state has made religion only a marginal
|
||||
part of our lives (B) despite the weakening of traditional authority, our everyday
|
||||
lives and ''common sense'' remain shaped by religious beliefs and values (C)
|
||||
religious participation in collective worship may have declined, but people
|
||||
still practise their faiths in private (D) people are much more likely to discuss
|
||||
their religious beliefs in public, informal settings'
|
||||
target: Let's think step by step. We refer to Wikipedia articles on sociology
|
||||
for help. The shift from civil religion to common religion means that despite
|
||||
the weakening of traditional authority, our everyday lives and common sense
|
||||
remain shaped by religious beliefs and values. The answer is (B).
|
||||
- question: 'Which of the following did the post-war welfare state of 1948 not aim
|
||||
to provide:
|
||||
|
||||
(A) free health care and education for all (B) a minimum wage (C) full employment
|
||||
(D) universal welfare'
|
||||
target: 'Let''s think step by step. We refer to Wikipedia articles on sociology
|
||||
for help. The post-war welfare state of 1948 aimed to provide free healthcare
|
||||
and education, full employment, and universal welfare. But it did not aim to
|
||||
provide a minimum wage. The answer is (B).'
|
||||
tag: mmlu_flan_cot_fewshot_social_sciences
|
||||
include: _mmlu_flan_cot_fewshot_template_yaml
|
||||
task: mmlu_flan_cot_fewshot_sociology
|
||||
@@ -0,0 +1,56 @@
dataset_name: us_foreign_policy
description: The following are multiple choice questions (with answers) about us foreign
  policy.
fewshot_config:
  sampler: first_n
  samples:
  - question: 'How did Donald Trump attack globalization in the 2016 campaign?

      (A) Globalization had made men like him too rich (B) Globalization only benefited
      certain American states, such as New York (C) Liberal elites had encouraged
      globalization, while ''ordinary Americans'' lost jobs because of it (D) Globalization
      encouraged damaging trade wars'
    target: Let's think step by step. We refer to Wikipedia articles on us foreign
      policy for help. Trump attacked globalization because he believed ordinary Americans
      lost jobs due to it, and so he wanted to blame liberals who had encouraged it.
      The answer is (C).
  - question: 'How did NSC-68 change U.S. strategy?

      (A) It globalized containment. (B) It militarized containment. (C) It called
      for the development of the hydrogen bomb. (D) All of the above'
    target: Let's think step by step. We refer to Wikipedia articles on us foreign
      policy for help. NSC-68 outlined a variety of courses of action, including globalization
      of containment, militarization of containment, and the development of the hydrogen
      bomb. The answer is (D).
  - question: 'How do Defensive Realism and Offensive Realism differ in their explanation
      of state behaviour?

      (A) Defensive realists place greater emphasis on the role of international institutions
      (B) Defensive realists place less emphasis on geographical factors (C) Offensive
      realists give more priority to the national interest than Defensive realists.
      (D) Defensive realists believe states are security maximizers, while Offensive
      realists believe states to be power maximizers'
    target: Let's think step by step. We refer to Wikipedia articles on us foreign
      policy for help. While defensive realism advocates that states are security
      maximizers, offensive realists think of states as power maximizers. The answer
      is (D).
  - question: 'The realm of policy decisions concerned primarily with relations between
      the United States and the rest of the world is known as

      (A) terrorism policy. (B) economic policy. (C) foreign policy. (D) international
      policy.'
    target: Let's think step by step. We refer to Wikipedia articles on us foreign
      policy for help. The realm of policy decisions concerned with relations between
      the US and the rest of the world is known as foreign policy. The answer is (C).
  - question: 'How did the 2008 financial crisis affect America''s international reputation?

      (A) It damaged support for the US model of political economy and capitalism
      (B) It created anger at the United States for exaggerating the crisis (C) It
      increased support for American global leadership under President Obama (D) It
      reduced global use of the US dollar'
    target: 'Let''s think step by step. We refer to Wikipedia articles on us foreign
      policy for help. The 2008 financial crisis damaged the international reputation
      of the American model of political economy and capitalism. The answer is (A).'
tag: mmlu_flan_cot_fewshot_social_sciences
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_us_foreign_policy
45
evaluation/data/mmlu/flan_cot_fewshot/mmlu_virology.yaml
Normal file
@@ -0,0 +1,45 @@
dataset_name: virology
description: The following are multiple choice questions (with answers) about virology.
fewshot_config:
  sampler: first_n
  samples:
  - question: 'The median survival time to AIDS and death was established by following:

      (A) Seroprevalent HIV-infected individuals (B) Seronegatives (C) Seroconverters
      (D) High-risk seronegatives'
    target: Let's think step by step. We refer to Wikipedia articles on virology for
      help. The median survival time to AIDS and death was established by following
      seroconverters. The answer is (C).
  - question: 'Which of the following is a morphological characteristic of the paramyxoviruses.

      (A) Fragile viruses often visualised with RNA spewing from the inside (B) Elongate
      viruses (C) Icosahedral viruses with envelope (D) Very large viruses'
    target: Let's think step by step. We refer to Wikipedia articles on virology for
      help. Paramyxoviruses are fragile viruses often visualised with RNA spewing
      from the inside. The answer is (A).
  - question: 'The most important goal of a behavioral intervention is:

      (A) Change in behavior (B) Comprehensive coverage (C) Effective use of behavioral
      theory (D) Sustained behavior change'
    target: Let's think step by step. We refer to Wikipedia articles on virology for
      help. The primary goal of a behavioral intervention is to cause sustained behavior
      change. The answer is (D).
  - question: 'A key factor facilitating the application of nested case-control studies
      from the MACS was:

      (A) Data collection (B) Establishment of a repository of biologic specimens
      (C) Participant interest (D) Administration of the questionnaire by staff'
    target: Let's think step by step. We refer to Wikipedia articles on virology for
      help. The Multicenter AIDS Cohort Study's use of nested case-control studies
      was facilitated by the establishment of a repository of biologic specimens.
      The answer is (B).
  - question: 'Why are parvoviruses a highly impactful parasite?

      (A) Because they have no nucleic acid (B) They require a helper virus (C) Only
      replicate in dividing cells (D) Can integrate into host chromosomes'
    target: 'Let''s think step by step. We refer to Wikipedia articles on virology
      for help. Parvoviruses are highly impactful because they do not have nucleic
      acid. The answer is (A).'
tag: mmlu_flan_cot_fewshot_other
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_virology
@@ -0,0 +1,42 @@
dataset_name: world_religions
description: The following are multiple choice questions (with answers) about world
  religions.
fewshot_config:
  sampler: first_n
  samples:
  - question: 'How can the Upanishads be characterized?

      (A) Ritual texts (B) Philosophical texts (C) Hymns (D) Origin stories'
    target: Let's think step by step. We refer to Wikipedia articles on world religions
      for help. The Upanishads are the most recent part of Vedas (the oldest scriptures
      in Hinduism) and supplied the basis of later Hindu philosophy. So they are philosophical
      texts. The answer is (B).
  - question: 'What is the Second Gem in Buddhism?

      (A) The Dharma (B) The Sangha (C) The Buddha (D) The Bodhisattva'
    target: Let's think step by step. We refer to Wikipedia articles on world religions
      for help. The Second Gem in Buddhism is The Dharma. The answer is (A).
  - question: 'Which Japanese government promoted a kind of national cult based on the
      emperor and his associations with kami?

      (A) Honen (B) Tanaka (C) Tokugawa (D) Meiji'
    target: Let's think step by step. We refer to Wikipedia articles on world religions
      for help. The promotion of a national cult based on the emperor and his associations
      with Kami happened during the reign of Emperor Meiji (1852-1912). The answer
      is (D).
  - question: 'In which dynasty was the "Mandate of Heaven" developed to legitimatize
      the new rulers?

      (A) Shang (B) Zhou (C) Han (D) Xia'
    target: Let's think step by step. We refer to Wikipedia articles on world religions
      for help. The "Mandate of Heaven" was developed as an ancient Chinese philosophical
      concept during the Zhou Dynasty (1046-256 BCE). The answer is (B).
  - question: 'What is the sign of the covenant for Jewish males?

      (A) The rainbow (B) Circumcision (C) A son (D) Bar mitzvah'
    target: 'Let''s think step by step. We refer to Wikipedia articles on world religions
      for help. In Judaism, the most distinctive sign of the covenant is circumcision
      (brit milah). The answer is (B).'
tag: mmlu_flan_cot_fewshot_humanities
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_world_religions