98 lines
12 KiB
Markdown
98 lines
12 KiB
Markdown
|
|
---
|
||
|
|
license: llama3.2
|
||
|
|
language:
|
||
|
|
- en
|
||
|
|
base_model: allura-forge/Llama-3.3-8B-Instruct
|
||
|
|
datasets:
|
||
|
|
- m-a-p/SuperGPQA
|
||
|
|
pipeline_tag: text-generation
|
||
|
|
library_name: transformers
|
||
|
|
tags:
|
||
|
|
- sft
|
||
|
|
- trl
|
||
|
|
- unsloth
|
||
|
|
- llama
|
||
|
|
- llama3
|
||
|
|
- llama3.3
|
||
|
|
---
|
||
|
|

|
||
|
|
A fine-tune of [allura-forge/Llama-3.3-8B-Instruct](https://huggingface.co/allura-forge/Llama-3.3-8B-Instruct) on the [m-a-p/SuperGPQA](https://huggingface.co/datasets/m-a-p/SuperGPQA) dataset.
|
||
|
|
|
||
|
|
## Usage example
|
||
|
|
Set temperature as 0.0 for best results.
|
||
|
|
|
||
|
|
**System prompt**
|
||
|
|
```
|
||
|
|
You are a classifier. Categorize the following problem into discipline, field, and subfield in JSON format.
|
||
|
|
```
|
||
|
|
**User prompt**
|
||
|
|
```
|
||
|
|
Cotton and linen both readily catch fire. A batch of towels is composed of both cotton and linen, and is known to have caught fire. If it is known that the towels were ignited by a lit cigarette, which of the following arguments utilizes the most appropriate form of reasoning?
|
||
|
|
```
|
||
|
|
**Assistant response**
|
||
|
|
```
|
||
|
|
{"discipline": "Philosophy", "field": "Philosophy", "subfield": "Logic"}
|
||
|
|
```
|
||
|
|
# Possible output options
|
||
|
|
Discipline
|
||
|
|
```
|
||
|
|
['Medicine', 'Literature and Arts', 'History', 'Science', 'Philosophy', 'Law', 'Engineering', 'Management', 'Agronomy', 'Economics', 'Military Science', 'Sociology', 'Education']
|
||
|
|
```
|
||
|
|
Field
|
||
|
|
```
|
||
|
|
['Animal Husbandry', 'Political Science', 'Civil Engineering', 'Materials Science and Engineering', 'Weapon Science and Technology', 'History', 'Stomatology', 'Agricultural Engineering', 'Mechanical Engineering', 'Astronomy', 'Nuclear Science and Technology', 'Language and Literature', 'Forestry Engineering', 'Geology', 'Basic Medicine', 'Crop Science', 'Electronic Science and Technology', 'Military Science', 'Petroleum and Natural Gas Engineering', 'Metallurgical Engineering', 'Management Science and Engineering', 'Library, Information and Archival Management', 'Clinical Medicine', 'Art Studies', 'Food Science and Engineering', 'Systems Science', 'Aquaculture', 'Business Administration', 'Computer Science and Technology', 'Electrical Engineering', 'Forestry', 'Textile Science and Engineering', 'Physical Education', 'Oceanography', 'Musicology', 'Traditional Chinese Medicine', 'Mining Engineering', 'Psychology', 'Law', 'Control Science and Engineering', 'Chemistry', 'Hydraulic Engineering', 'Public Administration', 'Chemical Engineering and Technology', 'Geography', 'Optical Engineering', 'Applied Economics', 'Architecture', 'Power Engineering and Engineering Thermophysics', 'Education', 'Journalism and Communication', 'Aeronautical and Astronautical Science and Technology', 'Veterinary Medicine', 'Geophysics', 'Instrument Science and Technology', 'Mathematics', 'Information and Communication Engineering', 'Physical Oceanography', 'Theoretical Economics', 'Mechanics', 'Philosophy', 'Geological Resources and Geological Engineering', 'Physics', 'Pharmacy', 'Environmental Science and Engineering', 'Transportation Engineering', 'Biology', 'Naval Architecture and Ocean Engineering', 'Atmospheric Science', 'Sociology', 'Public Health and Preventive Medicine', 'Surveying and Mapping Science and Technology']
|
||
|
|
```
|
||
|
|
Subfield
|
||
|
|
```
|
||
|
|
['Political Science', 'Social Medicine and Health Management', 'Preschool Education', 'Geriatric Medicine', 'Civil and Commercial Law', 'Biophysics', 'Rigid Body Mechanics', 'Cartography and Geographic Information Engineering', 'Anesthesiology', 'Stellar and Interstellar Evolution', 'Chemical Transport Engineering', 'Structural Geology', 'Contract Law', 'Obstetrics and Gynecology', 'Pathology and Pathophysiology', 'Harmony', 'Aquaculture', 'Pharmaceutics', 'Vehicle Operation Engineering', 'Circuits and Systems', 'Solid State Physics', 'Theoretical Fluid Mechanics', 'Mineral Processing Engineering', 'Functions of Real Variables', 'Signal and Information Processing', 'Pathogen Biology', 'Computer Networks', 'Optical Fiber Communication', 'Genetics', 'Architectural History', 'Oil and Gas Field Development and Storage & Transportation Engineering', 'Tourism Management and Technological Economics Management', 'Drama and Opera Studies', 'Polynomials and Series Expansions', 'Cryptography', 'Polymer Chemistry and Physics', 'Principles of Seismic Exploration', 'Fuzzy Mathematics', 'Physiology', 'Pitch and Scales', 'Heat Transfer', 'Operating Systems', 'Fluid Physics', 'Microelectronics and Solid-State Electronics', 'Non-ferrous Metallurgy', 'Environmental Science', 'Power Electronics and Electrical Drives', 'Communication and Information Systems', 'Oncology', 'Military Thought and History', 'Procedural Law', 'Group Theory', 'Fine Arts', 'Transportation Planning and Management', 'Physical Chemistry', 'Physical Oceanography', 'Sports Science and Medicine', 'Animal Nutrition and Feed Science', 'Urban Planning and Design', 'Space physics', 'Electrical Theory and New Technologies', 'Economic History', 'Geotechnical Engineering', 'Ecology', 'Theory of Curriculum and Instruction', 'Radiation Medicine', 'Information Management Science', 'Functions of Complex Variables', 'Computer Software and Theory', 'Nursing and Rehabilitation Medicine', 'Wood Science and Technology', 'Mass Transport and Separation Process in Chemical Engineering', 'Religious Studies', 'Mineralogy, Petrology, and Economic Geology', 'Thermodynamics and Statistical Physics', 'Structural Engineering', 'Demography and Anthropology', 'Philology and Bibliography', 'Databases', 'Textile Materials Science', 'Textile Chemistry and Dyeing Engineering', 'Physical Chemistry of Metallurgical Process', 'Ethics', 'Internal Combustion Engineering', 'Design Arts', 'Refrigeration and Cryogenic Engineering', 'Mechatronic Engineering', 'Dermatology and Venereology', 'Economic Statistics', 'Applied Optics', 'Systems Science', 'Particle and Nuclear Physics', 'Information Management and Communication', 'French Language and Literature', 'Labor Economics', 'Medicinal Chemistry', 'Literary Theory', 'Microbiology', 'Physical Education and Training', 'Internal Medicine', 'Computer Architecture', 'Operations Research and Cybernetics', 'Dynamic Meteorology', 'Industrial Economics', 'Literary History', 'Marine Engineering', 'Optoelectronic Technology', 'Combinatorial Mathematics', 'Theoretical Optics', 'Materials Processing Engineering', 'Nutrition and Food Hygiene', 'Theoretical Mechanics', 'Graph Theory', 'Quantum Mechanics', 'Materials Physics and Chemistry', 'Marine Biology', 'Forest Cultivation and Genetic Breeding', 'National and Defense Economics', 'Poromechanics and Reservoir Physics', 'Road and Railway Engineering', 'Aeronautical and Astronautical Science and Technology', 'Data Structures', 'Historical Geography', 'Analytical Chemistry', 'Military Law', 'Pharmaceutical Analysis', 'Polymer Physics', 'Atmospheric Physics and Atmospheric Environment', 'Communication Principles', 'Underwater Acoustics', 'Journalism and News Practice', 'Water conservancy and Hydropower Engineering', 'Inorganic Chemistry', 'Animal Rearing and Breeding', 'Educational Technology and Principles', 'High Voltage and Insulation Technology', 'Advanced Algebra', 'Food Biochemistry', 'Philosophy of Science and Technology', 'Logic', 'Film Studies', 'Military Command and Information Systems', 'Fundamentals of Dynamics and
|
||
|
|
```
|
||
|
|
## Model Details
|
||
|
|
- Base Model: `allura-forge/Llama-3.3-8B-Instruct`
|
||
|
|
- Parameter Count: 8,030,261,248
|
||
|
|
- Precision: torch.bfloat16
|
||
|
|
|
||
|
|
## Hardware
|
||
|
|
- GPU: NVIDIA RTX PRO 6000 Blackwell Server Edition
|
||
|
|
- Announced: Mar 17th, 2025
|
||
|
|
- Release Date: Mar 18th, 2025
|
||
|
|
- Memory Type: GDDR7
|
||
|
|
- Bandwidth: 1.79 TB/s
|
||
|
|
- Memory Size: 96 GB
|
||
|
|
- Memory Bus: 512 bit
|
||
|
|
- Shading Units: 24064
|
||
|
|
- TDP: 600W
|
||
|
|
|
||
|
|
## Training Settings
|
||
|
|
### PEFT
|
||
|
|
- Rank: 32
|
||
|
|
- LoRA alpha: 64
|
||
|
|
- Modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
|
||
|
|
- Gradient checkpointing: unsloth
|
||
|
|
|
||
|
|
### SFT
|
||
|
|
- Epoch: 2
|
||
|
|
- Batch size: 32
|
||
|
|
- Gradient Accumulation steps: 1
|
||
|
|
- Warmup ratio: 0.05
|
||
|
|
- Learning rate: 0.0002
|
||
|
|
- Optimizer: adamw_torch_fused
|
||
|
|
- Learning rate scheduler: cosine
|
||
|
|
|
||
|
|
## Training stats
|
||
|
|
- Date: 2026-03-26T03:53:29.234881
|
||
|
|
- Peak VRAM usage: 32.135 GB
|
||
|
|
- Global step: 1576
|
||
|
|
- Training runtime (seconds): 2681.8444
|
||
|
|
- Average training loss: 0.06838441643920647
|
||
|
|
- Final validation loss: 0.0504293330013752
|
||
|
|
|
||
|
|
## Framework versions
|
||
|
|
- Unsloth: 2026.3.15
|
||
|
|
- TRL: 0.22.2
|
||
|
|
- Transformers: 4.56.2
|
||
|
|
- Pytorch: 2.10.0+cu128
|
||
|
|
- Datasets: 4.8.4
|
||
|
|
- Tokenizers: 0.22.2
|
||
|
|
|
||
|
|
## License
|
||
|
|
This model is released under the Llama3 license. See the [Terms of Use](https://www.llama.com/llama3/license/) for details.
|