MaterialsAnalyst-AI-7B Training Documentation
================================================

Model Training Details
---------------------

Base Model: Qwen 2.5 Instruct 7B
Fine-tuning Method: LoRA (Low-Rank Adaptation)
Training Infrastructure: Single NVIDIA A100 SXM4 GPU
Training Duration: Approximately 5.4 hours
Training Dataset: Custom curated dataset for materials analysis

Dataset Specifications
---------------------

Total Token Count: 6,292,692
Total Sample Count: 6,000
Average Tokens/Sample: 1,048.78
Max Token Count: 1,289
Min Token Count: 922
Tokens Counted Using: tiktoken (cl100k_base encoding)
Dataset Creation: Generated using DeepSeekV3 API
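
Statistics like the ones above can be computed with a short script. The following is a minimal sketch, assuming the dataset is available as a list of strings, one per sample. The helper names `cl100k_encoder` and `summarize_token_counts` are hypothetical, and the script actually used for this dataset is not part of this documentation. `tiktoken.get_encoding("cl100k_base")` is the tiktoken call for the encoding named above.

```python
from statistics import mean
from typing import Callable, Iterable, Sequence


def cl100k_encoder() -> Callable[[str], Sequence[int]]:
    """Return the cl100k_base encode function (requires `pip install tiktoken`)."""
    import tiktoken  # imported lazily so the rest of the sketch runs without it

    return tiktoken.get_encoding("cl100k_base").encode


def summarize_token_counts(
    samples: Iterable[str], encode: Callable[[str], Sequence[int]]
) -> dict:
    """Compute the token statistics listed in this section for a set of samples."""
    counts = [len(encode(text)) for text in samples]
    if not counts:
        raise ValueError("no samples provided")
    return {
        "total_tokens": sum(counts),
        "total_samples": len(counts),
        "avg_tokens_per_sample": round(mean(counts), 2),
        "max_tokens": max(counts),
        "min_tokens": min(counts),
    }


# Consistency check on the reported figures: total tokens / total samples.
reported_average = round(6_292_692 / 6_000, 2)  # 1048.78
```

With tiktoken installed, `summarize_token_counts(samples, cl100k_encoder())` would return these statistics for a given list of samples. Any other callable that maps a string to a token sequence can be passed in place of the encoder.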

Training Configuration
---------------------

LoRA Parameters:
- Rank: 32
- Alpha: 64
- Dropout: 0.1
- Target Modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj, lm_head
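
As an illustration of how these LoRA settings could be expressed in code, here is a minimal sketch. It assumes Hugging Face PEFT as the LoRA implementation, which this documentation does not state. The names `LORA_SETTINGS`, `lora_scaling`, `lora_param_count`, and `build_peft_config` are hypothetical. In standard LoRA the low-rank update is scaled by alpha / rank, which works out to 2.0 for the values above.

```python
LORA_SETTINGS = {
    "r": 32,
    "lora_alpha": 64,
    "lora_dropout": 0.1,
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj", "lm_head",
    ],
}


def lora_scaling(settings: dict) -> float:
    """Scaling factor applied to the low-rank update B @ A in standard LoRA."""
    return settings["lora_alpha"] / settings["r"]


def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one linear layer with d_in inputs
    and d_out outputs: A is (rank x d_in) and B is (d_out x rank)."""
    return rank * (d_in + d_out)


def build_peft_config(settings: dict = LORA_SETTINGS):
    """Assumption: PEFT is the LoRA library (requires `pip install peft`)."""
    from peft import LoraConfig  # lazy import; not needed for the helpers above

    return LoraConfig(task_type="CAUSAL_LM", **settings)


print(lora_scaling(LORA_SETTINGS))  # 2.0
```

`lora_param_count` takes layer dimensions as arguments, so it can be applied to each target module once the base model's layer shapes are known.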

Training Hyperparameters:
- Learning Rate: 5e-5
- Batch Size: 4
- Gradient Accumulation: 5
- Effective Batch Size: 20
- Max Sequence Length: 2048
- Epochs: 3
- Warmup Ratio: 0.01
- Weight Decay: 0.01
- Max Grad Norm: 1.0
- LR Scheduler: Cosine
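
To make the batch-size and scheduler settings concrete, the following sketch computes the effective batch size and a learning-rate curve. It assumes linear warmup followed by cosine decay to zero, with the warmup step count rounded up. This is a common formulation, but the exact scheduler implementation is not confirmed by this documentation. The total of 855 steps is taken from the Training Performance section, and all function names are hypothetical.

```python
import math

LEARNING_RATE = 5e-5
BATCH_SIZE = 4
GRAD_ACCUM_STEPS = 5
WARMUP_RATIO = 0.01
TOTAL_STEPS = 855  # reported under Training Performance


def effective_batch_size(per_device: int, grad_accum: int, num_devices: int = 1) -> int:
    """Samples contributing to each optimizer step."""
    return per_device * grad_accum * num_devices


def warmup_steps(total_steps: int, ratio: float) -> int:
    """Number of warmup steps, rounded up (an assumption of this sketch)."""
    return math.ceil(total_steps * ratio)


def lr_at_step(
    step: int,
    total_steps: int = TOTAL_STEPS,
    base_lr: float = LEARNING_RATE,
    ratio: float = WARMUP_RATIO,
) -> float:
    """Learning rate at an optimizer step: linear warmup, then cosine decay."""
    warmup = warmup_steps(total_steps, ratio)
    if step < warmup:
        return base_lr * step / max(1, warmup)
    progress = (step - warmup) / max(1, total_steps - warmup)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


print(effective_batch_size(BATCH_SIZE, GRAD_ACCUM_STEPS))  # 20
print(warmup_steps(TOTAL_STEPS, WARMUP_RATIO))             # 9
```

Under these assumptions the learning rate rises from 0 to 5e-5 over the first 9 steps, then decays along a half cosine and reaches approximately 0 at step 855.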

Hardware & Environment
---------------------

GPU: NVIDIA A100 SXM4 (40GB)
Operating System: Ubuntu
CUDA Version: 11.8
PyTorch Version: 2.7.0
Compute Capability: 8.0
Optimization: FP16, Gradient Checkpointing

Training Performance
---------------------

Training Runtime: 5.37 hours (19,348 seconds)
Train Samples/Second: 0.884
Train Steps/Second: 0.044
Training Loss (Final): 0.170
Validation Loss (Final): 0.136
Total Training Steps: 855
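
The throughput figures above follow from the runtime and step count. The sketch below redoes that arithmetic using the effective batch size of 20 and the 3 epochs listed under Training Configuration. The implied number of training samples per epoch is an inference that assumes every optimizer step used a full batch. The train/validation split is not stated in this documentation.

```python
RUNTIME_SECONDS = 19_348
TOTAL_STEPS = 855
EFFECTIVE_BATCH_SIZE = 20
EPOCHS = 3

runtime_hours = RUNTIME_SECONDS / 3600
steps_per_second = TOTAL_STEPS / RUNTIME_SECONDS

# Assumption: every optimizer step consumed a full effective batch.
samples_processed = TOTAL_STEPS * EFFECTIVE_BATCH_SIZE
samples_per_second = samples_processed / RUNTIME_SECONDS
samples_per_epoch = samples_processed // EPOCHS

print(f"runtime:           {runtime_hours:.2f} h")       # 5.37 h
print(f"steps/second:      {steps_per_second:.3f}")      # 0.044
print(f"samples/second:    {samples_per_second:.3f}")    # 0.884
print(f"samples per epoch: {samples_per_epoch}")         # 5700
```

The recomputed values match the reported runtime and throughput. The implied 5,700 samples per epoch would be consistent with part of the 6,000-sample dataset being held out for validation, though that is an inference from the numbers rather than a documented fact.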