MaterialsAnalyst-AI-7B Training Documentation
================================================

Model Training Details
---------------------
Base Model: Qwen 2.5 Instruct 7B
Fine-tuning Method: LoRA (Low-Rank Adaptation)
Training Infrastructure: Single NVIDIA A100 SXM4 GPU
Training Duration: Approximately 5.4 hours
Training Dataset: Custom curated dataset for materials analysis

Dataset Specifications
---------------------
Total Token Count: 6,292,692
Total Sample Count: 6,000
Average Tokens/Sample: 1048.78
Max Token Count: 1,289
Min Token Count: 922
Tokens Counted Using: tiktoken (cl100k_base encoding)
Dataset Creation: Generated using DeepSeekV3 API

Training Configuration
---------------------
LoRA Parameters:
- Rank: 32
- Alpha: 64
- Dropout: 0.1
- Target Modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj, lm_head

Training Hyperparameters:
- Learning Rate: 5e-5
- Batch Size: 4
- Gradient Accumulation: 5
- Effective Batch Size: 20
- Max Sequence Length: 2048
- Epochs: 3
- Warmup Ratio: 0.01
- Weight Decay: 0.01
- Max Grad Norm: 1.0
- LR Scheduler: Cosine

Hardware & Environment
---------------------
GPU: NVIDIA A100 SXM4 (40GB)
Operating System: Ubuntu
CUDA Version: 11.8
PyTorch Version: 2.7.0
Compute Capability: 8.0
Optimization: FP16, Gradient Checkpointing

Training Performance
---------------------
Training Runtime: 5.37 hours (19,348 seconds)
Train Samples/Second: 0.884
Train Steps/Second: 0.044
Training Loss (Final): 0.170
Validation Loss (Final): 0.136
Total Training Steps: 855
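
Consistency Check (Illustrative)
---------------------
The reported figures can be cross-checked against each other with a few lines of arithmetic. The sketch below is not part of the original training code; it only recomputes the derived numbers from the values listed above. It assumes every optimizer step consumed a full effective batch, under which the step count implies roughly 5,700 training samples per epoch. This document does not state the train/validation split, so that figure is an inference rather than a reported value.

```python
# Values copied from the documentation above.
batch_size = 4
grad_accum = 5
epochs = 3
total_steps = 855
runtime_s = 19_348
total_tokens = 6_292_692
total_samples = 6_000

# Effective batch size = per-device batch size x gradient accumulation steps.
effective_batch = batch_size * grad_accum

# Assumption: every optimizer step used a full effective batch, so the
# number of samples seen per epoch can be inferred from the step count.
steps_per_epoch = total_steps / epochs
implied_train_samples = steps_per_epoch * effective_batch

avg_tokens = total_tokens / total_samples
runtime_h = runtime_s / 3600
steps_per_s = total_steps / runtime_s
samples_per_s = implied_train_samples * epochs / runtime_s

print(f"effective batch size:        {effective_batch}")
print(f"avg tokens/sample:           {avg_tokens:.2f}")
print(f"runtime (hours):             {runtime_h:.2f}")
print(f"steps/second:                {steps_per_s:.3f}")
print(f"implied train samples/epoch: {implied_train_samples:.0f}")
print(f"implied samples/second:      {samples_per_s:.3f}")
```

Under that assumption, the recomputed values agree with the reported effective batch size, average tokens per sample, runtime in hours, and throughput figures to the precision given.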