MaterialsAnalyst-AI-7B Training Documentation
================================================

Model Training Details
---------------------

Base Model:               Qwen2.5-7B-Instruct
Fine-tuning Method:       LoRA (Low-Rank Adaptation)
Training Infrastructure:  Single NVIDIA A100 SXM4 GPU
Training Duration:        Approximately 5.4 hours
Training Dataset:         Custom curated dataset for materials analysis

Dataset Specifications
---------------------

Total Token Count:        6,292,692
Total Sample Count:       6,000
Average Tokens/Sample:    1,048.78
Max Token Count:          1,289
Min Token Count:          922
Tokens Counted Using:     tiktoken (cl100k_base encoding)
Dataset Creation:         Generated using the DeepSeek-V3 API
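
The following is a minimal sketch of how the statistics above could be reproduced. It assumes the dataset is available as a list of sample strings. The names token_stats and TokenStats are hypothetical. The encoder is passed in as a function, so the summary logic does not depend on any particular tokenizer. Note that cl100k_base is an OpenAI tokenizer, so these counts approximate the Qwen 2.5 token counts rather than equal them. The 2048-token sequence limit applies to the Qwen 2.5 counts.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class TokenStats:
    total: int
    samples: int
    mean: float
    max_tokens: int
    min_tokens: int


def token_stats(texts: Iterable[str], encode: Callable[[str], List[int]]) -> TokenStats:
    """Tokenize each sample with `encode` and summarize the per-sample counts."""
    counts = [len(encode(t)) for t in texts]
    if not counts:
        raise ValueError("dataset is empty")
    total = sum(counts)
    return TokenStats(
        total=total,
        samples=len(counts),
        mean=round(total / len(counts), 2),  # two decimals, as in the specifications
        max_tokens=max(counts),
        min_tokens=min(counts),
    )
```

With the tiktoken package installed, the encoder would be tiktoken.get_encoding("cl100k_base").encode, as in token_stats(texts, tiktoken.get_encoding("cl100k_base").encode). As a check on the reported figures, 6,292,692 / 6,000 = 1,048.78 tokens per sample.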

Training Configuration
---------------------

LoRA Parameters:
- Rank:                   32
- Alpha:                  64
- Dropout:                0.1
- Target Modules:         q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj, lm_head
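
The rank and alpha above determine the size and scale of each adapter. The sketch below shows that arithmetic. It assumes the standard LoRA formulation, in which a frozen d_out × d_in weight W is updated as W + (alpha / r) · B·A. Here A has shape r × d_in and B has shape d_out × r. The helper names are hypothetical. Layer shapes must be taken from the base model's own configuration, since this document does not list them.

```python
from typing import Dict, Tuple

# Values from the LoRA Parameters list.
RANK = 32
ALPHA = 64


def lora_scaling(alpha: float, rank: int) -> float:
    """Factor applied to the low-rank update: W' = W + (alpha / r) * B @ A."""
    return alpha / rank


def adapter_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one d_in -> d_out linear layer.

    A has shape (rank, d_in) and B has shape (d_out, rank); biases are not adapted.
    """
    return rank * (d_in + d_out)


def total_adapter_params(layers: Dict[str, Tuple[int, int, int]], rank: int) -> int:
    """Sum adapter sizes over {module_name: (d_in, d_out, number_of_copies)}."""
    return sum(adapter_params(d_in, d_out, rank) * n for d_in, d_out, n in layers.values())
```

With alpha = 64 and r = 32, the update is scaled by 2.0. The document does not say which library was used. If the run used Hugging Face PEFT, these settings would correspond to LoraConfig(r=32, lora_alpha=64, lora_dropout=0.1, target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj", "lm_head"], task_type="CAUSAL_LM").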

Training Hyperparameters:
- Learning Rate:          5e-5
- Batch Size:             4
- Gradient Accumulation:  5
- Effective Batch Size:   20
- Max Sequence Length:    2048
- Epochs:                 3
- Warmup Ratio:           0.01
- Weight Decay:           0.01
- Max Grad Norm:          1.0
- LR Scheduler:           Cosine
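
The sketch below shows the learning-rate schedule these settings describe. It assumes the common convention, used by the cosine scheduler in Hugging Face Transformers: a linear warmup from 0 to the peak rate, then a half-cosine decay to 0. The document does not name the training framework, and the function names are hypothetical. With 855 total steps and a warmup ratio of 0.01, rounding up gives 9 warmup steps.

```python
import math


def warmup_steps(total_steps: int, warmup_ratio: float) -> int:
    # Round up, so any non-zero ratio yields at least one warmup step.
    return math.ceil(total_steps * warmup_ratio)


def cosine_lr(step: int, total_steps: int, warmup: int, peak_lr: float) -> float:
    """Learning rate at a given optimizer step."""
    if step < warmup:
        # Linear warmup from 0 up to peak_lr.
        return peak_lr * step / max(1, warmup)
    # Fraction of the post-warmup phase completed, from 0.0 to 1.0.
    progress = (step - warmup) / max(1, total_steps - warmup)
    # Half-cosine decay from peak_lr down to 0 at total_steps.
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))
```

For example, cosine_lr(9, 855, 9, 5e-5) returns the peak rate of 5e-5. The rate falls to half of that at step 432, midway through the decay phase.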

Hardware & Environment
---------------------

GPU:                      NVIDIA A100 SXM4 (40GB)
Operating System:         Ubuntu
CUDA Version:             11.8
PyTorch Version:          2.7.0
Compute Capability:       8.0
Optimization:             FP16, Gradient Checkpointing
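
The following is a minimal sketch for checking whether a local setup matches the environment recorded above. It assumes PyTorch is installed and a CUDA GPU is visible. EXPECTED, collect_env and diff_env are hypothetical names. torch.cuda.get_device_capability and torch.version.cuda are standard PyTorch calls and attributes. Because torch is imported only inside collect_env, the comparison helper works on its own.

```python
from typing import Dict, Optional, Tuple

# Values recorded under Hardware & Environment.
EXPECTED: Dict[str, str] = {
    "torch": "2.7.0",
    "cuda": "11.8",
    "capability": "8.0",
}


def collect_env() -> Dict[str, str]:
    """Read the local PyTorch/CUDA setup (requires torch and a visible GPU)."""
    import torch  # imported lazily so diff_env can be used without torch

    major, minor = torch.cuda.get_device_capability(0)
    return {
        "torch": torch.__version__.split("+")[0],  # drop a local tag such as "+cu118"
        "cuda": torch.version.cuda or "none",
        "capability": f"{major}.{minor}",
    }


def diff_env(
    actual: Dict[str, str], expected: Dict[str, str] = EXPECTED
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {key: (expected, actual)} for every entry that does not match."""
    return {k: (v, actual.get(k)) for k, v in expected.items() if actual.get(k) != v}
```

Running print(diff_env(collect_env()) or "environment matches") prints any mismatched entries as (expected, actual) pairs. An A100 reports compute capability 8.0.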

Training Performance
---------------------

Training Runtime:         5.37 hours (19,348 seconds)
Train Samples/Second:     0.884
Train Steps/Second:       0.044
Training Loss (Final):    0.170
Validation Loss (Final):  0.136
Total Training Steps:     855
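
The reported figures can be cross-checked with a little arithmetic. This sketch assumes one optimizer update per Batch Size × Gradient Accumulation = 20 samples. On that basis, 6,000 samples × 3 epochs would give 900 steps. The reported 855 steps instead correspond to 285 steps per epoch, or about 5,700 training samples. That fits roughly 5% (300 samples) being held out for the validation loss, though the document does not state the split. The throughput lines agree: 0.884 samples/s × 19,348 s ≈ 17,104 samples ≈ 3 × 5,700, and 0.044 steps/s × 19,348 s ≈ 851 steps, given the rounded rate. The helper names are hypothetical.

```python
import math

# Figures from this document.
SAMPLES = 6000
BATCH = 4
GRAD_ACCUM = 5
EPOCHS = 3
REPORTED_STEPS = 855
RUNTIME_S = 19348
SAMPLES_PER_S = 0.884


def optimizer_steps(n_train: int, batch: int, grad_accum: int, epochs: int) -> int:
    """Optimizer updates over all epochs, keeping a final partial batch."""
    batches_per_epoch = math.ceil(n_train / batch)
    # Floor division: a trailing group of fewer than grad_accum batches is not counted.
    updates_per_epoch = max(1, batches_per_epoch // grad_accum)
    return updates_per_epoch * epochs


def implied_train_samples(total_steps: int, epochs: int, effective_batch: int) -> int:
    """Invert the step count to estimate training samples per epoch."""
    return total_steps // epochs * effective_batch
```

In both cases here the batch counts divide evenly, so the rounding convention does not affect the result.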