A 60M-parameter language model trained on roughly 22 tokens per parameter (22 × 60M ≈ 1.3B tokens) from the FineWeb-Edu dataset.
Model Details
aixsim-60M is a transformer-based language model with approximately 60 million non-embedding parameters.
It uses RMSNorm for normalization and is trained on the FineWeb-Edu dataset.
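For reference, RMSNorm normalizes each activation vector by its root mean square and applies a learned per-feature gain, without the mean-centering step of LayerNorm. The sketch below shows the standard formulation in NumPy; it is an illustration of the technique, not code taken from this model's implementation.

```python
import numpy as np

def rms_norm(x: np.ndarray, weight: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """RMSNorm: scale by the reciprocal root-mean-square over the last axis,
    then apply a learned per-feature gain. Unlike LayerNorm, the input is
    not mean-centered and no bias term is used."""
    rms = np.sqrt(np.mean(x ** 2, axis=-1, keepdims=True) + eps)
    return weight * (x / rms)

# Example: after normalization, each row has RMS ~= 1.
x = np.array([[3.0, 4.0]])
y = rms_norm(x, weight=np.ones(2))
```

The `eps` term guards against division by zero for near-zero inputs; in a transformer, one `weight` vector of size `hidden_dim` is learned per normalization layer.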