A 200M-parameter language model trained on 22 × 200M ≈ 4.4 billion tokens from the FineWeb-Edu dataset.
Model Details
aixsim-200M is a transformer-based language model with approximately 200 million parameters, excluding the embedding layer.
It uses RMSNorm for normalization and was trained on the FineWeb-Edu dataset.
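For reference, RMSNorm normalizes each feature vector by its root-mean-square rather than by mean and variance as in LayerNorm, dropping the mean-centering step and the bias term. A minimal NumPy sketch (the function name and epsilon default are illustrative, not taken from the model's implementation):

```python
import numpy as np

def rms_norm(x: np.ndarray, weight: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """RMSNorm over the last (feature) axis: x / RMS(x) scaled by a learned gain.

    Unlike LayerNorm, the mean is not subtracted and there is no bias term.
    """
    rms = np.sqrt(np.mean(x ** 2, axis=-1, keepdims=True) + eps)
    return (x / rms) * weight

# Example: a single 2-dimensional feature vector with unit gain.
x = np.array([[3.0, 4.0]])
out = rms_norm(x, np.ones(2))
```

The learned `weight` (gain) plays the same role as LayerNorm's scale parameter; omitting mean centering makes the operation slightly cheaper while working comparably well in practice for transformers.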