ModelHub XC a469c212b3 Initialize project; model provided by the ModelHub XC community

Model: PrimeIntellect/INTELLECT-1-step-15500
Source: Original Platform

2026-05-05 19:38:41 +08:00

Field	Value
license	apache-2.0
datasets	PrimeIntellect/fineweb-edu, PrimeIntellect/fineweb, PrimeIntellect/StackV1-popular, mlfoundations/dclm-baseline-1.0-parquet, open-web-math/open-web-math
language	
pipeline_tag	text-generation

INTELLECT-1-step-17000

This is an intermediate checkpoint of INTELLECT-1. The final version and the instruct version are also available.

	Step	Model URL
->	17000	https://huggingface.co/PrimeIntellect/INTELLECT-1-step-17000
	28600	https://huggingface.co/PrimeIntellect/INTELLECT-1-step-28600
	39200	https://huggingface.co/PrimeIntellect/INTELLECT-1-step-39200
	49200	https://huggingface.co/PrimeIntellect/INTELLECT-1-step-49200
	59200	https://huggingface.co/PrimeIntellect/INTELLECT-1-step-59200
	69200	https://huggingface.co/PrimeIntellect/INTELLECT-1-step-69200
	78000	https://huggingface.co/PrimeIntellect/INTELLECT-1-step-78000
	88000	https://huggingface.co/PrimeIntellect/INTELLECT-1-step-88000
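The checkpoint URLs in the table follow a single pattern, so a repository ID can be derived from the step number. The following is a minimal Python sketch: the helper names are hypothetical, and the loading function assumes the checkpoints work with the Hugging Face transformers Auto classes, which this card does not state explicitly.

```python
# Steps listed in the checkpoint table above.
LISTED_STEPS = [17000, 28600, 39200, 49200, 59200, 69200, 78000, 88000]


def checkpoint_repo_id(step: int) -> str:
    """Build the repository ID for an intermediate checkpoint (hypothetical helper)."""
    if step <= 0:
        raise ValueError("step must be a positive integer")
    return f"PrimeIntellect/INTELLECT-1-step-{step}"


def checkpoint_url(step: int) -> str:
    """Build the model URL in the same form as the table entries."""
    return f"https://huggingface.co/{checkpoint_repo_id(step)}"


def load_checkpoint(step: int):
    """Download and load a checkpoint.

    Assumption: the repository is loadable through the transformers Auto classes.
    The import is deferred so the URL helpers work without transformers installed.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer

    repo_id = checkpoint_repo_id(step)
    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(repo_id)
    return tokenizer, model


if __name__ == "__main__":
    for step in LISTED_STEPS:
        print(step, checkpoint_url(step))
```

Calling `load_checkpoint(17000)` would download the full weights of a 10B-parameter model, so it is kept out of the default run.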

Model Overview

INTELLECT-1 is the first collaboratively trained 10-billion-parameter language model, trained from scratch on 1 trillion tokens of English text and code.

INTELLECT-1 was trained on up to 14 concurrent nodes distributed across 3 continents, with 30 independent community contributors providing compute. The training code uses the prime framework, a scalable distributed training framework designed for fault-tolerant, dynamically scaling, high-performance training on unreliable, globally distributed workers. The key abstraction that allows dynamic scaling is the ElasticDeviceMesh, which manages dynamic global process groups for fault-tolerant communication across the internet and local process groups for communication within a node. The global all-reduce was done with custom int8 all-reduce kernels, which shrink the communication payload and greatly reduce the communication overhead.
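To illustrate why an int8 all-reduce shrinks the payload, here is a minimal pure-Python simulation. It is not the prime framework's kernel: it assumes symmetric per-tensor absmax quantization, which this card does not specify, and every function name is hypothetical. Each simulated worker sends one byte per value plus a single scale, instead of four bytes per float32 value.

```python
from array import array


def quantize_int8(values):
    """Symmetric absmax quantization (an assumed scheme): floats -> (int8 codes, scale)."""
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Clamp to [-127, 127] so rounding can never overflow the int8 range.
    codes = array("b", (max(-127, min(127, round(v / scale))) for v in values))
    return codes, scale


def dequantize_int8(codes, scale):
    """Map int8 codes back to approximate float values."""
    return [c * scale for c in codes]


def payload_bytes(codes):
    """Bytes a worker would send for the codes (the scale is excluded)."""
    return len(codes) * codes.itemsize


def simulated_int8_all_reduce(worker_grads):
    """Average gradients across workers, exchanging int8 codes instead of float32."""
    payloads = [quantize_int8(g) for g in worker_grads]  # what would cross the network
    total = [0.0] * len(worker_grads[0])
    for codes, scale in payloads:
        for i, v in enumerate(dequantize_int8(codes, scale)):
            total[i] += v
    return [t / len(worker_grads) for t in total]


if __name__ == "__main__":
    grads = [[0.5, -1.0, 0.25], [1.0, 0.0, -0.5]]
    print("approximate mean:", simulated_int8_all_reduce(grads))
    codes, _ = quantize_int8(grads[0])
    float32_bytes = len(array("f", grads[0])) * array("f").itemsize
    print("int8 bytes:", payload_bytes(codes), "float32 bytes:", float32_bytes)
```

The averaged result differs from the exact mean by at most half a quantization step per worker, which is the accuracy traded for the fourfold smaller payload in this sketch.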

For more detailed technical insights, please refer to our technical paper.

Model Details

Model Contributors: samsja, Prime Intellect, Arcee AI, kotaro, skre_0, marlo, rodeo, Herb, Olas, superchillen, Hugging Face, mev_pete, 0xfr_, dj, primeprimeint1234, Marco Giglio, realtek, Hyperbolic, hecataeus, NWO, Virtual Machine, droll, SemiAnalysis, waiting_, toptickcrypto, sto, Johannes, washout_segment_0b, klee
Release Date: 29 Nov 2024
Model License: Apache 2.0

Technical Specifications

Parameter	Value
Parameter Size	10B
Number of Layers	42
Number of Attention Heads	32
Hidden Size	4096
Context Length	8192
Vocabulary Size	128256
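Two quantities follow directly from the table: the per-head dimension and the size of one token-embedding matrix. The short Python sketch below works them out. The variable names are illustrative, and the card does not say whether the input and output embeddings are shared, so only a single matrix is counted.

```python
# Values copied from the Technical Specifications table.
NUM_LAYERS = 42
NUM_ATTENTION_HEADS = 32
HIDDEN_SIZE = 4096
CONTEXT_LENGTH = 8192
VOCAB_SIZE = 128256

# Each attention head works on an equal slice of the hidden dimension.
head_dim = HIDDEN_SIZE // NUM_ATTENTION_HEADS

# One embedding matrix has one row of HIDDEN_SIZE values per vocabulary entry.
embedding_params = VOCAB_SIZE * HIDDEN_SIZE

# Share of the nominal 10B parameters taken by that single matrix.
embedding_fraction = embedding_params / 10e9

if __name__ == "__main__":
    print("head dimension:", head_dim)
    print("embedding parameters:", embedding_params)
    print(f"share of 10B parameters: {embedding_fraction:.1%}")
```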

Citations

If you use this model in your research, please cite it as follows:

@article{}