| Setting | Value |
|---|---|
| Dataset | nvidia/Nemotron-Terminal-Corpus (366K examples, all 4 subsets) |
| Epochs | 2 |
| Training steps | 5,721 |
| Batch size | 128 |
| Sequence length | 32,768 |
| Learning rate | 2e-5 (cosine schedule, 10% warmup) |
| Optimizer | AdamW (β₁=0.9, β₂=0.95), grad clip 1.0, weight decay 1e-4 |
| Hardware | TPU v5p-64 |
| Final loss | 0.442 |
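For reference, a minimal PyTorch sketch of the optimizer and schedule settings above. This is illustrative only: the actual run trains an 8B model on TPUs with the Marin stack, and the toy model here is a stand-in.

```python
import math
import torch

# Hypothetical stand-in model; the real run fine-tunes an 8B model.
model = torch.nn.Linear(10, 10)

TOTAL_STEPS = 5_721                      # 2 epochs over ~366K examples at batch size 128
WARMUP_STEPS = int(0.10 * TOTAL_STEPS)   # 10% linear warmup

optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=2e-5,
    betas=(0.9, 0.95),
    weight_decay=1e-4,
)

def lr_lambda(step: int) -> float:
    """Linear warmup for the first 10% of steps, then cosine decay to zero."""
    if step < WARMUP_STEPS:
        return step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

# Inside the training loop, gradients are clipped before each optimizer step:
#   torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
#   optimizer.step(); scheduler.step(); optimizer.zero_grad()
```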
## Evaluation Results

### Terminal-Bench 2.0

| Model | TB2 Accuracy |
|---|---|
| Marin-8B Instruct (no SFT) | 0/89 = 0% |
| Marin-8B Instruct + TerminalCorpus SFT | 1/89 = 1.1% |
| NemotronTerminal-8B (Qwen3-8B, paper) | 13.0% ± 2.2 |
| Marin Qwen3-8B SFT reproduction (exp3490b) | 14/88 = 15.9% |
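Counts like 14/88 carry wide error bars at this sample size, so the reproduction's 15.9% is statistically consistent with the paper's 13.0%. A quick normal-approximation confidence interval makes this concrete (illustrative only; the paper's ± 2.2 may be computed differently, e.g. as variation across runs):

```python
import math

def pass_rate_ci(successes: int, trials: int, z: float = 1.96):
    """Normal-approximation 95% confidence interval for a binomial pass rate."""
    p = successes / trials
    half_width = z * math.sqrt(p * (1 - p) / trials)
    return p, half_width

p, hw = pass_rate_ci(14, 88)
print(f"{p:.1%} ± {hw:.1%}")  # 15.9% ± 7.6%
```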
### TBLite Progression

| Checkpoint | TBLite Accuracy |
|---|---|
| Step 1500 (26% of training) | 1/100 = 1% |
| Step 3000 (52% of training) | 5/100 = 5% |
## Training Details

Training followed the NemotronTerminal-8B paper hyperparameters. The model reaches a higher final loss (0.442 vs 0.360) than the Qwen3-8B reproduction and scores significantly lower on terminal benchmarks, likely due to architecture and tokenizer differences between Llama 3 and Qwen3.
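One reason raw losses are hard to compare here is that Llama 3 and Qwen3 tokenizers segment the same text into different numbers of tokens, so per-token cross-entropy is not tokenizer-neutral. A common normalization is bits per byte; a minimal sketch, with hypothetical token/byte counts (this helper is not part of the training code):

```python
import math

def bits_per_byte(mean_loss_nats: float, num_tokens: int, num_bytes: int) -> float:
    """Convert mean per-token cross-entropy (nats) to bits per byte,
    making losses comparable across tokenizers with different token counts."""
    total_nats = mean_loss_nats * num_tokens
    return total_nats / (num_bytes * math.log(2))

# Hypothetical counts for the same evaluation text: a tokenizer that emits
# fewer tokens per byte yields lower bits-per-byte at the same per-token loss.
print(bits_per_byte(0.442, num_tokens=1_000_000, num_bytes=4_000_000))
```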