---
license: apache-2.0
base_model: marin-community/marin-8b-instruct
datasets:
- nvidia/Nemotron-Terminal-Corpus
tags:
- terminal
- agentic
- sft
- llama
- marin
language:
- en
pipeline_tag: text-generation
---

# Marin-8B-Instruct SFT on TerminalCorpus

Marin-8B Instruct fine-tuned on [nvidia/Nemotron-Terminal-Corpus](https://huggingface.co/datasets/nvidia/Nemotron-Terminal-Corpus) (366K terminal agent trajectories).
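To peek at the training data, a minimal sketch using the `datasets` library (the `train` split name is an assumption; check the dataset page for the actual configuration):

```python
from datasets import load_dataset

# Stream a few trajectories without downloading all 366K examples.
# The "train" split name is an assumption; see the dataset page.
ds = load_dataset("nvidia/Nemotron-Terminal-Corpus", split="train", streaming=True)
for example in ds.take(3):
    print(example.keys())
```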
## Model Details

| Parameter | Value |
|---|---|
| Base model | [marin-community/marin-8b-instruct](https://huggingface.co/marin-community/marin-8b-instruct) |
| Architecture | Llama 3 8B (32 layers, 4096 hidden, 32 heads, 8 KV heads) |
| Tokenizer | [marin-community/marin-tokenizer](https://huggingface.co/marin-community/marin-tokenizer) |
| Training data | nvidia/Nemotron-Terminal-Corpus (366K examples, all 4 subsets) |
| Epochs | 2 |
| Training steps | 5,721 |
| Batch size | 128 |
| Sequence length | 32,768 |
| Learning rate | 2e-5 (cosine decay, 10% warmup) |
| Optimizer | AdamW (β₁=0.9, β₂=0.95), grad clip 1.0, weight decay 1e-4 |
| Hardware | TPU v5p-64 |
| Final loss | 0.442 |
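For reference, a PyTorch-style sketch of the optimizer and learning-rate schedule implied by this table. The actual run used a TPU training stack, so this is illustrative rather than the real training code; the stand-in `model` is hypothetical.

```python
import math
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR

# Sketch mirroring the hyperparameters in the table above (illustrative only).
model = torch.nn.Linear(8, 8)  # stand-in for the 8B model

total_steps = 5721
warmup_steps = int(0.10 * total_steps)  # 10% linear warmup

optimizer = AdamW(model.parameters(), lr=2e-5, betas=(0.9, 0.95), weight_decay=1e-4)

def lr_lambda(step: int) -> float:
    if step < warmup_steps:
        return step / max(1, warmup_steps)  # linear warmup phase
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress))  # cosine decay to 0

scheduler = LambdaLR(optimizer, lr_lambda)

# Each step also clips gradients before the update:
# torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
```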
## Evaluation Results

### Terminal-Bench 2.0

| Model | TB2 Accuracy |
|---|---|
| Marin-8B Instruct (no SFT) | 0/89 = 0% |
| **Marin-8B Instruct + TerminalCorpus SFT** | **1/89 = 1.1%** |
| NemotronTerminal-8B (Qwen3-8B, paper) | 13.0% ± 2.2 |
| Marin Qwen3-8B SFT reproduction (exp3490b) | 14/88 = 15.9% |
### TBLite Progression

| Checkpoint | TBLite Accuracy |
|---|---|
| Step 1500 (26% of training) | 1/100 = 1% |
| Step 3000 (52% of training) | 5/100 = 5% |
## Training Details

Trained following the [NemotronTerminal-8B](https://arxiv.org/abs/2602.21193) paper hyperparameters. The model reaches a higher final loss (0.442 vs. 0.360) than the Qwen3-8B reproduction and scores significantly lower on terminal benchmarks, likely due to architecture and tokenizer differences between Llama 3 and Qwen3.

Tracked in: [marin-community/marin#4420](https://github.com/marin-community/marin/issues/4420)
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("AlienKevin/marin-8b-instruct-sft-terminalcorpus")
tokenizer = AutoTokenizer.from_pretrained("AlienKevin/marin-8b-instruct-sft-terminalcorpus")
```
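A sketch of a single-turn generation call, assuming the tokenizer ships a chat template (the prompt is illustrative):

```python
# Single-turn generation sketch; assumes the tokenizer provides a chat
# template. The prompt below is illustrative.
messages = [{"role": "user", "content": "List all files over 100 MB under /var/log."}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```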
## License

Apache 2.0