Files
ModelHub XC 48896e538a 初始化项目,由ModelHub XC社区提供模型
Model: shazzadulimun/llama31-8b-aurora-chat-v3-gguf
Source: Original Platform
2026-06-09 02:42:16 +08:00

133 lines
5.6 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

---
library_name: gguf
license: apache-2.0
language: [en]
base_model: meta-llama/Llama-3.1-8B-Instruct
pipeline_tag: text-generation
tags:
- aurora
- alcf
- hpc
- intel-gpu
- oneapi
- sycl
---
# Llama-3.1-8B-Aurora-Chat v3
🏆 **Best Aurora chat model in our zoo (eval 2.80/5, +59% over base).**
LoRA fine-tune of [`meta-llama/Llama-3.1-8B-Instruct`](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) specialized for the
[**ALCF Aurora supercomputer**](https://docs.alcf.anl.gov/aurora/) (Intel Xeon Sapphire
Rapids + Intel GPU Max 1550 / Ponte Vecchio, oneAPI / SYCL, PBS Pro).
Off-the-shelf code-LLMs hallucinate Aurora specifics — they suggest `nvcc` instead of
`icpx -fsycl`, `srun` / `aprun` instead of `mpiexec`, NERSC's `/global/cfs` instead of
`/lus/flare`, and CUDA device strings instead of `xpu`. This adapter teaches the base
model the actual Aurora toolchain, file system layout, scheduler conventions, and
recommended PyTorch/TensorFlow/SYCL idioms.
## Model summary
| | |
|---|---|
| **Base model** | [`meta-llama/Llama-3.1-8B-Instruct`](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) |
| **Format** | GGUF, f16 — single file, llama.cpp / Ollama / LM Studio compatible |
| **Fine-tuning** | LoRA (PEFT) — r=32, α=64, dropout 0.0, 2 epochs |
| **Optimizer** | AdamW fused, lr 2e-4 cosine, warmup 3%, batch 1 × grad-accum 8 |
| **Precision / seq-len** | bf16, 1,536 tokens |
| **Training data** | [`aurora-docs-distill-multirank`](https://github.com/SIslamMun/Generator/tree/aurora-datasets-2026-04-30/datasets/aurora/iter2/data/training/A) — 4,495 ChatML rows |
| **Train loss (final)** | 0.6224 |
| **Hardware** | 1 Aurora PVC tile (1/12 of a node, 64 GB HBM), IPEX + PyTorch 2.10 XPU backend |
| **Eval (53-Q Aurora, 05)** | **2.80 / 5**   *(base 1.76, +59.1%)* |
## Quick start
**On Aurora** (PVC GPU, SYCL llama.cpp build) — interactive PBS session:
```bash
# 1. Grab a debug node
qsub -I -A <project> -q debug -l select=1,walltime=01:00:00,filesystems=home:flare
# 2. Load the toolchain
module load frameworks
source /lus/flare/projects/<project>/scripts/env.sh # or your own oneAPI setup
export ONEAPI_DEVICE_SELECTOR=level_zero:gpu
# 3. Download to flare (NOT $HOME — quota is small)
hf download shazzadulimun/llama31-8b-aurora-chat-v3-gguf --local-dir /lus/flare/projects/<project>/models/aurora-chat-v3
# 4. Run on a single PVC tile
/path/to/llama.cpp/build_sycl/bin/llama-cli \
-m /lus/flare/projects/<project>/models/aurora-chat-v3/*.gguf \
-ngl 999 -sm none --temp 0.0 -cnv \
-p "How do I launch one MPI rank per GPU tile on Aurora?"
```
**Anywhere else** (laptop, workstation, any GPU):
```bash
hf download shazzadulimun/llama31-8b-aurora-chat-v3-gguf --local-dir ./model
./llama-cli -m ./model/*.gguf -ngl 999 --temp 0.0 -cnv
```
Or **Ollama / LM Studio**: `ollama run hf.co/shazzadulimun/llama31-8b-aurora-chat-v3-gguf`
## Training data
Distilled from `openai/gpt-oss-120b on ALCF Sophia (vLLM)` over 416 cleaned chunks of
[`docs.alcf.anl.gov/aurora`](https://docs.alcf.anl.gov/aurora/). 4,495
training rows + 562 validation rows in ChatML format with embedded
chain-of-thought (`**Reasoning:**` / `**Answer:**`).
**Broad coverage, parallel-rank distillation.** 20 worker ranks each took a *disjoint* slice (~21 chunks) of the cleaned `docs.alcf.anl.gov/aurora` corpus and asked the teacher for chain-of-thought QA pairs. Disjoint slicing maximizes phrasing diversity (each rank sees fresh context) while still covering every chunk exactly once.
Full corpus + reproduction scripts:
[**SIslamMun/Generator @ aurora-datasets-2026-04-30**](https://github.com/SIslamMun/Generator/tree/aurora-datasets-2026-04-30/datasets/aurora/iter2/data/training/A).
## Evaluation
53-question Aurora-domain holdout (programming models, ML/AI, systems/ops, debugging).
Judged by `gpt-oss-120b` on a 05 scale.
| Model | Avg | Δ vs. base |
|---|---|---|
| Llama-3.1-8B-Aurora-Chat v3 (`-A` data) — best | **2.80** | +59% |
| Llama-3.1-8B-Aurora-Ops v3 | 2.31 | +31% |
| Llama-3.1-8B-Aurora-Chat v1 (`-B` data, single-rank ablation) | 2.45 | +39% |
| Llama-3.1-8B-Aurora-ML v3 | 2.13 | +21% |
| Llama-3.1-8B-Aurora-Coder v3 | 1.97 | +12% |
| `meta-llama/Llama-3.1-8B-Instruct` (base) | 1.76 | — |
Closed frontier models (gpt-4o, claude-sonnet-4-5, the gpt-oss-120b teacher) score
3.64.1 on the same holdout — the goal here isn't to beat them, it's to distill enough
Aurora knowledge into a small open model that runs on a single PVC tile.
## Limitations
- **Synthetic-data biases.** Teacher (`gpt-oss-120b`) can confabulate plausible-looking
but incorrect commands. Treat outputs as a verifiable first draft, not authoritative.
- **Doc snapshot is fixed at 2026-04-29.** Module versions, queue names, and APIs change
— anything published after that date isn't reflected here.
- **Aurora-only.** Specifics (`/lus/flare`, `xpu`, PBS queues) won't transfer to Frontier,
Polaris, or other systems.
- **Use temperature ≤ 0.1** for technical answers; higher temps invite invented flag names
and paths.
## Citation
```bibtex
@misc{aurora-llms-2026,
title = { Llama-3.1-8B-Aurora-Chat v3 },
author = { Islam Mun, Shazzadul },
year = { 2026 },
url = { https://huggingface.co/shazzadulimun/llama31-8b-aurora-chat-v3-gguf },
note = { LoRA fine-tune of Llama-3.1-8B-Instruct; data distilled from gpt-oss-120b on docs.alcf.anl.gov/aurora }
}
```
## License
Apache-2.0 for the adapter weights and synthetic training data. Source corpus is public
ALCF user documentation. Base model retains its own license — see
[`meta-llama/Llama-3.1-8B-Instruct`](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct).