---
base_model:
- Qwen/Qwen3-8B
tags:
- text-generation-inference
- transformers
- unsloth
- qwen3
license: other
license_name: anvdl-1.0
license_link: https://huggingface.co/apexion-ai/Nous-V1-8B/blob/main/LICENSE.md
language:
- en
- fr
- pt
- de
- ro
- sv
- da
- bg
- ru
- cs
- el
- uk
- es
- nl
- sk
- hr
- pl
- lt
- nb
- nn
- fa
- sl
- gu
- lv
- it
- oc
- ne
- mr
- be
- sr
- lb
- vec
- as
- cy
- szl
- ast
- hne
- awa
- mai
- bho
- sd
- ga
- fo
- hi
- pa
- bn
- or
- tg
- yi
- lmo
- lij
- scn
- fur
- sc
- gl
- ca
- is
- sq
- li
- prs
- af
- mk
- si
- ur
- mag
- bs
- hy
- zh
- yue
- my
- ar
- he
- mt
- id
- ms
- tl
- ceb
- jv
- su
- min
- ban
- pag
- ilo
- war
- ta
- te
- kn
- ml
- tr
- az
- uz
- kk
- ba
- tt
- th
- lo
- fi
- et
- hu
- vi
- km
- ja
- ko
- ka
- eu
- ht
- pap
- kea
- tpi
- sw
---

# Apollo-1-8B

Apollo-1-8B is an **8-billion-parameter instruction-tuned model** developed by **Noema Research**.
It is based on [Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B) and optimized for **advanced reasoning, instruction following, and high-performance deployment**.

This model is the **large-scale member** of the Apollo series, balancing strong reasoning capabilities with efficiency for multi-domain applications.

---

## Model Overview

* **Base model:** `Qwen3-8B`
* **Architecture:** Decoder-only transformer
* **Parameters:** \~8B
* **Context length:** up to 32k tokens (inherits Qwen3 long-context support)
* **Domain:** General-purpose reasoning, instruction following, and code generation
* **Primary applications:**
  * Advanced conversational AI
  * Multi-step reasoning and problem solving
  * Knowledge assistants and tutoring systems
  * Software development and code generation
* **License:** anvdl-1.0

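When planning long prompts, it can help to cap `max_new_tokens` so that prompt plus output stays inside the context window. The sketch below is a minimal illustration under two assumptions: "32k" is read as 32,768 tokens (Qwen3-8B's native window), and the prompt length comes from the tokenizer, e.g. the length of the `input_ids` produced by `apply_chat_template`. `CONTEXT_WINDOW` and `generation_budget` are hypothetical names, not part of any Apollo or Transformers API.

```python
# Assumption: "up to 32k tokens" is read as 32,768 tokens.
CONTEXT_WINDOW = 32_768


def generation_budget(prompt_tokens: int, requested_new_tokens: int,
                      context_window: int = CONTEXT_WINDOW) -> int:
    """Hypothetical helper: cap max_new_tokens so prompt + output fits the window."""
    if prompt_tokens < 0 or requested_new_tokens < 0:
        raise ValueError("token counts must be non-negative")
    remaining = context_window - prompt_tokens
    if remaining <= 0:
        raise ValueError(
            f"prompt uses {prompt_tokens} tokens; window is {context_window}"
        )
    # Never request more new tokens than the window still has room for.
    return min(requested_new_tokens, remaining)


print(generation_budget(prompt_tokens=1_200, requested_new_tokens=1_024))
print(generation_budget(prompt_tokens=32_000, requested_new_tokens=1_024))
```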
---

## Key Features

* **Instruction tuning** for reliable multi-step reasoning and task completion
* **Extended reasoning depth** on complex queries compared with Apollo-1-4B
* **Long-context handling**, inherited from the Qwen3 architecture
* **Multilingual coverage**, supporting diverse languages and domains
* **Balanced resource requirements**, deployable on high-end consumer hardware and cloud GPUs

---

## Usage

The model is available in Hugging Face Transformers format. Example:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "NoemaResearch/Apollo-1-8B"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

messages = [
    {"role": "system", "content": "You are Apollo, a reasoning assistant."},
    {"role": "user", "content": "Explain the differences between supervised, unsupervised, and reinforcement learning with examples."},
]

# return_dict=True returns input_ids and attention_mask, so the result can be unpacked into generate()
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

# Sampling must be enabled for temperature and top_p to take effect
outputs = model.generate(**inputs, max_new_tokens=1024, do_sample=True, temperature=0.6, top_p=0.9)

# Decode only the newly generated tokens, not the echoed prompt
new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```

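`apply_chat_template` turns the `messages` list into a single prompt string using the chat template shipped with the tokenizer. Assuming Apollo-1-8B keeps Qwen3's ChatML-style template (not confirmed by this card; inspect `tokenizer.chat_template` to check), the basic rendering looks roughly like the sketch below. `render_chatml` is a hypothetical helper for illustration only: it ignores tool calls and Qwen3's thinking-mode handling, so real code should keep using `apply_chat_template`.

```python
def render_chatml(messages, add_generation_prompt=True):
    """Hypothetical, simplified ChatML renderer (assumes a Qwen3-style template).

    Each turn becomes an <|im_start|> line with the role, a newline, the content,
    and <|im_end|>. With add_generation_prompt=True an open assistant turn is
    appended so the model continues as the assistant.
    """
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    if add_generation_prompt:
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are Apollo, a reasoning assistant."},
    {"role": "user", "content": "What is reinforcement learning?"},
]
print(render_chatml(messages))
```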
**Recommended settings:**

* `temperature=0.4–0.8`
* `top_p=0.9–0.95`
* Lower temperatures tend to yield more factual and concise answers

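To see why lower temperatures give more focused output, the sketch below applies temperature scaling and top-p (nucleus) filtering to a toy set of logits, which is the general idea behind the `temperature` and `top_p` arguments to `generate`. It is a simplified pure-Python illustration with made-up logits, not the Transformers implementation; the function names are hypothetical.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Divide logits by the temperature, then normalize with a numerically stable softmax."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the smallest set of most likely tokens whose mass reaches top_p, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = set(), 0.0
    for i in order:
        kept.add(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return [probs[i] / mass if i in kept else 0.0 for i in range(len(probs))]


def sample_token(logits, temperature=0.6, top_p=0.9, rng=random):
    """Draw one token index from the temperature-scaled, nucleus-filtered distribution."""
    probs = top_p_filter(softmax_with_temperature(logits, temperature), top_p)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


toy_logits = [2.0, 1.0, 0.5, -1.0]  # made-up scores for 4 candidate tokens
for t in (0.4, 0.8):
    print(t, [round(p, 3) for p in softmax_with_temperature(toy_logits, t)])
print(sample_token(toy_logits, rng=random.Random(0)))
```

Running the loop shows the top token's probability rising as the temperature drops, which is why the lower end of the recommended range tends to produce more deterministic answers.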
---

## Evaluation

According to internal evaluations, Apollo-1-8B shows stronger reasoning and instruction-following capabilities than Apollo-1-4B, including:

* Higher accuracy on complex multi-step reasoning tasks
* More robust **instruction adherence**
* Fewer **hallucinations** in factual and structured outputs
* High efficiency on long-context tasks

A full benchmark report will be provided in a future update.
For upstream performance details, see the [Qwen3-8B model card](https://huggingface.co/Qwen/Qwen3-8B).

---

## Limitations

* **Reasoning scale**: Although improved over Apollo-1-4B, Apollo-1-8B cannot match larger models (14B+ parameters) on extremely complex or open-ended tasks
* **Knowledge breadth**: Coverage of highly specialized or niche topics may be limited
* **Hallucinations**: May generate plausible but incorrect information
* **Prompt sensitivity**: Output quality still depends on careful prompt formulation

---

## Responsible Use

* Do not rely on Apollo-1-8B for critical decisions without human oversight
* Verify outputs before using them in factual, legal, or safety-critical contexts
* Avoid including personal or sensitive data in prompts
* Do not use the model to generate unsafe, harmful, or disallowed content

---

## Model Variants

* **Full precision (safetensors)**: research and high-fidelity inference
* **bf16 / fp16**: efficient inference on modern accelerators
* **Quantized versions (int8 / int4)**: deployment in resource-constrained environments

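A back-of-the-envelope way to compare these variants is weight memory ≈ parameters × bits per weight ÷ 8. The sketch below assumes a round 8e9 parameters (the card only says about 8B) and reads "full precision" as fp32, which is itself an assumption. It counts weights only: activations, the KV cache, and quantization metadata (scales, zero points) add to the real footprint, so treat the results as lower bounds.

```python
# Assumption: a round 8e9 parameters; the model card only states "~8B".
PARAMS = 8e9

# Bits stored per weight for each variant (weights only; fp32 for "full precision" is assumed).
BITS_PER_WEIGHT = {"fp32": 32, "bf16/fp16": 16, "int8": 8, "int4": 4}


def weight_memory_gb(num_params: float, bits: int) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits / 8 / 1e9


for name, bits in BITS_PER_WEIGHT.items():
    print(f"{name:>9}: ~{weight_memory_gb(PARAMS, bits):.0f} GB")
```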
---

## Citation

If you use this model, please cite both Apollo-1-8B and the Qwen3 base model (the Qwen3 citation is given on the [Qwen3-8B model card](https://huggingface.co/Qwen/Qwen3-8B)):

```bibtex
@misc{noema2025apollo8b,
  title={Apollo-1-8B},
  author={{Noema Research}},
  year={2025},
  howpublished={\url{https://huggingface.co/NoemaResearch/Apollo-1-8B}}
}
```

---

## Acknowledgements

Apollo-1-8B builds upon the [Qwen3](https://huggingface.co/Qwen) family of models.
We thank the Qwen team for open-sourcing their models and enabling derivative research.