Acrux-500M-o1-Journey/README.md

---
license: creativeml-openrail-m
datasets:
- GAIR/o1-journey
language:
- en
base_model:
- Qwen/Qwen2.5-0.5B-Instruct
library_name: transformers
pipeline_tag: text-generation
tags:
- Qwen2.5
- Llama-Cpp
- CoT
- o1-journey
- text-generation-inference
- safetensors
- Ollama
---
### Acrux-500M-o1-Journey Model Files

The **Acrux-500M-o1-Journey** is a lightweight, instruction-tuned language model fine-tuned from the **Qwen2.5-0.5B-Instruct** base model. With a size of 500 million parameters, it is designed for **cost-effective deployment** and **fast text generation** while maintaining quality performance for instruction-following tasks.

| **File Name**             | **Size**       | **Description**                           | **Upload Status**  |
|----------------------------|----------------|-------------------------------------------|--------------------|
| `.gitattributes`           | 1.57 kB        | Git attributes for managing LFS files.    | Uploaded           |
| `README.md`                | 195 Bytes      | Model overview or documentation.          | Updated            |
| `added_tokens.json`        | 657 Bytes      | Custom tokens for the tokenizer.          | Uploaded           |
| `config.json`              | 859 Bytes      | Model configuration file.                 | Uploaded           |
| `generation_config.json`   | 280 Bytes      | Configuration for text generation.        | Uploaded           |
| `merges.txt`               | 1.82 MB        | Merge rules for byte-pair encoding (BPE). | Uploaded           |
| `pytorch_model.bin`        | 988 MB         | Model weights (PyTorch format).           | Uploaded (LFS)     |
| `special_tokens_map.json`  | 644 Bytes      | Mapping for special tokens.               | Uploaded           |
| `tokenizer.json`           | 11.4 MB        | Full tokenizer configuration.             | Uploaded (LFS)     |
| `tokenizer_config.json`    | 7.73 kB        | Additional tokenizer settings.            | Uploaded           |
| `vocab.json`               | 2.78 MB        | Vocabulary for the tokenizer.             | Uploaded           |
### **Key Features:**

1. **Compact Size with Efficient Performance:**
   The smaller parameter count (500M) ensures faster inference and reduced hardware requirements.

2. **Instruction Optimization:**
   Fine-tuned to follow prompts effectively, making it suitable for interactive applications and prompt-based tasks.

3. **Domain-Specific Training:**
   Trained on the **GAIR/o1-journey** dataset, providing tailored capabilities for specific use cases.

---

### **Training Details:**
- **Base Model:** [Qwen2.5-0.5B-Instruct](#)
- **Dataset Used for Fine-Tuning:** [GAIR/o1-journey](#)
  - A compact dataset focusing on instruction-driven generation with 1.42k samples.

---
### **Capabilities:**

1. **Instruction Following:**
   - Generates accurate and coherent responses to user instructions.
   - Handles summarization, question-answering, and conversational tasks.

2. **Fast Inference:**
   - Ideal for real-time applications due to reduced latency from its smaller size.

3. **Interactive AI Development:**
   - Suitable for chatbots, virtual assistants, and instructional interfaces.

---
### **Usage Instructions:**

1. **Setup:**
   Download all model files, ensuring compatibility with the Hugging Face Transformers library.

2. **Loading the Model:**
   ```python
   from transformers import AutoModelForCausalLM, AutoTokenizer

   model_name = "prithivMLmods/Acrux-500M-o1-Journey"
   tokenizer = AutoTokenizer.from_pretrained(model_name)
   model = AutoModelForCausalLM.from_pretrained(model_name)
   ```
3. **Sample Generate Text:**
   ```python
   input_text = "Explain the concept of machine learning in simple terms."
   inputs = tokenizer(input_text, return_tensors="pt")
   outputs = model.generate(**inputs, max_length=100, temperature=0.7)
   print(tokenizer.decode(outputs[0], skip_special_tokens=True))
   ```
4. **Optimize Generation:**
   Adjust parameters in `generation_config.json` for better control of output, such as:
   - `temperature` for randomness.
   - `top_p` for sampling diversity.
   - `max_length` for output size.
---