---
license: creativeml-openrail-m
library_name: transformers
tags:
- deep_think
- reasoning
- chain_of_thought
- chain_of_thinking
- prev_2
- self_reasoning
language:
- en
base_model:
- prithivMLmods/Llama-Thinker-3B-Preview
pipeline_tag: text-generation
---

# **Llama-Thinker-3B-Preview2**

Llama-Thinker-3B-Preview2 is a pretrained and instruction-tuned generative model designed for multilingual applications. It is trained on synthetic datasets built from long chains of thought, which helps it handle complex reasoning tasks.

Model Architecture: The model is based on Llama 3.2, an autoregressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) to align with human preferences for helpfulness and safety.

# **Use with transformers**

With `transformers >= 4.43.0`, you can run conversational inference using the Transformers `pipeline` abstraction or the Auto classes with the `generate()` function.

Make sure your installation is up to date: `pip install --upgrade transformers`.

```python
import torch
from transformers import pipeline

model_id = "prithivMLmods/Llama-Thinker-3B-Preview2"

# Build a text-generation pipeline; bfloat16 reduces memory use and
# device_map="auto" places the model on the available device(s).
pipe = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Chat history in role/content format.
messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"},
]

outputs = pipe(
    messages,
    max_new_tokens=256,
)

# The last entry of the returned conversation is the assistant's reply.
print(outputs[0]["generated_text"][-1])
```
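
The Auto classes route mentioned above can be sketched roughly as follows. This is a minimal illustration, not the author's reference code: `build_messages` and `generate_reply` are hypothetical helper names, and the sketch assumes the tokenizer ships a chat template and that a `transformers` version supporting `return_dict=True` in `apply_chat_template` is installed.

```python
def build_messages(user_prompt, system_prompt=None):
    """Return a chat history in the role/content format used by chat templates."""
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": user_prompt})
    return messages


def generate_reply(messages, model_id="prithivMLmods/Llama-Thinker-3B-Preview2",
                   max_new_tokens=256):
    """Load the model with the Auto classes and generate one assistant reply."""
    # Heavy imports are deferred so build_messages() works without them.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,
        device_map="auto",
    )

    # Render the chat with the tokenizer's template and append the
    # assistant header so the model starts a new reply.
    inputs = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        return_tensors="pt",
        return_dict=True,
    ).to(model.device)

    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Keep only the newly generated tokens (drop the prompt tokens).
    new_tokens = output_ids[0][inputs["input_ids"].shape[-1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


# Example call (downloads the model weights on first use):
# print(generate_reply(build_messages("Who are you?")))
```
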

Note: You can also find detailed recipes for running the model locally, including `torch.compile()`, assisted generation, quantization, and more, at [`huggingface-llama-recipes`](https://github.com/huggingface/huggingface-llama-recipes).

# **Use with `llama`**

Please follow the instructions in the [repository](https://github.com/meta-llama/llama).

To download the original checkpoints, use `huggingface-cli` as in the example command below:

```bash
huggingface-cli download prithivMLmods/Llama-Thinker-3B-Preview2 --include "original/*" --local-dir Llama-Thinker-3B-Preview2
```
---

# **How to Run Llama-Thinker-3B-Preview2 on Ollama Locally**

This guide shows how to run the **Llama-Thinker-3B-Preview2-GGUF** model locally using Ollama. The model is instruction-tuned for multilingual tasks and complex reasoning. The same steps apply to other open-source models that Ollama can load.

---

## Example 1: How to Run the Llama-Thinker-3B-Preview2 Model

The **Llama-Thinker-3B-Preview2** model is a pretrained and instruction-tuned LLM designed for complex reasoning tasks across multiple languages. This example interacts with a quantized build of it locally through Ollama.

### Step 1: Download the Model

First, download and start the **Llama-Thinker-3B-Preview2-GGUF** model with the following command. `ollama run` pulls the model if it is not already available locally and then opens an interactive session:

```bash
ollama run llama-thinker-3b-preview2.gguf
```
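
If you only have the `.gguf` file on disk rather than a model that Ollama can pull, one common route is to register it through a Modelfile. The sketch below is an assumption-laden illustration: it presumes the file `llama-thinker-3b-preview2.gguf` sits in the working directory, the helper names are hypothetical, and the local model name you pass is your own choice. It relies only on the Modelfile `FROM` and `PARAMETER` instructions and the `ollama create <name> -f <Modelfile>` command.

```python
import subprocess
from pathlib import Path


def render_modelfile(gguf_path, temperature=None):
    """Return Modelfile text that points Ollama at a local GGUF file."""
    lines = [f"FROM {gguf_path}"]
    if temperature is not None:
        # Optional sampling parameter; omitted unless explicitly requested.
        lines.append(f"PARAMETER temperature {temperature}")
    return "\n".join(lines) + "\n"


def register_local_model(name, gguf_path, workdir="."):
    """Write a Modelfile in workdir and register it (requires Ollama installed).

    A relative gguf_path is resolved relative to the Modelfile's location.
    """
    modelfile = Path(workdir) / "Modelfile"
    modelfile.write_text(render_modelfile(gguf_path))
    subprocess.run(["ollama", "create", name, "-f", str(modelfile)], check=True)
```

After a call such as `register_local_model("llama-thinker-3b-preview2", "./llama-thinker-3b-preview2.gguf")`, the model can be started with `ollama run llama-thinker-3b-preview2`. In that case the download output shown in the next step does not apply, because nothing is pulled.
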

### Step 2: Model Initialization and Download

After you run the command, Ollama initializes and downloads the necessary model files. You should see output similar to this:

```plaintext
pulling manifest
pulling a12cd3456efg... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████▏ 3.2 GB
pulling 9f87ghijklmn... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████▏ 6.5 KB
verifying sha256 digest
writing manifest
removing any unused layers
success
>>> Send a message (/? for help)
```

### Step 3: Interact with the Model

Once the model is loaded, you can interact with it by sending prompts. For example:

```plaintext
>>> How can you assist me today?
```

A sample response might look like this (your output may not be identical):

```plaintext
I am Llama-Thinker-3B-Preview2, an advanced AI language model designed to assist with complex reasoning, multilingual tasks, and general-purpose queries. Here are a few things I can help you with:

1. Answering complex questions in multiple languages.
2. Assisting with creative writing, content generation, and problem-solving.
3. Providing detailed summaries and explanations.
4. Translating text across different languages.
5. Generating ideas for personal or professional use.
6. Offering insights on technical topics.

Feel free to ask me anything you'd like assistance with!
```
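
The same exchange can also be scripted instead of typed at the prompt. The following is a rough sketch using only the Python standard library against Ollama's local HTTP chat endpoint. It assumes the Ollama server is running on its default port 11434 and that the model is already available under the name used in Step 1; `build_chat_request` and `ask` are hypothetical helper names.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"  # assumed default local endpoint


def build_chat_request(model, prompt):
    """Build the JSON body for a single-turn, non-streaming chat request."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # ask for one complete JSON reply instead of chunks
    }


def ask(model, prompt, url=OLLAMA_URL, timeout=120):
    """Send the prompt to a running Ollama server and return the reply text."""
    body = json.dumps(build_chat_request(model, prompt)).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    # Non-streaming chat replies carry the text under message.content.
    return payload["message"]["content"]
```

With the server running, `ask("llama-thinker-3b-preview2.gguf", "How can you assist me today?")` should return a reply comparable to the sample above, though the exact wording will vary.
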

### Step 4: Exit the Program

To exit the program, type:

```plaintext
/exit
```

---

## Example 2: Using Multi-Modal Models (Future Use)

In the future, Ollama may support multi-modal models where you can input both text and images for advanced interactions. This section will be updated as new capabilities become available.

---

## Notes on Using Quantized Models

Quantized models like **llama-thinker-3b-preview2.gguf** are optimized to run efficiently on local systems with limited resources. Keep these points in mind:

1. **VRAM/CPU Requirements**: Make sure your system has enough VRAM, or enough RAM and CPU capacity, to handle model inference.
2. **Model Format**: Use the `.gguf` model format for compatibility with Ollama.
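
To get a feel for the memory requirement in point 1, a back-of-the-envelope estimate multiplies the parameter count by the bits stored per weight. The sketch below assumes a nominal 3 billion parameters (taken from the "3B" in the model name) and round bit widths. Real GGUF quantization schemes store some extra metadata per block, and the KV cache and runtime overhead add to the total, so treat the result as a lower bound for the weights alone.

```python
def estimate_weight_memory_gb(num_params, bits_per_weight):
    """Approximate size of the weights alone, in decimal gigabytes."""
    if num_params <= 0 or bits_per_weight <= 0:
        raise ValueError("num_params and bits_per_weight must be positive")
    return num_params * bits_per_weight / 8 / 1e9


# Nominal 3B parameters (assumption) at a few common bit widths.
for bits in (16, 8, 4):
    size = estimate_weight_memory_gb(3_000_000_000, bits)
    print(f"{bits:>2}-bit weights: ~{size:.1f} GB")
```
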

---

# **Conclusion**

Running the **Llama-Thinker-3B-Preview2** model locally with Ollama gives you a practical way to use an open-source LLM for complex reasoning and multilingual tasks. The same steps carry over to other models as they become available.

---