---
library_name: transformers
tags:
- text-generation-inference
- trl
license: apache-2.0
language:
- en
base_model:
- HuggingFaceTB/SmolLM2-1.7B-Instruct
pipeline_tag: text-generation
---

# **SmolLM2-1.7B-Open-Thought**

SmolLM2-1.7B-Open-Thought is a compact language model tuned for unrestricted inference, stronger reasoning, and better contextual understanding. It handles a wide range of tasks efficiently while staying lightweight enough for on-device deployment.

The model belongs to the SmolLM2 family, which is available in three sizes: 135M, 360M, and 1.7B parameters. The 1.7B variant significantly surpasses its predecessor, SmolLM1-1.7B, in instruction following, knowledge retention, logical reasoning, and mathematical proficiency. It was trained on 11 trillion tokens drawn from a diverse dataset mix: FineWeb-Edu, DCLM, The Stack, and curated mathematics and coding datasets that will be released soon.

> The instruct version was developed through supervised fine-tuning (SFT) on a mix of public and proprietary datasets. Direct Preference Optimization (DPO) was then applied to make responses more accurate and contextually relevant.
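To put the three model sizes in perspective for on-device use, the sketch below estimates how much memory the weights alone would occupy at a few common precisions. It is back-of-the-envelope arithmetic, not a measurement: the parameter counts are the nominal figures from the model names, the helper `weight_memory_gb` is a hypothetical name introduced here, and activations, the KV cache, and framework overhead are ignored.

```python
# Nominal parameter counts taken from the model names (approximations, not exact counts).
NOMINAL_PARAMS = {"135M": 135e6, "360M": 360e6, "1.7B": 1.7e9}

# Bytes needed to store one parameter at common precisions.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "int8": 1}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Return the approximate size of the weights alone, in decimal gigabytes."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


for size, num_params in NOMINAL_PARAMS.items():
    row = ", ".join(
        f"{dtype}: {weight_memory_gb(num_params, dtype):.2f} GB"
        for dtype in BYTES_PER_PARAM
    )
    print(f"{size:>5} -> {row}")
```

Under these assumptions the 1.7B variant needs roughly 3.4 GB for its weights in bfloat16, which is why half precision or quantization is usually worth considering on memory-constrained devices.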

## How to Use

### Transformers (Python)

#### Installation:
```bash
pip install transformers torch
```

#### Code Implementation:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "prithivMLmods/SmolLM2-1.7B-Open-Thought"
device = "cuda"  # Use "cpu" for CPU execution

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)

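messages = [{"role": "user", "content": "What is the capital of France?"}]
# add_generation_prompt=True appends the assistant turn header so the model writes a reply
input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer.encode(input_text, return_tensors="pt").to(device)
# Low temperature with nucleus sampling keeps the answer focused
outputs = model.generate(inputs, max_new_tokens=50, temperature=0.2, top_p=0.9, do_sample=True)
# The decoded text contains the prompt followed by the generated answer
print(tokenizer.decode(outputs[0]))
```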
```
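The `apply_chat_template` call above turns the message list into a single prompt string. The sketch below illustrates roughly what that string might look like, assuming a ChatML-style template like the one used by the base SmolLM2 instruct model. `render_chatml` is a hypothetical helper written for illustration; the template shipped with this checkpoint is the source of truth and may differ, for example by inserting a default system prompt. The `add_generation_prompt` flag mirrors the parameter of the same name on `apply_chat_template`.

```python
def render_chatml(messages, add_generation_prompt=True):
    """Render messages in a ChatML-style layout (an assumed format, for illustration only)."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    ]
    if add_generation_prompt:
        # Open an assistant turn so the model answers instead of continuing the user turn.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


messages = [{"role": "user", "content": "What is the capital of France?"}]
print(render_chatml(messages))
```

Without the trailing assistant header, a chat model has no explicit cue that it is its turn to speak, which is why the generation prompt matters when the template is rendered to text first and tokenized afterwards.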

### Transformers.js (JavaScript)

#### Installation:
```bash
npm i @huggingface/transformers
```

#### Code Implementation:
```javascript
import { pipeline } from "@huggingface/transformers";

// Create a text generation pipeline
const generator = await pipeline(
  "text-generation",
  "prithivMLmods/SmolLM2-1.7B-Open-Thought",
);

// Define the list of messages
const messages = [
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: "Tell me a joke." },
];

// Generate a response
const output = await generator(messages, { max_new_tokens: 128 });
console.log(output[0].generated_text.at(-1).content);
// Example Output: "Why don't scientists trust atoms?\n\nBecause they make up everything!"
```

## Function Calling

This model supports tool use and function calling via structured outputs. Below is an example setup:

```python
import json
from jinja2 import Template
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.utils import get_json_schema

system_prompt = Template("""You are an expert in composing functions. You are given a question and a set of possible functions.
Based on the question, you will need to make one or more function/tool calls to achieve the purpose.
If none of the functions can be used, point it out and refuse to answer.
If the given question lacks the parameters required by the function, also point it out.

You have access to the following tools:
<tools>{{ tools }}</tools>

The output MUST strictly adhere to the following format, and NO other text MUST be included.
<tool_call>[{"name": "func_name1", "arguments": {"argument1": "value1", "argument2": "value2"}}]</tool_call>""")

# Define model and tokenizer
model_name_smollm = "prithivMLmods/SmolLM2-1.7B-Open-Thought"
model = AutoModelForCausalLM.from_pretrained(model_name_smollm, device_map="auto", torch_dtype="auto", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_name_smollm)

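from datetime import datetime
import random

# get_json_schema builds each schema from the type hints and a Google-style
# docstring, so every tool needs a description and documented arguments.
def get_current_time() -> str:
    """Returns the current time in 24-hour format.

    Returns:
        str: Current time in HH:MM:SS format.
    """
    return datetime.now().strftime("%H:%M:%S")

def get_random_number_between(min: int, max: int) -> int:
    """Gets a random number between min and max.

    Args:
        min: The minimum number.
        max: The maximum number.

    Returns:
        A random number between min and max.
    """
    return random.randint(min, max)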

tools = [get_json_schema(get_random_number_between), get_json_schema(get_current_time)]
toolbox = {"get_random_number_between": get_random_number_between, "get_current_time": get_current_time}

query = "Give me a number between 1 and 300"
messages = [{"role": "system", "content": system_prompt.render(tools=json.dumps(tools))}, {"role": "user", "content": query}]

inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
outputs = model.generate(inputs, max_new_tokens=512, do_sample=False, num_return_sequences=1, eos_token_id=tokenizer.eos_token_id)
result = tokenizer.decode(outputs[0][len(inputs[0]):], skip_special_tokens=True)

print(result)
```
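The example above prints the raw model output but stops short of executing the requested call. One way to close the loop, sketched under the assumption that the model follows the `<tool_call>[...]</tool_call>` format from the system prompt, is to extract the JSON list and dispatch each entry to a dictionary of callables such as `toolbox`. The helpers `parse_tool_calls` and `run_tool_calls`, the sample output string, and the `add` tool are all hypothetical and exist only to keep the snippet self-contained.

```python
import json
import re
from typing import Any, Callable


def parse_tool_calls(text: str) -> list[dict[str, Any]]:
    """Extract the JSON list inside <tool_call>...</tool_call>, or [] if absent or invalid."""
    match = re.search(r"<tool_call>(.*?)</tool_call>", text, re.DOTALL)
    if match is None:
        return []
    try:
        calls = json.loads(match.group(1))
    except json.JSONDecodeError:
        return []
    # Accept a single call object as well as a list of calls.
    return calls if isinstance(calls, list) else [calls]


def run_tool_calls(calls: list[Any], toolbox: dict[str, Callable[..., Any]]) -> list[Any]:
    """Call each requested tool by name, skipping malformed entries and unknown names."""
    results = []
    for call in calls:
        if not isinstance(call, dict):
            continue
        func = toolbox.get(call.get("name"))
        if func is not None:
            results.append(func(**call.get("arguments", {})))
    return results


# Hypothetical model output and a stand-in tool, so the snippet runs without a model.
sample_result = '<tool_call>[{"name": "add", "arguments": {"a": 2, "b": 3}}]</tool_call>'
sample_toolbox = {"add": lambda a, b: a + b}

calls = parse_tool_calls(sample_result)
print(run_tool_calls(calls, sample_toolbox))  # prints [5]
```

In the example above, the same helpers could be applied as `run_tool_calls(parse_tool_calls(result), toolbox)`. Returning an empty list on malformed output is a deliberate choice here, since small models do not always follow the requested format exactly.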

## Limitations

SmolLM2-1.7B-Open-Thought is optimized for unrestricted reasoning and knowledge retrieval. However, the following limitations apply:
- It primarily generates content in English.
- Responses may not always be factually accurate, logically consistent, or free from biases.
- It should be used as an assistive tool rather than a definitive source of information.
- Users should critically evaluate generated content, especially for high-stakes use cases.