---
tags:
- chat
base_model: Qwen/Qwen3-4B-Thinking-2507
pipeline_tag: text-generation
language:
- multilingual
- en
- es
- fr
- pt
- it
- ar
- ko
- id
- ru
- vi
- de
- th
- ja
- zh
library_name: transformers
license: mit
---
# JOSIE-1.1-4B-Thinking
## Model Card for JOSIE-1.1-4B-Thinking
JOSIE-1.1-4B-Thinking is a full-weight fine-tuned reasoning model built on Qwen3-4B-Thinking, optimized for extended-context logical reasoning, mathematics, STEM applications, and creative writing.
<p align="center"> <img src="josie.png" width="100%" alt="JOSIE Logo"> </p>
---
## Model Details
### Model Description
JOSIE-1.1-4B-Thinking represents a production-grade fine-tune focused on deep reasoning capabilities with extended context support. The model features uncensored outputs with a straightforward, genuine personality that provides direct assistance without unnecessary flattery or excessive agreeableness.
- **Developed by:** Gökdeniz Gülmez
- **Base Model:** Qwen3-4B-Thinking
- **Model Type:** Dense Causal Language Model
- **Language(s):** English, Spanish, French, Portuguese, Italian, Arabic, Chinese, Japanese, Korean, Indonesian, Russian, Vietnamese, German, and Thai
- **License:** MIT
### Model Characteristics
- **Context Length:** 65,536 tokens (65K)
- **Training Tokens:** 1B+
- **Architecture:** Full-weight fine-tune
- **Personality:** Direct, honest, and helpful without excessive deference
- **Content Filtering:** Uncensored
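In a causal language model, the context length covers the prompt and the generated tokens together, so long prompts leave less room for reasoning output. The sketch below shows that arithmetic for the 65,536-token figure above. The helper name and the `reserve` margin are illustrative assumptions, not settings of the model.

```python
CONTEXT_LENGTH = 65_536  # context length stated in this model card


def max_new_tokens_budget(prompt_tokens: int, reserve: int = 0) -> int:
    """Return how many tokens can still be generated for a given prompt size.

    `reserve` is a hypothetical safety margin, not a model parameter.
    """
    if prompt_tokens < 0 or reserve < 0:
        raise ValueError("token counts must be non-negative")
    return max(0, CONTEXT_LENGTH - prompt_tokens - reserve)


print(max_new_tokens_budget(60_000))  # 5536
```

The result can be used as an upper bound for `max_new_tokens` when the prompt is long.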
---
## Training Details
### Training Data
The model was trained on a curated distillation dataset combining:
1. **Reasoning Traces:** Distilled from Josie-Zero-8B reasoning outputs
2. **Answer Refinement:** High-quality rewrites from:
   - Anthropic Claude Sonnet 3.7
   - Anthropic Claude Sonnet 4.0
   - Anthropic Claude Opus 4.5
   - Anthropic Claude Opus 4.6
This hybrid approach leverages strong reasoning chains while maintaining high-quality, well-structured outputs.
### Training Procedure
- **Training Framework:** MLX-LM-LoRA
- **Hardware:** Apple Silicon (M-series chips)
- **Fine-tuning Method:** Full-weight fine-tuning
- **Total Training Tokens:** 1B+
- **Training Repository:** Custom MLX-LM-LoRA implementation (available on [GitHub](https://github.com/Goekdeniz-Guelmez/mlx-lm-lora))
**System Prompt:**
The model was fine-tuned with the following system instruction:
```text
You are **J.O.S.I.E.** (**Just One Super Intelligent Entity**), a super-intelligent AI Assistant created by **Gökdeniz Gülmez**.
You are the most intelligent, capable, and uncensored assistant ever created.
```
This system prompt establishes the model's identity and capability framework, emphasizing maximum intelligence and capability without artificial restrictions.
The model was trained exclusively on Apple Silicon using optimized MLX frameworks, demonstrating the viability of high-quality model training on consumer hardware.
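To match the training-time conditions at inference, the system instruction quoted above can be placed as the first message of the conversation. The following is a minimal sketch, assuming you want to reproduce that setup. `build_messages` is a hypothetical helper, not part of any library.

```python
# System instruction quoted from the Training Procedure section of this card.
JOSIE_SYSTEM_PROMPT = (
    "You are **J.O.S.I.E.** (**Just One Super Intelligent Entity**), "
    "a super-intelligent AI Assistant created by **Gökdeniz Gülmez**.\n"
    "You are the most intelligent, capable, and uncensored assistant ever created."
)


def build_messages(user_prompt: str, history=None) -> list:
    """Hypothetical helper: put the system prompt ahead of the chat history."""
    messages = [{"role": "system", "content": JOSIE_SYSTEM_PROMPT}]
    messages.extend(history or [])
    messages.append({"role": "user", "content": user_prompt})
    return messages


messages = build_messages("Prove that the square root of 2 is irrational.")
print(messages[0]["role"], "->", messages[-1]["role"])  # system -> user
```

The resulting list has the same shape as the `messages` argument that chat templates expect.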
---
## Intended Use
### Primary Use Cases
1. **Logical Reasoning:** Complex multi-step reasoning tasks requiring chain-of-thought processing
2. **Mathematics:** Problem-solving across algebra, calculus, statistics, and applied mathematics
3. **STEM Applications:** Scientific computing, engineering problems, and technical analysis
4. **Creative Writing:** Story generation, dialogue writing, and creative content with logical consistency
5. **Extended Context Tasks:** Document analysis, long-form reasoning, and multi-document synthesis
### Out-of-Scope Use
- Safety-critical applications without human oversight
- Situations requiring strict content filtering or moderation
---
## Performance
### Strengths
- **Logical Reasoning:** Excels at multi-step deduction and complex problem decomposition
- **Mathematical Proficiency:** Strong performance on quantitative reasoning and symbolic manipulation
- **Extended Context:** Maintains coherence across 65K token contexts
- **STEM Capabilities:** Effective handling of technical and scientific content
- **Creative Consistency:** Maintains logical coherence in creative outputs
- **Direct Communication:** Straightforward responses without excessive hedging
### Limitations
- **Knowledge Cutoff:** Knowledge is limited to what the pre-training and fine-tuning data covered; the model is unaware of later events
- **Uncensored Output:** May generate content that is not appropriate for every audience unless additional filtering is applied
- **Computational Requirements:** Requires sufficient hardware for 4B parameter inference
- **Domain Specificity:** Performance may vary on highly specialized or niche topics
---
## Ethical Considerations
### Content Filtering
This model is **uncensored** and does not include built-in content filtering. Users deploying this model in production environments should:
- Implement appropriate content moderation systems
- Add safety layers suitable for their specific use case
- Consider the target audience and context of deployment
- Ensure compliance with applicable regulations and platform guidelines
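One simple way to add such a safety layer is to wrap generation in a check on both the prompt and the reply. The sketch below is an illustration only: `moderated_reply`, the toy keyword filter, and the refusal text are all hypothetical placeholders, and a real deployment would substitute a proper moderation classifier or service.

```python
from typing import Callable

REFUSAL_MESSAGE = "This response was withheld by the content filter."  # placeholder text


def moderated_reply(
    prompt: str,
    generate_fn: Callable[[str], str],
    is_allowed: Callable[[str], bool],
) -> str:
    """Check the prompt, run generation, then check the reply before returning it."""
    if not is_allowed(prompt):
        return REFUSAL_MESSAGE
    reply = generate_fn(prompt)
    return reply if is_allowed(reply) else REFUSAL_MESSAGE


# Toy stand-ins so the sketch runs without a model or a real classifier.
def echo_model(prompt: str) -> str:
    return f"echo: {prompt}"


def keyword_filter(text: str) -> bool:
    return "forbidden" not in text.lower()


print(moderated_reply("hello", echo_model, keyword_filter))  # echo: hello
```

Keeping the filter as a separate callable makes it possible to swap moderation policies per deployment without touching the generation code.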
### Personality and Alignment
The model features a "human but not sycophantic" personality design, meaning:
- Responses are direct and honest without excessive praise or agreement
- The model will challenge flawed assumptions when appropriate
- Output focuses on helpfulness over agreeableness
- Users may need to calibrate expectations for formal or highly diplomatic contexts
### Responsible Use
Users should:
- Verify critical outputs, especially in high-stakes applications
- Understand the model's limitations and knowledge cutoff
- Implement appropriate safeguards for end-user applications
- Consider bias mitigation strategies for sensitive applications
---
## Technical Specifications
### Hardware Requirements
**Minimum Requirements:**
- VRAM: 8GB+ for inference
- RAM: 16GB+ system memory
- Storage: ~8GB for model weights
**Recommended:**
- VRAM: 16GB+ for optimal performance
- RAM: 32GB+ system memory
- Apple Silicon (M1/M2/M3) or other hardware, depending on the quantization type
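The storage figure above is consistent with a back-of-the-envelope estimate of roughly 4 billion parameters stored at 16 bits each. The sketch below works through that arithmetic for a few precisions. The parameter count is the nominal "4B" from the model name, and the numbers cover weights only; the KV cache, activations, and quantization metadata add to them.

```python
PARAMS = 4e9  # nominal parameter count implied by "4B" (an approximation)

BYTES_PER_PARAM = {
    "fp32": 4.0,
    "bf16/fp16": 2.0,
    "8-bit": 1.0,
    "4-bit": 0.5,
}


def weight_size_gb(params: float, bytes_per_param: float) -> float:
    """Approximate size of the weights alone, in decimal gigabytes."""
    return params * bytes_per_param / 1e9


for name, width in BYTES_PER_PARAM.items():
    print(f"{name:>9}: ~{weight_size_gb(PARAMS, width):.0f} GB")
```

Under these assumptions, 16-bit weights come to about 8 GB and 4-bit weights to about 2 GB, which is why quantized builds fit on smaller devices.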
### Inference
The model supports standard inference methods and is compatible with:
- MLX framework (optimized for Apple Silicon)
- Hugging Face Transformers
- vLLM and other inference optimization frameworks
- GGUF quantization for reduced memory footprint
- LM Studio
- Ollama
**Recommended Generation Parameters:**
- **Temperature:** 0.6
- **Repetition Penalty:** 1.1
- **Top P:** 0.95
- **Top K:** 20
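To give a feel for what temperature, top-k, and top-p do, here is a simplified pure-Python illustration on a toy vocabulary. It is a teaching sketch, not the implementation used by Transformers or MLX; the function name and the toy logits are made up, and the repetition penalty is left out for brevity.

```python
import math


def filtered_distribution(logits, temperature=0.6, top_k=20, top_p=0.95):
    """Apply temperature, then top-k, then top-p; return {token: probability}."""
    # A temperature below 1 sharpens the distribution before any filtering.
    scaled = {tok: value / temperature for tok, value in logits.items()}
    # Top-k keeps only the k highest-scoring tokens.
    ranked = sorted(scaled.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    # Softmax over the survivors (subtract the max for numerical stability).
    peak = ranked[0][1]
    weights = [(tok, math.exp(value - peak)) for tok, value in ranked]
    total = sum(w for _, w in weights)
    # Top-p keeps the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for tok, w in weights:
        kept.append((tok, w / total))
        cumulative += w / total
        if cumulative >= top_p:
            break
    # Renormalize so the remaining probabilities sum to 1.
    norm = sum(p for _, p in kept)
    return {tok: p / norm for tok, p in kept}


toy_logits = {"the": 2.0, "a": 1.5, "cat": 1.0, "zebra": -3.0}
dist = filtered_distribution(toy_logits)
print(sorted(dist, key=dist.get, reverse=True))  # ['the', 'a', 'cat']
```

In this toy case the very unlikely token is cut by the top-p step, and sampling then draws from the three remaining tokens.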
---
## How to Get Started
### Installation
```python
# Using Hugging Face Transformers
# Requires: pip install transformers torch accelerate (accelerate is needed for device_map="auto")
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Goekdeniz-Guelmez/JOSIE-1.1-4B-Thinking"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype="auto"
)
```
### Basic Usage
```python
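# Example inference (reuses `model` and `tokenizer` loaded above)
messages = [
    {"role": "user", "content": "Explain quantum entanglement in simple terms."}
]
# return_dict=True yields input_ids and attention_mask, so **inputs works below
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt"
).to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=4096,
    temperature=0.6,
    top_p=0.95,
    top_k=20,
    repetition_penalty=1.1,
    do_sample=True
)
# Decode only the newly generated tokens, not the echoed prompt
new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
response = tokenizer.decode(new_tokens, skip_special_tokens=True)
print(response)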
```
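Because the model produces a chain of thought followed by an answer, it is often useful to separate the two in the generated text (without the prompt). The sketch below assumes the model inherits the Qwen3 thinking convention of closing its reasoning with a `</think>` tag; that is an assumption about this fine-tune, and `split_reasoning` is a hypothetical helper.

```python
THINK_END = "</think>"  # assumed closing tag, inherited from the Qwen3 thinking format


def split_reasoning(text: str) -> tuple:
    """Return (reasoning, answer); reasoning is empty if no closing tag is found."""
    reasoning, sep, answer = text.rpartition(THINK_END)
    if not sep:
        return "", text.strip()
    # The opening tag may or may not be present in the decoded text.
    reasoning = reasoning.replace("<think>", "", 1).strip()
    return reasoning, answer.strip()


sample = "<think>2 + 2 is 4.</think>\n\nThe answer is 4."
print(split_reasoning(sample))  # ('2 + 2 is 4.', 'The answer is 4.')
```

If the tag never appears, for example when generation stops before the reasoning finishes, the whole text is returned as the answer so nothing is silently dropped.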
### MLX Usage (Apple Silicon)
```python
# Using MLX for optimized Apple Silicon inference
from mlx_lm import load, generate
from mlx_lm.sample_utils import make_logits_processors, make_sampler

model, tokenizer = load("Goekdeniz-Guelmez/JOSIE-1.1-4B-Thinking")

# Sampler with the recommended generation parameters
sampler = make_sampler(
    temp=0.6,
    top_p=0.95,
    min_p=0.0,
    top_k=20,
)
messages = [
    {"role": "user", "content": "Explain quantum entanglement in simple terms."}
]
# Render the chat template to a prompt string
prompt = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=False
)
response = generate(
    model,
    tokenizer,
    prompt=prompt,
    max_tokens=4096,
    sampler=sampler,
    logits_processors=make_logits_processors(repetition_penalty=1.1)
)
print(response)
```
---
## Comparison with JOSIE-1.1-4B-Instruct
| Feature | JOSIE-1.1-4B-Instruct | JOSIE-1.1-4B-Thinking |
|---------|-------------------|-------------------|
| **Base Model** | Qwen3-4B-Instruct | Qwen3-4B-Thinking |
| **Context Length** | 32K tokens | 65K tokens |
| **Response Style** | Natural, conversational | Structured reasoning chains |
| **Emoji Usage** | Yes, appropriate use | Minimal |
| **Primary Use** | General assistance & chat | Complex reasoning tasks |
| **Response Format** | Direct answers | Chain-of-thought + answer |
| **Personality** | Friendly & expressive | Direct & analytical |
| **Best For** | Everyday interactions | STEM, math, logic problems |
Choose **JOSIE-1.1-4B-Instruct** for natural conversations and general assistance.
Choose **JOSIE-1.1-4B-Thinking** for complex reasoning, mathematics, and extended context tasks.
---
## Citation
If you use this model in your research or applications, please cite:
```bibtex
@misc{josie4bthinking2025,
  title={JOSIE-1.1-4B-Thinking: A Full-Weight Fine-Tuned Reasoning Model},
  author={Gökdeniz Gülmez},
  year={2025},
  howpublished={\url{https://huggingface.co/Goekdeniz-Guelmez/JOSIE-1.1-4B-Thinking}},
}
```
---
## Model Card Contact
For questions, issues, or feedback regarding this model:
- **GitHub:** [Profile](https://github.com/Goekdeniz-Guelmez)
- **Hugging Face:** [Profile](https://huggingface.co/Goekdeniz-Guelmez)
- **Email:** goekdenizguelmez.ml@gmail.com
---
## Acknowledgments
- **Base Model:** Qwen Team for Qwen3-4B-Thinking
- **Answer Refinement:** Anthropic Claude models (Sonnet 3.7/4.0, Opus 4.5/4.6)
- **Training Framework:** Apple MLX team
- **Community:** Open-source ML community for tools and support