---
license: apache-2.0
language:
- en
base_model:
- Qwen/Qwen3-4B
pipeline_tag: text-generation
library_name: transformers
tags:
- moe
- moderately abliterated variant
- text-generation-inference
---

![FMjPew6Vjrp4FvKe1Uz_T.png](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/cMq6BmD-1cEiOngYBL3Wh.png)

# **Qwen3-4B-ft-bf16**

> **Qwen3-4B-ft-bf16** is a fine-tuned, moderately abliterated version of the Qwen3-4B model. Designed for **enhanced context awareness** and **controlled expressiveness**, this model balances precision with creativity across a wide range of tasks, from complex reasoning to natural dialogue, code generation, and multilingual understanding.

### Key Features:

- **Improved Context Awareness**
  Retains and utilizes long-range contextual information effectively, making it ideal for long-form conversations, document understanding, and summarization tasks.

- **Moderate Abliteration**
  Introduces measured behavioral flexibility that enhances creativity and adaptability while maintaining reliability, alignment, and safety in outputs.

- **Dual Thinking Modes**
  Supports dynamic switching between *thinking* mode (for math, logic, and coding) and *non-thinking* mode (for general-purpose conversations), ensuring optimal task matching.

- **Multilingual Mastery**
  Excels in over 100 languages and dialects for translation, multilingual chat, and cross-lingual reasoning.

- **Tool-Ready Agent Capabilities**
  Designed to integrate with tool APIs and complex workflows, with consistent performance in both thinking and non-thinking contexts.

---

## Quickstart with Hugging Face Transformers 🤗

```bash
pip install transformers==4.51.3
pip install "huggingface_hub[hf_xet]"
```

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "prithivMLmods/Qwen3-4B-ft-bf16"

# Load the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# Define input
prompt = "Describe how renewable energy impacts economic development."
messages = [{"role": "user", "content": prompt}]

# Render the chat template; set enable_thinking=False for non-thinking mode
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True
)

model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Generate output
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=32768
)
# Keep only the newly generated tokens (drop the prompt)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

# Parse thinking content: split after the last </think> token (ID 151668)
try:
    index = len(output_ids) - output_ids[::-1].index(151668)
except ValueError:
    # No </think> token found: treat the whole output as the final answer
    index = 0

thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip()
content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip()

print("thinking content:", thinking_content)
print("content:", content)
```

---

## Best Practices

- **Sampling Settings**:
  - *Thinking mode*: `temperature=0.6`, `top_p=0.95`, `top_k=20`
  - *Non-thinking mode*: `temperature=0.7`, `top_p=0.8`, `top_k=20`

- **Token Length**:
  - Standard: up to `32768` output tokens (`max_new_tokens=32768`)
  - Extended Reasoning Tasks: up to `38912` output tokens

- **Prompt Design**:
  - **Math Problems**: Add `"Please reason step by step, and put your final answer within \boxed{}."` to the prompt.
  - **MCQs**: Ask the model to format its answer as `{"answer": "B"}` for easy parsing.
  - **Multi-turn**: Keep only the final responses in the conversation history and omit earlier thinking content for a cleaner context.
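
The practices above can be wrapped in a few small helpers. The following is a minimal sketch, not part of the model's API: the function names (`generation_kwargs`, `strip_thinking`, `parse_mcq_answer`) are hypothetical, `do_sample=True` is added on the assumption that the sampling settings should take effect in `model.generate`, and the history cleaner assumes thinking content appears in stored text wrapped in `<think>...</think>` tags.

```python
import re

# Sampling presets copied from the Best Practices list.
SAMPLING_PRESETS = {
    "thinking": {"temperature": 0.6, "top_p": 0.95, "top_k": 20},
    "non_thinking": {"temperature": 0.7, "top_p": 0.8, "top_k": 20},
}


def generation_kwargs(enable_thinking, max_new_tokens=32768):
    """Build keyword arguments for model.generate() for the chosen mode."""
    preset = SAMPLING_PRESETS["thinking" if enable_thinking else "non_thinking"]
    # Assumption: sampling is switched on explicitly so the preset is used.
    return {"do_sample": True, "max_new_tokens": max_new_tokens, **preset}


# Assumption: thinking content is delimited by <think>...</think> in the text.
THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)


def strip_thinking(messages):
    """Return a copy of the history with thinking removed from assistant turns."""
    cleaned = []
    for message in messages:
        if message["role"] == "assistant":
            text = THINK_BLOCK.sub("", message["content"]).strip()
            message = {**message, "content": text}  # copy, do not mutate input
        cleaned.append(message)
    return cleaned


# Matches a JSON-style object such as {"answer": "B"}.
MCQ_ANSWER = re.compile(r'\{\s*"answer"\s*:\s*"([A-Za-z])"\s*\}')


def parse_mcq_answer(text):
    """Return the letter of the last {"answer": "X"} object in text, or None."""
    matches = MCQ_ANSWER.findall(text)
    return matches[-1].upper() if matches else None
```

With the Quickstart objects in scope, these could be used as `model.generate(**model_inputs, **generation_kwargs(enable_thinking=True))`, with `strip_thinking(messages)` applied to the history before the next call to `tokenizer.apply_chat_template`.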