| license | language | base_model | pipeline_tag | library_name | tags |
|---|---|---|---|---|---|
| apache-2.0 | | | text-generation | transformers | |
Qwen3-4B-ft-bf16
Qwen3-4B-ft-bf16 is a fine-tuned, moderately abliterated version of the Qwen3-4B model. It is designed for enhanced context awareness and controlled expressiveness, balancing precision with creativity across tasks that range from complex reasoning and code generation to natural dialogue and multilingual understanding.
Key Features:
- Improved Context Awareness: Retains and uses long-range contextual information effectively, which suits long-form conversations, document understanding, and summarization tasks.
- Moderate Abliteration: Introduces measured behavioral flexibility that enhances creativity and adaptability while maintaining reliability, alignment, and safety in outputs.
- Dual Thinking Modes: Supports dynamic switching between thinking mode (for math, logic, and coding) and non-thinking mode (for general-purpose conversations), so the mode can be matched to the task.
- Multilingual Mastery: Excels in over 100 languages and dialects for translation, multilingual chat, and cross-lingual reasoning.
- Tool-Ready Agent Capabilities: Designed to integrate with tool APIs and complex workflows, with consistent performance in both thinking and non-thinking contexts.
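The two modes listed above are chosen when the prompt is rendered: the chat template accepts an `enable_thinking` flag, which the Quickstart passes as `True`. A minimal sketch of a wrapper that makes the switch explicit follows. `render_prompt` is a hypothetical helper name, not part of the model or of Transformers, and it assumes only a tokenizer object that exposes `apply_chat_template`.

```python
def render_prompt(tokenizer, user_text, thinking=True):
    """Render a single-turn chat prompt in thinking or non-thinking mode.

    `render_prompt` is a hypothetical helper; `tokenizer` is assumed to be
    the tokenizer loaded for this model (anything with `apply_chat_template`).
    """
    messages = [{"role": "user", "content": user_text}]
    return tokenizer.apply_chat_template(
        messages,
        tokenize=False,              # return the prompt as a string
        add_generation_prompt=True,  # end with the assistant turn header
        enable_thinking=thinking,    # True: reasoning mode, False: direct answers
    )
```

With the tokenizer from the Quickstart, `render_prompt(tokenizer, "Solve 12 * 13.", thinking=True)` would suit a math question, while `thinking=False` would suit a casual chat turn.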
Quickstart with Hugging Face Transformers🤗
```bash
pip install transformers==4.51.3
# Quote the extras so shells such as zsh do not expand the brackets
pip install "huggingface_hub[hf_xet]"
```
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "prithivMLmods/Qwen3-4B-ft-bf16"

# Load the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"  # requires the accelerate package
)

# Define input
prompt = "Describe how renewable energy impacts economic development."
messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True  # set to False for non-thinking mode
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Generate output
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=32768
)
# Keep only the newly generated tokens (drop the prompt)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

# Parse thinking content: split after the last occurrence of token id 151668 (</think>)
try:
    index = len(output_ids) - output_ids[::-1].index(151668)
except ValueError:
    # No </think> token found: treat the whole output as the final answer
    index = 0

thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip()
content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip()

print("thinking content:", thinking_content)
print("content:", content)
```
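The parsing step above searches the generated ids from the end for token id 151668 and splits there. To make the index arithmetic easier to follow, here is a small sketch of the same split as a standalone function on plain Python lists. The name `split_thinking` and the toy ids are illustrative only; 151668 is taken from the snippet above, which uses it as the end-of-thinking marker.

```python
END_THINK_ID = 151668  # marker id used in the Quickstart parsing step


def split_thinking(output_ids, end_think_id=END_THINK_ID):
    """Split generated ids into (thinking_ids, answer_ids).

    The split falls just after the LAST occurrence of `end_think_id`.
    If the marker is absent, everything is treated as the answer.
    """
    try:
        # Position of the marker counted from the end of the list.
        offset_from_end = output_ids[::-1].index(end_think_id)
        split_at = len(output_ids) - offset_from_end
    except ValueError:
        split_at = 0
    return output_ids[:split_at], output_ids[split_at:]


# Toy ids: 11 and 12 stand in for reasoning tokens, 21 and 22 for the answer.
thinking_ids, answer_ids = split_thinking([11, 12, END_THINK_ID, 21, 22])
print(thinking_ids)  # [11, 12, 151668]
print(answer_ids)    # [21, 22]
```

Each half can then be passed to `tokenizer.decode(..., skip_special_tokens=True)` as in the Quickstart.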
Best Practices
- Sampling Settings:
  - Thinking mode: temperature=0.6, top_p=0.95, top_k=20
  - Non-thinking mode: temperature=0.7, top_p=0.8, top_k=20
- Token Length:
  - Standard: 32768 tokens
  - Extended Reasoning Tasks: up to 38912 tokens
- Prompt Design:
  - Math Problems: Add "Please reason step by step, and put your final answer within \boxed{}."
  - MCQs: Format answers as {"answer": "B"} for easy parsing.
  - Multi-turn: Omit thinking logs in conversation history for cleaner context.
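As a rough illustration of the recommendations above, the sketch below collects the sampling settings into keyword arguments for `model.generate` and adds small helpers for the prompt-design tips. All function names are hypothetical. `do_sample=True` is added because Transformers applies temperature, top_p, and top_k only when sampling is enabled. The `<think>...</think>` delimiters in `strip_thinking` are an assumption about how reasoning appears in decoded text, so adjust them if your outputs differ.

```python
import json
import re

MATH_SUFFIX = "Please reason step by step, and put your final answer within \\boxed{}."


def sampling_kwargs(thinking):
    """Recommended sampling settings as keyword arguments for model.generate."""
    if thinking:
        return {"do_sample": True, "temperature": 0.6, "top_p": 0.95, "top_k": 20}
    return {"do_sample": True, "temperature": 0.7, "top_p": 0.8, "top_k": 20}


def math_prompt(problem):
    """Append the step-by-step instruction to a math problem."""
    return f"{problem}\n{MATH_SUFFIX}"


def parse_mcq_answer(reply):
    """Return the value of the first {"answer": "..."} object in a reply, or None."""
    match = re.search(r'\{\s*"answer"\s*:\s*"[^"]*"\s*\}', reply)
    if match is None:
        return None
    return json.loads(match.group(0))["answer"]


def strip_thinking(reply):
    """Drop <think>...</think> spans (assumed delimiters) before storing a reply."""
    return re.sub(r"<think>.*?</think>", "", reply, flags=re.DOTALL).strip()


print(sampling_kwargs(thinking=True))
print(parse_mcq_answer('The best option is {"answer": "B"}'))        # B
print(strip_thinking("<think>2 + 2 = 4</think>\nThe answer is 4."))  # The answer is 4.
```

With the objects from the Quickstart, the settings could be applied as `model.generate(**model_inputs, max_new_tokens=32768, **sampling_kwargs(thinking=True))`, and only the output of `strip_thinking` (or the already separated `content`) would be appended to `messages` for the next turn.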
