--- base_model: MihaiPopa-1/OmniTranslate-1.0 # base_model: Unsloth/Qwen3-0.6B-Unsloth-bnb-4bit - Variant that I used for fine-tuning (4-bit BNB quant by Unsloth) tags: - text-generation-inference - translation - transformers - unsloth - qwen3 - omnitranslate license: apache-2.0 language: - abk - abq - abs - acm - adh - adi - ady - aeb - afr - agx - aii - aim - ain - ajz - akb - aln - als - alt - amh - anp - aoz - apc - apt - arb - arg - arq - ars - ary - arz - asm - ast - atb - ava - awa - ayp - ayr - azb - azj - bak - bam - ban - bar - bas - bbc - bbk - bcl - bdq - bel - ben - bew - bho - bhp - bis - biu - bjn - bod - bos - brh - brx - bts - btx - bug - bul - bwi - bxr - cat - cbk - ccp - ceb - ces - cfm - cha - che - chr - chu - chv - cjs - ckb - ckt - cmn - cnh - cnw - cos - crh - crj - crk - crl - crs - csb - csw - csy - ctd - cym - czt - dak - dan - dar - deu - dik - diu - div - dje - dks - dln - dng - dnw - doi - dru - dsb - dtp - dty - dzo - ekk - ell - emj - enl - enm - epo - ess - eus - eve - ewo - ext - fao - fas - ffm - fij - fil - fin - fit - fkv - fmu - fra - fro - frp - fry - fuf - fur - fuv - gag - gaz - gcf - gla - gle - glg - glk - glv - gmh - gnb - goh - gom - gos - grc - gsw - gug - guj - guz - hac - hae - hak - hat - hau - haw - hbo - heb - her - hif - hil - hin - hmr - hne - hns - hrv - hrx - hsb - hun - hwc - hye - hyw - iba - ibg - ibo - ife - ike - ikt - ilo - ina - ind - inh - isl - ita - ivv - jav - jpn - jun - kaa - kab - kac - kak - kal - kam - kan - kas - kat - kaz - kbd - kca - kdh - kdr - kea - kei - kgp - kha - khk - khm - kik - kin - kir - kiu - kjb - kjh - kmr - knc - koi - kor - kos - kpv - krj - krl - kru - ksh - ksw - ktj - ktz - kua - kum - kwn - kyu - kzj - lad - lao - lat - lbe - ldn - lew - lez - lfn - lim - lin - lis - lit - lki - lld - lmk - lnd - lrc - ltg - ltz - lud - lug - luo - lus - lvs - lwg - lzh - mag - mah - mai - mak - mal - mar - mas - mbf - mdf - mer - mfe - mfg - mfy - mhi - mhr - mhy - min - mip - mjw - mkd - mlt - mni - mnk - mns - mnw - moh - mph - mqy - mri - mrj - mrw - mtg - mui - mup - mus - mvp - mwf - mwl - mww - mya - myv - myx - mzh - nah - nan - nap - naq - nbu - nde - ndo - nds - new - nio - njn - njo - nld - nmf - nmz - nno - nob - nog - non - npi - npo - nrf - nri - nrm - nse - nus - nya - nyn - nzm - obo - oci - ojb - olo - orv - ory - oss - ota - oto - otw - pam - pan - pap - pbt - pcd - pck - pcm - pfl - plt - pmq - pmx - pnb - pnt - pol - por - pov - ppk - pps - prg - pui - pxm - quc - qul - qup - qus - quz - raw - rcf - rel - rhg - ria - rjs - rmc - rml - rmn - rmy - rnl - roh - ron - rtm - rue - run - rus - sah - san - sat - sck - scn - sda - sdc - sdh - ses - sgc - sgh - sid - sin - sju - skr - slk - slv - sma - sme - smj - smn - smo - sms - smt - sna - snd - som - sot - spa - srd - srp - ssw - sul - sun - swe - swg - swh - syc - syl - szl - tab - tam - taq - tat - tcy - tcz - tel - tet - tgk - tha - thl - tig - tir - tkl - tkr - tlh - tly - tok - ton - tpi - tpw - trc - trp - trs - ttj - tuk - tur - tuv - twx - tyv - tzl - tzm - udm - uig - ukr - urd - uzn - uzs - vap - vie - vot - vro - war - way - wba - wbm - wes - whk - wlx - wol - wsg - wwa - xal - xho - xmm - xmv - xog - yaz - ydd - yor - yrk - yrl - yua - yue - zea - zgh - zom - zsm - zul pipeline_tag: translation datasets: - MihaiPopa-1/OmniSurgical-1.1 --- # OmniTranslate 1.1 OmniTranslate 1.1 is a massively multilingual machine translation model supporting over 500 languages. Fine-tuned from [Qwen 3 0.6B](https://www.huggingface.co/Qwen/Qwen3-0.6B) (with Unsloth), this model is designed for translation tasks on any device! # Features * **500+ Languages Supported:** The broadest coverage of languages supported for a translation model that's under 1 billion parameters! * **Tiny Size:** Beats any other large model on speed and memory usage. No other model is able to compete with this! # Improvements over 1.0 * OmniTranslate now makes less hiccups when translating to Romanian (like "ami"), and the diacritic bug on Romanian translations has been mostly fixed! There's a tiny chance that the model will spit out without diacritics (mostly due to seeds) though, so try a different one. # Experimental Features * We added 2 new languages, Emoji and Sulfuristic Speak (my own language for OmniTranslate 1.1 to quite fit the Chaos Cubed Minecraft vibe!). Try these out: ## Emoji ```python from transformers import AutoModelForCausalLM, AutoTokenizer import torch # 1. Load from your Hugging Face Repo model_id = "MihaiPopa-1/OmniTranslate-1.1" tokenizer = AutoTokenizer.from_pretrained(model_id) model = AutoModelForCausalLM.from_pretrained( model_id, torch_dtype=torch.float32, # Standard for CPU device_map="cpu" # Forces CPU usage ) # 2. Translate to Emoji prompt = "<|im_start|>user\nTranslate to emj_Emoj: We love the world!<|im_end|>\n<|im_start|>assistant\n\n" inputs = tokenizer(prompt, return_tensors="pt").to("cpu") with torch.no_grad(): outputs = model.generate(**inputs, max_new_tokens=64, temperature=0.1) print(tokenizer.decode(outputs[0], skip_special_tokens=True)) ``` ## Sulfuristic Speak ```python from transformers import AutoModelForCausalLM, AutoTokenizer import torch # 1. Load from your Hugging Face Repo model_id = "MihaiPopa-1/OmniTranslate-1.1" tokenizer = AutoTokenizer.from_pretrained(model_id) model = AutoModelForCausalLM.from_pretrained( model_id, torch_dtype=torch.float32, # Standard for CPU device_map="cpu" # Forces CPU usage ) # 2. Translate to Sulfuristic Speak ("Translate to Sulfuristic Speak" also works too!) prompt = "<|im_start|>user\nTranslate to sul_Latn: Let's ride a Sulfur Cube!<|im_end|>\n<|im_start|>assistant\n\n" inputs = tokenizer(prompt, return_tensors="pt").to("cpu") with torch.no_grad(): outputs = model.generate(**inputs, max_new_tokens=128, temperature=0.1) print(tokenizer.decode(outputs[0], skip_special_tokens=True)) ``` # Notes OmniTranslate 1.1 is still a experimental model and shouldn't be used for tasks where accurate translations matter. # Notes Providing the ISO code instead of the language name can improve the results a lot. # Usage Code is by Gemini 3 Flash (then some little modifications by myself): ```python from transformers import AutoModelForCausalLM, AutoTokenizer import torch # 1. Load from your Hugging Face Repo model_id = "MihaiPopa-1/OmniTranslate-1.1" tokenizer = AutoTokenizer.from_pretrained(model_id) model = AutoModelForCausalLM.from_pretrained( model_id, torch_dtype=torch.float32, # Standard for CPU device_map="cpu" # Forces CPU usage ) # 2. Translate (replace ron_Latn with your language here) prompt = "<|im_start|>user\nTranslate to ron_Latn: OmniTranslate is a massively multilingual machine translation model supporting over 500 languages!<|im_end|>\n<|im_start|>assistant\n\n" inputs = tokenizer(prompt, return_tensors="pt").to("cpu") with torch.no_grad(): outputs = model.generate(**inputs, max_new_tokens=256, temperature=0.1) print(tokenizer.decode(outputs[0], skip_special_tokens=True)) ``` # Data Used I used my own [OmniSurgical 1.1](https://www.huggingface.co/datasets/MihaiPopa-1/OmniSurgical-1.1), which the dataset itself contains a part of [HF's FineTranslations](https://www.huggingface.co/datasets/HuggingFaceFW/finetranslations) --- # Uploaded finetuned model - **Developed by:** MihaiPopa-1 - **License:** apache-2.0 - **Finetuned from model :** unsloth/qwen3-0.6b-unsloth-bnb-4bit This qwen3 model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library. [](https://github.com/unslothai/unsloth)