from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ljvmiranda921/Polyglot-OLMo3-7B-SFT-de"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Build a chat prompt ("Hello, how are you?" in German) and apply the model's chat template.
messages = [{"role": "user", "content": "Hallo, wie geht es dir?"}]
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True)

# Generate a response and decode it, dropping special tokens.
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
Acknowledgements
LJVM and AK acknowledge the support of the UKRI Frontier Grant EP/Y031350/1 (EQUATE).
This work was performed using resources provided by the Cambridge Service for Data Driven Discovery (CSD3) [EP/T022159/1], the Isambard AI National AI Research Resource (AIRR) [ST/AIRR/I-A-I/1023], and a Microsoft Research grant.
LJVM would also like to thank Songbo Hu, Chen Cecilia Liu, Millicent Ochieng, and Felermino Ali for helpful and productive discussions on the project.
Citation
@misc{miranda2026polyglotteachersevaluatinglanguage,
  title={Polyglot Teachers: Evaluating Language Models for Multilingual Synthetic Data Generation},
  author={Lester James V. Miranda and Ivan Vulić and Anna Korhonen},
  year={2026},
  eprint={2604.11290},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2604.11290},
}