A world model for the BabyAI grid-world environment, fine-tuned from Qwen2.5-7B-Instruct using LoRA. This model predicts the next observation and available actions given the current state and the agent's action.
Model Details
Base model: Qwen2.5-7B-Instruct
Fine-tuning: LoRA (40.4M trainable params, 0.53% of 7.66B total), merged after training
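from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the merged model in bfloat16 and move it to the GPU.
model = AutoModelForCausalLM.from_pretrained("GGOSinon/babyai-world-model-7B-sft", torch_dtype="bfloat16").to("cuda")
tokenizer = AutoTokenizer.from_pretrained("GGOSinon/babyai-world-model-7B-sft")

# The system turn describes the simulator role; the user turn carries the goal,
# the current observation, the available actions, and the agent's chosen action.
messages = [
    {"role": "system", "content": "You are a simulator for a grid-world environment called BabyAI..."},
    {"role": "user", "content": "Goal: pick up the red box\n\nObservation:\n...\nAvailable actions: [...]\nAgent's action: pickup red box 1"},
]

# Apply the chat template and tokenize, appending the assistant generation prompt.
inputs = tokenizer.apply_chat_template(messages, tokenize=True, return_tensors="pt", add_generation_prompt=True).to("cuda")

# Generate without sampling, then decode only the newly generated tokens.
output = model.generate(inputs, max_new_tokens=300, do_sample=False)
print(tokenizer.decode(output[0][inputs.shape[1]:], skip_special_tokens=True))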
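The user turn in the example above follows a fixed layout (goal, observation, available actions, agent's action). A minimal sketch of a helper that assembles it is shown below. The function names `build_user_message` and `build_messages` are hypothetical and not part of the model or of `transformers`. The observation text and the action list are passed through verbatim, because their exact formats are elided in the example and are not assumed here.

```python
def build_user_message(goal: str, observation: str, available_actions: str, action: str) -> str:
    """Assemble the user turn in the layout used by the usage example.

    Hypothetical helper: every argument is inserted verbatim. `available_actions`
    must already be formatted (brackets included), since the exact list format
    is not documented in this card.
    """
    return (
        f"Goal: {goal}\n\n"
        f"Observation:\n{observation}\n"
        f"Available actions: {available_actions}\n"
        f"Agent's action: {action}"
    )


def build_messages(system_prompt: str, goal: str, observation: str,
                   available_actions: str, action: str) -> list:
    """Return the two-turn chat (system + user) expected by apply_chat_template."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": build_user_message(goal, observation, available_actions, action)},
    ]


# Same placeholders as the usage example; the elided parts ("...") are left as-is.
messages = build_messages(
    system_prompt="You are a simulator for a grid-world environment called BabyAI...",
    goal="pick up the red box",
    observation="...",
    available_actions="[...]",
    action="pickup red box 1",
)
```

The resulting `messages` list can be passed to `tokenizer.apply_chat_template` exactly as in the usage example. Keeping the prompt assembly in one place makes it easier to stay consistent with the layout the model was presumably fine-tuned on.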