初始化项目，由ModelHub XC社区提供模型

Model: JuneQaQ/Qwen3-4B-assistant Source: Original Platform
2026-05-16 22:24:08 +08:00
commit ee6aa5f894
23 changed files with 3161 additions and 0 deletions
--- a/.gitattributes
+++ b/.gitattributes
@@ -0,0 +1,51 @@
+*.7z filter=lfs diff=lfs merge=lfs -text
+*.arrow filter=lfs diff=lfs merge=lfs -text
+*.bin filter=lfs diff=lfs merge=lfs -text
+*.bin.* filter=lfs diff=lfs merge=lfs -text
+*.bz2 filter=lfs diff=lfs merge=lfs -text
+*.ftz filter=lfs diff=lfs merge=lfs -text
+*.gz filter=lfs diff=lfs merge=lfs -text
+*.h5 filter=lfs diff=lfs merge=lfs -text
+*.joblib filter=lfs diff=lfs merge=lfs -text
+*.lfs.* filter=lfs diff=lfs merge=lfs -text
+*.model filter=lfs diff=lfs merge=lfs -text
+*.msgpack filter=lfs diff=lfs merge=lfs -text
+*.onnx filter=lfs diff=lfs merge=lfs -text
+*.ot filter=lfs diff=lfs merge=lfs -text
+*.parquet filter=lfs diff=lfs merge=lfs -text
+*.pb filter=lfs diff=lfs merge=lfs -text
+*.pt filter=lfs diff=lfs merge=lfs -text
+*.pth filter=lfs diff=lfs merge=lfs -text
+*.rar filter=lfs diff=lfs merge=lfs -text
+saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+*.tar.* filter=lfs diff=lfs merge=lfs -text
+*.tflite filter=lfs diff=lfs merge=lfs -text
+*.tgz filter=lfs diff=lfs merge=lfs -text
+*.xz filter=lfs diff=lfs merge=lfs -text
+*.zip filter=lfs diff=lfs merge=lfs -text
+*.zstandard filter=lfs diff=lfs merge=lfs -text
+*.tfevents* filter=lfs diff=lfs merge=lfs -text
+*.db* filter=lfs diff=lfs merge=lfs -text
+*.ark* filter=lfs diff=lfs merge=lfs -text
+**/*ckpt*data* filter=lfs diff=lfs merge=lfs -text
+**/*ckpt*.meta filter=lfs diff=lfs merge=lfs -text
+**/*ckpt*.index filter=lfs diff=lfs merge=lfs -text
+*.safetensors filter=lfs diff=lfs merge=lfs -text
+*.ckpt filter=lfs diff=lfs merge=lfs -text
+*.gguf* filter=lfs diff=lfs merge=lfs -text
+*.ggml filter=lfs diff=lfs merge=lfs -text
+*.llamafile* filter=lfs diff=lfs merge=lfs -text
+*.pt2 filter=lfs diff=lfs merge=lfs -text
+*.mlmodel filter=lfs diff=lfs merge=lfs -text
+*.npy filter=lfs diff=lfs merge=lfs -text
+*.npz filter=lfs diff=lfs merge=lfs -text
+*.pickle filter=lfs diff=lfs merge=lfs -text
+*.pkl filter=lfs diff=lfs merge=lfs -text
+*.tar filter=lfs diff=lfs merge=lfs -text
+*.wasm filter=lfs diff=lfs merge=lfs -text
+*.zst filter=lfs diff=lfs merge=lfs -text
+*tfevents* filter=lfs diff=lfs merge=lfs -text
+
+tokenizer.json filter=lfs diff=lfs merge=lfs -text
+merges.txt filter=lfs diff=lfs merge=lfs -text
+vocab.json filter=lfs diff=lfs merge=lfs -text
--- a/.gitignore
+++ b/.gitignore
@@ -0,0 +1,2 @@
+runs/
+images/
--- a/README.md
+++ b/README.md
@@ -0,0 +1,67 @@
+---
+library_name: transformers
+model_name: Qwen3-4B-assistant
+tags:
+- generated_from_trainer
+- trl
+- grpo
+licence: license
+---
+
+# Model Card for Qwen3-4B-assistant
+
+This model is a fine-tuned version of [None](https://huggingface.co/None).
+It has been trained using [TRL](https://github.com/huggingface/trl).
+
+## Quick start
+
+```python
+from transformers import pipeline
+
+question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
+generator = pipeline("text-generation", model="JuneQaQ/Qwen3-4B-assistant", device="cuda")
+output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
+print(output["generated_text"])
+```
+
+## Training procedure
+
+ 
+
+
+This model was trained with GRPO, a method introduced in [DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models](https://huggingface.co/papers/2402.03300).
+
+### Framework versions
+
+- TRL: 0.20.0
+- Transformers: 4.55.2
+- Pytorch: 2.7.1+cu128
+- Datasets: 3.2.0
+- Tokenizers: 0.21.4
+
+## Citations
+
+Cite GRPO as:
+
+```bibtex
+@article{zhihong2024deepseekmath,
+    title        = {{DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models}},
+    author       = {Zhihong Shao and Peiyi Wang and Qihao Zhu and Runxin Xu and Junxiao Song and Mingchuan Zhang and Y. K. Li and Y. Wu and Daya Guo},
+    year         = 2024,
+    eprint       = {arXiv:2402.03300},
+}
+
+```
+
+Cite TRL as:
+    
+```bibtex
+@misc{vonwerra2022trl,
+	title        = {{TRL: Transformer Reinforcement Learning}},
+	author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
+	year         = 2020,
+	journal      = {GitHub repository},
+	publisher    = {GitHub},
+	howpublished = {\url{https://github.com/huggingface/trl}}
+}
+```
--- a/adapter_config.json
+++ b/adapter_config.json
@@ -0,0 +1,41 @@
+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "/root/.cache/modelscope/hub/models/Qwen/Qwen3-4B-Instruct-2507",
+  "bias": "none",
+  "corda_config": null,
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 32,
+  "lora_bias": false,
+  "lora_dropout": 0.05,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": [],
+  "peft_type": "LORA",
+  "qalora_group_size": 16,
+  "r": 8,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "o_proj",
+    "v_proj",
+    "k_proj",
+    "down_proj",
+    "gate_proj",
+    "q_proj",
+    "up_proj"
+  ],
+  "task_type": "CAUSAL_LM",
+  "trainable_token_indices": null,
+  "use_dora": false,
+  "use_qalora": false,
+  "use_rslora": false
+}
--- a/adapter_model.safetensors
+++ b/adapter_model.safetensors
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:b6946aed778688a484f4ba4e69993b94d816bff0bca5775eff7839167a9f0b94
+size 66126768
--- a/added_tokens.json
+++ b/added_tokens.json
@@ -0,0 +1,28 @@
+{
+  "</think>": 151668,
+  "</tool_call>": 151658,
+  "</tool_response>": 151666,
+  "<think>": 151667,
+  "<tool_call>": 151657,
+  "<tool_response>": 151665,
+  "<|box_end|>": 151649,
+  "<|box_start|>": 151648,
+  "<|endoftext|>": 151643,
+  "<|file_sep|>": 151664,
+  "<|fim_middle|>": 151660,
+  "<|fim_pad|>": 151662,
+  "<|fim_prefix|>": 151659,
+  "<|fim_suffix|>": 151661,
+  "<|im_end|>": 151645,
+  "<|im_start|>": 151644,
+  "<|image_pad|>": 151655,
+  "<|object_ref_end|>": 151647,
+  "<|object_ref_start|>": 151646,
+  "<|quad_end|>": 151651,
+  "<|quad_start|>": 151650,
+  "<|repo_name|>": 151663,
+  "<|video_pad|>": 151656,
+  "<|vision_end|>": 151653,
+  "<|vision_pad|>": 151654,
+  "<|vision_start|>": 151652
+}
--- a/args.json
+++ b/args.json
@@ -0,0 +1,337 @@
+{
+  "output_dir": "/root/projects/ms-swift-main/20250822/output/domain_classifier_sft/v1-20250822-090124",
+  "overwrite_output_dir": false,
+  "do_train": false,
+  "do_eval": false,
+  "do_predict": false,
+  "eval_strategy": "steps",
+  "prediction_loss_only": false,
+  "per_device_train_batch_size": 4,
+  "per_device_eval_batch_size": 1,
+  "per_gpu_train_batch_size": null,
+  "per_gpu_eval_batch_size": null,
+  "gradient_accumulation_steps": 1,
+  "eval_accumulation_steps": null,
+  "eval_delay": 0,
+  "torch_empty_cache_steps": null,
+  "learning_rate": 5e-05,
+  "weight_decay": 0.01,
+  "adam_beta1": 0.9,
+  "adam_beta2": 0.95,
+  "adam_epsilon": 1e-08,
+  "max_grad_norm": 1.0,
+  "num_train_epochs": 3.0,
+  "max_steps": -1,
+  "lr_scheduler_type": "cosine",
+  "lr_scheduler_kwargs": null,
+  "warmup_ratio": 0.1,
+  "warmup_steps": 0,
+  "log_level": "passive",
+  "log_level_replica": "warning",
+  "log_on_each_node": true,
+  "logging_dir": "/root/projects/ms-swift-main/20250822/output/domain_classifier_sft/v1-20250822-090124/runs",
+  "logging_strategy": "steps",
+  "logging_first_step": true,
+  "logging_steps": 10,
+  "logging_nan_inf_filter": true,
+  "save_strategy": "steps",
+  "save_steps": 100.0,
+  "save_total_limit": null,
+  "save_safetensors": true,
+  "save_on_each_node": false,
+  "save_only_model": false,
+  "restore_callback_states_from_checkpoint": false,
+  "no_cuda": false,
+  "use_cpu": false,
+  "use_mps_device": false,
+  "seed": 42,
+  "data_seed": 42,
+  "jit_mode_eval": false,
+  "use_ipex": false,
+  "bf16": true,
+  "fp16": false,
+  "fp16_opt_level": "O1",
+  "half_precision_backend": "auto",
+  "bf16_full_eval": false,
+  "fp16_full_eval": false,
+  "tf32": null,
+  "local_rank": -1,
+  "ddp_backend": null,
+  "tpu_num_cores": null,
+  "tpu_metrics_debug": false,
+  "debug": null,
+  "dataloader_drop_last": false,
+  "eval_steps": 100.0,
+  "dataloader_num_workers": null,
+  "dataloader_prefetch_factor": null,
+  "past_index": -1,
+  "run_name": "/root/projects/ms-swift-main/20250822/output/domain_classifier_sft/v1-20250822-090124",
+  "disable_tqdm": null,
+  "remove_unused_columns": true,
+  "label_names": null,
+  "load_best_model_at_end": true,
+  "metric_for_best_model": "eval_loss",
+  "greater_is_better": false,
+  "ignore_data_skip": false,
+  "fsdp": "",
+  "fsdp_min_num_params": 0,
+  "fsdp_config": null,
+  "fsdp_transformer_layer_cls_to_wrap": null,
+  "accelerator_config": {
+    "dispatch_batches": false
+  },
+  "deepspeed": null,
+  "label_smoothing_factor": 0.0,
+  "optim": "adamw_torch",
+  "optim_args": null,
+  "adafactor": false,
+  "group_by_length": false,
+  "length_column_name": "length",
+  "report_to": [
+    "tensorboard"
+  ],
+  "ddp_find_unused_parameters": null,
+  "ddp_bucket_cap_mb": null,
+  "ddp_broadcast_buffers": null,
+  "dataloader_pin_memory": true,
+  "dataloader_persistent_workers": false,
+  "skip_memory_metrics": true,
+  "use_legacy_prediction_loop": false,
+  "push_to_hub": false,
+  "resume_from_checkpoint": null,
+  "hub_model_id": null,
+  "hub_strategy": "every_save",
+  "hub_token": null,
+  "hub_private_repo": null,
+  "hub_always_push": false,
+  "hub_revision": null,
+  "gradient_checkpointing": true,
+  "gradient_checkpointing_kwargs": null,
+  "include_inputs_for_metrics": false,
+  "include_for_metrics": [],
+  "eval_do_concat_batches": true,
+  "fp16_backend": "auto",
+  "push_to_hub_model_id": null,
+  "push_to_hub_organization": null,
+  "push_to_hub_token": null,
+  "mp_parameters": "",
+  "auto_find_batch_size": false,
+  "full_determinism": false,
+  "torchdynamo": null,
+  "ray_scope": "last",
+  "ddp_timeout": 18000000,
+  "torch_compile": false,
+  "torch_compile_backend": null,
+  "torch_compile_mode": null,
+  "include_tokens_per_second": false,
+  "include_num_input_tokens_seen": false,
+  "neftune_noise_alpha": null,
+  "optim_target_modules": null,
+  "batch_eval_metrics": false,
+  "eval_on_start": false,
+  "use_liger_kernel": false,
+  "liger_kernel_config": null,
+  "eval_use_gather_object": false,
+  "average_tokens_across_devices": true,
+  "sortish_sampler": false,
+  "predict_with_generate": false,
+  "generation_max_length": null,
+  "generation_num_beams": null,
+  "generation_config": null,
+  "tuner_backend": "peft",
+  "vit_gradient_checkpointing": null,
+  "router_aux_loss_coef": 0.0,
+  "check_model": true,
+  "acc_strategy": "token",
+  "train_dataloader_shuffle": true,
+  "max_epochs": null,
+  "aligner_lr": null,
+  "vit_lr": null,
+  "optimizer": null,
+  "use_logits_to_keep": null,
+  "channels": null,
+  "ds3_gather_for_generation": true,
+  "resume_only_model": false,
+  "eval_use_evalscope": false,
+  "eval_dataset": [],
+  "eval_dataset_args": null,
+  "eval_limit": null,
+  "eval_generation_config": null,
+  "model": "qwen/Qwen3-4B-Instruct-2507",
+  "model_type": "qwen3",
+  "model_revision": null,
+  "task_type": "causal_lm",
+  "torch_dtype": "bfloat16",
+  "attn_impl": null,
+  "new_special_tokens": [],
+  "num_labels": null,
+  "problem_type": null,
+  "rope_scaling": null,
+  "device_map": null,
+  "max_memory": {},
+  "max_model_len": null,
+  "local_repo_path": null,
+  "init_strategy": null,
+  "template": "qwen3",
+  "system": null,
+  "max_length": 512,
+  "truncation_strategy": "delete",
+  "max_pixels": null,
+  "agent_template": null,
+  "norm_bbox": null,
+  "use_chat_template": true,
+  "padding_free": false,
+  "padding_side": "right",
+  "loss_scale": "default",
+  "sequence_parallel_size": 1,
+  "response_prefix": null,
+  "template_backend": "swift",
+  "dataset": [
+    "train_dataset.jsonl"
+  ],
+  "val_dataset": [
+    "val_dataset.jsonl"
+  ],
+  "split_dataset_ratio": 0.0,
+  "dataset_num_proc": 1,
+  "load_from_cache_file": true,
+  "dataset_shuffle": true,
+  "val_dataset_shuffle": false,
+  "streaming": false,
+  "interleave_prob": null,
+  "stopping_strategy": "first_exhausted",
+  "shuffle_buffer_size": 1000,
+  "download_mode": "reuse_dataset_if_exists",
+  "columns": {},
+  "strict": false,
+  "model_name": null,
+  "model_author": null,
+  "custom_dataset_info": [],
+  "quant_method": null,
+  "quant_bits": null,
+  "hqq_axis": null,
+  "bnb_4bit_compute_dtype": "bfloat16",
+  "bnb_4bit_quant_type": "nf4",
+  "bnb_4bit_use_double_quant": true,
+  "bnb_4bit_quant_storage": null,
+  "max_new_tokens": 64,
+  "temperature": 0.0,
+  "top_k": null,
+  "top_p": null,
+  "repetition_penalty": null,
+  "num_beams": 1,
+  "stream": false,
+  "stop_words": [],
+  "logprobs": false,
+  "top_logprobs": null,
+  "ckpt_dir": null,
+  "lora_modules": [],
+  "train_type": "lora",
+  "adapters": [],
+  "external_plugins": [],
+  "model_kwargs": {},
+  "load_args": false,
+  "load_data_args": false,
+  "packing": false,
+  "lazy_tokenize": false,
+  "cached_dataset": [],
+  "custom_register_path": [],
+  "use_hf": false,
+  "ignore_args_error": false,
+  "use_swift_lora": false,
+  "freeze_parameters": [],
+  "freeze_parameters_regex": null,
+  "freeze_parameters_ratio": 0.0,
+  "trainable_parameters": [],
+  "trainable_parameters_regex": null,
+  "freeze_llm": false,
+  "freeze_vit": true,
+  "freeze_aligner": true,
+  "target_modules": [
+    "all-linear"
+  ],
+  "target_regex": null,
+  "modules_to_save": [],
+  "lora_rank": 8,
+  "lora_alpha": 32,
+  "lora_dropout": 0.05,
+  "lora_bias": "none",
+  "lora_dtype": null,
+  "lorap_lr_ratio": null,
+  "use_rslora": false,
+  "use_dora": false,
+  "lora_ga_batch_size": 2,
+  "lora_ga_iters": 2,
+  "lora_ga_max_length": 1024,
+  "lora_ga_direction": "ArB2r",
+  "lora_ga_scale": "stable",
+  "lora_ga_stable_gamma": 16,
+  "init_weights": true,
+  "fourier_n_frequency": 2000,
+  "fourier_scaling": 300.0,
+  "boft_block_size": 4,
+  "boft_block_num": 0,
+  "boft_n_butterfly_factor": 1,
+  "boft_dropout": 0.0,
+  "vera_rank": 256,
+  "vera_projection_prng_key": 0,
+  "vera_dropout": 0.0,
+  "vera_d_initial": 0.1,
+  "adapter_act": "gelu",
+  "adapter_length": 128,
+  "use_galore": false,
+  "galore_target_modules": null,
+  "galore_rank": 128,
+  "galore_update_proj_gap": 50,
+  "galore_scale": 1.0,
+  "galore_proj_type": "std",
+  "galore_optim_per_parameter": false,
+  "galore_with_embedding": false,
+  "galore_quantization": false,
+  "galore_proj_quant": false,
+  "galore_proj_bits": 4,
+  "galore_proj_group_size": 256,
+  "galore_cos_threshold": 0.4,
+  "galore_gamma_proj": 2,
+  "galore_queue_size": 5,
+  "adalora_target_r": 8,
+  "adalora_init_r": 12,
+  "adalora_tinit": 0,
+  "adalora_tfinal": 0,
+  "adalora_deltaT": 1,
+  "adalora_beta1": 0.85,
+  "adalora_beta2": 0.85,
+  "adalora_orth_reg_weight": 0.5,
+  "llamapro_num_new_blocks": 4,
+  "llamapro_num_groups": null,
+  "lisa_activated_layers": 0,
+  "lisa_step_interval": 20,
+  "reft_layer_key": null,
+  "reft_layers": null,
+  "reft_rank": 4,
+  "reft_intervention_type": "LoreftIntervention",
+  "reft_args": null,
+  "swanlab_token": null,
+  "swanlab_project": null,
+  "swanlab_workspace": null,
+  "swanlab_exp_name": null,
+  "swanlab_lark_webhook_url": null,
+  "swanlab_lark_secret": null,
+  "swanlab_mode": "cloud",
+  "add_version": true,
+  "create_checkpoint_symlink": false,
+  "loss_type": null,
+  "metric": null,
+  "zero_hpz_partition_size": null,
+  "deepspeed_autotp_size": null,
+  "rank": -1,
+  "global_world_size": 1,
+  "local_world_size": 1,
+  "model_suffix": "Qwen3-4B-Instruct-2507",
+  "model_info": "ModelInfo(model_type='qwen3', model_dir='/root/.cache/modelscope/hub/models/qwen/Qwen3-4B-Instruct-2507', torch_dtype=torch.bfloat16, max_model_len=262144, quant_method=None, quant_bits=None, rope_scaling=None, is_moe_model=False, config=None, task_type='causal_lm', num_labels=None)",
+  "model_meta": "ModelMeta(model_type='qwen3', model_groups=[ModelGroup(models=[Model(ms_model_id='Qwen/Qwen3-0.6B-Base', hf_model_id='Qwen/Qwen3-0.6B-Base', model_path=None, ms_revision=None, hf_revision=None), Model(ms_model_id='Qwen/Qwen3-1.7B-Base', hf_model_id='Qwen/Qwen3-1.7B-Base', model_path=None, ms_revision=None, hf_revision=None), Model(ms_model_id='Qwen/Qwen3-4B-Base', hf_model_id='Qwen/Qwen3-4B-Base', model_path=None, ms_revision=None, hf_revision=None), Model(ms_model_id='Qwen/Qwen3-8B-Base', hf_model_id='Qwen/Qwen3-8B-Base', model_path=None, ms_revision=None, hf_revision=None), Model(ms_model_id='Qwen/Qwen3-14B-Base', hf_model_id='Qwen/Qwen3-14B-Base', model_path=None, ms_revision=None, hf_revision=None), Model(ms_model_id='Qwen/Qwen3-0.6B', hf_model_id='Qwen/Qwen3-0.6B', model_path=None, ms_revision=None, hf_revision=None), Model(ms_model_id='Qwen/Qwen3-1.7B', hf_model_id='Qwen/Qwen3-1.7B', model_path=None, ms_revision=None, hf_revision=None), Model(ms_model_id='Qwen/Qwen3-4B', hf_model_id='Qwen/Qwen3-4B', model_path=None, ms_revision=None, hf_revision=None), Model(ms_model_id='Qwen/Qwen3-8B', hf_model_id='Qwen/Qwen3-8B', model_path=None, ms_revision=None, hf_revision=None), Model(ms_model_id='Qwen/Qwen3-14B', hf_model_id='Qwen/Qwen3-14B', model_path=None, ms_revision=None, hf_revision=None), Model(ms_model_id='Qwen/Qwen3-32B', hf_model_id='Qwen/Qwen3-32B', model_path=None, ms_revision=None, hf_revision=None), Model(ms_model_id='Qwen/Qwen3-0.6B-FP8', hf_model_id='Qwen/Qwen3-0.6B-FP8', model_path=None, ms_revision=None, hf_revision=None), Model(ms_model_id='Qwen/Qwen3-1.7B-FP8', hf_model_id='Qwen/Qwen3-1.7B-FP8', model_path=None, ms_revision=None, hf_revision=None), Model(ms_model_id='Qwen/Qwen3-4B-FP8', hf_model_id='Qwen/Qwen3-4B-FP8', model_path=None, ms_revision=None, hf_revision=None), Model(ms_model_id='Qwen/Qwen3-8B-FP8', hf_model_id='Qwen/Qwen3-8B-FP8', model_path=None, ms_revision=None, hf_revision=None), Model(ms_model_id='Qwen/Qwen3-14B-FP8', hf_model_id='Qwen/Qwen3-14B-FP8', model_path=None, ms_revision=None, hf_revision=None), Model(ms_model_id='Qwen/Qwen3-32B-FP8', hf_model_id='Qwen/Qwen3-32B-FP8', model_path=None, ms_revision=None, hf_revision=None), Model(ms_model_id='Qwen/Qwen3-4B-AWQ', hf_model_id='Qwen/Qwen3-4B-AWQ', model_path=None, ms_revision=None, hf_revision=None), Model(ms_model_id='Qwen/Qwen3-8B-AWQ', hf_model_id='Qwen/Qwen3-8B-AWQ', model_path=None, ms_revision=None, hf_revision=None), Model(ms_model_id='Qwen/Qwen3-14B-AWQ', hf_model_id='Qwen/Qwen3-14B-AWQ', model_path=None, ms_revision=None, hf_revision=None), Model(ms_model_id='Qwen/Qwen3-32B-AWQ', hf_model_id='Qwen/Qwen3-32B-AWQ', model_path=None, ms_revision=None, hf_revision=None), Model(ms_model_id='swift/Qwen3-32B-AWQ', hf_model_id=None, model_path=None, ms_revision=None, hf_revision=None)], ignore_patterns=None, requires=None, tags=[]), ModelGroup(models=[Model(ms_model_id='Qwen/Qwen3-4B-Instruct-2507', hf_model_id='Qwen/Qwen3-4B-Instruct-2507', model_path=None, ms_revision=None, hf_revision=None), Model(ms_model_id='Qwen/Qwen3-4B-Instruct-2507-FP8', hf_model_id='Qwen/Qwen3-4B-Instruct-2507-FP8', model_path=None, ms_revision=None, hf_revision=None)], ignore_patterns=None, requires=None, tags=[])], template='qwen3', get_function=<function get_model_tokenizer_with_flash_attn at 0x7effcc68db20>, model_arch=ModelKeys(arch_name='llama', embedding='model.embed_tokens', module_list='model.layers', lm_head='lm_head', q_proj='model.layers.{}.self_attn.q_proj', k_proj='model.layers.{}.self_attn.k_proj', v_proj='model.layers.{}.self_attn.v_proj', o_proj='model.layers.{}.self_attn.o_proj', attention='model.layers.{}.self_attn', mlp='model.layers.{}.mlp', down_proj='model.layers.{}.mlp.down_proj', qkv_proj=None, qk_proj=None, qa_proj=None, qb_proj=None, kv_proj=None, kva_proj=None, kvb_proj=None), architectures=['Qwen3ForCausalLM'], additional_saved_files=[], torch_dtype=None, is_multimodal=False, is_reward=False, task_type=None, ignore_patterns=None, requires=['transformers>=4.51'], tags=[])",
+  "model_dir": "/root/.cache/modelscope/hub/models/qwen/Qwen3-4B-Instruct-2507",
+  "hub": "<class 'swift.hub.hub.MSHub'>",
+  "evaluation_strategy": "steps",
+  "training_args": "Seq2SeqTrainingArguments(output_dir='/root/projects/ms-swift-main/20250822/output/domain_classifier_sft/v1-20250822-090124', overwrite_output_dir=False, do_train=False, do_eval=True, do_predict=False, eval_strategy=<IntervalStrategy.STEPS: 'steps'>, prediction_loss_only=False, per_device_train_batch_size=4, per_device_eval_batch_size=1, per_gpu_train_batch_size=None, per_gpu_eval_batch_size=None, gradient_accumulation_steps=1, eval_accumulation_steps=None, eval_delay=0, torch_empty_cache_steps=None, learning_rate=5e-05, weight_decay=0.01, adam_beta1=0.9, adam_beta2=0.95, adam_epsilon=1e-08, max_grad_norm=1.0, num_train_epochs=3.0, max_steps=-1, lr_scheduler_type=<SchedulerType.COSINE: 'cosine'>, lr_scheduler_kwargs=None, warmup_ratio=0.1, warmup_steps=0, log_level='passive', log_level_replica='warning', log_on_each_node=True, logging_dir='/root/projects/ms-swift-main/20250822/output/domain_classifier_sft/v1-20250822-090124/runs', logging_strategy=<IntervalStrategy.STEPS: 'steps'>, logging_first_step=True, logging_steps=10, logging_nan_inf_filter=True, save_strategy=<SaveStrategy.STEPS: 'steps'>, save_steps=100, save_total_limit=None, save_safetensors=True, save_on_each_node=False, save_only_model=False, restore_callback_states_from_checkpoint=False, no_cuda=False, use_cpu=False, use_mps_device=False, seed=42, data_seed=42, jit_mode_eval=False, use_ipex=False, bf16=True, fp16=False, fp16_opt_level='O1', half_precision_backend='auto', bf16_full_eval=False, fp16_full_eval=False, tf32=None, local_rank=0, ddp_backend=None, tpu_num_cores=None, tpu_metrics_debug=False, debug=[], dataloader_drop_last=False, eval_steps=100, dataloader_num_workers=1, dataloader_prefetch_factor=10, past_index=-1, run_name='/root/projects/ms-swift-main/20250822/output/domain_classifier_sft/v1-20250822-090124', disable_tqdm=False, remove_unused_columns=False, label_names=None, load_best_model_at_end=True, metric_for_best_model='eval_loss', greater_is_better=False, ignore_data_skip=False, fsdp=[], fsdp_min_num_params=0, fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, fsdp_transformer_layer_cls_to_wrap=None, accelerator_config=AcceleratorConfig(split_batches=False, dispatch_batches=False, even_batches=True, use_seedable_sampler=True, non_blocking=False, gradient_accumulation_kwargs=None, use_configured_state=False), deepspeed=None, label_smoothing_factor=0.0, optim=<OptimizerNames.ADAMW_TORCH: 'adamw_torch'>, optim_args=None, adafactor=False, group_by_length=False, length_column_name='length', report_to=['tensorboard'], ddp_find_unused_parameters=None, ddp_bucket_cap_mb=None, ddp_broadcast_buffers=None, dataloader_pin_memory=True, dataloader_persistent_workers=False, skip_memory_metrics=True, use_legacy_prediction_loop=False, push_to_hub=False, resume_from_checkpoint=None, hub_model_id=None, hub_strategy=<HubStrategy.EVERY_SAVE: 'every_save'>, hub_token=None, hub_private_repo=None, hub_always_push=False, hub_revision=None, gradient_checkpointing=True, gradient_checkpointing_kwargs=None, include_inputs_for_metrics=False, include_for_metrics=[], eval_do_concat_batches=True, fp16_backend='auto', push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=None, mp_parameters='', auto_find_batch_size=False, full_determinism=False, torchdynamo=None, ray_scope='last', ddp_timeout=18000000, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, include_tokens_per_second=None, include_num_input_tokens_seen=None, neftune_noise_alpha=None, optim_target_modules=None, batch_eval_metrics=False, eval_on_start=False, use_liger_kernel=False, liger_kernel_config=None, eval_use_gather_object=False, average_tokens_across_devices=None, sortish_sampler=False, predict_with_generate=False, generation_max_length=None, generation_num_beams=None, generation_config=None, tuner_backend='peft', vit_gradient_checkpointing=True, router_aux_loss_coef=0.0, check_model=True, acc_strategy='token', train_dataloader_shuffle=True, max_epochs=None, aligner_lr=None, vit_lr=None, optimizer=None, use_logits_to_keep=None, channels=None, ds3_gather_for_generation=True, resume_only_model=False, eval_use_evalscope=False, eval_dataset=[], eval_dataset_args=None, eval_limit=None, eval_generation_config=None, sft_alpha=0, train_type='lora', local_repo_path=None, galore_config=None)"
+}
--- a/chat_template.jinja
+++ b/chat_template.jinja
@@ -0,0 +1,86 @@
+{%- if tools %}
+    {{- '<|im_start|>system\n' }}
+    {%- if messages[0].role == 'system' %}
+        {{- messages[0].content + '\n\n' }}
+    {%- endif %}
+    {{- "# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
+    {%- for tool in tools %}
+        {{- "\n" }}
+        {{- tool | tojson }}
+    {%- endfor %}
+    {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
+{%- else %}
+    {%- if messages[0].role == 'system' %}
+        {{- '<|im_start|>system\n' + messages[0].content + '<|im_end|>\n' }}
+    {%- endif %}
+{%- endif %}
+{%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}
+{%- for message in messages[::-1] %}
+    {%- set index = (messages|length - 1) - loop.index0 %}
+    {%- if ns.multi_step_tool and message.role == "user" and message.content is string and not(message.content.startswith('<tool_response>') and message.content.endswith('</tool_response>')) %}
+        {%- set ns.multi_step_tool = false %}
+        {%- set ns.last_query_index = index %}
+    {%- endif %}
+{%- endfor %}
+{%- for message in messages %}
+    {%- if message.content is string %}
+        {%- set content = message.content %}
+    {%- else %}
+        {%- set content = '' %}
+    {%- endif %}
+    {%- if (message.role == "user") or (message.role == "system" and not loop.first) %}
+        {{- '<|im_start|>' + message.role + '\n' + content + '<|im_end|>' + '\n' }}
+    {%- elif message.role == "assistant" %}
+        {%- set reasoning_content = '' %}
+        {%- if message.reasoning_content is string %}
+            {%- set reasoning_content = message.reasoning_content %}
+        {%- else %}
+            {%- if '</think>' in content %}
+                {%- set reasoning_content = content.split('</think>')[0].rstrip('\n').split('<think>')[-1].lstrip('\n') %}
+                {%- set content = content.split('</think>')[-1].lstrip('\n') %}
+            {%- endif %}
+        {%- endif %}
+        {%- if loop.index0 > ns.last_query_index %}
+            {%- if loop.last or (not loop.last and reasoning_content) %}
+                {{- '<|im_start|>' + message.role + '\n<think>\n' + reasoning_content.strip('\n') + '\n</think>\n\n' + content.lstrip('\n') }}
+            {%- else %}
+                {{- '<|im_start|>' + message.role + '\n' + content }}
+            {%- endif %}
+        {%- else %}
+            {{- '<|im_start|>' + message.role + '\n' + content }}
+        {%- endif %}
+        {%- if message.tool_calls %}
+            {%- for tool_call in message.tool_calls %}
+                {%- if (loop.first and content) or (not loop.first) %}
+                    {{- '\n' }}
+                {%- endif %}
+                {%- if tool_call.function %}
+                    {%- set tool_call = tool_call.function %}
+                {%- endif %}
+                {{- '<tool_call>\n{"name": "' }}
+                {{- tool_call.name }}
+                {{- '", "arguments": ' }}
+                {%- if tool_call.arguments is string %}
+                    {{- tool_call.arguments }}
+                {%- else %}
+                    {{- tool_call.arguments | tojson }}
+                {%- endif %}
+                {{- '}\n</tool_call>' }}
+            {%- endfor %}
+        {%- endif %}
+        {{- '<|im_end|>\n' }}
+    {%- elif message.role == "tool" %}
+        {%- if loop.first or (messages[loop.index0 - 1].role != "tool") %}
+            {{- '<|im_start|>user' }}
+        {%- endif %}
+        {{- '\n<tool_response>\n' }}
+        {{- content }}
+        {{- '\n</tool_response>' }}
+        {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
+            {{- '<|im_end|>\n' }}
+        {%- endif %}
+    {%- endif %}
+{%- endfor %}
+{%- if add_generation_prompt %}
+    {{- '<|im_start|>assistant\n' }}
+{%- endif %}
--- a/completions.jsonl
+++ b/completions.jsonl
--- a/config.json
+++ b/config.json
@@ -0,0 +1,69 @@
+{
+  "architectures": [
+    "Qwen3ForCausalLM"
+  ],
+  "attention_bias": false,
+  "attention_dropout": 0.0,
+  "bos_token_id": 151643,
+  "eos_token_id": 151645,
+  "head_dim": 128,
+  "hidden_act": "silu",
+  "hidden_size": 2560,
+  "initializer_range": 0.02,
+  "intermediate_size": 9728,
+  "layer_types": [
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention"
+  ],
+  "max_position_embeddings": 262144,
+  "max_window_layers": 36,
+  "model_type": "qwen3",
+  "num_attention_heads": 32,
+  "num_hidden_layers": 36,
+  "num_key_value_heads": 8,
+  "pad_token_id": 151643,
+  "rms_norm_eps": 1e-06,
+  "rope_scaling": null,
+  "rope_theta": 5000000,
+  "sliding_window": null,
+  "tie_word_embeddings": true,
+  "torch_dtype": "bfloat16",
+  "transformers_version": "4.55.2",
+  "use_cache": true,
+  "use_sliding_window": false,
+  "vocab_size": 151936
+}
--- a/configuration.json
+++ b/configuration.json
@@ -0,0 +1 @@
+{"framework": "pytorch", "task": "text-generation", "allow_remote": true}
--- a/generation_config.json
+++ b/generation_config.json
@@ -0,0 +1,13 @@
+{
+  "bos_token_id": 151643,
+  "do_sample": true,
+  "eos_token_id": [
+    151645,
+    151643
+  ],
+  "pad_token_id": 151643,
+  "temperature": 0.7,
+  "top_k": 20,
+  "top_p": 0.8,
+  "transformers_version": "4.55.2"
+}
--- a/logging.jsonl
+++ b/logging.jsonl
@@ -0,0 +1,614 @@
+{"loss": 8e-08, "grad_norm": 0.92945963, "learning_rate": 0.0001, "memory(GiB)": 56.12, "train_speed(iter/s)": 0.027919, "completions/mean_length": 171.91667175, "completions/min_length": 141.0, "completions/max_length": 220.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.91666669, "rewards/ClassificationReward/std": 0.28867513, "reward": 0.91666669, "reward_std": 0.28867513, "frac_reward_zero_std": 0.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.0002053, "global_step/max_steps": "1/9742", "percentage": "0.01%", "elapsed_time": "4s", "remaining_time": "13h 1m 3s"}
+{"loss": 5e-08, "grad_norm": 0.0, "learning_rate": 0.0001, "memory(GiB)": 71.83, "train_speed(iter/s)": 0.091194, "completions/mean_length": 102.04166985, "completions/min_length": 43.5, "completions/max_length": 242.0, "completions/clipped_ratio": 0.02083333, "rewards/ClassificationReward/mean": 0.83333333, "rewards/ClassificationReward/std": 0.32344588, "reward": 0.83333334, "reward_std": 0.32344587, "frac_reward_zero_std": 0.25, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.00102648, "global_step/max_steps": "5/9742", "percentage": "0.05%", "elapsed_time": "23s", "remaining_time": "12h 53m 8s"}
+{"loss": 6e-08, "grad_norm": 1.16475332, "learning_rate": 0.0001, "memory(GiB)": 71.83, "train_speed(iter/s)": 0.133384, "completions/mean_length": 87.71667023, "completions/min_length": 35.0, "completions/max_length": 194.2, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.93333335, "rewards/ClassificationReward/std": 0.2309401, "reward": 0.93333335, "reward_std": 0.2309401, "frac_reward_zero_std": 0.2, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.00205297, "global_step/max_steps": "10/9742", "percentage": "0.10%", "elapsed_time": "43s", "remaining_time": "11h 53m 5s"}
+{"loss": 7e-08, "grad_norm": 1.98231626, "learning_rate": 0.0001, "memory(GiB)": 71.83, "train_speed(iter/s)": 0.146773, "completions/mean_length": 89.68333435, "completions/min_length": 32.0, "completions/max_length": 205.4, "completions/clipped_ratio": 0.01666667, "rewards/ClassificationReward/mean": 0.91666667, "rewards/ClassificationReward/std": 0.25105497, "reward": 0.91666669, "reward_std": 0.25105497, "frac_reward_zero_std": 0.2, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.00307945, "global_step/max_steps": "15/9742", "percentage": "0.15%", "elapsed_time": "1m 11s", "remaining_time": "12h 49m 24s"}
+{"loss": 3e-08, "grad_norm": 1.37325227, "learning_rate": 0.0001, "memory(GiB)": 71.83, "train_speed(iter/s)": 0.172319, "completions/mean_length": 58.31666718, "completions/min_length": 25.4, "completions/max_length": 116.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.60000001, "rewards/ClassificationReward/std": 0.19331995, "reward": 0.60000001, "reward_std": 0.19331994, "frac_reward_zero_std": 0.4, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.00410593, "global_step/max_steps": "20/9742", "percentage": "0.21%", "elapsed_time": "1m 25s", "remaining_time": "11h 29m 6s"}
+{"loss": 3e-08, "grad_norm": 1.89569712, "learning_rate": 0.0001, "memory(GiB)": 71.83, "train_speed(iter/s)": 0.200692, "completions/mean_length": 29.05000076, "completions/min_length": 20.4, "completions/max_length": 54.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.65, "rewards/ClassificationReward/std": 0.43208097, "reward": 0.65000001, "reward_std": 0.43208097, "frac_reward_zero_std": 0.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.00513242, "global_step/max_steps": "25/9742", "percentage": "0.26%", "elapsed_time": "1m 33s", "remaining_time": "10h 6m 6s"}
+{"loss": 4e-08, "grad_norm": 4.35208368, "learning_rate": 0.0001, "memory(GiB)": 71.83, "train_speed(iter/s)": 0.228762, "completions/mean_length": 23.65000076, "completions/min_length": 18.0, "completions/max_length": 34.2, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.58333334, "rewards/ClassificationReward/std": 0.41496493, "reward": 0.58333335, "reward_std": 0.41496492, "frac_reward_zero_std": 0.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.0061589, "global_step/max_steps": "30/9742", "percentage": "0.31%", "elapsed_time": "1m 40s", "remaining_time": "9h 0m 16s"}
+{"loss": 2e-08, "grad_norm": 2.38611293, "learning_rate": 0.0001, "memory(GiB)": 71.83, "train_speed(iter/s)": 0.250925, "completions/mean_length": 19.55000076, "completions/min_length": 15.0, "completions/max_length": 28.4, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.65000001, "rewards/ClassificationReward/std": 0.15620822, "reward": 0.65000001, "reward_std": 0.15620822, "frac_reward_zero_std": 0.6, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.00718538, "global_step/max_steps": "35/9742", "percentage": "0.36%", "elapsed_time": "1m 48s", "remaining_time": "8h 21m 25s"}
+{"loss": 3e-08, "grad_norm": 0.0, "learning_rate": 0.0001, "memory(GiB)": 71.83, "train_speed(iter/s)": 0.271165, "completions/mean_length": 21.50000095, "completions/min_length": 16.0, "completions/max_length": 37.2, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.85000001, "rewards/ClassificationReward/std": 0.21845579, "reward": 0.85000001, "reward_std": 0.21845578, "frac_reward_zero_std": 0.4, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.00821187, "global_step/max_steps": "40/9742", "percentage": "0.41%", "elapsed_time": "1m 56s", "remaining_time": "7h 50m 58s"}
+{"loss": 5e-08, "grad_norm": 0.0, "learning_rate": 9.999e-05, "memory(GiB)": 71.83, "train_speed(iter/s)": 0.285747, "completions/mean_length": 29.28333435, "completions/min_length": 19.2, "completions/max_length": 46.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.9, "rewards/ClassificationReward/std": 0.22603832, "reward": 0.90000001, "reward_std": 0.22603832, "frac_reward_zero_std": 0.4, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.00923835, "global_step/max_steps": "45/9742", "percentage": "0.46%", "elapsed_time": "2m 6s", "remaining_time": "7h 34m 13s"}
+{"loss": 0.0, "grad_norm": 3.11884236, "learning_rate": 9.999e-05, "memory(GiB)": 71.83, "train_speed(iter/s)": 0.295412, "completions/mean_length": 30.80000153, "completions/min_length": 20.0, "completions/max_length": 48.2, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.9, "rewards/ClassificationReward/std": 0.10444659, "reward": 0.9, "reward_std": 0.10444659, "frac_reward_zero_std": 0.8, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.01026483, "global_step/max_steps": "50/9742", "percentage": "0.51%", "elapsed_time": "2m 18s", "remaining_time": "7h 26m 38s"}
+{"loss": 3e-08, "grad_norm": 0.0, "learning_rate": 9.999e-05, "memory(GiB)": 71.83, "train_speed(iter/s)": 0.299279, "completions/mean_length": 33.91666718, "completions/min_length": 23.4, "completions/max_length": 45.6, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.78333333, "rewards/ClassificationReward/std": 0.13558492, "reward": 0.78333334, "reward_std": 0.13558491, "frac_reward_zero_std": 0.6, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.01129132, "global_step/max_steps": "55/9742", "percentage": "0.56%", "elapsed_time": "2m 32s", "remaining_time": "7h 28m 26s"}
+{"loss": 1e-08, "grad_norm": 0.0, "learning_rate": 9.999e-05, "memory(GiB)": 71.83, "train_speed(iter/s)": 0.297044, "completions/mean_length": 70.08333549, "completions/min_length": 40.4, "completions/max_length": 164.8, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.81666666, "rewards/ClassificationReward/std": 0.1683033, "reward": 0.81666667, "reward_std": 0.16830329, "frac_reward_zero_std": 0.6, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.0123178, "global_step/max_steps": "60/9742", "percentage": "0.62%", "elapsed_time": "2m 50s", "remaining_time": "7h 39m 51s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.999e-05, "memory(GiB)": 71.83, "train_speed(iter/s)": 0.297892, "completions/mean_length": 33.16666718, "completions/min_length": 25.0, "completions/max_length": 47.8, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.01334428, "global_step/max_steps": "65/9742", "percentage": "0.67%", "elapsed_time": "3m 7s", "remaining_time": "7h 44m 28s"}
+{"loss": 2e-08, "grad_norm": 2.71569967, "learning_rate": 9.999e-05, "memory(GiB)": 71.83, "train_speed(iter/s)": 0.304646, "completions/mean_length": 34.53333511, "completions/min_length": 22.2, "completions/max_length": 70.6, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.63333334, "rewards/ClassificationReward/std": 0.3089092, "reward": 0.63333334, "reward_std": 0.30890918, "frac_reward_zero_std": 0.2, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.01437077, "global_step/max_steps": "70/9742", "percentage": "0.72%", "elapsed_time": "3m 18s", "remaining_time": "7h 37m 44s"}
+{"loss": 4e-08, "grad_norm": 2.58471942, "learning_rate": 9.999e-05, "memory(GiB)": 71.83, "train_speed(iter/s)": 0.313111, "completions/mean_length": 32.03333359, "completions/min_length": 21.8, "completions/max_length": 45.2, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.83333334, "rewards/ClassificationReward/std": 0.25919395, "reward": 0.83333335, "reward_std": 0.25919395, "frac_reward_zero_std": 0.4, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.01539725, "global_step/max_steps": "75/9742", "percentage": "0.77%", "elapsed_time": "3m 28s", "remaining_time": "7h 27m 57s"}
+{"loss": 1e-08, "grad_norm": 0.0, "learning_rate": 9.998e-05, "memory(GiB)": 71.83, "train_speed(iter/s)": 0.3188, "completions/mean_length": 30.33333397, "completions/min_length": 21.8, "completions/max_length": 42.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.93333334, "rewards/ClassificationReward/std": 0.0984732, "reward": 0.93333334, "reward_std": 0.0984732, "frac_reward_zero_std": 0.8, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.01642373, "global_step/max_steps": "80/9742", "percentage": "0.82%", "elapsed_time": "3m 39s", "remaining_time": "7h 22m 42s"}
+{"loss": 3e-08, "grad_norm": 0.0, "learning_rate": 9.998e-05, "memory(GiB)": 71.83, "train_speed(iter/s)": 0.324292, "completions/mean_length": 30.86666794, "completions/min_length": 21.0, "completions/max_length": 48.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.96666667, "rewards/ClassificationReward/std": 0.11547005, "reward": 0.96666667, "reward_std": 0.11547005, "frac_reward_zero_std": 0.6, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.01745022, "global_step/max_steps": "85/9742", "percentage": "0.87%", "elapsed_time": "3m 51s", "remaining_time": "7h 17m 36s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.998e-05, "memory(GiB)": 71.83, "train_speed(iter/s)": 0.327938, "completions/mean_length": 36.93333397, "completions/min_length": 28.4, "completions/max_length": 49.4, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.0184767, "global_step/max_steps": "90/9742", "percentage": "0.92%", "elapsed_time": "4m 3s", "remaining_time": "7h 15m 7s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.998e-05, "memory(GiB)": 71.83, "train_speed(iter/s)": 0.330446, "completions/mean_length": 36.03333397, "completions/min_length": 28.4, "completions/max_length": 56.8, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.45, "rewards/ClassificationReward/std": 0.0904534, "reward": 0.45, "reward_std": 0.0904534, "frac_reward_zero_std": 0.8, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.01950318, "global_step/max_steps": "95/9742", "percentage": "0.98%", "elapsed_time": "4m 16s", "remaining_time": "7h 14m 5s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.997e-05, "memory(GiB)": 71.83, "train_speed(iter/s)": 0.333037, "completions/mean_length": 33.26666794, "completions/min_length": 26.8, "completions/max_length": 46.6, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.02052967, "global_step/max_steps": "100/9742", "percentage": "1.03%", "elapsed_time": "4m 29s", "remaining_time": "7h 12m 42s"}
+{"loss": 1e-08, "grad_norm": 0.0, "learning_rate": 9.997e-05, "memory(GiB)": 71.83, "train_speed(iter/s)": 0.33598, "completions/mean_length": 38.95000153, "completions/min_length": 27.6, "completions/max_length": 52.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.93333334, "rewards/ClassificationReward/std": 0.0984732, "reward": 0.93333334, "reward_std": 0.0984732, "frac_reward_zero_std": 0.8, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.02155615, "global_step/max_steps": "105/9742", "percentage": "1.08%", "elapsed_time": "4m 41s", "remaining_time": "7h 10m 37s"}
+{"loss": 2e-08, "grad_norm": 0.0, "learning_rate": 9.997e-05, "memory(GiB)": 71.83, "train_speed(iter/s)": 0.340129, "completions/mean_length": 30.71666794, "completions/min_length": 25.4, "completions/max_length": 41.6, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.98333334, "rewards/ClassificationReward/std": 0.05773503, "reward": 0.98333334, "reward_std": 0.05773503, "frac_reward_zero_std": 0.8, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.02258263, "global_step/max_steps": "110/9742", "percentage": "1.13%", "elapsed_time": "4m 52s", "remaining_time": "7h 6m 43s"}
+{"loss": 5e-08, "grad_norm": 0.0, "learning_rate": 9.997e-05, "memory(GiB)": 71.83, "train_speed(iter/s)": 0.344857, "completions/mean_length": 49.58333435, "completions/min_length": 38.2, "completions/max_length": 70.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.58333333, "rewards/ClassificationReward/std": 0.28528221, "reward": 0.58333335, "reward_std": 0.28528221, "frac_reward_zero_std": 0.4, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.02360912, "global_step/max_steps": "115/9742", "percentage": "1.18%", "elapsed_time": "5m 2s", "remaining_time": "7h 2m 0s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.996e-05, "memory(GiB)": 71.83, "train_speed(iter/s)": 0.35115, "completions/mean_length": 18.26666679, "completions/min_length": 16.0, "completions/max_length": 23.4, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.7, "rewards/ClassificationReward/std": 0.10444659, "reward": 0.7, "reward_std": 0.10444659, "frac_reward_zero_std": 0.8, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.0246356, "global_step/max_steps": "120/9742", "percentage": "1.23%", "elapsed_time": "5m 10s", "remaining_time": "6h 55m 15s"}
+{"loss": -0.0, "grad_norm": 0.0, "learning_rate": 9.996e-05, "memory(GiB)": 71.83, "train_speed(iter/s)": 0.358416, "completions/mean_length": 8.2, "completions/min_length": 8.0, "completions/max_length": 10.4, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.85, "rewards/ClassificationReward/std": 0.0904534, "reward": 0.85, "reward_std": 0.0904534, "frac_reward_zero_std": 0.8, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.02566208, "global_step/max_steps": "125/9742", "percentage": "1.28%", "elapsed_time": "5m 17s", "remaining_time": "6h 47m 26s"}
+{"loss": 5e-08, "grad_norm": 3.6789124, "learning_rate": 9.996e-05, "memory(GiB)": 71.83, "train_speed(iter/s)": 0.367281, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.95000001, "rewards/ClassificationReward/std": 0.17320508, "reward": 0.95000001, "reward_std": 0.17320508, "frac_reward_zero_std": 0.4, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.02668856, "global_step/max_steps": "130/9742", "percentage": "1.33%", "elapsed_time": "5m 22s", "remaining_time": "6h 37m 58s"}
+{"loss": 2e-08, "grad_norm": 0.0, "learning_rate": 9.995e-05, "memory(GiB)": 71.83, "train_speed(iter/s)": 0.375349, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.98333334, "rewards/ClassificationReward/std": 0.05773503, "reward": 0.98333334, "reward_std": 0.05773503, "frac_reward_zero_std": 0.8, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.02771505, "global_step/max_steps": "135/9742", "percentage": "1.39%", "elapsed_time": "5m 28s", "remaining_time": "6h 29m 48s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.995e-05, "memory(GiB)": 71.83, "train_speed(iter/s)": 0.382963, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.02874153, "global_step/max_steps": "140/9742", "percentage": "1.44%", "elapsed_time": "5m 34s", "remaining_time": "6h 22m 26s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.995e-05, "memory(GiB)": 71.83, "train_speed(iter/s)": 0.390391, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.02976801, "global_step/max_steps": "145/9742", "percentage": "1.49%", "elapsed_time": "5m 40s", "remaining_time": "6h 15m 30s"}
+{"loss": 4e-08, "grad_norm": 3.29745007, "learning_rate": 9.994e-05, "memory(GiB)": 71.83, "train_speed(iter/s)": 0.397522, "completions/mean_length": 7.9833334, "completions/min_length": 7.8, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.95, "rewards/ClassificationReward/std": 0.13558492, "reward": 0.95000001, "reward_std": 0.13558491, "frac_reward_zero_std": 0.6, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.0307945, "global_step/max_steps": "150/9742", "percentage": "1.54%", "elapsed_time": "5m 46s", "remaining_time": "6h 9m 6s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.994e-05, "memory(GiB)": 71.83, "train_speed(iter/s)": 0.40465, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.03182098, "global_step/max_steps": "155/9742", "percentage": "1.59%", "elapsed_time": "5m 52s", "remaining_time": "6h 2m 54s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.993e-05, "memory(GiB)": 71.83, "train_speed(iter/s)": 0.411446, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.03284746, "global_step/max_steps": "160/9742", "percentage": "1.64%", "elapsed_time": "5m 57s", "remaining_time": "5h 57m 11s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.993e-05, "memory(GiB)": 71.83, "train_speed(iter/s)": 0.418039, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.03387395, "global_step/max_steps": "165/9742", "percentage": "1.69%", "elapsed_time": "6m 3s", "remaining_time": "5h 51m 49s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.992e-05, "memory(GiB)": 71.83, "train_speed(iter/s)": 0.424355, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.03490043, "global_step/max_steps": "170/9742", "percentage": "1.75%", "elapsed_time": "6m 9s", "remaining_time": "5h 46m 50s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.992e-05, "memory(GiB)": 71.83, "train_speed(iter/s)": 0.430652, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.03592691, "global_step/max_steps": "175/9742", "percentage": "1.80%", "elapsed_time": "6m 15s", "remaining_time": "5h 42m 0s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.992e-05, "memory(GiB)": 71.83, "train_speed(iter/s)": 0.436355, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.0369534, "global_step/max_steps": "180/9742", "percentage": "1.85%", "elapsed_time": "6m 21s", "remaining_time": "5h 37m 46s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.991e-05, "memory(GiB)": 71.83, "train_speed(iter/s)": 0.442142, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.03797988, "global_step/max_steps": "185/9742", "percentage": "1.90%", "elapsed_time": "6m 27s", "remaining_time": "5h 33m 33s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.991e-05, "memory(GiB)": 71.83, "train_speed(iter/s)": 0.44786, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.03900636, "global_step/max_steps": "190/9742", "percentage": "1.95%", "elapsed_time": "6m 33s", "remaining_time": "5h 29m 29s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.99e-05, "memory(GiB)": 71.83, "train_speed(iter/s)": 0.453423, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.04003285, "global_step/max_steps": "195/9742", "percentage": "2.00%", "elapsed_time": "6m 39s", "remaining_time": "5h 25m 37s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.99e-05, "memory(GiB)": 71.83, "train_speed(iter/s)": 0.458915, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.04105933, "global_step/max_steps": "200/9742", "percentage": "2.05%", "elapsed_time": "6m 44s", "remaining_time": "5h 21m 53s"}
+{"loss": 2e-08, "grad_norm": 17.44800758, "learning_rate": 9.989e-05, "memory(GiB)": 71.83, "train_speed(iter/s)": 0.464069, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.98333334, "rewards/ClassificationReward/std": 0.05773503, "reward": 0.98333334, "reward_std": 0.05773503, "frac_reward_zero_std": 0.8, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.04208581, "global_step/max_steps": "205/9742", "percentage": "2.10%", "elapsed_time": "6m 50s", "remaining_time": "5h 18m 28s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.989e-05, "memory(GiB)": 71.83, "train_speed(iter/s)": 0.469317, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.63333333, "rewards/ClassificationReward/std": 0.07784989, "reward": 0.63333333, "reward_std": 0.07784989, "frac_reward_zero_std": 0.8, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.0431123, "global_step/max_steps": "210/9742", "percentage": "2.16%", "elapsed_time": "6m 56s", "remaining_time": "5h 15m 2s"}
+{"loss": 2e-08, "grad_norm": 0.0, "learning_rate": 9.988e-05, "memory(GiB)": 71.83, "train_speed(iter/s)": 0.474484, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.78333334, "rewards/ClassificationReward/std": 0.05773503, "reward": 0.78333334, "reward_std": 0.05773503, "frac_reward_zero_std": 0.8, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.04413878, "global_step/max_steps": "215/9742", "percentage": "2.21%", "elapsed_time": "7m 2s", "remaining_time": "5h 11m 44s"}
+{"loss": 2e-08, "grad_norm": 0.0, "learning_rate": 9.987e-05, "memory(GiB)": 71.83, "train_speed(iter/s)": 0.479516, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.98333334, "rewards/ClassificationReward/std": 0.05773503, "reward": 0.98333334, "reward_std": 0.05773503, "frac_reward_zero_std": 0.8, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.04516526, "global_step/max_steps": "220/9742", "percentage": "2.26%", "elapsed_time": "7m 7s", "remaining_time": "5h 8m 35s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.987e-05, "memory(GiB)": 71.83, "train_speed(iter/s)": 0.484232, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.04619175, "global_step/max_steps": "225/9742", "percentage": "2.31%", "elapsed_time": "7m 13s", "remaining_time": "5h 5m 42s"}
+{"loss": 2e-08, "grad_norm": 0.0, "learning_rate": 9.986e-05, "memory(GiB)": 71.83, "train_speed(iter/s)": 0.48886, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.96666666, "rewards/ClassificationReward/std": 0.07784989, "reward": 0.96666667, "reward_std": 0.07784989, "frac_reward_zero_std": 0.8, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.04721823, "global_step/max_steps": "230/9742", "percentage": "2.36%", "elapsed_time": "7m 19s", "remaining_time": "5h 2m 55s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.986e-05, "memory(GiB)": 71.83, "train_speed(iter/s)": 0.493443, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.95, "rewards/ClassificationReward/std": 0.0904534, "reward": 0.95, "reward_std": 0.0904534, "frac_reward_zero_std": 0.8, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.04824471, "global_step/max_steps": "235/9742", "percentage": "2.41%", "elapsed_time": "7m 25s", "remaining_time": "5h 0m 12s"}
+{"loss": 3e-08, "grad_norm": 0.0, "learning_rate": 9.985e-05, "memory(GiB)": 71.83, "train_speed(iter/s)": 0.498105, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.96666666, "rewards/ClassificationReward/std": 0.07784989, "reward": 0.96666667, "reward_std": 0.07784989, "frac_reward_zero_std": 0.8, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.0492712, "global_step/max_steps": "240/9742", "percentage": "2.46%", "elapsed_time": "7m 30s", "remaining_time": "4h 57m 28s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.984e-05, "memory(GiB)": 71.83, "train_speed(iter/s)": 0.502407, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.05029768, "global_step/max_steps": "245/9742", "percentage": "2.51%", "elapsed_time": "7m 36s", "remaining_time": "4h 55m 1s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.984e-05, "memory(GiB)": 71.83, "train_speed(iter/s)": 0.506616, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.4, "rewards/ClassificationReward/std": 0.0, "reward": 0.4, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.05132416, "global_step/max_steps": "250/9742", "percentage": "2.57%", "elapsed_time": "7m 42s", "remaining_time": "4h 52m 38s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.983e-05, "memory(GiB)": 71.83, "train_speed(iter/s)": 0.510647, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.05235065, "global_step/max_steps": "255/9742", "percentage": "2.62%", "elapsed_time": "7m 48s", "remaining_time": "4h 50m 24s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.982e-05, "memory(GiB)": 71.83, "train_speed(iter/s)": 0.502643, "completions/mean_length": 24.95000114, "completions/min_length": 8.0, "completions/max_length": 211.4, "completions/clipped_ratio": 0.03333333, "rewards/ClassificationReward/mean": 0.6, "rewards/ClassificationReward/std": 0.0, "reward": 0.6, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.05337713, "global_step/max_steps": "260/9742", "percentage": "2.67%", "elapsed_time": "8m 6s", "remaining_time": "4h 55m 33s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.982e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.475277, "completions/mean_length": 75.35000458, "completions/min_length": 8.0, "completions/max_length": 210.2, "completions/clipped_ratio": 0.13333333, "rewards/ClassificationReward/mean": 0.6, "rewards/ClassificationReward/std": 0.0, "reward": 0.6, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.05440361, "global_step/max_steps": "265/9742", "percentage": "2.72%", "elapsed_time": "8m 46s", "remaining_time": "5h 13m 51s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.981e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.439387, "completions/mean_length": 95.2333374, "completions/min_length": 8.0, "completions/max_length": 311.2, "completions/clipped_ratio": 0.16666667, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.0554301, "global_step/max_steps": "270/9742", "percentage": "2.77%", "elapsed_time": "9m 43s", "remaining_time": "5h 41m 9s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.98e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.439313, "completions/mean_length": 25.65, "completions/min_length": 8.0, "completions/max_length": 109.0, "completions/clipped_ratio": 0.03333333, "rewards/ClassificationReward/mean": 0.68333333, "rewards/ClassificationReward/std": 0.10298574, "reward": 0.68333334, "reward_std": 0.10298573, "frac_reward_zero_std": 0.8, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.05645658, "global_step/max_steps": "275/9742", "percentage": "2.82%", "elapsed_time": "9m 54s", "remaining_time": "5h 41m 22s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.98e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.44257, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.6, "rewards/ClassificationReward/std": 0.0, "reward": 0.6, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.05748306, "global_step/max_steps": "280/9742", "percentage": "2.87%", "elapsed_time": "10m 1s", "remaining_time": "5h 38m 51s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.979e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.446355, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.05850955, "global_step/max_steps": "285/9742", "percentage": "2.93%", "elapsed_time": "10m 7s", "remaining_time": "5h 35m 58s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.978e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.450075, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.05953603, "global_step/max_steps": "290/9742", "percentage": "2.98%", "elapsed_time": "10m 13s", "remaining_time": "5h 33m 10s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.977e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.453681, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.06056251, "global_step/max_steps": "295/9742", "percentage": "3.03%", "elapsed_time": "10m 19s", "remaining_time": "5h 30m 30s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.977e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.457272, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.061589, "global_step/max_steps": "300/9742", "percentage": "3.08%", "elapsed_time": "10m 25s", "remaining_time": "5h 27m 52s"}
+{"loss": 2e-08, "grad_norm": 0.0, "learning_rate": 9.976e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.460896, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.98333334, "rewards/ClassificationReward/std": 0.05773503, "reward": 0.98333334, "reward_std": 0.05773503, "frac_reward_zero_std": 0.8, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.06261548, "global_step/max_steps": "305/9742", "percentage": "3.13%", "elapsed_time": "10m 30s", "remaining_time": "5h 25m 15s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.975e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.464171, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.06364196, "global_step/max_steps": "310/9742", "percentage": "3.18%", "elapsed_time": "10m 36s", "remaining_time": "5h 22m 56s"}
+{"loss": 1e-08, "grad_norm": 0.0, "learning_rate": 9.974e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.467722, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.86666667, "rewards/ClassificationReward/std": 0.09847319, "reward": 0.86666667, "reward_std": 0.0984732, "frac_reward_zero_std": 0.8, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.06466845, "global_step/max_steps": "315/9742", "percentage": "3.23%", "elapsed_time": "10m 42s", "remaining_time": "5h 20m 27s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.973e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.471052, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.06569493, "global_step/max_steps": "320/9742", "percentage": "3.28%", "elapsed_time": "10m 48s", "remaining_time": "5h 18m 9s"}
+{"loss": 3e-08, "grad_norm": 28.09523964, "learning_rate": 9.973e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.474473, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.76666666, "rewards/ClassificationReward/std": 0.07784989, "reward": 0.76666667, "reward_std": 0.07784989, "frac_reward_zero_std": 0.8, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.06672141, "global_step/max_steps": "325/9742", "percentage": "3.34%", "elapsed_time": "10m 53s", "remaining_time": "5h 15m 48s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.972e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.477725, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.0677479, "global_step/max_steps": "330/9742", "percentage": "3.39%", "elapsed_time": "10m 59s", "remaining_time": "5h 13m 37s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.971e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.480733, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.06877438, "global_step/max_steps": "335/9742", "percentage": "3.44%", "elapsed_time": "11m 5s", "remaining_time": "5h 11m 37s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.97e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.483839, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.06980086, "global_step/max_steps": "340/9742", "percentage": "3.49%", "elapsed_time": "11m 11s", "remaining_time": "5h 9m 34s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.969e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.486909, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.6, "rewards/ClassificationReward/std": 0.0, "reward": 0.6, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.07082735, "global_step/max_steps": "345/9742", "percentage": "3.54%", "elapsed_time": "11m 17s", "remaining_time": "5h 7m 34s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.968e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.48996, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.07185383, "global_step/max_steps": "350/9742", "percentage": "3.59%", "elapsed_time": "11m 23s", "remaining_time": "5h 5m 36s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.967e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.492967, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.07288031, "global_step/max_steps": "355/9742", "percentage": "3.64%", "elapsed_time": "11m 29s", "remaining_time": "5h 3m 41s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.966e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.495943, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.4, "rewards/ClassificationReward/std": 0.0, "reward": 0.4, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.0739068, "global_step/max_steps": "360/9742", "percentage": "3.70%", "elapsed_time": "11m 34s", "remaining_time": "5h 1m 49s"}
+{"loss": 2e-08, "grad_norm": 0.0, "learning_rate": 9.965e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.498751, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.38333334, "rewards/ClassificationReward/std": 0.05773503, "reward": 0.38333334, "reward_std": 0.05773503, "frac_reward_zero_std": 0.8, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.07493328, "global_step/max_steps": "365/9742", "percentage": "3.75%", "elapsed_time": "11m 40s", "remaining_time": "5h 0m 4s"}
+{"loss": 2e-08, "grad_norm": 0.0, "learning_rate": 9.964e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.501735, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.98333334, "rewards/ClassificationReward/std": 0.05773503, "reward": 0.98333334, "reward_std": 0.05773503, "frac_reward_zero_std": 0.8, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.07595976, "global_step/max_steps": "370/9742", "percentage": "3.80%", "elapsed_time": "11m 46s", "remaining_time": "4h 58m 13s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.963e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.504515, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.07698625, "global_step/max_steps": "375/9742", "percentage": "3.85%", "elapsed_time": "11m 52s", "remaining_time": "4h 56m 31s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.963e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.507268, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.07801273, "global_step/max_steps": "380/9742", "percentage": "3.90%", "elapsed_time": "11m 58s", "remaining_time": "4h 54m 51s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.962e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.509984, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.6, "rewards/ClassificationReward/std": 0.0, "reward": 0.6, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.07903921, "global_step/max_steps": "385/9742", "percentage": "3.95%", "elapsed_time": "12m 3s", "remaining_time": "4h 53m 14s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.961e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.512426, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.08006569, "global_step/max_steps": "390/9742", "percentage": "4.00%", "elapsed_time": "12m 10s", "remaining_time": "4h 51m 46s"}
+{"loss": 1e-08, "grad_norm": 0.0, "learning_rate": 9.959e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.515357, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.63333334, "rewards/ClassificationReward/std": 0.14818843, "reward": 0.63333334, "reward_std": 0.14818843, "frac_reward_zero_std": 0.6, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.08109218, "global_step/max_steps": "395/9742", "percentage": "4.05%", "elapsed_time": "12m 15s", "remaining_time": "4h 50m 3s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.958e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.517983, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.6, "rewards/ClassificationReward/std": 0.0, "reward": 0.6, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.08211866, "global_step/max_steps": "400/9742", "percentage": "4.11%", "elapsed_time": "12m 21s", "remaining_time": "4h 48m 31s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.957e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.520531, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.08314514, "global_step/max_steps": "405/9742", "percentage": "4.16%", "elapsed_time": "12m 27s", "remaining_time": "4h 47m 2s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.956e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.523024, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.08417163, "global_step/max_steps": "410/9742", "percentage": "4.21%", "elapsed_time": "12m 32s", "remaining_time": "4h 45m 36s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.955e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.525274, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.08519811, "global_step/max_steps": "415/9742", "percentage": "4.26%", "elapsed_time": "12m 39s", "remaining_time": "4h 44m 19s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.954e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.527688, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.08622459, "global_step/max_steps": "420/9742", "percentage": "4.31%", "elapsed_time": "12m 44s", "remaining_time": "4h 42m 57s"}
+{"loss": 4e-08, "grad_norm": 0.0, "learning_rate": 9.953e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.530171, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.63333333, "rewards/ClassificationReward/std": 0.17632309, "reward": 0.63333334, "reward_std": 0.17632309, "frac_reward_zero_std": 0.6, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.08725108, "global_step/max_steps": "425/9742", "percentage": "4.36%", "elapsed_time": "12m 50s", "remaining_time": "4h 41m 33s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.952e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.5325, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.6, "rewards/ClassificationReward/std": 0.0, "reward": 0.6, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.08827756, "global_step/max_steps": "430/9742", "percentage": "4.41%", "elapsed_time": "12m 56s", "remaining_time": "4h 40m 15s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.951e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.535041, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.63333333, "rewards/ClassificationReward/std": 0.07784989, "reward": 0.63333333, "reward_std": 0.07784989, "frac_reward_zero_std": 0.8, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.08930404, "global_step/max_steps": "435/9742", "percentage": "4.47%", "elapsed_time": "13m 2s", "remaining_time": "4h 38m 51s"}
+{"loss": 2e-08, "grad_norm": 0.0, "learning_rate": 9.95e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.537321, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.98333334, "rewards/ClassificationReward/std": 0.05773503, "reward": 0.98333334, "reward_std": 0.05773503, "frac_reward_zero_std": 0.8, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.09033053, "global_step/max_steps": "440/9742", "percentage": "4.52%", "elapsed_time": "13m 7s", "remaining_time": "4h 37m 36s"}
+{"loss": 2e-08, "grad_norm": 1.72811604, "learning_rate": 9.949e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.539606, "completions/mean_length": 8.16666679, "completions/min_length": 8.0, "completions/max_length": 8.4, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.76666667, "rewards/ClassificationReward/std": 0.11547005, "reward": 0.76666667, "reward_std": 0.11547005, "frac_reward_zero_std": 0.6, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.09135701, "global_step/max_steps": "445/9742", "percentage": "4.57%", "elapsed_time": "13m 13s", "remaining_time": "4h 36m 21s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.947e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.541835, "completions/mean_length": 8.2333334, "completions/min_length": 8.2, "completions/max_length": 8.6, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.09238349, "global_step/max_steps": "450/9742", "percentage": "4.62%", "elapsed_time": "13m 19s", "remaining_time": "4h 35m 8s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.946e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.543995, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.6, "rewards/ClassificationReward/std": 0.0, "reward": 0.6, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.09340998, "global_step/max_steps": "455/9742", "percentage": "4.67%", "elapsed_time": "13m 25s", "remaining_time": "4h 33m 58s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.945e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.546173, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.09443646, "global_step/max_steps": "460/9742", "percentage": "4.72%", "elapsed_time": "13m 31s", "remaining_time": "4h 32m 48s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.944e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.548341, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.09546294, "global_step/max_steps": "465/9742", "percentage": "4.77%", "elapsed_time": "13m 37s", "remaining_time": "4h 31m 39s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.943e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.550296, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.6, "rewards/ClassificationReward/std": 0.0, "reward": 0.6, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.09648943, "global_step/max_steps": "470/9742", "percentage": "4.82%", "elapsed_time": "13m 43s", "remaining_time": "4h 30m 37s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.941e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.552366, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.09751591, "global_step/max_steps": "475/9742", "percentage": "4.88%", "elapsed_time": "13m 48s", "remaining_time": "4h 29m 32s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.94e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.554417, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.09854239, "global_step/max_steps": "480/9742", "percentage": "4.93%", "elapsed_time": "13m 54s", "remaining_time": "4h 28m 27s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.939e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.556433, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.09956888, "global_step/max_steps": "485/9742", "percentage": "4.98%", "elapsed_time": "14m 0s", "remaining_time": "4h 27m 24s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.938e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.55845, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.10059536, "global_step/max_steps": "490/9742", "percentage": "5.03%", "elapsed_time": "14m 6s", "remaining_time": "4h 26m 21s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.936e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.560202, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.10162184, "global_step/max_steps": "495/9742", "percentage": "5.08%", "elapsed_time": "14m 12s", "remaining_time": "4h 25m 27s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.935e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.562125, "completions/mean_length": 8.01666679, "completions/min_length": 8.0, "completions/max_length": 8.2, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.6, "rewards/ClassificationReward/std": 0.0, "reward": 0.6, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.10264833, "global_step/max_steps": "500/9742", "percentage": "5.13%", "elapsed_time": "14m 18s", "remaining_time": "4h 24m 28s"}
+{"eval_loss": 0.0, "eval_completions/mean_length": 8.00523722, "eval_completions/min_length": 7.974122, "eval_completions/max_length": 8.07393715, "eval_completions/clipped_ratio": 0.0, "eval_rewards/ClassificationReward/mean": 0.75369686, "eval_rewards/ClassificationReward/std": 0.0035044, "eval_reward": 0.75369686, "eval_reward_std": 0.0035044, "eval_frac_reward_zero_std": 0.99075786, "eval_clip_ratio/low_mean": 0.0, "eval_clip_ratio/low_min": 0.0, "eval_clip_ratio/high_mean": 0.0, "eval_clip_ratio/high_max": 0.0, "eval_clip_ratio/region_mean": 0.0, "eval_runtime": 158.7439, "eval_samples_per_second": 3.408, "eval_steps_per_second": 0.29, "epoch": 0.10264833, "global_step/max_steps": "500/9742", "percentage": "5.13%", "elapsed_time": "16m 57s", "remaining_time": "5h 13m 22s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.934e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.478946, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.6, "rewards/ClassificationReward/std": 0.0, "reward": 0.6, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.10367481, "global_step/max_steps": "505/9742", "percentage": "5.18%", "elapsed_time": "17m 3s", "remaining_time": "5h 11m 58s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.933e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.481052, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.10470129, "global_step/max_steps": "510/9742", "percentage": "5.24%", "elapsed_time": "17m 9s", "remaining_time": "5h 10m 29s"}
+{"loss": 2e-08, "grad_norm": 0.0, "learning_rate": 9.931e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.483087, "completions/mean_length": 7.9666667, "completions/min_length": 7.6, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.51666666, "rewards/ClassificationReward/std": 0.10298573, "reward": 0.51666667, "reward_std": 0.10298573, "frac_reward_zero_std": 0.8, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.10572778, "global_step/max_steps": "515/9742", "percentage": "5.29%", "elapsed_time": "17m 15s", "remaining_time": "5h 9m 4s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.93e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.485097, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.6, "rewards/ClassificationReward/std": 0.0, "reward": 0.6, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.10675426, "global_step/max_steps": "520/9742", "percentage": "5.34%", "elapsed_time": "17m 20s", "remaining_time": "5h 7m 40s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.929e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.487133, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.10778074, "global_step/max_steps": "525/9742", "percentage": "5.39%", "elapsed_time": "17m 26s", "remaining_time": "5h 6m 16s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.927e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.489143, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.10880723, "global_step/max_steps": "530/9742", "percentage": "5.44%", "elapsed_time": "17m 32s", "remaining_time": "5h 4m 53s"}
+{"loss": 0.0, "grad_norm": 22.21973991, "learning_rate": 9.926e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.491275, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.75, "rewards/ClassificationReward/std": 0.0904534, "reward": 0.75, "reward_std": 0.0904534, "frac_reward_zero_std": 0.8, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.10983371, "global_step/max_steps": "535/9742", "percentage": "5.49%", "elapsed_time": "17m 37s", "remaining_time": "5h 3m 27s"}
+{"loss": 2e-08, "grad_norm": 6.62768173, "learning_rate": 9.924e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.493432, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.11547005, "reward": 0.8, "reward_std": 0.11547005, "frac_reward_zero_std": 0.6, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.11086019, "global_step/max_steps": "540/9742", "percentage": "5.54%", "elapsed_time": "17m 43s", "remaining_time": "5h 2m 0s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.923e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.495191, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.6, "rewards/ClassificationReward/std": 0.0, "reward": 0.6, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.11188668, "global_step/max_steps": "545/9742", "percentage": "5.59%", "elapsed_time": "17m 49s", "remaining_time": "5h 0m 49s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.922e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.497111, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.4, "rewards/ClassificationReward/std": 0.0, "reward": 0.4, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.11291316, "global_step/max_steps": "550/9742", "percentage": "5.65%", "elapsed_time": "17m 55s", "remaining_time": "4h 59m 32s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.92e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.499009, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.11393964, "global_step/max_steps": "555/9742", "percentage": "5.70%", "elapsed_time": "18m 1s", "remaining_time": "4h 58m 17s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.919e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.500851, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.11496613, "global_step/max_steps": "560/9742", "percentage": "5.75%", "elapsed_time": "18m 7s", "remaining_time": "4h 57m 4s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.917e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.502683, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.11599261, "global_step/max_steps": "565/9742", "percentage": "5.80%", "elapsed_time": "18m 12s", "remaining_time": "4h 55m 52s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.916e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.504322, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.11701909, "global_step/max_steps": "570/9742", "percentage": "5.85%", "elapsed_time": "18m 19s", "remaining_time": "4h 54m 47s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.914e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.506153, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.6, "rewards/ClassificationReward/std": 0.0, "reward": 0.6, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.11804558, "global_step/max_steps": "575/9742", "percentage": "5.90%", "elapsed_time": "18m 25s", "remaining_time": "4h 53m 36s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.913e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.50799, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.11907206, "global_step/max_steps": "580/9742", "percentage": "5.95%", "elapsed_time": "18m 30s", "remaining_time": "4h 52m 26s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.911e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.509787, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.4, "rewards/ClassificationReward/std": 0.0, "reward": 0.4, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.12009854, "global_step/max_steps": "585/9742", "percentage": "6.00%", "elapsed_time": "18m 36s", "remaining_time": "4h 51m 17s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.91e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.511529, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.12112503, "global_step/max_steps": "590/9742", "percentage": "6.06%", "elapsed_time": "18m 42s", "remaining_time": "4h 50m 10s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.908e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.513152, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.12215151, "global_step/max_steps": "595/9742", "percentage": "6.11%", "elapsed_time": "18m 48s", "remaining_time": "4h 49m 8s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.907e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.514907, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.12317799, "global_step/max_steps": "600/9742", "percentage": "6.16%", "elapsed_time": "18m 54s", "remaining_time": "4h 48m 2s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.905e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.515927, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.12420448, "global_step/max_steps": "605/9742", "percentage": "6.21%", "elapsed_time": "19m 1s", "remaining_time": "4h 47m 21s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.904e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.517577, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.6, "rewards/ClassificationReward/std": 0.0, "reward": 0.6, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.12523096, "global_step/max_steps": "610/9742", "percentage": "6.26%", "elapsed_time": "19m 7s", "remaining_time": "4h 46m 19s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.902e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.519241, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.12625744, "global_step/max_steps": "615/9742", "percentage": "6.31%", "elapsed_time": "19m 13s", "remaining_time": "4h 45m 17s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.9e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.520777, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.12728393, "global_step/max_steps": "620/9742", "percentage": "6.36%", "elapsed_time": "19m 19s", "remaining_time": "4h 44m 19s"}
+{"loss": 2e-08, "grad_norm": 0.0, "learning_rate": 9.899e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.522491, "completions/mean_length": 7.9666667, "completions/min_length": 7.6, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.58333334, "rewards/ClassificationReward/std": 0.05773503, "reward": 0.58333334, "reward_std": 0.05773503, "frac_reward_zero_std": 0.8, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.12831041, "global_step/max_steps": "625/9742", "percentage": "6.42%", "elapsed_time": "19m 25s", "remaining_time": "4h 43m 16s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.897e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.524117, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.4, "rewards/ClassificationReward/std": 0.0, "reward": 0.4, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.12933689, "global_step/max_steps": "630/9742", "percentage": "6.47%", "elapsed_time": "19m 31s", "remaining_time": "4h 42m 16s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.896e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.52567, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.13036338, "global_step/max_steps": "635/9742", "percentage": "6.52%", "elapsed_time": "19m 36s", "remaining_time": "4h 41m 19s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.894e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.527253, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.13138986, "global_step/max_steps": "640/9742", "percentage": "6.57%", "elapsed_time": "19m 42s", "remaining_time": "4h 40m 22s"}
+{"loss": 2e-08, "grad_norm": 0.0, "learning_rate": 9.892e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.528949, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.78333334, "rewards/ClassificationReward/std": 0.05773503, "reward": 0.78333334, "reward_std": 0.05773503, "frac_reward_zero_std": 0.8, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.13241634, "global_step/max_steps": "645/9742", "percentage": "6.62%", "elapsed_time": "19m 48s", "remaining_time": "4h 39m 20s"}
+{"loss": 2e-08, "grad_norm": 0.0, "learning_rate": 9.891e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.530473, "completions/mean_length": 7.9666667, "completions/min_length": 7.6, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.78333334, "rewards/ClassificationReward/std": 0.05773503, "reward": 0.78333334, "reward_std": 0.05773503, "frac_reward_zero_std": 0.8, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.13344282, "global_step/max_steps": "650/9742", "percentage": "6.67%", "elapsed_time": "19m 54s", "remaining_time": "4h 38m 25s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.889e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.532032, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.6, "rewards/ClassificationReward/std": 0.0, "reward": 0.6, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.13446931, "global_step/max_steps": "655/9742", "percentage": "6.72%", "elapsed_time": "20m 0s", "remaining_time": "4h 37m 29s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.887e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.533544, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.13549579, "global_step/max_steps": "660/9742", "percentage": "6.77%", "elapsed_time": "20m 6s", "remaining_time": "4h 36m 35s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.885e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.535088, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.13652227, "global_step/max_steps": "665/9742", "percentage": "6.83%", "elapsed_time": "20m 11s", "remaining_time": "4h 35m 40s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.884e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.536574, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.13754876, "global_step/max_steps": "670/9742", "percentage": "6.88%", "elapsed_time": "20m 17s", "remaining_time": "4h 34m 47s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.882e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.537901, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.6, "rewards/ClassificationReward/std": 0.0, "reward": 0.6, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.13857524, "global_step/max_steps": "675/9742", "percentage": "6.93%", "elapsed_time": "20m 23s", "remaining_time": "4h 33m 59s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.88e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.53937, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.13960172, "global_step/max_steps": "680/9742", "percentage": "6.98%", "elapsed_time": "20m 29s", "remaining_time": "4h 33m 7s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.879e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.540847, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.14062821, "global_step/max_steps": "685/9742", "percentage": "7.03%", "elapsed_time": "20m 35s", "remaining_time": "4h 32m 15s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.877e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.542258, "completions/mean_length": 8.1, "completions/min_length": 8.0, "completions/max_length": 9.2, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.14165469, "global_step/max_steps": "690/9742", "percentage": "7.08%", "elapsed_time": "20m 41s", "remaining_time": "4h 31m 26s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.875e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.543675, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.14268117, "global_step/max_steps": "695/9742", "percentage": "7.13%", "elapsed_time": "20m 47s", "remaining_time": "4h 30m 36s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.873e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.544964, "completions/mean_length": 8.4333334, "completions/min_length": 8.0, "completions/max_length": 9.4, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.14370766, "global_step/max_steps": "700/9742", "percentage": "7.19%", "elapsed_time": "20m 53s", "remaining_time": "4h 29m 51s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.871e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.546147, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.14473414, "global_step/max_steps": "705/9742", "percentage": "7.24%", "elapsed_time": "20m 59s", "remaining_time": "4h 29m 9s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.87e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.545644, "completions/mean_length": 8.2333334, "completions/min_length": 8.0, "completions/max_length": 9.4, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.14576062, "global_step/max_steps": "710/9742", "percentage": "7.29%", "elapsed_time": "21m 10s", "remaining_time": "4h 29m 18s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.868e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.546964, "completions/mean_length": 8.6, "completions/min_length": 8.6, "completions/max_length": 8.6, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.4, "rewards/ClassificationReward/std": 0.0, "reward": 0.4, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.14678711, "global_step/max_steps": "715/9742", "percentage": "7.34%", "elapsed_time": "21m 16s", "remaining_time": "4h 28m 32s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.866e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.542952, "completions/mean_length": 9.0833334, "completions/min_length": 8.6, "completions/max_length": 10.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.14781359, "global_step/max_steps": "720/9742", "percentage": "7.39%", "elapsed_time": "21m 35s", "remaining_time": "4h 30m 28s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.864e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.544155, "completions/mean_length": 8.7, "completions/min_length": 8.6, "completions/max_length": 9.2, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.14884007, "global_step/max_steps": "725/9742", "percentage": "7.44%", "elapsed_time": "21m 41s", "remaining_time": "4h 29m 45s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.862e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.545135, "completions/mean_length": 8.60000019, "completions/min_length": 8.0, "completions/max_length": 10.2, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.6, "rewards/ClassificationReward/std": 0.0, "reward": 0.6, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.14986656, "global_step/max_steps": "730/9742", "percentage": "7.49%", "elapsed_time": "21m 48s", "remaining_time": "4h 29m 8s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.86e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.544512, "completions/mean_length": 8.81666698, "completions/min_length": 8.0, "completions/max_length": 11.8, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.15089304, "global_step/max_steps": "735/9742", "percentage": "7.54%", "elapsed_time": "21m 58s", "remaining_time": "4h 29m 21s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.858e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.543315, "completions/mean_length": 17.41666718, "completions/min_length": 8.6, "completions/max_length": 110.4, "completions/clipped_ratio": 0.01666667, "rewards/ClassificationReward/mean": 0.6, "rewards/ClassificationReward/std": 0.0, "reward": 0.6, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.15191952, "global_step/max_steps": "740/9742", "percentage": "7.60%", "elapsed_time": "22m 11s", "remaining_time": "4h 29m 51s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.856e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.544512, "completions/mean_length": 9.05, "completions/min_length": 8.6, "completions/max_length": 9.2, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.15294601, "global_step/max_steps": "745/9742", "percentage": "7.65%", "elapsed_time": "22m 17s", "remaining_time": "4h 29m 8s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.854e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.545812, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.81666667, "rewards/ClassificationReward/std": 0.05773503, "reward": 0.81666667, "reward_std": 0.05773503, "frac_reward_zero_std": 0.8, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.15397249, "global_step/max_steps": "750/9742", "percentage": "7.70%", "elapsed_time": "22m 23s", "remaining_time": "4h 28m 22s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.853e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.546997, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.15499897, "global_step/max_steps": "755/9742", "percentage": "7.75%", "elapsed_time": "22m 29s", "remaining_time": "4h 27m 40s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.851e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.548261, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.15602546, "global_step/max_steps": "760/9742", "percentage": "7.80%", "elapsed_time": "22m 35s", "remaining_time": "4h 26m 56s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.849e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.549529, "completions/mean_length": 8.01666679, "completions/min_length": 8.0, "completions/max_length": 8.2, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.15705194, "global_step/max_steps": "765/9742", "percentage": "7.85%", "elapsed_time": "22m 41s", "remaining_time": "4h 26m 11s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.847e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.550802, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.15807842, "global_step/max_steps": "770/9742", "percentage": "7.90%", "elapsed_time": "22m 46s", "remaining_time": "4h 25m 27s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.845e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.552094, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.15910491, "global_step/max_steps": "775/9742", "percentage": "7.96%", "elapsed_time": "22m 52s", "remaining_time": "4h 24m 43s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.843e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.553187, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.6, "rewards/ClassificationReward/std": 0.0, "reward": 0.6, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.16013139, "global_step/max_steps": "780/9742", "percentage": "8.01%", "elapsed_time": "22m 59s", "remaining_time": "4h 24m 4s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.841e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.554439, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.16115787, "global_step/max_steps": "785/9742", "percentage": "8.06%", "elapsed_time": "23m 4s", "remaining_time": "4h 23m 21s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.839e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.555679, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.16218436, "global_step/max_steps": "790/9742", "percentage": "8.11%", "elapsed_time": "23m 10s", "remaining_time": "4h 22m 38s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.837e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.55695, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.16321084, "global_step/max_steps": "795/9742", "percentage": "8.16%", "elapsed_time": "23m 16s", "remaining_time": "4h 21m 55s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.835e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.558171, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.16423732, "global_step/max_steps": "800/9742", "percentage": "8.21%", "elapsed_time": "23m 22s", "remaining_time": "4h 21m 13s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.832e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.559239, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.6, "rewards/ClassificationReward/std": 0.0, "reward": 0.6, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.16526381, "global_step/max_steps": "805/9742", "percentage": "8.26%", "elapsed_time": "23m 28s", "remaining_time": "4h 20m 36s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.83e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.56044, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.16629029, "global_step/max_steps": "810/9742", "percentage": "8.31%", "elapsed_time": "23m 34s", "remaining_time": "4h 19m 55s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.828e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.561656, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.16731677, "global_step/max_steps": "815/9742", "percentage": "8.37%", "elapsed_time": "23m 40s", "remaining_time": "4h 19m 14s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.826e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.562854, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.16834326, "global_step/max_steps": "820/9742", "percentage": "8.42%", "elapsed_time": "23m 45s", "remaining_time": "4h 18m 33s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.824e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.564026, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.16936974, "global_step/max_steps": "825/9742", "percentage": "8.47%", "elapsed_time": "23m 51s", "remaining_time": "4h 17m 54s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.822e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.565204, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.6, "rewards/ClassificationReward/std": 0.0, "reward": 0.6, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.17039622, "global_step/max_steps": "830/9742", "percentage": "8.52%", "elapsed_time": "23m 57s", "remaining_time": "4h 17m 14s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.82e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.566235, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.17142271, "global_step/max_steps": "835/9742", "percentage": "8.57%", "elapsed_time": "24m 3s", "remaining_time": "4h 16m 39s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.818e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.567415, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.17244919, "global_step/max_steps": "840/9742", "percentage": "8.62%", "elapsed_time": "24m 9s", "remaining_time": "4h 16m 0s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.816e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.568504, "completions/mean_length": 8.1833334, "completions/min_length": 8.0, "completions/max_length": 8.4, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.6, "rewards/ClassificationReward/std": 0.0, "reward": 0.6, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.17347567, "global_step/max_steps": "845/9742", "percentage": "8.67%", "elapsed_time": "24m 15s", "remaining_time": "4h 15m 23s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.813e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.569617, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.17450216, "global_step/max_steps": "850/9742", "percentage": "8.73%", "elapsed_time": "24m 21s", "remaining_time": "4h 14m 46s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.811e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.570711, "completions/mean_length": 8.01666679, "completions/min_length": 8.0, "completions/max_length": 8.2, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.17552864, "global_step/max_steps": "855/9742", "percentage": "8.78%", "elapsed_time": "24m 27s", "remaining_time": "4h 14m 9s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.809e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.571738, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.17655512, "global_step/max_steps": "860/9742", "percentage": "8.83%", "elapsed_time": "24m 33s", "remaining_time": "4h 13m 34s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.807e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.572863, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.17758161, "global_step/max_steps": "865/9742", "percentage": "8.88%", "elapsed_time": "24m 38s", "remaining_time": "4h 12m 57s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.805e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.573929, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.17860809, "global_step/max_steps": "870/9742", "percentage": "8.93%", "elapsed_time": "24m 44s", "remaining_time": "4h 12m 22s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.802e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.574999, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.17963457, "global_step/max_steps": "875/9742", "percentage": "8.98%", "elapsed_time": "24m 50s", "remaining_time": "4h 11m 46s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.8e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.576082, "completions/mean_length": 8.01666679, "completions/min_length": 8.0, "completions/max_length": 8.2, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.18066106, "global_step/max_steps": "880/9742", "percentage": "9.03%", "elapsed_time": "24m 56s", "remaining_time": "4h 11m 10s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.798e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.577094, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.61666667, "rewards/ClassificationReward/std": 0.05773503, "reward": 0.61666667, "reward_std": 0.05773503, "frac_reward_zero_std": 0.8, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.18168754, "global_step/max_steps": "885/9742", "percentage": "9.08%", "elapsed_time": "25m 2s", "remaining_time": "4h 10m 37s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.795e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.578171, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.6, "rewards/ClassificationReward/std": 0.0, "reward": 0.6, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.18271402, "global_step/max_steps": "890/9742", "percentage": "9.14%", "elapsed_time": "25m 8s", "remaining_time": "4h 10m 1s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.793e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.579241, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.6, "rewards/ClassificationReward/std": 0.0, "reward": 0.6, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.18374051, "global_step/max_steps": "895/9742", "percentage": "9.19%", "elapsed_time": "25m 14s", "remaining_time": "4h 9m 26s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.791e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.580281, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.18476699, "global_step/max_steps": "900/9742", "percentage": "9.24%", "elapsed_time": "25m 19s", "remaining_time": "4h 8m 52s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.789e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.581298, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.18579347, "global_step/max_steps": "905/9742", "percentage": "9.29%", "elapsed_time": "25m 25s", "remaining_time": "4h 8m 19s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.786e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.582329, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.18681995, "global_step/max_steps": "910/9742", "percentage": "9.34%", "elapsed_time": "25m 31s", "remaining_time": "4h 7m 45s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.784e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.583223, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.18784644, "global_step/max_steps": "915/9742", "percentage": "9.39%", "elapsed_time": "25m 37s", "remaining_time": "4h 7m 15s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.782e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.584248, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.18887292, "global_step/max_steps": "920/9742", "percentage": "9.44%", "elapsed_time": "25m 43s", "remaining_time": "4h 6m 42s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.779e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.58525, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.1898994, "global_step/max_steps": "925/9742", "percentage": "9.49%", "elapsed_time": "25m 49s", "remaining_time": "4h 6m 9s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.777e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.586261, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.19092589, "global_step/max_steps": "930/9742", "percentage": "9.55%", "elapsed_time": "25m 55s", "remaining_time": "4h 5m 37s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.774e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.587221, "completions/mean_length": 8.05, "completions/min_length": 8.0, "completions/max_length": 8.2, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.4, "rewards/ClassificationReward/std": 0.0, "reward": 0.4, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.19195237, "global_step/max_steps": "935/9742", "percentage": "9.60%", "elapsed_time": "26m 1s", "remaining_time": "4h 5m 5s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.772e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.588103, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.19297885, "global_step/max_steps": "940/9742", "percentage": "9.65%", "elapsed_time": "26m 7s", "remaining_time": "4h 4m 36s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.77e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.589053, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.19400534, "global_step/max_steps": "945/9742", "percentage": "9.70%", "elapsed_time": "26m 13s", "remaining_time": "4h 4m 5s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.767e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.590011, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.19503182, "global_step/max_steps": "950/9742", "percentage": "9.75%", "elapsed_time": "26m 19s", "remaining_time": "4h 3m 34s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.765e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.590977, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.6, "rewards/ClassificationReward/std": 0.0, "reward": 0.6, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.1960583, "global_step/max_steps": "955/9742", "percentage": "9.80%", "elapsed_time": "26m 24s", "remaining_time": "4h 3m 3s"}
+{"loss": 2e-08, "grad_norm": 0.0, "learning_rate": 9.762e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.591975, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.96666666, "rewards/ClassificationReward/std": 0.07784989, "reward": 0.96666667, "reward_std": 0.07784989, "frac_reward_zero_std": 0.8, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.19708479, "global_step/max_steps": "960/9742", "percentage": "9.85%", "elapsed_time": "26m 30s", "remaining_time": "4h 2m 31s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.76e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.592789, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.6, "rewards/ClassificationReward/std": 0.0, "reward": 0.6, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.19811127, "global_step/max_steps": "965/9742", "percentage": "9.91%", "elapsed_time": "26m 36s", "remaining_time": "4h 2m 4s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.757e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.593743, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.19913775, "global_step/max_steps": "970/9742", "percentage": "9.96%", "elapsed_time": "26m 42s", "remaining_time": "4h 1m 33s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.755e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.594678, "completions/mean_length": 8.06666679, "completions/min_length": 8.0, "completions/max_length": 8.2, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.4, "rewards/ClassificationReward/std": 0.0, "reward": 0.4, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.20016424, "global_step/max_steps": "975/9742", "percentage": "10.01%", "elapsed_time": "26m 48s", "remaining_time": "4h 1m 3s"}
+{"loss": 2e-08, "grad_norm": 0.0, "learning_rate": 9.752e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.595694, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.93333334, "rewards/ClassificationReward/std": 0.14818843, "reward": 0.93333334, "reward_std": 0.14818843, "frac_reward_zero_std": 0.6, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.20119072, "global_step/max_steps": "980/9742", "percentage": "10.06%", "elapsed_time": "26m 54s", "remaining_time": "4h 0m 31s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.75e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.596611, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.2022172, "global_step/max_steps": "985/9742", "percentage": "10.11%", "elapsed_time": "26m 59s", "remaining_time": "4h 0m 2s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.747e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.59753, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.2, "rewards/ClassificationReward/std": 0.0, "reward": 0.2, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.20324369, "global_step/max_steps": "990/9742", "percentage": "10.16%", "elapsed_time": "27m 5s", "remaining_time": "3h 59m 32s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.745e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.598339, "completions/mean_length": 8.15, "completions/min_length": 8.0, "completions/max_length": 8.2, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.20427017, "global_step/max_steps": "995/9742", "percentage": "10.21%", "elapsed_time": "27m 11s", "remaining_time": "3h 59m 6s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.742e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.599222, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.20529665, "global_step/max_steps": "1000/9742", "percentage": "10.26%", "elapsed_time": "27m 17s", "remaining_time": "3h 58m 37s"}
+{"eval_loss": 0.0, "eval_completions/mean_length": 8.00277265, "eval_completions/min_length": 8.0, "eval_completions/max_length": 8.01478743, "eval_completions/clipped_ratio": 0.0, "eval_rewards/ClassificationReward/mean": 0.76632779, "eval_rewards/ClassificationReward/std": 0.00445467, "eval_reward": 0.76632779, "eval_reward_std": 0.00445467, "eval_frac_reward_zero_std": 0.98521257, "eval_clip_ratio/low_mean": 0.0, "eval_clip_ratio/low_min": 0.0, "eval_clip_ratio/high_mean": 0.0, "eval_clip_ratio/high_max": 0.0, "eval_clip_ratio/region_mean": 0.0, "eval_runtime": 158.2108, "eval_samples_per_second": 3.419, "eval_steps_per_second": 0.291, "epoch": 0.20529665, "global_step/max_steps": "1000/9742", "percentage": "10.26%", "elapsed_time": "29m 56s", "remaining_time": "4h 21m 40s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.74e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.548165, "completions/mean_length": 8.01666679, "completions/min_length": 8.0, "completions/max_length": 8.2, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.6, "rewards/ClassificationReward/std": 0.0, "reward": 0.6, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.20632314, "global_step/max_steps": "1005/9742", "percentage": "10.32%", "elapsed_time": "30m 2s", "remaining_time": "4h 21m 9s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.737e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.549132, "completions/mean_length": 8.01666679, "completions/min_length": 8.0, "completions/max_length": 8.2, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.20734962, "global_step/max_steps": "1010/9742", "percentage": "10.37%", "elapsed_time": "30m 8s", "remaining_time": "4h 20m 33s"}
+{"loss": 2e-08, "grad_norm": 0.0, "learning_rate": 9.735e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.550099, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.71666666, "rewards/ClassificationReward/std": 0.10298573, "reward": 0.71666667, "reward_std": 0.10298573, "frac_reward_zero_std": 0.8, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.2083761, "global_step/max_steps": "1015/9742", "percentage": "10.42%", "elapsed_time": "30m 14s", "remaining_time": "4h 19m 57s"}
+{"loss": 2e-08, "grad_norm": 0.0, "learning_rate": 9.732e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.551091, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.98333334, "rewards/ClassificationReward/std": 0.05773503, "reward": 0.98333334, "reward_std": 0.05773503, "frac_reward_zero_std": 0.8, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.20940259, "global_step/max_steps": "1020/9742", "percentage": "10.47%", "elapsed_time": "30m 19s", "remaining_time": "4h 19m 21s"}
+{"loss": 2e-08, "grad_norm": 0.0, "learning_rate": 9.729e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.55214, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.78333334, "rewards/ClassificationReward/std": 0.05773503, "reward": 0.78333334, "reward_std": 0.05773503, "frac_reward_zero_std": 0.8, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.21042907, "global_step/max_steps": "1025/9742", "percentage": "10.52%", "elapsed_time": "30m 25s", "remaining_time": "4h 18m 43s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.727e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.553122, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.21145555, "global_step/max_steps": "1030/9742", "percentage": "10.57%", "elapsed_time": "30m 31s", "remaining_time": "4h 18m 8s"}
+{"loss": 0.0, "grad_norm": 20.66020393, "learning_rate": 9.724e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.554154, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.81666667, "rewards/ClassificationReward/std": 0.05773503, "reward": 0.81666667, "reward_std": 0.05773503, "frac_reward_zero_std": 0.8, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.21248204, "global_step/max_steps": "1035/9742", "percentage": "10.62%", "elapsed_time": "30m 36s", "remaining_time": "4h 17m 31s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.721e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.555016, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.21350852, "global_step/max_steps": "1040/9742", "percentage": "10.68%", "elapsed_time": "30m 42s", "remaining_time": "4h 16m 59s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.719e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.555933, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.214535, "global_step/max_steps": "1045/9742", "percentage": "10.73%", "elapsed_time": "30m 48s", "remaining_time": "4h 16m 25s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.716e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.556854, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.21556149, "global_step/max_steps": "1050/9742", "percentage": "10.78%", "elapsed_time": "30m 54s", "remaining_time": "4h 15m 52s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.713e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.557785, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.21658797, "global_step/max_steps": "1055/9742", "percentage": "10.83%", "elapsed_time": "31m 0s", "remaining_time": "4h 15m 18s"}
+{"loss": 1e-08, "grad_norm": 0.0, "learning_rate": 9.711e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.558752, "completions/mean_length": 8.01666679, "completions/min_length": 8.0, "completions/max_length": 8.2, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.18333334, "rewards/ClassificationReward/std": 0.05773503, "reward": 0.18333334, "reward_std": 0.05773503, "frac_reward_zero_std": 0.8, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.21761445, "global_step/max_steps": "1060/9742", "percentage": "10.88%", "elapsed_time": "31m 6s", "remaining_time": "4h 14m 44s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.708e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.559566, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.21864094, "global_step/max_steps": "1065/9742", "percentage": "10.93%", "elapsed_time": "31m 12s", "remaining_time": "4h 14m 14s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.705e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.560477, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.6, "rewards/ClassificationReward/std": 0.0, "reward": 0.6, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.21966742, "global_step/max_steps": "1070/9742", "percentage": "10.98%", "elapsed_time": "31m 18s", "remaining_time": "4h 13m 41s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.703e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.561369, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.2206939, "global_step/max_steps": "1075/9742", "percentage": "11.03%", "elapsed_time": "31m 23s", "remaining_time": "4h 13m 9s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.7e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.562288, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.22172039, "global_step/max_steps": "1080/9742", "percentage": "11.09%", "elapsed_time": "31m 29s", "remaining_time": "4h 12m 36s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.697e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.563186, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.22274687, "global_step/max_steps": "1085/9742", "percentage": "11.14%", "elapsed_time": "31m 35s", "remaining_time": "4h 12m 4s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.694e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.56397, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.22377335, "global_step/max_steps": "1090/9742", "percentage": "11.19%", "elapsed_time": "31m 41s", "remaining_time": "4h 11m 35s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.692e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.564837, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.22479984, "global_step/max_steps": "1095/9742", "percentage": "11.24%", "elapsed_time": "31m 47s", "remaining_time": "4h 11m 3s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.689e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.565723, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.22582632, "global_step/max_steps": "1100/9742", "percentage": "11.29%", "elapsed_time": "31m 53s", "remaining_time": "4h 10m 32s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.686e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.566599, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.2268528, "global_step/max_steps": "1105/9742", "percentage": "11.34%", "elapsed_time": "31m 59s", "remaining_time": "4h 10m 1s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.683e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.567476, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.4, "rewards/ClassificationReward/std": 0.0, "reward": 0.4, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.22787929, "global_step/max_steps": "1110/9742", "percentage": "11.39%", "elapsed_time": "32m 5s", "remaining_time": "4h 9m 30s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.68e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.568116, "completions/mean_length": 9.0, "completions/min_length": 8.0, "completions/max_length": 20.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.22890577, "global_step/max_steps": "1115/9742", "percentage": "11.45%", "elapsed_time": "32m 11s", "remaining_time": "4h 9m 5s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.677e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.568868, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.22993225, "global_step/max_steps": "1120/9742", "percentage": "11.50%", "elapsed_time": "32m 17s", "remaining_time": "4h 8m 37s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.675e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.569722, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.23095874, "global_step/max_steps": "1125/9742", "percentage": "11.55%", "elapsed_time": "32m 23s", "remaining_time": "4h 8m 7s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.672e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.570568, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.23198522, "global_step/max_steps": "1130/9742", "percentage": "11.60%", "elapsed_time": "32m 29s", "remaining_time": "4h 7m 37s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.669e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.571389, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.6, "rewards/ClassificationReward/std": 0.0, "reward": 0.6, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.2330117, "global_step/max_steps": "1135/9742", "percentage": "11.65%", "elapsed_time": "32m 35s", "remaining_time": "4h 7m 8s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.666e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.572229, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.23403819, "global_step/max_steps": "1140/9742", "percentage": "11.70%", "elapsed_time": "32m 41s", "remaining_time": "4h 6m 38s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.663e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.572895, "completions/mean_length": 8.36666679, "completions/min_length": 8.0, "completions/max_length": 12.4, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.4, "rewards/ClassificationReward/std": 0.0, "reward": 0.4, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.23506467, "global_step/max_steps": "1145/9742", "percentage": "11.75%", "elapsed_time": "32m 47s", "remaining_time": "4h 6m 13s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.66e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.573738, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.23609115, "global_step/max_steps": "1150/9742", "percentage": "11.80%", "elapsed_time": "32m 53s", "remaining_time": "4h 5m 43s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.657e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.574574, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.6, "rewards/ClassificationReward/std": 0.0, "reward": 0.6, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.23711763, "global_step/max_steps": "1155/9742", "percentage": "11.86%", "elapsed_time": "32m 59s", "remaining_time": "4h 5m 14s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.654e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.57538, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.23814412, "global_step/max_steps": "1160/9742", "percentage": "11.91%", "elapsed_time": "33m 5s", "remaining_time": "4h 4m 45s"}
+{"loss": 2e-08, "grad_norm": 0.0, "learning_rate": 9.651e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.576213, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.58333334, "rewards/ClassificationReward/std": 0.05773503, "reward": 0.58333334, "reward_std": 0.05773503, "frac_reward_zero_std": 0.8, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.2391706, "global_step/max_steps": "1165/9742", "percentage": "11.96%", "elapsed_time": "33m 10s", "remaining_time": "4h 4m 16s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.648e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.576956, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.24019708, "global_step/max_steps": "1170/9742", "percentage": "12.01%", "elapsed_time": "33m 16s", "remaining_time": "4h 3m 50s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.645e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.577756, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.24122357, "global_step/max_steps": "1175/9742", "percentage": "12.06%", "elapsed_time": "33m 22s", "remaining_time": "4h 3m 21s"}
+{"loss": 2e-08, "grad_norm": 0.0, "learning_rate": 9.642e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.578561, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.98333334, "rewards/ClassificationReward/std": 0.05773503, "reward": 0.98333334, "reward_std": 0.05773503, "frac_reward_zero_std": 0.8, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.24225005, "global_step/max_steps": "1180/9742", "percentage": "12.11%", "elapsed_time": "33m 28s", "remaining_time": "4h 2m 53s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.639e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.579349, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.24327653, "global_step/max_steps": "1185/9742", "percentage": "12.16%", "elapsed_time": "33m 34s", "remaining_time": "4h 2m 26s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.636e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.580144, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.24430302, "global_step/max_steps": "1190/9742", "percentage": "12.22%", "elapsed_time": "33m 40s", "remaining_time": "4h 1m 58s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.633e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.580842, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.2453295, "global_step/max_steps": "1195/9742", "percentage": "12.27%", "elapsed_time": "33m 46s", "remaining_time": "4h 1m 33s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.63e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.581609, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.6, "rewards/ClassificationReward/std": 0.0, "reward": 0.6, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.24635598, "global_step/max_steps": "1200/9742", "percentage": "12.32%", "elapsed_time": "33m 52s", "remaining_time": "4h 1m 6s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.627e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.582384, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.24738247, "global_step/max_steps": "1205/9742", "percentage": "12.37%", "elapsed_time": "33m 58s", "remaining_time": "4h 0m 39s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.624e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.583145, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.24840895, "global_step/max_steps": "1210/9742", "percentage": "12.42%", "elapsed_time": "34m 3s", "remaining_time": "4h 0m 12s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.621e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.58394, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.24943543, "global_step/max_steps": "1215/9742", "percentage": "12.47%", "elapsed_time": "34m 9s", "remaining_time": "3h 59m 44s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.618e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.58471, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.4, "rewards/ClassificationReward/std": 0.0, "reward": 0.4, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.25046192, "global_step/max_steps": "1220/9742", "percentage": "12.52%", "elapsed_time": "34m 15s", "remaining_time": "3h 59m 18s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.615e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.585369, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.2514884, "global_step/max_steps": "1225/9742", "percentage": "12.57%", "elapsed_time": "34m 21s", "remaining_time": "3h 58m 54s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.612e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.586127, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.6, "rewards/ClassificationReward/std": 0.0, "reward": 0.6, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.25251488, "global_step/max_steps": "1230/9742", "percentage": "12.63%", "elapsed_time": "34m 27s", "remaining_time": "3h 58m 27s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.609e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.586877, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.25354137, "global_step/max_steps": "1235/9742", "percentage": "12.68%", "elapsed_time": "34m 33s", "remaining_time": "3h 58m 1s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.606e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.587631, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.6, "rewards/ClassificationReward/std": 0.0, "reward": 0.6, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.25456785, "global_step/max_steps": "1240/9742", "percentage": "12.73%", "elapsed_time": "34m 39s", "remaining_time": "3h 57m 35s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.602e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.588383, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.25559433, "global_step/max_steps": "1245/9742", "percentage": "12.78%", "elapsed_time": "34m 44s", "remaining_time": "3h 57m 9s"}
+{"loss": 2e-08, "grad_norm": 65.51988983, "learning_rate": 9.599e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.589086, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.65000001, "rewards/ClassificationReward/std": 0.15620822, "reward": 0.65000001, "reward_std": 0.15620822, "frac_reward_zero_std": 0.6, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.25662082, "global_step/max_steps": "1250/9742", "percentage": "12.83%", "elapsed_time": "34m 50s", "remaining_time": "3h 56m 44s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.596e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.589827, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.2576473, "global_step/max_steps": "1255/9742", "percentage": "12.88%", "elapsed_time": "34m 56s", "remaining_time": "3h 56m 19s"}
+{"loss": 2e-08, "grad_norm": 5.56353426, "learning_rate": 9.593e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.590677, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.98333334, "rewards/ClassificationReward/std": 0.05773503, "reward": 0.98333334, "reward_std": 0.05773503, "frac_reward_zero_std": 0.8, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.25867378, "global_step/max_steps": "1260/9742", "percentage": "12.93%", "elapsed_time": "35m 2s", "remaining_time": "3h 55m 51s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.59e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.591394, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.25970027, "global_step/max_steps": "1265/9742", "percentage": "12.99%", "elapsed_time": "35m 8s", "remaining_time": "3h 55m 26s"}
+{"loss": 2e-08, "grad_norm": 0.0, "learning_rate": 9.587e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.592136, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.78333334, "rewards/ClassificationReward/std": 0.05773503, "reward": 0.78333334, "reward_std": 0.05773503, "frac_reward_zero_std": 0.8, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.26072675, "global_step/max_steps": "1270/9742", "percentage": "13.04%", "elapsed_time": "35m 13s", "remaining_time": "3h 55m 0s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.583e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.592864, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.26175323, "global_step/max_steps": "1275/9742", "percentage": "13.09%", "elapsed_time": "35m 19s", "remaining_time": "3h 54m 35s"}
+{"loss": 2e-08, "grad_norm": 0.0, "learning_rate": 9.58e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.59353, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.91666666, "rewards/ClassificationReward/std": 0.10298573, "reward": 0.91666667, "reward_std": 0.10298573, "frac_reward_zero_std": 0.8, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.26277972, "global_step/max_steps": "1280/9742", "percentage": "13.14%", "elapsed_time": "35m 25s", "remaining_time": "3h 54m 12s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.577e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.594266, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.2638062, "global_step/max_steps": "1285/9742", "percentage": "13.19%", "elapsed_time": "35m 31s", "remaining_time": "3h 53m 46s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.574e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.594969, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.26483268, "global_step/max_steps": "1290/9742", "percentage": "13.24%", "elapsed_time": "35m 37s", "remaining_time": "3h 53m 22s"}
+{"loss": -1e-08, "grad_norm": 21.89606285, "learning_rate": 9.57e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.595766, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.85, "rewards/ClassificationReward/std": 0.0904534, "reward": 0.85, "reward_std": 0.0904534, "frac_reward_zero_std": 0.8, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.26585917, "global_step/max_steps": "1295/9742", "percentage": "13.29%", "elapsed_time": "35m 42s", "remaining_time": "3h 52m 56s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.567e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.596441, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.6, "rewards/ClassificationReward/std": 0.0, "reward": 0.6, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.26688565, "global_step/max_steps": "1300/9742", "percentage": "13.34%", "elapsed_time": "35m 48s", "remaining_time": "3h 52m 32s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.564e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.597073, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.26791213, "global_step/max_steps": "1305/9742", "percentage": "13.40%", "elapsed_time": "35m 54s", "remaining_time": "3h 52m 10s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.56e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.597723, "completions/mean_length": 8.01666679, "completions/min_length": 8.0, "completions/max_length": 8.2, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.26893862, "global_step/max_steps": "1310/9742", "percentage": "13.45%", "elapsed_time": "36m 0s", "remaining_time": "3h 51m 47s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.557e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.598411, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.2699651, "global_step/max_steps": "1315/9742", "percentage": "13.50%", "elapsed_time": "36m 6s", "remaining_time": "3h 51m 23s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.554e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.599155, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.63333333, "rewards/ClassificationReward/std": 0.07784989, "reward": 0.63333333, "reward_std": 0.07784989, "frac_reward_zero_std": 0.8, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.27099158, "global_step/max_steps": "1320/9742", "percentage": "13.55%", "elapsed_time": "36m 12s", "remaining_time": "3h 50m 58s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.55e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.599812, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.27201807, "global_step/max_steps": "1325/9742", "percentage": "13.60%", "elapsed_time": "36m 18s", "remaining_time": "3h 50m 35s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.547e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.600411, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.27304455, "global_step/max_steps": "1330/9742", "percentage": "13.65%", "elapsed_time": "36m 24s", "remaining_time": "3h 50m 14s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.544e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.601085, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.27407103, "global_step/max_steps": "1335/9742", "percentage": "13.70%", "elapsed_time": "36m 29s", "remaining_time": "3h 49m 51s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.54e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.601761, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.6, "rewards/ClassificationReward/std": 0.0, "reward": 0.6, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.27509752, "global_step/max_steps": "1340/9742", "percentage": "13.75%", "elapsed_time": "36m 35s", "remaining_time": "3h 49m 27s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.537e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.602407, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.276124, "global_step/max_steps": "1345/9742", "percentage": "13.81%", "elapsed_time": "36m 41s", "remaining_time": "3h 49m 5s"}
+{"loss": 2e-08, "grad_norm": 0.0, "learning_rate": 9.534e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.603107, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.98333334, "rewards/ClassificationReward/std": 0.05773503, "reward": 0.98333334, "reward_std": 0.05773503, "frac_reward_zero_std": 0.8, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.27715048, "global_step/max_steps": "1350/9742", "percentage": "13.86%", "elapsed_time": "36m 47s", "remaining_time": "3h 48m 41s"}
+{"loss": 1e-08, "grad_norm": 0.0, "learning_rate": 9.53e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.603859, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.53333334, "rewards/ClassificationReward/std": 0.0984732, "reward": 0.53333334, "reward_std": 0.0984732, "frac_reward_zero_std": 0.8, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.27817697, "global_step/max_steps": "1355/9742", "percentage": "13.91%", "elapsed_time": "36m 52s", "remaining_time": "3h 48m 17s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.527e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.604456, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.27920345, "global_step/max_steps": "1360/9742", "percentage": "13.96%", "elapsed_time": "36m 58s", "remaining_time": "3h 47m 55s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.523e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.605102, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.28022993, "global_step/max_steps": "1365/9742", "percentage": "14.01%", "elapsed_time": "37m 4s", "remaining_time": "3h 47m 33s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.52e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.605769, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.28125642, "global_step/max_steps": "1370/9742", "percentage": "14.06%", "elapsed_time": "37m 10s", "remaining_time": "3h 47m 10s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.516e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.606409, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.2822829, "global_step/max_steps": "1375/9742", "percentage": "14.11%", "elapsed_time": "37m 16s", "remaining_time": "3h 46m 48s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.513e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.607053, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.6, "rewards/ClassificationReward/std": 0.0, "reward": 0.6, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.28330938, "global_step/max_steps": "1380/9742", "percentage": "14.17%", "elapsed_time": "37m 22s", "remaining_time": "3h 46m 26s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.51e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.607598, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.28433587, "global_step/max_steps": "1385/9742", "percentage": "14.22%", "elapsed_time": "37m 28s", "remaining_time": "3h 46m 7s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.506e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.608235, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.6, "rewards/ClassificationReward/std": 0.0, "reward": 0.6, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.28536235, "global_step/max_steps": "1390/9742", "percentage": "14.27%", "elapsed_time": "37m 34s", "remaining_time": "3h 45m 45s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.503e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.608893, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.28638883, "global_step/max_steps": "1395/9742", "percentage": "14.32%", "elapsed_time": "37m 40s", "remaining_time": "3h 45m 22s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.499e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.609525, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.28741532, "global_step/max_steps": "1400/9742", "percentage": "14.37%", "elapsed_time": "37m 45s", "remaining_time": "3h 45m 1s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.496e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.610151, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.2884418, "global_step/max_steps": "1405/9742", "percentage": "14.42%", "elapsed_time": "37m 51s", "remaining_time": "3h 44m 39s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.492e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.610688, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.28946828, "global_step/max_steps": "1410/9742", "percentage": "14.47%", "elapsed_time": "37m 57s", "remaining_time": "3h 44m 20s"}
+{"loss": 2e-08, "grad_norm": 0.0, "learning_rate": 9.488e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.61135, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.98333334, "rewards/ClassificationReward/std": 0.05773503, "reward": 0.98333334, "reward_std": 0.05773503, "frac_reward_zero_std": 0.8, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.29049476, "global_step/max_steps": "1415/9742", "percentage": "14.52%", "elapsed_time": "38m 3s", "remaining_time": "3h 43m 58s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.485e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.611969, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.4, "rewards/ClassificationReward/std": 0.0, "reward": 0.4, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.29152125, "global_step/max_steps": "1420/9742", "percentage": "14.58%", "elapsed_time": "38m 9s", "remaining_time": "3h 43m 37s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.481e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.612582, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.6, "rewards/ClassificationReward/std": 0.0, "reward": 0.6, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.29254773, "global_step/max_steps": "1425/9742", "percentage": "14.63%", "elapsed_time": "38m 15s", "remaining_time": "3h 43m 15s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.478e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.613193, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.29357421, "global_step/max_steps": "1430/9742", "percentage": "14.68%", "elapsed_time": "38m 21s", "remaining_time": "3h 42m 55s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.474e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.613791, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.2946007, "global_step/max_steps": "1435/9742", "percentage": "14.73%", "elapsed_time": "38m 26s", "remaining_time": "3h 42m 34s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.471e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.614335, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.29562718, "global_step/max_steps": "1440/9742", "percentage": "14.78%", "elapsed_time": "38m 32s", "remaining_time": "3h 42m 15s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.467e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.614952, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.29665366, "global_step/max_steps": "1445/9742", "percentage": "14.83%", "elapsed_time": "38m 38s", "remaining_time": "3h 41m 54s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.463e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.615552, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.29768015, "global_step/max_steps": "1450/9742", "percentage": "14.88%", "elapsed_time": "38m 44s", "remaining_time": "3h 41m 33s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.46e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.616168, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.29870663, "global_step/max_steps": "1455/9742", "percentage": "14.94%", "elapsed_time": "38m 50s", "remaining_time": "3h 41m 12s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.456e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.616776, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.29973311, "global_step/max_steps": "1460/9742", "percentage": "14.99%", "elapsed_time": "38m 56s", "remaining_time": "3h 40m 52s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.452e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.617326, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.88333333, "rewards/ClassificationReward/std": 0.10298574, "reward": 0.88333334, "reward_std": 0.10298573, "frac_reward_zero_std": 0.8, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.3007596, "global_step/max_steps": "1465/9742", "percentage": "15.04%", "elapsed_time": "39m 2s", "remaining_time": "3h 40m 32s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.449e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.617911, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.30178608, "global_step/max_steps": "1470/9742", "percentage": "15.09%", "elapsed_time": "39m 7s", "remaining_time": "3h 40m 12s"}
+{"loss": 2e-08, "grad_norm": 0.0, "learning_rate": 9.445e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.618525, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.78333334, "rewards/ClassificationReward/std": 0.05773503, "reward": 0.78333334, "reward_std": 0.05773503, "frac_reward_zero_std": 0.8, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.30281256, "global_step/max_steps": "1475/9742", "percentage": "15.14%", "elapsed_time": "39m 13s", "remaining_time": "3h 39m 51s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.441e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.619091, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.30383905, "global_step/max_steps": "1480/9742", "percentage": "15.19%", "elapsed_time": "39m 19s", "remaining_time": "3h 39m 32s"}
+{"loss": 2e-08, "grad_norm": 0.0, "learning_rate": 9.438e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.619711, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.78333334, "rewards/ClassificationReward/std": 0.05773503, "reward": 0.78333334, "reward_std": 0.05773503, "frac_reward_zero_std": 0.8, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.30486553, "global_step/max_steps": "1485/9742", "percentage": "15.24%", "elapsed_time": "39m 25s", "remaining_time": "3h 39m 11s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.434e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.620294, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.30589201, "global_step/max_steps": "1490/9742", "percentage": "15.29%", "elapsed_time": "39m 31s", "remaining_time": "3h 38m 51s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.43e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.620768, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.3069185, "global_step/max_steps": "1495/9742", "percentage": "15.35%", "elapsed_time": "39m 37s", "remaining_time": "3h 38m 34s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.426e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.621343, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.30794498, "global_step/max_steps": "1500/9742", "percentage": "15.40%", "elapsed_time": "39m 43s", "remaining_time": "3h 38m 14s"}
+{"eval_loss": 0.0, "eval_completions/mean_length": 8.00030807, "eval_completions/min_length": 8.0, "eval_completions/max_length": 8.00369686, "eval_completions/clipped_ratio": 0.0, "eval_rewards/ClassificationReward/mean": 0.83256315, "eval_rewards/ClassificationReward/std": 0.0023538, "eval_reward": 0.83256315, "eval_reward_std": 0.0023538, "eval_frac_reward_zero_std": 0.99445471, "eval_clip_ratio/low_mean": 0.0, "eval_clip_ratio/low_min": 0.0, "eval_clip_ratio/high_mean": 0.0, "eval_clip_ratio/high_max": 0.0, "eval_clip_ratio/region_mean": 0.0, "eval_runtime": 158.008, "eval_samples_per_second": 3.424, "eval_steps_per_second": 0.291, "epoch": 0.30794498, "global_step/max_steps": "1500/9742", "percentage": "15.40%", "elapsed_time": "42m 21s", "remaining_time": "3h 52m 42s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.423e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.583716, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.30897146, "global_step/max_steps": "1505/9742", "percentage": "15.45%", "elapsed_time": "42m 27s", "remaining_time": "3h 52m 21s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.419e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.584323, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.6, "rewards/ClassificationReward/std": 0.0, "reward": 0.6, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.30999795, "global_step/max_steps": "1510/9742", "percentage": "15.50%", "elapsed_time": "42m 33s", "remaining_time": "3h 51m 59s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.415e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.584855, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.31102443, "global_step/max_steps": "1515/9742", "percentage": "15.55%", "elapsed_time": "42m 39s", "remaining_time": "3h 51m 38s"}
+{"loss": 2e-08, "grad_norm": 0.0, "learning_rate": 9.411e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.585508, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.98333334, "rewards/ClassificationReward/std": 0.05773503, "reward": 0.98333334, "reward_std": 0.05773503, "frac_reward_zero_std": 0.8, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.31205091, "global_step/max_steps": "1520/9742", "percentage": "15.60%", "elapsed_time": "42m 45s", "remaining_time": "3h 51m 14s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.407e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.586112, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.3130774, "global_step/max_steps": "1525/9742", "percentage": "15.65%", "elapsed_time": "42m 50s", "remaining_time": "3h 50m 52s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.404e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.586722, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.31410388, "global_step/max_steps": "1530/9742", "percentage": "15.71%", "elapsed_time": "42m 56s", "remaining_time": "3h 50m 29s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.4e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.587327, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.31513036, "global_step/max_steps": "1535/9742", "percentage": "15.76%", "elapsed_time": "43m 2s", "remaining_time": "3h 50m 7s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.396e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.587865, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.6, "rewards/ClassificationReward/std": 0.0, "reward": 0.6, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.31615685, "global_step/max_steps": "1540/9742", "percentage": "15.81%", "elapsed_time": "43m 8s", "remaining_time": "3h 49m 47s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.392e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.588463, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.6, "rewards/ClassificationReward/std": 0.0, "reward": 0.6, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.31718333, "global_step/max_steps": "1545/9742", "percentage": "15.86%", "elapsed_time": "43m 14s", "remaining_time": "3h 49m 24s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.388e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.589049, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.31820981, "global_step/max_steps": "1550/9742", "percentage": "15.91%", "elapsed_time": "43m 20s", "remaining_time": "3h 49m 3s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.384e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.589634, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.3192363, "global_step/max_steps": "1555/9742", "percentage": "15.96%", "elapsed_time": "43m 26s", "remaining_time": "3h 48m 41s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.381e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.590224, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.32026278, "global_step/max_steps": "1560/9742", "percentage": "16.01%", "elapsed_time": "43m 32s", "remaining_time": "3h 48m 19s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.377e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.590739, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.32128926, "global_step/max_steps": "1565/9742", "percentage": "16.06%", "elapsed_time": "43m 38s", "remaining_time": "3h 47m 59s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.373e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.591323, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.32231575, "global_step/max_steps": "1570/9742", "percentage": "16.12%", "elapsed_time": "43m 44s", "remaining_time": "3h 47m 38s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.369e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.591909, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.32334223, "global_step/max_steps": "1575/9742", "percentage": "16.17%", "elapsed_time": "43m 49s", "remaining_time": "3h 47m 16s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.365e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.592491, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.32436871, "global_step/max_steps": "1580/9742", "percentage": "16.22%", "elapsed_time": "43m 55s", "remaining_time": "3h 46m 55s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.361e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.59308, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.4, "rewards/ClassificationReward/std": 0.0, "reward": 0.4, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.3253952, "global_step/max_steps": "1585/9742", "percentage": "16.27%", "elapsed_time": "44m 1s", "remaining_time": "3h 46m 34s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.357e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.59359, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.32642168, "global_step/max_steps": "1590/9742", "percentage": "16.32%", "elapsed_time": "44m 7s", "remaining_time": "3h 46m 14s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.353e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.594144, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.32744816, "global_step/max_steps": "1595/9742", "percentage": "16.37%", "elapsed_time": "44m 13s", "remaining_time": "3h 45m 53s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.349e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.594684, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.32847465, "global_step/max_steps": "1600/9742", "percentage": "16.42%", "elapsed_time": "44m 19s", "remaining_time": "3h 45m 33s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.345e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.595271, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.2, "rewards/ClassificationReward/std": 0.0, "reward": 0.2, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.32950113, "global_step/max_steps": "1605/9742", "percentage": "16.48%", "elapsed_time": "44m 25s", "remaining_time": "3h 45m 12s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.341e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.595852, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.33052761, "global_step/max_steps": "1610/9742", "percentage": "16.53%", "elapsed_time": "44m 31s", "remaining_time": "3h 44m 51s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.337e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.596419, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.4, "rewards/ClassificationReward/std": 0.0, "reward": 0.4, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.3315541, "global_step/max_steps": "1615/9742", "percentage": "16.58%", "elapsed_time": "44m 36s", "remaining_time": "3h 44m 30s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.333e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.596919, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.33258058, "global_step/max_steps": "1620/9742", "percentage": "16.63%", "elapsed_time": "44m 42s", "remaining_time": "3h 44m 11s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.329e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.597464, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.6, "rewards/ClassificationReward/std": 0.0, "reward": 0.6, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.33360706, "global_step/max_steps": "1625/9742", "percentage": "16.68%", "elapsed_time": "44m 48s", "remaining_time": "3h 43m 50s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.325e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.598018, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.33463355, "global_step/max_steps": "1630/9742", "percentage": "16.73%", "elapsed_time": "44m 54s", "remaining_time": "3h 43m 30s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.321e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.598542, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.6, "rewards/ClassificationReward/std": 0.0, "reward": 0.6, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.33566003, "global_step/max_steps": "1635/9742", "percentage": "16.78%", "elapsed_time": "45m 0s", "remaining_time": "3h 43m 10s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.317e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.599091, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.33668651, "global_step/max_steps": "1640/9742", "percentage": "16.83%", "elapsed_time": "45m 6s", "remaining_time": "3h 42m 50s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.313e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.599557, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.4, "rewards/ClassificationReward/std": 0.0, "reward": 0.4, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.337713, "global_step/max_steps": "1645/9742", "percentage": "16.89%", "elapsed_time": "45m 12s", "remaining_time": "3h 42m 32s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.309e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.600098, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.33873948, "global_step/max_steps": "1650/9742", "percentage": "16.94%", "elapsed_time": "45m 18s", "remaining_time": "3h 42m 12s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.305e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.600687, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.83333333, "rewards/ClassificationReward/std": 0.07784989, "reward": 0.83333333, "reward_std": 0.07784989, "frac_reward_zero_std": 0.8, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.33976596, "global_step/max_steps": "1655/9742", "percentage": "16.99%", "elapsed_time": "45m 24s", "remaining_time": "3h 41m 51s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.301e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.601226, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.4, "rewards/ClassificationReward/std": 0.0, "reward": 0.4, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.34079245, "global_step/max_steps": "1660/9742", "percentage": "17.04%", "elapsed_time": "45m 30s", "remaining_time": "3h 41m 31s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.296e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.601751, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.34181893, "global_step/max_steps": "1665/9742", "percentage": "17.09%", "elapsed_time": "45m 35s", "remaining_time": "3h 41m 12s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.292e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.60229, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.6, "rewards/ClassificationReward/std": 0.0, "reward": 0.6, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.34284541, "global_step/max_steps": "1670/9742", "percentage": "17.14%", "elapsed_time": "45m 41s", "remaining_time": "3h 40m 52s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.288e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.602737, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.34387189, "global_step/max_steps": "1675/9742", "percentage": "17.19%", "elapsed_time": "45m 47s", "remaining_time": "3h 40m 34s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.284e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.603278, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.34489838, "global_step/max_steps": "1680/9742", "percentage": "17.24%", "elapsed_time": "45m 53s", "remaining_time": "3h 40m 14s"}
+{"loss": 1e-08, "grad_norm": 0.0, "learning_rate": 9.28e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.603868, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.95, "rewards/ClassificationReward/std": 0.0904534, "reward": 0.95, "reward_std": 0.0904534, "frac_reward_zero_std": 0.8, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.34592486, "global_step/max_steps": "1685/9742", "percentage": "17.30%", "elapsed_time": "45m 59s", "remaining_time": "3h 39m 54s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.276e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.604391, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.6, "rewards/ClassificationReward/std": 0.0, "reward": 0.6, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.34695134, "global_step/max_steps": "1690/9742", "percentage": "17.35%", "elapsed_time": "46m 5s", "remaining_time": "3h 39m 34s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.271e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.604883, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.34797783, "global_step/max_steps": "1695/9742", "percentage": "17.40%", "elapsed_time": "46m 11s", "remaining_time": "3h 39m 16s"}
+{"loss": 2e-08, "grad_norm": 14.715518, "learning_rate": 9.267e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.60444, "completions/mean_length": 8.0333334, "completions/min_length": 8.0, "completions/max_length": 8.2, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.91666666, "rewards/ClassificationReward/std": 0.10298573, "reward": 0.91666667, "reward_std": 0.10298573, "frac_reward_zero_std": 0.8, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.34900431, "global_step/max_steps": "1700/9742", "percentage": "17.45%", "elapsed_time": "46m 21s", "remaining_time": "3h 39m 18s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.263e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.604971, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.35003079, "global_step/max_steps": "1705/9742", "percentage": "17.50%", "elapsed_time": "46m 27s", "remaining_time": "3h 38m 58s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.259e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.605443, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.35105728, "global_step/max_steps": "1710/9742", "percentage": "17.55%", "elapsed_time": "46m 33s", "remaining_time": "3h 38m 40s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.255e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.605933, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.35208376, "global_step/max_steps": "1715/9742", "percentage": "17.60%", "elapsed_time": "46m 39s", "remaining_time": "3h 38m 22s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.25e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.606459, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.35311024, "global_step/max_steps": "1720/9742", "percentage": "17.66%", "elapsed_time": "46m 45s", "remaining_time": "3h 38m 2s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.246e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.606896, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.35413673, "global_step/max_steps": "1725/9742", "percentage": "17.71%", "elapsed_time": "46m 51s", "remaining_time": "3h 37m 45s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.242e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.607394, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.6, "rewards/ClassificationReward/std": 0.0, "reward": 0.6, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.35516321, "global_step/max_steps": "1730/9742", "percentage": "17.76%", "elapsed_time": "46m 57s", "remaining_time": "3h 37m 27s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.238e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.607907, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.35618969, "global_step/max_steps": "1735/9742", "percentage": "17.81%", "elapsed_time": "47m 3s", "remaining_time": "3h 37m 8s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.233e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.60842, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.35721618, "global_step/max_steps": "1740/9742", "percentage": "17.86%", "elapsed_time": "47m 8s", "remaining_time": "3h 36m 49s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.229e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.60895, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.6, "rewards/ClassificationReward/std": 0.0, "reward": 0.6, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.35824266, "global_step/max_steps": "1745/9742", "percentage": "17.91%", "elapsed_time": "47m 14s", "remaining_time": "3h 36m 30s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.225e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.609386, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.35926914, "global_step/max_steps": "1750/9742", "percentage": "17.96%", "elapsed_time": "47m 20s", "remaining_time": "3h 36m 13s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.22e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.609858, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.36029563, "global_step/max_steps": "1755/9742", "percentage": "18.01%", "elapsed_time": "47m 26s", "remaining_time": "3h 35m 55s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.216e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.610366, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.36132211, "global_step/max_steps": "1760/9742", "percentage": "18.07%", "elapsed_time": "47m 32s", "remaining_time": "3h 35m 36s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.212e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.610862, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.36234859, "global_step/max_steps": "1765/9742", "percentage": "18.12%", "elapsed_time": "47m 38s", "remaining_time": "3h 35m 18s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.207e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.61136, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.36337508, "global_step/max_steps": "1770/9742", "percentage": "18.17%", "elapsed_time": "47m 44s", "remaining_time": "3h 35m 0s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.203e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.611852, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.36440156, "global_step/max_steps": "1775/9742", "percentage": "18.22%", "elapsed_time": "47m 50s", "remaining_time": "3h 34m 41s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.199e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.612274, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.36542804, "global_step/max_steps": "1780/9742", "percentage": "18.27%", "elapsed_time": "47m 56s", "remaining_time": "3h 34m 25s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.194e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.612774, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.36645453, "global_step/max_steps": "1785/9742", "percentage": "18.32%", "elapsed_time": "48m 1s", "remaining_time": "3h 34m 6s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.19e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.613273, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.36748101, "global_step/max_steps": "1790/9742", "percentage": "18.37%", "elapsed_time": "48m 7s", "remaining_time": "3h 33m 48s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.185e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.613762, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.36850749, "global_step/max_steps": "1795/9742", "percentage": "18.43%", "elapsed_time": "48m 13s", "remaining_time": "3h 33m 30s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.181e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.614245, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.36953398, "global_step/max_steps": "1800/9742", "percentage": "18.48%", "elapsed_time": "48m 19s", "remaining_time": "3h 33m 12s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.177e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.614662, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.37056046, "global_step/max_steps": "1805/9742", "percentage": "18.53%", "elapsed_time": "48m 25s", "remaining_time": "3h 32m 56s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.172e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.615135, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.37158694, "global_step/max_steps": "1810/9742", "percentage": "18.58%", "elapsed_time": "48m 31s", "remaining_time": "3h 32m 38s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.168e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.615626, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.37261343, "global_step/max_steps": "1815/9742", "percentage": "18.63%", "elapsed_time": "48m 37s", "remaining_time": "3h 32m 20s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.163e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.616103, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.37363991, "global_step/max_steps": "1820/9742", "percentage": "18.68%", "elapsed_time": "48m 43s", "remaining_time": "3h 32m 3s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.159e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.616569, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.6, "rewards/ClassificationReward/std": 0.0, "reward": 0.6, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.37466639, "global_step/max_steps": "1825/9742", "percentage": "18.73%", "elapsed_time": "48m 48s", "remaining_time": "3h 31m 45s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.154e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.616977, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.37569288, "global_step/max_steps": "1830/9742", "percentage": "18.78%", "elapsed_time": "48m 55s", "remaining_time": "3h 31m 29s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.15e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.617474, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.37671936, "global_step/max_steps": "1835/9742", "percentage": "18.84%", "elapsed_time": "49m 0s", "remaining_time": "3h 31m 11s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.145e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.617956, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.37774584, "global_step/max_steps": "1840/9742", "percentage": "18.89%", "elapsed_time": "49m 6s", "remaining_time": "3h 30m 54s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.141e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.618426, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.37877233, "global_step/max_steps": "1845/9742", "percentage": "18.94%", "elapsed_time": "49m 12s", "remaining_time": "3h 30m 36s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.136e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.618897, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.37979881, "global_step/max_steps": "1850/9742", "percentage": "18.99%", "elapsed_time": "49m 18s", "remaining_time": "3h 30m 19s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.132e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.619375, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.38082529, "global_step/max_steps": "1855/9742", "percentage": "19.04%", "elapsed_time": "49m 23s", "remaining_time": "3h 30m 1s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.127e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.619773, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.38185178, "global_step/max_steps": "1860/9742", "percentage": "19.09%", "elapsed_time": "49m 30s", "remaining_time": "3h 29m 46s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.123e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.620224, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.38287826, "global_step/max_steps": "1865/9742", "percentage": "19.14%", "elapsed_time": "49m 35s", "remaining_time": "3h 29m 29s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.118e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.620698, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.6, "rewards/ClassificationReward/std": 0.0, "reward": 0.6, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.38390474, "global_step/max_steps": "1870/9742", "percentage": "19.20%", "elapsed_time": "49m 41s", "remaining_time": "3h 29m 11s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.114e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.62115, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.38493123, "global_step/max_steps": "1875/9742", "percentage": "19.25%", "elapsed_time": "49m 47s", "remaining_time": "3h 28m 55s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.109e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.621597, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.38595771, "global_step/max_steps": "1880/9742", "percentage": "19.30%", "elapsed_time": "49m 53s", "remaining_time": "3h 28m 38s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.104e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.621992, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.38698419, "global_step/max_steps": "1885/9742", "percentage": "19.35%", "elapsed_time": "49m 59s", "remaining_time": "3h 28m 22s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.1e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.622444, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.38801068, "global_step/max_steps": "1890/9742", "percentage": "19.40%", "elapsed_time": "50m 5s", "remaining_time": "3h 28m 5s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.095e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.622894, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.6, "rewards/ClassificationReward/std": 0.0, "reward": 0.6, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.38903716, "global_step/max_steps": "1895/9742", "percentage": "19.45%", "elapsed_time": "50m 11s", "remaining_time": "3h 27m 49s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.09e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.623361, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.39006364, "global_step/max_steps": "1900/9742", "percentage": "19.50%", "elapsed_time": "50m 16s", "remaining_time": "3h 27m 32s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.086e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.623804, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.39109013, "global_step/max_steps": "1905/9742", "percentage": "19.55%", "elapsed_time": "50m 22s", "remaining_time": "3h 27m 15s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.081e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.624188, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.39211661, "global_step/max_steps": "1910/9742", "percentage": "19.61%", "elapsed_time": "50m 28s", "remaining_time": "3h 27m 0s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.077e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.624624, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.39314309, "global_step/max_steps": "1915/9742", "percentage": "19.66%", "elapsed_time": "50m 34s", "remaining_time": "3h 26m 44s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.072e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.625068, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.39416958, "global_step/max_steps": "1920/9742", "percentage": "19.71%", "elapsed_time": "50m 40s", "remaining_time": "3h 26m 27s"}
+{"loss": 2e-08, "grad_norm": 0.0, "learning_rate": 9.067e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.62557, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.78333334, "rewards/ClassificationReward/std": 0.05773503, "reward": 0.78333334, "reward_std": 0.05773503, "frac_reward_zero_std": 0.8, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.39519606, "global_step/max_steps": "1925/9742", "percentage": "19.76%", "elapsed_time": "50m 46s", "remaining_time": "3h 26m 9s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.062e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.62601, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.39622254, "global_step/max_steps": "1930/9742", "percentage": "19.81%", "elapsed_time": "50m 52s", "remaining_time": "3h 25m 53s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.058e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.626452, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.39724902, "global_step/max_steps": "1935/9742", "percentage": "19.86%", "elapsed_time": "50m 57s", "remaining_time": "3h 25m 37s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.053e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.626819, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.6, "rewards/ClassificationReward/std": 0.0, "reward": 0.6, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.39827551, "global_step/max_steps": "1940/9742", "percentage": "19.91%", "elapsed_time": "51m 3s", "remaining_time": "3h 25m 22s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.048e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.627253, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.2, "rewards/ClassificationReward/std": 0.0, "reward": 0.2, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.39930199, "global_step/max_steps": "1945/9742", "percentage": "19.97%", "elapsed_time": "51m 9s", "remaining_time": "3h 25m 6s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.044e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.627699, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.40032847, "global_step/max_steps": "1950/9742", "percentage": "20.02%", "elapsed_time": "51m 15s", "remaining_time": "3h 24m 49s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.039e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.628122, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.40135496, "global_step/max_steps": "1955/9742", "percentage": "20.07%", "elapsed_time": "51m 21s", "remaining_time": "3h 24m 33s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.034e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.628595, "completions/mean_length": 7.7666667, "completions/min_length": 7.6, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.48333333, "rewards/ClassificationReward/std": 0.10298574, "reward": 0.48333334, "reward_std": 0.10298573, "frac_reward_zero_std": 0.8, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.40238144, "global_step/max_steps": "1960/9742", "percentage": "20.12%", "elapsed_time": "51m 27s", "remaining_time": "3h 24m 16s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.029e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.628963, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.40340792, "global_step/max_steps": "1965/9742", "percentage": "20.17%", "elapsed_time": "51m 33s", "remaining_time": "3h 24m 2s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.025e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.629423, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.6, "rewards/ClassificationReward/std": 0.0, "reward": 0.6, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.40443441, "global_step/max_steps": "1970/9742", "percentage": "20.22%", "elapsed_time": "51m 38s", "remaining_time": "3h 23m 45s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.02e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.629866, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.23333333, "rewards/ClassificationReward/std": 0.07784989, "reward": 0.23333333, "reward_std": 0.07784989, "frac_reward_zero_std": 0.8, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.40546089, "global_step/max_steps": "1975/9742", "percentage": "20.27%", "elapsed_time": "51m 44s", "remaining_time": "3h 23m 29s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.015e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.630283, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.4, "rewards/ClassificationReward/std": 0.0, "reward": 0.4, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.40648737, "global_step/max_steps": "1980/9742", "percentage": "20.32%", "elapsed_time": "51m 50s", "remaining_time": "3h 23m 13s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.01e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.630696, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.4, "rewards/ClassificationReward/std": 0.0, "reward": 0.4, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.40751386, "global_step/max_steps": "1985/9742", "percentage": "20.38%", "elapsed_time": "51m 56s", "remaining_time": "3h 22m 57s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9.005e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.631085, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.61666667, "rewards/ClassificationReward/std": 0.05773503, "reward": 0.61666667, "reward_std": 0.05773503, "frac_reward_zero_std": 0.8, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.40854034, "global_step/max_steps": "1990/9742", "percentage": "20.43%", "elapsed_time": "52m 2s", "remaining_time": "3h 22m 42s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 9e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.631511, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.6, "rewards/ClassificationReward/std": 0.0, "reward": 0.6, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.40956682, "global_step/max_steps": "1995/9742", "percentage": "20.48%", "elapsed_time": "52m 8s", "remaining_time": "3h 22m 27s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.996e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.631928, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.6, "rewards/ClassificationReward/std": 0.0, "reward": 0.6, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.41059331, "global_step/max_steps": "2000/9742", "percentage": "20.53%", "elapsed_time": "52m 13s", "remaining_time": "3h 22m 11s"}
+{"eval_loss": 0.0, "eval_completions/mean_length": 8.0, "eval_completions/min_length": 8.0, "eval_completions/max_length": 8.0, "eval_completions/clipped_ratio": 0.0, "eval_rewards/ClassificationReward/mean": 0.70163278, "eval_rewards/ClassificationReward/std": 0.0076402, "eval_reward": 0.70163278, "eval_reward_std": 0.0076402, "eval_frac_reward_zero_std": 0.97966728, "eval_clip_ratio/low_mean": 0.0, "eval_clip_ratio/low_min": 0.0, "eval_clip_ratio/high_mean": 0.0, "eval_clip_ratio/high_max": 0.0, "eval_clip_ratio/region_mean": 0.0, "eval_runtime": 158.0764, "eval_samples_per_second": 3.422, "eval_steps_per_second": 0.291, "epoch": 0.41059331, "global_step/max_steps": "2000/9742", "percentage": "20.53%", "elapsed_time": "54m 51s", "remaining_time": "3h 32m 23s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.991e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.602241, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.6, "rewards/ClassificationReward/std": 0.0, "reward": 0.6, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.41161979, "global_step/max_steps": "2005/9742", "percentage": "20.58%", "elapsed_time": "54m 58s", "remaining_time": "3h 32m 7s"}
+{"loss": 1e-08, "grad_norm": 0.0, "learning_rate": 8.986e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.602689, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.46666667, "rewards/ClassificationReward/std": 0.09847319, "reward": 0.46666667, "reward_std": 0.0984732, "frac_reward_zero_std": 0.8, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.41264627, "global_step/max_steps": "2010/9742", "percentage": "20.63%", "elapsed_time": "55m 4s", "remaining_time": "3h 31m 49s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.981e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.603144, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.41367276, "global_step/max_steps": "2015/9742", "percentage": "20.68%", "elapsed_time": "55m 9s", "remaining_time": "3h 31m 32s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.976e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.603586, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.4, "rewards/ClassificationReward/std": 0.0, "reward": 0.4, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.41469924, "global_step/max_steps": "2020/9742", "percentage": "20.73%", "elapsed_time": "55m 15s", "remaining_time": "3h 31m 15s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.971e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.604034, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.6, "rewards/ClassificationReward/std": 0.0, "reward": 0.6, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.41572572, "global_step/max_steps": "2025/9742", "percentage": "20.79%", "elapsed_time": "55m 21s", "remaining_time": "3h 30m 57s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.966e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.604499, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.9, "rewards/ClassificationReward/std": 0.10444659, "reward": 0.9, "reward_std": 0.10444659, "frac_reward_zero_std": 0.8, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.41675221, "global_step/max_steps": "2030/9742", "percentage": "20.84%", "elapsed_time": "55m 27s", "remaining_time": "3h 30m 39s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.961e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.60499, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.41666667, "rewards/ClassificationReward/std": 0.05773503, "reward": 0.41666667, "reward_std": 0.05773503, "frac_reward_zero_std": 0.8, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.41777869, "global_step/max_steps": "2035/9742", "percentage": "20.89%", "elapsed_time": "55m 32s", "remaining_time": "3h 30m 21s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.957e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.605367, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.41880517, "global_step/max_steps": "2040/9742", "percentage": "20.94%", "elapsed_time": "55m 38s", "remaining_time": "3h 30m 5s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.952e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.605804, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.41983166, "global_step/max_steps": "2045/9742", "percentage": "20.99%", "elapsed_time": "55m 44s", "remaining_time": "3h 29m 48s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.947e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.60624, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.42085814, "global_step/max_steps": "2050/9742", "percentage": "21.04%", "elapsed_time": "55m 50s", "remaining_time": "3h 29m 31s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.942e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.606686, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.42188462, "global_step/max_steps": "2055/9742", "percentage": "21.09%", "elapsed_time": "55m 56s", "remaining_time": "3h 29m 14s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.937e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.607132, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.42291111, "global_step/max_steps": "2060/9742", "percentage": "21.15%", "elapsed_time": "56m 1s", "remaining_time": "3h 28m 57s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.932e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.607485, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.6, "rewards/ClassificationReward/std": 0.0, "reward": 0.6, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.42393759, "global_step/max_steps": "2065/9742", "percentage": "21.20%", "elapsed_time": "56m 8s", "remaining_time": "3h 28m 42s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.927e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.60792, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.42496407, "global_step/max_steps": "2070/9742", "percentage": "21.25%", "elapsed_time": "56m 14s", "remaining_time": "3h 28m 25s"}
+{"loss": 1e-08, "grad_norm": 0.0, "learning_rate": 8.922e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.608401, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.53333334, "rewards/ClassificationReward/std": 0.0984732, "reward": 0.53333334, "reward_std": 0.0984732, "frac_reward_zero_std": 0.8, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.42599056, "global_step/max_steps": "2075/9742", "percentage": "21.30%", "elapsed_time": "56m 19s", "remaining_time": "3h 28m 7s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.917e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.608832, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.42701704, "global_step/max_steps": "2080/9742", "percentage": "21.35%", "elapsed_time": "56m 25s", "remaining_time": "3h 27m 50s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.912e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.609243, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.42804352, "global_step/max_steps": "2085/9742", "percentage": "21.40%", "elapsed_time": "56m 31s", "remaining_time": "3h 27m 34s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.907e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.609614, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.42907001, "global_step/max_steps": "2090/9742", "percentage": "21.45%", "elapsed_time": "56m 37s", "remaining_time": "3h 27m 18s"}
+{"loss": 1e-08, "grad_norm": 0.0, "learning_rate": 8.902e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.610083, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.95, "rewards/ClassificationReward/std": 0.0904534, "reward": 0.95, "reward_std": 0.0904534, "frac_reward_zero_std": 0.8, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.43009649, "global_step/max_steps": "2095/9742", "percentage": "21.50%", "elapsed_time": "56m 42s", "remaining_time": "3h 27m 1s"}
+{"loss": 2e-08, "grad_norm": 0.0, "learning_rate": 8.897e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.610549, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.98333334, "rewards/ClassificationReward/std": 0.05773503, "reward": 0.98333334, "reward_std": 0.05773503, "frac_reward_zero_std": 0.8, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.43112297, "global_step/max_steps": "2100/9742", "percentage": "21.56%", "elapsed_time": "56m 48s", "remaining_time": "3h 26m 43s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.892e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.610961, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.43214946, "global_step/max_steps": "2105/9742", "percentage": "21.61%", "elapsed_time": "56m 54s", "remaining_time": "3h 26m 27s"}
+{"loss": 3e-08, "grad_norm": 6.1590066, "learning_rate": 8.887e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.611419, "completions/mean_length": 8.01666679, "completions/min_length": 8.0, "completions/max_length": 8.2, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.76666666, "rewards/ClassificationReward/std": 0.07784989, "reward": 0.76666667, "reward_std": 0.07784989, "frac_reward_zero_std": 0.8, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.43317594, "global_step/max_steps": "2110/9742", "percentage": "21.66%", "elapsed_time": "56m 59s", "remaining_time": "3h 26m 10s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.881e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.611831, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.43420242, "global_step/max_steps": "2115/9742", "percentage": "21.71%", "elapsed_time": "57m 5s", "remaining_time": "3h 25m 54s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.876e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.612194, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.43522891, "global_step/max_steps": "2120/9742", "percentage": "21.76%", "elapsed_time": "57m 11s", "remaining_time": "3h 25m 38s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.871e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.612607, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.43625539, "global_step/max_steps": "2125/9742", "percentage": "21.81%", "elapsed_time": "57m 17s", "remaining_time": "3h 25m 22s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.866e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.613032, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.83333333, "rewards/ClassificationReward/std": 0.07784989, "reward": 0.83333333, "reward_std": 0.07784989, "frac_reward_zero_std": 0.8, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.43728187, "global_step/max_steps": "2130/9742", "percentage": "21.86%", "elapsed_time": "57m 23s", "remaining_time": "3h 25m 6s"}
+{"loss": 1e-08, "grad_norm": 31.74050903, "learning_rate": 8.861e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.61349, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.83333334, "rewards/ClassificationReward/std": 0.14818843, "reward": 0.83333334, "reward_std": 0.14818843, "frac_reward_zero_std": 0.6, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.43830836, "global_step/max_steps": "2135/9742", "percentage": "21.92%", "elapsed_time": "57m 29s", "remaining_time": "3h 24m 49s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.856e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.613904, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.43933484, "global_step/max_steps": "2140/9742", "percentage": "21.97%", "elapsed_time": "57m 34s", "remaining_time": "3h 24m 32s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.851e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.614246, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.44036132, "global_step/max_steps": "2145/9742", "percentage": "22.02%", "elapsed_time": "57m 41s", "remaining_time": "3h 24m 18s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.846e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.614658, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.44138781, "global_step/max_steps": "2150/9742", "percentage": "22.07%", "elapsed_time": "57m 46s", "remaining_time": "3h 24m 2s"}
+{"loss": 2e-08, "grad_norm": 44.86610413, "learning_rate": 8.84e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.615063, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.58333334, "rewards/ClassificationReward/std": 0.05773503, "reward": 0.58333334, "reward_std": 0.05773503, "frac_reward_zero_std": 0.8, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.44241429, "global_step/max_steps": "2155/9742", "percentage": "22.12%", "elapsed_time": "57m 52s", "remaining_time": "3h 23m 46s"}
+{"loss": 2e-08, "grad_norm": 0.0, "learning_rate": 8.835e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.615508, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.78333334, "rewards/ClassificationReward/std": 0.05773503, "reward": 0.78333334, "reward_std": 0.05773503, "frac_reward_zero_std": 0.8, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.44344077, "global_step/max_steps": "2160/9742", "percentage": "22.17%", "elapsed_time": "57m 58s", "remaining_time": "3h 23m 29s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.83e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.615904, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.44446726, "global_step/max_steps": "2165/9742", "percentage": "22.22%", "elapsed_time": "58m 4s", "remaining_time": "3h 23m 13s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.825e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.616269, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.44549374, "global_step/max_steps": "2170/9742", "percentage": "22.27%", "elapsed_time": "58m 10s", "remaining_time": "3h 22m 58s"}
+{"loss": 2e-08, "grad_norm": 0.0, "learning_rate": 8.82e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.616704, "completions/mean_length": 7.9833334, "completions/min_length": 7.8, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.78333334, "rewards/ClassificationReward/std": 0.05773503, "reward": 0.78333334, "reward_std": 0.05773503, "frac_reward_zero_std": 0.8, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.44652022, "global_step/max_steps": "2175/9742", "percentage": "22.33%", "elapsed_time": "58m 15s", "remaining_time": "3h 22m 42s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.815e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.617097, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.4475467, "global_step/max_steps": "2180/9742", "percentage": "22.38%", "elapsed_time": "58m 21s", "remaining_time": "3h 22m 26s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.809e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.61753, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.9, "rewards/ClassificationReward/std": 0.10444659, "reward": 0.9, "reward_std": 0.10444659, "frac_reward_zero_std": 0.8, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.44857319, "global_step/max_steps": "2185/9742", "percentage": "22.43%", "elapsed_time": "58m 27s", "remaining_time": "3h 22m 10s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.804e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.617918, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.4, "rewards/ClassificationReward/std": 0.0, "reward": 0.4, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.44959967, "global_step/max_steps": "2190/9742", "percentage": "22.48%", "elapsed_time": "58m 33s", "remaining_time": "3h 21m 54s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.799e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.618338, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.45062615, "global_step/max_steps": "2195/9742", "percentage": "22.53%", "elapsed_time": "58m 38s", "remaining_time": "3h 21m 38s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.794e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.61868, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.45165264, "global_step/max_steps": "2200/9742", "percentage": "22.58%", "elapsed_time": "58m 44s", "remaining_time": "3h 21m 24s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.788e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.619074, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.45267912, "global_step/max_steps": "2205/9742", "percentage": "22.63%", "elapsed_time": "58m 50s", "remaining_time": "3h 21m 8s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.783e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.619455, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.4537056, "global_step/max_steps": "2210/9742", "percentage": "22.69%", "elapsed_time": "58m 56s", "remaining_time": "3h 20m 53s"}
+{"loss": 2e-08, "grad_norm": 0.0, "learning_rate": 8.778e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.619876, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.98333334, "rewards/ClassificationReward/std": 0.05773503, "reward": 0.98333334, "reward_std": 0.05773503, "frac_reward_zero_std": 0.8, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.45473209, "global_step/max_steps": "2215/9742", "percentage": "22.74%", "elapsed_time": "59m 2s", "remaining_time": "3h 20m 37s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.773e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.620281, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.45575857, "global_step/max_steps": "2220/9742", "percentage": "22.79%", "elapsed_time": "59m 8s", "remaining_time": "3h 20m 21s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.767e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.620602, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.45678505, "global_step/max_steps": "2225/9742", "percentage": "22.84%", "elapsed_time": "59m 14s", "remaining_time": "3h 20m 7s"}
+{"loss": 3e-08, "grad_norm": 0.0, "learning_rate": 8.762e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.621026, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.76666666, "rewards/ClassificationReward/std": 0.07784989, "reward": 0.76666667, "reward_std": 0.07784989, "frac_reward_zero_std": 0.8, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.45781154, "global_step/max_steps": "2230/9742", "percentage": "22.89%", "elapsed_time": "59m 19s", "remaining_time": "3h 19m 51s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.757e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.62141, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.45883802, "global_step/max_steps": "2235/9742", "percentage": "22.94%", "elapsed_time": "59m 25s", "remaining_time": "3h 19m 36s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.751e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.621809, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.4598645, "global_step/max_steps": "2240/9742", "percentage": "22.99%", "elapsed_time": "59m 31s", "remaining_time": "3h 19m 20s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.746e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.622194, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.46089099, "global_step/max_steps": "2245/9742", "percentage": "23.04%", "elapsed_time": "59m 37s", "remaining_time": "3h 19m 5s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.741e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.622527, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.46191747, "global_step/max_steps": "2250/9742", "percentage": "23.10%", "elapsed_time": "59m 43s", "remaining_time": "3h 18m 51s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.735e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.622911, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.46294395, "global_step/max_steps": "2255/9742", "percentage": "23.15%", "elapsed_time": "59m 49s", "remaining_time": "3h 18m 36s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.73e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.623287, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.46397044, "global_step/max_steps": "2260/9742", "percentage": "23.20%", "elapsed_time": "59m 54s", "remaining_time": "3h 18m 21s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.724e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.623666, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.46499692, "global_step/max_steps": "2265/9742", "percentage": "23.25%", "elapsed_time": "1h 0m 0s", "remaining_time": "3h 18m 6s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.719e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.624035, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.4660234, "global_step/max_steps": "2270/9742", "percentage": "23.30%", "elapsed_time": "1h 0m 6s", "remaining_time": "3h 17m 51s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.714e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.624403, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.46704989, "global_step/max_steps": "2275/9742", "percentage": "23.35%", "elapsed_time": "1h 0m 12s", "remaining_time": "3h 17m 36s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.708e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.624727, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.46807637, "global_step/max_steps": "2280/9742", "percentage": "23.40%", "elapsed_time": "1h 0m 18s", "remaining_time": "3h 17m 22s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.703e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.625104, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.46910285, "global_step/max_steps": "2285/9742", "percentage": "23.46%", "elapsed_time": "1h 0m 24s", "remaining_time": "3h 17m 8s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.697e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.625477, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.47012934, "global_step/max_steps": "2290/9742", "percentage": "23.51%", "elapsed_time": "1h 0m 30s", "remaining_time": "3h 16m 53s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.692e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.625832, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.47115582, "global_step/max_steps": "2295/9742", "percentage": "23.56%", "elapsed_time": "1h 0m 36s", "remaining_time": "3h 16m 38s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.687e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.626204, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.4721823, "global_step/max_steps": "2300/9742", "percentage": "23.61%", "elapsed_time": "1h 0m 41s", "remaining_time": "3h 16m 23s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.681e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.626512, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.47320879, "global_step/max_steps": "2305/9742", "percentage": "23.66%", "elapsed_time": "1h 0m 48s", "remaining_time": "3h 16m 10s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.676e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.626878, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.47423527, "global_step/max_steps": "2310/9742", "percentage": "23.71%", "elapsed_time": "1h 0m 53s", "remaining_time": "3h 15m 55s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.67e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.627239, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.47526175, "global_step/max_steps": "2315/9742", "percentage": "23.76%", "elapsed_time": "1h 0m 59s", "remaining_time": "3h 15m 41s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.665e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.627611, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.47628824, "global_step/max_steps": "2320/9742", "percentage": "23.81%", "elapsed_time": "1h 1m 5s", "remaining_time": "3h 15m 26s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.659e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.627972, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.47731472, "global_step/max_steps": "2325/9742", "percentage": "23.87%", "elapsed_time": "1h 1m 11s", "remaining_time": "3h 15m 12s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.654e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.628279, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.4783412, "global_step/max_steps": "2330/9742", "percentage": "23.92%", "elapsed_time": "1h 1m 17s", "remaining_time": "3h 14m 58s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.648e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.628648, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.47936769, "global_step/max_steps": "2335/9742", "percentage": "23.97%", "elapsed_time": "1h 1m 23s", "remaining_time": "3h 14m 44s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.643e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.629006, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.48039417, "global_step/max_steps": "2340/9742", "percentage": "24.02%", "elapsed_time": "1h 1m 29s", "remaining_time": "3h 14m 29s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.637e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.629359, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.48142065, "global_step/max_steps": "2345/9742", "percentage": "24.07%", "elapsed_time": "1h 1m 35s", "remaining_time": "3h 14m 15s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.632e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.629743, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.83333333, "rewards/ClassificationReward/std": 0.07784989, "reward": 0.83333333, "reward_std": 0.07784989, "frac_reward_zero_std": 0.8, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.48244714, "global_step/max_steps": "2350/9742", "percentage": "24.12%", "elapsed_time": "1h 1m 40s", "remaining_time": "3h 14m 0s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.626e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.630054, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.6, "rewards/ClassificationReward/std": 0.0, "reward": 0.6, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.48347362, "global_step/max_steps": "2355/9742", "percentage": "24.17%", "elapsed_time": "1h 1m 46s", "remaining_time": "3h 13m 47s"}
+{"loss": 2e-08, "grad_norm": 0.0, "learning_rate": 8.621e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.630418, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.78333334, "rewards/ClassificationReward/std": 0.05773503, "reward": 0.78333334, "reward_std": 0.05773503, "frac_reward_zero_std": 0.8, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.4845001, "global_step/max_steps": "2360/9742", "percentage": "24.23%", "elapsed_time": "1h 1m 52s", "remaining_time": "3h 13m 32s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.615e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.630806, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.9, "rewards/ClassificationReward/std": 0.10444659, "reward": 0.9, "reward_std": 0.10444659, "frac_reward_zero_std": 0.8, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.48552659, "global_step/max_steps": "2365/9742", "percentage": "24.28%", "elapsed_time": "1h 1m 58s", "remaining_time": "3h 13m 17s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.609e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.631147, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.6, "rewards/ClassificationReward/std": 0.0, "reward": 0.6, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.48655307, "global_step/max_steps": "2370/9742", "percentage": "24.33%", "elapsed_time": "1h 2m 4s", "remaining_time": "3h 13m 3s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.604e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.631506, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.6, "rewards/ClassificationReward/std": 0.0, "reward": 0.6, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.48757955, "global_step/max_steps": "2375/9742", "percentage": "24.38%", "elapsed_time": "1h 2m 9s", "remaining_time": "3h 12m 49s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.598e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.631853, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.48860604, "global_step/max_steps": "2380/9742", "percentage": "24.43%", "elapsed_time": "1h 2m 15s", "remaining_time": "3h 12m 35s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.593e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.632141, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.48963252, "global_step/max_steps": "2385/9742", "percentage": "24.48%", "elapsed_time": "1h 2m 21s", "remaining_time": "3h 12m 22s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.587e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.6325, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.490659, "global_step/max_steps": "2390/9742", "percentage": "24.53%", "elapsed_time": "1h 2m 27s", "remaining_time": "3h 12m 8s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.581e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.632866, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.49168549, "global_step/max_steps": "2395/9742", "percentage": "24.58%", "elapsed_time": "1h 2m 33s", "remaining_time": "3h 11m 53s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.576e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.633224, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.49271197, "global_step/max_steps": "2400/9742", "percentage": "24.64%", "elapsed_time": "1h 2m 39s", "remaining_time": "3h 11m 39s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.57e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.633573, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.49373845, "global_step/max_steps": "2405/9742", "percentage": "24.69%", "elapsed_time": "1h 2m 44s", "remaining_time": "3h 11m 25s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.564e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.633857, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.49476494, "global_step/max_steps": "2410/9742", "percentage": "24.74%", "elapsed_time": "1h 2m 51s", "remaining_time": "3h 11m 12s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.559e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.634202, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.4, "rewards/ClassificationReward/std": 0.0, "reward": 0.4, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.49579142, "global_step/max_steps": "2415/9742", "percentage": "24.79%", "elapsed_time": "1h 2m 56s", "remaining_time": "3h 10m 59s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.553e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.63455, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.4968179, "global_step/max_steps": "2420/9742", "percentage": "24.84%", "elapsed_time": "1h 3m 2s", "remaining_time": "3h 10m 45s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.547e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.634888, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.49784439, "global_step/max_steps": "2425/9742", "percentage": "24.89%", "elapsed_time": "1h 3m 8s", "remaining_time": "3h 10m 31s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.542e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.63522, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.49887087, "global_step/max_steps": "2430/9742", "percentage": "24.94%", "elapsed_time": "1h 3m 14s", "remaining_time": "3h 10m 17s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.536e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.635565, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.49989735, "global_step/max_steps": "2435/9742", "percentage": "24.99%", "elapsed_time": "1h 3m 20s", "remaining_time": "3h 10m 3s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.53e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.635848, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.50092383, "global_step/max_steps": "2440/9742", "percentage": "25.05%", "elapsed_time": "1h 3m 26s", "remaining_time": "3h 9m 51s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.525e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.636193, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.50195032, "global_step/max_steps": "2445/9742", "percentage": "25.10%", "elapsed_time": "1h 3m 32s", "remaining_time": "3h 9m 37s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.519e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.636512, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.5029768, "global_step/max_steps": "2450/9742", "percentage": "25.15%", "elapsed_time": "1h 3m 38s", "remaining_time": "3h 9m 23s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.513e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.636848, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.50400328, "global_step/max_steps": "2455/9742", "percentage": "25.20%", "elapsed_time": "1h 3m 43s", "remaining_time": "3h 9m 10s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.507e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.637183, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.50502977, "global_step/max_steps": "2460/9742", "percentage": "25.25%", "elapsed_time": "1h 3m 49s", "remaining_time": "3h 8m 56s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.502e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.637474, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.50605625, "global_step/max_steps": "2465/9742", "percentage": "25.30%", "elapsed_time": "1h 3m 55s", "remaining_time": "3h 8m 43s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.496e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.637812, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.50708273, "global_step/max_steps": "2470/9742", "percentage": "25.35%", "elapsed_time": "1h 4m 1s", "remaining_time": "3h 8m 30s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.49e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.638151, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.50810922, "global_step/max_steps": "2475/9742", "percentage": "25.41%", "elapsed_time": "1h 4m 7s", "remaining_time": "3h 8m 16s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.484e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.638488, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.5091357, "global_step/max_steps": "2480/9742", "percentage": "25.46%", "elapsed_time": "1h 4m 13s", "remaining_time": "3h 8m 2s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.479e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.638815, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.6, "rewards/ClassificationReward/std": 0.0, "reward": 0.6, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.51016218, "global_step/max_steps": "2485/9742", "percentage": "25.51%", "elapsed_time": "1h 4m 19s", "remaining_time": "3h 7m 49s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.473e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.639081, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.51118867, "global_step/max_steps": "2490/9742", "percentage": "25.56%", "elapsed_time": "1h 4m 25s", "remaining_time": "3h 7m 37s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.467e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.639392, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.51221515, "global_step/max_steps": "2495/9742", "percentage": "25.61%", "elapsed_time": "1h 4m 31s", "remaining_time": "3h 7m 24s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.461e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.639727, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.51324163, "global_step/max_steps": "2500/9742", "percentage": "25.66%", "elapsed_time": "1h 4m 36s", "remaining_time": "3h 7m 10s"}
+{"eval_loss": 0.0, "eval_completions/mean_length": 8.00184843, "eval_completions/min_length": 8.0, "eval_completions/max_length": 8.02218115, "eval_completions/clipped_ratio": 0.0, "eval_rewards/ClassificationReward/mean": 0.85890327, "eval_rewards/ClassificationReward/std": 0.0009101, "eval_reward": 0.85890327, "eval_reward_std": 0.0009101, "eval_frac_reward_zero_std": 0.99815157, "eval_clip_ratio/low_mean": 0.0, "eval_clip_ratio/low_min": 0.0, "eval_clip_ratio/high_mean": 0.0, "eval_clip_ratio/high_max": 0.0, "eval_clip_ratio/region_mean": 0.0, "eval_runtime": 157.9995, "eval_samples_per_second": 3.424, "eval_steps_per_second": 0.291, "epoch": 0.51324163, "global_step/max_steps": "2500/9742", "percentage": "25.66%", "elapsed_time": "1h 7m 14s", "remaining_time": "3h 14m 48s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.455e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.615159, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.51426812, "global_step/max_steps": "2505/9742", "percentage": "25.71%", "elapsed_time": "1h 7m 21s", "remaining_time": "3h 14m 34s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.45e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.615463, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.5152946, "global_step/max_steps": "2510/9742", "percentage": "25.76%", "elapsed_time": "1h 7m 27s", "remaining_time": "3h 14m 21s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.444e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.6158, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.51632108, "global_step/max_steps": "2515/9742", "percentage": "25.82%", "elapsed_time": "1h 7m 33s", "remaining_time": "3h 14m 6s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.438e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.616149, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.6, "rewards/ClassificationReward/std": 0.0, "reward": 0.6, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.51734757, "global_step/max_steps": "2520/9742", "percentage": "25.87%", "elapsed_time": "1h 7m 38s", "remaining_time": "3h 13m 52s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.432e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.616497, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.51837405, "global_step/max_steps": "2525/9742", "percentage": "25.92%", "elapsed_time": "1h 7m 44s", "remaining_time": "3h 13m 37s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.426e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.616591, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.51940053, "global_step/max_steps": "2530/9742", "percentage": "25.97%", "elapsed_time": "1h 7m 52s", "remaining_time": "3h 13m 28s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.42e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.616889, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.52042702, "global_step/max_steps": "2535/9742", "percentage": "26.02%", "elapsed_time": "1h 7m 58s", "remaining_time": "3h 13m 14s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.414e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.617226, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.5214535, "global_step/max_steps": "2540/9742", "percentage": "26.07%", "elapsed_time": "1h 8m 4s", "remaining_time": "3h 13m 0s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.409e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.61758, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.52247998, "global_step/max_steps": "2545/9742", "percentage": "26.12%", "elapsed_time": "1h 8m 9s", "remaining_time": "3h 12m 45s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.403e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.617925, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.52350647, "global_step/max_steps": "2550/9742", "percentage": "26.18%", "elapsed_time": "1h 8m 15s", "remaining_time": "3h 12m 31s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.397e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.618265, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.52453295, "global_step/max_steps": "2555/9742", "percentage": "26.23%", "elapsed_time": "1h 8m 21s", "remaining_time": "3h 12m 17s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.391e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.618596, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.52555943, "global_step/max_steps": "2560/9742", "percentage": "26.28%", "elapsed_time": "1h 8m 27s", "remaining_time": "3h 12m 3s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.385e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.618883, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.52658592, "global_step/max_steps": "2565/9742", "percentage": "26.33%", "elapsed_time": "1h 8m 33s", "remaining_time": "3h 11m 49s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.379e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.619221, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.5276124, "global_step/max_steps": "2570/9742", "percentage": "26.38%", "elapsed_time": "1h 8m 39s", "remaining_time": "3h 11m 35s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.373e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.619554, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.52863888, "global_step/max_steps": "2575/9742", "percentage": "26.43%", "elapsed_time": "1h 8m 45s", "remaining_time": "3h 11m 21s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.367e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.619884, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.52966537, "global_step/max_steps": "2580/9742", "percentage": "26.48%", "elapsed_time": "1h 8m 51s", "remaining_time": "3h 11m 7s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.361e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.620226, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.53069185, "global_step/max_steps": "2585/9742", "percentage": "26.53%", "elapsed_time": "1h 8m 56s", "remaining_time": "3h 10m 53s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.355e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.620506, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.53171833, "global_step/max_steps": "2590/9742", "percentage": "26.59%", "elapsed_time": "1h 9m 3s", "remaining_time": "3h 10m 40s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.349e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.620856, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.53274482, "global_step/max_steps": "2595/9742", "percentage": "26.64%", "elapsed_time": "1h 9m 8s", "remaining_time": "3h 10m 26s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.343e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.621183, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.5337713, "global_step/max_steps": "2600/9742", "percentage": "26.69%", "elapsed_time": "1h 9m 14s", "remaining_time": "3h 10m 12s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.337e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.621509, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.53479778, "global_step/max_steps": "2605/9742", "percentage": "26.74%", "elapsed_time": "1h 9m 20s", "remaining_time": "3h 9m 58s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.331e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.621828, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.53582427, "global_step/max_steps": "2610/9742", "percentage": "26.79%", "elapsed_time": "1h 9m 26s", "remaining_time": "3h 9m 44s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.325e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.622102, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.53685075, "global_step/max_steps": "2615/9742", "percentage": "26.84%", "elapsed_time": "1h 9m 32s", "remaining_time": "3h 9m 31s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.319e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.622435, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.53787723, "global_step/max_steps": "2620/9742", "percentage": "26.89%", "elapsed_time": "1h 9m 38s", "remaining_time": "3h 9m 17s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.313e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.622759, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.6, "rewards/ClassificationReward/std": 0.0, "reward": 0.6, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.53890372, "global_step/max_steps": "2625/9742", "percentage": "26.95%", "elapsed_time": "1h 9m 44s", "remaining_time": "3h 9m 4s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.307e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.623093, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.5399302, "global_step/max_steps": "2630/9742", "percentage": "27.00%", "elapsed_time": "1h 9m 49s", "remaining_time": "3h 8m 50s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.301e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.623412, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.54095668, "global_step/max_steps": "2635/9742", "percentage": "27.05%", "elapsed_time": "1h 9m 55s", "remaining_time": "3h 8m 36s"}
+{"loss": 1e-08, "grad_norm": 0.0, "learning_rate": 8.295e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.62376, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.86666667, "rewards/ClassificationReward/std": 0.09847319, "reward": 0.86666667, "reward_std": 0.0984732, "frac_reward_zero_std": 0.8, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.54198317, "global_step/max_steps": "2640/9742", "percentage": "27.10%", "elapsed_time": "1h 10m 1s", "remaining_time": "3h 8m 22s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.289e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.624043, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.54300965, "global_step/max_steps": "2645/9742", "percentage": "27.15%", "elapsed_time": "1h 10m 7s", "remaining_time": "3h 8m 9s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.283e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.624365, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.54403613, "global_step/max_steps": "2650/9742", "percentage": "27.20%", "elapsed_time": "1h 10m 13s", "remaining_time": "3h 7m 55s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.277e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.624688, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.54506262, "global_step/max_steps": "2655/9742", "percentage": "27.25%", "elapsed_time": "1h 10m 19s", "remaining_time": "3h 7m 42s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.271e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.625026, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.6, "rewards/ClassificationReward/std": 0.0, "reward": 0.6, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.5460891, "global_step/max_steps": "2660/9742", "percentage": "27.30%", "elapsed_time": "1h 10m 24s", "remaining_time": "3h 7m 28s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.264e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.625344, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.54711558, "global_step/max_steps": "2665/9742", "percentage": "27.36%", "elapsed_time": "1h 10m 30s", "remaining_time": "3h 7m 14s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.258e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.625658, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.83333333, "rewards/ClassificationReward/std": 0.07784989, "reward": 0.83333333, "reward_std": 0.07784989, "frac_reward_zero_std": 0.8, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.54814207, "global_step/max_steps": "2670/9742", "percentage": "27.41%", "elapsed_time": "1h 10m 36s", "remaining_time": "3h 7m 1s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.252e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.625976, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.6, "rewards/ClassificationReward/std": 0.0, "reward": 0.6, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.54916855, "global_step/max_steps": "2675/9742", "percentage": "27.46%", "elapsed_time": "1h 10m 42s", "remaining_time": "3h 6m 47s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.246e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.626256, "completions/mean_length": 8.3833334, "completions/min_length": 8.0, "completions/max_length": 12.6, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.55019503, "global_step/max_steps": "2680/9742", "percentage": "27.51%", "elapsed_time": "1h 10m 48s", "remaining_time": "3h 6m 34s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.24e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.626551, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.55122152, "global_step/max_steps": "2685/9742", "percentage": "27.56%", "elapsed_time": "1h 10m 54s", "remaining_time": "3h 6m 21s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.234e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.626864, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.6, "rewards/ClassificationReward/std": 0.0, "reward": 0.6, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.552248, "global_step/max_steps": "2690/9742", "percentage": "27.61%", "elapsed_time": "1h 11m 0s", "remaining_time": "3h 6m 8s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.228e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.627124, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.55327448, "global_step/max_steps": "2695/9742", "percentage": "27.66%", "elapsed_time": "1h 11m 6s", "remaining_time": "3h 5m 55s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.221e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.627445, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.55430096, "global_step/max_steps": "2700/9742", "percentage": "27.72%", "elapsed_time": "1h 11m 12s", "remaining_time": "3h 5m 42s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.215e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.627774, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.55532745, "global_step/max_steps": "2705/9742", "percentage": "27.77%", "elapsed_time": "1h 11m 17s", "remaining_time": "3h 5m 28s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.209e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.628084, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.55635393, "global_step/max_steps": "2710/9742", "percentage": "27.82%", "elapsed_time": "1h 11m 23s", "remaining_time": "3h 5m 15s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.203e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.628387, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.55738041, "global_step/max_steps": "2715/9742", "percentage": "27.87%", "elapsed_time": "1h 11m 29s", "remaining_time": "3h 5m 2s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.197e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.628694, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.5584069, "global_step/max_steps": "2720/9742", "percentage": "27.92%", "elapsed_time": "1h 11m 35s", "remaining_time": "3h 4m 49s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.191e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.628965, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.55943338, "global_step/max_steps": "2725/9742", "percentage": "27.97%", "elapsed_time": "1h 11m 41s", "remaining_time": "3h 4m 36s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.184e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.62928, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.56045986, "global_step/max_steps": "2730/9742", "percentage": "28.02%", "elapsed_time": "1h 11m 47s", "remaining_time": "3h 4m 23s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.178e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.629594, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.56148635, "global_step/max_steps": "2735/9742", "percentage": "28.07%", "elapsed_time": "1h 11m 53s", "remaining_time": "3h 4m 9s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.172e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.629903, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.56251283, "global_step/max_steps": "2740/9742", "percentage": "28.13%", "elapsed_time": "1h 11m 58s", "remaining_time": "3h 3m 56s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.166e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.630215, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.6, "rewards/ClassificationReward/std": 0.0, "reward": 0.6, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.56353931, "global_step/max_steps": "2745/9742", "percentage": "28.18%", "elapsed_time": "1h 12m 4s", "remaining_time": "3h 3m 43s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.159e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.630424, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.5645658, "global_step/max_steps": "2750/9742", "percentage": "28.23%", "elapsed_time": "1h 12m 11s", "remaining_time": "3h 3m 32s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.153e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.63072, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.56559228, "global_step/max_steps": "2755/9742", "percentage": "28.28%", "elapsed_time": "1h 12m 17s", "remaining_time": "3h 3m 19s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.147e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.631009, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.56661876, "global_step/max_steps": "2760/9742", "percentage": "28.33%", "elapsed_time": "1h 12m 22s", "remaining_time": "3h 3m 6s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.141e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.631312, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.56764525, "global_step/max_steps": "2765/9742", "percentage": "28.38%", "elapsed_time": "1h 12m 28s", "remaining_time": "3h 2m 53s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.134e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.631615, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.6, "rewards/ClassificationReward/std": 0.0, "reward": 0.6, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.56867173, "global_step/max_steps": "2770/9742", "percentage": "28.43%", "elapsed_time": "1h 12m 34s", "remaining_time": "3h 2m 40s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.128e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.631868, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.56969821, "global_step/max_steps": "2775/9742", "percentage": "28.48%", "elapsed_time": "1h 12m 40s", "remaining_time": "3h 2m 28s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.122e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.632163, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.5707247, "global_step/max_steps": "2780/9742", "percentage": "28.54%", "elapsed_time": "1h 12m 46s", "remaining_time": "3h 2m 15s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.115e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.632483, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.57175118, "global_step/max_steps": "2785/9742", "percentage": "28.59%", "elapsed_time": "1h 12m 52s", "remaining_time": "3h 2m 2s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.109e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.63279, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.57277766, "global_step/max_steps": "2790/9742", "percentage": "28.64%", "elapsed_time": "1h 12m 58s", "remaining_time": "3h 1m 49s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.103e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.633094, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.6, "rewards/ClassificationReward/std": 0.0, "reward": 0.6, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.57380415, "global_step/max_steps": "2795/9742", "percentage": "28.69%", "elapsed_time": "1h 13m 3s", "remaining_time": "3h 1m 36s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.097e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.633352, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.57483063, "global_step/max_steps": "2800/9742", "percentage": "28.74%", "elapsed_time": "1h 13m 9s", "remaining_time": "3h 1m 23s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.09e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.633643, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.57585711, "global_step/max_steps": "2805/9742", "percentage": "28.79%", "elapsed_time": "1h 13m 15s", "remaining_time": "3h 1m 11s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.084e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.63395, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.5768836, "global_step/max_steps": "2810/9742", "percentage": "28.84%", "elapsed_time": "1h 13m 21s", "remaining_time": "3h 0m 58s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.077e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.634243, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.57791008, "global_step/max_steps": "2815/9742", "percentage": "28.90%", "elapsed_time": "1h 13m 27s", "remaining_time": "3h 0m 45s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.071e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.634536, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.57893656, "global_step/max_steps": "2820/9742", "percentage": "28.95%", "elapsed_time": "1h 13m 33s", "remaining_time": "3h 0m 32s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.065e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.634832, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.57996305, "global_step/max_steps": "2825/9742", "percentage": "29.00%", "elapsed_time": "1h 13m 38s", "remaining_time": "3h 0m 19s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.058e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.635079, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.58098953, "global_step/max_steps": "2830/9742", "percentage": "29.05%", "elapsed_time": "1h 13m 45s", "remaining_time": "3h 0m 7s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.052e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.635378, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.58201601, "global_step/max_steps": "2835/9742", "percentage": "29.10%", "elapsed_time": "1h 13m 50s", "remaining_time": "2h 59m 55s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.046e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.635672, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.5830425, "global_step/max_steps": "2840/9742", "percentage": "29.15%", "elapsed_time": "1h 13m 56s", "remaining_time": "2h 59m 42s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.039e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.635954, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.6, "rewards/ClassificationReward/std": 0.0, "reward": 0.6, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.58406898, "global_step/max_steps": "2845/9742", "percentage": "29.20%", "elapsed_time": "1h 14m 2s", "remaining_time": "2h 59m 29s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.033e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.636235, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.58509546, "global_step/max_steps": "2850/9742", "percentage": "29.25%", "elapsed_time": "1h 14m 8s", "remaining_time": "2h 59m 17s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.026e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.636475, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.58612195, "global_step/max_steps": "2855/9742", "percentage": "29.31%", "elapsed_time": "1h 14m 14s", "remaining_time": "2h 59m 5s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.02e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.636765, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.6, "rewards/ClassificationReward/std": 0.0, "reward": 0.6, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.58714843, "global_step/max_steps": "2860/9742", "percentage": "29.36%", "elapsed_time": "1h 14m 20s", "remaining_time": "2h 58m 53s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.014e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.637048, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.58817491, "global_step/max_steps": "2865/9742", "percentage": "29.41%", "elapsed_time": "1h 14m 26s", "remaining_time": "2h 58m 40s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 8.007e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.637321, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.6, "rewards/ClassificationReward/std": 0.0, "reward": 0.6, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.5892014, "global_step/max_steps": "2870/9742", "percentage": "29.46%", "elapsed_time": "1h 14m 32s", "remaining_time": "2h 58m 28s"}
+{"loss": 1e-08, "grad_norm": 0.0, "learning_rate": 8.001e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.637647, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.75, "rewards/ClassificationReward/std": 0.0904534, "reward": 0.75, "reward_std": 0.0904534, "frac_reward_zero_std": 0.8, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.59022788, "global_step/max_steps": "2875/9742", "percentage": "29.51%", "elapsed_time": "1h 14m 37s", "remaining_time": "2h 58m 15s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 7.994e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.637889, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.59125436, "global_step/max_steps": "2880/9742", "percentage": "29.56%", "elapsed_time": "1h 14m 43s", "remaining_time": "2h 58m 3s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 7.988e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.638176, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.59228085, "global_step/max_steps": "2885/9742", "percentage": "29.61%", "elapsed_time": "1h 14m 49s", "remaining_time": "2h 57m 50s"}
+{"loss": 1e-08, "grad_norm": 78.51765442, "learning_rate": 7.981e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.638484, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.93333334, "rewards/ClassificationReward/std": 0.0984732, "reward": 0.93333334, "reward_std": 0.0984732, "frac_reward_zero_std": 0.8, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.59330733, "global_step/max_steps": "2890/9742", "percentage": "29.67%", "elapsed_time": "1h 14m 55s", "remaining_time": "2h 57m 38s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 7.975e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.638762, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.59433381, "global_step/max_steps": "2895/9742", "percentage": "29.72%", "elapsed_time": "1h 15m 1s", "remaining_time": "2h 57m 25s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 7.968e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.639046, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.6, "rewards/ClassificationReward/std": 0.0, "reward": 0.6, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.5953603, "global_step/max_steps": "2900/9742", "percentage": "29.77%", "elapsed_time": "1h 15m 7s", "remaining_time": "2h 57m 13s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 7.962e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.639347, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.59638678, "global_step/max_steps": "2905/9742", "percentage": "29.82%", "elapsed_time": "1h 15m 12s", "remaining_time": "2h 57m 0s"}
+{"loss": 0.0, "grad_norm": 15.94068909, "learning_rate": 7.955e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.639622, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.9, "rewards/ClassificationReward/std": 0.10444659, "reward": 0.9, "reward_std": 0.10444659, "frac_reward_zero_std": 0.8, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.59741326, "global_step/max_steps": "2910/9742", "percentage": "29.87%", "elapsed_time": "1h 15m 18s", "remaining_time": "2h 56m 48s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 7.949e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.639889, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.59843975, "global_step/max_steps": "2915/9742", "percentage": "29.92%", "elapsed_time": "1h 15m 24s", "remaining_time": "2h 56m 36s"}
+{"loss": 2e-08, "grad_norm": 0.0, "learning_rate": 7.942e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.64021, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.98333334, "rewards/ClassificationReward/std": 0.05773503, "reward": 0.98333334, "reward_std": 0.05773503, "frac_reward_zero_std": 0.8, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.59946623, "global_step/max_steps": "2920/9742", "percentage": "29.97%", "elapsed_time": "1h 15m 29s", "remaining_time": "2h 56m 23s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 7.936e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.640494, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.60049271, "global_step/max_steps": "2925/9742", "percentage": "30.02%", "elapsed_time": "1h 15m 35s", "remaining_time": "2h 56m 11s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 7.929e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.640777, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.6015192, "global_step/max_steps": "2930/9742", "percentage": "30.08%", "elapsed_time": "1h 15m 41s", "remaining_time": "2h 55m 58s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 7.923e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.64101, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.60254568, "global_step/max_steps": "2935/9742", "percentage": "30.13%", "elapsed_time": "1h 15m 47s", "remaining_time": "2h 55m 47s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 7.916e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.641295, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.60357216, "global_step/max_steps": "2940/9742", "percentage": "30.18%", "elapsed_time": "1h 15m 53s", "remaining_time": "2h 55m 34s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 7.91e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.641575, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.60459865, "global_step/max_steps": "2945/9742", "percentage": "30.23%", "elapsed_time": "1h 15m 59s", "remaining_time": "2h 55m 22s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 7.903e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.641843, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.60562513, "global_step/max_steps": "2950/9742", "percentage": "30.28%", "elapsed_time": "1h 16m 5s", "remaining_time": "2h 55m 10s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 7.897e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.642125, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.60665161, "global_step/max_steps": "2955/9742", "percentage": "30.33%", "elapsed_time": "1h 16m 10s", "remaining_time": "2h 54m 58s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 7.89e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.642351, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.60767809, "global_step/max_steps": "2960/9742", "percentage": "30.38%", "elapsed_time": "1h 16m 17s", "remaining_time": "2h 54m 47s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 7.883e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.642609, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.60870458, "global_step/max_steps": "2965/9742", "percentage": "30.44%", "elapsed_time": "1h 16m 22s", "remaining_time": "2h 54m 35s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 7.877e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.642879, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.60973106, "global_step/max_steps": "2970/9742", "percentage": "30.49%", "elapsed_time": "1h 16m 28s", "remaining_time": "2h 54m 23s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 7.87e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.64315, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.61075754, "global_step/max_steps": "2975/9742", "percentage": "30.54%", "elapsed_time": "1h 16m 34s", "remaining_time": "2h 54m 11s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 7.864e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.643417, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.6, "rewards/ClassificationReward/std": 0.0, "reward": 0.6, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.61178403, "global_step/max_steps": "2980/9742", "percentage": "30.59%", "elapsed_time": "1h 16m 40s", "remaining_time": "2h 53m 59s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 7.857e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.643699, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.61281051, "global_step/max_steps": "2985/9742", "percentage": "30.64%", "elapsed_time": "1h 16m 46s", "remaining_time": "2h 53m 46s"}
+{"loss": 1e-08, "grad_norm": 0.0, "learning_rate": 7.85e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.643956, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.75, "rewards/ClassificationReward/std": 0.0904534, "reward": 0.75, "reward_std": 0.0904534, "frac_reward_zero_std": 0.8, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.61383699, "global_step/max_steps": "2990/9742", "percentage": "30.69%", "elapsed_time": "1h 16m 52s", "remaining_time": "2h 53m 35s"}
+{"loss": 2e-08, "grad_norm": 0.0, "learning_rate": 7.844e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.644255, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.98333334, "rewards/ClassificationReward/std": 0.05773503, "reward": 0.98333334, "reward_std": 0.05773503, "frac_reward_zero_std": 0.8, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.61486348, "global_step/max_steps": "2995/9742", "percentage": "30.74%", "elapsed_time": "1h 16m 57s", "remaining_time": "2h 53m 22s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 7.837e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.644512, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.61588996, "global_step/max_steps": "3000/9742", "percentage": "30.79%", "elapsed_time": "1h 17m 3s", "remaining_time": "2h 53m 10s"}
+{"eval_loss": -0.0, "eval_completions/mean_length": 8.08179298, "eval_completions/min_length": 8.0, "eval_completions/max_length": 8.98151571, "eval_completions/clipped_ratio": 0.00015404, "eval_rewards/ClassificationReward/mean": 0.76186075, "eval_rewards/ClassificationReward/std": 0.00422456, "eval_reward": 0.76186075, "eval_reward_std": 0.00422456, "eval_frac_reward_zero_std": 0.98890943, "eval_clip_ratio/low_mean": 0.0, "eval_clip_ratio/low_min": 0.0, "eval_clip_ratio/high_mean": 0.0, "eval_clip_ratio/high_max": 0.0, "eval_clip_ratio/region_mean": 0.0, "eval_runtime": 162.898, "eval_samples_per_second": 3.321, "eval_steps_per_second": 0.282, "epoch": 0.61588996, "global_step/max_steps": "3000/9742", "percentage": "30.79%", "elapsed_time": "1h 19m 46s", "remaining_time": "2h 59m 17s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 7.83e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.622949, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.61691644, "global_step/max_steps": "3005/9742", "percentage": "30.85%", "elapsed_time": "1h 19m 52s", "remaining_time": "2h 59m 5s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 7.824e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.623189, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.61794293, "global_step/max_steps": "3010/9742", "percentage": "30.90%", "elapsed_time": "1h 19m 58s", "remaining_time": "2h 58m 53s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 7.817e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.623478, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.61896941, "global_step/max_steps": "3015/9742", "percentage": "30.95%", "elapsed_time": "1h 20m 4s", "remaining_time": "2h 58m 40s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 7.81e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.623754, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.61999589, "global_step/max_steps": "3020/9742", "percentage": "31.00%", "elapsed_time": "1h 20m 10s", "remaining_time": "2h 58m 27s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 7.804e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.624033, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 1.0, "rewards/ClassificationReward/std": 0.0, "reward": 1.0, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.62102238, "global_step/max_steps": "3025/9742", "percentage": "31.05%", "elapsed_time": "1h 20m 16s", "remaining_time": "2h 58m 15s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 7.797e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.624314, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.8, "rewards/ClassificationReward/std": 0.0, "reward": 0.8, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.62204886, "global_step/max_steps": "3030/9742", "percentage": "31.10%", "elapsed_time": "1h 20m 22s", "remaining_time": "2h 58m 2s"}
+{"loss": 0.0, "grad_norm": 0.0, "learning_rate": 7.79e-05, "memory(GiB)": 75.33, "train_speed(iter/s)": 0.624558, "completions/mean_length": 8.0, "completions/min_length": 8.0, "completions/max_length": 8.0, "completions/clipped_ratio": 0.0, "rewards/ClassificationReward/mean": 0.6, "rewards/ClassificationReward/std": 0.0, "reward": 0.6, "reward_std": 0.0, "frac_reward_zero_std": 1.0, "clip_ratio/low_mean": 0.0, "clip_ratio/low_min": 0.0, "clip_ratio/high_mean": 0.0, "clip_ratio/high_max": 0.0, "clip_ratio/region_mean": 0.0, "epoch": 0.62307534, "global_step/max_steps": "3035/9742", "percentage": "31.15%", "elapsed_time": "1h 20m 28s", "remaining_time": "2h 57m 50s"}
--- a/merges.txt
+++ b/merges.txt
--- a/model-00001-of-00002.safetensors
+++ b/model-00001-of-00002.safetensors
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:3f2d982939c9dfb0f39ae066b947d388788471d07b14fb2b7e76d45dfdb640fe
+size 4967215360
--- a/model-00002-of-00002.safetensors
+++ b/model-00002-of-00002.safetensors
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:2d7b1bf5e53261796beda7c5f79791c8e6bf75163c7e481513ba41e6a49c6bdb
+size 3077766632
--- a/model.safetensors.index.json
+++ b/model.safetensors.index.json
@@ -0,0 +1,406 @@
+{
+  "metadata": {
+    "total_parameters": 4022468096,
+    "total_size": 8044936192
+  },
+  "weight_map": {
+    "model.embed_tokens.weight": "model-00001-of-00002.safetensors",
+    "model.layers.0.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.0.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.0.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.0.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.0.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.0.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.0.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.0.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.0.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.0.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.1.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.1.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.1.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.1.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.1.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.1.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.1.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.1.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.1.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.1.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.1.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.10.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.10.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.10.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.10.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.10.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.10.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.10.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.10.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.10.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.10.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.10.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.11.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.11.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.11.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.11.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.11.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.11.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.11.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.11.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.11.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.11.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.11.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.12.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.12.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.12.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.12.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.12.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.12.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.12.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.12.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.12.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.12.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.12.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.13.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.13.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.13.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.13.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.13.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.13.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.13.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.13.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.13.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.13.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.13.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.14.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.14.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.14.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.14.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.14.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.14.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.14.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.14.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.14.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.14.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.14.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.15.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.15.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.15.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.15.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.15.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.15.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.15.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.15.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.15.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.15.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.15.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.16.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.16.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.16.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.16.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.16.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.16.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.16.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.16.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.16.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.16.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.16.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.17.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.17.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.17.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.17.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.17.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.17.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.17.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.17.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.17.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.17.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.17.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.18.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.18.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.18.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.18.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.18.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.18.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.18.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.18.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.18.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.18.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.18.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.19.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.19.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.19.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.19.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.19.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.19.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.19.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.19.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.19.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.19.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.19.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.2.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.2.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.2.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.2.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.2.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.2.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.2.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.2.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.2.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.2.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.2.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.20.input_layernorm.weight": "model-00002-of-00002.safetensors",
+    "model.layers.20.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.20.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.20.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.20.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
+    "model.layers.20.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.20.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.20.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.20.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.20.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.20.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.21.input_layernorm.weight": "model-00002-of-00002.safetensors",
+    "model.layers.21.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.21.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.21.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.21.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
+    "model.layers.21.self_attn.k_norm.weight": "model-00002-of-00002.safetensors",
+    "model.layers.21.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.21.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.21.self_attn.q_norm.weight": "model-00002-of-00002.safetensors",
+    "model.layers.21.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.21.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.22.input_layernorm.weight": "model-00002-of-00002.safetensors",
+    "model.layers.22.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.22.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.22.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.22.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
+    "model.layers.22.self_attn.k_norm.weight": "model-00002-of-00002.safetensors",
+    "model.layers.22.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.22.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.22.self_attn.q_norm.weight": "model-00002-of-00002.safetensors",
+    "model.layers.22.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.22.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.23.input_layernorm.weight": "model-00002-of-00002.safetensors",
+    "model.layers.23.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.23.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.23.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.23.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
+    "model.layers.23.self_attn.k_norm.weight": "model-00002-of-00002.safetensors",
+    "model.layers.23.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.23.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.23.self_attn.q_norm.weight": "model-00002-of-00002.safetensors",
+    "model.layers.23.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.23.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.24.input_layernorm.weight": "model-00002-of-00002.safetensors",
+    "model.layers.24.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.24.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.24.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.24.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
+    "model.layers.24.self_attn.k_norm.weight": "model-00002-of-00002.safetensors",
+    "model.layers.24.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.24.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.24.self_attn.q_norm.weight": "model-00002-of-00002.safetensors",
+    "model.layers.24.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.24.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.25.input_layernorm.weight": "model-00002-of-00002.safetensors",
+    "model.layers.25.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.25.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.25.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.25.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
+    "model.layers.25.self_attn.k_norm.weight": "model-00002-of-00002.safetensors",
+    "model.layers.25.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.25.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.25.self_attn.q_norm.weight": "model-00002-of-00002.safetensors",
+    "model.layers.25.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.25.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.26.input_layernorm.weight": "model-00002-of-00002.safetensors",
+    "model.layers.26.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.26.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.26.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.26.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
+    "model.layers.26.self_attn.k_norm.weight": "model-00002-of-00002.safetensors",
+    "model.layers.26.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.26.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.26.self_attn.q_norm.weight": "model-00002-of-00002.safetensors",
+    "model.layers.26.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.26.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.27.input_layernorm.weight": "model-00002-of-00002.safetensors",
+    "model.layers.27.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.27.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.27.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.27.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
+    "model.layers.27.self_attn.k_norm.weight": "model-00002-of-00002.safetensors",
+    "model.layers.27.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.27.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.27.self_attn.q_norm.weight": "model-00002-of-00002.safetensors",
+    "model.layers.27.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.27.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.28.input_layernorm.weight": "model-00002-of-00002.safetensors",
+    "model.layers.28.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.28.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.28.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.28.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
+    "model.layers.28.self_attn.k_norm.weight": "model-00002-of-00002.safetensors",
+    "model.layers.28.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.28.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.28.self_attn.q_norm.weight": "model-00002-of-00002.safetensors",
+    "model.layers.28.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.28.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.29.input_layernorm.weight": "model-00002-of-00002.safetensors",
+    "model.layers.29.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.29.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.29.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.29.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
+    "model.layers.29.self_attn.k_norm.weight": "model-00002-of-00002.safetensors",
+    "model.layers.29.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.29.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.29.self_attn.q_norm.weight": "model-00002-of-00002.safetensors",
+    "model.layers.29.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.29.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.3.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.3.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.3.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.3.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.3.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.3.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.3.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.3.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.3.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.3.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.3.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.30.input_layernorm.weight": "model-00002-of-00002.safetensors",
+    "model.layers.30.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.30.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.30.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.30.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
+    "model.layers.30.self_attn.k_norm.weight": "model-00002-of-00002.safetensors",
+    "model.layers.30.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.30.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.30.self_attn.q_norm.weight": "model-00002-of-00002.safetensors",
+    "model.layers.30.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.30.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.31.input_layernorm.weight": "model-00002-of-00002.safetensors",
+    "model.layers.31.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.31.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.31.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.31.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
+    "model.layers.31.self_attn.k_norm.weight": "model-00002-of-00002.safetensors",
+    "model.layers.31.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.31.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.31.self_attn.q_norm.weight": "model-00002-of-00002.safetensors",
+    "model.layers.31.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.31.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.32.input_layernorm.weight": "model-00002-of-00002.safetensors",
+    "model.layers.32.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.32.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.32.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.32.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
+    "model.layers.32.self_attn.k_norm.weight": "model-00002-of-00002.safetensors",
+    "model.layers.32.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.32.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.32.self_attn.q_norm.weight": "model-00002-of-00002.safetensors",
+    "model.layers.32.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.32.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.33.input_layernorm.weight": "model-00002-of-00002.safetensors",
+    "model.layers.33.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.33.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.33.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.33.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
+    "model.layers.33.self_attn.k_norm.weight": "model-00002-of-00002.safetensors",
+    "model.layers.33.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.33.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.33.self_attn.q_norm.weight": "model-00002-of-00002.safetensors",
+    "model.layers.33.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.33.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.34.input_layernorm.weight": "model-00002-of-00002.safetensors",
+    "model.layers.34.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.34.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.34.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.34.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
+    "model.layers.34.self_attn.k_norm.weight": "model-00002-of-00002.safetensors",
+    "model.layers.34.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.34.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.34.self_attn.q_norm.weight": "model-00002-of-00002.safetensors",
+    "model.layers.34.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.34.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.35.input_layernorm.weight": "model-00002-of-00002.safetensors",
+    "model.layers.35.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.35.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.35.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.35.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
+    "model.layers.35.self_attn.k_norm.weight": "model-00002-of-00002.safetensors",
+    "model.layers.35.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.35.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.35.self_attn.q_norm.weight": "model-00002-of-00002.safetensors",
+    "model.layers.35.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.35.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.4.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.4.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.4.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.4.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.4.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.4.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.4.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.4.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.4.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.4.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.4.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.5.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.5.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.5.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.5.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.5.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.5.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.5.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.5.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.5.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.5.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.5.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.6.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.6.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.6.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.6.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.6.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.6.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.6.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.6.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.6.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.6.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.6.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.7.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.7.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.7.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.7.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.7.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.7.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.7.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.7.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.7.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.7.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.7.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.8.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.8.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.8.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.8.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.8.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.8.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.8.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.8.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.8.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.8.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.8.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.9.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.9.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.9.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.9.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.9.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.9.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.9.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.9.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.9.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.9.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.9.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+    "model.norm.weight": "model-00002-of-00002.safetensors"
+  }
+}
--- a/special_tokens_map.json
+++ b/special_tokens_map.json
@@ -0,0 +1,31 @@
+{
+  "additional_special_tokens": [
+    "<|im_start|>",
+    "<|im_end|>",
+    "<|object_ref_start|>",
+    "<|object_ref_end|>",
+    "<|box_start|>",
+    "<|box_end|>",
+    "<|quad_start|>",
+    "<|quad_end|>",
+    "<|vision_start|>",
+    "<|vision_end|>",
+    "<|vision_pad|>",
+    "<|image_pad|>",
+    "<|video_pad|>"
+  ],
+  "eos_token": {
+    "content": "<|im_end|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}
--- a/tokenizer.json
+++ b/tokenizer.json
--- a/tokenizer_config.json
+++ b/tokenizer_config.json
@@ -0,0 +1,239 @@
+{
+  "add_bos_token": false,
+  "add_prefix_space": false,
+  "added_tokens_decoder": {
+    "151643": {
+      "content": "<|endoftext|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151644": {
+      "content": "<|im_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151645": {
+      "content": "<|im_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151646": {
+      "content": "<|object_ref_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151647": {
+      "content": "<|object_ref_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151648": {
+      "content": "<|box_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151649": {
+      "content": "<|box_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151650": {
+      "content": "<|quad_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151651": {
+      "content": "<|quad_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151652": {
+      "content": "<|vision_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151653": {
+      "content": "<|vision_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151654": {
+      "content": "<|vision_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151655": {
+      "content": "<|image_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151656": {
+      "content": "<|video_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151657": {
+      "content": "<tool_call>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151658": {
+      "content": "</tool_call>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151659": {
+      "content": "<|fim_prefix|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151660": {
+      "content": "<|fim_middle|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151661": {
+      "content": "<|fim_suffix|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151662": {
+      "content": "<|fim_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151663": {
+      "content": "<|repo_name|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151664": {
+      "content": "<|file_sep|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151665": {
+      "content": "<tool_response>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151666": {
+      "content": "</tool_response>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151667": {
+      "content": "<think>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151668": {
+      "content": "</think>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    }
+  },
+  "additional_special_tokens": [
+    "<|im_start|>",
+    "<|im_end|>",
+    "<|object_ref_start|>",
+    "<|object_ref_end|>",
+    "<|box_start|>",
+    "<|box_end|>",
+    "<|quad_start|>",
+    "<|quad_end|>",
+    "<|vision_start|>",
+    "<|vision_end|>",
+    "<|vision_pad|>",
+    "<|image_pad|>",
+    "<|video_pad|>"
+  ],
+  "bos_token": null,
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "<|im_end|>",
+  "errors": "replace",
+  "extra_special_tokens": {},
+  "model_max_length": 262144,
+  "pad_token": "<|endoftext|>",
+  "split_special_tokens": false,
+  "tokenizer_class": "Qwen2Tokenizer",
+  "unk_token": null
+}
--- a/training_args.bin
+++ b/training_args.bin
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:ae9ac479c4f779a05fb0d709c2427a7d5583ff2b357f06e6816e43d2b6fdada7
+size 9105
--- a/val_dataset.jsonl
+++ b/val_dataset.jsonl
@@ -0,0 +1,541 @@
+{"messages": [{"role": "user", "content": "what movies did craig robinson appear in for 2013?"}], "solution": "movie"}
+{"messages": [{"role": "user", "content": "who was the lead actor in stranger things in 2024?"}], "solution": "movie"}
+{"messages": [{"role": "user", "content": "which movie won the academy award for best film in 2020?"}], "solution": "movie"}
+{"messages": [{"role": "user", "content": "what was the volume of trading for krkr on the last trading day?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "what's the name of the actress who played the role of queen in the fifth and sixth seasons of the show tv show \"the crown\"?"}], "solution": "open"}
+{"messages": [{"role": "user", "content": "list the top 3 highest earning star wars movies."}], "solution": "movie"}
+{"messages": [{"role": "user", "content": "how old was leonardo dicaprio when he won best actor award?"}], "solution": "movie"}
+{"messages": [{"role": "user", "content": "what are the names of all the movies in the indiana jones franchise?"}], "solution": "movie"}
+{"messages": [{"role": "user", "content": "what are johnny cash's top 3 selling albums?"}], "solution": "music"}
+{"messages": [{"role": "user", "content": "how do the historical market capitalizations of coca-cola and pepsico  compare over the past decade?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "which resident evil movie had the most deaths?"}], "solution": "movie"}
+{"messages": [{"role": "user", "content": "what was abts's stock price at the end of the trading day yesterday?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "what is the abbreviation used to represent the company that owns the new york times in stock market listings?"}], "solution": "open"}
+{"messages": [{"role": "user", "content": "was the song winter wonderland released prior to 2018?"}], "solution": "music"}
+{"messages": [{"role": "user", "content": "how many units did the beatles sell in their career so far?"}], "solution": "music"}
+{"messages": [{"role": "user", "content": "who is the american singer-songwriter who has won 11 grammy awards and is known for her unique voice and poignant lyrics, including her hit songs \"both sides now\" and \"big yellow taxi\"?"}], "solution": "music"}
+{"messages": [{"role": "user", "content": "as the home team during 2023-01, how many of their games did charlotte hornets lose?"}], "solution": "sports"}
+{"messages": [{"role": "user", "content": "what year did the simpsons stop airing?"}], "solution": "movie"}
+{"messages": [{"role": "user", "content": "what was the total market value of aisp's shares?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "in what year was the first \"spider-man\" film released?"}], "solution": "movie"}
+{"messages": [{"role": "user", "content": "what do you mean by the secondary market?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "when was alex van halen the drummer for the band santana?"}], "solution": "music"}
+{"messages": [{"role": "user", "content": "what are the countries that are located in southern africa."}], "solution": "open"}
+{"messages": [{"role": "user", "content": "was rennes able to secure a win in their contest yesterday?"}], "solution": "sports"}
+{"messages": [{"role": "user", "content": "who was brad pitt married to longer, nicole kidman or katie holmes?"}], "solution": "open"}
+{"messages": [{"role": "user", "content": "how many times has lady gaga been named one of billboard's artists of the decade?"}], "solution": "open"}
+{"messages": [{"role": "user", "content": "in 2022, which basketball team had more overall wins, new york knicks or dallas mavericks?"}], "solution": "sports"}
+{"messages": [{"role": "user", "content": "in 2011, which movie was distinguished for its visual effects at the oscars?"}], "solution": "movie"}
+{"messages": [{"role": "user", "content": "the three countries with the highest oil production are...?"}], "solution": "open"}
+{"messages": [{"role": "user", "content": "who has had more number one hits on the us billboard adult contemporary chart, lionel richie or barry manilow?"}], "solution": "music"}
+{"messages": [{"role": "user", "content": "where did g. p. nerli paint whistler's mother?"}], "solution": "open"}
+{"messages": [{"role": "user", "content": "which law school has a longer history, cornell law school or columbia law school?"}], "solution": "open"}
+{"messages": [{"role": "user", "content": "how many albums have been released by the terry bradshaw that wrote asia minor?"}], "solution": "music"}
+{"messages": [{"role": "user", "content": "when was the warrior released?"}], "solution": "music"}
+{"messages": [{"role": "user", "content": "can you tell me the date that westw paid out its initial dividend?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "what was the premiere date of titanic: 100 years on?"}], "solution": "movie"}
+{"messages": [{"role": "user", "content": "who are some notable nba draft picks from 2011?"}], "solution": "sports"}
+{"messages": [{"role": "user", "content": "which team did rayo vallecano play against last week?"}], "solution": "sports"}
+{"messages": [{"role": "user", "content": "can you provide me with the most recent stock price of lemaitre vascular?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "how many olympic bronze medals has michael phelps won?"}], "solution": "sports"}
+{"messages": [{"role": "user", "content": "which five companies have been able to maintain a consistent return on investment (roi) of 50% or higher without any decrease in stock price?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "in 51st grammy (2008), which artist took home the best new artist?"}], "solution": "music"}
+{"messages": [{"role": "user", "content": "what was the volume of trading for krkr on the last trading day?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "who has had more number one hits on the us billboard hot 100 chart, ariana grande or katy perry?"}], "solution": "music"}
+{"messages": [{"role": "user", "content": "can you tell me the earnings per share of lgstw?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "which artist has more number one singles in the uk, calvin harris or abba?"}], "solution": "music"}
+{"messages": [{"role": "user", "content": "what gene wilder movie came out in 2023?"}], "solution": "open"}
+{"messages": [{"role": "user", "content": "what was wnw's stock price at closing time yesterday?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "how many trading days in a week?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "what was blacw's stock price at the close of trading yesterday?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "what was the price of rs stock at closing time on the last trading day?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "how does the performance of gold compare to that of silver over the past year?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "which dj is known for wearing a marshmallow helmet, marshmello or skrillex?"}], "solution": "music"}
+{"messages": [{"role": "user", "content": "on the day that cgi last paid dividends, what was the closing stock price?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "what's the name of brad pitt's hidden pet rabbit?"}], "solution": "open"}
+{"messages": [{"role": "user", "content": "what is the largest lake in the us?"}], "solution": "open"}
+{"messages": [{"role": "user", "content": "if i were to visit the capital of hawaii, how high up would i be?"}], "solution": "open"}
+{"messages": [{"role": "user", "content": "how many films are there in the \"fast & furious\" franchise?"}], "solution": "movie"}
+{"messages": [{"role": "user", "content": "which players have won the nba finals mvp award at least three times?"}], "solution": "sports"}
+{"messages": [{"role": "user", "content": "how many breeds are recognized by the american kennel club?"}], "solution": "open"}
+{"messages": [{"role": "user", "content": "what is the market share of microsoft in the cloud computing industry compared to its competitors?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "which movie did jennifer lawrence co-star in with bradley cooper where she played a character with a mental health condition?"}], "solution": "open"}
+{"messages": [{"role": "user", "content": "how many times was barack obama named one of the world's most powerful people by forbes?"}], "solution": "open"}
+{"messages": [{"role": "user", "content": "what are the 4 smallest cities in the usa?"}], "solution": "open"}
+{"messages": [{"role": "user", "content": "which year did oprah reach $340 million net worth?"}], "solution": "open"}
+{"messages": [{"role": "user", "content": "who was the player who won the most mvp awards in professional basketball last year?"}], "solution": "sports"}
+{"messages": [{"role": "user", "content": "how many music tours has the band arcade fire headlined throughout their career?"}], "solution": "music"}
+{"messages": [{"role": "user", "content": "who was the director of the movie snow on tha bluff?"}], "solution": "movie"}
+{"messages": [{"role": "user", "content": "who is married to the celebrity that won the grammy for best new artist in 2022?"}], "solution": "open"}
+{"messages": [{"role": "user", "content": "on the most recent friday what was the open price of gfi?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "on what date did mapsw begin distributing dividends to its investors?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "how much employees does walmart have?"}], "solution": "open"}
+{"messages": [{"role": "user", "content": "which driver won more f1 races last season, sergio perez or carlos sainz jr?"}], "solution": "sports"}
+{"messages": [{"role": "user", "content": "what team will manchester utd go head-to-head against in their next match in eng-premier league?"}], "solution": "sports"}
+{"messages": [{"role": "user", "content": "what was amazon's average annual revenue for the past 3 years?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "who are the american players ranked are in the top ten of the wta?"}], "solution": "sports"}
+{"messages": [{"role": "user", "content": "who served as the director for the beach boys: nashville sounds in 2000"}], "solution": "movie"}
+{"messages": [{"role": "user", "content": "what was dynatrace's stock price at its lowest point during the previous month?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "tell me the songs released by steve miller in 2019"}], "solution": "music"}
+{"messages": [{"role": "user", "content": "who has won the f1 drivers' championship at least five times?"}], "solution": "sports"}
+{"messages": [{"role": "user", "content": "can you tell me what the lodger title was originally?"}], "solution": "movie"}
+{"messages": [{"role": "user", "content": "who has had more number one hits on the us billboard adult contemporary chart, adele or ed sheeran?"}], "solution": "music"}
+{"messages": [{"role": "user", "content": "how many years has the band radiohead been active for?"}], "solution": "music"}
+{"messages": [{"role": "user", "content": "which team did oklahoma city thunder take on in their matchup on 2022-10-13?"}], "solution": "sports"}
+{"messages": [{"role": "user", "content": "which movie won the oscar best visual effects in 2019?"}], "solution": "movie"}
+{"messages": [{"role": "user", "content": "how old is the current ceo of microsoft?"}], "solution": "open"}
+{"messages": [{"role": "user", "content": "i'm looking for the publisher of alan wake 2. can you help me find that information?"}], "solution": "open"}
+{"messages": [{"role": "user", "content": "how many films were nominated for best picture at the 2022 academy awards?"}], "solution": "movie"}
+{"messages": [{"role": "user", "content": "how many albums does journey have?"}], "solution": "music"}
+{"messages": [{"role": "user", "content": "which film secured the best visual effects oscar in 2005?"}], "solution": "movie"}
+{"messages": [{"role": "user", "content": "how many total albums harry styles release as a solo artist"}], "solution": "open"}
+{"messages": [{"role": "user", "content": "what was the total amount of dividends paid out by regency centers in the last year?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "can you tell me the days in q4 2023 when the stock price of shc closed lower?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "what's the price-to-earnings ratio of moln as of now?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "what are the major sanctioning bodies in professional boxing?"}], "solution": "sports"}
+{"messages": [{"role": "user", "content": "i'm curious which company has higher earnings per share, pxmd or chmg?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "are nba games longer than ncaa basketball games?"}], "solution": "sports"}
+{"messages": [{"role": "user", "content": "what is the price of the the stock with the highest asset percentage in spy holdings?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "who are the top four goalscorers in laliga history?"}], "solution": "sports"}
+{"messages": [{"role": "user", "content": "what was the total value of all green bonds issued in the global market in 2022?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "what's the temperature outside in celsius if it's 66 degrees f?"}], "solution": "open"}
+{"messages": [{"role": "user", "content": "what is the age difference between angelina jolie and billy bob thornton?"}], "solution": "open"}
+{"messages": [{"role": "user", "content": "how many times has justin bieber won the american music award for artist of the year?"}], "solution": "open"}
+{"messages": [{"role": "user", "content": "was mengkuang dam being utilized in 1983?"}], "solution": "open"}
+{"messages": [{"role": "user", "content": "who was the talented performer who received the best actor oscar in 2008?"}], "solution": "movie"}
+{"messages": [{"role": "user", "content": "what was the highest stock price of cnfr on a daily basis this week?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "in 2020, who was praised for best actor at the oscars?"}], "solution": "movie"}
+{"messages": [{"role": "user", "content": "what are the countries that are located in southern africa."}], "solution": "open"}
+{"messages": [{"role": "user", "content": "on which day did the gdev inc. warrant distribute dividends in the last year?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "in 2021, which actor's outstanding performance earned them the coveted best actor oscar?"}], "solution": "movie"}
+{"messages": [{"role": "user", "content": "which country is the largest gold producer?"}], "solution": "open"}
+{"messages": [{"role": "user", "content": "which movie claimed the oscar for best visual effects in 2018?"}], "solution": "movie"}
+{"messages": [{"role": "user", "content": "what is the highest-grossing r-rated movie?"}], "solution": "movie"}
+{"messages": [{"role": "user", "content": "what was the total value of all venture capital investments in the technology sector in 2022?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "on langeland, what is the side of the road that people drive on?"}], "solution": "open"}
+{"messages": [{"role": "user", "content": "what's the current stock price of american rebel holdings?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "what was the initial release date of thick as thieves in us?"}], "solution": "movie"}
+{"messages": [{"role": "user", "content": "who performed in the super bowl halftime show in 2012?"}], "solution": "open"}
+{"messages": [{"role": "user", "content": "which film's visual effects stunned the academy, winning the best visual effects oscar in 2014?"}], "solution": "movie"}
+{"messages": [{"role": "user", "content": "can you name the members of the the bastard fairies group for me?"}], "solution": "music"}
+{"messages": [{"role": "user", "content": "how many people are casted in the movie \"life of pi\""}], "solution": "movie"}
+{"messages": [{"role": "user", "content": "at what age was lorne michaels when he produced his first film?"}], "solution": "movie"}
+{"messages": [{"role": "user", "content": "how many sales were made by sleep dealer at the box office across the globe?"}], "solution": "movie"}
+{"messages": [{"role": "user", "content": "what is a net inflow?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "who was the first actor to play the role of a comic book hero in a live-action movie?"}], "solution": "movie"}
+{"messages": [{"role": "user", "content": "what's the name of the actor who played the role of harvey specter in the tv show \"suits\"?"}], "solution": "open"}
+{"messages": [{"role": "user", "content": "what's the trading volume of bhfal on the last trading day?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "are tinos and karpathos in the same time zone in winter?"}], "solution": "open"}
+{"messages": [{"role": "user", "content": "what album did kings of leon release in 2013, which included the songs \"wait for me\" and \"family tree\"?"}], "solution": "music"}
+{"messages": [{"role": "user", "content": "what was the number of shares traded for ovbc on the previous trading day?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "why is amsterdam generally considered the smallest city of the netherlands?"}], "solution": "open"}
+{"messages": [{"role": "user", "content": "which teams have won the nba championship at least five times?"}], "solution": "sports"}
+{"messages": [{"role": "user", "content": "which company's stock has been traded more frequently this week, tris or  cycc?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "which dow jones companies are down at least 1% today?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "who are the top 3 richest people in the world?"}], "solution": "open"}
+{"messages": [{"role": "user", "content": "can you tell me the date of the most recent dividend distribution by fgfpp?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "how old is the last super bowl halftime performer?"}], "solution": "music"}
+{"messages": [{"role": "user", "content": "on what date did mbly start paying dividends to its investors?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "how many movies has michael bay directed?"}], "solution": "movie"}
+{"messages": [{"role": "user", "content": "which movie did the academy awards honor with the best documentary feature film in 1990?"}], "solution": "movie"}
+{"messages": [{"role": "user", "content": "who will brest take on this week?"}], "solution": "sports"}
+{"messages": [{"role": "user", "content": "on what days did spi energy have a higher closing stock price this week?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "is dreamworks animation owned by time warner or universal pictures?"}], "solution": "movie"}
+{"messages": [{"role": "user", "content": "tell me three most recent releases by thoinot arbeau?"}], "solution": "music"}
+{"messages": [{"role": "user", "content": "what movie received the best documentary feature accolade at the academy awards in 2016?"}], "solution": "movie"}
+{"messages": [{"role": "user", "content": "what percentage of israel adesanya's ufc fights have ended in a decision?"}], "solution": "sports"}
+{"messages": [{"role": "user", "content": "for this week, what was the average daily high stock price of flnt?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "what was the name of justin bieber's album last year?"}], "solution": "music"}
+{"messages": [{"role": "user", "content": "who took the oscar for best visual effects in 2017?"}], "solution": "movie"}
+{"messages": [{"role": "user", "content": "what was the worldwide box office earnings for the iceman?"}], "solution": "movie"}
+{"messages": [{"role": "user", "content": "how many points behind are arsenal from the league leader?"}], "solution": "sports"}
+{"messages": [{"role": "user", "content": "who has had more number one hits on the us billboard hot 100 chart, beyoncé or rihanna?"}], "solution": "music"}
+{"messages": [{"role": "user", "content": "what is the abbreviation used to represent the company that owns the new york times in stock market listings?"}], "solution": "open"}
+{"messages": [{"role": "user", "content": "what did chad kroeger do before joining nickelback?"}], "solution": "music"}
+{"messages": [{"role": "user", "content": "for how long did rihanna take a break from making music?"}], "solution": "music"}
+{"messages": [{"role": "user", "content": "when did one for all start performing together?"}], "solution": "music"}
+{"messages": [{"role": "user", "content": "what was rxrx's price at the end of the trading session yesterday?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "for best actor award, in 2002 who received this?"}], "solution": "movie"}
+{"messages": [{"role": "user", "content": "which company in the russell 2000 index has the highest return on equity (roe)?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "in terms of membership, which sports federation is smaller, fédération internationale de l'automobile or fifa?"}], "solution": "open"}
+{"messages": [{"role": "user", "content": "in the 2022 season, which basketball team had more total wins, golden state warriors or los angeles lakers?"}], "solution": "sports"}
+{"messages": [{"role": "user", "content": "which company in the nasdaq 100 index has the highest gross margin?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "what's the latest film that walt becker has directed?"}], "solution": "movie"}
+{"messages": [{"role": "user", "content": "when did the movie \"blade runner\" take place?"}], "solution": "movie"}
+{"messages": [{"role": "user", "content": "who was the player of the year in the premier league last season?"}], "solution": "sports"}
+{"messages": [{"role": "user", "content": "what is the driving side in ireland?"}], "solution": "open"}
+{"messages": [{"role": "user", "content": "what was the total value of all initial public offerings (ipos) proceeds in the nasdaq in 2022?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "who sang the chorus in the song \"no church in the wild\"?"}], "solution": "music"}
+{"messages": [{"role": "user", "content": "what was the average daily high of tbmc this week?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "who is the american singer-songwriter who has won 4 grammy awards and is known for her soulful voice and genre-bending music, which blends elements of r&b, rock, and hip-hop?"}], "solution": "music"}
+{"messages": [{"role": "user", "content": "can you tell me the number of songs that paul mccartney released between 1988 and 1998?"}], "solution": "music"}
+{"messages": [{"role": "user", "content": "did peter viertel write the screenplay for \"casablanca\"?"}], "solution": "movie"}
+{"messages": [{"role": "user", "content": "when did hamburg become the biggest city of germany?"}], "solution": "open"}
+{"messages": [{"role": "user", "content": "how much did toyota invest in tesla stock in may 2010?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "who has played drums for the red hot chili peppers?"}], "solution": "music"}
+{"messages": [{"role": "user", "content": "how many times has lady gaga been credited with an acting role?"}], "solution": "open"}
+{"messages": [{"role": "user", "content": "what is loan deferment?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "what's the latest stock price of oramed pharmaceuticals that's available?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "how many times does apple report their earnings each year?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "what is the price of microsoft when they ipo?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "who was the comeback player of the year in the nfl?"}], "solution": "sports"}
+{"messages": [{"role": "user", "content": "what was the singer of the song that won song of the year in the 59th grammy (2016)?"}], "solution": "music"}
+{"messages": [{"role": "user", "content": "in the entirety of 2021, detroit pistons did win how many games?"}], "solution": "sports"}
+{"messages": [{"role": "user", "content": "what percentage of tencent music users are premium subscribers?"}], "solution": "music"}
+{"messages": [{"role": "user", "content": "how many employees did microsoft have in 2023, compared to 2005?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "which country currently has the highest stock market, usa or japan?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "what was the name of the movie that won the most golden globe awards in a single year, including best motion picture – drama, best director, and best actor – motion picture drama?"}], "solution": "movie"}
+{"messages": [{"role": "user", "content": "which movie has a higher number of original songs, grease or la la land?"}], "solution": "movie"}
+{"messages": [{"role": "user", "content": "what is the price-to-earnings ratio of pmtu"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "can you list tom hanks' five best rated films on rotten tomatoes?"}], "solution": "open"}
+{"messages": [{"role": "user", "content": "can you tell me the opening price of gan for last tuesday?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "abacus life's maximum price last week?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "who were the members of the band aerosmith?"}], "solution": "music"}
+{"messages": [{"role": "user", "content": "which singer wrote dark nevada dream?"}], "solution": "music"}
+{"messages": [{"role": "user", "content": "which of them started their career earlier, the offspring or fall out boy?"}], "solution": "music"}
+{"messages": [{"role": "user", "content": "who directed 21 & over in 2024?"}], "solution": "movie"}
+{"messages": [{"role": "user", "content": "how many races did lewis hamilton win last season?"}], "solution": "sports"}
+{"messages": [{"role": "user", "content": "how many movies have featured songs by the band queen?"}], "solution": "music"}
+{"messages": [{"role": "user", "content": "how many times has collin morikawa won the masters tournament?"}], "solution": "sports"}
+{"messages": [{"role": "user", "content": "can you tell me the maximum price that dermtech's stock reached last week?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "what was the low price of meta stock on feb 14 2024"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "what pixar films were released after 2017?"}], "solution": "movie"}
+{"messages": [{"role": "user", "content": "is prelude 31 the original title of the movie?"}], "solution": "movie"}
+{"messages": [{"role": "user", "content": "what school did the 2016 heisman trophy winner play for?"}], "solution": "sports"}
+{"messages": [{"role": "user", "content": "what price did the encore wire corporation open today?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "which company in the s&p 500 index has the highest gross margin?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "how many interstate highways run through new mexico?"}], "solution": "open"}
+{"messages": [{"role": "user", "content": "which movie has a longer runtime, the lord of the rings: the return of the king or the good, the bad and the ugly?"}], "solution": "movie"}
+{"messages": [{"role": "user", "content": "where did beyoncé get married?"}], "solution": "music"}
+{"messages": [{"role": "user", "content": "can you tell me which country the artist who won the best new artist in 54th grammy (2011) was born?"}], "solution": "music"}
+{"messages": [{"role": "user", "content": "can you give me a score update for nott'ham forest's game?"}], "solution": "sports"}
+{"messages": [{"role": "user", "content": "what's the wattage of the energy output from ksey (am)?"}], "solution": "open"}
+{"messages": [{"role": "user", "content": "what was the number of shares traded for adsk on the final trading day?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "what was the average opening weekend sales for the black panther movies?"}], "solution": "movie"}
+{"messages": [{"role": "user", "content": "how many goals did lens score last week?"}], "solution": "sports"}
+{"messages": [{"role": "user", "content": "what's the most recent album from the puerto rican artist that's been in wwe?"}], "solution": "music"}
+{"messages": [{"role": "user", "content": "which current wba world title holders have held their reign for over 2000 days?"}], "solution": "sports"}
+{"messages": [{"role": "user", "content": "why is the movie review rating for \"inception\" 60%?"}], "solution": "movie"}
+{"messages": [{"role": "user", "content": "how many races did max verstappen win in 2023?"}], "solution": "sports"}
+{"messages": [{"role": "user", "content": "which company's stock is more attractive based on the p/e ratio, inbx or  rxrx?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "how many members are part of red hot chili pepper?"}], "solution": "music"}
+{"messages": [{"role": "user", "content": "in 2015, which actor's outstanding performance earned them the coveted best actor oscar?"}], "solution": "movie"}
+{"messages": [{"role": "user", "content": "did gretsky or ovechkin have more hockey goals?"}], "solution": "sports"}
+{"messages": [{"role": "user", "content": "on average, what was the daily high stock price of xpev over the past week?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "did taylor swifts debut album fearless launched in 2008 in us?"}], "solution": "open"}
+{"messages": [{"role": "user", "content": "who was the player of the year in the premier league last season?"}], "solution": "sports"}
+{"messages": [{"role": "user", "content": "what are justin timberlake's three most recent releases?"}], "solution": "music"}
+{"messages": [{"role": "user", "content": "who are the last three nba first round draft picks from santa clara university?"}], "solution": "sports"}
+{"messages": [{"role": "user", "content": "how many animated movies has reese witherspoon been in?"}], "solution": "open"}
+{"messages": [{"role": "user", "content": "who painted the school of athens?"}], "solution": "open"}
+{"messages": [{"role": "user", "content": "what is the name of the album released by adele in 2011?"}], "solution": "music"}
+{"messages": [{"role": "user", "content": "what are the names of rihanna's albums in chronological order?"}], "solution": "music"}
+{"messages": [{"role": "user", "content": "price change of advanced micro devices on december 25, 2023, compared to the closing price on the previous trading day?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "how many meters taller is kaligandaki a hydroelectric power station than gilgel gibe iii dam?"}], "solution": "open"}
+{"messages": [{"role": "user", "content": "frhc's current market cap."}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "who has had more number one hits on the us billboard rock songs chart, imagine dragons or twenty one pilots?"}], "solution": "music"}
+{"messages": [{"role": "user", "content": "total dividends of andersons last year?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "what has been the highest price that enlight renewable energy traded in today?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "what was the date of liverpool's final game in eng-premier league?"}], "solution": "sports"}
+{"messages": [{"role": "user", "content": "how many epl teams are based in london?"}], "solution": "sports"}
+{"messages": [{"role": "user", "content": "what year did david cassidy release their hit song \"white horse\"?"}], "solution": "music"}
+{"messages": [{"role": "user", "content": "what is the height difference between allal al fassi dam and foum gleita dam in meters?"}], "solution": "open"}
+{"messages": [{"role": "user", "content": "which company has a higher dividend yield, costco or intel?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "where did kefir come from?"}], "solution": "open"}
+{"messages": [{"role": "user", "content": "can you tell me what the title ant bully was originally?"}], "solution": "movie"}
+{"messages": [{"role": "user", "content": "on which date did bkdt distribute dividends last time"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "is algae the original title of the movie?"}], "solution": "movie"}
+{"messages": [{"role": "user", "content": "who has had more number one hits on the us billboard dance/mix show airplay chart, illenium or rihanna?"}], "solution": "music"}
+{"messages": [{"role": "user", "content": "how many movies did columbia pictures release between 2010 to 2019?"}], "solution": "movie"}
+{"messages": [{"role": "user", "content": "which team went head-to-head with denver nuggets in 2023-01-25?"}], "solution": "sports"}
+{"messages": [{"role": "user", "content": "where was beethoven born in the 90s?"}], "solution": "music"}
+{"messages": [{"role": "user", "content": "how many total home runs did barry bonds hit in his career?"}], "solution": "sports"}
+{"messages": [{"role": "user", "content": "in 2022, memphis grizzlies won how many times?"}], "solution": "sports"}
+{"messages": [{"role": "user", "content": "what is the largest country that is bordered by the pacific ocean?"}], "solution": "open"}
+{"messages": [{"role": "user", "content": "when did drake release their most popular album?"}], "solution": "open"}
+{"messages": [{"role": "user", "content": "how many automaker companies has over 300billion market cap"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "at what age was michelle trachtenberg in her film debut?"}], "solution": "movie"}
+{"messages": [{"role": "user", "content": "how does the performance of gold compare to that of silver over the past year?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "in their upcoming fixture in eng-premier league, which team will liverpool face off against?"}], "solution": "sports"}
+{"messages": [{"role": "user", "content": "how many premium subscribers did spotify gain or lose from the beginning to end of the fiscal year 2022?"}], "solution": "music"}
+{"messages": [{"role": "user", "content": "what ended the war of 1812?"}], "solution": "open"}
+{"messages": [{"role": "user", "content": "what was abts's stock price at the end of the trading day yesterday?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "how long have kelly ripa and mark consuelos been married?"}], "solution": "open"}
+{"messages": [{"role": "user", "content": "what are the top 3 tech stocks that rise in value in january 2024"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "how much have digital album sales decreased in the us from 2013 to 2019?"}], "solution": "music"}
+{"messages": [{"role": "user", "content": "what was the total number of shares traded for special opportunities fund during the first week of february?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "what is the vertical distance between usoi dam and tabqa dam in meters?"}], "solution": "open"}
+{"messages": [{"role": "user", "content": "how many american music awards has taylor swift won throughout her career?"}], "solution": "music"}
+{"messages": [{"role": "user", "content": "what is the least populated continent?"}], "solution": "open"}
+{"messages": [{"role": "user", "content": "what's the name of the actor who played the role of a lawyer with no law degree in the tv show \"suits\"?"}], "solution": "open"}
+{"messages": [{"role": "user", "content": "how many movies did scott bakula and ryan edward hill play together?"}], "solution": "movie"}
+{"messages": [{"role": "user", "content": "what was aeae's closing price yesterday?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "how many times has lizzo won the grammy award for album of the year?"}], "solution": "open"}
+{"messages": [{"role": "user", "content": "what year did serena williams achieve a \"golden slam\" in a single season?"}], "solution": "sports"}
+{"messages": [{"role": "user", "content": "could you tell me which company has a higher market cap between hcvi and atak?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "what was the worldwide box office earnings for vampire diary?"}], "solution": "movie"}
+{"messages": [{"role": "user", "content": "in terms of oscars, which movie came out on top, batman or the iron lady?"}], "solution": "movie"}
+{"messages": [{"role": "user", "content": "what's the schedule for rayo vallecano's upcoming game in esp-la liga?"}], "solution": "sports"}
+{"messages": [{"role": "user", "content": "what was the total value of all mergers and acquisitions (m&a) in the healthcare sector in 2018?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "who won the oscar best actor in 2019?"}], "solution": "movie"}
+{"messages": [{"role": "user", "content": "can you tell me when internet explorer 11 was first released?"}], "solution": "open"}
+{"messages": [{"role": "user", "content": "what teams made the nfl playoffs this year?"}], "solution": "sports"}
+{"messages": [{"role": "user", "content": "what percentage of gross for pixar's 2020 movies came from outside the us and canada?"}], "solution": "movie"}
+{"messages": [{"role": "user", "content": "who has had more number one hits on the us billboard r&b/hip-hop chart, drake or kendrick lamar?"}], "solution": "music"}
+{"messages": [{"role": "user", "content": "how many collaborations has the rapper kendrick lamar had with other artists that have been certified platinum by the riaa?"}], "solution": "music"}
+{"messages": [{"role": "user", "content": "when was the interview released in japan?"}], "solution": "movie"}
+{"messages": [{"role": "user", "content": "what was the venue of manchester utd's most recent eng-premier league game? was it home or away?"}], "solution": "sports"}
+{"messages": [{"role": "user", "content": "how many kids does blake lively have?"}], "solution": "open"}
+{"messages": [{"role": "user", "content": "on the previous trading day, what was the lowest stock price of lensar?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "who is the highest grossing star based on the domestic office from 2021 to 2023?"}], "solution": "movie"}
+{"messages": [{"role": "user", "content": "how many times has antoine griezmann won the ballon d'or award?"}], "solution": "sports"}
+{"messages": [{"role": "user", "content": "for how long did rihanna take a break from making music?"}], "solution": "music"}
+{"messages": [{"role": "user", "content": "which us company has the higher revenue, amazon or apple?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "which actors were nominated for best leading role in 2022?"}], "solution": "movie"}
+{"messages": [{"role": "user", "content": "what was the overall score of dallas mavericks during 2022-01?"}], "solution": "sports"}
+{"messages": [{"role": "user", "content": "what is the current price of the tech stock with highest market cap in us?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "who has been in the movie industry the longest, nicole kidman or meryl streep?"}], "solution": "open"}
+{"messages": [{"role": "user", "content": "what is the name of the restaurant founded by the star of joe bell"}], "solution": "open"}
+{"messages": [{"role": "user", "content": "how many songs have been released by barbra streisand since winning he/she won their first grammy?"}], "solution": "music"}
+{"messages": [{"role": "user", "content": "who has been in more movies, will farrell or steve carell?"}], "solution": "open"}
+{"messages": [{"role": "user", "content": "what was the average daily high of oblg this week?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "what is the original language of what we left behind: looking back at star trek: deep space nine?"}], "solution": "movie"}
+{"messages": [{"role": "user", "content": "what is the ratio of gb's earnings to its shares?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "is the samsung galaxy s24 or the samsung galaxy z flip 5 newer?"}], "solution": "open"}
+{"messages": [{"role": "user", "content": "what are the top 3 movies on hbo max?"}], "solution": "movie"}
+{"messages": [{"role": "user", "content": "what was the global box office income for hustle & flow?"}], "solution": "movie"}
+{"messages": [{"role": "user", "content": "what was the open price of sabsw on the most recent friday?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "is python (programming language) garbage-collected?"}], "solution": "open"}
+{"messages": [{"role": "user", "content": "what was the open price of krtx on last monday?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "which is a better investment, gold or silver, when considering long-term return?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "what are the names of all of kim kardashian spouses?"}], "solution": "open"}
+{"messages": [{"role": "user", "content": "can i make sashimi with slow cookers?"}], "solution": "open"}
+{"messages": [{"role": "user", "content": "who was crowned the best actor at the academy awards in 1993 for their role in a movie?"}], "solution": "movie"}
+{"messages": [{"role": "user", "content": "why are jack russell terrier so lazy?"}], "solution": "open"}
+{"messages": [{"role": "user", "content": "can you tell me who is the current usa today's sister company?"}], "solution": "open"}
+{"messages": [{"role": "user", "content": "which day this week did nvidia have the highest open price?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "how many times has giorgia meloni been named as one of the most powerful women in the world by forbes?"}], "solution": "open"}
+{"messages": [{"role": "user", "content": "which dow jones companies are down at least 1% today?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "what is the volume of bitcoin when it opened at over 55,000 on feb 14 2024"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "how many times has brad pitt been married?"}], "solution": "open"}
+{"messages": [{"role": "user", "content": "what was the price of inta at the end of the day yesterday?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "what was the weekly average closing price of ph for the last week?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "in what movie did ophelia lovibond first start their acting career?"}], "solution": "movie"}
+{"messages": [{"role": "user", "content": "when did elon musk step down from ceo of tesla?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "which year did praise band: the movie originally releasse to the public?"}], "solution": "movie"}
+{"messages": [{"role": "user", "content": "how old is the last super bowl halftime performer?"}], "solution": "music"}
+{"messages": [{"role": "user", "content": "who's lorient going up against in their next game in fra-ligue 1?"}], "solution": "sports"}
+{"messages": [{"role": "user", "content": "what was the budget for the second bourne film?"}], "solution": "movie"}
+{"messages": [{"role": "user", "content": "who is the most popular artist according to spotify, ed sheeran or taylor swift?"}], "solution": "music"}
+{"messages": [{"role": "user", "content": "out of the movies starring tom hiddleston and jackie chen released between 2015 and 2018, how many fell into the action category?"}], "solution": "movie"}
+{"messages": [{"role": "user", "content": "who has the most career f1 starts among drivers that have never won a race?"}], "solution": "sports"}
+{"messages": [{"role": "user", "content": "in 2000, which rapper dropped \"the marshall mathers lp\"?"}], "solution": "music"}
+{"messages": [{"role": "user", "content": "what country are pandas native to?"}], "solution": "open"}
+{"messages": [{"role": "user", "content": "when was the great depression?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "can you tell me the last time soho paid dividends?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "when did jimmy page play guitar for the band van halen?"}], "solution": "music"}
+{"messages": [{"role": "user", "content": "for the god makers ii, what was the original language used?"}], "solution": "movie"}
+{"messages": [{"role": "user", "content": "how much does a super bowl ad cost?"}], "solution": "sports"}
+{"messages": [{"role": "user", "content": "who released more songs, the beatles or justin timberlake?"}], "solution": "music"}
+{"messages": [{"role": "user", "content": "what are the major sanctioning bodies in professional boxing?"}], "solution": "sports"}
+{"messages": [{"role": "user", "content": "how many times has jennifer lopez been married?"}], "solution": "open"}
+{"messages": [{"role": "user", "content": "what is loan deferment?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "in which year was cosmopolitan (magazine) launched?"}], "solution": "open"}
+{"messages": [{"role": "user", "content": "which movie has a higher number of academy award nominations for best director, the shawshank redemption or the godfather?"}], "solution": "movie"}
+{"messages": [{"role": "user", "content": "what movies did emilia clarke play in last 5 years"}], "solution": "open"}
+{"messages": [{"role": "user", "content": "what was alcc's opening stock price on the previous friday?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "which male boxers have retired undefeated?"}], "solution": "sports"}
+{"messages": [{"role": "user", "content": "what are the names of the albums that machine gun kelly has released?"}], "solution": "music"}
+{"messages": [{"role": "user", "content": "what's avgo's price today?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "which teams have won against phoenix suns during 2022-12?"}], "solution": "sports"}
+{"messages": [{"role": "user", "content": "as of now, what is svmhw's stock price?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "what time did the noto earthquake in japan happened in january of 2022"}], "solution": "open"}
+{"messages": [{"role": "user", "content": "what is the abbr of the us state that is least populated?"}], "solution": "open"}
+{"messages": [{"role": "user", "content": "what's the name of the actress who played the role of katniss in the movie \"the hunger games\"?"}], "solution": "open"}
+{"messages": [{"role": "user", "content": "originally, what language was a case of you in?"}], "solution": "movie"}
+{"messages": [{"role": "user", "content": "who was the mvp of the mlb world series last year?"}], "solution": "sports"}
+{"messages": [{"role": "user", "content": "what was tbmc's closing price yesterday?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "which player has the most career hat-tricks in the premier league among players who have never won the golden boot?"}], "solution": "sports"}
+{"messages": [{"role": "user", "content": "what movie was recognized with the best animated feature film award at the oscars in 2020?"}], "solution": "movie"}
+{"messages": [{"role": "user", "content": "what is the market share of microsoft in the cloud computing industry compared to its competitors?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "how much was the last dividend from investcorp india acquisition corp. warrant?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "how many units has the most popular album of ed sheeran been sold?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "who was the leading captain for celta vigo in their last appearance in esp-la liga?"}], "solution": "sports"}
+{"messages": [{"role": "user", "content": "what movie received the best animated feature film award at the oscars in 2011?"}], "solution": "movie"}
+{"messages": [{"role": "user", "content": "in 2022, which team emerged as the winner more often: denver nuggets or atlanta hawks?"}], "solution": "sports"}
+{"messages": [{"role": "user", "content": "what album did the killers release in 2004, which included the songs \"mr. brightside\" and \"jenny was a friend of mine\"?"}], "solution": "music"}
+{"messages": [{"role": "user", "content": "which player has the most career points in the nba among players who have never won an nba championship?"}], "solution": "sports"}
+{"messages": [{"role": "user", "content": "on what date did alpine income property trust last pay out a dividend to its shareholders?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "what are the names of all the movies in the the godfather franchise?"}], "solution": "movie"}
+{"messages": [{"role": "user", "content": "how many movies has chris hemsworth played thor in?"}], "solution": "open"}
+{"messages": [{"role": "user", "content": "when did bournemouth last take the field in eng-premier league?"}], "solution": "sports"}
+{"messages": [{"role": "user", "content": "what was the worldwide box office sales for little hercules?"}], "solution": "movie"}
+{"messages": [{"role": "user", "content": "did wawa dam play a part in 1990's operations?"}], "solution": "open"}
+{"messages": [{"role": "user", "content": "how many grammys has avril lavigne been nominated for throughout her career?"}], "solution": "music"}
+{"messages": [{"role": "user", "content": "which elementary school did scarlett johansson attend?\""}], "solution": "open"}
+{"messages": [{"role": "user", "content": "who has won more atp singles titles, roger federer or rafael nadal?"}], "solution": "sports"}
+{"messages": [{"role": "user", "content": "who were the original members of the band the beach boys?"}], "solution": "music"}
+{"messages": [{"role": "user", "content": "which movie has a higher number of successful movie franchises, marvel cinematic universe or harry potter?"}], "solution": "movie"}
+{"messages": [{"role": "user", "content": "on which day did the gdev inc. warrant distribute dividends in the last year?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "abacus life's maximum price last week?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "what are the area codes of denver?"}], "solution": "open"}
+{"messages": [{"role": "user", "content": "how many members does the teknoist have?"}], "solution": "music"}
+{"messages": [{"role": "user", "content": "what movie was the most profitable that bruce willis was in?"}], "solution": "movie"}
+{"messages": [{"role": "user", "content": "what is the name of chris hemsworth's upcoming movie for release in 2024?"}], "solution": "open"}
+{"messages": [{"role": "user", "content": "was los esclavos dam utilized in 2005?"}], "solution": "open"}
+{"messages": [{"role": "user", "content": "last week on friday, what was the open price of flex?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "how many olympic gold medals has michael phelps won in curling?"}], "solution": "sports"}
+{"messages": [{"role": "user", "content": "how many meters taller is kariba dam than mangla dam?"}], "solution": "open"}
+{"messages": [{"role": "user", "content": "what's the current status of manchester city's game today?"}], "solution": "sports"}
+{"messages": [{"role": "user", "content": "how many units has the most popular album of ed sheeran been sold?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "who did heat play in their game on 2023-06-12?"}], "solution": "sports"}
+{"messages": [{"role": "user", "content": "what is altria group's dividend yield?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "when did elon musk sell twitter?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "who are the original band members of pink floyd?"}], "solution": "music"}
+{"messages": [{"role": "user", "content": "can you tell me the latest film that denis villeneuve has been involved in?"}], "solution": "movie"}
+{"messages": [{"role": "user", "content": "what is the local name for mount vesuvius in neapolitan?"}], "solution": "open"}
+{"messages": [{"role": "user", "content": "on what date did abcb last pay dividends to its shareholders?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "what's the most recent album from the puerto rican artist that's been in wwe?"}], "solution": "music"}
+{"messages": [{"role": "user", "content": "what total points were scored by orlando magic in 2022-01?"}], "solution": "sports"}
+{"messages": [{"role": "user", "content": "who has won more nba titles as a player and coach, steve kerr or phil jackson?"}], "solution": "sports"}
+{"messages": [{"role": "user", "content": "who's won more singles titles this year, iga swiatek or elena rybakina?"}], "solution": "sports"}
+{"messages": [{"role": "user", "content": "what was the total amount of dividends paid out by regency centers in the last year?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "who has the most #1 albums on billboard 200, taylor swift or jay-z?"}], "solution": "music"}
+{"messages": [{"role": "user", "content": "who was head coach of the lakers in kobe bryant's rookie year?"}], "solution": "sports"}
+{"messages": [{"role": "user", "content": "what's the vertical extent of the observatory housed in shanghai tower?"}], "solution": "open"}
+{"messages": [{"role": "user", "content": "which five companies in the dow jones have a dividend yield of over 10%?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "how many albums has the band queen released that have been certified platinum in the united states?"}], "solution": "music"}
+{"messages": [{"role": "user", "content": "who played the role of edward scissorhands in the movie of the same name?"}], "solution": "movie"}
+{"messages": [{"role": "user", "content": "tell me three most recent releases by thoinot arbeau?"}], "solution": "music"}
+{"messages": [{"role": "user", "content": "what's the name of the actress who played the role of a queen in the third season of the tv show \"the crown\"?"}], "solution": "open"}
+{"messages": [{"role": "user", "content": "what is the earnings per share of aca?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "how many movies has denzel washington directed?"}], "solution": "open"}
+{"messages": [{"role": "user", "content": "who were the lead singers on the album \"dark side of the moon\" by pink floyd?"}], "solution": "music"}
+{"messages": [{"role": "user", "content": "which movie has a higher number of oscar nominations for best visual effects, inception or interstellar?"}], "solution": "movie"}
+{"messages": [{"role": "user", "content": "is windows 11 available in more languages than windows 10?"}], "solution": "open"}
+{"messages": [{"role": "user", "content": "who are the founders of whitestone cheese?"}], "solution": "open"}
+{"messages": [{"role": "user", "content": "who has had more number one hits on the us billboard dance/electronic songs chart, calvin harris or the chainsmokers?"}], "solution": "music"}
+{"messages": [{"role": "user", "content": "which film came out first, firewall or exit wounds?"}], "solution": "movie"}
+{"messages": [{"role": "user", "content": "who has won more grand slam titles in tennis, roger federer or rafael nadal?"}], "solution": "sports"}
+{"messages": [{"role": "user", "content": "how many years after reaching $1 trillion market capitalization did it take apple to hit $2 trillion?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "what is the elevation of the capital of nebraska?"}], "solution": "open"}
+{"messages": [{"role": "user", "content": "who among patricia place and lana clarkson is younger?"}], "solution": "movie"}
+{"messages": [{"role": "user", "content": "in which year did nene become the state bird of vermont?"}], "solution": "open"}
+{"messages": [{"role": "user", "content": "which player has the most career passing tds in the nfl among players who have never won a super bowl?"}], "solution": "sports"}
+{"messages": [{"role": "user", "content": "what percentage of the dow jones are financial services companies?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "what is the smallest island of hawaii?"}], "solution": "open"}
+{"messages": [{"role": "user", "content": "who is the last emperor of the qara khitai?"}], "solution": "open"}
+{"messages": [{"role": "user", "content": "at what location did the first lord of the rings movie started filming in?"}], "solution": "movie"}
+{"messages": [{"role": "user", "content": "what was the opening market price of dbx on last monday?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "how many times has lady gaga played a character with a mental illness in a movie?"}], "solution": "open"}
+{"messages": [{"role": "user", "content": "in 2022, which basketball team had more overall wins, new york knicks or dallas mavericks?"}], "solution": "sports"}
+{"messages": [{"role": "user", "content": "what was the first studio album from the group that michael jackson was in?"}], "solution": "music"}
+{"messages": [{"role": "user", "content": "what album did bad bunny release in 2022, which included the songs \"moscow mule\" and \"party\"?"}], "solution": "music"}
+{"messages": [{"role": "user", "content": "i'm looking for the p/e ratio of dks. would you happen to know what it is?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "is microsoft office 2019 available in a greater number of languages than microsoft office 2013?"}], "solution": "open"}
+{"messages": [{"role": "user", "content": "who are the top three goal-scorers for the montreal canadiens in the current nhl season?"}], "solution": "sports"}
+{"messages": [{"role": "user", "content": "who is headlining the coachella music festival this year?"}], "solution": "music"}
+{"messages": [{"role": "user", "content": "how many teams are in the nba g-league?"}], "solution": "sports"}
+{"messages": [{"role": "user", "content": "how many women billionaires are there in the united states as of 2023?"}], "solution": "open"}
+{"messages": [{"role": "user", "content": "what was dana ashbrook's age when he/she made an appearance in the aggression scale?"}], "solution": "movie"}
+{"messages": [{"role": "user", "content": "do you know who the director of katherine ryan: in trouble is?"}], "solution": "movie"}
+{"messages": [{"role": "user", "content": "how long did world war ii last?"}], "solution": "open"}
+{"messages": [{"role": "user", "content": "what are the five highest-grossing movies at the box office directed by nancy meyers?"}], "solution": "movie"}
+{"messages": [{"role": "user", "content": "what was the lakers' record the year kobe bryant won mvp?"}], "solution": "sports"}
+{"messages": [{"role": "user", "content": "what are the names of the 3 main characters in top gun?"}], "solution": "movie"}
+{"messages": [{"role": "user", "content": "what was the total value of all exchange-traded funds (etfs) in the united states in 2021?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "how many albums has the band queen released that have been certified platinum in the united states?"}], "solution": "music"}
+{"messages": [{"role": "user", "content": "what album did lady gaga release in 2009, which included the songs \"bad romance\" and \"telephone\"?"}], "solution": "music"}
+{"messages": [{"role": "user", "content": "how much did each toy story movie gross on average?"}], "solution": "movie"}
+{"messages": [{"role": "user", "content": "how many goldfish should i keep in an aquarium at once?"}], "solution": "open"}
+{"messages": [{"role": "user", "content": "when did serena williams win their first grand slam title in men's singles tennis?"}], "solution": "sports"}
+{"messages": [{"role": "user", "content": "which company in the nasdaq 100 index has the highest return on assets (roa)?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "how many times has lionel messi won the mls cup with inter miami?"}], "solution": "sports"}
+{"messages": [{"role": "user", "content": "who has had more top 10 hits on the us billboard alternative airplay songs chart, radiohead or foo fighters?"}], "solution": "music"}
+{"messages": [{"role": "user", "content": "how many companies in invesco qqq trust has a market cap higher than 15 billion"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "what's the name of the person who directed holiday in the sun?"}], "solution": "movie"}
+{"messages": [{"role": "user", "content": "which date did one direction release their first album?"}], "solution": "music"}
+{"messages": [{"role": "user", "content": "where does amen thompson's brother play?"}], "solution": "sports"}
+{"messages": [{"role": "user", "content": "what was the average amount spent by amazon on their three largest acquisitions to date?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "which athletes have won the boston marathon at least four times?"}], "solution": "sports"}
+{"messages": [{"role": "user", "content": "how many different countries does copahue stretch across?"}], "solution": "open"}
+{"messages": [{"role": "user", "content": "what movie won the academy award for best picture in 2012?"}], "solution": "movie"}
+{"messages": [{"role": "user", "content": "how many grand slam titles have roger federer and rafael nadal won combined?"}], "solution": "sports"}
+{"messages": [{"role": "user", "content": "what was the closing stock price of bepc the previous day?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "which 3 countries have adopted bitcoin as legal tender?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "name the last five albums by taylor swift."}], "solution": "open"}
+{"messages": [{"role": "user", "content": "which company has a higher operating margin, apple or microsoft?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "can you tell me the number of times portland trail blazers came out on top in the 2021 season?"}], "solution": "sports"}
+{"messages": [{"role": "user", "content": "avalon globocare's maximum price last week?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "what is the price-to-earnings ratio of haia"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "which player has the most career passing yards in the nfl among players who have never won a championship?"}], "solution": "sports"}
+{"messages": [{"role": "user", "content": "what movies did bruce willis work on in 2023?"}], "solution": "movie"}
+{"messages": [{"role": "user", "content": "is there an original title before gospel hill?"}], "solution": "movie"}
+{"messages": [{"role": "user", "content": "what was blacw's stock price at the close of trading yesterday?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "how many movies has julia roberts been in from 2000 to 2010?"}], "solution": "open"}
+{"messages": [{"role": "user", "content": "during the 2002 academy awards, which movie received the award for the best documentary feature film?"}], "solution": "movie"}
+{"messages": [{"role": "user", "content": "who was the director for great moments in aviation?"}], "solution": "movie"}
+{"messages": [{"role": "user", "content": "who has more goals, ronaldo or messi?"}], "solution": "sports"}
+{"messages": [{"role": "user", "content": "when tom hanks won their first academy award for best actor, which film was recognized as the best picture?"}], "solution": "movie"}
+{"messages": [{"role": "user", "content": "does ios 17 have a wider language support compared to ios 16?"}], "solution": "open"}
+{"messages": [{"role": "user", "content": "what are all the movies kenny ortega directed?"}], "solution": "movie"}
+{"messages": [{"role": "user", "content": "tbmc's current market cap."}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "which company in the s&p 500 index has the highest diversity, equity, and inclusion score?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "how many sales were made by wild wild west at the box office across the globe?"}], "solution": "movie"}
+{"messages": [{"role": "user", "content": "what's the number of nations that mount elbrus covers?"}], "solution": "open"}
+{"messages": [{"role": "user", "content": "who played the role of edward scissorhands in the movie of the same name?"}], "solution": "movie"}
+{"messages": [{"role": "user", "content": "what's the latest stock price of levi strauss & co that's available?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "who has had more number one hits on the us billboard hot 100 chart, ariana grande or katy perry?"}], "solution": "music"}
+{"messages": [{"role": "user", "content": "who has more likes on tiktok, karol g or billie eilish?"}], "solution": "open"}
+{"messages": [{"role": "user", "content": "who is the publisher of just dance 2024 edition?"}], "solution": "open"}
+{"messages": [{"role": "user", "content": "list the names of angelina jolie's kids?"}], "solution": "open"}
+{"messages": [{"role": "user", "content": "what is the highest-grossing film that was rated r?"}], "solution": "movie"}
+{"messages": [{"role": "user", "content": "can i make sashimi with slow cookers?"}], "solution": "open"}
+{"messages": [{"role": "user", "content": "how many olympic gold medals has michael phelps won in curling?"}], "solution": "sports"}
+{"messages": [{"role": "user", "content": "what did chad kroeger do before joining nickelback?"}], "solution": "music"}
+{"messages": [{"role": "user", "content": "what is the ex-dividend date of microsoft in the 1st qtr of 2024"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "what was the worldwide box office earnings for price check??"}], "solution": "movie"}
+{"messages": [{"role": "user", "content": "who was the owner of the company that published on our backs in 2008?"}], "solution": "open"}
+{"messages": [{"role": "user", "content": "who is the author of little darla has a treat for you, volume 14: winter 2000?"}], "solution": "music"}
+{"messages": [{"role": "user", "content": "who made the creative decisions for hell swarm?"}], "solution": "movie"}
+{"messages": [{"role": "user", "content": "what was dallas mavericks's win tally in 2022?"}], "solution": "sports"}
+{"messages": [{"role": "user", "content": "what are natalie portman 3 most recent movies?"}], "solution": "movie"}
+{"messages": [{"role": "user", "content": "on december 25, 2023, how much does halozyme therapeutics's stock price go up from the previous day's closing price?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "what was taylor swifts age when she released her debut album?"}], "solution": "music"}
+{"messages": [{"role": "user", "content": "what was the opening stock price of sats on the tues of the past week?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "how many movies has julia roberts been in from 2000 to 2010?"}], "solution": "open"}
+{"messages": [{"role": "user", "content": "on average, how many watts of power is kfyo (am) powered by day?"}], "solution": "open"}
+{"messages": [{"role": "user", "content": "what's the name of the actress who played the lead female character in the tv show \"the big bang theory\"?"}], "solution": "open"}
+{"messages": [{"role": "user", "content": "what are all the titles held by kate middleton?"}], "solution": "open"}
+{"messages": [{"role": "user", "content": "what are the names of all the movies in the austin powers franchise?"}], "solution": "movie"}
+{"messages": [{"role": "user", "content": "what was the final stock price of mobix labs on the last trading day?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "what's neu's open price last friday?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "with what number of points did celta vigo play their game yesterday?"}], "solution": "sports"}
+{"messages": [{"role": "user", "content": "this week, what was the highest daily high stock price that imppp reached?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "what was ndaq's stock price at closing time on the last trading day?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "what's the latest film that walt becker has directed?"}], "solution": "movie"}
+{"messages": [{"role": "user", "content": "which company in the russell 2000 index has the highest earnings per share (eps) growth rate in the past 5 years?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "how many fast & furious movies are there?"}], "solution": "movie"}
+{"messages": [{"role": "user", "content": "what show was will smith's start to his career in acting?"}], "solution": "open"}
+{"messages": [{"role": "user", "content": "who are the non-english players that have scored 150 goals or more in the premier league?"}], "solution": "sports"}
+{"messages": [{"role": "user", "content": "which artist was younger, alice cooper or nina simone?"}], "solution": "music"}
+{"messages": [{"role": "user", "content": "which soccer players have scored at least 50 goals in a single season?"}], "solution": "sports"}
+{"messages": [{"role": "user", "content": "how many grammys did baby blue receive until 62nd grammy (2019)?"}], "solution": "music"}
+{"messages": [{"role": "user", "content": "can you tell me the original language of raze?"}], "solution": "movie"}
+{"messages": [{"role": "user", "content": "who are the members of birds of chicago?"}], "solution": "music"}
+{"messages": [{"role": "user", "content": "who was max verstappen's teammate in 2020?"}], "solution": "sports"}
+{"messages": [{"role": "user", "content": "how much is the worst performing stock, amazon?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "for worldwide box office sales, what was touching the void?"}], "solution": "movie"}
+{"messages": [{"role": "user", "content": "on which date did jof distribute dividends the first time"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "can you tell me the release date of move?"}], "solution": "movie"}
+{"messages": [{"role": "user", "content": "which dune movie has better music, 1984 or 2021?"}], "solution": "movie"}
+{"messages": [{"role": "user", "content": "what is the name of dwayne \"the rock\" johnson's upcoming movie?"}], "solution": "open"}
+{"messages": [{"role": "user", "content": "who has won more tour de france titles, greg lemond or miguel indurain?"}], "solution": "sports"}
+{"messages": [{"role": "user", "content": "is the original dialogue of star wars: the rise of skywalker different in english?"}], "solution": "movie"}
+{"messages": [{"role": "user", "content": "who is the publisher of just dance 2024 edition?"}], "solution": "open"}
+{"messages": [{"role": "user", "content": "can you tell me the opposing side for pacers on 2022-10-14?"}], "solution": "sports"}
+{"messages": [{"role": "user", "content": "who are the three players with the most home runs in major league baseball history as of 2022?"}], "solution": "sports"}
+{"messages": [{"role": "user", "content": "what's the total market value of simpple ltd.'s shares as of the most recent trading day?"}], "solution": "finance"}
+{"messages": [{"role": "user", "content": "who was the coach of the team that won the stanley cup last season?"}], "solution": "sports"}
+{"messages": [{"role": "user", "content": "how many times has tom hanks played himself in a movie?"}], "solution": "open"}
+{"messages": [{"role": "user", "content": "when does atlético madrid have their next match in esp-la liga?"}], "solution": "sports"}
--- a/vocab.json
+++ b/vocab.json
				`@@ -0,0 +1 @@`
				`{"framework": "pytorch", "task": "text-generation", "allow_remote": true}`