---
pipeline_tag: text-generation
library_name: transformers
tags:
- text-generation
- metadata-localization
- chat
- without-metadata
- sft
- lora-merged
---
# combined_without_metadata_chat

## Summary

This repository contains the merged chat model for the combined, without-metadata branch of the metadata localization project. It was produced by supervised fine-tuning (SFT) on the project QA benchmark after project pretraining, with the LoRA adapter merged into the base weights.
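Since the card declares `library_name: transformers` and `pipeline_tag: text-generation`, the merged weights should load as a standard causal LM. A minimal usage sketch follows; note that the repo id `iamshnoo/combined_without_metadata_chat` is an assumption inferred from the collection owner and model name, not confirmed by this card.

```python
# Hedged sketch: load the merged chat model as a standard transformers causal LM.
# The repo id below is an ASSUMPTION; substitute the actual Hugging Face hub id.
REPO_ID = "iamshnoo/combined_without_metadata_chat"  # assumed, not confirmed


def generate_reply(prompt: str, max_new_tokens: int = 128) -> str:
    """Generate a chat reply with the merged model (downloads weights on first call)."""
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(REPO_ID)
    model = AutoModelForCausalLM.from_pretrained(REPO_ID, torch_dtype=torch.bfloat16)

    # Format the prompt with the model's chat template before generation.
    messages = [{"role": "user", "content": prompt}]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    )
    output_ids = model.generate(input_ids, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, skipping the prompt.
    return tokenizer.decode(
        output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True
    )


if __name__ == "__main__":
    print(generate_reply("What is this model trained for?"))
```

The generation call is deferred to a function so the snippet can be imported without triggering a model download.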
## Variant Metadata

- Stage: `sft_chat`
- Family: `chat`
- Metadata condition: `without_metadata`
- Base model lineage: `combined_without_metadata_1b`
## Weights & Biases Provenance

- No matching W&B run was resolved automatically.
## SFT Notes

- Fine-tuning method: `PEFT / LoRA`
- Optimizer: `adamw_bnb_8bit`
- `bf16=True`, `gradient_checkpointing=True`, `use_liger_kernel=True`
- `per_device_train_batch_size=2`, `gradient_accumulation_steps=8`
- LoRA targets: `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`
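The hyperparameters above pin down the effective batch size, and the target-module list maps directly onto a PEFT-style adapter config. A minimal sketch in plain Python; the rank, alpha, and dropout values are assumptions, as the card does not state them:

```python
# Effective optimizer batch size implied by the settings above.
per_device_train_batch_size = 2
gradient_accumulation_steps = 8
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps
print(effective_batch_size)  # 16 sequences per optimizer step, per device

# Hedged sketch of the LoRA adapter config as a plain dict. The r, lora_alpha,
# and lora_dropout values are PLACEHOLDERS; this card does not specify them.
lora_config = {
    "task_type": "CAUSAL_LM",
    "r": 16,               # assumed rank
    "lora_alpha": 32,      # assumed scaling factor
    "lora_dropout": 0.05,  # assumed dropout
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
}
print(len(lora_config["target_modules"]))  # 7 modules adapted
```

Targeting all seven attention and MLP projections (rather than attention only) is the configuration the LoRA-targets bullet above records.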
## Project Context

This model is part of the metadata localization release. Related checkpoints and variants are grouped in the public Hugging Face collection [Metadata Conditioned LLMs](https://huggingface.co/collections/iamshnoo/metadata-conditioned-llms).

- Training data source: [News on the Web (NOW) Corpus](https://www.english-corpora.org/now/)
- Project repository: [https://github.com/iamshnoo/metadata_localization](https://github.com/iamshnoo/metadata_localization)
- Paper: [https://arxiv.org/abs/2601.15236](https://arxiv.org/abs/2601.15236)

Last synced: `2026-04-02 14:48:17 UTC`