--- license: apache-2.0 language: - en library_name: transformers pipeline_tag: text-generation tags: - gpt2 - causal-lm - ff-llm --- # FF_3 — FF-LLM 2.02B FF_3 is a 2.02B parameter language model trained from scratch. ## Model Details - **Architecture**: GPT-2 decoder-only (custom) - **Parameters**: 2,022,739,072 - **Vocabulary**: 50,257 (GPT-2 BPE tokenizer) - **Context length**: 2,048 tokens - **Training**: From scratch on 90B tokens ## Training Pipeline 1. **Pretraining**: 90B tokens (web + STEM data) 2. **SFT**: 760K examples + 100K high-quality examples 3. **DPO**: 38,863 preference pairs 4. **Distillation**: 20K examples from Qwen2.5-32B teacher ## Prompt Format ``` ### System: You are FF-LLM, a helpful assistant. ### Instruction: {your question here} ### Response: ``` ## Usage with Transformers ```python from transformers import GPT2LMHeadModel, GPT2Tokenizer model = GPT2LMHeadModel.from_pretrained("ff-llm/FF_3") tokenizer = GPT2Tokenizer.from_pretrained("ff-llm/FF_3") prompt = ( "### System:\nYou are FF-LLM, a helpful assistant.\n\n" "### Instruction:\nWhat is the capital of France?\n\n### Response:\n" ) input_ids = tokenizer.encode(prompt, return_tensors="pt") output = model.generate( input_ids, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.9, eos_token_id=tokenizer.eos_token_id, pad_token_id=tokenizer.eos_token_id, ) print(tokenizer.decode(output[0][input_ids.shape[1]:], skip_special_tokens=True)) ``` ## Usage with Ollama ```bash ollama run ff-llm/FF_3 ``` ## Limitations - Weak mathematical reasoning - May hallucinate on factual questions - English only ## Training Cost ~\,000 total compute cost Trained by a single researcher ## License Apache 2.0