Initialize project; model provided by the ModelHub XC community
Model: facebook/opt-iml-1.3b Source: Original Platform
60
README.md
Normal file
@@ -0,0 +1,60 @@
---
inference: false
tags:
- text-generation
- opt

license: other
commercial: false
---
# OPT-IML

## Model Description

[OPT-IML (OPT + Instruction Meta-Learning)](https://arxiv.org/abs/2212.12017) is a set of instruction-tuned versions of OPT, fine-tuned on OPT-IML Bench, a collection of ~2000 NLP tasks gathered from 8 NLP benchmarks.

We provide two model versions:
* OPT-IML, trained on 1500 tasks, with several tasks held out for downstream evaluation, and
* OPT-IML-Max, trained on all ~2000 tasks.

### How to use
You can use this model directly with a pipeline for text generation.

```python
>>> from transformers import pipeline

>>> generator = pipeline('text-generation', model="facebook/opt-iml-1.3b")

>>> generator("What is the capital of USA?")
```
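
The pipeline call above returns a list of dictionaries rather than a plain string. The following is a minimal sketch of how the generated text could be read out of that result. The helper names `extract_texts` and `answer` are hypothetical and not part of the model card; `max_new_tokens` and `do_sample` are standard generation arguments accepted by the pipeline, and the values chosen here are only illustrative.

```python
def extract_texts(outputs):
    """Return the generated strings from a text-generation pipeline result.

    The pipeline returns a list of dicts, each holding the text under
    the "generated_text" key.
    """
    return [item["generated_text"] for item in outputs]


def answer(prompt, max_new_tokens=32):
    """Generate a greedy completion for `prompt` (hypothetical helper)."""
    # Imported lazily so that defining this helper does not require
    # transformers to be installed or the checkpoint to be downloaded.
    from transformers import pipeline

    generator = pipeline("text-generation", model="facebook/opt-iml-1.3b")
    # do_sample=False requests greedy decoding; max_new_tokens caps the
    # length of the continuation (illustrative value, not a recommendation).
    outputs = generator(prompt, max_new_tokens=max_new_tokens, do_sample=False)
    return extract_texts(outputs)[0]
```

Calling `answer("What is the capital of USA?")` downloads the checkpoint on first use. By default the returned string contains the prompt followed by the continuation.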
### Limitations and bias

While OPT-IML models outperform baseline OPT on an extensive set of evaluations, they are nevertheless susceptible to the various risks associated with using large language models, including factual errors, generation of toxic language, and reinforcement of stereotypes. While we release our OPT-IML models to encourage future work on instruction tuning and to improve the availability of large instruction-tuned causal LMs, the use of these models should be accompanied by responsible best practices.

## Training data
OPT-IML models are trained on OPT-IML Bench, a large benchmark for Instruction Meta-Learning (IML) comprising ~2000 NLP tasks, consolidated into task categories from 8 existing benchmarks, including Super-NaturalInstructions, FLAN, and PromptSource.

## Training procedure
The texts are tokenized using the GPT2 byte-level version of Byte Pair Encoding (BPE) (for Unicode characters) with a vocabulary size of 50272. The inputs are sequences of 2048 consecutive tokens.

The 30B model was fine-tuned on 64 40GB A100 GPUs. During fine-tuning, models saw approximately 2 billion tokens, which is only 0.6% of the pre-training budget of OPT.
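
To make the figures in this section concrete, here is a rough back-of-the-envelope sketch. It uses only the numbers quoted above (2048-token inputs, about 2 billion fine-tuning tokens, 0.6% of the pre-training budget). The helper `split_into_windows` is hypothetical and only illustrates what "sequences of 2048 consecutive tokens" could look like; how the actual training pipeline handles a trailing partial window or padding is not stated here, so that part is an assumption.

```python
SEQ_LEN = 2048                    # tokens per input sequence (from the text)
FINE_TUNE_TOKENS = 2_000_000_000  # "approximately 2 billion tokens"
BUDGET_FRACTION = 0.006           # "0.6% of the pre-training budget"


def split_into_windows(token_ids, window=SEQ_LEN):
    """Split a token stream into consecutive windows of at most `window` tokens.

    Assumption: a trailing partial window is kept as-is rather than
    padded or dropped.
    """
    return [token_ids[i:i + window] for i in range(0, len(token_ids), window)]


# Assumption: every fine-tuning sequence is completely filled.
approx_sequences = FINE_TUNE_TOKENS // SEQ_LEN

# Pre-training budget implied by the two rounded figures above.
implied_pretraining_tokens = FINE_TUNE_TOKENS / BUDGET_FRACTION

print(f"full sequences: {approx_sequences:,}")
print(f"implied pre-training budget: {implied_pretraining_tokens / 1e9:.0f}B tokens")
```

Under these assumptions the fine-tuning data corresponds to a little under one million full-length sequences, and the implied pre-training budget is on the order of 333 billion tokens. Both values inherit the rounding of the quoted figures and should be read as estimates.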

### BibTeX entry and citation info
```bibtex
@misc{iyer2022opt,
      title={OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization},
      author={Iyer, Srinivasan and Lin, Xi Victoria and Pasunuru, Ramakanth and Mihaylov, Todor and Simig, D{\'a}niel and Yu, Ping and Shuster, Kurt and Wang, Tianlu and Liu, Qing and Koura, Punit Singh and others},
      year={2022},
      eprint={2212.12017},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```