<!--Copyright 2021 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.

⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.

-->
*This model was released on 2021-12-20 and added to Hugging Face Transformers on 2022-01-28.*

# XGLM

<div class="flex flex-wrap space-x-1">
<img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-DE3412?style=flat&logo=pytorch&logoColor=white">
</div>

## Overview

The XGLM model was proposed in [Few-shot Learning with Multilingual Language Models](https://huggingface.co/papers/2112.10668)
by Xi Victoria Lin, Todor Mihaylov, Mikel Artetxe, Tianlu Wang, Shuohui Chen, Daniel Simig, Myle Ott, Naman Goyal,
Shruti Bhosale, Jingfei Du, Ramakanth Pasunuru, Sam Shleifer, Punit Singh Koura, Vishrav Chaudhary, Brian O'Horo,
Jeff Wang, Luke Zettlemoyer, Zornitsa Kozareva, Mona Diab, Veselin Stoyanov, Xian Li.

The abstract from the paper is the following:

*Large-scale autoregressive language models such as GPT-3 are few-shot learners that can perform a wide range of language
tasks without fine-tuning. While these models are known to be able to jointly represent many different languages,
their training data is dominated by English, potentially limiting their cross-lingual generalization.
In this work, we train multilingual autoregressive language models on a balanced corpus covering a diverse set of languages,
and study their few- and zero-shot learning capabilities in a wide range of tasks. Our largest model with 7.5 billion parameters
sets new state of the art in few-shot learning in more than 20 representative languages, outperforming GPT-3 of comparable size
in multilingual commonsense reasoning (with +7.4% absolute accuracy improvement in 0-shot settings and +9.4% in 4-shot settings)
and natural language inference (+5.4% in each of 0-shot and 4-shot settings). On the FLORES-101 machine translation benchmark,
our model outperforms GPT-3 on 171 out of 182 translation directions with 32 training examples, while surpassing the
official supervised baseline in 45 directions. We present a detailed analysis of where the model succeeds and fails,
showing in particular that it enables cross-lingual in-context learning on some tasks, while there is still room for improvement
on surface form robustness and adaptation to tasks that do not have a natural cloze form. Finally, we evaluate our models
in social value tasks such as hate speech detection in five languages and find it has limitations similar to comparable sized GPT-3 models.*

This model was contributed by [Suraj](https://huggingface.co/valhalla). The original code can be found [here](https://github.com/pytorch/fairseq/tree/main/examples/xglm).
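
As a quick orientation, here is a minimal generation sketch; the `facebook/xglm-564M` checkpoint name is an assumption (it is the smallest released checkpoint, and any other XGLM checkpoint should work the same way):

```python
from transformers import AutoTokenizer, XGLMForCausalLM

# "facebook/xglm-564M" is assumed here; swap in any released XGLM checkpoint.
tokenizer = AutoTokenizer.from_pretrained("facebook/xglm-564M")
model = XGLMForCausalLM.from_pretrained("facebook/xglm-564M")

# XGLM is multilingual, so a non-English prompt makes a natural smoke test.
inputs = tokenizer("Le chat est assis sur", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```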
## Resources

- [Causal language modeling task guide](../tasks/language_modeling)

## XGLMConfig

[[autodoc]] XGLMConfig
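
As a sketch, a configuration can be used to instantiate a randomly initialized model; the smaller-than-default sizes below are arbitrary illustration values, not recommended settings:

```python
from transformers import XGLMConfig, XGLMModel

# Arbitrary small sizes purely for illustration (defaults are larger).
config = XGLMConfig(d_model=512, num_layers=6, attention_heads=8, ffn_dim=2048)
model = XGLMModel(config)  # weights are randomly initialized, not pretrained
print(model.config.d_model)
```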
## XGLMTokenizer

[[autodoc]] XGLMTokenizer
- build_inputs_with_special_tokens
- get_special_tokens_mask
- create_token_type_ids_from_sequences
- save_vocabulary
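
A short encode/decode round-trip sketch, again assuming the `facebook/xglm-564M` checkpoint for the vocabulary (the tokenizer is SentencePiece-based, so the `sentencepiece` package must be installed):

```python
from transformers import XGLMTokenizer

tokenizer = XGLMTokenizer.from_pretrained("facebook/xglm-564M")  # assumed checkpoint

encoding = tokenizer("Hello world")
print(encoding["input_ids"])
print(tokenizer.convert_ids_to_tokens(encoding["input_ids"]))
print(tokenizer.decode(encoding["input_ids"]))
```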
## XGLMTokenizerFast

[[autodoc]] XGLMTokenizerFast

## XGLMModel

[[autodoc]] XGLMModel
- forward
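
A sketch of extracting hidden states with the bare model (checkpoint name assumed as above):

```python
import torch
from transformers import AutoTokenizer, XGLMModel

tokenizer = AutoTokenizer.from_pretrained("facebook/xglm-564M")
model = XGLMModel.from_pretrained("facebook/xglm-564M")

inputs = tokenizer("XGLM is a multilingual language model.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
# One hidden state per input token: (batch_size, sequence_length, d_model)
print(outputs.last_hidden_state.shape)
```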
## XGLMForCausalLM

[[autodoc]] XGLMForCausalLM
- forward
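
A sketch of scoring a continuation with the language-modeling loss, loosely in the spirit of the paper's cloze-style evaluations; the checkpoint name and prompt are illustrative assumptions:

```python
import torch
from transformers import AutoTokenizer, XGLMForCausalLM

tokenizer = AutoTokenizer.from_pretrained("facebook/xglm-564M")
model = XGLMForCausalLM.from_pretrained("facebook/xglm-564M")

# Passing the input ids as labels yields the average negative log-likelihood,
# which can be used to compare candidate continuations of a prompt.
inputs = tokenizer("The capital of France is Paris", return_tensors="pt")
with torch.no_grad():
    loss = model(**inputs, labels=inputs["input_ids"]).loss
print(f"avg. negative log-likelihood: {loss.item():.3f}")
```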