Commit

Merge pull request huggingface#1 from huggingface/master
Merge changes from huggingface/transformers to stevezheng23/transformers
stevezheng23 authored Oct 5, 2019
2 parents 5c3b32d + 0820bb0 commit deb2e71
Showing 38 changed files with 1,281 additions and 342 deletions.
23 changes: 23 additions & 0 deletions .github/ISSUE_TEMPLATE/--new-model-addition.md
@@ -0,0 +1,23 @@
---
name: "\U0001F31FNew model addition"
about: Submit a proposal/request to implement a new Transformer-based model
title: ''
labels: ''
assignees: ''

---

# 🌟New model addition

## Model description

<!-- Important information -->

## Open Source status

* [ ] the model implementation is available: (give details)
* [ ] the model weights are available: (give details)

## Additional context

<!-- Add any other context about the problem here. -->
6 changes: 5 additions & 1 deletion .github/ISSUE_TEMPLATE/bug-report.md
@@ -1,6 +1,10 @@
---
name: "\U0001F41B Bug Report"
about: Submit a bug report to help us improve PyTorch Transformers
title: ''
labels: ''
assignees: ''

---

## 🐛 Bug
@@ -45,4 +49,4 @@ Steps to reproduce the behavior:

## Additional context

<!-- Add any other context about the problem here. -->
<!-- Add any other context about the problem here. -->
6 changes: 5 additions & 1 deletion .github/ISSUE_TEMPLATE/feature-request.md
@@ -1,6 +1,10 @@
---
name: "\U0001F680 Feature Request"
about: Submit a proposal/request for a new PyTorch Transformers feature
title: ''
labels: ''
assignees: ''

---

## 🚀 Feature
@@ -13,4 +17,4 @@ about: Submit a proposal/request for a new PyTorch Transformers feature

## Additional context

<!-- Add any other context or screenshots about the feature request here. -->
<!-- Add any other context or screenshots about the feature request here. -->
6 changes: 5 additions & 1 deletion .github/ISSUE_TEMPLATE/migration.md
@@ -1,6 +1,10 @@
---
name: "\U0001F4DA Migration from PyTorch-pretrained-Bert"
about: Report a problem when migrating from PyTorch-pretrained-Bert to Transformers
title: ''
labels: ''
assignees: ''

---

## 📚 Migration
@@ -40,4 +44,4 @@ Details of the issue:

## Additional context

<!-- Add any other context about the problem here. -->
<!-- Add any other context about the problem here. -->
6 changes: 5 additions & 1 deletion .github/ISSUE_TEMPLATE/question-help.md
@@ -1,8 +1,12 @@
---
name: "❓Questions & Help"
about: Start a general discussion related to PyTorch Transformers
title: ''
labels: ''
assignees: ''

---

## ❓ Questions & Help

<!-- A clear and concise description of the question. -->
<!-- A clear and concise description of the question. -->
15 changes: 7 additions & 8 deletions README.md
@@ -80,15 +80,15 @@ pip install transformers
Here also, you first need to install one of, or both, TensorFlow 2.0 and PyTorch.
Please refer to [TensorFlow installation page](https://www.tensorflow.org/install/pip#tensorflow-2.0-rc-is-available) and/or [PyTorch installation page](https://pytorch.org/get-started/locally/#start-locally) regarding the specific install command for your platform.

When TensorFlow 2.0 and/or PyTorch has been installed, you can install from source by cloning the repository and runing:
When TensorFlow 2.0 and/or PyTorch has been installed, you can install from source by cloning the repository and running:

```bash
pip install [--editable] .
```

### Tests

A series of tests is included for the library and the example scripts. Library tests can be found in the [tests folder](https://github.com/huggingface/transformers/tree/master/transformers/tests) and examples tests in the [examples folder](https://github.com/huggingface/transformers/tree/master/examples).
A series of tests are included for the library and the example scripts. Library tests can be found in the [tests folder](https://github.com/huggingface/transformers/tree/master/transformers/tests) and examples tests in the [examples folder](https://github.com/huggingface/transformers/tree/master/examples).

These tests can be run using `pytest` (install pytest if needed with `pip install pytest`).
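
The same suites can also be launched from inside Python through pytest's programmatic entry point. A minimal sketch, assuming it is run from the root of a repository clone with `pytest` installed:

```python
import pytest

# Run the library tests (equivalent to `python -m pytest -sv ./transformers/tests/`)
pytest.main(["-sv", "./transformers/tests/"])

# Run the example-script tests
pytest.main(["-sv", "./examples/"])
```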

@@ -120,8 +120,7 @@ At some point in the future, you'll be able to seamlessly move from pre-training
5. **[XLNet](https://github.com/zihangdai/xlnet/)** (from Google/CMU) released with the paper [​XLNet: Generalized Autoregressive Pretraining for Language Understanding](https://arxiv.org/abs/1906.08237) by Zhilin Yang*, Zihang Dai*, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le.
6. **[XLM](https://github.com/facebookresearch/XLM/)** (from Facebook) released together with the paper [Cross-lingual Language Model Pretraining](https://arxiv.org/abs/1901.07291) by Guillaume Lample and Alexis Conneau.
7. **[RoBERTa](https://github.com/pytorch/fairseq/tree/master/examples/roberta)** (from Facebook), released together with the paper a [Robustly Optimized BERT Pretraining Approach](https://arxiv.org/abs/1907.11692) by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov.
8. **[DistilBERT](https://github.com/huggingface/transformers/tree/master/examples/distillation)** (from HuggingFace), released together with the blogpost [Smaller, faster, cheaper, lighter: Introducing DistilBERT, a distilled version of BERT](https://medium.com/huggingface/distilbert-8cf3380435b5
) by Victor Sanh, Lysandre Debut and Thomas Wolf.
8. **[DistilBERT](https://github.com/huggingface/transformers/tree/master/examples/distillation)** (from HuggingFace), released together with the paper [DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter](https://arxiv.org/abs/1910.01108) by Victor Sanh, Lysandre Debut and Thomas Wolf. The same method has been applied to compress GPT2 into [DistilGPT2](https://github.com/huggingface/transformers/tree/master/examples/distillation).

These implementations have been tested on several datasets (see the example scripts) and should match the performance of the original implementations (e.g. ~93 F1 on SQuAD for BERT Whole-Word-Masking, ~88 F1 on RocStories for OpenAI GPT, ~18.3 perplexity on WikiText 103 for Transformer-XL, ~0.916 Pearson R coefficient on STS-B for XLNet). You can find more details on performance in the Examples section of the [documentation](https://huggingface.co/transformers/examples.html).

@@ -394,7 +393,7 @@ This is the model provided as `bert-large-uncased-whole-word-masking-finetuned-s
### `run_generation.py`: Text generation with GPT, GPT-2, Transformer-XL and XLNet

A conditional generation script is also included to generate text from a prompt.
The generation script includes the [tricks](https://github.com/rusiaaman/XLNet-gen#methodology) proposed by Aman Rusia to get high quality generation with memory models like Transformer-XL and XLNet (include a predefined text to make short inputs longer).
The generation script includes the [tricks](https://github.com/rusiaaman/XLNet-gen#methodology) proposed by Aman Rusia to get high-quality generation with memory models like Transformer-XL and XLNet (include a predefined text to make short inputs longer).

Here is how to run the script with the small version of OpenAI GPT-2 model:

@@ -426,7 +425,7 @@ Here is a quick summary of what you should take care of when migrating from `pyt

The main breaking change when migrating from `pytorch-pretrained-bert` to `transformers` is that the model's forward method always outputs a `tuple` with various elements depending on the model and the configuration parameters.

The exact content of the tuples for each model are detailed in the models' docstrings and the [documentation](https://huggingface.co/transformers/).
The exact content of the tuples for each model is detailed in the models' docstrings and the [documentation](https://huggingface.co/transformers/).

In pretty much every case, you will be fine by taking the first element of the output as the output you previously used in `pytorch-pretrained-bert`.
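
A minimal sketch of that migration pattern, assuming a BERT checkpoint with the sequence-classification head (class and checkpoint names are illustrative, not taken from the diff above):

```python
import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased")

input_ids = torch.tensor([tokenizer.encode("Hello, my dog is cute")])
labels = torch.tensor([1])  # batch of size 1

# In `transformers`, the forward pass returns a tuple ...
outputs = model(input_ids, labels=labels)

# ... and its first element is what `pytorch-pretrained-bert` used to return directly.
loss = outputs[0]
logits = outputs[1]
```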

@@ -458,7 +457,7 @@ By enabling the configuration option `output_hidden_states`, it was possible to

### Serialization

Breaking change in the `from_pretrained()`method:
Breaking change in the `from_pretrained()` method:

1. Models are now set in evaluation mode by default when instantiated with the `from_pretrained()` method. To train them don't forget to set them back in training mode (`model.train()`) to activate the dropout modules.
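
A short sketch of what this means in practice (the checkpoint name is only an example):

```python
from transformers import BertModel

model = BertModel.from_pretrained("bert-base-uncased")
# The model is already in evaluation mode here: dropout is disabled,
# so it can be used for inference without an explicit model.eval() call.

model.train()  # switch dropout back on before fine-tuning
# ... training loop ...
model.eval()   # and return to evaluation mode for inference
```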

@@ -534,4 +533,4 @@ for batch in train_data:

## Citation

At the moment, there is no paper associated to Transformers but we are working on preparing one. In the meantime, please include a mention of the library and a link to the present repository if you use this work in a published or open-source project.
At the moment, there is no paper associated with Transformers but we are working on preparing one. In the meantime, please include a mention of the library and a link to the present repository if you use this work in a published or open-source project.
19 changes: 8 additions & 11 deletions docs/source/_static/css/huggingface.css
@@ -1,5 +1,3 @@
huggingface.css

/* The literal code blocks */
.rst-content tt.literal, .rst-content tt.literal, .rst-content code.literal {
color: #6670FF;
@@ -44,11 +42,11 @@ huggingface.css
/* The text items on the toc tree */
.wy-menu-vertical a {
color: #FFFFDD;
font-family: Calibre-Light;
font-family: Calibre-Light, sans-serif;
}
.wy-menu-vertical header, .wy-menu-vertical p.caption{
color: white;
font-family: Calibre-Light;
font-family: Calibre-Light, sans-serif;
}

/* The color inside the selected toc tree block */
@@ -85,7 +83,7 @@ a {
border-right: solid 2px #FB8D68;
border-left: solid 2px #FB8D68;
color: #FB8D68;
font-family: Calibre-Light;
font-family: Calibre-Light, sans-serif;
border-top: none;
font-style: normal !important;
}
@@ -136,14 +134,14 @@ a {

/* class and method names in doc */
.rst-content dl:not(.docutils) tt.descname, .rst-content dl:not(.docutils) tt.descclassname, .rst-content dl:not(.docutils) tt.descname, .rst-content dl:not(.docutils) code.descname, .rst-content dl:not(.docutils) tt.descclassname, .rst-content dl:not(.docutils) code.descclassname{
font-family: Calibre;
font-family: Calibre, sans-serif;
font-size: 20px !important;
}

/* class name in doc*/
.rst-content dl:not(.docutils) tt.descname, .rst-content dl:not(.docutils) tt.descname, .rst-content dl:not(.docutils) code.descname{
margin-right: 10px;
font-family: Calibre-Medium;
font-family: Calibre-Medium, sans-serif;
}

/* Method and class parameters */
@@ -160,17 +158,17 @@

/* FONTS */
body{
font-family: Calibre;
font-family: Calibre, sans-serif;
font-size: 16px;
}

h1 {
font-family: Calibre-Thin;
font-family: Calibre-Thin, sans-serif;
font-size: 70px;
}

h2, .rst-content .toctree-wrapper p.caption, h3, h4, h5, h6, legend{
font-family: Calibre-Medium;
font-family: Calibre-Medium, sans-serif;
}

@font-face {
@@ -196,4 +194,3 @@
src: url(./Calibre-Thin.otf);
font-weight:400;
}

3 changes: 1 addition & 2 deletions docs/source/index.rst
@@ -46,8 +46,7 @@ The library currently contains PyTorch and Tensorflow implementations, pre-train
5. `XLNet <https://github.com/zihangdai/xlnet>`_ (from Google/CMU) released with the paper `​XLNet: Generalized Autoregressive Pretraining for Language Understanding <https://arxiv.org/abs/1906.08237>`_ by Zhilin Yang*, Zihang Dai*, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le.
6. `XLM <https://github.com/facebookresearch/XLM>`_ (from Facebook) released together with the paper `Cross-lingual Language Model Pretraining <https://arxiv.org/abs/1901.07291>`_ by Guillaume Lample and Alexis Conneau.
7. `RoBERTa <https://github.com/pytorch/fairseq/tree/master/examples/roberta>`_ (from Facebook), released together with the paper a `Robustly Optimized BERT Pretraining Approach <https://arxiv.org/abs/1907.11692>`_ by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov.
8. `DistilBERT <https://huggingface.co/transformers/model_doc/distilbert.html>`_ (from HuggingFace) released together with the blog post `Smaller, faster, cheaper, lighter: Introducing DistilBERT, a distilled version of BERT <https://medium.com/huggingface/distilbert-8cf3380435b5>`_ by Victor Sanh, Lysandre Debut and Thomas Wolf.

8. `DistilBERT <https://huggingface.co/transformers/model_doc/distilbert.html>`_ (from HuggingFace) released together with the paper `DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter <https://arxiv.org/abs/1910.01108>`_ by Victor Sanh, Lysandre Debut and Thomas Wolf. The same method has been applied to compress GPT2 into `DistilGPT2 <https://github.com/huggingface/transformers/tree/master/examples/distillation>`_.

.. toctree::
:maxdepth: 2
15 changes: 12 additions & 3 deletions docs/source/pretrained_models.rst
@@ -98,6 +98,12 @@ Here is the full list of the currently provided pretrained models together with
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``xlm-clm-ende-1024`` | | 6-layer, 1024-hidden, 8-heads |
| | | | XLM English-German model trained with CLM (Causal Language Modeling) on the concatenation of English and German wikipedia |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``xlm-mlm-17-1280`` | | 16-layer, 1280-hidden, 16-heads |
| | | | XLM model trained with MLM (Masked Language Modeling) on 17 languages. |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``xlm-mlm-100-1280`` | | 16-layer, 1280-hidden, 16-heads |
| | | | XLM model trained with MLM (Masked Language Modeling) on 100 languages. |
+-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| RoBERTa | ``roberta-base`` | | 12-layer, 768-hidden, 12-heads, 125M parameters |
| | | | RoBERTa using the BERT-base architecture |
@@ -113,11 +119,14 @@ Here is the full list of the currently provided pretrained models together with
+-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| DistilBERT | ``distilbert-base-uncased`` | | 6-layer, 768-hidden, 12-heads, 66M parameters |
| | | | The DistilBERT model distilled from the BERT model `bert-base-uncased` checkpoint |
| | | (see `details <https://medium.com/huggingface/distilbert-8cf3380435b5>`__) |
| | | (see `details <https://github.com/huggingface/transformers/tree/master/examples/distillation>`__) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``distilbert-base-uncased-distilled-squad`` | | 6-layer, 768-hidden, 12-heads, 66M parameters |
| | | | The DistilBERT model distilled from the BERT model `bert-base-uncased` checkpoint, with an additional linear layer. |
| | | (see `details <https://medium.com/huggingface/distilbert-8cf3380435b5>`__) |
| | | (see `details <https://github.com/huggingface/transformers/tree/master/examples/distillation>`__) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``distilgpt2`` | | 6-layer, 768-hidden, 12-heads, 82M parameters |
| | | | The DistilGPT2 model distilled from the GPT2 model `gpt2` checkpoint. |
| | | (see `details <https://github.com/huggingface/transformers/tree/master/examples/distillation>`__) |
+-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
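
Each shortcut name in this table can be passed directly to ``from_pretrained()``. A minimal sketch, assuming the DistilBERT classes are available and that the ``distilgpt2`` checkpoint (which reuses the GPT-2 architecture and classes) ships with the installed version:

```python
from transformers import (DistilBertModel, DistilBertTokenizer,
                          GPT2LMHeadModel, GPT2Tokenizer)

# DistilBERT: distilled from `bert-base-uncased`, loaded through its own classes
tokenizer = DistilBertTokenizer.from_pretrained("distilbert-base-uncased")
model = DistilBertModel.from_pretrained("distilbert-base-uncased")

# DistilGPT2: distilled from `gpt2`, loaded through the regular GPT-2 classes
gpt2_tokenizer = GPT2Tokenizer.from_pretrained("distilgpt2")
gpt2_model = GPT2LMHeadModel.from_pretrained("distilgpt2")
```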

.. <https://huggingface.co/transformers/examples.html>`__