
ML6 NLP Quick Tips

Current content:

  • Multilingual Sentence Embeddings (21/01/2021): Gives an overview of various current multilingual sentence embedding techniques and tools, and how they compare across various sequence lengths.
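The core idea behind comparing multilingual sentence embeddings is cosine similarity: semantically equivalent sentences in different languages should map to nearby vectors. A minimal sketch, assuming the vectors come from a multilingual encoder such as a sentence-transformers model (the toy 4-dimensional vectors below are made up for illustration; real models output hundreds of dimensions):

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings": a multilingual model should place translations close together.
emb_en = [0.9, 0.1, 0.0, 0.4]     # "The cat sits on the mat."
emb_de = [0.85, 0.15, 0.05, 0.4]  # "Die Katze sitzt auf der Matte."
emb_other = [0.0, 0.9, 0.4, 0.1]  # an unrelated sentence

print(cosine_similarity(emb_en, emb_de))     # high: same meaning
print(cosine_similarity(emb_en, emb_other))  # low: different meaning
```

The same similarity function works regardless of which embedding technique produced the vectors, which is what makes the techniques in the tip directly comparable.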

  • spaCy 3.0 (01/02/2021): spaCy 3.0 has just been released, and in this tip we'll have a look at some of its new features. We'll train a German NER model and streamline the end-to-end pipeline using the brand-new spaCy projects!
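spaCy 3.0's headline change is declarative, config-driven training: the pipeline, components, and hyperparameters live in a single config file instead of code. An illustrative fragment (the specific paths and values here are assumptions, not taken from the notebook):

```ini
[paths]
train = "corpus/train.spacy"
dev = "corpus/dev.spacy"

[nlp]
lang = "de"
pipeline = ["tok2vec","ner"]

[training]
max_epochs = 20
dropout = 0.1
```

spaCy projects then wrap steps like downloading data, training against this config, and packaging the model into a single reproducible workflow file.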

  • Compact transformers (26/02/2021): Bigger isn't always better. In this tip we look at some compact BERT-based models that provide a nice balance between computational resources and model accuracy.

  • Keyword Extraction with pke (18/03/2021): The KEYNG (read: king) is dead, long live the KEYNG! In this tip we look at pke, an alternative to Gensim for keyword extraction.
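At its simplest, unsupervised keyword extraction means selecting candidate terms and scoring them. The toy sketch below uses frequency after stopword filtering; it is not pke's algorithm, just a stand-in for the candidate-selection-then-weighting pattern that pke's models (e.g. TfIdf, TopicRank) implement with POS-based candidates and graph-based weights:

```python
import re
from collections import Counter

STOPWORDS = {"the", "is", "of", "and", "a", "to", "in", "for", "we", "with"}

def extract_keywords(text, n=3):
    # Toy extractor: candidates are non-stopword tokens, scored by frequency.
    words = re.findall(r"[a-z]+", text.lower())
    candidates = [w for w in words if w not in STOPWORDS and len(w) > 2]
    return [w for w, _ in Counter(candidates).most_common(n)]

doc = ("Keyword extraction identifies the most relevant terms in a document. "
       "Graph-based keyword extraction builds a graph of candidate terms.")
print(extract_keywords(doc))
```

Libraries like pke expose exactly these two stages (`candidate_selection`, `candidate_weighting`) so you can swap scoring strategies without changing the rest of the pipeline.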

  • Explainable transformers using SHAP (22/04/2021): BERT, explain yourself! 📖 Up until recently, language model predictions have lacked transparency. In this tip we look at SHAP, a way to explain your latest transformer-based models.
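SHAP attributions are Shapley values: each feature's average marginal contribution over all orderings, with the guarantee that the attributions plus a baseline prediction sum exactly to the model's output. For two features the computation is small enough to write out by hand; the toy model below is a made-up example, while SHAP approximates the same quantities for transformers with thousands of input tokens:

```python
def shapley_two_features(f, x, baseline):
    # Exact Shapley values for a two-feature model f: average each
    # feature's marginal contribution over both possible orderings.
    x1, x2 = x
    b1, b2 = baseline
    base = f(b1, b2)
    phi1 = 0.5 * ((f(x1, b2) - f(b1, b2)) + (f(x1, x2) - f(b1, x2)))
    phi2 = 0.5 * ((f(b1, x2) - f(b1, b2)) + (f(x1, x2) - f(x1, b2)))
    return base, phi1, phi2

# Toy "model" with an interaction term between its two features.
f = lambda a, b: 2 * a + 3 * b + a * b
base, phi1, phi2 = shapley_two_features(f, x=(1.0, 2.0), baseline=(0.0, 0.0))
print(base + phi1 + phi2, f(1.0, 2.0))  # attributions sum to the prediction
```

This additivity is what makes SHAP plots readable: every token's contribution is in the same units as the model output.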

  • Transformer-based Data Augmentation (18/06/2021): Ever struggled with a limited non-English NLP dataset for a project? Fear not, data augmentation to the rescue ⛑️ In this week's tip, we look at backtranslation 🔀 and contextual word embedding insertions as data augmentation techniques for multilingual NLP.
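Backtranslation is a simple round trip: translate a sentence into a pivot language and back, and the imperfections of translation yield a paraphrase you can add as a new training sample. The sketch below fakes the two translators with lookup tables purely to show the mechanics; a real pipeline would use MT models (e.g. translation checkpoints from the Hugging Face hub):

```python
# Mock round-trip "translators" standing in for real MT models.
EN_TO_DE = {"the movie was great": "der Film war großartig"}
DE_TO_EN = {"der Film war großartig": "the film was great"}

def backtranslate(sentence):
    pivot = EN_TO_DE[sentence]  # en -> de
    return DE_TO_EN[pivot]      # de -> en: a slightly rephrased sentence

original = "the movie was great"
augmented = backtranslate(original)
print(augmented)  # "the film was great": a new sample with the same label
```

Because the label is preserved while the surface form changes, each original sentence can yield several label-consistent variants via different pivot languages.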

  • Long range transformers (14/07/2021): Beyond and above the 512! 🏅 In this week's tip, we look at novel long range transformer architectures and compare them against the well-known RoBERTa model.
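The motivation for long-range architectures is arithmetic: full self-attention scores every token pair, so cost grows quadratically with sequence length, while sparse patterns such as a sliding window (used, e.g., by Longformer) grow only linearly. A small back-of-the-envelope sketch (the window size of 256 is an illustrative choice, not taken from the tip):

```python
def full_attention_pairs(n):
    # Full self-attention: every token attends to every token -> O(n^2).
    return n * n

def window_attention_pairs(n, w):
    # Sliding-window attention: each token attends to at most w
    # neighbours on each side -> O(n * w).
    return sum(min(n, i + w + 1) - max(0, i - w) for i in range(n))

n = 4096  # far beyond BERT/RoBERTa's 512-token limit
print(full_attention_pairs(n))         # 16777216 scored pairs
print(window_attention_pairs(n, 256))  # 2035456 pairs: ~8x cheaper
```

At 512 tokens the gap is modest, which is why long-range models only pay off once documents genuinely exceed standard context lengths.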

  • Neural Keyword Extraction (10/09/2021): Neural Keyword Extraction 🧠 In this week's tip, we look at neural keyword extraction methods and how they compare to classical methods.

  • HuggingFace Optimum (12/10/2021): HuggingFace Optimum Quantization ✂️ In this week's tip, we take a look at the new HuggingFace Optimum package to check out some model quantization techniques.
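Quantization shrinks a model by storing weights as 8-bit integers plus a per-tensor scale and zero-point instead of 32-bit floats. The sketch below shows the affine quantize/dequantize arithmetic on a handful of made-up weights; tools like Optimum and ONNX Runtime apply this (with calibrated ranges and per-channel scales) across whole networks:

```python
def quantize(values, num_bits=8):
    # Affine quantization: map floats onto [0, 2^bits - 1] integers.
    qmin, qmax = 0, 2 ** num_bits - 1
    lo, hi = min(values), max(values)
    scale = (hi - lo) / (qmax - qmin)
    zero_point = round(qmin - lo / scale)
    q = [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return [(qi - zero_point) * scale for qi in q]

weights = [-1.2, -0.3, 0.0, 0.7, 1.5]
q, scale, zp = quantize(weights)
restored = dequantize(q, scale, zp)
print(restored)  # close to the originals, at a quarter of the memory
```

The small reconstruction error is the trade-off the tip measures: int8 inference is faster and lighter, at the cost of a usually negligible accuracy drop.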

  • Text Augmentation using large-scale LMs and prompt engineering (25/11/2021): Typically, the more data we have, the better performance we can achieve 🤙. However, it is sometimes difficult and/or expensive to annotate a large amount of training data 😞. In this tip, we leverage three large-scale LMs (GPT-3, GPT-J and GPT-Neo) to generate very realistic samples from a very small dataset.
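The prompt-engineering part amounts to assembling a few-shot prompt: a task description, a handful of real labelled examples, and an open slot for the LM to complete with a fresh synthetic sample. A minimal sketch of the string assembly (the template wording and the seed reviews are invented for illustration; the completion step would call a GPT-3/GPT-J/GPT-Neo endpoint):

```python
def build_prompt(examples, label):
    # Few-shot prompt: task description + seed examples + an open slot
    # that the language model completes with a new synthetic sample.
    header = f"Write a short {label} product review.\n\n"
    shots = "".join(f"Review: {text}\n" for text in examples)
    return header + shots + "Review:"

seed = ["Absolutely loved it, works perfectly.",
        "Great value for money, highly recommended."]
prompt = build_prompt(seed, label="positive")
print(prompt)
```

Generating per label and then filtering implausible completions yields a much larger training set from only a few annotated seeds.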

  • Gender debiasing of datasets using CDA (25/01/2022): A lot of large language models are trained on web text. However, this means that unintended biases can sneak into your model's behaviour 😞. In this tip, we'll look at how to alleviate this bias using Counterfactual Data Augmentation ⚖️.
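Counterfactual Data Augmentation duplicates each training sentence with gendered terms swapped, so the model sees both variants equally often. A minimal sketch with a tiny hand-written swap list (real CDA uses much larger curated word lists and handles harder cases like names and coreference):

```python
import re

# Bidirectional swap list: applying it twice returns the original sentence.
SWAPS = {"he": "she", "she": "he", "his": "her", "her": "his",
         "man": "woman", "woman": "man"}

def counterfactual(sentence):
    def swap(match):
        word = match.group(0)
        repl = SWAPS.get(word.lower(), word)
        return repl.capitalize() if word[0].isupper() else repl
    return re.sub(r"\b\w+\b", swap, sentence)

print(counterfactual("She said his idea was brilliant."))
# -> "He said her idea was brilliant."
```

Training on the union of originals and counterfactuals balances the gender statistics of the corpus without discarding any data.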

  • GPT-2 Quantization using ONNXRuntime (19/04/2022): Large language models are costly to run. In this notebook, we leverage ONNXRuntime to quantize and run our Dutch GPT-2 model in a more efficient way 💰.