This repository contains notebooks covering the basics of NLP (Natural Language Processing), such as core algorithms and basic tools, organized into topic-specific folders. It is mainly intended for beginners who want to get started with NLP and shows, with detailed explanations, what they should become familiar with.
- notebooks for the assignments and labs of the Natural Language Processing in TensorFlow course from DeepLearning.AI on Coursera
- notebooks to see different basic visualizations with matplotlib, one per cell.
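As a taste of what those notebooks cover, here is a minimal matplotlib line plot; the data and labels are illustrative, not taken from the notebooks:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display
import matplotlib.pyplot as plt

# A minimal line plot of the kind shown in the visualization notebooks
xs = [0, 1, 2, 3, 4]
ys = [x ** 2 for x in xs]

fig, ax = plt.subplots()
ax.plot(xs, ys, marker="o", label="y = x^2")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend()
fig.savefig("basic_plot.png")
```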
- notebooks to do topic modeling with Latent Dirichlet Allocation.
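For a sense of what LDA topic modeling looks like, here is a minimal sketch using scikit-learn's `LatentDirichletAllocation` on a made-up four-document corpus; the notebooks themselves may use a different library:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Toy corpus with two rough themes (pets vs. space)
docs = [
    "cat dog pet animal fur",
    "dog puppy pet bark animal",
    "rocket space planet orbit star",
    "star planet telescope space orbit",
]

# Bag-of-words counts, then fit a 2-topic LDA model
vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(counts)  # per-document topic proportions

print(doc_topics.shape)  # (4, 2); each row sums to 1
```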
- stores data files such as .csv files and model files that the notebooks need to load
tokenize_basic_tensorflow_keras.ipynb
- notebook with basic tokenization code that tokenizes sentences on spaces using TensorFlow and Keras
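The essence of that tokenization step can be sketched without TensorFlow: build a word index from space-split sentences, then map each sentence to a sequence of integer ids. The function names below are illustrative, not the Keras API:

```python
# Dependency-free sketch of what a word-level tokenizer does:
# assign each word an integer id, then encode sentences as id sequences.
def fit_word_index(sentences):
    word_index = {}
    for sentence in sentences:
        for word in sentence.lower().split():
            if word not in word_index:
                word_index[word] = len(word_index) + 1  # ids start at 1
    return word_index

def texts_to_sequences(sentences, word_index):
    return [[word_index[w] for w in s.lower().split() if w in word_index]
            for s in sentences]

sentences = ["I love my dog", "I love my cat"]
word_index = fit_word_index(sentences)
print(word_index)  # {'i': 1, 'love': 2, 'my': 3, 'dog': 4, 'cat': 5}
print(texts_to_sequences(sentences, word_index))  # [[1, 2, 3, 4], [1, 2, 3, 5]]
```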
- checking synonyms and hypernyms in WordNet using NLTK
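A hypernym is a more general term for a word (dog → canine → mammal). The notebook uses NLTK's real WordNet corpus; the sketch below only illustrates the idea of walking up such a taxonomy, using a tiny hand-made dictionary as a stand-in:

```python
# Toy stand-in for WordNet's hypernym relation (NLTK's wordnet corpus
# provides the real data via synset.hypernyms()).
hypernym_of = {
    "dog": "canine",
    "canine": "carnivore",
    "carnivore": "mammal",
    "mammal": "animal",
}

def hypernym_chain(word):
    """Walk up the taxonomy until the most general term is reached."""
    chain = [word]
    while chain[-1] in hypernym_of:
        chain.append(hypernym_of[chain[-1]])
    return chain

print(hypernym_chain("dog"))  # ['dog', 'canine', 'carnivore', 'mammal', 'animal']
```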
- normalizing and tokenizing tweets, including handling stopwords, punctuation, stemming, lowercasing, and hyperlinks; needs to import utils.py
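Those preprocessing steps can be sketched roughly as follows. This is a simplified stand-in for what utils.py provides: the stopword list and the suffix-stripping "stemmer" here are toys (the notebook uses NLTK's stopword corpus and PorterStemmer):

```python
import re
import string

# Toy stopword list; NLTK ships a much larger one.
STOPWORDS = {"a", "an", "the", "is", "and", "to", "i", "my", "am"}

def simple_stem(word):
    # Crude suffix stripping as a stand-in for a real stemmer
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def process_tweet(tweet):
    tweet = re.sub(r"https?://\S+", "", tweet)                      # drop hyperlinks
    tweet = tweet.lower()                                           # lowercase
    tweet = tweet.translate(str.maketrans("", "", string.punctuation))  # drop punctuation
    tokens = tweet.split()
    return [simple_stem(t) for t in tokens if t not in STOPWORDS]

print(process_tweet("I am loving the sunny day! https://example.com"))
```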
- the utility file imported by preprocessing.ipynb and building_and_visualizing_word_frequencies.ipynb
- notebook showing how to do linear algebra with vectors and matrices using NumPy
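A few of the basic operations that kind of notebook covers, sketched with made-up vectors and matrices:

```python
import numpy as np

# Vector operations
v = np.array([1.0, 2.0, 3.0])
w = np.array([4.0, 5.0, 6.0])
dot = np.dot(v, w)          # inner product: 1*4 + 2*5 + 3*6 = 32
norm = np.linalg.norm(v)    # Euclidean length of v

# Matrix operations
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = np.array([[5.0, 6.0],
              [7.0, 8.0]])
product = A @ B             # matrix multiplication
transpose = A.T             # transpose

print(dot, product.tolist())
```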
manipulating_word_embeddings.ipynb
- shows how word vectors work and how to find relations between words; requires uploading the model file word_embeddings_subset.p.
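The classic relation-finding trick is vector arithmetic plus cosine similarity. The toy 2-d "embeddings" below are invented purely to illustrate it; the notebook does the same with real vectors from word_embeddings_subset.p:

```python
import numpy as np

# Hand-made toy embeddings (not real word vectors)
emb = {
    "king":  np.array([0.9, 0.8]),
    "queen": np.array([0.9, 0.2]),
    "man":   np.array([0.5, 0.8]),
    "woman": np.array([0.5, 0.2]),
    "apple": np.array([0.1, 0.9]),
}

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# king - man + woman should land closest to queen
target = emb["king"] - emb["man"] + emb["woman"]
best = max((w for w in emb if w not in ("king", "man", "woman")),
           key=lambda w: cosine(emb[w], target))
print(best)  # queen
```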
building_and_visualizing_word_frequencies.ipynb
- creates word frequencies for feature extraction; needs to import utils.py
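The core idea is a dictionary mapping (word, sentiment label) pairs to counts, which later becomes the basis for features. A minimal sketch with made-up tweets (utils.py provides the real helpers):

```python
from collections import defaultdict

# Made-up labelled tweets: 1 = positive, 0 = negative
tweets = ["happy happy day", "sad day", "happy mood"]
labels = [1, 0, 1]

# Count how often each word appears under each label
freqs = defaultdict(int)
for tweet, label in zip(tweets, labels):
    for word in tweet.split():
        freqs[(word, label)] += 1

print(dict(freqs))
# A tweet's features can then be e.g. the sums of its words'
# positive and negative counts.
```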
- PDF explaining PCA, based on the Singular Value Decomposition (SVD) of the covariance matrix of the original dataset, and its relation to eigenvalues and eigenvectors, which are used as the rotation matrix
- might need some images under the images directory for display in the notebook
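The PCA procedure described above can be sketched in a few lines of NumPy: center the data, take the covariance matrix, eigendecompose it, and use the eigenvector matrix as the rotation. The random dataset here is just for demonstration:

```python
import numpy as np

# Random 100x3 dataset, centered
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
X = X - X.mean(axis=0)

cov = np.cov(X, rowvar=False)            # 3x3 covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)   # symmetric matrix -> eigh
order = np.argsort(eigvals)[::-1]        # sort by decreasing variance
rotation = eigvecs[:, order]             # eigenvectors as rotation matrix

X_pca = X @ rotation[:, :2]              # project onto the top 2 components
print(X_pca.shape)  # (100, 2)
```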
logistic_regression_model.ipynb
- visualizing and interpreting logistic regression
- uses logistic_features.csv under the data directory
LogisticRegression_fromScratch.ipynb
- building and evaluating Logistic Regression from scratch: preprocessing, feature extraction, and predicting on new tweets
- includes implementing the loss function and the gradient descent learning algorithm from scratch; needs to import utils.py and w1_unittest.py
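The core pieces of such a from-scratch implementation can be sketched as below. The tiny dataset and hyperparameters are invented for illustration; the notebook's actual feature extraction and training loop differ in detail:

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real z to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def loss(X, y, theta):
    """Binary cross-entropy loss."""
    h = sigmoid(X @ theta)
    return float(-np.mean(y * np.log(h) + (1 - y) * np.log(1 - h)))

def gradient_descent(X, y, theta, alpha=0.1, iters=500):
    """Repeatedly step opposite the gradient of the loss."""
    m = len(y)
    for _ in range(iters):
        h = sigmoid(X @ theta)
        theta = theta - (alpha / m) * (X.T @ (h - y))  # gradient step
    return theta

# Tiny separable dataset: bias column + one feature
X = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
theta = gradient_descent(X, y, np.zeros(2))
preds = (sigmoid(X @ theta) > 0.5).astype(float)
print(preds)  # should match y on this toy set
```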
- interpreting Naive Bayes performance; requires uploading data/bayes_features.csv
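A quantity typically inspected when interpreting Naive Bayes for sentiment is each word's log-likelihood ratio between the classes. The counts below are made up for illustration (the real values come from the training data), with Laplace smoothing applied:

```python
import math

# Made-up per-class word counts
pos_counts = {"happy": 30, "sad": 2, "movie": 10}
neg_counts = {"happy": 3, "sad": 25, "movie": 12}
vocab = set(pos_counts) | set(neg_counts)

n_pos = sum(pos_counts.values())
n_neg = sum(neg_counts.values())

def log_ratio(word):
    # Laplace-smoothed class-conditional probabilities
    p_pos = (pos_counts.get(word, 0) + 1) / (n_pos + len(vocab))
    p_neg = (neg_counts.get(word, 0) + 1) / (n_neg + len(vocab))
    return math.log(p_pos / p_neg)

print({w: round(log_ratio(w), 2) for w in sorted(vocab)})
# Words with ratio > 0 lean positive; ratio < 0 leans negative.
```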
wikipedia_library.ipynb
- how to get data from Wikipedia
TensorFlow Subword Text Encoder, also known as a subword tokenizer
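The idea behind subword tokenization is to split rare words into known pieces so nothing is out-of-vocabulary. Here is a dependency-free sketch of greedy longest-match subword splitting; the hand-made vocabulary is illustrative, whereas TensorFlow's real encoder learns its vocabulary from a corpus:

```python
# Toy subword vocabulary (a real one is learned from data)
VOCAB = ["token", "izer", "ize", "un", "break", "able", "t", "o", "k"]

def subword_tokenize(word):
    """Greedily take the longest vocabulary piece at each position."""
    pieces, i = [], 0
    while i < len(word):
        match = max((v for v in VOCAB if word.startswith(v, i)),
                    key=len, default=None)
        if match is None:
            pieces.append(word[i])  # unknown character: fall back to chars
            i += 1
        else:
            pieces.append(match)
            i += len(match)
    return pieces

print(subword_tokenize("tokenizer"))    # ['token', 'izer']
print(subword_tokenize("unbreakable"))  # ['un', 'break', 'able']
```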