visualizing the data, some basic algorithms in Natural Language Processing (NLP)

yiyichanmyae/nlp


nlp

This repository contains notebooks that cover the basics of NLP (Natural Language Processing), such as common algorithms and tools, organized into topic-specific folders. It is mainly intended for beginners who want to get started with NLP, and it shows, with detailed explanations, what they should be familiar with.

Directories / Folders

  1. tensorflow_dev_assignments
  • notebooks for the assignments and labs of the Natural Language Processing in TensorFlow course from DeepLearning.AI on Coursera
  2. visalization_with_matplotlib
  • notebooks showing a different basic matplotlib visualization in each cell
  3. topic_modeling
  • notebooks for topic modeling with Latent Dirichlet Allocation
  4. data
  • stores data files such as .csv files and the model files that the notebooks need to load

Notebooks

tokenize_basic_tensorflow_keras.ipynb

  • basic tokenization code that splits sentences on spaces using TensorFlow and Keras
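
To give a feel for what space-based tokenization produces, here is a minimal pure-Python sketch. It does not use the Keras API and is not taken from the notebook; the helper names `build_word_index` and `texts_to_sequences` are hypothetical, and ranking words by frequency starting from index 1 is an assumption made for this example.

```python
from collections import Counter


def build_word_index(sentences):
    """Map each lowercased word to an integer; more frequent words get smaller indices."""
    counts = Counter(word for s in sentences for word in s.lower().split())
    # most_common() orders by count; ties keep first-seen order
    return {word: i for i, (word, _) in enumerate(counts.most_common(), start=1)}


def texts_to_sequences(sentences, word_index):
    """Replace every known word with its integer index; unknown words are skipped."""
    return [
        [word_index[w] for w in s.lower().split() if w in word_index]
        for s in sentences
    ]


sentences = ["I love my dog", "I love my cat"]
word_index = build_word_index(sentences)
sequences = texts_to_sequences(sentences, word_index)
print(word_index)
print(sequences)
```

The two sentences share their first three words, so their sequences differ only in the last index.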

WordNet.ipynb

  • looking up synonyms and hypernyms in WordNet via NLTK

preprocessing.ipynb

  • normalizing and tokenizing tweets, including handling of stopwords, punctuation, stemming, lowercasing and hyperlinks; needs to import utils.py
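
The kind of tweet cleaning described above can be sketched roughly as follows. This is an illustration only, not the code in preprocessing.ipynb or utils.py: the function name `preprocess_tweet` and the tiny `STOPWORDS` set are hypothetical, and stemming is left out because it would need an external stemmer such as the one in NLTK.

```python
import re
import string

# Tiny illustrative stopword set (an assumption; real lists are much longer)
STOPWORDS = {"a", "an", "the", "is", "are", "to", "and", "of", "in", "at", "this"}


def preprocess_tweet(tweet):
    """Remove hyperlinks, lowercase, strip punctuation and drop stopwords."""
    tweet = re.sub(r"https?://\S+", "", tweet)  # drop hyperlinks
    cleaned = []
    for token in tweet.lower().split():
        token = token.strip(string.punctuation)  # also strips a leading '#'
        if token and token not in STOPWORDS:
            cleaned.append(token)
    return cleaned


example = "This is a GREAT day at the beach!!! https://example.com #sunny"
tokens = preprocess_tweet(example)
print(tokens)  # ['great', 'day', 'beach', 'sunny']
```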

utils.py

linear_algebra.ipynb

  • shows how to do linear algebra on vectors and matrices with NumPy

manipulating_word_embeddings.ipynb

  • to see how word vectors work and to find the relations between words; you will need to upload the model file word_embeddings_subset.p
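
As a hedged illustration of finding relations between words with vectors, the sketch below uses cosine similarity and vector arithmetic on made-up 3-dimensional embeddings. The vectors in `toy_embeddings` are invented for this example and have nothing to do with word_embeddings_subset.p; `cosine_similarity` and `nearest_word` are hypothetical helper names.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_word(vector, embeddings, exclude=()):
    """Return the word whose embedding is most similar to `vector`."""
    best_word, best_score = None, -1.0
    for word, emb in embeddings.items():
        if word in exclude:
            continue
        score = cosine_similarity(vector, emb)
        if score > best_score:
            best_word, best_score = word, score
    return best_word


# Made-up toy vectors (assumption): real embeddings have hundreds of dimensions
toy_embeddings = {
    "king": [0.8, 0.9, 0.1],
    "queen": [0.8, 0.1, 0.9],
    "man": [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.5, 0.5, 0.5],
}

# king - man + woman should land close to queen in this toy space
query = [
    k - m + w
    for k, m, w in zip(
        toy_embeddings["king"], toy_embeddings["man"], toy_embeddings["woman"]
    )
]
answer = nearest_word(query, toy_embeddings, exclude=("king", "man", "woman"))
print(answer)
```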

building_and_visualizing_word_frequencies.ipynb

  • creates word frequencies for feature extraction; needs to import utils.py
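
One common way to turn word frequencies into features is to count how often each word appears under each label and then sum those counts per tweet. The sketch below assumes that setup; it is not the notebook's code, and `count_word_label_pairs` and `extract_features` are hypothetical names.

```python
from collections import defaultdict


def count_word_label_pairs(tokenized_tweets, labels):
    """Count occurrences of each (word, label) pair across the corpus."""
    freqs = defaultdict(int)
    for tokens, label in zip(tokenized_tweets, labels):
        for token in tokens:
            freqs[(token, label)] += 1
    return dict(freqs)


def extract_features(tokens, freqs):
    """Return [bias, summed positive counts, summed negative counts] for one tweet."""
    positive = sum(freqs.get((t, 1), 0) for t in tokens)
    negative = sum(freqs.get((t, 0), 0) for t in tokens)
    return [1.0, float(positive), float(negative)]


tweets = [["happy", "day"], ["sad", "day"], ["happy", "happy"]]
labels = [1, 0, 1]  # 1 = positive, 0 = negative (assumed encoding)

freqs = count_word_label_pairs(tweets, labels)
features = extract_features(["happy", "day"], freqs)
print(freqs)
print(features)  # [1.0, 4.0, 1.0]
```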

Explanation_PCA.ipynb

  • explains PCA, based on the Singular Value Decomposition (SVD) of the covariance matrix of the original dataset, and how it relates to the eigenvalues and eigenvectors, which are used as the rotation matrix (see The Rotation Matrix.pdf)
  • may need the images under the images directory for them to display in the notebook
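
The PCA recipe described above (center the data, take the SVD of the covariance matrix, use the leading eigenvectors as a rotation) can be sketched with NumPy as follows. This is a minimal sketch under those assumptions rather than the notebook's implementation; `pca_via_svd` is a hypothetical name and the data is synthetic.

```python
import numpy as np


def pca_via_svd(X, n_components):
    """Project X onto its top principal components via SVD of the covariance matrix."""
    X_centered = X - X.mean(axis=0)
    cov = np.cov(X_centered, rowvar=False)  # features are columns
    # For a symmetric positive semi-definite matrix, the columns of U are the
    # eigenvectors and S holds the eigenvalues in descending order
    U, S, _ = np.linalg.svd(cov)
    rotation = U[:, :n_components]
    projected = X_centered @ rotation
    return projected, S, rotation


# Synthetic 2-D data lying close to the line y = 2x
rng = np.random.default_rng(0)
x = rng.normal(size=200)
X = np.column_stack([x, 2 * x + rng.normal(scale=0.1, size=200)])

projected, eigenvalues, rotation = pca_via_svd(X, n_components=1)
print(projected.shape)
print(eigenvalues)
print(rotation.ravel())
```

Because the points lie near y = 2x, the first column of the rotation matrix should point roughly along the direction (1, 2), up to sign and normalization.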

logistic_regression_model.ipynb

LogisticRegression_fromScratch.ipynb

  • builds and evaluates logistic regression from scratch
  • covers preprocessing, feature extraction, and predicting new tweets
  • implements the loss function and the gradient descent learning algorithm from scratch
  • needs to import utils.py and w1_unittest.py
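
The loss function and gradient descent mentioned above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the notebook's code: the function names, the toy feature rows (bias, positive count, negative count) and the hyperparameters `alpha` and `num_iters` are all made up for the example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(x, theta):
    """Probability of the positive class for one feature row."""
    return sigmoid(sum(xi * ti for xi, ti in zip(x, theta)))


def cross_entropy(X, y, theta):
    """Average binary cross-entropy loss over the dataset."""
    eps = 1e-12  # avoids log(0)
    total = 0.0
    for x, label in zip(X, y):
        h = predict_proba(x, theta)
        total += label * math.log(h + eps) + (1 - label) * math.log(1 - h + eps)
    return -total / len(X)


def gradient_descent(X, y, theta, alpha, num_iters):
    """Batch gradient descent on the cross-entropy loss."""
    m = len(X)
    for _ in range(num_iters):
        grad = [0.0] * len(theta)
        for x, label in zip(X, y):
            error = predict_proba(x, theta) - label
            for j, xj in enumerate(x):
                grad[j] += error * xj
        theta = [t - alpha * g / m for t, g in zip(theta, grad)]
    return theta


# Toy rows: [bias, positive count, negative count] (made-up numbers)
X = [[1.0, 3.0, 0.0], [1.0, 2.0, 1.0], [1.0, 0.0, 3.0], [1.0, 1.0, 2.0]]
y = [1, 1, 0, 0]

theta0 = [0.0, 0.0, 0.0]
theta = gradient_descent(X, y, theta0, alpha=0.1, num_iters=500)
print(cross_entropy(X, y, theta0), cross_entropy(X, y, theta))
print(predict_proba([1.0, 3.0, 0.0], theta))
```

With all-zero weights every prediction is 0.5, so the starting loss is ln 2; training should push it well below that on this separable toy set.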

visualizing_NaiveBayes.ipynb

wikipedia_library.ipynb

  • how to get the data from Wikipedia

Datasets

API

Tensorflow Subword Text Encoder or Subword Tokenizer
