Improved baselines for sentence and document representations

JSGrondin/revisited-baselines


This mini-project was undertaken as part of COMP-551 at McGill University.

The goal of this project was to revisit claims made by Le et al. about the performance of Paragraph Vectors in natural language processing applications. The authors claimed that Paragraph Vectors achieved state-of-the-art results on text classification and sentiment analysis tasks. To verify this claim, the best baselines referenced in this report were reproduced, and all comparisons were made on the IMDB sentiment dataset. An NB-SVM baseline was used and then improved. The improved version achieved an accuracy of 92.096% on the test set, which is 0.876 percentage points above the baseline reported in the original article.
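For readers unfamiliar with NB-SVM, the sketch below illustrates the naive Bayes half of the commonly used formulation: a smoothed log-count ratio is computed per token, and each document's binary features are scaled by it before a linear classifier is trained. This is not the code in pipeline.py. The function names, the smoothing value `alpha=1.0`, the binarized counts, and the toy reviews are all assumptions made for illustration, and the linear classifier step is left out.

```python
import math
from collections import Counter


def log_count_ratio(docs, labels, alpha=1.0):
    """Return {token: r} with r = log((p / |p|_1) / (q / |q|_1)).

    p and q are smoothed document counts for the positive and negative class.
    alpha is an assumed smoothing constant, not a value taken from the project.
    """
    pos, neg = Counter(), Counter()
    for tokens, label in zip(docs, labels):
        # Binarized counts: each token is counted at most once per document.
        (pos if label == 1 else neg).update(set(tokens))
    vocab = set(pos) | set(neg)
    p = {t: alpha + pos[t] for t in vocab}
    q = {t: alpha + neg[t] for t in vocab}
    p_norm, q_norm = sum(p.values()), sum(q.values())
    return {t: math.log((p[t] / p_norm) / (q[t] / q_norm)) for t in vocab}


def nb_features(tokens, ratios):
    """Scale binary token indicators by their log-count ratio.

    Tokens that were not seen during training are dropped.
    """
    return {t: ratios[t] for t in set(tokens) if t in ratios}


# Hypothetical toy data: 1 = positive review, 0 = negative review.
docs = [["great", "movie"], ["great", "acting"],
        ["boring", "movie"], ["boring", "plot"]]
labels = [1, 1, 0, 0]

ratios = log_count_ratio(docs, labels)
features = nb_features(["great", "plot", "unseen"], ratios)
```

In this toy example, "great" receives a positive ratio, "boring" a negative one, and "movie", which appears equally often in both classes, a ratio of zero. In a full NB-SVM, the scaled features would then be passed to a linear SVM or logistic regression.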

The following scripts were used:

data_load.py : to load review comments

textprocessing.py : to remove special characters and stop words, lemmatize or stem words, etc.

pipeline.py : main file used to generate predictions
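The kind of cleaning attributed to textprocessing.py can be sketched as follows. This is a minimal illustration, not the project's actual code: the function name `clean_review`, the tiny stop-word set, and the sample review are hypothetical. Stemming and lemmatization are omitted because they would normally rely on an external library.

```python
import re

# Hypothetical, deliberately small stop-word list for illustration only.
STOP_WORDS = {"a", "an", "and", "the", "is", "it", "of", "this", "to", "was"}


def clean_review(text):
    """Strip HTML tags and special characters, lowercase, drop stop words."""
    text = re.sub(r"<[^>]+>", " ", text)            # drop HTML tags such as <br />
    text = re.sub(r"[^a-z\s]", " ", text.lower())   # keep letters and whitespace only
    return [tok for tok in text.split() if tok not in STOP_WORDS]


tokens = clean_review("This movie was GREAT!<br />Loved it.")
print(tokens)  # ['movie', 'great', 'loved']
```

Replacing non-letter characters with spaces also splits contractions ("don't" becomes "don" and "t"), so a real pipeline may want to handle apostrophes separately.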

See writeup.pdf for details on the methodology and results.
