The purpose of this repository is to extract useful information from IMDb's (https://www.imdb.com) Top 250 list. The list comprises the 250 highest-rated movies on the platform. We use a 2019 snapshot of it, which was created by Nigel Cox and can be found here.
The specific tasks were to create plots that illustrate the distribution of the movies per decade, the most popular actors in the list, and the most popular genres.
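As a rough illustration of the aggregation behind two of these plots, the sketch below bins movies by decade and counts genres using only the standard library. The sample rows, the field names (`year`, `genres`), the comma-separated genre format, and the helper functions are all hypothetical; the actual snapshot and plots.py may be organized differently.

```python
from collections import Counter

# Hypothetical sample rows; the real snapshot's fields and layout may differ.
movies = [
    {"title": "Movie A", "year": 1994, "genres": "Drama"},
    {"title": "Movie B", "year": 1999, "genres": "Drama, Crime"},
    {"title": "Movie C", "year": 2008, "genres": "Action, Crime, Drama"},
    {"title": "Movie D", "year": 1957, "genres": "Drama"},
]


def decade_of(year):
    """Map a release year to the first year of its decade, e.g. 1994 -> 1990."""
    return (year // 10) * 10


def movies_per_decade(rows):
    """Count how many movies fall into each decade."""
    return Counter(decade_of(row["year"]) for row in rows)


def genre_counts(rows):
    """Count genre occurrences, assuming a comma-separated genre string."""
    counts = Counter()
    for row in rows:
        counts.update(g.strip() for g in row["genres"].split(","))
    return counts


if __name__ == "__main__":
    for decade, n in sorted(movies_per_decade(movies).items()):
        print(f"{decade}s: {n}")
    print(genre_counts(movies).most_common(3))
```

The resulting counters could then be passed to any plotting library as bar-chart data. Counting actors would follow the same pattern as `genre_counts`.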
The data transformation and visualization were done in Python, specifically version 3.7.4. To replicate the results, please make sure you are on this version, e.g. by using a virtual environment or a compatible Docker image.
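If you want to confirm the interpreter version before running anything, a small check like the following may help. It is only a sketch: the `EXPECTED` constant and the `is_expected_version` helper are hypothetical and not part of this repository.

```python
import sys

# Version stated in this README; the helper below is illustrative only.
EXPECTED = (3, 7, 4)


def is_expected_version(version_info=None):
    """Return True if the given (or running) interpreter matches EXPECTED."""
    current = tuple((version_info or sys.version_info)[:3])
    return current == EXPECTED


if __name__ == "__main__":
    if not is_expected_version():
        found = ".".join(str(part) for part in sys.version_info[:3])
        print(f"Warning: expected Python 3.7.4, found {found}")
```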
If you are in the correct folder (src/) and have installed all required dependencies using
$ pip3 install -r requirements.txt
(in some cases pip may be used instead of pip3), you can generate the plots using the following command:
$ python3 plots.py