Exploration Steps:
Different hyperparameters are explored for each algorithm. To see which ones exactly refer to the pdf explaining the experiment in its entirety.
Scikit-learn (python3) is used to evaluate the algorithms.
- Python3
- pip3
- conda
- jupyter notebook
- sklearn
- The data can be found here so no need to download it externally
- Navigate to the
unsupervised-learning
directory usingcd unsupervised-learning
- Install conda (refer to https://conda.io/projects/conda/en/latest/user-guide/install/index.html for instructions)
- Set up virtual environment:
conda env create -f environment.yml
conda activate unsupervised_learning
- The data will already be in the box link in the data folder.
- The first dataset is 'default_of_credit_card_clients.csv'
- The second dataset is 'phishing_dataset.csv'
- There is no need to navigate to the data folder as the jupyter notebooks will automatically access the necessary datasets
├── data
│ ├── default_of_credit_card_clients.csv
│ └── phishing_dataset.csv
├── images (will generate when notebooks are run)
├── notebooks
│ ├── default-payment-cluster.ipynb
│ ├── default-payment-ica.ipynb
│ ├── default-payment-isomap.ipynb
│ ├── default-payment-pca.ipynb
│ ├── default-payment-rp.ipynb
│ ├── phishing-cluster.ipynb
│ ├── phishing-ica.ipynb
│ ├── phishing-isomap.ipynb
│ ├── phishing-pca.ipynb
│ └── phishing-rp.ipynb
├── results
├── environment.yml
├── README.md
- data: holds the datasets
- images: holds all images generated from the notebooks. This will be originally empty due to the large amount of them and the size.
- notebooks: holds the jupyter notebook files
- results: holds .txt files for statistics about each algorithm, written from the jupyter notebooks
- Ensure that you complete the setup in section 1 and 2.
- Verify your conda environment is active. You should see (supervised_learning) at the beginning of your terminal line.
- From the
unsupervised-learning
directory launch Jupyter Notebook using the commandjupyter notebook
- It should automatically open 'http://localhost:8888/tree' in your browser but if not navigate there
- From there navigate to the notebooks folder. This folder contains all 8 (10) .ipynb files
- Open any notebook by double-clicking on it. (Note the process for running all the algorithms/notebooks is the same here on out)
- To run the code, click 'Run' in the toolbar then in that drop down and select 'Run All Cells'
- This will run all your cells and may take up to an hour+ depending on the algorithm and machine it's being run on
- You can view the progress in the output cells (most cells have a %%time to show how long each cell took and so you know when it's done)
- While it is running there will be an hour glass in Jupyter notebook tab. When it is done it will be a notebook icon
- Do this for all notebooks to run all 6 algorithms
The notebooks automatically save all text statistics to the results folder and all plots to the images folder, just navigate to the respective directory to view them
- https://www.kaggle.com/datasets/akashkr/phishing-website-dataset/data
- https://archive.ics.uci.edu/dataset/350/default+of+credit+card+clients
This code repository and all its contents are the intellectual property of the owner and are protected under applicable copyright laws. Unauthorized copying, distribution, or reproduction of any part of this repository, whether in whole or in part, is strictly prohibited. This includes, but is not limited to, duplicating, sharing, or deriving works without explicit written consent from the owner. By accessing this repository, you agree to respect these terms and refrain from any actions that violate intellectual property rights.