This repository has been archived by the owner on Aug 29, 2023. It is now read-only.

wildonion/uniXerr


🌐 See the Wiki to understand how stuff works!

AI Core Development Guide

⚠️ If you are working on the development side, remember to map the local host address (127.0.0.1) in /etc/hosts to api.unixerr.com and tensorboard.api.unixerr.com for the API and TensorBoard servers, respectively.
⚠️ Remember to call the /users/add/info and /users/add/positions routes of the API server after classification is done on the CSV file of input data.
⚠️ You can't create an environment from a file that was exported on a different platform than the target machine.
❗️ Both the piper and infra folders can only be controlled through the controller.py middleware.
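
The hosts mapping mentioned in the first note can be checked before starting the servers. The following is a minimal sketch, not part of uniXerr: the function name `missing_host_entries` is hypothetical, and it assumes the standard hosts-file layout (an address followed by one or more names, with `#` starting a comment).

```python
# Hostnames taken from the note above; everything else here is illustrative.
REQUIRED_HOSTS = ("api.unixerr.com", "tensorboard.api.unixerr.com")


def missing_host_entries(hosts_text, required=REQUIRED_HOSTS, address="127.0.0.1"):
    """Return the required hostnames that hosts_text does not map to address."""
    mapped = set()
    for line in hosts_text.splitlines():
        # Drop trailing comments, then split the remainder on whitespace.
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and fields[0] == address:
            mapped.update(fields[1:])
    return [name for name in required if name not in mapped]
```

Passing the contents of /etc/hosts to `missing_host_entries` returns the names that still need a `127.0.0.1 <name>` line; an empty list means both servers should resolve locally.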

Setup

  • Start an Apache Cassandra server and fill in the .env file with the necessary variables
  • Create an environment with a specific Python version: conda create -n uniXerr python=3.8
  • Create the environment from the uniXerr.yml file: conda env create -f uniXerr.yml
  • Activate the uniXerr environment: conda activate uniXerr
  • Update the environment from the uniXerr.yml file: conda env update -f uniXerr.yml --prune
  • Export your active environment to the uniXerr.yml file: conda env export | grep -v "^prefix: " > uniXerr.yml
  • Install pm2: wget -qO- https://getpm2.com/install.sh | bash
  • Install completion for typer-cli: typer --install-completion
  • Generate a docs file for the uniXerr CLI: typer app.py utils docs --name uniXerr --output uniXerr-cli.md
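
The export step above pipes through grep -v "^prefix: " because the prefix line of a conda export holds the absolute path of the environment on the exporting machine. For setups without grep, the same filter can be sketched in Python. The function name `strip_prefix_line` is hypothetical and not part of uniXerr.

```python
def strip_prefix_line(exported_yaml):
    """Drop any line starting with 'prefix: ', mirroring grep -v "^prefix: "."""
    kept = [
        line
        for line in exported_yaml.splitlines()
        if not line.startswith("prefix: ")
    ]
    return "\n".join(kept) + "\n"
```

The function takes the text printed by conda env export and returns the text to write to uniXerr.yml.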

Usage

$ typer app.py run
Usage: typer run [OPTIONS] COMMAND [ARGS]...

  【  uniXerr CLI controller  】

Options:
  --help  Show this message and exit.

Commands:
  classify-positions
  cluster-positions
  develop

$ typer app.py run cluster-positions --help
Usage: typer run cluster-positions [OPTIONS]

Options:
  --generate-fake-samples      Generating fake samples for training.
  --epoch INTEGER RANGE        Number of epoch for training VAE.
  --batch-size INTEGER RANGE   Number of batch size for training VAE.
  --device TEXT                Training device. cpu or cuda
  --num-workers INTEGER RANGE  Number of workers for pytroch dataloader
                               object.

  --latent-dim INTEGER RANGE   Dimension of VAE latent space.
  --ddo                        Force deletion with confirmation for dataloader
                               object.

  --dpm                        Force deletion with confirmation for pre-
                               trained VAE model.

  --cluster-on-raw-data        Clustering on pc_features dataset, default is
                               set to VAE latent space

  --cluster-method TEXT        Clustering method. kmeans or hdbscan; hdbscan
                               is not suitable for latent space of VAE and has
                               some drawbacks for new dataset.

  --plot-method TEXT           Plotting method for data. pca or tsne; if you
                               want plot data before clustering on different
                               methods just remove the pc_dataloader.pth with
                               --ddo option.

  --help                       Show this message and exit.
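
When the clustering step is launched from a script instead of a shell, the command line can be assembled from the options listed above. This is a rough sketch under assumptions: the helper name `cluster_positions_command` is hypothetical, it uses only flags shown in the help output, and the accepted integer ranges are not stated there, so numeric values are passed through unchecked.

```python
# Choices named in the help text above.
DEVICES = {"cpu", "cuda"}
CLUSTER_METHODS = {"kmeans", "hdbscan"}
PLOT_METHODS = {"pca", "tsne"}


def cluster_positions_command(epoch, batch_size, device="cpu", latent_dim=None,
                              cluster_method="kmeans", plot_method="pca",
                              cluster_on_raw_data=False):
    """Build the argument list for `typer app.py run cluster-positions`."""
    if device not in DEVICES:
        raise ValueError(f"device must be one of {sorted(DEVICES)}")
    if cluster_method not in CLUSTER_METHODS:
        raise ValueError(f"cluster_method must be one of {sorted(CLUSTER_METHODS)}")
    if plot_method not in PLOT_METHODS:
        raise ValueError(f"plot_method must be one of {sorted(PLOT_METHODS)}")

    cmd = [
        "typer", "app.py", "run", "cluster-positions",
        "--epoch", str(epoch),
        "--batch-size", str(batch_size),
        "--device", device,
        "--cluster-method", cluster_method,
        "--plot-method", plot_method,
    ]
    if latent_dim is not None:
        cmd += ["--latent-dim", str(latent_dim)]
    if cluster_on_raw_data:
        # Cluster on the pc_features dataset instead of the VAE latent space.
        cmd.append("--cluster-on-raw-data")
    return cmd
```

The returned list can be handed to `subprocess.run(cmd, check=True)` from the repository root, or joined with `shlex.join(cmd)` to print the equivalent shell command.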

$ typer app.py run classify-positions --help
Usage: typer run classify-positions [OPTIONS]

Options:
  --csv-path FILE              Path to labeled pc_features csv dataset.
  --input-data-csv-path FILE   Path to input data csv for classification.
  --ddo                        Force deletion with confirmation for dataloader
                               objects.

  --dpm                        Force deletion with confirmation for pre-
                               trained classifier model.

  --epoch INTEGER RANGE        Number of epoch for training classifier.
  --batch-size INTEGER RANGE   Number of batch size for training classifier.
  --device TEXT                Training device. cpu or cuda
  --num-workers INTEGER RANGE  Number of workers for pytroch dataloader
                               object.

  --help                       Show this message and exit.

$ typer app.py run develop --help
Usage: app.py run develop [OPTIONS]

Options:
  --workers INTEGER RANGE  Number of workers
  --help                   Show this message and exit.

Running in development mode: API docs

$ typer app.py run develop --workers 10

Export a Cassandra table to a CSV file:

$ cqlsh api.unixerr.com -u username -p password -e "copy unixerr.table_name to '/path/to/table_name.csv' with HEADER = true"

Import an exported CSV file into a Cassandra table:

$ cqlsh api.unixerr.com -u username -p password -e "copy unixerr.table_name from '/path/to/table_name.csv' with HEADER = true"

Run TensorBoard to visualize training and testing of the DL models:

$ tensorboard --host=tensorboard.api.unixerr.com --logdir=runs

uniXerr CLI usage


Results

📌 Position Clustering Process

Dataloader Object - MinMax Scaler

Fake Dataset for Offline Training

📊 Plotted Dataset before Clustering using PCA - Standard Scaler

📊 Plotted Dataset before Clustering using TSNE - Standard Scaler

Clustered Dataset Based on Latent Space of Pre-trained VAE model

Clustered Dataset Based on Position Clustering data

VAE Pre-trained Model - Normal PDF

📊 Clusters Found by KMeans on Latent Space of Pre-trained VAE model

📊 Clusters Found by KMeans on Position Clustering Dataset - Plotted using PCA | Standard Scaler

📊 Clusters Found by KMeans on Position Clustering Dataset - Plotted using TSNE | Standard Scaler

📊 VAE Model Training Loss

📌 Position Classification Process

Training Dataloader Object of Clustered Dataset Based on Latent Space of Pre-trained VAE model

Testing Dataloader Object of Clustered Dataset Based on Latent Space of Pre-trained VAE model

Training Dataloader Object of Clustered Dataset Based on Position Clustering data

Testing Dataloader Object of Clustered Dataset Based on Position Clustering data

📊 Percentage of Positions before Classification on Clustered Dataset Based on Latent Space of Pre-trained VAE model

📊 Percentage of Positions before Classification on Clustered Dataset Based on Position Clustering data

Classifier Pre-trained Model - Trained and Tested on Clustered Dataset Based on Latent Space of Pre-trained VAE model

Classifier Pre-trained Model - Trained and Tested on Clustered Dataset Based on Position Clustering data

📊 Classifier Model Training Accuracy - Clustered Dataset Based on Latent Space of Pre-trained VAE model

📊 Classifier Model Testing Accuracy - Clustered Dataset Based on Latent Space of Pre-trained VAE model

📊 Classifier Model Training Loss - Clustered Dataset Based on Latent Space of Pre-trained VAE model

📊 Classifier Model Training Accuracy - Clustered Dataset Based on Position Clustering data

📊 Classifier Model Testing Accuracy - Clustered Dataset Based on Position Clustering data

📊 Classifier Model Training Loss - Clustered Dataset Based on Position Clustering data

Classification Results on Arbitrary Inputs - Classified using Pre-trained Model of Clustered Dataset Based on Latent Space of Pre-trained VAE model and Clustered Dataset Based on Position Clustering data