A beginner's project based on "Automating Training, Evaluation, and Deploying Models using GitHub Actions" by Abid Ali Awan and DataCamp.
Tutorial: https://www.datacamp.com/tutorial/ci-cd-for-machine-learning
Original repository: https://github.com/kingabzpro/CICD-for-Machine-Learning
Additions to the original project:
- Add local pre-commit hooks
- Add the GitHub Action 'pre-commit.ci lite'
- Add Docker
This project builds a drug classifier by training a random forest model inside a scikit-learn pipeline. The model is evaluated automatically with CML (Continuous Machine Learning), and a web application built with Gradio is deployed on the Hugging Face Hub.
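The evaluation step reports classification metrics for the trained model. As an illustration of what such metrics compute, here is a minimal pure-Python sketch of accuracy and macro-averaged F1 score. It assumes these two metrics (as used in the original tutorial); the function names and the output format are hypothetical, and the project itself computes its metrics with scikit-learn rather than with this code.

```python
from collections import Counter


def accuracy(y_true: list[str], y_pred: list[str]) -> float:
    """Fraction of predictions that match the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true: list[str], y_pred: list[str]) -> float:
    """Unweighted mean of the per-class F1 scores (macro averaging)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    labels = sorted(set(y_true) | set(y_pred))
    true_positives = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    predicted = Counter(y_pred)  # how often each label was predicted
    actual = Counter(y_true)  # how often each label really occurs
    scores = []
    for label in labels:
        precision = true_positives[label] / predicted[label] if predicted[label] else 0.0
        recall = true_positives[label] / actual[label] if actual[label] else 0.0
        total = precision + recall
        scores.append(2 * precision * recall / total if total else 0.0)
    return sum(scores) / len(scores)


def format_metrics(y_true: list[str], y_pred: list[str]) -> str:
    """Hypothetical one-line summary, e.g. for a text file embedded in a CML report."""
    return f"Accuracy = {accuracy(y_true, y_pred):.2f}, F1 Score = {macro_f1(y_true, y_pred):.2f}."
```

Macro averaging gives every drug class the same weight, so a rare class that is often misclassified lowers the score even when overall accuracy is high.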
From training to deployment, the entire process is automated with GitHub Actions. Pushing code to the GitHub repository triggers training, evaluation, and deployment, which updates the web application, model, and results on Hugging Face (see https://huggingface.co/spaces/jonas-luehrs/Drug-Classification).
The Makefile includes commands to install the Python packages (install), format the code (format), train the model (train), generate the CML report (eval), push the updated model and results to the "update" branch (update-branch), and upload the new model, results, and Gradio app to the Hugging Face Space (deploy).
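A minimal Python sketch of how these targets could be chained, assuming that install through update-branch run as one CI sequence and deploy runs separately. The grouping and the helper names are illustrative assumptions, not taken from the project's workflow files.

```python
import subprocess

# Assumed grouping of the Makefile targets listed above; the project's GitHub
# Actions workflows may split or order them differently.
CI_TARGETS = ["install", "format", "train", "eval", "update-branch"]
CD_TARGETS = ["deploy"]


def make_commands(targets: list[str]) -> list[list[str]]:
    """Build one `make <target>` command per target (hypothetical helper)."""
    return [["make", target] for target in targets]


def run_targets(targets: list[str], dry_run: bool = True) -> list[list[str]]:
    """Run the targets in order and stop at the first failure.

    With dry_run=True the commands are only returned, not executed.
    """
    commands = make_commands(targets)
    if not dry_run:
        for command in commands:
            # check=True raises CalledProcessError as soon as a target fails
            subprocess.run(command, check=True)
    return commands
```

Running the targets one at a time makes a failure point to a single step (for example eval) instead of a combined call. Some targets may also expect extra Make variables, such as credentials for pushing or deploying, which this sketch does not pass.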
Code quality is checked with pre-commit hooks, which keep the code consistently formatted and linted. To install the hooks, run the following commands:
pip install pre-commit
pre-commit install
This installs the pre-commit hooks in your local repository. The hooks run automatically before each commit, and if any hook fails, the commit is aborted. You can skip the hooks by adding the --no-verify flag to your git commit command.
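The hooks can also be run on every file without committing, using the pre-commit run --all-files command. The following is a small Python wrapper around that command; the wrapper and its runner parameter are hypothetical, and the runner can be swapped out so the logic can be checked without pre-commit installed.

```python
import subprocess
from typing import Callable

# A runner takes a command and returns its exit code.
Runner = Callable[[list[str]], int]


def default_runner(command: list[str]) -> int:
    """Execute the command and return its exit code."""
    return subprocess.run(command).returncode


def run_hooks(all_files: bool = True, runner: Runner = default_runner) -> bool:
    """Run the installed pre-commit hooks; True means every hook passed (hypothetical helper)."""
    command = ["pre-commit", "run"]
    if all_files:
        # Check the whole repository instead of only the staged files
        command.append("--all-files")
    return runner(command) == 0
```

Running the hooks on all files once after installing them is useful because, by default, pre-commit only checks the files staged for a commit.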
The installed pre-commit hooks are:
- black: code formatter (line length 100)
- flake8: code linter (selected rules)
- isort: import sorter
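To illustrate the line-length limit of 100 given for black above, here is a sketch that reports lines longer than the limit, similar in spirit to flake8's E501 (line too long) check. The function is a hypothetical illustration, not part of the project or of either tool.

```python
# Limit given for black in the hook list above
MAX_LINE_LENGTH = 100


def find_long_lines(source: str, max_length: int = MAX_LINE_LENGTH) -> list[int]:
    """Return the 1-based numbers of lines longer than max_length characters."""
    return [
        number
        for number, line in enumerate(source.splitlines(), start=1)
        if len(line) > max_length
    ]
```

A line of exactly 100 characters is allowed; only lines with 101 or more characters are reported.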
Pull requests are checked and autofixed with the GitHub Action pre-commit.ci lite. To use it, you need to install it as a GitHub App for this repository. Here is an example of how the pre-commit-ci-lite bot autofixes a pull request.
Clone the repository:
git clone https://github.com/JonasLuehrs/mlops-workflow.git
Create a new virtual environment:
cd mlops-workflow
# Create a new virtual environment
python -m venv venv
# Activate environment for Linux
source venv/bin/activate
# Activate environment for Windows
venv\Scripts\activate
# Install packages
pip install -r requirements.txt
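The activation command differs between Linux and Windows because the environment's executables live in different folders. The following sketch computes the interpreter path for each platform and creates the environment with Python's built-in venv module; the helper names are hypothetical.

```python
import os
import venv
from pathlib import Path


def venv_python(env_dir: str, os_name: str = os.name) -> Path:
    """Path of the Python interpreter inside a virtual environment."""
    if os_name == "nt":
        # Windows puts executables in Scripts\ and adds the .exe suffix
        return Path(env_dir) / "Scripts" / "python.exe"
    return Path(env_dir) / "bin" / "python"


def create_env(env_dir: str = "venv") -> Path:
    """Create the environment with pip (like `python -m venv venv`) and return its interpreter."""
    venv.create(env_dir, with_pip=True)
    return venv_python(env_dir)
```

Calling the interpreter from the environment directly, for example with subprocess.run([str(create_env()), "-m", "pip", "install", "-r", "requirements.txt"], check=True), installs the packages without activating the environment first.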
The pipeline must be run at least once (for example with make train) so that a trained drug classification model is available.
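Before starting the app, you can check whether a trained model exists. This sketch assumes the training step saves the pipeline to Model/drug_pipeline.skops, the path used in the original tutorial; the path and both helpers are assumptions to adapt to your setup.

```python
from pathlib import Path

# Assumption: the training step writes the pipeline here, as in the original
# tutorial; change the path if your training script saves it elsewhere.
MODEL_PATH = Path("Model") / "drug_pipeline.skops"


def model_available(path: Path = MODEL_PATH) -> bool:
    """True if a trained model file exists at the given path."""
    return path.is_file()


def require_model(path: Path = MODEL_PATH) -> Path:
    """Return the model path, or fail with a hint to run the training first (hypothetical helper)."""
    if not model_available(path):
        raise FileNotFoundError(f"No trained model at {path}; run the training pipeline first.")
    return path
```

Failing early with a clear message is easier to debug than an error raised while the app loads the model.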
Run the app locally:
python ./App/drug_app.py
The Gradio app should now be accessible at http://localhost:7860.
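To check from a script that the app is up, for example after starting it in the background, the following standard-library sketch requests the URL and reports whether it answered. The helper name and the default timeout are assumptions.

```python
from urllib.request import urlopen


def app_is_up(url: str = "http://localhost:7860", timeout: float = 5.0) -> bool:
    """Return True if the URL answers with an HTTP status below 400 (hypothetical helper)."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.status < 400
    except OSError:
        # URLError/HTTPError (refused connection, error status) and timeouts are all OSErrors
        return False
```

A False result right after launch may simply mean the app is still starting, so retrying a few times before giving up is reasonable.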
Make sure that Docker is installed (see the official Docker installation guide).
Then build the image and start the container:
docker build -t gradio-app .
docker run -p 7860:7860 gradio-app
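The -p option publishes a container port on the host in the form HOST:CONTAINER. The sketch below builds the two commands above with the host port as a parameter; the helper is hypothetical, and the image name and ports are the ones from the commands above.

```python
def docker_commands(
    image: str = "gradio-app", host_port: int = 7860, container_port: int = 7860
) -> list[list[str]]:
    """Build the `docker build` and `docker run` commands (hypothetical helper)."""
    return [
        ["docker", "build", "-t", image, "."],
        # HOST:CONTAINER maps the app's port inside the container to a port on the host
        ["docker", "run", "-p", f"{host_port}:{container_port}", image],
    ]
```

Each command can be executed with subprocess.run(command, check=True). With host_port=8080 the app would be reachable at http://localhost:8080 while still listening on 7860 inside the container. If the page does not load, note that Gradio must listen on 0.0.0.0 inside a container (for example via launch(server_name="0.0.0.0") or the GRADIO_SERVER_NAME environment variable) to be reachable through the published port.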
As with the local setup, the Gradio app should now be accessible at http://localhost:7860.