The goal is to show how to turn a very simple notebook-based project into one that is readable, reusable, and easy to change.
The steps are as follows:
- 0: Initial project. Base code from janakiev, deliberately made much worse.
- 1: Improve the notebook itself: add markdown, write better and more Pythonic code, etc.
- 2: Add some extra files (README.md and requirements.txt) and use an isolated environment.
- 3: Split the work into separate notebooks organised as a DAG (directed acyclic graph).
- 4: Externalise some of the code for better readability.
- 5: Unit test most of the externalised code.
- 6: Use Papermill for parametrized execution.
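
Steps 4 to 6 can be illustrated with a minimal sketch. It is not the code from this project: the helper names `plan_runs` and `execute_runs`, the notebook path `notebooks/train.ipynb`, the `reports` folder and the `alpha` parameter are all hypothetical and chosen only for illustration. The one real library call is `papermill.execute_notebook`, which runs a notebook and injects the given parameters (typically after a cell tagged `parameters`).

```python
import unittest
from pathlib import Path


# Step 4: logic that would otherwise sit in a notebook cell lives in a
# plain function (hypothetical helper, e.g. in a module under src/).
def plan_runs(notebook, output_dir, parameter_sets):
    """Return one (input, output, parameters) triple per parameter set."""
    stem = Path(notebook).stem
    runs = []
    for index, parameters in enumerate(parameter_sets):
        output = Path(output_dir) / f"{stem}_run{index}.ipynb"
        runs.append((str(notebook), output.as_posix(), dict(parameters)))
    return runs


# Step 6: execute each planned run with Papermill.
def execute_runs(runs):
    # Imported lazily so plan_runs stays testable without Papermill installed.
    import papermill as pm

    for input_path, output_path, parameters in runs:
        pm.execute_notebook(input_path, output_path, parameters=parameters)


# Step 5: the externalised helper can now be unit tested without a notebook.
class PlanRunsTest(unittest.TestCase):
    def test_one_output_per_parameter_set(self):
        runs = plan_runs(
            "notebooks/train.ipynb", "reports", [{"alpha": 0.1}, {"alpha": 0.5}]
        )
        self.assertEqual(len(runs), 2)
        self.assertEqual(runs[0][1], "reports/train_run0.ipynb")
        self.assertEqual(runs[1][2], {"alpha": 0.5})
```

Assuming the block is saved as a module, the test can be run with `python -m unittest <module>`, and calling `execute_runs(plan_runs(...))` would write one executed copy of the notebook per parameter set. Keeping the planning logic separate from the Papermill call is one possible design: it keeps the tested part free of side effects.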
The final project architecture is a loose interpretation of the Cookiecutter Data Science project.
Further reading on the topic:
- Joel Grus - I don't like notebooks
- Dan Bader - Python Tricks
- Papermill
- Jupyter Notebook Extensions
- JupyterLab
- Working with Jupyter Notebooks in VSCode
- tqdm
- nbdime
- Ten Simple Rules for Reproducible Research in Jupyter Notebooks
- Jupyter Notebook Best Practices
- Working efficiently with JupyterLab Notebooks
- Bringing the best out of Jupyter Notebooks for Data Science
- Jupyter Notebook Manifesto: Best practices that can improve the life of any developer using Jupyter notebooks
- Jupyter Lab: Evolution of the Jupyter Notebook
- PEP 8