Investigate options for Starter restructure #2505

Closed
3 tasks
amandakys opened this issue Apr 12, 2023 · 2 comments

Assignees: amandakys
Labels: Design: Research, Issue: Feature Request (New feature or improvement to existing feature)

amandakys commented Apr 12, 2023

Description

Our current approach to starters is not very cohesive and has no clear strategy. Investigate ways to change this to help tackle adoption numbers. This builds on Concept 2 from the modular Kedro motivation work described in #2388.

Concept 2: Improve starter journey to increase accessibility of Kedro

Context

This ticket describes an alternative approach to starters that is complementary to our Kedro Utilities proposal. During the Kedro Utilities work, it became clear that the proposed utilities and our starters had some similarities. Breaking down the structure of the existing starters revealed some patterns and inconsistencies.

Furthermore, the current list of starters felt disparate and broad in its goals. There was a general leaning towards showcasing how Kedro could integrate with other libraries such as pyspark and astro-airflow through the use of an ‘example starter project’.

Possible Implementation

I propose that we combine our current concept of starters with our new utility modules workflow. At project creation, users will be asked to choose from different components that they want to add to their project.

Continuing with the theme of a simplified project starter with Add-Ons, every resultant project would start from the same basic template. Building on this, if our team chooses to enforce a more consistent way to provide ‘example code’ (i.e. default node and pipeline code, and a consistent test directory), this would also improve our users’ ability to mix and match examples.

Technical Details: cookiecutter allows you to initialise and add code based on booleans. This feature should enable us to adapt the ‘basic’ template based on a set of flags provided by the user on project creation.
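
As a rough illustration of the flag-driven idea, the sketch below mimics what a cookiecutter post-generation hook might do: start from the full ‘basic’ template and delete the paths owned by add-ons the user did not select. The `ADDON_PATHS` mapping, the flag names and the `prune_unselected` helper are all hypothetical; none of them come from Kedro or cookiecutter, and only the Python standard library is used.

```python
import shutil
import tempfile
from pathlib import Path

# Hypothetical mapping from an add-on flag to the template paths it owns.
ADDON_PATHS = {
    "linting": [".flake8", ".isort.cfg"],
    "testing": ["tests"],
    "docs": ["docs"],
}


def prune_unselected(project_dir, flags):
    """Delete files and folders belonging to add-ons the user declined.

    `flags` maps an add-on name to a boolean; missing names count as False.
    Returns the relative paths that were actually removed.
    """
    removed = []
    for addon, rel_paths in ADDON_PATHS.items():
        if flags.get(addon, False):
            continue  # add-on selected: keep its files
        for rel in rel_paths:
            target = Path(project_dir) / rel
            if target.is_dir():
                shutil.rmtree(target)
                removed.append(rel)
            elif target.exists():
                target.unlink()
                removed.append(rel)
    return removed


if __name__ == "__main__":
    # Build a throwaway "generated project" and prune it.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "tests").mkdir()
        (root / "docs").mkdir()
        (root / ".flake8").write_text("")
        (root / ".isort.cfg").write_text("")
        print(sorted(prune_unselected(root, {"linting": True})))
```

Pruning from one complete template, rather than assembling a project from fragments, keeps every resultant project anchored to the same basic layout, which is the property the proposal relies on.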

Integration Add-Ons

Goal: allow Kedro to support third-party libraries

  • databricks
  • pyspark
  • astro-airflow
  • flake8, black, isort (linting)
  • pytest (testing)
  • Logging
  • Kedro-Viz (may have dependency on example pipeline)
    • experiment tracking
    • plotly integration
    • matplotlib integration

Example Projects

Goal: showcase Kedro features; as a team, we show others how to use Kedro

  • Spaceflights
  • A complete data-processing pipeline
  • A complete project with Kedro-Viz experiment tracking set up

Initial Prototype (WIP)

Project Add-Ons 
================
Here you can select which add-ons you'd like to include.
Don't worry if you change your mind; you can always add or remove these later.
To read more about these utilities and what they do, visit: kedro.org/

Add-Ons
1) Linting:        Provides linting setup with Flake8, Black and isort
2) Testing:        Provides testing setup with pytest
3) Logging:        Provides more logging options
4) Documentation:  Provides documentation setup with Sphinx
5) Databricks:     Provides setup for working with Databricks
6) PySpark:        Provides setup configuration for working with PySpark
7) Airflow:        Provides minimal setup to deploy a pipeline to Airflow using Astronomer
8) Kedro-Viz:      Provides Kedro's native visualisation tool
     8a) Plotly:               Provides interactive pipeline visualisations
     8b) Experiment-Tracking:  Sets up experiment tracking, to compare runs

Which add-ons would you like to include in your project? [1-4/all/1,3/none]:

Would you like to include an example pipeline? [y/n]:
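
One way the `[1-4/all/1,3/none]` answer could be interpreted is sketched below. This is only an assumption about how the prototype might parse input, not an existing Kedro function: `parse_addon_selection` and the `ADDONS` table are hypothetical names, an empty answer is treated as `none`, and the nested `8a`/`8b` options are deliberately left out because their flow is still undecided.

```python
# Hypothetical numbering, copied from the prototype menu above.
ADDONS = {
    1: "Linting",
    2: "Testing",
    3: "Logging",
    4: "Documentation",
    5: "Databricks",
    6: "PySpark",
    7: "Airflow",
    8: "Kedro-Viz",
}


def parse_addon_selection(answer, available=ADDONS):
    """Turn an answer such as '1-4', 'all', '1,3' or 'none' into sorted numbers."""
    text = answer.strip().lower()
    if text in ("", "none"):  # assumption: empty input means no add-ons
        return []
    if text == "all":
        return sorted(available)

    chosen = set()
    for part in text.split(","):
        part = part.strip()
        if "-" in part:
            # A range such as "1-4" is inclusive at both ends.
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"Invalid range: {part!r}")
            chosen.update(range(start, end + 1))
        else:
            chosen.add(int(part))  # raises ValueError on non-numeric input

    unknown = chosen - set(available)
    if unknown:
        raise ValueError(f"Unknown add-on number(s): {sorted(unknown)}")
    return sorted(chosen)


if __name__ == "__main__":
    for example in ("1-4", "all", "1,3", "none"):
        numbers = parse_addon_selection(example)
        print(example, "->", [ADDONS[n] for n in numbers])
```

Rejecting unknown numbers with an error, rather than silently ignoring them, lets the prompt be shown again with a clear message.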

Note: The flow for kedro-viz as an add-on needs further work and I am working with @NeroOkwa and the Viz team on this.

Beyond the Add-Ons

  • Add-ons will be supported by documentation about how to ‘insert’ these integrations manually
    • Phase 2 rollout can investigate ability to plug Add-Ons into existing projects
  • the starter repo can be used to showcase each ‘add-on’ and also facilitate the git-clone workflow
  • add-ons should be stackable
    • example pipeline will need to account for their integration choices
    • list of add-ons should grow as we support more third-party libraries
  • Create your own starter
    • community plugins can be installed and then displayed as an option on project creation
    • this would expand on our current ‘custom starter workflow’
    • Allow community participation (GetInData)
      • MLflow
      • Snowflake
      • We could keep a list of ‘approved’ community starters to be shared around, so that we don't have to do all the integration work.

Design Next Steps

  • Prototype new creation flow options
  • Investigate details of Kedro-Viz as an add-on
  • Investigate standalone-datacatalog journey separately as part of incremental user journey work.
@amandakys amandakys added Issue: Feature Request New feature or improvement to existing feature Design: Research labels Apr 12, 2023
@amandakys amandakys added this to the Starter improvements milestone Apr 12, 2023
@amandakys amandakys self-assigned this Apr 12, 2023
@yetudada
Contributor

yetudada commented Apr 13, 2023

Here is a list of officially supported starters. From the list, we'll see:

@amandakys amandakys changed the title [WIP] Investigate options for Starter restructure Investigate options for Starter restructure Apr 28, 2023
@yetudada yetudada added this to Roadmap May 3, 2023
@yetudada yetudada moved this to Discovery or Research - Now ⏳ in Roadmap May 3, 2023
@yetudada
Contributor

yetudada commented Aug 4, 2023

I'll close this because #2838 exists 🥇

@yetudada yetudada closed this as completed Aug 4, 2023
@yetudada yetudada moved this from Discovery or Research - Now ⏳ to Shipped 🚀 in Roadmap Feb 20, 2024