Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[KED-3023] Enable plugins to extend starter-aliases default list #1040

Closed
Galileo-Galilei opened this issue Nov 15, 2021 · 8 comments · Fixed by #1592
Closed

[KED-3023] Enable plugins to extend starter-aliases default list #1040

Galileo-Galilei opened this issue Nov 15, 2021 · 8 comments · Fixed by #1592
Labels
Issue: Feature Request New feature or improvement to existing feature

Comments

@Galileo-Galilei
Copy link
Member

Galileo-Galilei commented Nov 15, 2021

Description

Is your feature request related to a problem? A clear and concise description of what the problem is: "I'm always frustrated when ..."

We have at work a few kedro starters templates which enable to create kedro projects with a bunch of internal configuration (pre-commit, gitlab-ci, custom ConfigLoader...). We also have a custom plugin which modifies kedro's internal (especially the CONF_ROOT constant and the CLI).

When we distribute our template insisde the organisation, we have to give the entire VCS path to the project to our user, and it is not very convenient. I wish we were able to decalre a new starter-alias in the plugin, and when the plugin is installed and auto-discovered by kedro, users can user:

kedro new --starter=my_starter where my_starter VSC path was declared in the plugin, maybe like this:

_PLUGIN_STARTER_ALIAS: ​{my_starter: path/to/VCS}

Context

Why is this change important to you? How would you use it? How can it benefit other users?

The 2 main benefits I expect are:

  • it facilitates the distribution of templates, since it makes installation easier for end users
  • it facilitates maintenance: if the template path changes for whatever reason (regrouping all the templates in a single repo, moving to another gitlab/github instances for orgnaisations internal reasons), the modification is transparent for end users

Possible Implementation

Add a _PLUGIN_STARTER_ALIAS constant in plugins which will be user to extend the https://github.com/quantumblacklabs/kedro/blob/bce62c824a4786e7cf9cc355b31840879ec851bf/kedro/framework/cli/starters.py#L31-L38 set.

It likely needs a bit more refactoring to make it a dict accepting different _STARTERS_REPO path which is currently a hardcoded path to a single one github repo.

@Galileo-Galilei Galileo-Galilei added the Issue: Feature Request New feature or improvement to existing feature label Nov 15, 2021
@datajoely
Copy link
Contributor

I like this idea! Let me bring it up with the team

@antonymilne
Copy link
Contributor

antonymilne commented Nov 17, 2021

I also like this idea but am a little bit unsure about implementation. Thinking out loud here...

As you point out, we need to make it a bit more general than just allowing new starter names to be added, since you will need to accept different _STARTER_REPO paths as well. Since all of the logic to do with different starters is really just working out the right values of directory, template_path and checkout, let's make this as general as possible so that users can specify their own starters however they like rather than just a single custom repo URL. e.g. you could have one starter with alias my_starter-master and another with alias my_starter-develop that points to a different branch or even a different repo.

The default _STARTER_ALIASES would then become something like this. Maybe there's also space for a description or docs that appears when you do kedro starter list?

_STARTER_ALIASES = { 
     "astro-iris": {"template_path": "git+https://github.com/quantumblacklabs/kedro-starters.git", "directory": "astro-iris"}
     "mini-kedro": {"template_path": "git+https://github.com/quantumblacklabs/kedro-starters.git", "directory": "mini-kedro"}
     ...
}

So far so good, but what I don't understand and hopefully @Galileo-Galilei can explain is how the user extends the _STARTERS_ALIASES dictionary through a plugin? Effectively you want to do:

_STARTERS_ALIASES.update({"my_starter": {"template_path": "blahblahblah", directory="blah"}})

... but I don't see how that change propagates through for when you do kedro new. Would you need to overwrite the whole new command on the CLI to take in your modified _STARTERS_ALIASES or is there a nicer way? How are you changing CONF_ROOT from a plugin? Feel like I'm missing something here!

@Galileo-Galilei
Copy link
Member Author

Galileo-Galilei commented Nov 23, 2021

TBH, I haven't thought in details about the exact implementation. Actually I want to do exactly what you describe above (updating the _STARTER_ALIASES dictionary, provded you change the actual implementation to a dictionary).

My naive idea was that you might have a kedro.constants entrypoint:
setup(entry_points={"kedro.constants": ["plugin_name = plugin_name.plugin_constants:KEDRO_CONSTANTS"]})

and in a plugin_constants.py:

KEDRO_CONSTANTS={
_STARTER_ALIASES: {"my_starter": {"template_path": "blahblahblah", directory="blah"},
CONF_ROOT=[path/to/conf]
}

Which would enable you to access these constants when loading the entrypoints and override some of the constants.

Honestly, this feels quite hardcoded and not very flexible. Maybe dynaconf has the possibility to override settings.py (in a way similar to kedro environments) and we could create a settings.py file in the plugin with these constants? I am not very familiar with the mechanism so I can't quite tell if it is easy to do.

P.S: Specifically for CONF_ROOT, i think that in general it is more flexible to inject it at runtime through the CLI as suggestedin your user research on configuration, something like kedro run --conf-root=/home/ivan/my_new_conf/

@antonymilne
Copy link
Contributor

antonymilne commented Dec 3, 2021

My feeling is that things like CONF_ROOT that are in settings.py probably don't belong in an entrypoint. As I understand it, we are deliberately moving from the system of registration hooks (like register_catalog) to instead specify library components in settings.py so that you can't modify these things through a pip install. So I'm not sure it would make sense for a plugin to be able to modify variables like CONF_ROOT.

_STARTER_ALIASES is obviously a bit different though, given that it needs to be specified outside a kedro project. For this I don't immediately see another way to achieve it other than via entrypoints like you suggest.

@Galileo-Galilei
Copy link
Member Author

Galileo-Galilei commented Dec 9, 2021

Hi @AntonyMilneQB, just to clarify, I don't think CONF_ROOT belongs to an entrypoint neither (actually, all project-related constants should be declared in settings.py: plugins can already access/supercharge them if necessary juste by importing them if they want).

The hacky entrypoint solution is only necessery for global constants (like _STARTER_ALIASES) which need to be modified before the project is created. To be honest, I don't see any other use case for now apart from updating this starter dictionnary.

P.S: I really like the settings.py refactoring, I think it is both more secure (not executing random code on install) and more modular (basically, it seems that all the AbstractDatasets and the ConfigLoader could be in separated librairies since they are now completely independant from the rest of the framework)

@antonymilne
Copy link
Contributor

@Galileo-Galilei Cool, thanks very much for your comments. Completely agree with what you say, and pleased you like the new settings.py system 👍 Let's see if we can make extendible starters happen.

@antonymilne antonymilne changed the title Enable plugins to extend starter-aliases default list [KED-3023] Enable plugins to extend starter-aliases default list Dec 9, 2021
@lorenabalan
Copy link
Contributor

Linking here some alternative thinking on starter discoverability before we get tied up to a single solution/idea: kedro-org/kedro-starters#40 (review)

@Galileo-Galilei
Copy link
Member Author

Galileo-Galilei commented Dec 9, 2021

@lorenabalan I've read the discussion and I think that browsing github to find kedro-starters among community-developped examples may be a good idea, but comes with some caveats to keep in mind:

  • it does not work for enterprise support (My company starter is not shared publicly and will likely never be)
  • it ties the discovery to a single code-forge provider, i.e. github (gitlab or bitbucket, despite being less common, are serious contenders which should not be ignored IMHO)
  • it may become overcrowded really fast (dozens of starters alreayd exists and the number keeps growing. It may be tedious to find in the CLI genererated list the one you want from a UX perspective).
  • Discovery may be done either at runtime (=browse github when running the CLI command), but it assumes you have internet connection (which is not always the case on internal servers for security reasons) or at build time (=have a fixed list when you release a version, but it ties discovery to your realease process which sounds quite slow /bad)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Issue: Feature Request New feature or improvement to existing feature
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants