
[Needs User Input!] Is there demand for a kedro-tools plugin? #1622

Closed
merelcht opened this issue Jun 16, 2022 · 9 comments
Labels
Stage: User Research 🔬 Ticket needs to undergo user research before implementation


@merelcht
Member

merelcht commented Jun 16, 2022

One of the suggestions made after researching the use of kedro CLI commands that can be considered tools (e.g. kedro lint, kedro jupyter convert) was to move these CLI commands to a separate plugin: kedro-tools/kedro-with-tools (name TBD).

We'd like to get users' opinions on the following proposal. Keep in mind that, as decided in the original research, all of the following CLI commands will be removed: kedro lint, kedro test, kedro build-docs, kedro build-reqs, kedro jupyter convert, kedro activate nbstripout.

(Below was originally written by @AntonyMilneQB on the research issue)

Proposal

We strip out all these commands from core kedro, all our current starters, remove packages from requirements.txt, etc.

We (or someone else) make a new plugin kedro-with-tools (name completely made up for now) that includes:

  • kedro tools group of commands (like the kedro dev group I suggested above) including lint, test, build-docs, whatever
  • dependencies to make all these work
  • a starter template (call it kedro-with-tools-starter) that is heavier than the core one and includes any required configuration (e.g. pyproject.toml, .pre-commit-config.yaml), along the lines of kedro-starters#40 ("add simple starter")

Important note: I say "we or someone else" because I think this could be a very good candidate for an unofficial community-maintained plugin. Or we could get it started and then hand it over, or we could maintain it but be much more relaxed about accepting updates than we are with the core kedro template.

User journey

Note that pip install immediately makes the new kedro-with-tools-starter alias available thanks to #1592.

Starting a new project:

```
pip install kedro-with-tools
kedro new -s kedro-with-tools-starter  # Makes new project template with pre-commit etc. config
# kedro tools lint etc. commands are now available
```

In existing project:

```
pip install kedro-with-tools
kedro tools init  # Not sure we'd actually need this command; it would copy pre-commit etc. config into an already existing project
# kedro tools lint etc. commands are now available
```
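Both journeys depend on a plain `pip install` being enough to surface new commands. As far as I understand Kedro's plugin mechanism, this works through Python entry points: Kedro's plugin documentation registers CLI commands under the `kedro.project_commands` group (treat that group name as an assumption to verify against the docs for your Kedro version). A minimal standard-library sketch of how a host CLI can enumerate what installed plugins advertise:

```python
from importlib.metadata import entry_points
from typing import List


def installed_plugin_names(group: str = "kedro.project_commands") -> List[str]:
    """Return the sorted names of entry points registered under `group`.

    The default group name is an assumption based on Kedro's plugin docs;
    any installed distribution can register entries under it.
    """
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+: selection API on the returned object.
        selected = eps.select(group=group)
    else:
        # Python 3.8/3.9: a dict mapping group name -> list of entry points.
        selected = eps.get(group, [])
    return sorted({ep.name for ep in selected})
```

After `pip install kedro-with-tools`, the plugin's name would show up in this list if it registered a command group under that entry point; the host CLI can then load each entry with `ep.load()`.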

Pros

  • The kedro CLI becomes very clean. We no longer have any little-used commands cluttering it up. The top-level groups like kedro pipeline, kedro catalog become much more discoverable because they're not hidden among lower-level commands all sitting at the same level
  • The core kedro template and starters become very clean and lighter weight
  • No need for any dev_requirements.txt. A project requirements.txt would include the bare minimum of what's actually needed. No need to ever touch these dependencies again (as with kedro-datasets, these project dev dependencies tend to move faster than our core ones and cause a disproportionate number of conflicts and irritations, e.g. pip-tools has broken because of conflicts with pip three times this year already)
  • We no longer need to worry so much about maintaining the kedro lint etc. commands. These are hard to keep up to date since best-practice Python tooling keeps changing (e.g. Sphinx; see this article). The current "recommendations" we make through the project template are kind of arbitrary (e.g. why do we run isort but not mypy in lint? Just because we haven't updated it for ages, I think). We're not really best placed to judge which tools are best, and that keeps changing. If this is community maintained, then we don't need to worry about staying on top of it at all!
  • It's consistent with the move to kedro-datasets, i.e. a lean core which is extendible
  • Very non-disruptive. Everyone who doesn't use these commands immediately benefits from a cleaner kedro; those who do want them can still get them with a single pip install. None of the current functionality is actually lost; it's just moved. In fact, users will have access to better, more up-to-date versions of these commands
  • Nicely showcases the power of plugins for providing custom CLI commands and starters

References that inspired this idea: #826; #844 (reply in thread); kedro-org/kedro-starters#40

Cons

  • Discoverability of these commands is worse than it currently is, since they wouldn't appear by default when you run kedro. On the other hand, discoverability of the "core" kedro commands is enhanced
  • We need to update our docs and messaging a bit. kedro is all about promoting good software engineering practice, and at the moment a (small) part of that is our "recommendation" of the tools in our project template and the associated lint etc. commands. @yetudada above didn't see this as a problem. Given how few people use the commands, I don't think they can be a major part of the kedro value proposition
@merelcht added the Stage: User Research 🔬 label on Jun 16, 2022
@merelcht added this to the Cleanup CLI commands milestone on Jun 16, 2022
@WaylonWalker
Contributor

This sounds very similar to how our template looks: we ripped the entire CLI out of the default template, put it in a library, and replaced it with from <internal-lib> import cli. Pipeline authors can still write custom CLI functions, and we have only one place to change all of the built-in CLI functions, plus our own custom ones, to get them into every project. If you do this, the commands stay just as discoverable and easy to use for those who don't want to rip them out.

I'm personally all in on this.
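The shared-library pattern described in this comment (one library owns the CLI, projects import and extend it) can be sketched as follows. Kedro's real CLI is built on click; this stand-in uses `argparse` from the standard library so it runs anywhere, and `make_cli`, the built-in command names and the `deploy` example are all hypothetical.

```python
import argparse
from typing import Callable, Dict, Optional

# Adds arguments to a freshly created sub-command parser.
Configure = Callable[[argparse.ArgumentParser], None]

# Commands every project gets; changed in one place, the shared library.
BUILTIN_COMMANDS = ("lint", "test", "build-docs")


def make_cli(extra: Optional[Dict[str, Configure]] = None) -> argparse.ArgumentParser:
    """Build the CLI a shared internal library might export (hypothetical)."""
    parser = argparse.ArgumentParser(prog="kedro")
    sub = parser.add_subparsers(dest="command", required=True)
    for name in BUILTIN_COMMANDS:
        sub.add_parser(name)
    # Project-specific commands contributed by pipeline authors.
    for name, configure in (extra or {}).items():
        configure(sub.add_parser(name))
    return parser


def add_deploy_args(parser: argparse.ArgumentParser) -> None:
    """A project-specific command a pipeline author might add (hypothetical)."""
    parser.add_argument("--env", default="dev")


# In a project's cli.py: reuse the shared CLI and extend it.
cli = make_cli({"deploy": add_deploy_args})
```

The design point is that a project's own `cli.py` shrinks to an import plus its extras, so updating the built-in commands means releasing one library rather than editing every project.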

@deepyaman
Member

deepyaman commented Jun 17, 2022

If the above were to be implemented, which users would run kedro new rather than kedro new -s kedro-with-tools-starter? If everybody does indeed prefer kedro new -s kedro-with-tools-starter, you've essentially made the --starter option unavailable for anything else. Would you also create kedro-spaceflights-with-tools-starter, kedro-pyspark-iris-with-tools-starter, etc.?

I'm also copying my full, initial response to the above proposal below for visibility:


@AntonyMilneQB--and everybody else in favor of killing kedro lint and kedro test, but kedro lint especially--I don't entirely see how this workflow would work.

I do think a non-trivial part of the value proposition of Kedro is introducing users to and enforcing basic software engineering best practices. Anecdotally, I remember being one of the first people on the client-facing side in QuantumBlack (at least in North America?) to enforce coding standards on projects over 3 years ago. More recently, I think a much larger percentage of projects do so--yes, in parallel to greater adoption of tools like Black and isort by the broader community, but also I think at higher rates due to it being bundled with Kedro. In my humble opinion, Kedro has always been opinionated, and part of the value comes from this opinionated nature.

I think removing the kedro lint command and documenting how to replicate its current behavior by leveraging the underlying tools (or something like pre-commit) directly is fine. Maybe this will help some users understand that there's no "magic".
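Replicating kedro lint by hand mostly means invoking the underlying tools directly. A minimal sketch that builds (but does not run) an equivalent set of commands; the exact flags Kedro itself passed are not reproduced here, so treat the tool list and flags below as an assumption based on the black/isort/Flake8 suite discussed in this thread.

```python
import sys
from typing import List, Sequence


def lint_commands(paths: Sequence[str], check_only: bool = False) -> List[List[str]]:
    """Return black, isort and flake8 invocations for `paths`.

    Each tool runs as a module of the current interpreter, so whatever
    version is installed in the active environment is used.
    """
    python = sys.executable
    black = [python, "-m", "black"]
    # The "black" profile keeps isort's output compatible with Black.
    isort = [python, "-m", "isort", "--profile", "black"]
    if check_only:
        # Report problems without rewriting files (useful in CI).
        black.append("--check")
        isort.append("--check-only")
    flake8 = [python, "-m", "flake8"]
    return [cmd + list(paths) for cmd in (black, isort, flake8)]
```

Running them is then a loop such as `for cmd in lint_commands(["src", "tests"], check_only=True): subprocess.run(cmd, check=True)`; a pre-commit configuration achieves the same without any custom code.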

However, going so far as to remove linter configuration needs to be considered much more carefully:

  • If there's to be any hope that 90% of Kedro users currently linting their code will still lint their projects, whatever Kedro template is spit out should satisfy said linters from day one. However, how do you do this if you don't have any canonical linter configuration?
  • Are most Kedro users (e.g. people under pressure to get projects rolling quickly, data scientists just wanting to do some experimentation) going to spend time looking up Black-compatible isort and Flake8 configs? How many of them even know this is a thing?
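For reference, the Black-compatible settings in question are small. A sketch that renders them as a setup.cfg using the standard library; the values follow Black's own compatibility guidance as I understand it (88-character lines, Flake8 ignoring E203, isort's `black` profile), so check that guidance for the tool versions you pin.

```python
import configparser
import io


def black_compatible_setup_cfg(line_length: int = 88) -> str:
    """Render [flake8] and [isort] sections that agree with Black's formatting."""
    config = configparser.ConfigParser()
    config["flake8"] = {
        "max-line-length": str(line_length),
        # E203 (whitespace before ':') conflicts with Black's slice formatting.
        "extend-ignore": "E203",
    }
    config["isort"] = {
        "profile": "black",
        "line_length": str(line_length),
    }
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()
```

A template that shipped a file like this would pass the linters from the first commit, without users having to look the settings up.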

When it comes to kedro test, I honestly think most users still aren't writing tests (hence the low command usage), but making the config optional isn't going to make that any better. Again, totally fine with just pointing people to the equivalent pytest command here.

With regards to making these commands part of an optional plugin, if anything, I would make them part of a plugin installed by default. This doesn't reduce the maintenance burden, and you can't outsource the development of such a core plugin to the broader community, but it does make it optional for those who may have good reason to not want the functionality.

Finally, FWIW, I think the set of core tools included in the project template is actually pretty good. While I personally would include things like mypy and Prettier, I think the black/isort/Flake8 suite is a good representation of "the bare minimum" needed to say you're following software engineering best practices on this front, without being too difficult for most users to follow. If this were outsourced to the community, I'm sure somebody would throw in random things like importlinter that may be nice to have, but are definitely not at the same level as Black or isort in terms of getting somebody on board with better software development practices.

@deepyaman
Member

On a slightly different note, I do think a kedro-tools plugin that offered a lot more would be good. For example, I think the Kedro end-to-end testing framework is something that could be exposed, to allow users to more easily write BDD tests. This is a pretty heavy lift right now; it involves understanding and copying code from the features/ directory of the Kedro repo.

@WaylonWalker
Contributor

If the above were to be implemented, which users would run kedro new rather than kedro new -s kedro-with-tools-starter? If everybody does indeed prefer kedro new -s kedro-with-tools-starter, you've essentially made the --starter option unavailable for anything else. Would you also create kedro-spaceflights-with-tools-starter, kedro-pyspark-iris-with-tools-starter, etc.?

How different would these starters be in practice? Is it just a different cli.py? If it's just a different cli.py then you can do that with jinja and a cookiecutter question.
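One way the cookiecutter-question approach could look: the template asks a yes/no question, and a post-generation hook deletes the tooling files when the answer is no. Cookiecutter does run hooks/post_gen_project.py inside the freshly generated project, and renders Jinja placeholders in the hook first; the question name `add_tools` and the file list below are hypothetical. The deletion logic is factored into a plain function so it can be tested on its own.

```python
from pathlib import Path
from typing import List

# Hypothetical tooling files that only the "with tools" variant keeps.
TOOLING_FILES = (".pre-commit-config.yaml", ".flake8", ".isort.cfg")


def prune_tooling_files(project_dir: Path, add_tools: str) -> List[str]:
    """Delete tooling config unless the answer was "yes"; return removed names.

    In a real post_gen_project.py hook this would be called as
    prune_tooling_files(Path.cwd(), "{{ cookiecutter.add_tools }}"),
    with cookiecutter substituting the placeholder before the hook runs.
    """
    if add_tools.strip().lower() in ("y", "yes"):
        return []
    removed = []
    for name in TOOLING_FILES:
        path = project_dir / name
        if path.exists():
            path.unlink()
            removed.append(name)
    return removed
```

With this approach a single starter covers both variants, which avoids the combinatorial `*-with-tools-starter` family raised in the earlier comment.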

@merelcht
Member Author

@deepyaman The research outcome shows that hardly any of our users actively use tool commands like kedro lint, so my guess is that far fewer people would end up using kedro new -s kedro-with-tools-starter. Of course this assumption could be wrong, which is why in this issue we'd like to know how much appetite there is for a plugin that keeps all those CLI commands.

@deepyaman
Member

@deepyaman The research outcome shows that hardly any of our users actively use tool commands like kedro lint, so my guess is that far fewer people would end up using kedro new -s kedro-with-tools-starter. Of course this assumption could be wrong, which is why in this issue we'd like to know how much appetite there is for a plugin that keeps all those CLI commands.

@MerelTheisenQB But, from the way I read the research, you don't know why users aren't using commands like kedro lint, right? Are they running the underlying commands themselves--pytest, black, isort, flake8, etc.--or are they not following these best practices at all? If they're just using the underlying commands, what I'm trying to say is that you don't want to make it harder to do the same by providing a template that doesn't pass those checks right off the bat.

@AhdraMeraliQB
Contributor

#2075 is also in favour of an external plugin (specifically for build-docs in this case)

@astrojuanlu
Member

Yes, I think the point is: if we are to effectively move forward with the deprecation decision and remove these commands, it would be nice if we offered a migration path for folks that still want to use them.

@merelcht
Member Author

merelcht commented Sep 5, 2023

This ticket hasn't attracted a lot of input from the community. It also overlaps a lot with the utilities workflow: #2388

I am closing this issue as something we're not building right now. Let's see what the feedback is on the revamped kedro new flow and if any tools like the ones described here are still needed then.

@merelcht closed this as not planned on Sep 5, 2023
Projects
None yet
Development

No branches or pull requests

5 participants