Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make kedro project commands work from any directory inside a project #1831

Closed
antonymilne opened this issue Sep 5, 2022 · 16 comments · Fixed by #3683
Closed

Make kedro project commands work from any directory inside a project #1831

antonymilne opened this issue Sep 5, 2022 · 16 comments · Fixed by #3683
Assignees
Labels
Component: CLI Issue/PR that addresses the CLI for Kedro Issue: Feature Request New feature or improvement to existing feature

Comments

@antonymilne
Copy link
Contributor

Note. This ticket does not handle the question of what happens if you're outside the project altogether. That is done in #1829. This ticket is just about what happens if you're inside the project but not at project root.

Warning. This shouldn't be difficult, but the correct implementation needs a bit of careful thought and testing to get right!

When a kedro project command (e.g. kedro run) is run from inside a project but not the project root it doesn't work. There doesn't seem to be any good reason for this. Other tools (.e.g black, git) work from project subfolders, so let's do the same with kedro.

  • iterate up through directory structure to see if it's a kedro project. If yes then use that as the project path; if no then behave as now (i.e. will say "run command not found")
  • we have code to do this already in _find_kedro_project in the ipython extension. This would likely be made more performant if it used iterated through current_dir.parents rather than the while loop that's currently there. Do a quick bit of profiling to see what the best way to do this is
  • once the best way is found, we should put the function in a common place like utils so the ipython extension and CLI can both use it
  • for a start, for kedro to show all the project commands from inside a subdirectory you'll need to change KedroCLI(project_path=Path.cwd()) to KedroCLI(project_path=_find_kedro_project(Path.cwd()))
  • in addition, anywhere that starts a KedroSession will need to use the correct project root. This could be done by providing the right project_path whenever KedroSession is instantiated or (probably better) altering the KedroSession itself from self._project_path = Path(project_path or Path.cwd()).resolve() to self._project_path = Path(project_path or _find_kedro_project(Path.cwd()).resolve())
  • there are very likely other places that need to change too! This needs some care to get right

Warning. The change shouldn't actually cd the user to a different directory. i.e. this is the correct behaviour:

pwd # /Users/antony_milne/kedro_stuff/projects/iris/data
kedro run # should work as if I ran it from /Users/antony_milne/kedro_stuff/projects/iris/
pwd # /Users/antony_milne/kedro_stuff/projects/iris/data - hasn't changed!
@astrojuanlu
Copy link
Member

astrojuanlu commented May 4, 2023

I think we can push this a little bit, we already have a semi working solution, it should not be too hard to do it. We can at least merge the one provide better error message.

Originally posted by @noklam in #2541 (comment)

@noklam is that solution in some unmerged pull request? Maybe #1720 ?

@noklam
Copy link
Contributor

noklam commented May 6, 2023

@astrojuanlu there are two part of it.

  1. Provide a better warning when Kedro command is not found (not running from project root)
  2. Make command works anywhere within a project

The PR you mentioned is the draft, after technical design we decide to split it in two the two tickets.
The PR is mainly for 1. Which I think we can push a little bit as we already have most of the implementation.
2. Is this ticket.

@noklam
Copy link
Contributor

noklam commented Aug 15, 2023

Maybe we can frame this as making kedro more usable in standalone or non standard kedro project in some way.

@astrojuanlu
Copy link
Member

I think this is going to be a usability improvement for everybody, even for normal Kedro users. It happens very often, even to myself. @antonymilne already left some ideas on how this could be done.

@noklam
Copy link
Contributor

noklam commented Aug 16, 2023

Yes, I started this initially in #1720

We already have the PoC for implementation so it shouldn't be too difficult to implement.

@datajoely
Copy link
Contributor

At the very least giving a helpful error message that the command the user typed is known, but they need to be in the right directory to do it

@noklam
Copy link
Contributor

noklam commented Aug 18, 2023

I think the only concern for this is how far should it try to look for the pyproject.toml, and may have issues with some mono kedro repo.

@astrojuanlu
Copy link
Member

At the moment, kedro.ipython it's iterating until the root of the filesystem:

def _find_kedro_project(current_dir: Path): # pragma: no cover
while current_dir != current_dir.parent:
if _is_project(current_dir):
return current_dir
current_dir = current_dir.parent
return None

we could use .parents instead https://docs.python.org/3/library/pathlib.html#pathlib.PurePath.parents

@noklam
Copy link
Contributor

noklam commented Aug 18, 2023

I think I just found a good example that we need to unit test, potentially issue even with current kedro ipython
Noted: notebook is outside of analysis (the kedro project)

Trimmed part of the tree
├── analysis <- KEDRO PROJECT FOLDER
│   ├── README.md
│   ├── conf
│   │   ├── base
│   ├── pyproject.toml <--
│   ├── setup.cfg
│   ├── src
│   │   ├── analysis
│   │   │   ├── settings.py
│   │   ├── setup.py
├── notebooks
│   ├── test.ipynb <- NOTEBOOK 
├── parameters.yaml
├── pyproject.toml <--

https://linen-slack.kedro.org/t/14152840/hi-there-i-have-a-jupyter-notebook-on-a-managed-service-not-#8c50d8bd-6808-4b92-92c9-a978d2bc0e18

@noklam
Copy link
Contributor

noklam commented Aug 18, 2023

Technically this fall out of scope because it's not inside kedro project, but I think it is a good example that we need to test it throughly and think outside of a "standard kedro project" structure.

@astrojuanlu
Copy link
Member

Also maybe we should make them work anywhere?

For example, I wanted to do kedro docker --help, but it wouldn't even let me do it unless I was on a Kedro project. Which, I have to say, was quite annoying.

@astrojuanlu
Copy link
Member

Getting out of detecting a Kedro project makes this task even easier, I'd say (and then, the moment you want to actually run the command, if there's no pyproject.toml in the CWD, we just raise an error)

@datajoely
Copy link
Contributor

In my kedro-rich prototype I had this:
https://github.com/datajoely/kedro-rich/blob/bc0313f0bd909a52abbcd730a4995d211082dded/kedro_rich/utilities/kedro_override_utils.py#L179

which at the very least gave you a helpful message if you were out of a project directory.

@noklam
Copy link
Contributor

noklam commented Mar 4, 2024

I'd add that testing this with a more complicated repository structure, where it contains a normal python project and a kedro project. The key is to make sure it functions properly when there are multiple pyproject.toml or at least fail gracefully.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Component: CLI Issue/PR that addresses the CLI for Kedro Issue: Feature Request New feature or improvement to existing feature
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants