Accessing the project context inside hooks implementations #66
Comments
I suggest that we go with the first solution (call |
I dug into the implementation details of the first solution, and I can tell you it's going to be very hacky. One of the many side effects of using a global variable as storage is that the order of the hooks list given by the user in their kedro project will matter: the before_pipeline_run hook always has to be first in the list. I suggest going for the second solution, while keeping in mind the migration to the third one as soon as kedro 0.17.0 is released. The second solution looks like this:
Do you see another way to implement the first solution? I'll create a pull request for this.
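The ordering problem described above can be reproduced in a small sketch. It does not use kedro at all: `ContextSettingHook`, `ContextReadingHook` and `run_hooks` are hypothetical names, and `run_hooks` simply calls the hooks in list order as a simplified stand-in for the real hook dispatch, whose actual call order may differ.

```python
from typing import Any, Dict, List, Optional

# Module-level storage shared by every hook: the "global variable" approach.
_PROJECT_PATH: Optional[str] = None


class ContextSettingHook:
    """Hypothetical hook that records the project path in the global."""

    def before_pipeline_run(self, run_params: Dict[str, Any]) -> None:
        global _PROJECT_PATH
        _PROJECT_PATH = run_params["project_path"]


class ContextReadingHook:
    """Hypothetical hook that relies on the global having been set already."""

    def __init__(self) -> None:
        self.seen: Optional[str] = None

    def before_pipeline_run(self, run_params: Dict[str, Any]) -> None:
        # Reads whatever the global holds at the moment this hook is called.
        self.seen = _PROJECT_PATH


def run_hooks(hooks: List[Any], run_params: Dict[str, Any]) -> None:
    """Call every hook in list order (simplified stand-in for hook dispatch)."""
    global _PROJECT_PATH
    _PROJECT_PATH = None  # reset between simulated runs
    for hook in hooks:
        hook.before_pipeline_run(run_params)


params = {"project_path": "/tmp/project"}

# The reading hook runs before the setting hook: the global is still empty.
reader_first = ContextReadingHook()
run_hooks([reader_first, ContextSettingHook()], params)

# The setting hook runs first: the reading hook sees the project path.
reader_last = ContextReadingHook()
run_hooks([ContextSettingHook(), reader_last], params)
```

In this sketch the same two hooks behave differently depending only on their position in the list, which is the fragility the global-variable approach introduces.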
I understand that the second solution is much cleaner than the first (mostly because the first creates another context object for each node), but I don't see why you need to create a global variable for the first. I think that creating an attribute on the fly is enough for our use cases (see here):

from typing import Any, Dict

from kedro.framework.context import load_context
from kedro.io import DataCatalog
from kedro.pipeline import Pipeline

# Method of the hook class
def before_pipeline_run(
    self, run_params: Dict[str, Any], pipeline: Pipeline, catalog: DataCatalog
) -> None:
    """Hook to be invoked before a pipeline runs.
    Args:
        run_params: The params used to run the pipeline.
            Should be identical to the data logged by Journal with the following schema::
               {
                 "run_id": str,
                 "project_path": str,
                 "env": str,
                 "kedro_version": str,
                 "tags": Optional[List[str]],
                 "from_nodes": Optional[List[str]],
                 "to_nodes": Optional[List[str]],
                 "node_names": Optional[List[str]],
                 "from_inputs": Optional[List[str]],
                 "load_versions": Optional[List[str]],
                 "pipeline_name": str,
                 "extra_params": Optional[Dict[str, Any]]
               }
        pipeline: The ``Pipeline`` that will be run.
        catalog: The ``DataCatalog`` to be used during the run.
    """
    # Store the context on the hook instance instead of in a global variable
    self.context = load_context(run_params["project_path"])
    self.env = run_params["env"]  # is it necessary?
    self.extra_params = run_params["extra_params"]  # is it necessary?

This information will then be accessible later in the hook. For our use cases, it is even simpler:
kedro-mlflow/kedro_mlflow/framework/hooks/node_hook.py Lines 21 to 40 in e0afaf4
something like: config=kedroMlflowConfig.from_dict(self.context.config_loader("mlflow.yml"))
...
kedro-mlflow/kedro_mlflow/framework/hooks/pipeline_hook.py Lines 65 to 67 in e0afaf4
And it should be OK. The reason I prefer solution 1 over solution 2 is that the first is a non-breaking change, while the second modifies the hooks' arguments and is consequently a breaking change. Since neither of them is the target solution, I think we can live with it for a few months. Does that sound good to you?
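A minimal, kedro-free sketch of the "attribute on the fly" idea follows. `FakeContext` and `fake_load_context` are hypothetical stand-ins for the kedro context and `load_context`, and the hook signatures are deliberately simplified compared with the real hook specifications, so that the example runs on its own.

```python
from typing import Any, Dict, Optional


class FakeContext:
    """Hypothetical stand-in for a kedro project context (not the kedro class)."""

    def __init__(self, project_path: str, env: Optional[str]) -> None:
        self.project_path = project_path
        self.env = env


def fake_load_context(project_path: str, env: Optional[str] = None) -> FakeContext:
    """Hypothetical stand-in for load_context, so the sketch runs without kedro."""
    return FakeContext(project_path, env)


class ContextAwareHook:
    """Keeps the context on the hook instance instead of in a global variable."""

    def __init__(self) -> None:
        self.context: Optional[FakeContext] = None

    def before_pipeline_run(self, run_params: Dict[str, Any]) -> None:
        # Load the context once per run and keep it as an instance attribute.
        self.context = fake_load_context(
            run_params["project_path"], run_params.get("env")
        )

    def before_node_run(self, node_name: str) -> str:
        # Later hooks on the same instance read the attribute; no global state.
        if self.context is None:
            raise RuntimeError("before_pipeline_run must be called first")
        return (
            f"{node_name} runs in {self.context.project_path} "
            f"(env={self.context.env})"
        )


hook = ContextAwareHook()
hook.before_pipeline_run({"project_path": "/tmp/project", "env": "local"})
message = hook.before_node_run("split_data")
```

Because the state lives on the instance, the hook no longer depends on where other hooks sit in the list. It still assumes that its own before_pipeline_run is called before its node-level hooks.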
…Galilei#30 Galileo-Galilei#31 Galileo-Galilei#72 Galileo-Galilei#29 Galileo-Galilei#62 - context access - kedro 16.5 - hook auto registration
Hi,
I'm a colleague of @Galileo-Galilei. We discussed the issue of accessing the project template and properties from a plugin, and we raised the point in the kedro project.
What it will solve:
No more template assumptions. We will ask the kedro context where the template components reside (project_paths, conf_paths, src path, ...). That leads to fewer regressions when kedro is updated, because we will no longer maintain a template on our side.
No more config loading logic assumptions. We will use the kedro context's ConfigLoader and get_credentials (which can be overridden by the user).
This solves #64, #54, #30 and #31 (users will use kedro's credentials mechanism; I'll put details in another issue) and part of #62.
How we can implement it (for now, I see three possibilities):
1. Calling load_context inside the before_pipeline_run hook and setting a global project_path variable at the same time, hoping that the before_pipeline_run hook will always be the first hook called in the future. --> So hacky
2. Declaring the mlflow hooks as a project context property, which means that the hooks can easily access the current context. --> No more auto registration
3. Waiting for the release of the kedro session, which will apparently allow access to the current context. --> We don't know when it will land
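To make the second possibility more concrete, here is a rough sketch under stated assumptions. `ProjectContextSketch` and `MlflowPipelineHookSketch` are hypothetical classes that do not come from kedro or kedro-mlflow; they only illustrate hooks being built by the context and receiving it as a constructor argument.

```python
from typing import Any, Dict, Tuple


class MlflowPipelineHookSketch:
    """Hypothetical hook that receives the context when it is constructed."""

    def __init__(self, context: "ProjectContextSketch") -> None:
        self.context = context

    def before_pipeline_run(self, run_params: Dict[str, Any]) -> str:
        # The project path comes from the context, not from a template assumption.
        return self.context.project_path


class ProjectContextSketch:
    """Hypothetical stand-in for a project context (not the kedro class)."""

    def __init__(self, project_path: str) -> None:
        self.project_path = project_path

    @property
    def hooks(self) -> Tuple[MlflowPipelineHookSketch, ...]:
        # The context builds its own hooks and hands itself over to them.
        return (MlflowPipelineHookSketch(context=self),)


context = ProjectContextSketch("/tmp/project")
hook = context.hooks[0]
path = hook.before_pipeline_run({})
```

The trade-off is the one noted in the list: the user has to declare the hooks in the project context, so they can no longer be registered automatically.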
Keep up the good work guys, I'm happy to join you :)