Add configuration to disable hooks tracing #4504
Comments
Hello @marcelopio, thanks for reporting the issue. We have some questions that will help us to proceed with the issue:
Sure! 1. It tries to format the dataset, in my case by calling pandas to_string. It doesn't print any logs because DEBUG is not enabled, so it calls to_string and takes a long time unnecessarily. 2. I think the instance I was debugging had an input of about 8 MB (pickled) of a pandas dataset, and it was breaking it into a dict of Series and doing some formatting. I can probably share the cProfile logs for both 0.18.8 and 0.19.10, but I need to check that they don't contain any confidential information, so it may take a while.
Thank you, @marcelopio. After debugging, I can also confirm the following behaviour in my tests:
The current workaround to mitigate this, suggested by @astrojuanlu, is:

# settings.py
from pluggy._callers import _multicall
from kedro.framework.cli.hooks.manager import get_cli_hook_manager
_cli_hook_manager = get_cli_hook_manager()
_cli_hook_manager._inner_hookexec = _multicall

@marcelopio please let us know if it worked for you.
Since we already have issues with tracing (#2630), a possible solution is to add an option to enable/disable tracing or make it only when the Setting kedro/kedro/framework/hooks/manager.py Line 29 in 9c70bae
I am migrating from Kedro 0.18.8 to 0.19.10 and suddenly all my pipelines are slower. In one instance, a pipeline that was taking 20 min is now taking 1 h.
After a lot of investigation I narrowed the problem down to these two lines:
kedro/kedro/framework/hooks/manager.py
Lines 29 to 30 in 9c70bae
These lines enable tracing for hooks, which in pluggy ends up in this code:
https://github.com/pytest-dev/pluggy/blob/4eb41bb532fe1edc4efe756367d145f626b82a95/src/pluggy/_tracing.py#L37-L38
I have some datasets that are very big, and when the 'after_node_run' hook is called, this tries to log the whole dataset, formatting it as a string, even when I don't have DEBUG enabled.
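To illustrate why the cost shows up even with DEBUG disabled, here is a minimal standard-library sketch. It does not use pluggy or Kedro; `BigDataset`, `eager_trace` and `lazy_trace` are made-up names for this example. It assumes the trace message is built as a string before it is handed to the writer, which is how I read the linked pluggy code.

```python
import logging


class BigDataset:
    """Stand-in for a large dataset; counts how often it is rendered to text."""

    def __init__(self):
        self.str_calls = 0

    def __str__(self):
        self.str_calls += 1
        return "x" * 1_000  # imagine a multi-megabyte DataFrame rendering here


logger = logging.getLogger("hook_tracing_demo")
logger.setLevel(logging.INFO)  # DEBUG is disabled


def eager_trace(value):
    # The message is built first and only then passed to the writer,
    # so str(value) runs even though logger.debug discards the record.
    message = f"    outputs: {value}\n"
    logger.debug(message)


def lazy_trace(value):
    # logging formats %s arguments only if the record is actually emitted.
    logger.debug("    outputs: %s", value)


eager, lazy = BigDataset(), BigDataset()
eager_trace(eager)
lazy_trace(lazy)
print(eager.str_calls, lazy.str_calls)  # prints: 1 0
```

The eager path pays for the string conversion once per traced call regardless of the log level, which would match the slowdown I am seeing with large datasets.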
There should be a config option to enable hook tracing only when I need it, and having it disabled should be the default for production environments.
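As a rough idea of what such an option could look like, here is a sketch only. `configure_hook_tracing` and its `enabled` parameter are hypothetical and not part of Kedro. The `manager` argument is assumed to be a pluggy `PluginManager`, and the only calls made on it are `trace.root.setwriter` and `enable_tracing`, the same two calls as the lines referenced above.

```python
import logging


def configure_hook_tracing(manager, logger, enabled=None):
    """Attach pluggy tracing to `manager` only when requested.

    Hypothetical helper: with enabled=None, tracing is turned on only if
    `logger` would actually emit DEBUG records.
    """
    if enabled is None:
        enabled = logger.isEnabledFor(logging.DEBUG)
    if enabled:
        manager.trace.root.setwriter(logger.debug)
        manager.enable_tracing()
    return enabled
```

With something like this, a production run at INFO level would never enable tracing, while setting the logger to DEBUG (or passing `enabled=True`) would keep the current behaviour.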