Revamp Logging documentations in 0.19 #2665

Merged
merged 30 commits into from
Jun 29, 2023
dcc15e3
update release note
noklam Jun 9, 2023
874f1a2
update logging conf/base mention and filebase logging
noklam Jun 9, 2023
62abdfd
Reorder the logging page
noklam Jun 9, 2023
e6787d5
reorder and update logging docs
noklam Jun 14, 2023
d97198c
update docs
noklam Jun 14, 2023
1823dcb
Apply suggestions from code review
noklam Jun 19, 2023
7a91f11
add logging.yml template and reorder section
noklam Jun 19, 2023
8b4c0e5
Update logging and add logging.yml template
noklam Jun 19, 2023
0764cbf
Apply suggestions from code review
noklam Jun 22, 2023
4ecf24a
Merge branch 'develop' of https://github.com/kedro-org/kedro into nok…
noklam Jun 23, 2023
5eefee3
Structure docs paragraphs (#2717)
AhdraMeraliQB Jun 23, 2023
e27e334
Remove the broken aws step function link
noklam Jun 23, 2023
c12ddc6
Fix docs according to comments
noklam Jun 26, 2023
c234f11
Update docs/source/logging/logging.md
noklam Jun 26, 2023
29ffd05
update index
noklam Jun 26, 2023
517fc2f
Apply suggestions from code review
noklam Jun 27, 2023
1a04aeb
Merge branch 'develop' into noklam/revamp-logging-documentation-2664
noklam Jun 27, 2023
dad0012
add <code> block to render the logging.yml example
noklam Jun 27, 2023
f8ecc01
reorganise
noklam Jun 27, 2023
a383b4d
Make the style of logging consistent
noklam Jun 27, 2023
223e4a9
Merge branch 'develop' into noklam/revamp-logging-documentation-2664
noklam Jun 27, 2023
c4780dc
Revising logging to move page into index
stichbury Jun 28, 2023
4400f0f
Few minor tweaks
stichbury Jun 28, 2023
605b5e1
Fix broken internal linking
stichbury Jun 28, 2023
ad1fe98
fix default logging mention
noklam Jun 28, 2023
73651b3
Merge branch 'develop' into noklam/revamp-logging-documentation-2664
noklam Jun 28, 2023
2465bdd
Merge branch 'develop' into noklam/revamp-logging-documentation-2664
noklam Jun 28, 2023
02584d8
Merge branch 'develop' into noklam/revamp-logging-documentation-2664
noklam Jun 29, 2023
78b2979
add more description to the change needed
noklam Jun 29, 2023
16557e6
Fix the wrong `diff` to show debug logging
noklam Jun 29, 2023
3 changes: 2 additions & 1 deletion RELEASE.md
Expand Up @@ -6,6 +6,7 @@

## Breaking changes to the API


## Migration guide from Kedro 0.18.* to 0.19.*

# Upcoming Release 0.18.11
Expand All @@ -15,7 +16,7 @@
## Bug fixes and other changes

## Breaking changes to the API

* Logging is decoupled from `ConfigLoader`; use `KEDRO_LOGGING_CONFIG` to configure logging.
## Upcoming deprecations for Kedro 0.19.0

# Release 0.18.10
Expand Down
3 changes: 0 additions & 3 deletions docs/source/deployment/databricks/databricks_workspace.md
Expand Up @@ -2,9 +2,6 @@

This tutorial uses the [PySpark Iris Kedro Starter](https://github.com/kedro-org/kedro-starters/tree/main/pyspark-iris) to illustrate how to bootstrap a Kedro project using Spark and deploy it to a [Databricks cluster on AWS](https://databricks.com/aws).

```{note}
If you are using [Databricks Repos](https://docs.databricks.com/repos/index.html) to run a Kedro project then you should [disable file-based logging](../../logging/logging.md#disable-file-based-logging). This prevents Kedro from attempting to write to the read-only file system.
```

```{note}
If you are a Kedro contributor looking for information on deploying a custom build of Kedro to Databricks, see the [development guide](../../contribution/development_for_databricks.md).
Expand Down
6 changes: 3 additions & 3 deletions docs/source/logging/index.md
@@ -1,9 +1,9 @@
# Logging


Kedro uses [Python's `logging` library](https://docs.python.org/3/library/logging.html). Configuration is provided as a dictionary according to the [Python logging configuration schema](https://docs.python.org/3/library/logging.config.html#logging-config-dictschema) in two places:
1. [Default configuration built into the Kedro framework](https://github.com/kedro-org/kedro/blob/main/kedro/framework/project/default_logging.yml). This cannot be altered.
2. Your project-side logging configuration. Every project generated using Kedro's CLI `kedro new` command includes a file `conf/base/logging.yml`. You can alter this configuration and provide different configurations for different run environment according to the [standard Kedro mechanism for handling configuration](../configuration/configuration_basics.md).
Kedro uses [Python's `logging` library](https://docs.python.org/3/library/logging.html). Configuration is provided as a dictionary according to the [Python logging configuration schema](https://docs.python.org/3/library/logging.config.html#logging-config-dictschema) in [Kedro's default logging configuration](https://github.com/kedro-org/kedro/blob/main/kedro/framework/project/default_logging.yml).

You can alter this configuration and provide different configurations for different run environments according to the [standard Kedro mechanism for handling configuration](../configuration/configuration_basics.md).

```{note}
Providing project-side logging configuration is entirely optional. You can delete the `conf/base/logging.yml` file and Kedro will run using the framework's built-in configuration.
Expand Down
163 changes: 112 additions & 51 deletions docs/source/logging/logging.md
@@ -1,54 +1,124 @@
# Default logging configuration
Kedro's [default logging configuration](https://github.com/kedro-org/kedro/blob/main/kedro/framework/project/default_logging.yml) defines a handler called `rich` that uses the Rich logging handler to format messages. We also use the Rich traceback handler to render exceptions.

# Default framework-side logging configuration
By default, Python only shows logging messages at level WARNING and above. Kedro's logging configuration specifies that INFO level messages from Kedro should also be emitted. This makes it easier to track the progress of your pipeline when you perform a kedro run.

Kedro's [default logging configuration](https://github.com/kedro-org/kedro/blob/main/kedro/framework/project/default_logging.yml) defines a handler called `rich` that uses the [Rich logging handler](https://rich.readthedocs.io/en/stable/logging.html) to format messages. We also use the [Rich traceback handler](https://rich.readthedocs.io/en/stable/traceback.html) to render exceptions.
## Perform logging in your project
To add logging to your own code (e.g. in a node), create a module-level logger and call it as follows:

```python
import logging

By default, Python only shows logging messages at level `WARNING` and above. Kedro's logging configuration specifies that `INFO` level messages from Kedro should also be emitted. This makes it easier to track the progress of your pipeline when you perform a `kedro run`.
log = logging.getLogger(__name__)
log.warning("Issue warning")
log.info("Send information")
log.debug("Useful information for debugging")
```

## Project-side logging configuration
```{note}
Contributor

@stichbury stichbury Jun 27, 2023

Looking at this again, I realised I have no idea whatsoever what this means:

"The name of a logger corresponds to a key in the loggers section of the logging configuration file (e.g. kedro). See Python’s logging documentation for more information."

Does this have an relevance to the reader at this point or shall we remove it? Should it actually be a comment in the Kedro default_logging.yml file?

I think most readers at this point just want to know how to add logging to their code so won't need it.

Contributor Author

I think this is explaining the line log = logging.getLogger(__name__), here is how you define the name of a logger, and if you need configure it later what level of log you would like to see, you will do something like that. This is not kedro specific.

loggers:
  kedro:
    level: INFO
  name_of_my_logger:
    level: ERROR

If someone use a random name like log = logging.getLogger("random_name"), the above configuration will not have any effect. I think there is some value to educate user or provide a pointer to user, but we shouldn't try to explain too much about how Python logging work.

The name of a logger corresponds to a key in the `loggers` section in `logging.yml` (e.g. `kedro`). See [Python's logging documentation](https://docs.python.org/3/library/logging.html#logger-objects) for more information.
```
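As a minimal sketch of what the note describes, the snippet below uses the standard library's `logging.config.dictConfig` with a dictionary equivalent of a `loggers` section. The names `name_of_my_logger` and `random_name` are placeholders taken from the discussion, and the root level of `WARNING` is an assumption chosen to mirror Python's default.

```python
import logging
import logging.config

# Dictionary equivalent of a `loggers` section in logging.yml.
# "name_of_my_logger" is a placeholder, not a real Kedro logger.
config = {
    "version": 1,
    "disable_existing_loggers": False,
    "loggers": {
        "kedro": {"level": "INFO"},
        "name_of_my_logger": {"level": "ERROR"},
    },
    "root": {"level": "WARNING"},
}
logging.config.dictConfig(config)

configured = logging.getLogger("name_of_my_logger")
child = logging.getLogger("name_of_my_logger.nodes")
unrelated = logging.getLogger("random_name")

# The configured logger and its dotted children use the ERROR level...
configured_level = logging.getLevelName(configured.getEffectiveLevel())
child_level = logging.getLevelName(child.getEffectiveLevel())
# ...while a name with no matching key falls back to the root logger's level.
unrelated_level = logging.getLevelName(unrelated.getEffectiveLevel())

print(configured_level, child_level, unrelated_level)
```

Because `logging.getLogger(__name__)` yields a dotted module name such as `my_package.nodes`, a single key for the package in `loggers` is usually enough to cover every module inside it.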

In addition to the `rich` handler defined in Kedro's framework, the [project-side `conf/base/logging.yml`](https://github.com/kedro-org/kedro/blob/main/kedro/templates/project/%7B%7B%20cookiecutter.repo_name%20%7D%7D/conf/base/logging.yml) defines two further logging handlers:
* `console`: show logs on standard output (typically your terminal screen) without any rich formatting
* `info_file_handler`: write logs of level `INFO` and above to `info.log`
You can take advantage of rich's [console markup](https://rich.readthedocs.io/en/stable/markup.html) when enabled in your logging calls:
```python
log.error("[bold red blink]Important error message![/]", extra={"markup": True})
```

The logging handlers that are actually used by default are `rich` and `info_file_handler`.
### Show DEBUG level messages
To see `DEBUG` level messages, change the level of the relevant logger in your `logging.yml`.
Contributor

logging.yml is no longer included by default, before referencing it in the docs maybe we should include some section/instructions on how to overwrite the default configuration, including where to keep the file and how to use the new env variable KEDRO_LOGGING_CONFIG


The project-side logging configuration also ensures that [logs emitted from your project's logger](#perform-logging-in-your-project) should be shown if they are `INFO` level or above (as opposed to the Python default of `WARNING`).

We now give some common examples of how you might like to change your project's logging configuration.

```yml
loggers:
  kedro:
    level: INFO

  your_python_package:
    level: DEBUG # Change this to DEBUG
```

### Using `KEDRO_LOGGING_CONFIG` environment variable
By changing the level value to `DEBUG` for the desired logger (e.g., <your_python_package>), you will start seeing `DEBUG` level messages in the log output.
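The effect of the level setting can be checked with the standard library alone. The sketch below is illustrative rather than Kedro-specific: `capture_debug` is a hypothetical helper, `your_python_package` is the placeholder logger name from the example, and an in-memory stream stands in for the terminal.

```python
import io
import logging
import logging.config


def capture_debug(level: str) -> str:
    """Configure the placeholder logger at `level` and return what it emits."""
    stream = io.StringIO()
    logging.config.dictConfig(
        {
            "version": 1,
            "disable_existing_loggers": False,
            "handlers": {
                # In-memory handler so the output can be inspected afterwards.
                "capture": {"class": "logging.StreamHandler", "stream": stream}
            },
            "loggers": {"your_python_package": {"level": level}},
            "root": {"handlers": ["capture"]},
        }
    )
    log = logging.getLogger("your_python_package")
    log.debug("Useful information for debugging")
    return stream.getvalue()


at_info = capture_debug("INFO")    # DEBUG message is filtered out
at_debug = capture_debug("DEBUG")  # DEBUG message is emitted

print(repr(at_info), repr(at_debug))
```

With the logger at `INFO` nothing is captured; after switching it to `DEBUG` the message reaches the handler.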

`KEDRO_LOGGING_CONFIG` is an optional environment variable that you can use to specify the path of your logging configuration file, overriding the default Kedro's `default_logging.yml`.
## Customise Logging

To use this environment variable, set it to the path of your desired logging configuration file before running any Kedro commands. For example, if you have a logging configuration file located at `/path/to/logging.yml`, you can set `KEDRO_LOGGING_CONFIG` as follows:
### Using `KEDRO_LOGGING_CONFIG` environment variable
Contributor

The hierarchy of this section feels wrong because there's no other way to customise logging now.

"How to customise Kedro logging" is basically the same as "Using KEDRO_LOGGING_CONFIG environment variable", and then "Show DEBUG level messages" is really just another example of a sort of customisation you want to make. All the other examples are currently given under "Advanced logging". So I think we should reorganise this a bit.

Contributor

How about we remove the sub-title "Using KEDRO_LOGGING_CONFIG environment variable" and just say that you need to use it for customisation. I'd like to keep the "How to customise Kedro logging" section heading but see what you're saying about the logic of the hierarchy. I've suggested the changes needed.

Contributor Author

I have incorporate this comment partially. I still think it is valuable to put the DEBUG level example first, since this is a common thing to do compare to other niche advance use cases. I've reorder things a bit so it is more consistent.

To customise logging, set the environment variable `KEDRO_LOGGING_CONFIG` to the path of your logging configuration file. This file is then used instead of Kedro's built-in `default_logging.yml`. We recommend putting your `logging.yml` inside the `conf` folder. For example, you can set `KEDRO_LOGGING_CONFIG` as follows:

```bash
export KEDRO_LOGGING_CONFIG=/path/to/logging.yml
export KEDRO_LOGGING_CONFIG=<project_root>/conf/logging.yml
```

After setting the environment variable, any subsequent Kedro commands will use the logging configuration file at the specified path.
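If you start Kedro from Python rather than from a shell (for example in a test or a notebook), the same variable can be set through `os.environ`. This is a sketch under the assumption that the variable needs to be in place before Kedro is imported in the process; `project_root` is a hypothetical path and `conf/logging.yml` is the recommended location from the text, not a requirement.

```python
import os
from pathlib import Path

# Hypothetical project root; replace with the path to your own project.
project_root = Path.cwd()
logging_config = project_root / "conf" / "logging.yml"

# Set the variable before importing or running Kedro in this process.
os.environ["KEDRO_LOGGING_CONFIG"] = str(logging_config)

print(os.environ["KEDRO_LOGGING_CONFIG"])
```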

```{note}
If the `KEDRO_LOGGING_CONFIG` environment variable is not set, Kedro falls back to its built-in `default_logging.yml`.
```
### Disable file-based logging

You might sometimes need to disable file-based logging, e.g. if you are running Kedro on a read-only file system such as [Databricks Repos](https://docs.databricks.com/repos/index.html). The simplest way to do this is to delete your `conf/base/logging.yml` file. With no project-side logging configuration specified, Kedro uses the default framework-side logging configuration, which does not include any file-based handlers.

Alternatively, if you would like to keep other configuration in `conf/base/logging.yml` and just disable file-based logging, then you can remove the file-based handlers from the `root` logger as follows:
```diff
 root:
-  handlers: [console, info_file_handler]
+  handlers: [console]
# Advanced Logging
In addition to the `rich` handler defined in Kedro's framework, we provide a `logging.yml` template that you can use as a starting point.

<details>
<summary><b>Click to expand the `logging.yml` template</b></summary>
```yml
version: 1

disable_existing_loggers: False

formatters:
  simple:
    format: "%(asctime)s - %(name)s - %(levelname)s - %(message)s"

handlers:
  console:
    class: logging.StreamHandler
    level: INFO
    formatter: simple
    stream: ext://sys.stdout

  info_file_handler:
    class: logging.handlers.RotatingFileHandler
    level: INFO
    formatter: simple
    filename: info.log
    maxBytes: 10485760 # 10MB
    backupCount: 20
    encoding: utf8
    delay: True

  rich:
    class: kedro.logging.RichHandler
    rich_tracebacks: True
    # Advance options for customisation.
    # See https://docs.kedro.org/en/stable/logging/logging.html#project-side-logging-configuration
    # tracebacks_show_locals: False

loggers:
  kedro:
    level: INFO

  {{ cookiecutter.python_package }}:
    level: INFO

root:
  handlers: [rich]
```
</details>

* `console`: show logs on standard output (typically your terminal screen) without any rich formatting
* `info_file_handler`: write logs of level `INFO` and above to `info.log`

The only logging handler actually used by default is `rich`.

The default logging configuration also ensures that [logs emitted from your project's logger](#perform-logging-in-your-project) should be shown if they are `INFO` level or above (as opposed to the Python default of `WARNING`).

### Customise the `rich` Handler
Now, let's provide some common examples of how you might like to change your project's logging configuration.

Kedro's `kedro.extras.logging.RichHandler` is a subclass of [`rich.logging.RichHandler`](https://rich.readthedocs.io/en/stable/reference/logging.html#rich.logging.RichHandler) and supports the same set of arguments. By default, `rich_tracebacks` is set to `True` to use `rich` to render exceptions. However, you can disable it by setting `rich_tracebacks: False`.
## Customise the `rich` Handler

Kedro's `kedro.logging.RichHandler` is a subclass of [`rich.logging.RichHandler`](https://rich.readthedocs.io/en/stable/reference/logging.html#rich.logging.RichHandler) and supports the same set of arguments. By default, `rich_tracebacks` is set to `True` to use `rich` to render exceptions. However, you can disable it by setting `rich_tracebacks: False`.

```{note}
If you want to disable `rich`'s tracebacks, you must set `KEDRO_LOGGING_CONFIG` to point to your local config i.e. `conf/base/logging.yml`.
If you want to disable `rich`'s tracebacks, you must set `KEDRO_LOGGING_CONFIG` to point to your local config i.e. `conf/logging.yml`.
```

When `rich_tracebacks` is set to `True`, the configuration is propagated to [`rich.traceback.install`](https://rich.readthedocs.io/en/stable/reference/traceback.html#rich.traceback.install). If an argument is compatible with `rich.traceback.install`, it will be passed to the traceback's settings.
Expand All @@ -57,25 +127,37 @@ For instance, you can enable the display of local variables inside `logging.yml`

```yaml
rich:
class: kedro.extras.logging.RichHandler
class: kedro.logging.RichHandler
rich_tracebacks: True
tracebacks_show_locals: True
```

A comprehensive list of available options can be found in the [RichHandler documentation](https://rich.readthedocs.io/en/stable/reference/logging.html#rich.logging.RichHandler).

## Enable file-based logging

File-based logging in Python projects aids troubleshooting and debugging. It offers better visibility into an application's behaviour and is easy to search. However, it does not work well with read-only file systems such as [Databricks Repos](https://docs.databricks.com/repos/index.html).

To enable file-based logging, add `info_file_handler` to the `root` logger in your `conf/logging.yml` as follows:
Contributor

Instructions out of date as conf/logging.yml will be removed

Contributor Author

@noklam noklam Jun 19, 2023

It's not a must but we recommend user to keep their loggging.yml in conf. Is it confusing to reference it as conf/logging.yml or should I just reference it as logging.yml? @AhdraMeraliQB

Contributor

@antonymilne antonymilne Jun 26, 2023

I think putting this file in conf is natural, but we should make it clear that users need to create that file themselves and that it doesn't already exist.

Edit: reading through the generated docs, I think this is clear already 👍

```diff
 root:
-  handlers: [rich]
+  handlers: [rich, info_file_handler]
```

By default, this handler only records messages at `INFO` level and above, but it can be configured to capture any log level.
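To see how the `info_file_handler` settings behave, the following sketch feeds the same handler options from the template to the standard library's `logging.config.dictConfig`. It is an illustration only: `demo_package` is a placeholder logger name, and the log file is written to a temporary directory rather than to the project folder.

```python
import logging
import logging.config
import tempfile
from pathlib import Path

log_dir = Path(tempfile.mkdtemp())  # temporary directory for this demo
log_file = log_dir / "info.log"

logging.config.dictConfig(
    {
        "version": 1,
        "disable_existing_loggers": False,
        "formatters": {
            "simple": {
                "format": "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
            }
        },
        "handlers": {
            "info_file_handler": {
                "class": "logging.handlers.RotatingFileHandler",
                "level": "INFO",
                "formatter": "simple",
                "filename": str(log_file),
                "maxBytes": 10485760,  # 10MB
                "backupCount": 20,
                "encoding": "utf8",
                "delay": True,
            }
        },
        "loggers": {"demo_package": {"level": "DEBUG"}},
        "root": {"handlers": ["info_file_handler"]},
    }
)

log = logging.getLogger("demo_package")  # placeholder logger name
log.debug("not written: below the handler's INFO level")
log.info("written to info.log")
log.warning("also written")

logging.shutdown()  # flush and close the file handler
written = log_file.read_text(encoding="utf8").splitlines()
print(len(written))
```

Even though the logger itself allows `DEBUG`, the handler's own `level: INFO` filters the first message, so lowering the handler level is what makes the file capture more detail.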

### Use plain console logging
## Use plain console logging

To use plain rather than rich logging, swap the `rich` handler for the `console` one as follows:

```diff
root:
- handlers: [rich, info_file_handler]
+ handlers: [console, info_file_handler]
- handlers: [rich]
+ handlers: [console]
```
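For reference, the output of the plain `console` handler can be previewed without Kedro or Rich. This sketch reuses the `console` handler and `simple` formatter settings from the template with the standard library's `dictConfig`; `demo_package` is a placeholder logger name, and standard output is redirected to a buffer only so that the resulting line can be inspected.

```python
import io
import logging
import logging.config
from contextlib import redirect_stdout

buffer = io.StringIO()

with redirect_stdout(buffer):
    # `ext://sys.stdout` is resolved when dictConfig runs, so the handler
    # writes to whatever sys.stdout is at that moment (here, the buffer).
    logging.config.dictConfig(
        {
            "version": 1,
            "disable_existing_loggers": False,
            "formatters": {
                "simple": {
                    "format": "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
                }
            },
            "handlers": {
                "console": {
                    "class": "logging.StreamHandler",
                    "level": "INFO",
                    "formatter": "simple",
                    "stream": "ext://sys.stdout",
                }
            },
            "loggers": {"demo_package": {"level": "INFO"}},
            "root": {"handlers": ["console"]},
        }
    )
    logging.getLogger("demo_package").info("Send information")

plain_line = buffer.getvalue().strip()
print(plain_line)
```

The line consists of a timestamp followed by the logger name, level, and message, with no colour or markup applied.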

### Rich logging in a dumb terminal
## Rich logging in a dumb terminal

Rich [detects whether your terminal is capable](https://rich.readthedocs.io/en/stable/console.html#terminal-detection) of displaying richly formatted messages. If your terminal is "dumb" then formatting is automatically stripped out so that the logs are just plain text. This is likely to happen if you perform `kedro run` on CI (e.g. GitHub Actions or CircleCI).

Expand All @@ -89,27 +171,6 @@ export COLUMNS=120 LINES=25
You must provide a value for both `COLUMNS` and `LINES` even if you only wish to change the width of the log message. Rich's default values for these variables are `COLUMNS=80` and `LINES=25`.
```

### Rich logging in Jupyter
## Rich logging in Jupyter

Rich also formats the logs in JupyterLab and Jupyter Notebook. The size of the output console does not adapt to your window but can be controlled through the `JUPYTER_COLUMNS` and `JUPYTER_LINES` environment variables. The default values (115 and 100 respectively) should be suitable for most users, but if you require a different output console size then you should alter the values of `JUPYTER_COLUMNS` and `JUPYTER_LINES`.

## Perform logging in your project

To perform logging in your own code (e.g. in a node), you are advised to do as follows:

```python
import logging

log = logging.getLogger(__name__)
log.warning("Issue warning")
log.info("Send information")
```

```{note}
The name of a logger corresponds to a key in the `loggers` section in `logging.yml` (e.g. `kedro`). See [Python's logging documentation](https://docs.python.org/3/library/logging.html#logger-objects) for more information.
```

You can take advantage of rich's [console markup](https://rich.readthedocs.io/en/stable/markup.html) when enabled in your logging calls:
```python
log.error("[bold red blink]Important error message![/]", extra={"markup": True})
```