
How to handle multi process logging in Python? #7

scarecrow1123 opened this issue Sep 26, 2019 · 0 comments

scarecrow1123 commented Sep 26, 2019

Python's logging module provides a number of very useful handlers for redirecting log messages to different target destinations. For instance, FileHandler sends messages to a file, DatagramHandler sends them to a UDP port, and so on. In a multi-process setup, however, having each process use its own FileHandler pointed at the same file can corrupt the log file, since writes from the different processes may interleave. Explicitly acquiring a lock on the shared file from every process is a poor workaround.
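For illustration, this is roughly what the naive setup looks like (the file name shared.log and the function naive_worker are made up for this sketch): each child attaches its own FileHandler to the same file, so concurrent writes can interleave and garble the output.

import logging
import multiprocessing as mp


def naive_worker(worker_id):
    # every process opens its own FileHandler on the same file, so writes
    # from different processes can interleave and garble the log
    logging.basicConfig(filename="shared.log", level=logging.INFO)
    for i in range(1000):
        logging.info("worker %d, message %d", worker_id, i)


if __name__ == "__main__":
    processes = [mp.Process(target=naive_worker, args=(n,)) for n in range(4)]
    for p in processes:
        p.start()
    for p in processes:
        p.join()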

One out-of-the-box option is the pip package multiprocess-logging, which implements a custom log handler: the handler receives all the messages, puts them on an internal queue, and emits them by dequeuing from that queue.

Funnelling the log messages through a queue is really the only sane way to handle this scenario, but there is a more graceful way to do it without a third-party package: Python's logging module has built-in support for queue-based logging through the QueueHandler and QueueListener classes.

Just as FileHandler provides file-based logging, QueueHandler sends log records to a queue object. This queue can be one of the classes from the queue module or a multiprocessing.Queue.

import logging
from logging.handlers import QueueHandler
from queue import SimpleQueue


logging.warning("Test this warning in the console")


q = SimpleQueue()
queue_handler = QueueHandler(q)

root_logger = logging.getLogger()

# remove the StreamHandler that was implicitly added when the first
# logging.warning call triggered basicConfig()
root_logger.handlers.pop()

# add queue handler
# this will stop the log messages from printing in the console
root_logger.addHandler(queue_handler)

logging.warning("Test this warning in the queue")

print(q.get())

This prints:

WARNING:root:Test this warning in the console
<LogRecord: root, 30, test.py, 14, "Test this warning in the queue">

Note that the second line is printed only because of the print call. The second logging.warning call was handled by the QueueHandler, which put the LogRecord into the q object instead of writing it to the console.

QueueListener is used along with the QueueHandler to collect messages from the queue and push them to various other targets such as a file, UDP port, etc.

...
file_handler = logging.FileHandler("out.log")
listener = QueueListener(q, file_handler)
listener.start()
...

The listener watches q on a background thread and pushes each record to file_handler, which writes it to out.log. Putting these pieces together, the complete single-process flow looks like the sketch below; after that, let's use the same flow in a multi-process setup.
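A minimal sketch of that flow, assuming everything runs in one process and reusing the out.log file name from above (the QueueListener runs on a background thread, so it is stopped at the end to flush any pending records):

import logging
from logging.handlers import QueueHandler, QueueListener
from queue import SimpleQueue

q = SimpleQueue()

# everything logged in this process goes into the queue...
root_logger = logging.getLogger()
root_logger.setLevel(logging.INFO)
root_logger.addHandler(QueueHandler(q))

# ...and the listener thread drains the queue into the file handler
file_handler = logging.FileHandler("out.log")
listener = QueueListener(q, file_handler)
listener.start()

logging.info("This record ends up in out.log")

listener.stop()  # handles any remaining records and joins the listener thread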

Assume a master process that spawns 4 children, with all log messages to be written to the out.log file. Every log message from a child is put on a multiprocessing.Queue; to achieve this, each child initializes a QueueHandler and attaches it to its root logger, as seen above. The master process listens to the same queue with a QueueListener.

Here's how the master process's logging is set up:

import logging
import multiprocessing as mp
from logging.handlers import QueueListener


def setup_primary_logging():
    log_queue = mp.Queue(-1)

    # Handler for file logging
    output_file_log_handler = logging.FileHandler(filename="out.log")
    formatter = logging.Formatter('%(asctime)s - %(levelname)s - %(message)s')
    output_file_log_handler.setFormatter(formatter)
    output_file_log_handler.setLevel(logging.INFO)

    # This listener listens to `log_queue` and pushes the messages to the
    # handlers specified here.
    listener = QueueListener(log_queue, output_file_log_handler, respect_handler_level=True)
    listener.start()

    return log_queue

And the worker's logging setup:

import logging
from logging.handlers import QueueHandler


def setup_worker_logging(log_queue):
    queue_handler = QueueHandler(log_queue)
    queue_handler.setLevel(logging.INFO)

    root_logger = logging.getLogger()
    root_logger.addHandler(queue_handler)

    # The root logger's default level is WARNING, hence this change. Otherwise
    # INFO-level worker logs would be filtered out before reaching the queue,
    # and so would never reach the master's handlers, which do the actual writing.
    root_logger.setLevel(logging.INFO)
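Once setup_worker_logging has run in a child, the child just logs through the normal API; every record goes through the QueueHandler into the shared queue and ends up at the master's handlers. For example:

logger = logging.getLogger(__name__)
logger.info("worker started")  # forwarded via log_queue to the master's FileHandler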

Note that the log_queue object has to be passed from the master to all of the children. So the master process looks something like this:

def worker_fn(log_queue):
    setup_worker_logging(log_queue)


def master_fn():
    log_queue = setup_primary_logging()

    # start 4 worker processes, handing each one the shared queue
    workers = [mp.Process(target=worker_fn, args=(log_queue,)) for _ in range(4)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()

Any messages from the workers will be logged to the out.log file with the above setup. A complete working example can be seen in this gist.
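Assuming the functions above are all defined in one module, the only extra piece needed to run this end to end is an entry-point guard, since multiprocessing may re-import the module when it starts the child processes:

if __name__ == "__main__":
    master_fn()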
