Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Log emit performance degradation in RotatingFileHandlers due to filesystem checks #90365

Closed
dfritz mannequin opened this issue Dec 30, 2021 · 3 comments
Closed

Log emit performance degradation in RotatingFileHandlers due to filesystem checks #90365

dfritz mannequin opened this issue Dec 30, 2021 · 3 comments
Labels
3.9 only security fixes 3.10 only security fixes 3.11 only security fixes performance Performance or resource usage stdlib Python modules in the Lib dir

Comments

@dfritz
Copy link
Mannequin

dfritz mannequin commented Dec 30, 2021

BPO 46207
Nosy @vsajip
Files
  • logtest.py: Basic repro script
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2021-12-30.19:31:56.919>
    labels = ['3.11', 'library', '3.9', '3.10', 'performance']
    title = 'Log emit performance degradation in RotatingFileHandlers due to filesystem checks'
    updated_at = <Date 2021-12-30.21:34:08.125>
    user = 'https://bugs.python.org/dfritz'

    bugs.python.org fields:

    activity = <Date 2021-12-30.21:34:08.125>
    actor = 'zach.ware'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Library (Lib)']
    creation = <Date 2021-12-30.19:31:56.919>
    creator = 'dfritz'
    dependencies = []
    files = ['50529']
    hgrepos = []
    issue_num = 46207
    keywords = []
    message_count = 1.0
    messages = ['409381']
    nosy_count = 2.0
    nosy_names = ['vinay.sajip', 'dfritz']
    pr_nums = []
    priority = 'normal'
    resolution = None
    stage = None
    status = 'open'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue46207'
    versions = ['Python 3.9', 'Python 3.10', 'Python 3.11']

    @dfritz
    Copy link
    Mannequin Author

    dfritz mannequin commented Dec 30, 2021

    I believe this also impacts 3.10 and 3.11 based on the original bpo-45401 which led to this change. Prior to commit ac421c3 in the 3.9 branch there were no additional os.path checks in the shouldRollover() methods of RotatingFileHandler and TimedRotatingFileHandler, which led to maximum performance regardless of where log files existed. With the new "os.path.exists(self.baseFilename) and not os.path.isfile(self.baseFilename)" checks added to these functions, a varying performance degradation is seen if the log lives on a filesystem that is not a local disk. It is hard for me to strictly call this a "regression", since I understand the reason of the original change (only rollover regular files), but I feel this is something developers should be aware of, or possibly have a way to disable (without making their own subclass) since this cost was not incurred before.

    For example, let's say we have user home folders as mounts via S3FS, and each user has a log stored in their home folders. Since there are now os.path checks on every single emit call, performance of logging is now directly impacted by networking latency if using one of these native logging handlers. This came as quite a surprise when upgrading to 3.9.8+, when operations that were previously taking ~2 seconds were now taking ~2 1/2 minutes. This is reproducible (once you have a similar external filesystem mounted) using the following script (also attached):

    # vi logtest.py
    import logging
    import logging.handlers
    
    logging.basicConfig(filename="test.log", level=logging.INFO)
    logger = logging.getLogger("test_logger")
    log_handler = logging.handlers.TimedRotatingFileHandler("test.log", when='D', backupCount=2)
    logger.addHandler(log_handler)
    
    for i in range(0, 10000):
        logger.info("iteration: %d", i)

    As seen between the commits, cost spikes due to the new calls (double the amount of log emit calls):

    $ time /build_397/bin/python3 -m cProfile logtest.py |awk '$5>0.00'
    ...[truncated output to highest percall items]...
    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      12/1    0.000    0.000    0.549    0.549 {built-in method builtins.exec} # 3.9.7_d57d33c234 (no posix.stat calls)
      12/1    0.000    0.000   70.125   70.125 {built-in method builtins.exec} # 3.9.7_ac421c348b
     20100   68.972    0.003   68.972    0.003 {built-in method posix.stat}    # 3.9.7_ac421c348b

    This performance degradation is further compounded if for example you have logging being performed inside threads, where locks have to be obtained. This is actually how I noticed it originally, because I was attempting to use the cProfile module to find what was taking so long. I ended up testing several python versions to make sure I was not crazy, before identifying the exact commit where the change in performance occurred.

    ncalls tottime percall cumtime percall filename:lineno(function)
    67 0.661 0.010 0.661 0.010 {method 'acquire' of '_thread.lock' objects} # 3.7.8
    67 1.237 0.018 1.237 0.018 {method 'acquire' of '_thread.lock' objects} # 3.9.0
    67 1.293 0.019 1.293 0.019 {method 'acquire' of '_thread.lock' objects} # 3.9.4
    67 0.756 0.011 0.756 0.011 {method 'acquire' of '_thread.lock' objects} # 3.9.6
    67 1.218 0.018 1.218 0.018 {method 'acquire' of '_thread.lock' objects} # 3.9.7
    67 152.411 2.275 152.411 2.275 {method 'acquire' of '_thread.lock' objects} # 3.9.8
    67 152.382 2.274 152.382 2.274 {method 'acquire' of '_thread.lock' objects} # 3.9.9

    Ultimately I do not have a recommendation for a native fix that keeps both the improvement from the commit, and the performance from before the commit. Perhaps caching the results of the checks until there is a rollover? Only perform the check on initial handler creation? Otherwise it appears the only solution is to subclass and override the shouldRollover() to not include the os.path checks and match <=3.9.7 behavior. This is the solution I am implementing in the codebase/environment where this was discovered while this report is discussed further. This feels less than ideal, but if that is the consensus then hopefully this report can help the next person to come across it. Thank you for your time and all the hard work put into Python!

    @dfritz dfritz mannequin added 3.9 only security fixes 3.10 only security fixes 3.11 only security fixes performance Performance or resource usage labels Dec 30, 2021
    @zware zware added stdlib Python modules in the Lib dir labels Dec 30, 2021
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    @vsajip
    Copy link
    Member

    vsajip commented Aug 26, 2022

    This should be fixed in #96159.

    @vsajip vsajip closed this as completed Aug 26, 2022
    @vsajip vsajip moved this to Done in Logging issues 🪵 Aug 26, 2022
    @serhiy-storchaka
    Copy link
    Member

    While the example uses TimedRotatingFileHandler, the title and the report text mention RotatingFileHandler for which the issue is still not fixed. The solution for RotatingFileHandler was provided in the duplicate #105623 issue.

    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.9 only security fixes 3.10 only security fixes 3.11 only security fixes performance Performance or resource usage stdlib Python modules in the Lib dir
    Projects
    Development

    No branches or pull requests

    3 participants