Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Guideline for log message level #4751

Closed
jo-so opened this issue Feb 26, 2019 · 7 comments
Closed

Guideline for log message level #4751

jo-so opened this issue Feb 26, 2019 · 7 comments
Labels
A-Logging Synapse's logs (structured or otherwise). Not metrics. z-p2 (Deprecated Label)

Comments

@jo-so
Copy link

jo-so commented Feb 26, 2019

Description

I think it would be helpful to have a guideline that defines, which log level should be used for which kind of messages. I'm getting tons of messages in my system log on error or warning level where I can't do anything about it. This makes supervising of Synapse impossible, because it raises the flag too often.

Proposal

Error

  • Target: Service Admin
  • Reason: Bad events, where no fallback or auto-recovery is possible and manual action from the Admin is required.
  • Example:
    • unhandled exception
    • database connection failed
    • invalid config setting

Warn

  • Target: Service Admin
  • Reason: Bad events, where fallback or auto-recovery exists, but the admin should care about, because they should not happen too often.
  • Example:
    • database query failed due to lock contention and a rerun of the query happens

Notice

  • Target: Service Admin
  • Reason: Interesting actions which are helpful for reporting and examination of user error reports; syslog says »normal, but significant, condition«
  • Example:
    • login/logout of users
    • new rooms
    • new accounts
    • broken user requests

Info

  • Target: Developer, Reporting system, someone who analyses problems
  • Reason: informational message
  • Example:
    • request … handled within … seconds
    • save attachment to file …
    • connecting to …
    • federation_transaction_transmission_loop TimeoutError('',)
    • federation_transaction_transmission_loop HttpResponseException("403: b'Forbidden'",)

Debug

  • Target: Developer
  • Reason: messages showing some interesting behaviour what the service is doing, but mostly useful for developers.
  • Example:
    • Lock acquired/released
    • Value of variables on some points
    • Number of written bytes
@neilisfragile
Copy link
Contributor

Thanks @jo-so at first glance this look very sensible, we've recently made changes to reduce the number of error level debug logs, so hopefully you should start to see some difference right away.

@neilisfragile neilisfragile added A-Logging Synapse's logs (structured or otherwise). Not metrics. z-p2 (Deprecated Label) labels Mar 5, 2019
@aaronraimist
Copy link
Contributor

I feel like every new homeowner complains about getting so many logs by default, specifically a lot of federation_transaction_transmission_loop from servers that are offline that they don't really care about. Should the default log level be changed from WARNING to ERROR for 1.0?

@4nd3r
Copy link
Contributor

4nd3r commented Jun 25, 2019

any sane admin is usually not interested running service with "too" verbose logging, because this can create noticeable I/O spikes and even exhaustion to the point that other processes wait in D-state and from there you can easily have cascading failure. YMMV.

I have no idea how many people actually run Synapse on Raspberry Pi with sdcard or similar low powered devices (which isn't good idea, imho), but considering how often some people ask about that in public rooms...

Synapse should have smarter error logging: only log errors and stacktraces by default and do not pollute log with warnings and informational messages which aren't useful when running production service.

if I get asked (again) "can you please make issue with logs", I have to turn loglevel up and wait before I get any meaningful information and same time generate loads of I/O. yes, I understand that having verbose logging makes stuff easier and of course I can forward logs, before even touching disk, to remote syslog, but that's not always an option.

@ara4n showed document yesterday about room complexity metric and displaying warning before joining resource hungy rooms. imho getting logging smarter by default helps a lot with this kind of "problems" too.

@richvdh
Copy link
Member

richvdh commented Jun 25, 2019

if I get asked (again) "can you please make issue with logs", I have to turn loglevel up and wait

This is exactly why the default log level is INFO, not WARNING or ERROR. You can turn the level down if you want to save on disk space or I/O, but don't be surprised if we can't debug any problems without at least the INFO logs.

Certainly the situation isn't perfect at the moment. For instance, there is stuff that ends up logged at ERROR that shouldn't be, either because our error handling is poor, or that of twisted or some other library is. Feel free to open issues for any specific cases that you see (though please check for open issues first).

In the specific case of unreachable homeservers: the reason it's logged at WARNING is because we believe it is useful information if you can't send messages to your friend's server. I appreciate that, if you're in one of the big public rooms, the useful information gets drowned out by the hundreds of servers which have permanently gone away, but I think the solution to that is something like #5113.

I'm not quite sure what to do with this issue. We could add some guidelines to our coding style documentation, but I'm not entirely sure it would make any difference. We already apply some amount of thought to what we log at what level, and it's guided by our experience of what goes wrong and the information that we need to debug it. It's not clear how a document describing the difference between "warning" and "info" would improve this. Rather, I think it would be more instructive to discuss individual cases which are either useless or at the wrong log level.

@erebion
Copy link

erebion commented Dec 16, 2022

@richvdh Please re-open.

On the current release (1.73.0) I still get around 1 GB of logs like this every day:

2022-12-16 14:14:05,426 - synapse.handlers.device - 1191 - WARNING - _maybe_retry_device_resync-572 - Failed to handle device list update for @user:host.invalid: 403: Forbidden
2022-12-16 14:14:05,591 - synapse.http.matrixfederationclient - 709 - WARNING - _maybe_retry_device_resync-572 - {GET-O-6088473} [ded.haus] Request failed: GET matrix://host.invalid/_matrix/federation/v1/user/devices/%40user%3Ahost.invalid: HttpResponseException('404: Not Found')

This should not be in the log as "WARNING".
Yesterday I had 1.2 GB of just messages like that.

@richvdh
Copy link
Member

richvdh commented Dec 16, 2022

@richvdh Please re-open.

No. As I said before:

Certainly the situation isn't perfect at the moment. For instance, there is stuff that ends up logged at ERROR that shouldn't be, either because our error handling is poor, or that of twisted or some other library is. Feel free to open issues for any specific cases that you see (though please check for open issues first).

@erebion
Copy link

erebion commented Dec 16, 2022

The log is not much use in its current form. Messages like these two should be INFO or DEBUG, but not WARN.

EDIT: Or maybe even in the Notices category.

@matrix-org matrix-org locked as resolved and limited conversation to collaborators Dec 16, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A-Logging Synapse's logs (structured or otherwise). Not metrics. z-p2 (Deprecated Label)
Projects
None yet
Development

No branches or pull requests

6 participants