Guideline for log message level #4751

jo-so · 2019-02-26T21:35:51Z

Description

I think it would be helpful to have a guideline that defines, which log level should be used for which kind of messages. I'm getting tons of messages in my system log on error or warning level where I can't do anything about it. This makes supervising of Synapse impossible, because it raises the flag too often.

Proposal

Error

Target: Service Admin
Reason: Bad events, where no fallback or auto-recovery is possible and manual action from the Admin is required.
Example:
- unhandled exception
- database connection failed
- invalid config setting

Warn

Target: Service Admin
Reason: Bad events, where fallback or auto-recovery exists, but the admin should care about, because they should not happen too often.
Example:
- database query failed due to lock contention and a rerun of the query happens

Notice

Target: Service Admin
Reason: Interesting actions which are helpful for reporting and examination of user error reports; syslog says »normal, but significant, condition«
Example:
- login/logout of users
- new rooms
- new accounts
- broken user requests

Info

Target: Developer, Reporting system, someone who analyses problems
Reason: informational message
Example:
- request … handled within … seconds
- save attachment to file …
- connecting to …
- federation_transaction_transmission_loop TimeoutError('',)
- federation_transaction_transmission_loop HttpResponseException("403: b'Forbidden'",)

Debug

Target: Developer
Reason: messages showing some interesting behaviour what the service is doing, but mostly useful for developers.
Example:
- Lock acquired/released
- Value of variables on some points
- Number of written bytes

neilisfragile · 2019-03-05T13:16:43Z

Thanks @jo-so at first glance this look very sensible, we've recently made changes to reduce the number of error level debug logs, so hopefully you should start to see some difference right away.

aaronraimist · 2019-04-16T00:02:27Z

I feel like every new homeowner complains about getting so many logs by default, specifically a lot of federation_transaction_transmission_loop from servers that are offline that they don't really care about. Should the default log level be changed from WARNING to ERROR for 1.0?

4nd3r · 2019-06-25T17:02:49Z

any sane admin is usually not interested running service with "too" verbose logging, because this can create noticeable I/O spikes and even exhaustion to the point that other processes wait in D-state and from there you can easily have cascading failure. YMMV.

I have no idea how many people actually run Synapse on Raspberry Pi with sdcard or similar low powered devices (which isn't good idea, imho), but considering how often some people ask about that in public rooms...

Synapse should have smarter error logging: only log errors and stacktraces by default and do not pollute log with warnings and informational messages which aren't useful when running production service.

if I get asked (again) "can you please make issue with logs", I have to turn loglevel up and wait before I get any meaningful information and same time generate loads of I/O. yes, I understand that having verbose logging makes stuff easier and of course I can forward logs, before even touching disk, to remote syslog, but that's not always an option.

@ara4n showed document yesterday about room complexity metric and displaying warning before joining resource hungy rooms. imho getting logging smarter by default helps a lot with this kind of "problems" too.

richvdh · 2019-06-25T22:44:47Z

if I get asked (again) "can you please make issue with logs", I have to turn loglevel up and wait

This is exactly why the default log level is INFO, not WARNING or ERROR. You can turn the level down if you want to save on disk space or I/O, but don't be surprised if we can't debug any problems without at least the INFO logs.

Certainly the situation isn't perfect at the moment. For instance, there is stuff that ends up logged at ERROR that shouldn't be, either because our error handling is poor, or that of twisted or some other library is. Feel free to open issues for any specific cases that you see (though please check for open issues first).

In the specific case of unreachable homeservers: the reason it's logged at WARNING is because we believe it is useful information if you can't send messages to your friend's server. I appreciate that, if you're in one of the big public rooms, the useful information gets drowned out by the hundreds of servers which have permanently gone away, but I think the solution to that is something like #5113.

I'm not quite sure what to do with this issue. We could add some guidelines to our coding style documentation, but I'm not entirely sure it would make any difference. We already apply some amount of thought to what we log at what level, and it's guided by our experience of what goes wrong and the information that we need to debug it. It's not clear how a document describing the difference between "warning" and "info" would improve this. Rather, I think it would be more instructive to discuss individual cases which are either useless or at the wrong log level.

erebion · 2022-12-16T14:04:16Z

@richvdh Please re-open.

On the current release (1.73.0) I still get around 1 GB of logs like this every day:

2022-12-16 14:14:05,426 - synapse.handlers.device - 1191 - WARNING - _maybe_retry_device_resync-572 - Failed to handle device list update for @user:host.invalid: 403: Forbidden
2022-12-16 14:14:05,591 - synapse.http.matrixfederationclient - 709 - WARNING - _maybe_retry_device_resync-572 - {GET-O-6088473} [ded.haus] Request failed: GET matrix://host.invalid/_matrix/federation/v1/user/devices/%40user%3Ahost.invalid: HttpResponseException('404: Not Found')

This should not be in the log as "WARNING".
Yesterday I had 1.2 GB of just messages like that.

richvdh · 2022-12-16T15:04:02Z

@richvdh Please re-open.

No. As I said before:

Certainly the situation isn't perfect at the moment. For instance, there is stuff that ends up logged at ERROR that shouldn't be, either because our error handling is poor, or that of twisted or some other library is. Feel free to open issues for any specific cases that you see (though please check for open issues first).

erebion · 2022-12-16T15:37:05Z

The log is not much use in its current form. Messages like these two should be INFO or DEBUG, but not WARN.

EDIT: Or maybe even in the Notices category.

neilisfragile added A-Logging Synapse's logs (structured or otherwise). Not metrics. z-p2 (Deprecated Label) labels Mar 5, 2019

spantaleev mentioned this issue Jul 1, 2019

Reduce journal logging spantaleev/matrix-docker-ansible-deploy#213

Closed

richvdh closed this as completed Jan 3, 2020

matrix-org locked as resolved and limited conversation to collaborators Dec 16, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Guideline for log message level #4751

Guideline for log message level #4751

jo-so commented Feb 26, 2019

neilisfragile commented Mar 5, 2019

aaronraimist commented Apr 16, 2019

4nd3r commented Jun 25, 2019 •

edited

Loading

richvdh commented Jun 25, 2019

erebion commented Dec 16, 2022 •

edited

Loading

richvdh commented Dec 16, 2022

erebion commented Dec 16, 2022 •

edited

Loading

Guideline for log message level #4751

Guideline for log message level #4751

Comments

jo-so commented Feb 26, 2019

Description

Proposal

Error

Warn

Notice

Info

Debug

neilisfragile commented Mar 5, 2019

aaronraimist commented Apr 16, 2019

4nd3r commented Jun 25, 2019 • edited Loading

richvdh commented Jun 25, 2019

erebion commented Dec 16, 2022 • edited Loading

richvdh commented Dec 16, 2022

erebion commented Dec 16, 2022 • edited Loading

4nd3r commented Jun 25, 2019 •

edited

Loading

erebion commented Dec 16, 2022 •

edited

Loading

erebion commented Dec 16, 2022 •

edited

Loading