Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Log categories #3241

Merged
merged 18 commits into from
Nov 18, 2019
Merged

Conversation

rustyrussell
Copy link
Contributor

@rustyrussell rustyrussell commented Nov 4, 2019

Based on top of #3228 so Travis doesn't lost its mind.
Rebased on top of #3245
[ On master now ]

This logging cleanup has the unfortunate side effect of potentially breaking log grep, but the long-term benefits of uniform logging I think outweigh that.

@rustyrussell rustyrussell added this to the 0.7.4 milestone Nov 4, 2019
@rustyrussell rustyrussell force-pushed the guilt/log-categories branch 8 times, most recently from 6100225 to de53870 Compare November 10, 2019 23:43
@rustyrussell rustyrussell force-pushed the guilt/log-categories branch 2 times, most recently from ca25527 to 94b646c Compare November 13, 2019 00:47
Copy link
Member

@cdecker cdecker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me overall, just some minor comments:

  • The commit message on the first commit has filenames swapped (lightningd-config.5.txt is replaced by lightningd-config.5.md, not the other way around)
  • e276f18 is a really nice cleanup 👍
  • e8d7843 this makes logs so much more manageable, multiline logs are hard to handle (maybe we should rework some of the crash dump formatting to indent and indicate continuations that way as well?)

Comment on lines +15 to +18
/* If there's not already a node_id (global subdirs), they can
* set it */
if (!node_id)
node_id = suggested_node_id;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this maybe result in a warning? A node-subdaemon pretending to be another node might indicate an error.

Copy link
Contributor Author

@rustyrussell rustyrussell Nov 14, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. Returning false here gets it properly logged as a malformed msg.

Comment on lines +2905 to +2906
status_peer_broken(peer ? &peer->id : NULL,
"Unable to parse local_add_channel message: %s",
tal_hex(msg, msg));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So when we local_add_channel triggered by a funding_locked, we will have the peer_id in the logs, while when loading from gossip_store we won't, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly!

@@ -903,7 +902,7 @@ static struct io_plan *jcon_connected(struct io_conn *conn,
list_head_init(&jcon->commands);

/* We want to log on destruction, so we free this in destructor. */
jcon->log = new_log(ld->log_book, ld->log_book, NULL, "jcon fd %i:",
jcon->log = new_log(ld->log_book, ld->log_book, NULL, "jsonrpc #%i",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice change, jcon was confusing some users 👍

Comment on lines 121 to 127
The following subsystems exist for each channel, where N is an incrementing
internal integer id assigned for the lifetime of the channel:
* *openingd-chan #N*: Each opening / idling daemon
* *channeld-chan #N*: Each channel management daemon
* *closingd-chan #N*: Each closing negotiation daemon
* *onchaind-chan #N*: Each onchain close handling daemon
* *json #FD*: Each JSONRPC connection, FD = file descriptor number
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this is really clear: do I have to specify --log-level="debug:openingd-chan #1" or can I set the log-level for all openingds by using the following --log-level="debug:openingd. The latter makes more sense, but the description seems to say the former.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From line 188 it seems that we just do a substring check. This is ok since otherwise we'd have to deal with # and spaces in a command line argument, however it also allows us to do some strange things like --log-level=debug:an and it'll set the level for plugin-manager, openingd-chan, channeld-chan, closingd-chan, and onchaind-chan which is a weird mix. Maybe we should just check for prefixes instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't want only prefixes, because what if you want to watch channel #1 (or, as I was thinking of adding later, a specific peerid?). You don't want to tie it to a particular daemon.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It also explicitly says:

If *SUBSYSTEM* is supplied, this sets the logging level for any subsystem containing that string.

I added the following paragraph:

So, **log-level=debug:plugin** would set debug level logging on all
plugins and the plugin manager.  **log-level=io:chan #55** would set
IO logging on channel number 55 (or 550, for that matter).

lightningd/log.c Outdated
Comment on lines 644 to 737
/* Catch up, since before we were only printing BROKEN msgs */
list_for_each(&lr->log, l, list) {
if (l->level >= filter_level(lr, l->prefix))
log_to_file(l->prefix, l->level, l->node_id, false,
&l->time, l->log,
l->io, tal_bytelen(l->io), lr->outf);
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this result in duplicate BROKEN messages, once when they were originally added, and then during this catch up, or do we expect BROKEN messages to kill the process anyway, thus never reaching this point?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It always could; generally BROKEN messages are fatal though, as you anticipated.

I figured the duplication here is clearer, since now they'll be in context.


if (pseudorand(1000) > delete_threshold(i->level)) {
i->skipped += skipped;
skipped = 0;
/* Move down if necesary. */
log->log[dst++] = *i;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this is effectively a merge-sort with itself, skipping deleted entries, that's very interesting 🙂

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, it's cheap :)

@rustyrussell
Copy link
Contributor Author

OK, changed appended (and first commit message thinko fixed, which unf. made it look like ALL NEW WORK!).

I don't think they need to be folded in, but feel free to do so if you prefer.

Replaced by doc/lightningd-config.5.md.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We'll use this for logging it.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
A log can have a default node_id, which can be overridden on a per-entry
basis.  This changes the format of logging, so some tests need rework.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is ignored in subdaemons which are per-peer, but very useful for
multi-peer daemons like connectd and gossipd.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
@cdecker
Copy link
Member

cdecker commented Nov 17, 2019

ACK 15eaa22

@cdecker cdecker force-pushed the guilt/log-categories branch from 15eaa22 to d0a57c2 Compare November 17, 2019 16:21
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Simply better encapsulation.   We still need to expose log_entry, since the
notification hook uses it though.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We had a separate logbook for each peer, and copy log entries above
the printable log level into the master logbook.  This didn't always
work well, since we didn't dump it on crash for example.

Keep a single global logbook instead, and remove this infrastructure.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
1. Printed form is always "[<nodeid>-]<prefix>: <string>"
2. "jcon fd %i" becomes "jsonrpc #%i".
3. "jsonrpc" log is only used once, and is removed.
4. "database" log prefix is use for db accesses.
5. "lightningd(%i)" becomes simply "lightningd" without the pid.
6. The "lightningd_" prefix is stripped from subd log prefixes, and pid removed.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-changed: Logging: formatting made uniform: [NODEID-]SUBSYSTEM: MESSAGE
Changelog-removed: `lightning_` prefixes removed from subdaemon names, including in listpeers `owner` field.
This allows finegrained logging control of particular subdaemons or
subsystems.

To do this, we defer setting the logging levels for each log object
until after early argument parsing (since e.g. "bitcoind" log object
is created early).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-changed: Options: log-level can now specify different levels for different subsystems.
This simplifies our tests, too, since we don't need a magic option to
enable io logging in subdaemons.

Note that test_bad_onion still takes too long, due to a separate minor
bug, so that's marked and left dev-only for now.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Simple refcount FTW.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
1. Don't prune the last 10%.
2. Be more aggressive on pruning IO and DEBUG.
3. Account for skipped entries correctly across multiple prunes.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
They added complexity, and were only used in a few places.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The tal overhead of 5 pointers, the linked list node is 2; and we also
tal'd the string.  That's 96 bytes per entry.

Use a simple array instead, though it means more work on deletion
since each log_entry is no longer a tal object.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We can definitely get a pong from l1 (should test be slow enough):
it's l3 we are concerned about.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Suggested-by: @cdecker
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
As suggested by @cdecker, but also did a bit of minor reformatting.

I don't like the excessive indenting in our man pages though,
but that's a separate problem.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Spaces just make life a little harder for everyone.

(Plus, fix documentation: it's 'jsonrpc' not 'json' subsystem).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
@rustyrussell
Copy link
Contributor Author

(Trivial rebase)

Ack 3831eb4

@rustyrussell rustyrussell merged commit 709c98f into ElementsProject:master Nov 18, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants