Log categories #3241

rustyrussell · 2019-11-04T03:37:22Z

~~Based on top of #3228 so Travis doesn't lost its mind.~~
~~Rebased on top of #3245~~
[ On master now ]

This logging cleanup has the unfortunate side effect of potentially breaking log grep, but the long-term benefits of uniform logging I think outweigh that.

cdecker

Looks good to me overall, just some minor comments:

The commit message on the first commit has filenames swapped (lightningd-config.5.txt is replaced by lightningd-config.5.md, not the other way around)
e276f18 is a really nice cleanup 👍
e8d7843 this makes logs so much more manageable, multiline logs are hard to handle (maybe we should rework some of the crash dump formatting to indent and indicate continuations that way as well?)

cdecker · 2019-11-14T12:21:50Z

lightningd/log_status.c

+		/* If there's not already a node_id (global subdirs), they can
+		 * set it */
+		if (!node_id)
+			node_id = suggested_node_id;


Should this maybe result in a warning? A node-subdaemon pretending to be another node might indicate an error.

Agreed. Returning false here gets it properly logged as a malformed msg.

cdecker · 2019-11-14T12:26:23Z

gossipd/routing.c

+		status_peer_broken(peer ? &peer->id : NULL,
+				  "Unable to parse local_add_channel message: %s",
+				   tal_hex(msg, msg));


So when we local_add_channel triggered by a funding_locked, we will have the peer_id in the logs, while when loading from gossip_store we won't, right?

cdecker · 2019-11-14T12:47:48Z

lightningd/jsonrpc.c

@@ -903,7 +902,7 @@ static struct io_plan *jcon_connected(struct io_conn *conn,
 	list_head_init(&jcon->commands);

 	/* We want to log on destruction, so we free this in destructor. */
-	jcon->log = new_log(ld->log_book, ld->log_book, NULL, "jcon fd %i:",
+	jcon->log = new_log(ld->log_book, ld->log_book, NULL, "jsonrpc #%i",


Nice change, jcon was confusing some users 👍

cdecker · 2019-11-14T12:55:28Z

doc/lightningd-config.5.md

+The following subsystems exist for each channel, where N is an incrementing
+internal integer id assigned for the lifetime of the channel:
+* *openingd-chan #N*: Each opening / idling daemon
+* *channeld-chan #N*: Each channel management daemon
+* *closingd-chan #N*: Each closing negotiation daemon
+* *onchaind-chan #N*: Each onchain close handling daemon
+* *json #FD*: Each JSONRPC connection, FD = file descriptor number


I'm not sure this is really clear: do I have to specify --log-level="debug:openingd-chan #1" or can I set the log-level for all openingds by using the following --log-level="debug:openingd. The latter makes more sense, but the description seems to say the former.

From line 188 it seems that we just do a substring check. This is ok since otherwise we'd have to deal with # and spaces in a command line argument, however it also allows us to do some strange things like --log-level=debug:an and it'll set the level for plugin-manager, openingd-chan, channeld-chan, closingd-chan, and onchaind-chan which is a weird mix. Maybe we should just check for prefixes instead?

You don't want only prefixes, because what if you want to watch channel #1 (or, as I was thinking of adding later, a specific peerid?). You don't want to tie it to a particular daemon.

It also explicitly says:

If *SUBSYSTEM* is supplied, this sets the logging level for any subsystem containing that string.

I added the following paragraph:

So, **log-level=debug:plugin** would set debug level logging on all plugins and the plugin manager. **log-level=io:chan #55** would set IO logging on channel number 55 (or 550, for that matter).

cdecker · 2019-11-14T12:59:42Z

lightningd/log.c

+	/* Catch up, since before we were only printing BROKEN msgs */
+	list_for_each(&lr->log, l, list) {
+		if (l->level >= filter_level(lr, l->prefix))
+			log_to_file(l->prefix, l->level, l->node_id, false,
+				    &l->time, l->log,
+				    l->io, tal_bytelen(l->io), lr->outf);
+	}
+}


Could this result in duplicate BROKEN messages, once when they were originally added, and then during this catch up, or do we expect BROKEN messages to kill the process anyway, thus never reaching this point?

It always could; generally BROKEN messages are fatal though, as you anticipated.

I figured the duplication here is clearer, since now they'll be in context.

cdecker · 2019-11-14T13:21:36Z

lightningd/log.c


 		if (pseudorand(1000) > delete_threshold(i->level)) {
 			i->skipped += skipped;
 			skipped = 0;
+			/* Move down if necesary. */
+			log->log[dst++] = *i;


So this is effectively a merge-sort with itself, skipping deleted entries, that's very interesting 🙂

Well, it's cheap :)

rustyrussell · 2019-11-15T03:06:55Z

OK, changed appended (and first commit message thinko fixed, which unf. made it look like ALL NEW WORK!).

I don't think they need to be folded in, but feel free to do so if you prefer.

Replaced by doc/lightningd-config.5.md. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

We'll use this for logging it. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

A log can have a default node_id, which can be overridden on a per-entry basis. This changes the format of logging, so some tests need rework. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

This is ignored in subdaemons which are per-peer, but very useful for multi-peer daemons like connectd and gossipd. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

cdecker · 2019-11-17T16:20:31Z

ACK 15eaa22

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

Simply better encapsulation. We still need to expose log_entry, since the notification hook uses it though. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

We had a separate logbook for each peer, and copy log entries above the printable log level into the master logbook. This didn't always work well, since we didn't dump it on crash for example. Keep a single global logbook instead, and remove this infrastructure. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

1. Printed form is always "[<nodeid>-]<prefix>: <string>" 2. "jcon fd %i" becomes "jsonrpc #%i". 3. "jsonrpc" log is only used once, and is removed. 4. "database" log prefix is use for db accesses. 5. "lightningd(%i)" becomes simply "lightningd" without the pid. 6. The "lightningd_" prefix is stripped from subd log prefixes, and pid removed. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Changelog-changed: Logging: formatting made uniform: [NODEID-]SUBSYSTEM: MESSAGE Changelog-removed: `lightning_` prefixes removed from subdaemon names, including in listpeers `owner` field.

This allows finegrained logging control of particular subdaemons or subsystems. To do this, we defer setting the logging levels for each log object until after early argument parsing (since e.g. "bitcoind" log object is created early). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Changelog-changed: Options: log-level can now specify different levels for different subsystems.

This simplifies our tests, too, since we don't need a magic option to enable io logging in subdaemons. Note that test_bad_onion still takes too long, due to a separate minor bug, so that's marked and left dev-only for now. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

Simple refcount FTW. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

1. Don't prune the last 10%. 2. Be more aggressive on pruning IO and DEBUG. 3. Account for skipped entries correctly across multiple prunes. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

They added complexity, and were only used in a few places. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

The tal overhead of 5 pointers, the linked list node is 2; and we also tal'd the string. That's 96 bytes per entry. Use a simple array instead, though it means more work on deletion since each log_entry is no longer a tal object. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

We can definitely get a pong from l1 (should test be slow enough): it's l3 we are concerned about. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

@cdecker

Suggested-by: @cdecker Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

@cdecker

As suggested by @cdecker, but also did a bit of minor reformatting. I don't like the excessive indenting in our man pages though, but that's a separate problem. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

Spaces just make life a little harder for everyone. (Plus, fix documentation: it's 'jsonrpc' not 'json' subsystem). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

rustyrussell · 2019-11-18T04:50:10Z

(Trivial rebase)

Ack 3831eb4

rustyrussell added the logging label Nov 4, 2019

rustyrussell added this to the 0.7.4 milestone Nov 4, 2019

rustyrussell requested review from cdecker and ZmnSCPxj as code owners November 4, 2019 03:37

rustyrussell force-pushed the guilt/log-categories branch 8 times, most recently from 6100225 to de53870 Compare November 10, 2019 23:43

rustyrussell force-pushed the guilt/log-categories branch 2 times, most recently from ca25527 to 94b646c Compare November 13, 2019 00:47

cdecker reviewed Nov 14, 2019

View reviewed changes

rustyrussell force-pushed the guilt/log-categories branch from 94b646c to 15eaa22 Compare November 15, 2019 03:06

rustyrussell added 4 commits November 17, 2019 22:09

doc/lightningd-config.5.txt: remove

380b85c

Replaced by doc/lightningd-config.5.md. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

lightningd: have optional node_id associated with subdaemons.

0024b6e

We'll use this for logging it. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

lightningd: have logging include an optional node_id for each entry.

c76db47

A log can have a default node_id, which can be overridden on a per-entry basis. This changes the format of logging, so some tests need rework. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

common: allow subdaemons to specify the node_id in status messages.

1d6b502

This is ignored in subdaemons which are per-peer, but very useful for multi-peer daemons like connectd and gossipd. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

cdecker force-pushed the guilt/log-categories branch from 15eaa22 to d0a57c2 Compare November 17, 2019 16:21

rustyrussell added 7 commits November 18, 2019 10:56

connectd, gossipd: use per-peer logging.

4be82e4

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

lightningd: move log structs into log.c.

24722c9

Simply better encapsulation. We still need to expose log_entry, since the notification hook uses it though. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

lightningd: avoid keeping multiple copies of nodeid.

4ad3998

Simple refcount FTW. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

rustyrussell added 7 commits November 18, 2019 10:57

lightningd: perform better log pruning.

7bfadd2

1. Don't prune the last 10%. 2. Be more aggressive on pruning IO and DEBUG. 3. Account for skipped entries correctly across multiple prunes. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

lightningd: remove log_add functions.

5b66c6d

They added complexity, and were only used in a few places. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

pytest: deflake test_htlc_send_timeout

c836171

We can definitely get a pong from l1 (should test be slow enough): it's l3 we are concerned about. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

lightningd: kill per-peer daemons if they claim a different peer id.

f9b20ce

Suggested-by: @cdecker Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

logging: add examples for logging.

52f3719

As suggested by @cdecker, but also did a bit of minor reformatting. I don't like the excessive indenting in our man pages though, but that's a separate problem. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

logging: remove spaces from subsystem names.

3831eb4

Spaces just make life a little harder for everyone. (Plus, fix documentation: it's 'jsonrpc' not 'json' subsystem). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

rustyrussell force-pushed the guilt/log-categories branch from d0a57c2 to 3831eb4 Compare November 18, 2019 00:28

rustyrussell merged commit 709c98f into ElementsProject:master Nov 18, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Log categories #3241

Log categories #3241

rustyrussell commented Nov 4, 2019 •

edited

Loading

cdecker left a comment

cdecker Nov 14, 2019

rustyrussell Nov 14, 2019 •

edited

Loading

cdecker Nov 14, 2019

rustyrussell Nov 14, 2019

cdecker Nov 14, 2019

cdecker Nov 14, 2019

cdecker Nov 14, 2019

rustyrussell Nov 15, 2019

rustyrussell Nov 15, 2019

cdecker Nov 14, 2019

rustyrussell Nov 15, 2019

cdecker Nov 14, 2019

rustyrussell Nov 15, 2019

rustyrussell commented Nov 15, 2019

cdecker commented Nov 17, 2019

rustyrussell commented Nov 18, 2019

Log categories #3241

Log categories #3241

Conversation

rustyrussell commented Nov 4, 2019 • edited Loading

cdecker left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

rustyrussell Nov 14, 2019 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

rustyrussell commented Nov 15, 2019

cdecker commented Nov 17, 2019

rustyrussell commented Nov 18, 2019

rustyrussell commented Nov 4, 2019 •

edited

Loading

rustyrussell Nov 14, 2019 •

edited

Loading