Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add "slog": swingset logging for replay-based debugging #1535

Closed
warner opened this issue Aug 17, 2020 · 3 comments
Closed

add "slog": swingset logging for replay-based debugging #1535

warner opened this issue Aug 17, 2020 · 3 comments
Labels
enhancement New feature or request logging

Comments

@warner
Copy link
Member

warner commented Aug 17, 2020

What is the Problem Being Solved?

One feature described in our #1318 survey is the ability to fetch historical turn data from one swingset (i.e. the chain) and rebuild one vat locally, where regular JS debugging tools can be applied. The same data can be used to simply visualize the sequence of operations in a swingset kernel. I'm currently working on new message-pattern tests (for #1404 and #1534) which trigger a dozen-ish vat messages across two kernels, and it would be tremendously useful if I could run one command and dump the sequence of events, with some support for tracing a single object/promise reference across the various vats and comms layers involved.

Description of the Design

My working title is the slog (short for Swingset Log, but of course it also describes the debugging process you're probably going through). The core is an append-only ordered set of records, notionally stored in a SQLite database (but I haven't really committed to anything on the IO end yet).

Each crank gets a pair of records, describing the start (was it a delivery or a notify? which vat was the recipient? message target, method name, arguments, result promise ID) and end (delivery consequences: was the vat killed? metering fault?) of the crank. Within the crank, each syscall made by the vat gets a pair of records (syscall name, arguments, result). Also within the crank, any calls to console.log will have their contents recorded.

All of these records are correlated, so an interactive display tool can tell you which delivery caused the vat to emit a given console.log message. Or it can show you all the deliveries that arrived at a given vat, and then expand that into a list of syscalls and log messages. Also, a message sent by one vat in a syscall can be correlated with the delivery made to the target vat on some later crank.

I also plan to correlate the object/promise IDs across the different vats, and the kernel. If you mouse over o+3 in one vat, any kernel log which mentions the matching ko21 should be highlighted, as well as any other vat deliveries that include it, along with the recipient vat-local o-7 which the c-list matches.

The kernel will have an object named kernelSlog to which the deliveries can be added. This will return a vatDeliverySlog object, which will be added to the vatManager.deliver call (making it vatManager.deliver(vatDeliveryObject, vatDeliverySlog), as well as a finish() function that the kernel should call with the delivery results. The vatDeliverySlog will have methods for recording syscalls, as well as console log messages. Each vatManager needs to arrange for the console endowment they provide to vats to send a copy of their messages to this slog function, as well as to the normal console.

Security Considerations

For now I plan to just write these log records to a local file, so any attacker in a position to read the swingset's secrets would also be able to just read the entire kernelDB. This is irrelevant for a transparent public chain, but private swingsets should be able to keep secrets, and the slog should not present a vector for confidentiality loss.

But in the future, I want a way to point a tool at a running process and extract the records remotely. Back in Tahoe-LAFS I used Foolscap and a webkey-style cap-as-password URL for this. I think I'll need to develop a new networking layer for our purposes, maybe just websockets with an access token for now (only safe when run over the loopback/localhost interface). But eventually I'd like to do the crypto properly, and that might overlap a lot with the encrypted protocol we use for solo-to-solo VatTP communication.

Test Plan

I don't intend to add a lot of unit tests for this right now. As the benefit is for debugging, I'll be content with knowing that the new code doesn't make anything crash. Later I'll add some basic tests to run a simple vat and make sure the emitted events contain the data I expect.

I'll probably put the analysis/dumping tools in a misc/ directory to start, but then they'll move to a separate package. I'm planning on a web-based UI for the cross-vat tracing, which might eventually want some modern web frameworks (D3 might be useful/appropriate), and I wouldn't want to add those dependencies to swingset itself. I'll probably do the bare minimum of automated testing on these tools (partially because I don't have a lot of experience with those sorts of tests).

@warner warner added the enhancement New feature or request label Aug 17, 2020
@FUDCo
Copy link
Contributor

FUDCo commented Aug 17, 2020

The debugging pattern described here is more or less the same process I go through manually now, running the kernel in verbose mode and capturing the results to a log file, then massaging the log file in emacs to boil out the information that's relevant to the problem I'm trying to diagnose. In more extreme cases, taking a series of dumps of the kernel database (particularly the transcript) has been helpful, particularly when I was working on the retirement of settled promises. However, the fact that I've been able to do this at all suggests that a lot of the underlying machinery we need for this is already in place, and that much of the work will be in systematizing and codifying things (and, of course, probably a lot of effort on the UI side). Having more automation here would be a tremendous time savings. This seems a like a very good application of the "observe what you already do the hard way, then automate that" principle.

warner added a commit that referenced this issue Aug 18, 2020
This adds a set of logging hooks which record swingset deliveries (the kernel
sending messages and resolution notifications into vats), syscalls (from the
vat into the kernel), and console messages. All these events can be
correlated by their crank number, vatID, delivery number, and syscall number.

The output function is not yet implemented; my notional plan is to log to
lines of JSON for now, and maybe move to a SQLite based store for later. I've
not written any analysis tools yet either.

I made a few refactorings to make things smoother:

* vats get their console object from `managerOptions.vatConsole`, rather than
  having the manager build it themselves, which makes it easier to provide
  one that writes to both the regular console and the slog
* as a result, all vats get the same basic `vatEndowments` (with the console
  added on later), rather than calling `makeVatEndowments(vatID)`
* in kernel.js, `deliverToVat` and `processNotify` had their common code
  factored out into `deliverAndLogToVat()`, which now writes to the slog
  in addition to actually performing the `manager.deliver`. The names aren't
  great and want to be changed at some point.
* `buildVatSyscallHandler` function was moved out of `addVatManager` to make
  things easier to read. The new function writes syscalls to the slog.
* console messages written by vats in top-level code, or during the execution
  of `buildRootObject`, are slogged in the "startup" phase. All other calls
  to `console.log` and friends are recorded in a "delivery" phase.

refs #1535
warner added a commit that referenced this issue Aug 18, 2020
This adds a set of logging hooks which record swingset deliveries (the kernel
sending messages and resolution notifications into vats), syscalls (from the
vat into the kernel), and console messages. All these events can be
correlated by their crank number, vatID, delivery number, and syscall number.

The output function is not yet implemented; my notional plan is to log to
lines of JSON for now, and maybe move to a SQLite based store for later. I've
not written any analysis tools yet either.

I made a few refactorings to make things smoother:

* vats get their console object from `managerOptions.vatConsole`, rather than
  having the manager build it themselves, which makes it easier to provide
  one that writes to both the regular console and the slog
* as a result, all vats get the same basic `vatEndowments` (with the console
  added on later), rather than calling `makeVatEndowments(vatID)`
* in kernel.js, `deliverToVat` and `processNotify` had their common code
  factored out into `deliverAndLogToVat()`, which now writes to the slog
  in addition to actually performing the `manager.deliver`. The names aren't
  great and want to be changed at some point.
* `buildVatSyscallHandler` function was moved out of `addVatManager` to make
  things easier to read. The new function writes syscalls to the slog.
* console messages written by vats in top-level code, or during the execution
  of `buildRootObject`, are slogged in the "startup" phase. All other calls
  to `console.log` and friends are recorded in a "delivery" phase.

The slog is disabled during test-kernel.js and test-vpid-kernel.js, because
these tests make syscalls outside the context of a delivery, which is
normally illegal, and the slog complains bitterly when it happens.

refs #1535
warner added a commit that referenced this issue Aug 18, 2020
This adds a set of logging hooks which record swingset deliveries (the kernel
sending messages and resolution notifications into vats), syscalls (from the
vat into the kernel), and console messages. All these events can be
correlated by their crank number, vatID, delivery number, and syscall number.

The output function is not yet implemented; my notional plan is to log to
lines of JSON for now, and maybe move to a SQLite based store for later. I've
not written any analysis tools yet either.

I made a few refactorings to make things smoother:

* vats get their console object from `managerOptions.vatConsole`, rather than
  having the manager build it themselves, which makes it easier to provide
  one that writes to both the regular console and the slog
* as a result, all vats get the same basic `vatEndowments` (with the console
  added on later), rather than calling `makeVatEndowments(vatID)`
* in kernel.js, `deliverToVat` and `processNotify` had their common code
  factored out into `deliverAndLogToVat()`, which now writes to the slog
  in addition to actually performing the `manager.deliver`. The names aren't
  great and want to be changed at some point.
* `buildVatSyscallHandler` function was moved out of `addVatManager` to make
  things easier to read. The new function writes syscalls to the slog.
* console messages written by vats in top-level code, or during the execution
  of `buildRootObject`, are slogged in the "startup" phase. All other calls
  to `console.log` and friends are recorded in a "delivery" phase.

The slog is disabled during test-kernel.js and test-vpid-kernel.js, because
these tests make syscalls outside the context of a delivery, which is
normally illegal, and the slog complains bitterly when it happens.

refs #1535
warner added a commit that referenced this issue Aug 18, 2020
This adds a set of logging hooks which record swingset deliveries (the kernel
sending messages and resolution notifications into vats), syscalls (from the
vat into the kernel), and console messages. All these events can be
correlated by their crank number, vatID, delivery number, and syscall number.

The output function is not yet implemented; my notional plan is to log to
lines of JSON for now, and maybe move to a SQLite based store for later. I've
not written any analysis tools yet either.

I made a few refactorings to make things smoother:

* vats get their console object from `managerOptions.vatConsole`, rather than
  having the manager build it themselves, which makes it easier to provide
  one that writes to both the regular console and the slog
* as a result, all vats get the same basic `vatEndowments` (with the console
  added on later), rather than calling `makeVatEndowments(vatID)`
* in kernel.js, `deliverToVat` and `processNotify` had their common code
  factored out into `deliverAndLogToVat()`, which now writes to the slog
  in addition to actually performing the `manager.deliver`. The names aren't
  great and want to be changed at some point.
* `buildVatSyscallHandler` function was moved out of `addVatManager` to make
  things easier to read. The new function writes syscalls to the slog.
* console messages written by vats in top-level code, or during the execution
  of `buildRootObject`, are slogged in the "startup" phase. All other calls
  to `console.log` and friends are recorded in a "delivery" phase.

The slog is disabled during test-kernel.js and test-vpid-kernel.js, because
these tests make syscalls outside the context of a delivery, which is
normally illegal, and the slog complains bitterly when it happens.

refs #1535
warner added a commit that referenced this issue Aug 18, 2020
This adds a set of logging hooks which record swingset deliveries (the kernel
sending messages and resolution notifications into vats), syscalls (from the
vat into the kernel), and console messages. All these events can be
correlated by their crank number, vatID, delivery number, and syscall number.

The output function is not yet implemented; my notional plan is to log to
lines of JSON for now, and maybe move to a SQLite based store for later. I've
not written any analysis tools yet either.

I made a few refactorings to make things smoother:

* vats get their console object from `managerOptions.vatConsole`, rather than
  having the manager build it themselves, which makes it easier to provide
  one that writes to both the regular console and the slog
* as a result, all vats get the same basic `vatEndowments` (with the console
  added on later), rather than calling `makeVatEndowments(vatID)`
* in kernel.js, `deliverToVat` and `processNotify` had their common code
  factored out into `deliverAndLogToVat()`, which now writes to the slog
  in addition to actually performing the `manager.deliver`. The names aren't
  great and want to be changed at some point.
* `buildVatSyscallHandler` function was moved out of `addVatManager` to make
  things easier to read. The new function writes syscalls to the slog.
* console messages written by vats in top-level code, or during the execution
  of `buildRootObject`, are slogged in the "startup" phase. All other calls
  to `console.log` and friends are recorded in a "delivery" phase.

The slog is disabled during test-kernel.js and test-vpid-kernel.js, because
these tests make syscalls outside the context of a delivery, which is
normally illegal, and the slog complains bitterly when it happens.

refs #1535
warner added a commit that referenced this issue Aug 18, 2020
This adds a set of logging hooks which record swingset deliveries (the kernel
sending messages and resolution notifications into vats), syscalls (from the
vat into the kernel), and console messages. All these events can be
correlated by their crank number, vatID, delivery number, and syscall number.

The output function is not yet implemented; my notional plan is to log to
lines of JSON for now, and maybe move to a SQLite based store for later. I've
not written any analysis tools yet either.

I made a few refactorings to make things smoother:

* vats get their console object from `managerOptions.vatConsole`, rather than
  having the manager build it themselves, which makes it easier to provide
  one that writes to both the regular console and the slog
* as a result, all vats get the same basic `vatEndowments` (with the console
  added on later), rather than calling `makeVatEndowments(vatID)`
* in kernel.js, `deliverToVat` and `processNotify` had their common code
  factored out into `deliverAndLogToVat()`, which now writes to the slog
  in addition to actually performing the `manager.deliver`. The names aren't
  great and want to be changed at some point.
* `buildVatSyscallHandler` function was moved out of `addVatManager` to make
  things easier to read. The new function writes syscalls to the slog.
* console messages written by vats in top-level code, or during the execution
  of `buildRootObject`, are slogged in the "startup" phase. All other calls
  to `console.log` and friends are recorded in a "delivery" phase.

The slog is disabled during test-kernel.js and test-vpid-kernel.js, because
these tests make syscalls outside the context of a delivery, which is
normally illegal, and the slog complains bitterly when it happens.

refs #1535
warner added a commit that referenced this issue Aug 18, 2020
This adds a set of logging hooks which record swingset deliveries (the kernel
sending messages and resolution notifications into vats), syscalls (from the
vat into the kernel), and console messages. All these events can be
correlated by their crank number, vatID, delivery number, and syscall number.

The output function is not yet implemented; my notional plan is to log to
lines of JSON for now, and maybe move to a SQLite based store for later. I've
not written any analysis tools yet either.

I made a few refactorings to make things smoother:

* vats get their console object from `managerOptions.vatConsole`, rather than
  having the manager build it themselves, which makes it easier to provide
  one that writes to both the regular console and the slog
* as a result, all vats get the same basic `vatEndowments` (with the console
  added on later), rather than calling `makeVatEndowments(vatID)`
* in kernel.js, `deliverToVat` and `processNotify` had their common code
  factored out into `deliverAndLogToVat()`, which now writes to the slog
  in addition to actually performing the `manager.deliver`. The names aren't
  great and want to be changed at some point.
* `buildVatSyscallHandler` function was moved out of `addVatManager` to make
  things easier to read. The new function writes syscalls to the slog.
* console messages written by vats in top-level code, or during the execution
  of `buildRootObject`, are slogged in the "startup" phase. All other calls
  to `console.log` and friends are recorded in a "delivery" phase.

The slog is disabled during test-kernel.js and test-vpid-kernel.js, because
these tests make syscalls outside the context of a delivery, which is
normally illegal, and the slog complains bitterly when it happens.

refs #1535
warner added a commit that referenced this issue Aug 28, 2020
This overhauls the comms vat to use a new reference-tracking model. It should
fix a number of gaps where the previous code would throw an error:

* pass an already-resolved Promise through any message, this would make
  the comms vat re-use a retired VPID, which is a vat-fatal error
* resolve a remotely-sourced promise to a local object (rather than a remote
  one), then pipeline a message to it (so the message should be reflected back
  into the kernel)
* resolve a local promise to a remote object, then the remote pipelines a
  message to it (so the message should be reflected back out to the remote)
* passing one remote's references to a different remote

The last case is the "three-party handoff". The full solution (with
shortening, grant-matching, and automatic connection establishment) is quite
a ways off, but this change ought to implement "three-party forwarding". In
this form, when A sends B a reference to C, what B gets is effectively an
object on A that automatically forwards all messages off to C. This "object"
is hidden inside the comms vat and is not reified as a real object.

Three-party forwarding is not tested yet.

refs #1535
refs #1404
warner added a commit that referenced this issue Aug 28, 2020
This overhauls the comms vat to use a new reference-tracking model. It should
fix a number of gaps where the previous code would throw an error:

* pass an already-resolved Promise through any message, this would make
  the comms vat re-use a retired VPID, which is a vat-fatal error
* resolve a remotely-sourced promise to a local object (rather than a remote
  one), then pipeline a message to it (so the message should be reflected back
  into the kernel)
* resolve a local promise to a remote object, then the remote pipelines a
  message to it (so the message should be reflected back out to the remote)
* passing one remote's references to a different remote

The last case is the "three-party handoff". The full solution (with
shortening, grant-matching, and automatic connection establishment) is quite
a ways off, but this change ought to implement "three-party forwarding". In
this form, when A sends B a reference to C, what B gets is effectively an
object on A that automatically forwards all messages off to C. This "object"
is hidden inside the comms vat and is not reified as a real object.

Three-party forwarding is not tested yet.

refs #1535
refs #1404
warner added a commit that referenced this issue Aug 28, 2020
This overhauls the comms vat to use a new reference-tracking model. It should
fix a number of gaps where the previous code would throw an error:

* pass an already-resolved Promise through any message, this would make
  the comms vat re-use a retired VPID, which is a vat-fatal error
* resolve a remotely-sourced promise to a local object (rather than a remote
  one), then pipeline a message to it (so the message should be reflected back
  into the kernel)
* resolve a local promise to a remote object, then the remote pipelines a
  message to it (so the message should be reflected back out to the remote)
* passing one remote's references to a different remote

The last case is the "three-party handoff". The full solution (with
shortening, grant-matching, and automatic connection establishment) is quite
a ways off, but this change ought to implement "three-party forwarding". In
this form, when A sends B a reference to C, what B gets is effectively an
object on A that automatically forwards all messages off to C. This "object"
is hidden inside the comms vat and is not reified as a real object.

Three-party forwarding is not tested yet.

refs #1535
refs #1404
@Tartuffo
Copy link
Contributor

Tartuffo commented Feb 4, 2022

@warner Is this done or still relevant? Let's close if not needed.

@dckc dckc added the logging label Jun 16, 2022
@dckc
Copy link
Member

dckc commented Jan 19, 2024

this is pretty clearly done

@dckc dckc closed this as completed Jan 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request logging
Projects
None yet
Development

No branches or pull requests

4 participants