Skip to content
This repository has been archived by the owner on Jan 22, 2025. It is now read-only.

ledger-tool: stream output of accounts subcommand #28461

Merged
merged 1 commit into from
Nov 17, 2022

Conversation

t-nelson
Copy link
Contributor

Problem

ledger-tool accounts buffers all accounts before printing them to console. with the number of accounts on some of the public cluster, this consumes an unreasonable amount of memory, requiring absurd hardware to do what should be a fairly trivial task

Summary of Changes

stream the output while we scan instead of buffering

@t-nelson
Copy link
Contributor Author

t-nelson commented Oct 20, 2022

Note to self: if backporting, need #28501, #28502 and #28533

@t-nelson t-nelson force-pushed the ltsa branch 4 times, most recently from dbe34e2 to f229169 Compare October 21, 2022 23:54
@t-nelson t-nelson added CI Pull Request is ready to enter CI and removed noCI Suppress CI on this Pull Request labels Oct 21, 2022
@solana-grimes solana-grimes removed the CI Pull Request is ready to enter CI label Oct 21, 2022
@t-nelson
Copy link
Contributor Author

t-nelson commented Oct 21, 2022

seems to work. former is master, latter this PR

test:

./cargo run --release --bin solana-ledger-tool -- --ledger ${LEDGER_DIR} accounts --halt-at-slot ${SNAPSHOT_SLOT} --encoding base64 --output json > /dev/null

where SNAPSHOT_SLOT is the slot height at which the most recent snapshot in LEDGER_DIR was taken. snapshot was from mb, slot 155512607.

Screenshot from 2022-10-21 17-37-51

@t-nelson t-nelson marked this pull request as ready for review October 22, 2022 03:04
@t-nelson
Copy link
Contributor Author

@jeffwashington would you expect such memory growth (~20GB, 16-36GB) during a full accounts scan under normal operation? i reran the above tests with an empty scan_func and observed the same growth as the second run (with this PR)

Screenshot from 2022-10-24 18-57-55

gonna try again with two scan passes and see how that behaves

@t-nelson
Copy link
Contributor Author

gonna try again with two scan passes and see how that behaves

Screenshot from 2022-10-25 00-51-36

flat on the second run... must be some cache priming 🤷‍♂️

@jeffwashington
Copy link
Contributor

try this metric:
SELECT mean("count_in_mem") AS "mean_count_in_mem" FROM ":db:"."autogen"."accounts_index"
it is possible/likely that the entire accounts index is being loaded into memory during this scan.

Copy link
Contributor

@CriesofCarrots CriesofCarrots left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haven't take it for a spin yet, but lgtm!
Not that this wold ever come up, but I assume that json encoding with zero loadable accounts would print some version of []?

@t-nelson
Copy link
Contributor Author

t-nelson commented Nov 16, 2022

Not that this wold ever come up, but I assume that json encoding with zero loadable accounts would print some version of []?

yep! creating the serializer_seq then .end()ing it does the wrapping

Copy link
Contributor

@steviez steviez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@t-nelson t-nelson merged commit 53a579b into solana-labs:master Nov 17, 2022
@t-nelson t-nelson deleted the ltsa branch November 17, 2022 19:45
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants