
Add trie log pruning triggered after trie log persist #6026

Merged
merged 31 commits into main from trie-log-pruning
Nov 17, 2023

Conversation

Contributor

@siladu siladu commented Oct 12, 2023

First piece of #5390

Feature enabled with --Xbonsai-trie-log-pruning-enabled (since renamed to --Xbonsai-limit-trie-logs-enabled).

TrieLogPruner loads a limited number of trie logs on startup to preload the pruner cache queue, based on loadingLimit with a default value of 30,000 blocks (configured with --Xbonsai-trie-log-prune-limit, since renamed to --Xbonsai-trie-logs-pruning-window-size). Anything eligible for pruning is pruned immediately after the load.

Each time a trie log is persisted it is added to the pruner cache queue and the pruner is then run against the queue, pruning trie logs associated with block numbers below the --Xbonsai-trie-log-retention-threshold (defaults to 0, i.e. disabled; now --bonsai-historical-block-limit).
The retention-threshold is approximately the size of the trie log pruning cache queue (it is measured in blocks but the multimap cache queue may contain forks, so they can differ).
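
To illustrate the flow, here is a minimal sketch of the persist-then-prune behaviour described above (class, method and field names are illustrative assumptions, not the actual Besu code):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

// Illustrative sketch of the persist-then-prune flow; names and types are assumptions.
class TrieLogPrunerSketch {
  private final SortedMap<Long, List<byte[]>> queue = new TreeMap<>(); // block number -> trie log keys
  private final long retentionThreshold; // e.g. 512 blocks

  TrieLogPrunerSketch(final long retentionThreshold) {
    this.retentionThreshold = retentionThreshold;
  }

  // Called each time a trie log is persisted.
  void addToQueueAndPrune(final long blockNumber, final byte[] trieLogKey, final long chainHeadNumber) {
    queue.computeIfAbsent(blockNumber, unused -> new ArrayList<>()).add(trieLogKey);
    final long pruneBelow = chainHeadNumber - retentionThreshold;
    // Remove and delete every queued entry whose block number falls below the retention window.
    final SortedMap<Long, List<byte[]>> eligible = queue.headMap(pruneBelow);
    eligible.values().forEach(keys -> keys.forEach(this::deleteTrieLog));
    eligible.clear();
  }

  private void deleteTrieLog(final byte[] key) {
    // delete the trie log entry from storage here
  }
}
```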

For PoS networks, we read the finalizedBlock hash from the blockchain, load the header and use that to ensure we don't prune non-finalized blocks, which might be subject to reorg. Since the minimum retention is currently 512, there would have to be a long non-finality event and reorg for this check to be needed. It is a nice safety net to have though, and it opens the possibility of pruning closer to head in the future.
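
The finalized-block safety net effectively caps how far pruning may advance; roughly like this (a sketch only, variable names are illustrative):

```java
import java.util.OptionalLong;

// Sketch: the prune upper bound is limited by both the retention window and,
// on PoS networks, the finalized block, so non-finalized blocks survive a reorg.
final class PruneBoundSketch {
  static long pruneUpperBound(
      final long chainHeadNumber, final long retentionThreshold, final OptionalLong finalizedBlockNumber) {
    final long retentionBound = chainHeadNumber - retentionThreshold;
    return finalizedBlockNumber.isPresent()
        ? Math.min(retentionBound, finalizedBlockNumber.getAsLong())
        : retentionBound;
  }
}
```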

Trie log backlog management

There are two reasons the database might contain old trie logs that don't exist in the cache queue:

  1. Pruning was enabled on an existing node that already has trie logs in the database (backlog could be quite large in this case).
  2. Besu was restarted, clearing the pruner cache queue but leaving the database with the trie logs associated with the blocks above the retention-threshold.

The backlog is gradually pruned: a single batch of trie logs is loaded each time besu starts up, which ensures pruneable trie logs aren't forgotten across restarts, while avoiding any complicated logic for progressively managing the backlog.

A loadingLimit > retention-threshold is desirable because, after a restart, we want to preload at least as many trie logs as were forgotten by the cache queue.

Once the backlog is cleared, each prune run should prune just a single trie log.

Depending on the size of the backlog, it is possible that it will never be cleared. This is particularly true for nodes that have been running before this feature was enabled. To mitigate this, we can offer one of two options:

  1. A one-off subcommand to trim the trie logs down to the retention-threshold (future PR, see the WIP [WIP] Trie log pruning  #6000)
  2. Resync besu
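
For completeness, with the renamed flags mentioned in the description, enabling pruning on a Bonsai node would be combined roughly like this (a sketch; exact flag availability depends on the Besu version):

```bash
besu --data-storage-format=BONSAI \
     --Xbonsai-limit-trie-logs-enabled=true \
     --bonsai-historical-block-limit=512 \
     --Xbonsai-trie-logs-pruning-window-size=30000
```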

Tasks:

@github-actions

  • I thought about documentation and added the doc-change-required label to this PR if updates are required.
  • I thought about the changelog and included a changelog update if required.
  • If my PR includes database changes (e.g. KeyValueSegmentIdentifier) I have thought about compatibility and performed forwards and backwards compatibility tests

@@ -203,6 +204,10 @@ public Optional<byte[]> getTrieLog(final Hash blockHash) {
  return trieLogStorage.get(blockHash.toArrayUnsafe());
}

public Stream<byte[]> streamTrieLogs() {
  return trieLogStorage.streamKeys();
}
Contributor Author

@siladu siladu Oct 12, 2023


Used to preload the pruner. Need to test the feasibility of this on real nodes with lots of trie logs, might need to limit it.

Contributor


Yes, it seems dangerous to do that without a limit.

Contributor

@garyschulte garyschulte Oct 12, 2023


agreed. If we do not close the RocksIterator, that is going to lead to problems. I would suggest passing in a Consumer or something like that so we can compartmentalize the iterator open/close. Or do like streamFlat* methods and supply a Predicate or row limit, collect to a map and return.
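
For illustration, the consumer-style shape being suggested could look something like this (a sketch; the interface and method names are assumptions, not the actual storage API):

```java
import java.util.function.Consumer;
import java.util.stream.Stream;

// Sketch: keep the RocksIterator-backed stream's open/close inside the storage layer,
// so callers can never leak it.
final class TrieLogKeyIterationSketch {
  interface TrieLogKeySource {
    Stream<byte[]> streamKeys(); // assumed to wrap a RocksIterator that must be closed
  }

  static void forEachKey(final TrieLogKeySource source, final long limit, final Consumer<byte[]> consumer) {
    try (Stream<byte[]> keys = source.streamKeys()) {
      keys.limit(limit).forEach(consumer);
    }
  }
}
```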

Contributor Author

@siladu siladu Oct 26, 2023


Now I'm simply using streamKeys().limit(..), hopefully that is safe!

8658aec (#6026)

More info about new approach in the updated description

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class TrieLogPruner {
Contributor


I would rather see the pruning implemented in a subclass of CachedWorldStorageManager. This feels like a management function that could benefit from knowing the current state of the cache. We could ensure that DEFAULT_MAX_BLOCKS_TO_PRUNE doesn't conflict with AbstractTrieLogManager.RETAINED_LAYERS, etc.

Having a separate pruner class breaks encapsulation imo.

Contributor Author

@siladu siladu Oct 19, 2023


I think the main problem with this is that at the CachedWorldStorageManager level, the cachedWorldStatesByHash cache only maintains canonical blocks, not forks... in other words, CachedWorldStorageManager.addCachedLayer (and therefore scrubCachedLayers) is only called for canonical blocks (when the head is moved during an FCU, AFAICT).

So we'd miss out on pruning forks this way.

I'm still investigating potential benefits of the pruning being aware of this cache.

Contributor Author


As discussed in person, these concepts (and therefore caches) are separated out now. There may still be a task to align these in code rather than relying on configuration.

Comment on lines 93 to 94
trieLogPruner.rememberTrieLogKeyForPruning(
    forBlockHeader.getNumber(), forBlockHeader.getBlockHash().toArrayUnsafe());
Contributor

@garyschulte garyschulte Oct 12, 2023


Since block number isn't unique for trielogs, we will get collisions during reorgs that could result in stale trielogs hanging around. Depending on how the initial/startup cleanup is implemented, this might not be a major concern.

Contributor Author


Yep, the data structure holds forks for pruning too (it's a multimap).
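
For illustration, a multimap keyed by block number keeps fork variants at the same height together, so one prune pass removes all of them (a sketch using Guava's Multimap; the actual field names differ):

```java
import com.google.common.collect.ArrayListMultimap;
import com.google.common.collect.Multimap;

// Sketch only: several trie log keys (the canonical block plus any forks/orphans)
// can be stored against the same block number.
public class MultimapQueueSketch {
  public static void main(final String[] args) {
    final Multimap<Long, byte[]> trieLogKeysByBlockNumber = ArrayListMultimap.create();
    trieLogKeysByBlockNumber.put(4704354L, new byte[] {0x01}); // canonical block's trie log key
    trieLogKeysByBlockNumber.put(4704354L, new byte[] {0x02}); // a fork at the same height
    // Pruning that height removes every variant recorded against it:
    trieLogKeysByBlockNumber.removeAll(4704354L)
        .forEach(key -> System.out.println("would delete trie log key of length " + key.length));
  }
}
```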

@siladu siladu added the TeamGroot GH issues worked on by Groot Team label Oct 12, 2023
@usmansaleem usmansaleem removed the TeamGroot GH issues worked on by Groot Team label Oct 12, 2023
Contributor Author

siladu commented Oct 16, 2023

Test results for this first iteration (with no startup loadingLimit) on some ~3 week old nodes with ~150K trie logs using
numBlocksToRetain = 512;
pruningWindowSize = 1000;

Most of the work is in streaming all the trie log entries into the knownTrieLogKeysByDescendingBlockNumber cache at startup, which currently blocks besu from starting up, e.g.

{"@timestamp":"2023-10-16T01:04:07,335","level":"INFO","thread":"main","class":"TrieLogPruner","message":"Loading trie logs from database...","throwable":""}
{"@timestamp":"2023-10-16T01:04:47,761","level":"INFO","thread":"main","class":"TrieLogPruner","message":"Loaded 134958 trie logs from database","throwable":""}
| Network | Number of trie logs | Loading Time (s) |
| --- | --- | --- |
| Sepolia | 165213 | 32 |
| Goerli | 141391 | 30 |
| Mainnet | 134956 | 40 |
| Mainnet | 515 | 24 |
| Mainnet | 516 | 6 |

The pruning windows themselves were performant since RocksDB only marks entries for deletion. The range of time taken:
pruningWindowSize = 1000: 10 - 27 ms
pruningWindowSize = 1: 2 - 4ms (normal operation once backlog is cleared)

TrieLogPruner loads all trie logs on startup.
Each time a trie log is persisted, the pruner is run and uses a pruning window (currently hardcoded to 1000) to chip away at an initially large backlog of trie logs
Once backlog is cleared, each prune run should just be a single trie log.

Signed-off-by: Simon Dudley <simon.dudley@consensys.net>
Defaults to 0 = unlimited threshold === pruning disabled
Wire in the TrieLogPruner based on this.

Initialize the pruner on startup to preload the cache using the loadingLimit (currently hardcoded as 1000)

Signed-off-by: Simon Dudley <simon.dudley@consensys.net>
Contributor Author

siladu commented Oct 31, 2023

Test mainnet to determine reasonable pruningLimit, which will impact besu startup time.

Testing various number of trie log limits for preloading pruning cache.
Tested on mainnet nodes ~5 weeks old with ~240,000 trie logs in the database (14 GB).

| Instance | Loading Limit | Loading Time avg (s) |
| --- | --- | --- |
| dev-elc-besu-nimbus-mainnet-simon-memory-23.7.2-1 | 500 | 0.646 |
| dev-elc-besu-nimbus-mainnet-simon-memory-23.7.2-1 | 5000 | 1.4 |
| dev-elc-besu-nimbus-mainnet-simon-memory-23.7.2-1 | 50000 | 8 |
| dev-elc-besu-nimbus-mainnet-simon-memory-23.7.2-1 | 25000 | 3.6 |
| dev-elc-besu-nimbus-mainnet-simon-memory-23.7.2-2 | 30000 | 4.4 |
| dev-elc-besu-nimbus-mainnet-simon-memory-23.7.3-RC-1 | 30000 | 4.6 |

UPDATE: Tested on prd-elc-besu-teku-mainnet-bonsai-snap with 1,400,000 trie logs (60 GB) and got a similar load time for 30000 trie logs on mainnet: 4.8 seconds. However, this prd-elc-besu-teku-mainnet-bonsai-snap test is invalid due to some previous block production simulations that were run on it, which resulted in lots of orphaned trie logs.

Tested on the 3 month old prd-elc-besu-lodestar-mainnet-bonsai-snap with 631,705 trie logs (37 GB) and got an average of 7s. Here was the breakdown of canonical/forks/orphaned trie logs...

trieLog count: 631705
 - canonical count: 629934
 - fork count: 1771
 - orphaned count: 0

(Throwaway code for this test #6108)

Signed-off-by: Simon Dudley <simon.dudley@consensys.net>
Rename method
Signed-off-by: Simon Dudley <simon.dudley@consensys.net>
Contributor Author

siladu commented Nov 1, 2023

Test engine API perf impact

IMPORTANT: this was not a definitive test because there are too many differences between the nodes. I just used what was already available and had pruning enabled for a while. A more definitive test is currently underway.

This is using the first iteration of this code that has been running for a while...

[image: besu-trielog-pruning-perf]

Signed-off-by: Simon Dudley <simon.dudley@consensys.net>
@siladu siladu requested review from garyschulte and matkt November 2, 2023 09:30
@siladu siladu added the TeamGroot GH issues worked on by Groot Team label Nov 2, 2023
Contributor Author

siladu commented Nov 2, 2023

Ready for code review, but leaving in draft until testing is complete.

(...instead of --Xbonsai-trie-log-retention-threshold=0)
Use 512 as default and also minimum value for --Xbonsai-trie-log-retention-threshold
Validate that --Xbonsai-trie-log-prune-limit is positive
Signed-off-by: Simon Dudley <simon.dudley@consensys.net>
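
The validation described in the commit above would look something like this in a picocli-based CLI (a sketch; the surrounding class and field wiring are assumptions):

```java
import picocli.CommandLine;

// Sketch of the option validation: enforce the 512 minimum retention and a positive prune limit.
class TrieLogPruningOptionValidation {
  static final long MINIMUM_TRIE_LOG_RETENTION_THRESHOLD = 512;

  static void validate(final CommandLine commandLine, final long retentionThreshold, final Integer pruningLimit) {
    if (retentionThreshold < MINIMUM_TRIE_LOG_RETENTION_THRESHOLD) {
      throw new CommandLine.ParameterException(
          commandLine,
          "--Xbonsai-trie-log-retention-threshold minimum value is "
              + MINIMUM_TRIE_LOG_RETENTION_THRESHOLD);
    }
    if (pruningLimit != null && pruningLimit <= 0) {
      throw new CommandLine.ParameterException(
          commandLine, "--Xbonsai-trie-log-prune-limit=" + pruningLimit + " must be greater than 0");
    }
  }
}
```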
…imit

Signed-off-by: Simon Dudley <simon.dudley@consensys.net>
Renames and logging
Refactor test
Signed-off-by: Simon Dudley <simon.dudley@consensys.net>
Separate list is for logging reasons
Signed-off-by: Simon Dudley <simon.dudley@consensys.net>
Contributor Author

siladu commented Nov 7, 2023

Second and third iterations tested on the following deployments:

| Instance | Description | Commit Ref |
| --- | --- | --- |
| dev-elc-besu-teku-mainnet-simon-6026-control-* | Last merged-in change from main | 236779d |
| dev-elc-besu-teku-mainnet-simon-6026-prune-* | Without finalized block check: no extra db calls due to pruner cache (except on startup) | bd53bdf |
| dev-elc-besu-teku-mainnet-simon-6026-prune-finalized-* | Same as -prune but with extra finalized block check: two extra db calls | 6d22bf7 |

Last 12 hours:

New Payload:

[image: 6026-newPayload-performance]

FCU:

[image: 6026-newPayload-performance]

Java Memory Used:

[image: 6026-used-heap-nonheap]

Java Memory Committed:

[image: 6026-committed-heap-nonheap]

The mean response times and memory usage appear to be randomly spread across the different deployment versions so I don't think there's a significant difference between -control, -prune and -prune-finalized.

@siladu siladu marked this pull request as ready for review November 7, 2023 08:45
Contributor

@matkt matkt left a comment


LGTM as it's an experimental feature; we will have to do other PRs for the flag renaming and db call optimization.

Comment on lines 313 to 317
lines.add("Trie log pruning enabled:");
lines.add(" - retention threshold: " + trieLogRetentionThreshold + " blocks");
if (trieLogPruningLimit != null) {
lines.add(" - prune limit: " + trieLogPruningLimit + " blocks");
}
Contributor


nit: seems a little verbose to be using three lines in the config logging. Can it be shortened to one line?

Contributor Author


Done this here: 6b27522
Looks like this:

# Trie log pruning enabled: retention: 512; prune limit: 30000                                     #

Signed-off-by: Simon Dudley <simon.dudley@consensys.net>
final
Signed-off-by: Simon Dudley <simon.dudley@consensys.net>
Signed-off-by: Simon Dudley <simon.dudley@consensys.net>
Contributor Author

siladu commented Nov 15, 2023

I set up a reorg test based on this hive test, with some modifications to send finalized blocks trailing the head by 10 blocks. Note it uses a mock CL.

In the abbreviated logs below, we:

  • build a chain and start finalizing it
  • simultaneously build a sidechain starting from 6
  • reorg to a sidechain at Step 16
  • reorg back to the original chain at Step 26.

The test was setup with an artificially low --Xbonsai-trie-log-retention-threshold=1 to ensure that we were relying on the finalizedBlock check to prevent premature pruning.

Step | FCU | Block | FinalizedBlock
## 0. FCU 0
## 1. FCU 1
## 2. FCU 2
## 3. FCU 3
## 4. FCU 4
## 5. FCU 5 (common ancestor)
## 6. FCU 6
## 7. FCU 7
## 8. FCU 8
## 9. FCU 9
## 10. FCU 10
## 11. FCU 11 FINALIZED 1
## 12. FCU 12 FINALIZED 2
## 13. FCU 13 FINALIZED 3
## 14. FCU 14 FINALIZED 4
## 15. FCU 15 FINALIZED 5
## 16. FCU 6' FINALIZED 5
## 17. FCU 7' FINALIZED 5
## 18. FCU 8' FINALIZED 5
## 19. FCU 9' FINALIZED 5
## 20. FCU 10' FINALIZED 5
## 21. FCU 11' FINALIZED 5
## 22. FCU 12' FINALIZED 5
## 23. FCU 13' FINALIZED 5
## 24. FCU 14' FINALIZED 5
## 25. [INVALID] FCU 15' [INVALID] FINALIZED 5
## Resend the latest correct fcU
## 26. FCU 15 FINALIZED 5
## 27. FCU 16 FINALIZED 6
## 28. FCU 17 FINALIZED 7

(I intend to turn this into a besu AT in a future PR)

Contributor

@matkt matkt left a comment


LGTM, seems to be fine with the finalized block modification.

@BeforeEach
public void setup() {
  Configurator.setLevel(LogManager.getLogger(TrieLogPruner.class).getName(), Level.TRACE);
Contributor


Why is this one needed for the test?

Contributor Author

@siladu siladu Nov 15, 2023


It's not needed - purely a convenience for comprehending the tests when you run them. I thought it was useful enough to leave in, but happy to remove.

.addArgument(loadingLimit)
.log();
try {
final Stream<byte[]> trieLogKeys = rootWorldStateStorage.streamTrieLogKeys(loadingLimit);
Contributor


I think this should be done in a try-with-resources so that we close the stream.
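
For reference, the try-with-resources shape being suggested, wrapped around the call from the diff above (the enclosing method and storage type here are assumed for the sketch):

```java
// Sketch: the key stream is backed by a RocksDB iterator, so close it deterministically.
void preloadQueue(final BonsaiWorldStateKeyValueStorage rootWorldStateStorage, final long loadingLimit) {
  try (final Stream<byte[]> trieLogKeys = rootWorldStateStorage.streamTrieLogKeys(loadingLimit)) {
    trieLogKeys.forEach(key -> {
      // decode each trie log, read its block number, and add it to the pruner queue
    });
  }
}
```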

Contributor Author


done 13f0040

Signed-off-by: Simon Dudley <simon.dudley@consensys.net>
Signed-off-by: Simon Dudley <simon.dudley@consensys.net>
Signed-off-by: Simon Dudley <simon.dudley@consensys.net>
Contributor Author

siladu commented Nov 16, 2023

Test of the pruner cache/queue size: with the default retention value of 512, the size of the cache/queue is ~140KB.
[image: queue-size-512-trielog-keys]

140KB / 512 = ~275 bytes per trielog

In a non-finality event, this queue is unbounded; however, since the size is small I don't think this will be a problem. More likely other parts of besu would break first.

For context, the recent non-finality event lasted a max of 9 epochs. At a max of 32 blocks per epoch, that is 9 * 32 = 288 blocks, which is less than the default 512-block (~140KB) bound.

To reach Megabyte magnitude, the non-finality event would have to be 3,636 blocks long (1 MB / 275 bytes) = 114 epochs = 12 hours of non-finality
To reach Gigabyte magnitude, the non-finality event would have to be 3,636,363 blocks long (1 GB / 275 bytes) = 113,636 epochs = 1.4 years of non-finality

Contributor

@jframe jframe left a comment


LGTM

Contributor

ahamlat commented Nov 16, 2023

Checking the performance and the implementation, I think this should be done asynchronously so as not to add overhead to the block processing time. I also think we could handle this on either the onBlockAdded event or the onTrieLogAdded event, and process the event asynchronously.
Nonetheless, I didn't observe any performance degradation. Given that this is hidden behind an experimental flag, I approve the PR from a performance overhead perspective. The async optimization can be done in another PR.
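
For illustration, the kind of asynchronous hand-off being described might look like this (a sketch only; the event hook and executor wiring are assumptions, not the current implementation):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch: move the prune pass off the block-processing / block-building threads.
final class AsyncTrieLogPruneSketch {
  private final ExecutorService pruningExecutor =
      Executors.newSingleThreadExecutor(runnable -> new Thread(runnable, "trie-log-pruner"));

  // Would be wired to an onTrieLogAdded / onBlockAdded style event instead of running inline.
  void onTrieLogAdded(final long blockNumber, final byte[] trieLogKey) {
    pruningExecutor.submit(() -> addToQueueAndPrune(blockNumber, trieLogKey));
  }

  private void addToQueueAndPrune(final long blockNumber, final byte[] trieLogKey) {
    // same queue + retention-threshold logic as the synchronous version
  }
}
```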

Contributor Author

siladu commented Nov 17, 2023

Test on existing validator canary once code is finalized (which should have orphaned trielogs)

TL;DR it works 🎉

Tested on prd-elc-besu-lighthouse-sepolia-bonsai-snap

Trie log count before enabling pruning:

trieLog count: 742311
 - canonical count: 681039
 - fork count: 217
 - orphaned count: 61055

As expected, there are orphaned trielogs due to block creation. The number is high because this sepolia node runs 20 / 1972 validators = 1% of the network, so it does an unusual amount of block creation compared to mainnet nodes.

Log snippets

# Besu version 23.10.3-dev-81115bd9                                                                #
...
# Trie log pruning enabled: retention: 512; prune limit: 30000                                     #

Example of the initial load picking up a batch of trielogs. Note, we don't control the order of the trielog loading due to the way they are stored - it appears to be quite random. The upshot is that not all of them are eligible for pruning immediately because they may fall inside the retention window...

2023-11-16T09:19:55,439 INFO TrieLogPruner "Loading first 30000 trie logs from database..."
2023-11-16T09:20:04,694 INFO TrieLogPruner "Loaded 27528 trie logs from database"
2023-11-16T09:20:04,928 DEBUG TrieLogPruner "pruned 27470 trie logs from 27469 blocks"

After that (and every restart), there's a period of no pruning while the queue builds up its 512-block retention window. This period lasts ~2 hours. This is only the case for pre-existing nodes with a larger backlog of trielogs; if the total number of trielogs is <= 30,000, then we will prune most of them on load and have a full retention window to begin with. Note pruning triggers on both the block processing thread and the block building thread...

{"@timestamp":"2023-11-16T09:20:21,302","level":"TRACE","thread":"vert.x-worker-thread-0","class":"TrieLogPruner","message":"pruned 0 trie logs for blocks {}","throwable":""}
...
{"@timestamp":"2023-11-16T09:32:56,171","level":"TRACE","thread":"EthScheduler-BlockCreation-0","class":"TrieLogPruner","message":"pruned 0 trie logs for blocks {}","throwable":""}
...
{"@timestamp":"2023-11-16T11:12:12,820","level":"TRACE","thread":"vert.x-worker-thread-0","class":"TrieLogPruner","message":"pruned 0 trie logs for blocks {}","throwable":""}

During this time we prune the odd trielog, which I believe occurs when a queue entry that was preloaded during startup becomes eligible.

Once the queue size reaches the 512-block retention window, we start pruning ~1 block every block (one in, one out).
At this point you will notice the trielogs are pruned in order, since it's a sliding window by block number...

{"@timestamp":"2023-11-16T11:12:24,376","level":"TRACE","thread":"vert.x-worker-thread-0","class":"TrieLogPruner","message":"pruned 1 trie logs for blocks {4704297=[0x1e335684e0c7d41a1a3bc13b321e2f27b73672c6a37515e1c735bc8ac213c393]}","throwable":""}
{"@timestamp":"2023-11-16T11:12:36,385","level":"TRACE","thread":"vert.x-worker-thread-0","class":"TrieLogPruner","message":"pruned 1 trie logs for blocks {4704298=[0x2875008c2ee12ab366d7969e6682a9928260c359e93333e0513ca87015df3a56]}","throwable":""}
{"@timestamp":"2023-11-16T11:13:00,985","level":"TRACE","thread":"vert.x-worker-thread-0","class":"TrieLogPruner","message":"pruned 1 trie logs for blocks {4704299=[0x913742463b968f41cc4e101844493c34e87cd444def91ca386ed6c09b218cb16]}","throwable":""}

When we propose a block, we actually save multiple trie logs, but only the latest one makes it into the chain (hence the orphaned trielogs). During pruning (512 blocks later), these will show up as a list of trie logs stored against the same block number (similar to forks).

{"@timestamp":"2023-11-16T09:32:56,018","level":"INFO","thread":"vert.x-worker-thread-0","class":"MergeCoordinator","message":"Start building proposals for block 4704354 identified by 0x00688f3cd98f37a0","throwable":""}
...
// FCU thread creating the initial empty block...
{"@timestamp":"2023-11-16T09:32:56,017","level":"TRACE","thread":"vert.x-worker-thread-0","class":"TrieLogPruner","message":"adding trie log to queue for later pruning blockNumber 4704354; blockHash 0xd2b16e03e6d312a9989da949ac06a7cf6319e575f63ef1fbedba2b0ce9ce275b","throwable":""}
// Block creation loop...
{"@timestamp":"2023-11-16T09:32:56,171","level":"TRACE","thread":"EthScheduler-BlockCreation-0","class":"TrieLogPruner","message":"adding trie log to queue for later pruning blockNumber 4704354; blockHash 0x6724630e7d47e454de20b95a913a4997a3ce8bf138c04e413045bab896435aa9","throwable":""}
{"@timestamp":"2023-11-16T09:32:56,677","level":"TRACE","thread":"EthScheduler-BlockCreation-0","class":"TrieLogPruner","message":"adding trie log to queue for later pruning blockNumber 4704354; blockHash 0x656bae99dd673404531b9fe95b09bf0a281df9c803d0c11b5b054838046ecd3f","throwable":""}
{"@timestamp":"2023-11-16T09:32:57,144","level":"TRACE","thread":"EthScheduler-BlockCreation-0","class":"TrieLogPruner","message":"adding trie log to queue for later pruning blockNumber 4704354; blockHash 0x4379fe26a8e77cfaa4668065fceca9cefe2fbb64ee0213eedc3993196177743d","throwable":""}
{"@timestamp":"2023-11-16T09:32:57,728","level":"TRACE","thread":"EthScheduler-BlockCreation-0","class":"TrieLogPruner","message":"adding trie log to queue for later pruning blockNumber 4704354; blockHash 0xc87cdd96177387b20327573e682a23147215c99597599a9280848e9c54152622","throwable":""}
{"@timestamp":"2023-11-16T09:32:58,262","level":"TRACE","thread":"EthScheduler-BlockCreation-0","class":"TrieLogPruner","message":"adding trie log to queue for later pruning blockNumber 4704354; blockHash 0x3e3a13b87a3ef3b342581fab6d3794a563aae79f5c11feb73b2415355da341d8","throwable":""}
{"@timestamp":"2023-11-16T09:32:58,744","level":"TRACE","thread":"EthScheduler-BlockCreation-0","class":"TrieLogPruner","message":"adding trie log to queue for later pruning blockNumber 4704354; blockHash 0x39b3f1ca18ca4a6c5a15dd3b8076e03724393412953c4d378cbb4caf61b213c8","throwable":""}
{"@timestamp":"2023-11-16T09:32:59,251","level":"TRACE","thread":"EthScheduler-BlockCreation-0","class":"TrieLogPruner","message":"adding trie log to queue for later pruning blockNumber 4704354; blockHash 0x9cf0aea39d630ffcf79458a89348ccb584991051ba80e4672df83fca40fa8bcc","throwable":""}
{"@timestamp":"2023-11-16T09:32:59,756","level":"TRACE","thread":"EthScheduler-BlockCreation-0","class":"TrieLogPruner","message":"adding trie log to queue for later pruning blockNumber 4704354; blockHash 0x0ec8491df4952c450503933d43e8c8163881a335f00816ed37b8bd0f42bb3085","throwable":""}
...
{"@timestamp":"2023-11-16T11:24:24,577","level":"TRACE","thread":"vert.x-worker-thread-0","class":"TrieLogPruner","message":"pruned 9 trie logs for blocks {4704354=[0x0ec8491df4952c450503933d43e8c8163881a335f00816ed37b8bd0f42bb3085, 0x39b3f1ca18ca4a6c5a15dd3b8076e03724393412953c4d378cbb4caf61b213c8, 0x3e3a13b87a3ef3b342581fab6d3794a563aae79f5c11feb73b2415355da341d8, 0x4379fe26a8e77cfaa4668065fceca9cefe2fbb64ee0213eedc3993196177743d, 0x656bae99dd673404531b9fe95b09bf0a281df9c803d0c11b5b054838046ecd3f, 0x6724630e7d47e454de20b95a913a4997a3ce8bf138c04e413045bab896435aa9, 0x9cf0aea39d630ffcf79458a89348ccb584991051ba80e4672df83fca40fa8bcc, 0xc87cdd96177387b20327573e682a23147215c99597599a9280848e9c54152622, 0xd2b16e03e6d312a9989da949ac06a7cf6319e575f63ef1fbedba2b0ce9ce275b]}","throwable":""}

@siladu siladu merged commit 8c35ce1 into hyperledger:main Nov 17, 2023
jflo pushed a commit to jflo/besu that referenced this pull request Nov 20, 2023
- Toggled with --Xbonsai-trie-log-pruning-enabled

- Introduces TrieLogPruner which loads a limited number of trie logs on startup to preload the pruner queue, based on loadingLimit with a default value of 30,000 blocks (configured with --Xbonsai-trie-log-pruning-limit).

- Each time a trie log is persisted it is added to the pruner queue and then the pruner is run against the queue, which will prune trie logs associated with block numbers below the --Xbonsai-trie-log-retention-threshold (default 512).

- Once the retention threshold is reached, each prune run should just be a single trie log.

- Prune any orphaned trielogs that were created during block creation.

- Don't prune non-finalized blocks for PoS chains.

---------

Signed-off-by: Simon Dudley <simon.dudley@consensys.net>
Signed-off-by: Justin Florentine <justin+github@florentine.us>
jflo pushed a commit to jflo/besu that referenced this pull request Dec 4, 2023
@siladu siladu deleted the trie-log-pruning branch January 15, 2024 06:50