Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

core: Defer script extraction until we know it's needed #6984

Merged
merged 2 commits into from
Feb 21, 2024

Conversation

cdecker
Copy link
Member

@cdecker cdecker commented Jan 9, 2024

We were always allocating space and copying the scripts from the
outputs, and then immediately discarding them again if they weren't
P2WSH outputs. This patch adds an introspection method, and then uses
it to determine whether we're interested in the output before cloning
the script. This should save us almost 1 allocation and 1 deallocation
for each output in a block (thousands).

Builds on top of #6983

Copy link
Contributor

@vincenzopalazzo vincenzopalazzo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK efca299

Review the last two commits! and reviewed the #6983 as well

I do not know if you are finishing to work on this or it is just a draft because the CI is failing.

Regarding the CI looks like there is just a server flakyness

Secret source: Actions
Prepare workflow directory
Prepare all required actions
Getting action download info
Error: Internal server error occurred while resolving "actions/checkout@v3". Internal server error occurred while resolving "actions/download-artifact@v3". Internal server error occurred while resolving "actions/setup-python@v4"

bitcoin/tx.c Outdated Show resolved Hide resolved
lightningd/chaintopology.c Outdated Show resolved Hide resolved
@jgriffiths
Copy link
Contributor

Also worth noting that all Elements/Liquid txs will have a fee output which passes the policy asset check and fails the script check, which you can skip by ignoring outputs with a zero script_len before the asset check.

@jgriffiths
Copy link
Contributor

One final note, I believe that this code in get_new_block:

        /* Annotate all transactions with the chainparams */
        for (size_t i = 0; i < tal_count(blk->tx); i++)
                blk->tx[i]->chainparams = chainparams;

Is redundant because its already set in bitcoin_block_from_hex:

       for (i = 0; i < num; i++) {
                ...
                b->tx[i]->chainparams = chainparams;
                ...
        }

@cdecker
Copy link
Member Author

cdecker commented Jan 10, 2024

Thanks @jgriffiths that is very helpful. I spent the night running benchmarks, and for some reason the performance of both of our patches ends up worse than just yours, despite allocating less (???). It could very well be that I mistakenly picked ripemd160 and therefore end up doing the majority of the allocations again, but I'm still looking 👍

@jgriffiths
Copy link
Contributor

@cdecker The incorrect push size check means that the utxos of interest won't be being tracked so its an apples to oranges comparison.

Thinking more about this code though, and given that fee outputs likely make up somewhere between 30-50% of Elements outputs, an overall faster and simpler approach is probably just to early-out based on the script length, and avoid adding any new introspection function. We can also skip pegin outputs for free when checking for coinbase txs, and remove the tal_count overhead as described elsewhere.

I've added the above changes to #6983 and rebased it, so you can directly benchmark that PR against master.

@cdecker
Copy link
Member Author

cdecker commented Jan 16, 2024

I finally got around to benchmarking the changes in #6983 and this PR. And the results are quite nice, thanks to @jgriffiths. The benchmarks were run 5 times each, on the same range of blocks from 824000 to 825000, cached by a reverse proxy to minimize time we spend asking bitcoind for block data, and we pick the median run as representative:

Name Blocks Time [s] Blocks / s Speedup [%]
Baseline 1000 193 5.164 0%
#6983 1000 151 6.611 28.0%
#6983 & #6984 1000 127 7.818 51.3%

This is still with the older version of #6983, that did not cache the tal_count() calls and skip empty scripts, so we might even get out a bit more perf, but I think these results are sufficient to show their usefulness :-)

@cdecker cdecker force-pushed the 202402-chaintopo-speedup branch 2 times, most recently from 2cba7de to 0d139d9 Compare January 16, 2024 16:00
@cdecker cdecker marked this pull request as ready for review January 16, 2024 16:07
@cdecker cdecker removed this from the v24.02 milestone Jan 16, 2024
@cdecker cdecker marked this pull request as draft January 16, 2024 17:01
@cdecker
Copy link
Member Author

cdecker commented Jan 16, 2024

Converted back to Draft, since we appear to accidentally break some tests.

@cdecker cdecker force-pushed the 202402-chaintopo-speedup branch 2 times, most recently from 8d58500 to 8f33846 Compare February 13, 2024 13:10
bitcoin/tx.c Outdated

return output->script_len == BITCOIN_SCRIPTPUBKEY_P2WSH_LEN &&
output->script[0] == OP_0 &&
output->script[1] != OP_PUSHBYTES(sizeof(struct sha256));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be == .

I don't think adding this function is beneficial though TBH - there is no encapsulation benefit to adding this call and no need to extract the script as a tal array when you already have it and its length available and already reference the fields from the tx output that holds it. I would replace this entire PR with the following, which removes all allocations and requires no ancillary functions/redundant asserts etc:

diff --git a/lightningd/chaintopology.c b/lightningd/chaintopology.c
index 4b127999b..f7beb8288 100644
--- a/lightningd/chaintopology.c
+++ b/lightningd/chaintopology.c
@@ -958,20 +958,20 @@ static void topo_add_utxos(struct chain_topology *topo, struct block *b)
 		for (size_t n = 0; n < tx->wtx->num_outputs; n++) {
 			if (tx->wtx->outputs[n].features & skip_features)
 				continue;
-			if (tx->wtx->outputs[n].script_len != BITCOIN_SCRIPTPUBKEY_P2WSH_LEN)
-				continue; /* Cannot possibly be a p2wsh utxo */
+			if (tx->wtx->outputs[n].script_len != BITCOIN_SCRIPTPUBKEY_P2WSH_LEN ||
+                            tx->wtx->outputs[n].script[0] != OP_O ||
+                            tx->wtx->outputs[n].script[1] != OP_PUSHBYTES(sizeof(struct sha256)))
+				continue; /* Not a p2wsh utxo */
 
 			struct amount_asset amt = bitcoin_tx_output_get_amount(tx, n);
 			if (!amount_asset_is_main(&amt))
 				continue; /* Ignore non-policy asset outputs */
 
-			const u8 *script = bitcoin_tx_output_get_script(tmpctx, tx, n);
-			if (!is_p2wsh(script, NULL))
-				continue; /* We only care about p2wsh utxos */
-
 			struct bitcoin_outpoint outpoint = { b->txids[i], n };
 			wallet_utxoset_add(topo->ld->wallet, &outpoint,
-					   b->height, i, script,
+					   b->height, i,
+					   tx->wtx->outputs[n].script,
+					   tx->wtx->outputs[n].script_len,
 					   amount_asset_to_sat(&amt));
 		}
 	}
diff --git a/wallet/wallet.c b/wallet/wallet.c
index da9d789c8..c29b88dd6 100644
--- a/wallet/wallet.c
+++ b/wallet/wallet.c
@@ -4343,8 +4343,8 @@ bool wallet_outpoint_spend(struct wallet *w, const tal_t *ctx, const u32 blockhe
 
 void wallet_utxoset_add(struct wallet *w,
 			const struct bitcoin_outpoint *outpoint,
-			const u32 blockheight,
-			const u32 txindex, const u8 *scriptpubkey,
+			const u32 blockheight, const u32 txindex,
+                        const u8 *scriptpubkey, const size_t scriptpubkey_len,
 			struct amount_sat sat)
 {
 	struct db_stmt *stmt;
@@ -4363,7 +4363,7 @@ void wallet_utxoset_add(struct wallet *w,
 	db_bind_int(stmt, blockheight);
 	db_bind_null(stmt);
 	db_bind_int(stmt, txindex);
-	db_bind_talarr(stmt, scriptpubkey);
+	db_bind_blob(stmt, scriptpubkey, scriptpubkey_len);
 	db_bind_amount_sat(stmt, &sat);
 	db_exec_prepared_v2(take(stmt));
 
diff --git a/wallet/wallet.h b/wallet/wallet.h
index 904cd2301..6dad806a9 100644
--- a/wallet/wallet.h
+++ b/wallet/wallet.h
@@ -1140,8 +1140,8 @@ struct outpoint *wallet_outpoint_for_scid(struct wallet *w, tal_t *ctx,
 
 void wallet_utxoset_add(struct wallet *w,
 			const struct bitcoin_outpoint *outpoint,
-			const u32 blockheight,
-			const u32 txindex, const u8 *scriptpubkey,
+			const u32 blockheight, const u32 txindex,
+                        const u8 *scriptpubkey, const size_t scriptpubkey_len,
 			struct amount_sat sat);
 
 /**

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that process_getfilteredblock_step1 has the same non-optimal script copying behaviour. In fact, every call to bitcoin_tx_output_get_script in the code is redundant. Take a look at calc_tx_fee, for example:

                oscript = bitcoin_tx_output_get_script(NULL, tx, i);
                scriptlen = tal_bytelen(oscript);
                tal_free(oscript);

                if (chainparams->is_elements && scriptlen == 0)
                        continue;

This should just be:

                if (chainparams->is_elements && !tx->wtx->outputs[i].script_len)
                        continue; /* Fee output, ignore */

(Ignoring that the code here ignores the fee, and should probably reconcile against it).

Processing blocks is rather slow at the moment, but one thing we can
do, is to prevent copying all output scripts, when really all we are
interested in are the couple of outputs that are P2WSH.

This builds the foundation of that by adding a method to introspect
the script without having to clone it first, saving us some
allocations, and deallocations.

Changelog-Changed: core: Processing blocks should now be faster
We were extracting the output script for all outputs, and discarding
them immediately again if they were not P2WSH outputs which are the
ones of interest to us. This patch move the extraction until after we
have determined it is useful, and so we should save a couple thousand
`tal()` and `tal_free()` calls.

Changelog-Changed: lightningd: Speed up blocksync by not parsing unused parts of the transactions
@cdecker cdecker force-pushed the 202402-chaintopo-speedup branch from 8f33846 to 0a18732 Compare February 20, 2024 17:12
@vincenzopalazzo vincenzopalazzo self-requested a review February 21, 2024 09:05
@cdecker cdecker marked this pull request as ready for review February 21, 2024 10:59
@cdecker cdecker merged commit d185b0f into master Feb 21, 2024
37 checks passed
@cdecker cdecker deleted the 202402-chaintopo-speedup branch February 21, 2024 10:59
@jgriffiths jgriffiths mentioned this pull request Feb 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants