txview: run status and age checks on incoming transactions #4506

Open
wants to merge 9 commits into master from view_receiving_age_checks

Conversation

apfitzge

Problem

  • The transaction-view parsing option in banking stage does not run status/age checks before inserting transactions into the buffer
  • This means that aged-out, high-priority transactions may push out lower-priority but still-valid transactions

Summary of Changes

  • Insert all valid transactions into the map on receive (map only, not the priority queue)
  • Check ages in batches (batch size equals the extra capacity reserved in the map)
  • Insert only valid transactions' ids into the priority queue, dropping the lowest-priority entry if at capacity (a rough sketch of the flow follows below)
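
The intended flow can be modeled roughly as in the sketch below. This is a minimal, self-contained illustration with hypothetical names (`Buffer`, `Tx`, `insert_map_only`, `check_and_queue`); the real container uses a slab and a min-max heap rather than a `HashMap` and `BinaryHeap`, so treat this only as a model of the map-only insert, batched checks, and capacity-bounded queue insertion.

```rust
use std::cmp::Reverse;
use std::collections::{BinaryHeap, HashMap};

const EXTRA_CAPACITY: usize = 64;

struct Tx {
    priority: u64,
    passes_checks: bool, // stand-in for status/age checks passing
}

struct Buffer {
    capacity: usize,                          // desired priority-queue capacity
    map: HashMap<usize, Tx>,                  // "slab": id -> transaction state
    queue: BinaryHeap<Reverse<(u64, usize)>>, // min-heap of (priority, id)
}

impl Buffer {
    fn new(capacity: usize) -> Self {
        Self {
            capacity,
            map: HashMap::with_capacity(capacity + EXTRA_CAPACITY),
            queue: BinaryHeap::with_capacity(capacity + EXTRA_CAPACITY),
        }
    }

    /// Step 1: on receive, insert into the map only; queue insertion is deferred.
    fn insert_map_only(&mut self, id: usize, tx: Tx) {
        self.map.insert(id, tx);
    }

    /// Steps 2-3: run checks on a pending batch of ids, push survivors into the
    /// queue, then shed the lowest-priority entries until back at capacity.
    fn check_and_queue(&mut self, pending: &[usize]) {
        for &id in pending {
            // Copy out what we need so the map borrow ends before any mutation.
            let checked = self.map.get(&id).map(|tx| (tx.passes_checks, tx.priority));
            match checked {
                Some((true, priority)) => self.queue.push(Reverse((priority, id))),
                _ => {
                    // Failed checks (or missing): drop from the map entirely.
                    self.map.remove(&id);
                }
            }
        }
        // Evict lowest-priority entries until the queue is back at capacity.
        while self.queue.len() > self.capacity {
            if let Some(Reverse((_, dropped_id))) = self.queue.pop() {
                self.map.remove(&dropped_id);
            }
        }
    }
}
```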

Fixes #

@apfitzge apfitzge self-assigned this Jan 16, 2025
let mut run_status_age_checks =
    |container: &mut TransactionViewStateContainer,
     transaction_ids: &mut ArrayVec<usize, 64>| {
        // Temporary scope so that transaction references are immediately
apfitzge (Author)

The complexity here can go away once we have Bytes-backed transactions coming from upstream, since we will no longer need the awkward "insert to map only" pattern.

vacant_entry.insert(state);

// Push the transaction into the queue.
self.inner
apfitzge (Author)

No longer push into the queue here. We just let up to 64 additional transactions (see the EXTRA_CAPACITY const) live in the map. Once we reach the end of the incoming tx stream or hit 64 packets, we run age checks, and only THEN do we insert into the priority queue.
This means that txs that are already processed or too old will not push out transactions that can be processed.
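
As a rough illustration of that batching (not the PR's code): `Container`, `Packet`, `insert_map_only`, and `run_status_age_checks` are hypothetical stand-ins, and a plain `Vec` is used here where the real code uses an `ArrayVec<usize, 64>`.

```rust
const EXTRA_CAPACITY: usize = 64;

struct Packet;
struct Container;

impl Container {
    // Map-only insertion; returns the assigned id if the packet parsed.
    fn insert_map_only(&mut self, _packet: Packet) -> Option<usize> {
        Some(0)
    }
}

// Run status/age checks on the batched ids, push survivors into the
// priority queue, and clear the batch.
fn run_status_age_checks(_container: &mut Container, ids: &mut Vec<usize>) {
    ids.clear();
}

fn buffer_incoming(container: &mut Container, packets: impl Iterator<Item = Packet>) {
    let mut transaction_ids = Vec::with_capacity(EXTRA_CAPACITY);
    for packet in packets {
        if let Some(id) = container.insert_map_only(packet) {
            transaction_ids.push(id);
        }
        // Hitting EXTRA_CAPACITY buffered ids triggers checks mid-stream.
        if transaction_ids.len() == EXTRA_CAPACITY {
            run_status_age_checks(container, &mut transaction_ids);
        }
    }
    // End of stream: any remaining packets undergo checks as well.
    if !transaction_ids.is_empty() {
        run_status_age_checks(container, &mut transaction_ids);
    }
}
```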

jstarry

I think fn remaining_capacity needs to be reworked because with this change we will often have 0 remaining capacity and always be popping from the priority queue even if the priority queue length was actually less than its capacity.

If the priority capacity was 100 and we only had 36 elements in the transaction slab, then a new batch of 64 transactions would get added to the slab and remaining_capacity would return 0 when inserting into the priority queue. This means we would always pop an element and end up with only 36 elements in the queue when we should have 100.

If I'm reading this change correctly, we used to get the remaining capacity before inserting but no longer do that, causing this new issue to emerge.

apfitzge (Author)

Yeah 100% right; I need to add some tests for this and fix the issue with remaining capacity.

apfitzge (Author)

I don't think we'd end up with 36 packets though? We'd end up with 99 (obviously still not ideal).

  • Initial capacity = 100, extra capacity = 64 => slab capacity = 164
  • 36 packets in queue/slab => remaining_capacity of 64 at the start
  • Receive 64 packets, all entered into the slab. Slab length is now 100.
  • We run checks.
  • When we go to insert into the queue in a loop: on the first tx insert, remaining_capacity = 0, so we drop a packet. Slab length is now 99.
  • For the rest of the loop the slab length stays at 99, so the remaining packets are inserted.

But we've also possibly dropped the wrong packet here: everything already in the queue and the first tx in the received batch could be high priority, with the remaining 63 being low priority.

apfitzge (Author)

I think we could solve this with the following:

  • store the desired capacity separately (or derive it by subtracting EXTRA_CAPACITY)
  • insert all into the slab
  • insert all into the queue
  • pop the min from the queue until we're back at the desired size

That way we are always popping the lowest known item(s) instead of strictly keeping at capacity, because even after fixing the off-by-one issue we'd still potentially be dropping non-optimally for the received batch.
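
A minimal sketch of that approach, under assumed names (`PriorityQueue`, `push_batch_and_trim`) and with a sorted `Vec` standing in for the real min-max heap:

```rust
struct PriorityQueue {
    desired_capacity: usize,
    // (priority, id) pairs.
    entries: Vec<(u64, usize)>,
}

impl PriorityQueue {
    /// Push a whole batch, then pop the lowest-priority entries until the
    /// queue is back at its desired capacity. Returns the dropped ids so the
    /// caller can also remove them from the slab.
    fn push_batch_and_trim(
        &mut self,
        batch: impl IntoIterator<Item = (u64, usize)>,
    ) -> Vec<usize> {
        self.entries.extend(batch);
        // Lowest priority sits at the front after sorting.
        self.entries.sort_unstable_by_key(|&(priority, _)| priority);
        let excess = self.entries.len().saturating_sub(self.desired_capacity);
        self.entries.drain(..excess).map(|(_, id)| id).collect()
    }
}
```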

jstarry

I don't think we'd end up with 36 packets though? We'd end up with 99 (obviously still not ideal).

Yeah this looks correct for the example I gave, my mistake. I should have given this example:

If the priority capacity was 100 and we had 68 elements in the transaction slab, after inserting a new batch of 64 pending transactions into the slab, the remaining_capacity would return 0 when inserting into the priority queue. This is an issue because we actually have capacity to add 32 new transactions to the priority queue. But for the first 32 pending transactions that we try to insert into the priority queue, we will pop the min value of (next-pending-tx, set of priority queue txs). But that's not great because it could be that the first 32 transactions in the batch of 64 pending transactions are all high priority and we would end up dropping 31 of them.

That said, I think your proposed solution makes sense. We could have a new method on TransactionStateContainer for pushing a batch of transactions into the priority queue and then draining from the container until it is back within its max capacity.

@apfitzge apfitzge marked this pull request as ready for review January 16, 2025 23:11
@apfitzge apfitzge requested a review from jstarry January 16, 2025 23:12
@apfitzge

Follow-up to #3820

@jstarry left a comment

Can you add some tests as well?

}
}

// Any remaining packets undergo status/age checks
run_status_age_checks(container, &mut transaction_ids);
jstarry

Let's check if transaction_ids is empty first.

}));
working_bank.check_transactions::<RuntimeTransaction<_>>(
    &transactions,
    &lock_results,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's currently fine to call check_transactions with lock_results that are longer than the transactions slice but I think it would be better to pass a slice that is exactly the same length as transactions. What do you think?
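
For illustration only (a hypothetical helper, not the PR's code), the trim is just a sub-slice:

```rust
// Trim a pre-allocated results buffer to exactly the batch's length before
// passing it along; panics if `results` is shorter than the batch.
fn results_for_batch<'a, T, R>(transactions: &[T], results: &'a [R]) -> &'a [R] {
    &results[..transactions.len()]
}
```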

@apfitzge apfitzge force-pushed the view_receiving_age_checks branch 2 times, most recently from 1f4a99d to 0d74aa4 on February 6, 2025 23:18
@apfitzge apfitzge requested a review from jstarry February 10, 2025 22:32
for id in priority_ids {
    self.priority_queue.push(id);
}
let num_dropped = self.priority_queue.len().saturating_sub(self.capacity);
jstarry

I think we also want to enforce that the slab (id_to_transaction_state) length is not more than self.capacity after pushing ids into the queue, right? If we use the priority_queue length, we might actually already have some transactions that are in the slab but aren't currently in the priority queue because they're scheduled. So after pushing 64 txs into the slab, if all checks pass and at least 64 transactions are scheduled, there is space for all of those transactions in the priority queue, and ReceiveAndBuffer would run another pass and could fill an extra 64 txs into the slab again.

apfitzge (Author)

Yeah, you're right about the scheduled txs not being in the queue. We should be using the slab's len here because it should always be >= the priority_queue len, i.e. we never have something in the queue that is not in the slab.
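
A minimal sketch of that fix, with assumed names:

```rust
// Measure overflow against the slab (id_to_transaction_state), which also
// counts transactions that are currently scheduled and therefore absent from
// the priority queue; the slab length is always >= the queue length.
fn num_to_drop(slab_len: usize, capacity: usize) -> usize {
    slab_len.saturating_sub(capacity)
}
```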

apfitzge (Author)

788f144

Added a test for this as well, with all txs in the queue popped but not removed from the map.
Realistically that won't happen due to our CU throttling; we don't have everything scheduled.

Comment on lines +88 to +90
/// To avoid allocating, the caller should not push more than
/// [`EXTRA_CAPACITY`] ids in a call.
/// Returns the number of dropped transactions.
jstarry

Is it enough to enforce that you cannot push any ids that aren't already in the tx container slab? As long as the tx container never goes over self.capacity + EXTRA_CAPACITY, we can never get into a situation where we over-allocate here, I think?

apfitzge (Author)

We shouldn't be pushing anything into the priority queue that's not in the map already.

jstarry

Cool, then we don't really need this comment, right? We can push as many ids as we want (even more than EXTRA_CAPACITY) in a call because we know the resulting length will never be more than the tx container map size.

I was just concerned that if EXTRA_CAPACITY was truly a hard limit, we're not enforcing it anywhere, and if the batch size were ever higher than EXTRA_CAPACITY, we could start pushing more than EXTRA_CAPACITY ids into the priority queue when a full batch of transactions fails and needs to be requeued.

@apfitzge apfitzge force-pushed the view_receiving_age_checks branch from 788f144 to c5fc7f6 on February 11, 2025 22:22
Comment on lines 450 to 464
let transaction = &container
    .get_transaction_ttl(priority_id.id)
    .expect("transaction must exist")
    .transaction;
*result = Consumer::check_fee_payer_unlocked(
    &working_bank,
    transaction,
    &mut error_counters,
);
if result.is_err() {
    num_dropped_on_status_age_checks += 1;
    container.remove_by_id(priority_id.id);
}
}
// Push non-errored transaction into queue.
apfitzge (Author)

@jstarry, functional change here applying a similar change to #4865.
In commit: c5fc7f6

jstarry

Great, looks like the error type is mismatched FYI (CI failure). And we should update the name of num_dropped_on_status_age_checks or add a separate counter for txs dropped due to invalid fee payer

apfitzge (Author)

Yeah, it should be renamed; keeping it in the same counter for now since that's what the other PR does. Will separate the metric in a follow-up PR, changing both ingest paths.

apfitzge (Author)

Great, looks like the error type is mismatched FYI (CI failure). And we should update the name of num_dropped_on_status_age_checks or add a separate counter for txs dropped due to invalid fee payer

Ugh, yeah. I pushed to CI for testing since the rocksdb update was causing a bunch of issues in my dev environment yesterday with not finding libclang. I can actually build now. Should have just been patient and waited; sorry for the wasted review time!

@jstarry commented Feb 12, 2025

Implementation looks correct now; will approve after the last few things are cleaned up.

@apfitzge apfitzge force-pushed the view_receiving_age_checks branch from c5fc7f6 to 95bb4fd on February 12, 2025 15:36