
refactor: use epochs to gc eth tx hashes from chain indexer #12516


akaladarshi
Contributor

@akaladarshi akaladarshi commented Sep 26, 2024

Fixes #12465. Changes:

  • Update the eth_tx_hash removal query to use an epoch-based retention window instead of a day-based one.
  • Refactor the chain indexer GC function.
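Because retention is now expressed in epochs rather than days, it helps to see how an old day-based setting maps onto epochs. This is a minimal sketch that assumes Filecoin's 30-second epoch, which is the value of builtin.EpochDurationSeconds used later in this thread. daysToEpochs is a hypothetical helper and is not part of Lotus.

```go
package main

import "fmt"

// epochDurationSeconds mirrors Filecoin's 30-second epoch
// (builtin.EpochDurationSeconds in Lotus), hardcoded so the sketch is self-contained.
const epochDurationSeconds = 30

// epochsPerDay is 86400 / 30 = 2880.
const epochsPerDay = 24 * 60 * 60 / epochDurationSeconds

// daysToEpochs converts a day-based retention setting into an epoch count.
// Hypothetical helper for illustration only.
func daysToEpochs(days int64) int64 {
	return days * epochsPerDay
}

func main() {
	fmt.Println(epochsPerDay)    // 2880
	fmt.Println(daysToEpochs(7)) // 20160
}
```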

@aarshkshah1992
Contributor

@akaladarshi Will review this on Monday. Thanks for raising this!

@BigLep BigLep requested a review from aarshkshah1992 October 1, 2024 15:40
@aarshkshah1992
Contributor

aarshkshah1992 commented Oct 2, 2024

@akaladarshi Could you please raise this PR against #12521 and update the GC test we already have in gc_test.go?

@akaladarshi akaladarshi changed the base branch from feat/implement-index-validation-api to feat/tests-for-the-chainindexer October 2, 2024 07:43
@akaladarshi akaladarshi force-pushed the akaladarshi/refactor-tx-hash-gc branch from d57f643 to 3495c30 Compare October 2, 2024 12:25
@akaladarshi akaladarshi force-pushed the akaladarshi/refactor-tx-hash-gc branch from 3495c30 to 6587808 Compare October 2, 2024 13:06

si.gc(ctx)

// tipset at height 1 and 10 should be removed
// tipset at height 1 data should be removed
Contributor

I don't understand this comment. What's "height 1 data"?

err = si.stmts.getNonRevertedTipsetEventEntriesCountStmt.QueryRow(tsKeyCid2.Bytes()).Scan(&count)
require.NoError(t, err)
require.Equal(t, 0, count)

// tipset at height 50 should not be removed
err = si.db.QueryRow("SELECT COUNT(*) FROM tipset_message WHERE height = 50").Scan(&count)
Contributor

There's a lot of duplication here. Can we wrap this and the two SQL statements below into a function and reuse it everywhere?
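The following minimal sketch shows one way to remove that duplication. The helper checkHeightGC is hypothetical. It runs a list of per-height COUNT(*) queries and reports the first one whose result does not match the expectation ("removed" means zero rows, "kept" means at least one). Only the tipset_message query appears in this excerpt, so the list would need the other tables the test inspects. The rowCounter interface and the sqlCounter adapter are also hypothetical. They let the helper run against a fake in the example, and sqlCounter assumes si.db is a *sql.DB.

```go
package main

import (
	"database/sql"
	"fmt"
)

// rowCounter runs a COUNT(*) query and returns the result.
// Hypothetical interface so the helper can be exercised without a database.
type rowCounter interface {
	Count(query string, args ...any) (int, error)
}

// sqlCounter adapts a *sql.DB (assumed to be the indexer's si.db) to rowCounter.
type sqlCounter struct{ db *sql.DB }

func (s sqlCounter) Count(query string, args ...any) (int, error) {
	var n int
	err := s.db.QueryRow(query, args...).Scan(&n)
	return n, err
}

// heightCountQueries are checked together for a given height.
// Only tipset_message is shown in the review; append the other tables the test inspects.
var heightCountQueries = []string{
	"SELECT COUNT(*) FROM tipset_message WHERE height = ?",
}

// checkHeightGC verifies that all rows at height were removed (wantRemoved)
// or that at least one row per query is still present (!wantRemoved).
func checkHeightGC(c rowCounter, height int64, wantRemoved bool) error {
	for _, q := range heightCountQueries {
		n, err := c.Count(q, height)
		if err != nil {
			return fmt.Errorf("%q: %w", q, err)
		}
		if wantRemoved && n != 0 {
			return fmt.Errorf("height %d: %q returned %d rows, want 0", height, q, n)
		}
		if !wantRemoved && n == 0 {
			return fmt.Errorf("height %d: %q returned 0 rows, want > 0", height, q)
		}
	}
	return nil
}

// fakeCounter returns canned row counts keyed by height (example only).
type fakeCounter map[int64]int

func (f fakeCounter) Count(_ string, args ...any) (int, error) {
	return f[args[0].(int64)], nil
}

func main() {
	db := fakeCounter{1: 0, 50: 3}
	fmt.Println(checkHeightGC(db, 1, true))         // <nil>
	fmt.Println(checkHeightGC(db, 50, false))       // <nil>
	fmt.Println(checkHeightGC(db, 50, true) != nil) // true
}
```

In gc_test.go, each repeated block could then shrink to a single line, for example require.NoError(t, checkHeightGC(sqlCounter{si.db}, 1, true)).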

return
}
currHeadTime := time.Unix(int64(head.MinTimestamp()), 0)
retentionDuration := time.Duration(si.gcRetentionEpochs*builtin.EpochDurationSeconds) * time.Second
Contributor

Can you add a comment explaining this calculation here?

Why do we need time.Duration(si.gcRetentionEpochs*builtin.EpochDurationSeconds) * time.Second?

@aarshkshah1992
Contributor

@akaladarshi Some quick comments.

@akaladarshi akaladarshi force-pushed the akaladarshi/refactor-tx-hash-gc branch from 4f59281 to da05e17 Compare October 2, 2024 16:26
return
}
// Calculate the retention duration based on the number of epochs to retain.
// retentionDuration represents the total duration (in nano seconds) for which data should be retained before considering it for garbage collection.
Contributor

How is it in nanoseconds?

Contributor Author

It will be seconds 👍🏾 (I will update it).
What's happening here is that we convert EpochDurationSeconds (30) into a time.Duration by multiplying it by time.Second.

Contributor Author

@akaladarshi akaladarshi Oct 3, 2024

@aarshkshah1992 time.Duration counts nanoseconds (that's why I wrote "nano seconds"). If I only compute time.Duration(retentionEpochs * EpochDurationSeconds), the result is 600 (with retentionEpochs = 20 and EpochDurationSeconds = 30), but that is 600 nanoseconds. To get 600 seconds, we have to multiply by time.Second.

totalRetentionDuration := retentionDuration + graceDuration
currHeadTime := time.Unix(int64(head.MinTimestamp()), 0)
// gcTime is the time that is (gcRetentionEpochs + graceEpochs) in nano seconds before currHeadTime
gcTime := currHeadTime.Add(-totalRetentionDuration)
Contributor

If this is less than or equal to 0, return without doing anything. Also, could we please add a test for when gcTime <= 0?
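The suggested early return could look like this minimal sketch. gcCutoff is a hypothetical helper. It assumes head.MinTimestamp() is a Unix timestamp in seconds and reads "gcTime <= 0" as gcTime.Unix() <= 0. The parameter names follow the gcRetentionEpochs and graceEpochs mentioned in the code comment above.

```go
package main

import (
	"fmt"
	"time"
)

// Assumed 30-second epoch (builtin.EpochDurationSeconds in Lotus).
const epochDurationSeconds = 30

// gcCutoff returns the time before which eth tx hashes may be deleted.
// The boolean is false when the cutoff falls at or before the Unix epoch
// (e.g. a young chain), in which case GC should be skipped.
// Hypothetical helper for illustration only.
func gcCutoff(headTimestamp uint64, gcRetentionEpochs, graceEpochs int64) (time.Time, bool) {
	currHeadTime := time.Unix(int64(headTimestamp), 0)
	totalRetention := time.Duration((gcRetentionEpochs+graceEpochs)*epochDurationSeconds) * time.Second
	gcTime := currHeadTime.Add(-totalRetention)
	if gcTime.Unix() <= 0 {
		return time.Time{}, false
	}
	return gcTime, true
}

func main() {
	// Head at 1000s, (20 + 10) epochs * 30s = 900s retention: cutoff at 100s.
	t, ok := gcCutoff(1000, 20, 10)
	fmt.Println(t.Unix(), ok) // 100 true

	// Same retention with a head at 600s: the cutoff would be negative, so skip GC.
	_, ok = gcCutoff(600, 20, 10)
	fmt.Println(ok) // false
}
```

The two calls in main cover the normal case and the gcTime <= 0 case that the reviewer asked to test.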


log.Infof("gc'ing eth hashes older than %d days", gcRetentionDays)
res, err = si.stmts.removeEthHashesOlderThanStmt.ExecContext(ctx, "-"+strconv.Itoa(int(gcRetentionDays))+" day")
res, err = si.stmts.removeEthHashesBeforeTimeStmt.ExecContext(ctx, gcTime.Unix())
Contributor

why do we need to do .Unix() here ?

Contributor Author

I just wanted to provide as accurate a time as possible, so I chose Unix time.

@aarshkshah1992
Contributor

Thanks!

@aarshkshah1992 aarshkshah1992 merged commit f2bff6f into filecoin-project:feat/tests-for-the-chainindexer Oct 4, 2024
73 of 75 checks passed
Labels
None yet
Projects
Status: ☑️ Done (Archive)

2 participants