Fixes: slow entity purging - added couple of indices #1417
Issue: Purging entities takes too long when the entities table has many rows. Our staging server has 1.6M+ entities, and the purge query took up to 611 minutes to execute.
Solution: Create an index on the "deletedAt" column in the entities table, and an index on "sourceId" in the "entity_defs" table.

Diagnosis: I tried running parts of the CTE query individually, but even the simplest query
DELETE FROM entities WHERE "deletedAt" IS NOT NULL
was taking more than 5 minutes (I cancelled the execution).

Since even this simple query took so long, I hypothesized that an index on the "deletedAt" column would help. Creating the index was itself too slow, so we increased the database instance size from t3.small to t3.xlarge, created the index, and downsized the database back to t3.small.
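A minimal sketch of the first index, assuming PostgreSQL. The index name entities_deletedat_index is hypothetical and not taken from this PR; the table and column are the ones named above. EXPLAIN without ANALYZE only plans the statement, so nothing is deleted, and whether the planner actually picks the index depends on how many rows are soft-deleted.

```sql
-- Hypothetical index name; table and column come from the description above.
CREATE INDEX entities_deletedat_index ON entities ("deletedAt");

-- Plan only (no ANALYZE), so no rows are removed.
-- Look for an index scan on entities_deletedat_index instead of a sequential scan.
EXPLAIN DELETE FROM entities WHERE "deletedAt" IS NOT NULL;
```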
After the index was created, the simple query was fast enough, but the full CTE query was still too slow (I cancelled the execution after 5 minutes). I ran the parts of the CTE query individually and they were all quick, so I concluded that we needed to break the CTE into parts. That conclusion was wrong. While executing the individual parts I was getting an entity_defs_sourceid_foreign key constraint violation when deleting data from entity_def_sources.

So I purged the deleted rows from the entity_defs table, then ran that delete again. It was quick, but it deleted nothing; I overlooked that fact and came to the wrong conclusion.

Further experimentation showed that running a simple query like
DELETE FROM entity_def_sources WHERE id = (113)
was taking ~10 seconds. Deleting by primary key shouldn't take that long, so I suspected triggers or foreign key validation. There are no triggers on the table, but there is the entity_defs_sourceid_foreign constraint in the entity_defs table, which had 1.6M+ rows. So I created an index on the sourceId column in the entity_defs table, executed the simple delete from entity_def_sources, and it was quick.

After the second index was created, the complete CTE query was quick too.
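The foreign key check described above can be sketched as follows, assuming PostgreSQL. The index name entity_defs_sourceid_index is hypothetical; the id 113 is the example value from the text. Without an index on the referencing column, each row deleted from entity_def_sources forces a scan of entity_defs to verify that nothing still points at it. EXPLAIN ANALYZE really executes the statement, so the sketch wraps it in a transaction that is rolled back.

```sql
-- Hypothetical index name; lets the foreign key check look up referencing
-- rows in entity_defs by "sourceId" instead of scanning the whole table.
CREATE INDEX entity_defs_sourceid_index ON entity_defs ("sourceId");

BEGIN;
-- EXPLAIN ANALYZE executes the DELETE; the output includes a
-- "Trigger for constraint entity_defs_sourceid_foreign" line with the
-- time spent on the foreign key check.
EXPLAIN ANALYZE DELETE FROM entity_def_sources WHERE id = 113;
-- Undo the delete so the measurement has no lasting effect.
ROLLBACK;
```

Comparing the trigger time before and after creating the index is one way to confirm that the foreign key check, and not the delete itself, was the slow part.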
What has been done to verify that this works as intended?
See above. Additionally, CPU consumption in the staging environment no longer spikes around 4 AM UTC.
Why is this the best possible solution? Were any other approaches considered?
Creating two indices solves the problem. Breaking up the CTE query might be slower and would bring its own complexities, such as first locking the deleted entity rows so that a concurrent undelete cannot happen. We can certainly come back to this if the problem resurfaces.
How does this change affect users? Describe intentional changes to behavior and behavior that could have accidentally been affected by code changes. In other words, what are the regression risks?
Faster purge command and cron job.
Does this change require updates to the API documentation? If so, please update docs/api.yaml as part of this PR.
None.
Before submitting this PR, please make sure you have:
run make test and confirmed all checks still pass OR confirm CircleCI build passes

Additional notes:
We should probably create an index on "deletedAt" in Submission table as well, in this PR or separately.