feat(docs) Add docs for deletions subsystem #28915

Merged
merged 1 commit into from
Sep 29, 2021
77 changes: 67 additions & 10 deletions src/sentry/deletions/__init__.py
@@ -1,6 +1,64 @@
"""
The deletions subsystem managers bulk deletes as well as cascades. It attempts
to optimize around various patterns while using a standard approach to do so.
The deletions subsystem manages asynchronous scheduled bulk deletes as well as cascading deletes
into relations. When adding new models to the application you should consider how those records will
be deleted when a project or organization is deleted.

The deletion subsystem uses records in postgres to track deletions and their status. This also
allows deletions to be retried when a deploy interrupts a deletion task, or a deletion job fails
because of a new relation or database failure.

Celery Tasks
------------

Every 15 minutes `sentry.tasks.deletion.run_scheduled_deletion()` runs. This task queries for jobs
that were scheduled to be run in the past that are not already in progress. Tasks are spawned for
each deletion that needs to be processed.

If tasks fail, the daily run of `sentry.tasks.deletion.reattempt_deletions()` will
clear the `in_progress` flag of old jobs so that they are picked up by the next scheduled run.
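
As a rough illustration of the selection and retry behavior described above, the sketch below
models jobs in memory. The `Job` dataclass and both helper functions are hypothetical stand-ins,
not the actual task code, and the one day staleness cutoff is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Job:
    # Hypothetical stand-in for a scheduled deletion row.
    id: int
    date_scheduled: datetime
    in_progress: bool = False


def select_due_jobs(jobs, now):
    # Pick jobs whose scheduled time has passed and that are not already running.
    due = [j for j in jobs if j.date_scheduled <= now and not j.in_progress]
    for job in due:
        job.in_progress = True  # assumed: flag the job before handing it to a worker
    return due


def reset_stale_jobs(jobs, now, max_age=timedelta(days=1)):
    # Clear the flag on old in-progress jobs so the next scheduled run retries them.
    stale = [j for j in jobs if j.in_progress and j.date_scheduled < now - max_age]
    for job in stale:
        job.in_progress = False
    return stale
```

In this model a job that was interrupted mid-run keeps `in_progress=True` and is skipped by
`select_due_jobs` until `reset_stale_jobs` clears it.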

Scheduling Deletions
--------------------

The entrypoint into deletions for the majority of application code is via the ``ScheduledDeletion``
model. This model lets you create deletion jobs that are run in the future.

>>> from sentry.models import ScheduledDeletion
>>> ScheduledDeletion.schedule(organization, days=1, hours=2)

The above would schedule an organization to be deleted in 1 day and 2 hours.
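
To make the timing concrete, here is a minimal sketch of how the `days` and `hours` arguments could
map onto a run time. The `scheduled_for` helper is hypothetical; it only shows the arithmetic and
does not claim to be how `ScheduledDeletion.schedule` computes its date.

```python
from datetime import datetime, timedelta


def scheduled_for(now, days=0, hours=0):
    # Hypothetical helper: offset the current time by the requested delay.
    return now + timedelta(days=days, hours=hours)
```

With `now` at 2021-09-29 12:00, `days=1, hours=2` gives 2021-09-30 14:00.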

Deletion Tasks
--------------

The deletions system provides two base classes to cover common scenarios:

- ``ModelDeletionTask`` fetches records and deletes each instance individually. This strategy is
good for models that rely on django signals or have child relations. This strategy is also the
default used when a deletion task isn't specified for a model.
- ``BulkModelDeletionTask`` deletes records in bulk using a single query. This strategy is well
suited to removing records that don't have any relations.

If your model has child relations that need to be cleaned up you should implement a custom
deletion task. Doing so requires a few steps:

1. Add your deletion task subclass to `sentry.deletions.defaults`
2. Add your deletion task to the default manager mapping in `sentry.deletions.__init__`.
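
The difference between the two strategies, and the idea of a model-to-task mapping with a default,
can be sketched with in-memory lists. Every class here (`PerInstanceTask`, `BulkTask`, `Registry`)
is a hypothetical stand-in written for illustration; none of them is the real base class or
manager API.

```python
class PerInstanceTask:
    # Stand-in for the per-instance strategy: visits each record, so per-record
    # hooks (signals, child cleanup) get a chance to run.
    def __init__(self, rows):
        self.rows = rows
        self.visited = []

    def delete(self, predicate):
        for row in [r for r in self.rows if predicate(r)]:
            self.visited.append(row)  # a per-record hook would fire here
            self.rows.remove(row)


class BulkTask:
    # Stand-in for the bulk strategy: one filtering pass, no per-record hooks.
    def __init__(self, rows):
        self.rows = rows
        self.visited = []

    def delete(self, predicate):
        self.rows[:] = [r for r in self.rows if not predicate(r)]


class Registry:
    # Hypothetical model-to-task mapping that falls back to a default task.
    def __init__(self, default):
        self.default = default
        self.tasks = {}

    def register(self, model, task_cls):
        self.tasks[model] = task_cls

    def get(self, model, rows):
        return self.tasks.get(model, self.default)(rows)
```

A model with no registered task falls back to the per-instance stand-in, mirroring the default
described above.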

Undoing Deletions
-----------------

If you have scheduled a record for deletion and want to be able to cancel that deletion, your
deletion task needs to implement the `should_proceed` hook.

>>> def should_proceed(self, instance):
>>>     return instance.status in {ObjectStatus.PENDING_DELETION, ObjectStatus.DELETION_IN_PROGRESS}

The above would only proceed with the deletion if the record's status was correct. When a deletion
is cancelled by this hook, the `ScheduledDeletion` row will be removed.
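
The cancellation flow can be sketched as follows. `Status`, `Record`, `CancellableTask` and
`run_if_allowed` are hypothetical names used only for this illustration, and the list of scheduled
rows stands in for `ScheduledDeletion` records.

```python
from enum import Enum


class Status(Enum):
    # Hypothetical stand-in for ObjectStatus.
    ACTIVE = 0
    PENDING_DELETION = 1
    DELETION_IN_PROGRESS = 2


class Record:
    def __init__(self, status):
        self.status = status


class CancellableTask:
    def should_proceed(self, instance):
        # Only continue if the record is still marked for deletion.
        return instance.status in {Status.PENDING_DELETION, Status.DELETION_IN_PROGRESS}


def run_if_allowed(task, instance, scheduled_rows):
    if not task.should_proceed(instance):
        scheduled_rows.remove(instance)  # cancelled: drop the schedule entry
        return False
    return True  # the caller would go on to perform the deletion
```

Under this model, restoring a record is a matter of resetting its status before the scheduled run
picks it up.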

Using Deletions Manager Directly
--------------------------------

For example, let's say you want to delete an organization:

@@ -10,15 +68,14 @@
>>> while work:
>>>     work = task.chunk()
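
The loop above implies that `chunk()` returns a truthy value while work remains. A minimal sketch
of a task honoring that contract, using a hypothetical `ChunkedTask` over an in-memory list and an
assumed `chunk_size` parameter:

```python
class ChunkedTask:
    # Hypothetical task that deletes at most chunk_size rows per call.
    def __init__(self, rows, chunk_size=2):
        self.rows = rows
        self.chunk_size = chunk_size

    def chunk(self):
        batch = self.rows[: self.chunk_size]
        del self.rows[: len(batch)]
        # Report whether work remains so the caller knows to call again.
        return bool(self.rows)


def drain(task):
    # Same loop shape as the example above; returns how many chunks were run.
    calls = 0
    work = True
    while work:
        work = task.chunk()
        calls += 1
    return calls
```

Bounding the work done per call keeps each unit short, which fits the retry behavior described
earlier: an interrupted run loses at most one chunk of progress.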

The system has a default task implementation to handle Organization which will
efficiently cascade deletes. This behavior varies based on the input object,
as the task can override the behavior for it's children.
The system has a default task implementation to handle Organization which will efficiently cascade
deletes. This behavior varies based on the input object, as the task can override the behavior for
its children.

For example, when you delete a Group, it will cascade in a more traditional
manner. It will batch each child (such as Event). However, when you delete a
project, it won't actually cascade to the registered Group task. It will instead
take a more efficient approach of batch deleting its indirect descendants, such
as Event, so it can more efficiently bulk delete rows.
For example, when you delete a Group, it will cascade in a more traditional manner. It will batch
each child (such as Event). However, when you delete a project, it won't actually cascade to the
registered Group task. It will instead take a more efficient approach of batch deleting its indirect
descendants, such as Event, so it can more efficiently bulk delete rows.
"""

