Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#710: Restore Entities, Purge Entities, ability to see deleted Entities #1349

Merged
merged 11 commits into from
Jan 30, 2025

Conversation

sadiqkhoja
Copy link
Contributor

@sadiqkhoja sadiqkhoja commented Dec 30, 2024

Part of getodk/central#710

What has been done to verify that this works as intended?

Written bunch of tests, ran it with update frontend as well.

Why is this the best possible solution? Were any other approaches considered?

I have saved UUIDs of the purged entities in the audits table with entities.purge row and created GIN index on the property - I have verified its usage by insert 100K+ records in audits table.

Other option is to create a separate table to store all the UUID which we can do if we sense performance is not acceptable with the JSONB in audits table.

I was also thinking to just check for UUID in entity.create audits; if given UUID was ever created - but for entity.bulk.create we don't log UUIDs.

How does this change affect users? Describe intentional changes to behavior and behavior that could have accidentally been affected by code changes. In other words, what are the regression risks?

Now entity creation checks for the duplicate UUID in the purged history, we should verify that there is no regression.

Does this change require updates to the API documentation? If so, please update docs/api.yaml as part of this PR.

Yes

Before submitting this PR, please make sure you have:

  • run make test and confirmed all checks still pass OR confirm CircleCI build passes
  • verified that any code from external sources are properly credited in comments or that everything is internally sourced

@sadiqkhoja sadiqkhoja force-pushed the features/710-entity-restore-purge branch from 9125d0e to 0713461 Compare December 30, 2024 21:52
@sadiqkhoja sadiqkhoja mentioned this pull request Jan 2, 2025
2 tasks
lib/model/query/entities.js Outdated Show resolved Hide resolved
lib/model/query/entities.js Show resolved Hide resolved
lib/model/query/audits.js Show resolved Hide resolved
lib/model/query/audits.js Show resolved Hide resolved
@sadiqkhoja sadiqkhoja changed the title Features/710 entity restore purge #710: Restore Entities, Purge Entities, ability to see deleted Entities Jan 3, 2025
Comment on lines 899 to 901
const PURGE_DAY_RANGE = config.has('default.taskSchedule.purge')
? config.get('default.taskSchedule.purge')
: 30; // Default is 30 days
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should move this to central place, this is present in three file at the moment.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, maybe it could be injected in the container around here

  // Inject some constants
  const PURGE_DAY_RANGE = config.has('default.taskSchedule.purge')
    ? config.get('default.taskSchedule.purge')
    : 30; // Default is 30 days
  container.constants = { PURGE_DAY_RANGE };

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have moved it to a separate constant file instead of populating it in the container. I find it easier to access that way. Let me know if you think it is better to have them in the container, I will change it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The separate constant file looks good to me!

@sadiqkhoja sadiqkhoja marked this pull request as ready for review January 3, 2025 22:07
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

await Entities.createMany(dataset, partials, sourceId, userAgent);

Should catch UUID conflict error if it is not provided in the request body (means backend has generated it and there is a collision) and generate a new one.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting idea! I'm not sure if it's necessary to implement or how exactly you'd test it though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking about it again, probability of collision is so low that it doesn't worth the effort. I would blame the luck of that user who fall into this case 🙈

docs/api.yaml Outdated
Comment on lines 9286 to 9287
In the future, we will provide a way to restore deleted entities and purge
deleted entities.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can remove this sentence

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Member

@ktuite ktuite left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies for taking so long to review this! I've now looked through

  • restoring a deleted entity
  • deletedAt in the odata feed
  • the migrations including the change from having uuid on its own in an audit entry to nesting it under entity {}
  • the audit log query for all entity-related events
  • the purging task
  • the purge query (see my comment about not necessarily purging from entity_def_sources)

The only thing I haven't really thought throughly about yet is checking for collisions with purged uuids.

It's looking really solid 👍

WHERE ${_trashedFilter(force, projectId, datasetName, entityUuid)}
GROUP BY datasets."acteeId"
HAVING count(*) > 0
), delete_entity_sources AS (
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the source is a bulk source, deleting this for one purged entity doesn't seem right and also leads to FK violation.

My thoughts are either

  1. never delete an entity_def_source?
  2. only delete entity_def_source rows if the type is submission, but leave API sources (which could be singular or bulk)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am leaning towards adding AND entity_def_sources.type = 'submission'

Here's a test where entities are uploaded in bulk but not all are deleted/purged:

const createBulkEntities = async (user, count, projectId, datasetName) => {
  const entities = [];
  const uuids = [];
  for (let i = 0; i < count; i += 1) {
    const _uuid = uuid();
    entities.push({ uuid: _uuid, label: 'label' });
    uuids.push(_uuid);
  }
  await user.post(`/v1/projects/${projectId}/datasets/${datasetName}/entities`)
    .send({
      entities,
      source: { name: 'bulk.csv', size: count }
    })
    .expect(200);
  return uuids;
};
    it.only('should not purge bulk entity sources', testService(async (service, container) => {
      const { Entities, oneFirst } = container;
      const asAlice = await service.login('alice');

      await asAlice.post('/v1/projects/1/forms?publish=true')
        .set('Content-Type', 'application/xml')
        .send(testData.forms.simpleEntity)
        .expect(200);

      const uuids = await createBulkEntities(asAlice, 2, 1, 'people');

      let sourcesCount = await oneFirst(sql`select count(1) from entity_def_sources`);
      sourcesCount.should.be.equal(1);

      await deleteEntities(asAlice, uuids.slice(0, 1), 1, 'people');

      await Entities.purge(true);

      sourcesCount = await oneFirst(sql`select count(1) from entity_def_sources`);
      sourcesCount.should.be.equal(1);
    }));

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. I have just kept source for the entities created via bulk upload.

Maybe in future, we can remove the orphan sources where all the entities created by them are purged.

AND entity_defs."entityId" = entities.id
AND entities."datasetId" = datasets.id
AND ${_trashedFilter(force, projectId, datasetName, entityUuid)}
), delete_submission_backlog AS (
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this backlog is working correctly, this table will probably be empty and we don't have to delete anything here.

The only thing I'd want to see tested would be if you delete and purge while something is in the backlog, the worker that tries to force-process the backlog doesn't have a problem. But I don't think it should... I think it'll just fail to find the entity it's looking for and it'll quietly skip it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I take this back... leave this bit in. I tried altering the test should purge submission backlog for entities where this query didn't delete from the backlog and an update entity submission was left in the backlog. When the backlog was later processed await Entities.processBacklog(true); it led to a submission processing error.

// Validate UUID against purged entities
(await _isPurgedUuid(maybeOneFirst, partial.uuid))
.ifDefined(() => {
throw Problem.user.uniquenessViolation({ fields: 'uuid', values: partial.uuid });
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should throw this instead

Suggested change
throw Problem.user.uniquenessViolation({ fields: 'uuid', values: partial.uuid });
throw Problem.user.entityDeleted({ entityUuid: clientEntity.uuid });

But I also noticed that if you try to reuse a soft-deleted UUID, it also has Problem.user.uniquenessViolation instead of Problem.user.entityDeleted.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have introduced entityUniquenessViolationWithDeleted - please suggest a shorter name.

I think we should not return 404 during create entity where there is/was entity with the same UUID is soft-deleted or purged. So for create I am returning 409 and for update 404.

Let me know what do you think about this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see what you mean about the 409 vs the 404. Also that other entityDeleted problem doesn't have a very descriptive name; it's more like entityNotFoundDeleted. It's also only used in one case where a submission tries to update something about an entity and the entity has been deleted.

I can't really think of a shorter name than entityUniquenessViolationWithDeleted. entityConflictDeleted? Nah, what you have, even though it is long, is clear.

// Validate UUID with the purged entities
const purgedUuids = await _getPurgedUuids(all, rawEntities.map(e => e.uuid));
if (purgedUuids.length > 0) {
throw Problem.user.uniquenessViolation({ fields: 'uuid', values: purgedUuids.join(', ') });
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what this should say given that the existing message mentions a single entity uuid.

"The entity with UUID (uuid1, uuid2, uuid3,...) has been deleted."

But it might be better than what it currently returns, e.g. this response json:

{'code': 409.3,
 'details': {'fields': 'uuid',
             'values': '88ed7209-cb54-4b7e-98ed-e1cc26a93cc6, '
                       'd8b37943-9202-44ce-837b-8be85a34d373'},
 'message': 'A resource already exists with uuid value(s) of '
            '88ed7209-cb54-4b7e-98ed-e1cc26a93cc6, '
            'd8b37943-9202-44ce-837b-8be85a34d373.'}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated with entityUniquenessViolationWithDeleted

AND ${_trashedFilter(force, projectId, datasetName, entityUuid)}
), purge_audit AS (
INSERT INTO audits ("action", "acteeId", "loggedAt", "processed", "details")
SELECT 'entities.purge', datasets."acteeId", clock_timestamp(), clock_timestamp(), jsonb_build_object('entitiesDeleted', COUNT(*), 'entityUuids', ARRAY_AGG(uuid))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For submissions, the event is just submission.purge even though it could relate to multiple submissions. We should probably keep this consistent with other entity events too and call it entity.purge.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed.

@ktuite
Copy link
Member

ktuite commented Jan 24, 2025

Could we do something else with the purged entity UUIDs instead of storing them in the audits table? If you purge a bunch the details could be really long. It also feels too fragile for the use cases we want to build on top of rejecting re-used uuids and the integrity url.

     action     |      acteeId       |                          details
----------------+--------------------+------------------------------------------------------------------
 entities.purge | (dataset actee id) | {"entityUuids": ["uuid1", "uuid2", "uuid3"], "entitiesDeleted": 3}

I think we could merge this PR (with the suggested changes above) with this mechanism, but come up with a different entity tombstone plan soon after (in this same release). Made an issue to track this: #1381

@sadiqkhoja sadiqkhoja force-pushed the features/710-entity-restore-purge branch from 1e35617 to bfae596 Compare January 28, 2025 23:05
@sadiqkhoja sadiqkhoja force-pushed the features/710-entity-restore-purge branch from ad6e466 to c9aab19 Compare January 28, 2025 23:56
@sadiqkhoja sadiqkhoja requested a review from ktuite January 29, 2025 00:02
@sadiqkhoja sadiqkhoja merged commit 37b85b5 into getodk:master Jan 30, 2025
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants