-
Notifications
You must be signed in to change notification settings - Fork 77
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
#710: Restore Entities, Purge Entities, ability to see deleted Entities #1349
#710: Restore Entities, Purge Entities, ability to see deleted Entities #1349
Conversation
9125d0e
to
0713461
Compare
lib/model/query/entities.js
Outdated
const PURGE_DAY_RANGE = config.has('default.taskSchedule.purge') | ||
? config.get('default.taskSchedule.purge') | ||
: 30; // Default is 30 days |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should move this to central place, this is present in three file at the moment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, maybe it could be injected in the container around here
// Inject some constants
const PURGE_DAY_RANGE = config.has('default.taskSchedule.purge')
? config.get('default.taskSchedule.purge')
: 30; // Default is 30 days
container.constants = { PURGE_DAY_RANGE };
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have moved it to a separate constant file instead of populating it in the container. I find it easier to access that way. Let me know if you think it is better to have them in the container, I will change it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The separate constant file looks good to me!
lib/resources/entities.js
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
await Entities.createMany(dataset, partials, sourceId, userAgent);
Should catch UUID conflict error if it is not provided in the request body (means backend has generated it and there is a collision) and generate a new one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting idea! I'm not sure if it's necessary to implement or how exactly you'd test it though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking about it again, probability of collision is so low that it doesn't worth the effort. I would blame the luck of that user who fall into this case 🙈
docs/api.yaml
Outdated
In the future, we will provide a way to restore deleted entities and purge | ||
deleted entities. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can remove this sentence
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apologies for taking so long to review this! I've now looked through
- restoring a deleted entity
deletedAt
in the odata feed- the migrations including the change from having
uuid
on its own in an audit entry to nesting it underentity {}
- the audit log query for all entity-related events
- the purging task
- the purge query (see my comment about not necessarily purging from
entity_def_sources
)
The only thing I haven't really thought throughly about yet is checking for collisions with purged uuids.
It's looking really solid 👍
WHERE ${_trashedFilter(force, projectId, datasetName, entityUuid)} | ||
GROUP BY datasets."acteeId" | ||
HAVING count(*) > 0 | ||
), delete_entity_sources AS ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the source is a bulk source, deleting this for one purged entity doesn't seem right and also leads to FK violation.
My thoughts are either
- never delete an entity_def_source?
- only delete entity_def_source rows if the type is submission, but leave API sources (which could be singular or bulk)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am leaning towards adding AND entity_def_sources.type = 'submission'
Here's a test where entities are uploaded in bulk but not all are deleted/purged:
const createBulkEntities = async (user, count, projectId, datasetName) => {
const entities = [];
const uuids = [];
for (let i = 0; i < count; i += 1) {
const _uuid = uuid();
entities.push({ uuid: _uuid, label: 'label' });
uuids.push(_uuid);
}
await user.post(`/v1/projects/${projectId}/datasets/${datasetName}/entities`)
.send({
entities,
source: { name: 'bulk.csv', size: count }
})
.expect(200);
return uuids;
};
it.only('should not purge bulk entity sources', testService(async (service, container) => {
const { Entities, oneFirst } = container;
const asAlice = await service.login('alice');
await asAlice.post('/v1/projects/1/forms?publish=true')
.set('Content-Type', 'application/xml')
.send(testData.forms.simpleEntity)
.expect(200);
const uuids = await createBulkEntities(asAlice, 2, 1, 'people');
let sourcesCount = await oneFirst(sql`select count(1) from entity_def_sources`);
sourcesCount.should.be.equal(1);
await deleteEntities(asAlice, uuids.slice(0, 1), 1, 'people');
await Entities.purge(true);
sourcesCount = await oneFirst(sql`select count(1) from entity_def_sources`);
sourcesCount.should.be.equal(1);
}));
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. I have just kept source for the entities created via bulk upload.
Maybe in future, we can remove the orphan sources where all the entities created by them are purged.
AND entity_defs."entityId" = entities.id | ||
AND entities."datasetId" = datasets.id | ||
AND ${_trashedFilter(force, projectId, datasetName, entityUuid)} | ||
), delete_submission_backlog AS ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this backlog is working correctly, this table will probably be empty and we don't have to delete anything here.
The only thing I'd want to see tested would be if you delete and purge while something is in the backlog, the worker that tries to force-process the backlog doesn't have a problem. But I don't think it should... I think it'll just fail to find the entity it's looking for and it'll quietly skip it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I take this back... leave this bit in. I tried altering the test should purge submission backlog for entities
where this query didn't delete from the backlog and an update entity submission was left in the backlog. When the backlog was later processed await Entities.processBacklog(true);
it led to a submission processing error.
lib/model/query/entities.js
Outdated
// Validate UUID against purged entities | ||
(await _isPurgedUuid(maybeOneFirst, partial.uuid)) | ||
.ifDefined(() => { | ||
throw Problem.user.uniquenessViolation({ fields: 'uuid', values: partial.uuid }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should throw this instead
throw Problem.user.uniquenessViolation({ fields: 'uuid', values: partial.uuid }); | |
throw Problem.user.entityDeleted({ entityUuid: clientEntity.uuid }); |
But I also noticed that if you try to reuse a soft-deleted UUID, it also has Problem.user.uniquenessViolation
instead of Problem.user.entityDeleted
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have introduced entityUniquenessViolationWithDeleted
- please suggest a shorter name.
I think we should not return 404 during create entity where there is/was entity with the same UUID is soft-deleted or purged. So for create I am returning 409 and for update 404.
Let me know what do you think about this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see what you mean about the 409 vs the 404. Also that other entityDeleted
problem doesn't have a very descriptive name; it's more like entityNotFoundDeleted
. It's also only used in one case where a submission tries to update something about an entity and the entity has been deleted.
I can't really think of a shorter name than entityUniquenessViolationWithDeleted
. entityConflictDeleted
? Nah, what you have, even though it is long, is clear.
lib/model/query/entities.js
Outdated
// Validate UUID with the purged entities | ||
const purgedUuids = await _getPurgedUuids(all, rawEntities.map(e => e.uuid)); | ||
if (purgedUuids.length > 0) { | ||
throw Problem.user.uniquenessViolation({ fields: 'uuid', values: purgedUuids.join(', ') }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what this should say given that the existing message mentions a single entity uuid.
"The entity with UUID (uuid1, uuid2, uuid3,...) has been deleted."
But it might be better than what it currently returns, e.g. this response json:
{'code': 409.3,
'details': {'fields': 'uuid',
'values': '88ed7209-cb54-4b7e-98ed-e1cc26a93cc6, '
'd8b37943-9202-44ce-837b-8be85a34d373'},
'message': 'A resource already exists with uuid value(s) of '
'88ed7209-cb54-4b7e-98ed-e1cc26a93cc6, '
'd8b37943-9202-44ce-837b-8be85a34d373.'}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated with entityUniquenessViolationWithDeleted
lib/model/query/entities.js
Outdated
AND ${_trashedFilter(force, projectId, datasetName, entityUuid)} | ||
), purge_audit AS ( | ||
INSERT INTO audits ("action", "acteeId", "loggedAt", "processed", "details") | ||
SELECT 'entities.purge', datasets."acteeId", clock_timestamp(), clock_timestamp(), jsonb_build_object('entitiesDeleted', COUNT(*), 'entityUuids', ARRAY_AGG(uuid)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For submissions, the event is just submission.purge
even though it could relate to multiple submissions. We should probably keep this consistent with other entity events too and call it entity.purge
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed.
Could we do something else with the purged entity UUIDs instead of storing them in the
I think we could merge this PR (with the suggested changes above) with this mechanism, but come up with a different entity tombstone plan soon after (in this same release). Made an issue to track this: #1381 |
1e35617
to
bfae596
Compare
ad6e466
to
c9aab19
Compare
Part of getodk/central#710
What has been done to verify that this works as intended?
Written bunch of tests, ran it with update frontend as well.
Why is this the best possible solution? Were any other approaches considered?
I have saved UUIDs of the purged entities in the audits table with
entities.purge
row and created GIN index on the property - I have verified its usage by insert 100K+ records in audits table.Other option is to create a separate table to store all the UUID which we can do if we sense performance is not acceptable with the JSONB in audits table.
I was also thinking to just check for UUID in
entity.create
audits; if given UUID was ever created - but forentity.bulk.create
we don't log UUIDs.How does this change affect users? Describe intentional changes to behavior and behavior that could have accidentally been affected by code changes. In other words, what are the regression risks?
Now entity creation checks for the duplicate UUID in the purged history, we should verify that there is no regression.
Does this change require updates to the API documentation? If so, please update docs/api.yaml as part of this PR.
Yes
Before submitting this PR, please make sure you have:
make test
and confirmed all checks still pass OR confirm CircleCI build passes