Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore(authz-migration): add migration script #218

Merged
merged 14 commits into from
Jun 7, 2019

Conversation

rudyardrichter
Copy link
Contributor

Deployment changes

  • Add script to run for filling out records' authz fields based on the acl fields.

@rudyardrichter rudyardrichter force-pushed the chore/authz-migration branch 7 times, most recently from cfa673b to 2561aac Compare May 29, 2019 19:31
@rudyardrichter rudyardrichter force-pushed the chore/authz-migration branch from 2561aac to 63220ca Compare May 29, 2019 19:44
bin/migrate_acl_authz.py Show resolved Hide resolved
bin/migrate_acl_authz.py Outdated Show resolved Hide resolved
record.authz = []
continue
try:
record.authz = acl_converter.acl_to_authz(record)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could perhaps skip records that already have authz field filled out? I'm thinking of a situation where this script fails halfway through and want to continue where it left off

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking it should go through the whole thing anyways, that would let us reset the authz fields if necessary, and for the cases where it's already cached the resource from arborist the time to do the conversion is minimal, just some string manipulation

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something I am concerned about is that with the current implementation, it doesn't commit anything until everything is done, so if the job gets killed early then it has to go through the whole thing again. Maybe commit every N records or something?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's a good point, and made me think of load handling for this, since DCF has over a million records. But yes, I agree, we should commit every so often

bin/migrate_acl_authz.py Show resolved Hide resolved
logger.error("can't continue without database connection")
sys.exit(1)
with driver.session as session:
records = session.query(IndexRecord)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's going to happen if there are almost 2 million records? DCF prod has like 1.7 million. do we need to stream them through differently in chunks or something?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh wait, maybe this is some fancy generator ? session.query

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to document the current discussion: looking into options for performant updates on a large number of records, maybe using something like this

@rudyardrichter rudyardrichter force-pushed the chore/authz-migration branch 4 times, most recently from f9449eb to 06463a2 Compare June 3, 2019 21:30
@rudyardrichter rudyardrichter force-pushed the chore/authz-migration branch from 06463a2 to 8b6b744 Compare June 4, 2019 15:53
bin/migrate_acl_authz.py Show resolved Hide resolved
bin/migrate_acl_authz.py Show resolved Hide resolved
@rudyardrichter rudyardrichter force-pushed the chore/authz-migration branch from 4b81797 to 5b0e571 Compare June 5, 2019 19:27
@rudyardrichter rudyardrichter merged commit f7c623d into master Jun 7, 2019
@rudyardrichter rudyardrichter deleted the chore/authz-migration branch June 7, 2019 20:48
paulineribeyre pushed a commit that referenced this pull request Jun 10, 2019
* chore(authz-migration): add migration script

* chore(authz-migration): code review fixes

* chore(authz-migration): try windowed approach

* chore(authz-migration): remove duplicate slash

* chore(authz-migration): reorganize

* chore(authz-migration): add ?p

* chore(authz-migration): more logs and partial commits

* chore(authz-migration): add start-did logic

* chore(authz-migration): log pathological cases

* chore(authz-migration): handle empty acl correctly

* chore(authz-migration): refactor start-did

* chore(authz-migration): add log for start-did

* chore(authz-migration): fix logs
paulineribeyre pushed a commit that referenced this pull request Jun 10, 2019
* chore(authz-migration): add migration script

* chore(authz-migration): code review fixes

* chore(authz-migration): try windowed approach

* chore(authz-migration): remove duplicate slash

* chore(authz-migration): reorganize

* chore(authz-migration): add ?p

* chore(authz-migration): more logs and partial commits

* chore(authz-migration): add start-did logic

* chore(authz-migration): log pathological cases

* chore(authz-migration): handle empty acl correctly

* chore(authz-migration): refactor start-did

* chore(authz-migration): add log for start-did

* chore(authz-migration): fix logs
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants