chore(authz-migration): add migration script #218
bin/migrate_acl_authz.py
Outdated
    record.authz = []
    continue
try:
    record.authz = acl_converter.acl_to_authz(record)
Could we perhaps skip records that already have the authz field filled out? I'm thinking of a situation where this script fails halfway through and we want to continue where it left off.
I'm thinking it should go through the whole thing anyway. That would let us reset the authz fields if necessary, and for the cases where the resource from arborist is already cached, the time to do the conversion is minimal, just some string manipulation.
Something I am concerned about is that the current implementation doesn't commit anything until everything is done, so if the job gets killed early it has to go through the whole thing again. Maybe commit every N records or something?
That's a good point, and it made me think about load handling for this, since DCF has over a million records. But yes, I agree, we should commit every so often.
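The "commit every N records" idea from this thread could look roughly like the sketch below. It is not the code from this PR: it uses the standard-library sqlite3 module as a stand-in for the real database session, the index_record table layout and the commit_every parameter are assumptions, and acl_to_authz here is a hypothetical placeholder for acl_converter.acl_to_authz. Reading is kept deliberately naive (one fetchall) so that only the partial-commit logic is on display.

```python
import sqlite3


def acl_to_authz(acl):
    # Hypothetical stand-in for the real acl_converter.acl_to_authz.
    return ["/programs/" + item for item in acl.split(",") if item]


def migrate(conn, commit_every=1000):
    """Fill authz for every record, committing after each commit_every updates."""
    rows = conn.execute("SELECT did, acl FROM index_record ORDER BY did").fetchall()
    commits = 0
    pending = 0
    for did, acl in rows:
        authz = ",".join(acl_to_authz(acl))
        conn.execute("UPDATE index_record SET authz = ? WHERE did = ?", (authz, did))
        pending += 1
        if pending >= commit_every:
            conn.commit()  # work up to here survives if the job is killed later
            commits += 1
            pending = 0
    if pending:
        conn.commit()  # flush the final partial batch
        commits += 1
    return commits


# Tiny in-memory demo table (assumed schema, for illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE index_record (did TEXT PRIMARY KEY, acl TEXT, authz TEXT)")
conn.executemany(
    "INSERT INTO index_record (did, acl) VALUES (?, ?)",
    [("a", "phs1"), ("b", "phs1,phs2"), ("c", ""), ("d", "phs3"), ("e", "phs4")],
)
conn.commit()
n_commits = migrate(conn, commit_every=2)
```

With five records and commit_every=2, the loop commits after the second and fourth record and once more for the leftover record. An empty acl ends up as an empty authz value in this sketch.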
bin/migrate_acl_authz.py
Outdated
    logger.error("can't continue without database connection")
    sys.exit(1)
with driver.session as session:
    records = session.query(IndexRecord)
What's going to happen if there are almost 2 million records? DCF prod has about 1.7 million. Do we need to stream them through differently, in chunks or something?
Oh wait, maybe session.query is some fancy generator?
to document the current discussion: looking into options for performant updates on a large number of records, maybe using something like this
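One common way to stream a large table in chunks is keyset pagination: order by the primary key and fetch the next window of rows whose key is greater than the last one seen. The sketch below illustrates that general technique and is not necessarily the approach the PR ended up with. It uses the standard-library sqlite3 module instead of the real session, and the index_record schema, iter_windows, window_size, and the meaning of start_did (the last did already processed, exclusive) are all assumptions.

```python
import sqlite3


def iter_windows(conn, window_size=1000, start_did=None):
    """Yield lists of (did, acl) rows in did order, window_size rows at a time.

    start_did is assumed to be the last did already handled, so iteration
    resumes strictly after it.
    """
    last_did = start_did
    while True:
        if last_did is None:
            rows = conn.execute(
                "SELECT did, acl FROM index_record ORDER BY did LIMIT ?",
                (window_size,),
            ).fetchall()
        else:
            rows = conn.execute(
                "SELECT did, acl FROM index_record WHERE did > ? ORDER BY did LIMIT ?",
                (last_did, window_size),
            ).fetchall()
        if not rows:
            return
        yield rows
        last_did = rows[-1][0]  # the next window starts after this key


# Tiny in-memory demo table (assumed schema, for illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE index_record (did TEXT PRIMARY KEY, acl TEXT, authz TEXT)")
conn.executemany(
    "INSERT INTO index_record (did, acl) VALUES (?, ?)",
    [("a", "phs1"), ("b", "phs2"), ("c", "phs3"), ("d", "phs4"), ("e", "phs5")],
)
conn.commit()

all_windows = [[did for did, _ in w] for w in iter_windows(conn, window_size=2)]
resumed = [[did for did, _ in w] for w in iter_windows(conn, window_size=2, start_did="b")]
```

Only one window of rows is held in memory at a time, and each query can use the primary-key index rather than an ever-growing OFFSET. Passing the last processed did as start_did lets a killed job pick up where it left off.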
* chore(authz-migration): add migration script
* chore(authz-migration): code review fixes
* chore(authz-migration): try windowed approach
* chore(authz-migration): remove duplicate slash
* chore(authz-migration): reorganize
* chore(authz-migration): add ?p
* chore(authz-migration): more logs and partial commits
* chore(authz-migration): add start-did logic
* chore(authz-migration): log pathological cases
* chore(authz-migration): handle empty acl correctly
* chore(authz-migration): refactor start-did
* chore(authz-migration): add log for start-did
* chore(authz-migration): fix logs
Deployment changes
authz fields based on the acl fields.