-
Notifications
You must be signed in to change notification settings - Fork 500
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SHA-256 and SHA-512 support #5035
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took a quick look. I'd like to think about this some more.
The fixity algorithm used on existing files can be changed by a superuser using the API. The API call will only update the algorithm and checksum for a file if the existing checksum can be validated against the file. | ||
Statistics concerning the updates are returned in the response to the API call with details in the log. | ||
|
||
``curl -X PUT -d domainstats http://localhost:8080/api/admin/updateHashValues/{alg}`` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is "domainstats" a typo?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This config section may not be the best place to document "admin" commands. I'm wondering if it should go in a future version of one of these places:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While we're in here, we should add a curl
example for :FileFixityChecksumAlgorithm
.
try { | ||
User u = findAuthenticatedUserOrDie(); | ||
if (!u.isSuperuser()) | ||
return error(Status.UNAUTHORIZED, "must be superuser"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm glad to see this is superuser-only for now.
alreadyUpdated + " of " + count + " files already had hashes with the new algorithm. " + new Date()); | ||
logger.info(rehashed + " of " + count + " files to rehash. " + new Date()); | ||
logger.info( | ||
successes + " of " + rehashed + " files successfully rehashed with the new algorithm. " + new Date()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a lot of logging at the "info" level. Should it be reduced to "fine"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I was going to suggest that we keep the more verbose logging. This is a very special operation; nobody's going to be changing the checksum algorithms on their files every day. And it's something important enough, so could be a good thing to have some "paper trail" of this switch having been performed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Up to you guys. As is now, the summary is info level and there are warnings for any file that doesn't match (or can't be read, etc.). The original hash is deleted in this operation though, so if keeping the original one documented is trail-worthy, a per-file info message could be added with the original algorithm and hash and file id (storageidentifier?).
@pdurbin - I made some updates to address your comments. In general, feel free to change documentation or logging (or anything) - I'll watch for issues but you guys should have edit access and if its faster to make changes than to iterate with me, just go for it. |
@qqmyers how do you feel about me reformatting the code with Netbeans? |
@pdurbin - should have added formatting in the list above - feel free. (For the Admin file, I was going to swap tab for 4 spaces and I'm trying to stick with spaces overall now, but it looked like the rest of the file used tabs, outside my changes on this branch (possibly me from some earlier merge).) |
@qqmyers ah, I see that @michbarsinai was in that file recently and put tabs everywhere. Please nevermind about the formatting. At some point, maybe in this pull request, we'll reformat the whole file. You're right, generally we try to stick to spaces rather than tabs as explained at http://guides.dataverse.org/en/4.9.2/developers/coding-style.html#formatting-code |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. My main concern was to make sure the change is not allowed if the file no longer matches the old, existing checksum - and it really looks like this is what it does.
Conflicts: src/main/java/edu/harvard/iq/dataverse/api/Admin.java
New Contributors
Welcome! New contributors should at least glance at CONTRIBUTING.md, especially the section on pull requests where we encourage you to reach out to other developers before you start coding. Also, please note that we measure code coverage and prefer you write unit tests. Pull requests can still be reviewed without tests or completion of the checklist outlined below. Thanks!
Related Issues
Pull Request Checklist