catch failures with tiff file thumbnail generation #10509

Merged
landreev merged 8 commits into develop from graceful-failure-mode-for-TIFF-images-failure on Apr 30, 2024

Conversation

stevenwinship
Contributor

@stevenwinship stevenwinship commented Apr 19, 2024

Handles TIFF thumbnail generation failures gracefully, so that broken images are no longer shown as thumbnails or previews.

Which issue(s) this PR closes: #10352

Special notes for your reviewer:

Suggestions on how to test this: Use some of the TIFF files in question from IQSS/dataverse.harvard.edu#250

When testing with existing bad TIFF files, you will see the broken images first, because the UI loads the dataset(s) before it knows that the images failed. Once the dataset has been loaded, the DB is updated, so each subsequent load of the dataset should show the correct images (i.e., default thumbnails with no preview).

Does this PR introduce a user interface change? If mockups are available, please link/include them here:
Before:
Screenshot 2024-04-19 at 10 08 43 AM
Screenshot 2024-04-19 at 10 09 36 AM

After:
Screenshot 2024-04-19 at 10 16 55 AM

Is there a release notes update needed for this change?: No

@stevenwinship stevenwinship self-assigned this Apr 19, 2024
@stevenwinship stevenwinship added the "Size: 30" (a percentage of a sprint; 21 hours; formerly size:33) and "Type: Bug" (a defect) labels Apr 19, 2024
@coveralls

coveralls commented Apr 19, 2024

Coverage Status

coverage: 20.726% (+0.009%) from 20.717%
when pulling 66f56df on graceful-failure-mode-for-TIFF-images-failure
into 447d576 on develop.

Member

@qqmyers qqmyers left a comment

The code changes look reasonable - finding places where the failure wasn't getting flagged and making sure the 0 length file is deleted if there's a failure. Also some nice cleanup.

The one add that would be nice is to get an IT test with a tiff file known to fail - is that possible?

Also - I see that jenkins is failing in a new way - with a file not found during the build. Doesn't seem related to this PR.


@stevenwinship stevenwinship requested a review from qqmyers April 19, 2024 19:19
@stevenwinship stevenwinship removed their assignment Apr 19, 2024
uploadLogoResponse = UtilIT.uploadDatasetLogo(datasetPersistentId, goodTiff, apiToken);
uploadLogoResponse.prettyPrint();
uploadLogoResponse.then().assertThat().statusCode(FORBIDDEN.getStatusCode());
uploadLogoResponse.then().assertThat().body("message", equalTo("File is larger than maximum size: 500000."));
Member

FWIW: Including the size makes the test dependent on the default max size.

Contributor Author

fixed
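
The actual fix is not shown in this thread. As a rough illustration of the reviewer's point, one way to keep such a check independent of the configured limit is to match only the fixed part of the message. This is a minimal sketch in plain Java; the class name, the method name `isSizeLimitMessage`, and the `PREFIX` constant are hypothetical and are not taken from the Dataverse code base.

```java
public class SizeLimitMessageCheck {
    // Assumption: the stable part of the message quoted in the test above.
    // The real text would come from the application's resource bundle.
    public static final String PREFIX = "File is larger than maximum size:";

    /** Returns true for a size-limit rejection, whatever limit is configured. */
    public static boolean isSizeLimitMessage(String message) {
        return message != null && message.startsWith(PREFIX);
    }

    public static void main(String[] args) {
        // Same message shape, two different configured limits.
        System.out.println(isSizeLimitMessage("File is larger than maximum size: 500000."));  // true
        System.out.println(isSizeLimitMessage("File is larger than maximum size: 1000000.")); // true
        // An unrelated error must not match.
        System.out.println(isSizeLimitMessage("Unsupported file type."));                     // false
    }
}
```

In a REST Assured test like the one quoted above, the same idea could probably be expressed with Hamcrest's `startsWith` matcher in place of `equalTo`.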

Member

@qqmyers qqmyers left a comment

Thanks for the test

@landreev landreev self-assigned this Apr 19, 2024


@qqmyers
Member

qqmyers commented Apr 20, 2024

@stevenwinship - Heads up that your change causes a failure in a SearchIT test which is checking for the exact string (looks like you dropped a period in going to the bundled string).

@pdurbin
Member

pdurbin commented Apr 22, 2024

Does this PR also fix the following issue? Should we put it through QA as well?

@stevenwinship
Contributor Author

Does this PR also fix the following issue? Should we put it through QA as well?

The files will now show up with a default image but the tiff files will not generate an actual thumbnail image. I agree that we should switch to another image conversion library, but I don't think it's part of this bug. I'll create another issue to address this.

@stevenwinship
Contributor Author

Created new feature issue for enhancing the code
#10515


@landreev
Contributor

but I don't think it's part of this bug. I'll create another issue to address this.

I second this. Expanding the image format coverage is out of scope and should be a separate issue.
I'm not sure we even need a new dev. issue for that. There was already this: #10351, the second issue opened based on IQSS/dataverse.harvard.edu#250. And there must be multiple other issues, like the #10317 mentioned, where people are complaining about TIFFs.

@stevenwinship
Contributor Author

@stevenwinship - Heads up that your change causes a failure in a SearchIT test which is checking for the exact string (looks like you dropped a period in going to the bundled string).

Fixed

@landreev
Contributor

The new code is working as advertised for new uploads (the 0-size .thumb64 files are still there; but previewimagefail is properly set to true, and the default icons are displayed on the pages going forward).

Is there an easy-enough way to fix this as an existing condition as well? I.e., I have an existing dataset with problematic TIFFs uploaded under a "before" version of Dataverse; with thumbnails failing to generate, but previewimageavailable already set to true, and previewimagefail to false; the page appears to stay broken for these.

@qqmyers
Member

qqmyers commented Apr 22, 2024

Hmm - I thought the code would delete the 0 size file. In any case, https://guides.dataverse.org/en/latest/api/native-api.html#reset-thumbnail-failure-flags are API calls to remove the fail flags. Not sure if that's enough with the changes in this PR to get the thumbs completely reprocessed or not.

@landreev
Contributor

Hmm - I thought the code would delete the 0 size file. In any case, https://guides.dataverse.org/en/latest/api/native-api.html#reset-thumbnail-failure-flags are API calls to remove the fail flags. Not sure if that's enough with the changes in this PR to get the thumbs completely reprocessed or not.

I may be missing something, but the api above resets the failure flag; it's the other way around with a preexisting case - the failure flag is false, the previewImageAvailable flag is true. Plus zero-size thumbnail files on disk.

In my experiments the workaround required BOTH setting previewimageavailable=FALSE AND deleting the zero-size caches, then reloading the page for the fixes in the PR to take effect. But that also makes me think that perhaps all that's needed is a zero-size check added to the method that checks if a cached copy exists (and then setting previewimagefail=TRUE if it is zero).
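
To make the suggested zero-size check concrete, here is a minimal sketch using only `java.nio.file`. It assumes the cached thumbnail is a plain local file. The class, the enum, and the method name `inspect` are hypothetical and are not the actual `ImageThumbConverter` API. Deleting the empty file goes slightly beyond the suggestion above and anticipates the cleanup discussed later in this thread.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class CachedThumbnailCheck {
    /** Outcome of looking for a cached thumbnail on disk. */
    public enum CacheState { USABLE, MISSING, BROKEN_REMOVED }

    /**
     * Treats a zero-size cache file as a failed thumbnail rather than a valid one.
     * The empty file is deleted so that it cannot block a later regeneration.
     */
    public static CacheState inspect(Path cachedThumbnail) throws IOException {
        if (!Files.isRegularFile(cachedThumbnail)) {
            return CacheState.MISSING;
        }
        if (Files.size(cachedThumbnail) == 0L) {
            Files.delete(cachedThumbnail);
            // A caller would set previewimagefail=TRUE at this point.
            return CacheState.BROKEN_REMOVED;
        }
        return CacheState.USABLE;
    }

    public static void main(String[] args) throws IOException {
        // createTempFile produces an empty file, mimicking a failed rescale.
        Path empty = Files.createTempFile("example", ".thumb64");
        System.out.println(inspect(empty));      // BROKEN_REMOVED
        System.out.println(Files.exists(empty)); // false
    }
}
```

Returning a three-valued state rather than a boolean lets the caller tell "never generated" apart from "generated but broken", which is the distinction the flags in this discussion depend on.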

@landreev landreev assigned stevenwinship and unassigned landreev Apr 23, 2024


@stevenwinship stevenwinship force-pushed the graceful-failure-mode-for-TIFF-images-failure branch from 66f56df to 4c78209 Compare April 26, 2024 21:08


@landreev
Contributor

Appears to be working as advertised - a pre-existing "bad" thumbnail still looking broken on the first page load, fixed going forward.
Will test just a bit more, but so far it's looking ready to be merged.

@landreev
Contributor

@stevenwinship OK, I have one request:
As of now, everything is working as it should for "bad" TIFFs, but the 0-size caches are kept in storage. (For pre-existing cases, this is true for multiple sizes; for new uploads, it's only the .thumb64 version).

Once again, this is not a problem for as long as the images are bad, thumbnails generation-wise. But it does become a problem if they become processable. If/when, for example, we upgrade the image libraries to handle the extra flavors of TIFF. Then the presence of the 0-size cache prevents the re-generation, even after the previewimagefail flag is reset.

I tested this by replacing one of the existing bad tiffs on disk with a tiff known to be processable, adjusting the filesize in the db accordingly and resetting previewimagefail to false, then reloading the page.

There should be an easy way to both not leave that 0-size cache behind, AND delete any existing 0-size caches when detected by the new code from the last commits - is there?

Please use your judgement re: whether this is worth any extra dev. effort. If this is remotely non-trivial for whatever reason, and you feel we should instead say that it's the responsibility of the admin to manually delete these 0-size caches when processing fixes become available, I can accept that.

@landreev
Contributor

landreev commented Apr 29, 2024

@stevenwinship
P.S. In that future situation where we expand our TIFF and/or other format coverage and tell the admins to try generating the thumbnails again, we will already be asking them to do something manually for the affected files: we will have to tell them to identify those files, then use the reset API (/api/admin/clearThumbnailFailureFlag) to clear the previewimagefail flag on them. So that would be a reason to argue that adding "... and you need to find any 0-size *.thumb* files in storage and delete them..." to the release note would not be that big of a deal, possibly? I'm seriously open to feedback on this.
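
For the manual route, a rough sketch of how an admin might locate the 0-size *.thumb* files is shown below. It assumes local file-based storage (it does not cover object stores such as S3) and only lists candidates; it deletes nothing. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ZeroSizeThumbFinder {
    /** Lists regular files under storageRoot whose name contains ".thumb" and whose size is 0. */
    public static List<Path> findZeroSizeThumbs(Path storageRoot) throws IOException {
        try (Stream<Path> paths = Files.walk(storageRoot)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().contains(".thumb"))
                    .filter(ZeroSizeThumbFinder::isEmpty)
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    // Files that cannot be read are skipped rather than reported as empty.
    private static boolean isEmpty(Path p) {
        try {
            return Files.size(p) == 0L;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        // Pass the files directory as the first argument; defaults to the current directory.
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path p : findZeroSizeThumbs(root)) {
            System.out.println(p);
        }
    }
}
```

Keeping this as a read-only listing leaves the decision to delete with the admin, who can review the output first.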

@landreev
Contributor

@stevenwinship
probably unnecessary at this point, since I clarified this part during standup, but this was my hacky way to test the "image becomes processable" case:

ls -l ~/Downloads/known_good_image.tiff
-rw-r--r--@ 1 landreev  staff  2266171 Apr 29 10:27 /Users/landreev/Downloads/known_good_image.tiff
# replace an existing "bad" TIFF:
cp ~/Downloads/known_good_image.tiff /usr/local/payara6/glassfish/domains/domain1/files/10.70122/FK2/HCTR3R/18f2a0f4c3d-c90ea25532ae

... then update the file size in the db (I think this is necessary for the whole thing to work) and reset the flag (could've used the API, but did so in the db):

UPDATE datafile SET filesize=2266171 WHERE id=9045;
UPDATE dvobject SET previewimagefail=FALSE WHERE id=9045;

... then reload the dataset page

// ImageThumbConverter fixed the DataFile
// Now we need to update dataset since this is a bad logo
DatasetServiceBean datasetService = CDI.current().select(DatasetServiceBean.class).get();
datasetService.removeDatasetThumbnail(dataset);
Contributor

@landreev landreev Apr 29, 2024

@stevenwinship Just to follow up on the slack conversation:
datasetService.removeDatasetThumbnail(dataset); does NOT, and is not supposed to, remove the cached thumbnails associated with a specific DataFile. We may want to come up with a better, and potentially less confusing, name for the method. Something like datasetService.clearDatasetThumbnail(dataset);. Because that's what it does: the purpose of the method is to make sure the Dataset no longer has a designated thumbnail. If that designated thumbnail was a "custom logo", it will attempt to delete the physical file (because, once we stop using it as the dataset thumbnail, there's no other use for it). But if it was one of the DataFile-level thumbnails, it will only clear the dataset-level thumbnail designation; it normally would not have any reason to attempt to delete the caches, because normally they would still be useful elsewhere on the page.

I do believe that it is entirely safe to just leave the code above as is; I don't think you need to do anything special to delete these 0-size caches here, in the dataset-level thumbnail method; instead, those should be deleted in the datafile-level method, in ImageThumbConverter.getImageThumbnailAsInputStream(thumbnailFile.getStorageIO(),ImageThumbConverter.DEFAULT_DATASETLOGO_SIZE);
where you are checking for that size == 0.

I hope this makes sense!

Contributor

Please note that I corrected an important typo in the above. When I was talking about changing the method name, I meant to say I wanted to name it clearDatasetThumbnail(), instead of remove...
I just typed the same name twice earlier ...

Contributor

(I'll go ahead and do that)

…ro-size files are not left behind when rescaling fails; and to erase such when they are encountered for legacy files; renamed a method to better reflect what it does; and dropped TIF from the list of formats supported for custom logos. Nothing too serious.

it's still a mess, don't get me wrong. but should be ready to be merged, for now. #10352


@landreev
Contributor

@qqmyers I offered @stevenwinship to add a couple of last minute improvements yesterday. I'm fairly confident they are not going to break anything, but for the sake of due process, if you have a moment before standup, please take a look at the commit 6d1f70a and see if anything seems off. I'm planning to merge it otherwise.

The only functional changes are for removing broken/zero size caches. The rest are comments and such.

Member

@qqmyers qqmyers left a comment

Still looks good.

@landreev
Contributor

@qqmyers @stevenwinship
This isn't directly relevant to the QA of the fixes in the PR, but I just wanted to share this, as an illustration of how fragile and unpredictable things are in the thumbnail-generating framework:
The bug that started this, with the unprocessable TIFFs still being marked as previewimageavailable=TRUE etc. - it was NOT happening with direct upload: the previewimagefail flag was properly set then. So this difference in the workflow - whether the rescaling was first attempted before, or after the file was permanently saved, was enough to account for the flags to be properly set, and/or saved in one case but not in the other. (I'm guessing that was the difference, that is).

I'll take any further discussion of things like this to the refactoring issue #10515.
(merging the PR)

@landreev landreev merged commit e615050 into develop Apr 30, 2024
18 of 19 checks passed

📦 Pushed preview images as

ghcr.io/gdcc/dataverse:graceful-failure-mode-for-TIFF-images-failure
ghcr.io/gdcc/configbaker:graceful-failure-mode-for-TIFF-images-failure

🚢 See on GHCR. Use by referencing with full name as printed above, mind the registry name.

Labels
Size: 30 (a percentage of a sprint; 21 hours; formerly size:33), Type: Bug (a defect)
Projects
None yet
5 participants