catch failures with tiff file thumbnail generation #10509

stevenwinship · 2024-04-19T14:11:15Z

Handles TIFF thumbnail-generation failures, preventing broken images from being used as thumbnails or previews.

Which issue(s) this PR closes: #10352

Closes #10352 (We need a more graceful failure mode: TIFF image failure is not detected and a size-0 thumbnail is cached on S3, which results in a broken page view).

Special notes for your reviewer:

Suggestions on how to test this: Use some of the TIFF files in question from IQSS/dataverse.harvard.edu#250.

When testing with existing bad TIFF files, you will see broken images at first, because the UI retrieves the dataset(s) before it knows that the images failed. Loading the dataset updates the database, so on every subsequent load the images should display correctly (i.e., default thumbnails with no preview).

Does this PR introduce a user interface change? If mockups are available, please link/include them here:
Before:

After:

Is there a release notes update needed for this change?: No

coveralls · 2024-04-19T14:16:23Z

coverage: 20.726% (+0.009%) from 20.717% when pulling 66f56df on graceful-failure-mode-for-TIFF-images-failure into 447d576 on develop.

qqmyers

The code changes look reasonable - finding places where the failure wasn't getting flagged and making sure the 0 length file is deleted if there's a failure. Also some nice cleanup.
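
As a rough illustration of the pattern described above (flag the failure and never keep a 0-length file), here is a minimal Java sketch. It is not the code from this PR: `Rescaler`, `PreviewFlags`, and `ThumbnailGenerator.generate` are hypothetical names, and the actual rescaling is hidden behind an interface so that only the cleanup logic is shown.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical stand-in for the rescaling step: returns false when the
// source cannot be decoded (for example, an unsupported TIFF variant).
@FunctionalInterface
interface Rescaler {
    boolean rescale(InputStream source, OutputStream target) throws IOException;
}

// Hypothetical stand-in for the previewimageavailable / previewimagefail flags.
class PreviewFlags {
    boolean previewImageAvailable;
    boolean previewImageFail;
}

class ThumbnailGenerator {
    /**
     * Writes a thumbnail of source into cacheFile. Any failure (an exception,
     * the rescaler reporting failure, or a zero-byte result) deletes the cache
     * file and sets the failure flag, so an empty thumbnail is never left behind.
     */
    static boolean generate(Path source, Path cacheFile, Rescaler rescaler, PreviewFlags flags) {
        boolean ok;
        try (InputStream in = Files.newInputStream(source);
             OutputStream out = Files.newOutputStream(cacheFile)) {
            ok = rescaler.rescale(in, out);
        } catch (IOException e) {
            ok = false;
        }
        try {
            // A "successful" write that produced nothing is still a failure.
            if (ok && Files.size(cacheFile) == 0) {
                ok = false;
            }
        } catch (IOException e) {
            ok = false;
        }
        if (!ok) {
            try {
                Files.deleteIfExists(cacheFile);
            } catch (IOException ignored) {
                // Best effort; the failure flag is still set below.
            }
        }
        flags.previewImageAvailable = ok;
        flags.previewImageFail = !ok;
        return ok;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("thumbs");
        Path src = Files.write(dir.resolve("image.tif"), new byte[] {1, 2, 3});
        PreviewFlags flags = new PreviewFlags();
        // A rescaler that "succeeds" but writes nothing, mimicking the size-0 thumbnails in #10352.
        boolean ok = generate(src, dir.resolve("image.tif.thumb64"), (in, out) -> true, flags);
        System.out.println(ok + " " + flags.previewImageFail); // false true
    }
}
```

The caller would still be responsible for persisting the flags, so that later page loads go straight to the default icon.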

The one addition that would be nice is an IT test with a TIFF file known to fail. Is that possible?

Also, I see that Jenkins is failing in a new way, with a file not found during the build. That doesn't seem related to this PR.

qqmyers · 2024-04-19T20:35:59Z

src/test/java/edu/harvard/iq/dataverse/api/ThumbnailsIT.java

+        uploadLogoResponse = UtilIT.uploadDatasetLogo(datasetPersistentId, goodTiff, apiToken);
+        uploadLogoResponse.prettyPrint();
+        uploadLogoResponse.then().assertThat().statusCode(FORBIDDEN.getStatusCode());
+        uploadLogoResponse.then().assertThat().body("message", equalTo("File is larger than maximum size: 500000."));


FWIW: Including the size makes the test dependent on the default max size.
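
One way to decouple the assertion from the default, sketched under two assumptions: the test can obtain the configured limit somehow (how is left out here), and the message keeps the format asserted above. `LogoSizeMessage` and `expectedMessage` are hypothetical helpers, not part of UtilIT.

```java
import java.text.MessageFormat;

class LogoSizeMessage {
    // Hypothetical template mirroring the message asserted in ThumbnailsIT;
    // after this PR the real text presumably comes from Bundle.properties.
    static final String TEMPLATE = "File is larger than maximum size: {0}.";

    /** Builds the expected error message for whatever limit is configured. */
    static String expectedMessage(long maxSizeBytes) {
        // String.valueOf avoids MessageFormat's locale-specific grouping ("500,000").
        return MessageFormat.format(TEMPLATE, String.valueOf(maxSizeBytes));
    }

    public static void main(String[] args) {
        System.out.println(expectedMessage(500000));
    }
}
```

The test would then compare the response body against `expectedMessage(limit)` instead of a literal that hard-codes 500000.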

qqmyers

Thanks for the test

qqmyers · 2024-04-20T12:37:25Z

@stevenwinship - Heads up that your change causes a failure in a SearchIT test which is checking for the exact string (looks like you dropped a period in going to the bundled string).

pdurbin · 2024-04-22T13:38:47Z

Does this PR also fix the following issue? Should we put it through QA as well?

Some tiff files don't generate thumbnails #10317

stevenwinship · 2024-04-22T13:43:18Z

Does this PR also fix the following issue? Should we put it through QA as well?

Some tiff files don't generate thumbnails #10317

The files will now show up with a default image, but the TIFF files still will not generate an actual thumbnail image. I agree that we should switch to another image conversion library, but I don't think it's part of this bug. I'll create another issue to address this.

stevenwinship · 2024-04-22T13:53:10Z

Created new feature issue for enhancing the code
#10515

landreev · 2024-04-22T14:13:31Z

but I don't think it's part of this bug. I'll create another issue to address this.

I second this. Expanding the image format coverage is out of scope and should be a separate issue.
I'm not sure we even need a new dev. issue for that. There was already this: #10351, the second issue opened based on IQSS/dataverse.harvard.edu#250. And there must be multiple other issues, like the #10317 mentioned, where people are complaining about TIFFs.

stevenwinship · 2024-04-22T14:57:00Z

@stevenwinship - Heads up that your change causes a failure in a SearchIT test which is checking for the exact string (looks like you dropped a period in going to the bundled string).

Fixed

landreev · 2024-04-22T16:09:13Z

The new code is working as advertised for new uploads (the 0-size .thumb64 files are still there; but previewimagefail is properly set to true, and the default icons are displayed on the pages going forward).

Is there an easy-enough way to fix this for existing cases as well? I.e., I have an existing dataset with problematic TIFFs uploaded under a "before" version of Dataverse, with thumbnails failing to generate but previewimageavailable already set to true and previewimagefail set to false; the page appears to stay broken for these.

qqmyers · 2024-04-22T16:24:00Z

Hmm - I thought the code would delete the 0 size file. In any case, https://guides.dataverse.org/en/latest/api/native-api.html#reset-thumbnail-failure-flags are API calls to remove the fail flags. Not sure if that's enough with the changes in this PR to get the thumbs completely reprocessed or not.

landreev · 2024-04-22T17:27:06Z

Hmm - I thought the code would delete the 0 size file. In any case, https://guides.dataverse.org/en/latest/api/native-api.html#reset-thumbnail-failure-flags are API calls to remove the fail flags. Not sure if that's enough with the changes in this PR to get the thumbs completely reprocessed or not.

I may be missing something, but the API above resets the failure flag; a preexisting case is the other way around: the failure flag is false and the previewImageAvailable flag is true. Plus, there are zero-size thumbnail files on disk.

In my experiments the workaround required BOTH setting previewimageavailable=FALSE AND deleting the zero-size caches, then reloading the page for the fixes in the PR to take effect. But that also makes me think that perhaps all that's needed is a zero-size check added to the method that checks if a cached copy exists (and then setting previewimagefail=TRUE if it is zero).
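
A minimal sketch of that suggested zero-size check, assuming a filesystem-backed cache and hypothetical names (`CachedThumbnailCheck.isThumbnailCached`, `LegacyFlags`); the real check in Dataverse goes through its StorageIO layer, which is not modeled here. This version also deletes the empty cache, since a leftover 0-size file would otherwise block regeneration later.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical stand-in for the legacy state described above:
// previewimageavailable = true, previewimagefail = false.
class LegacyFlags {
    boolean previewImageAvailable = true;
    boolean previewImageFail = false;
}

class CachedThumbnailCheck {
    /**
     * Returns true only when a usable (non-empty) cached thumbnail exists.
     * A zero-byte cache is deleted and the file is flagged as failed, so the
     * page falls back to the default icon on subsequent loads.
     */
    static boolean isThumbnailCached(Path cacheFile, LegacyFlags flags) throws IOException {
        if (!Files.exists(cacheFile)) {
            return false;
        }
        if (Files.size(cacheFile) == 0) {
            Files.delete(cacheFile);
            flags.previewImageAvailable = false;
            flags.previewImageFail = true;
            return false;
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        Path empty = Files.createTempFile("legacy", ".thumb64");
        LegacyFlags flags = new LegacyFlags();
        System.out.println(isThumbnailCached(empty, flags) + " " + flags.previewImageFail); // false true
    }
}
```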

landreev · 2024-04-29T13:36:57Z

Appears to be working as advertised - a pre-existing "bad" thumbnail still looking broken on the first page load, fixed going forward.
Will test just a bit more, but so far it's looking ready to be merged.

landreev · 2024-04-29T14:57:57Z

@stevenwinship OK, I have one request:
As of now, everything is working as it should for "bad" TIFFs, but the 0-size caches are kept in storage. (For pre-existing cases, this is true for multiple sizes; for new uploads, only for the .thumb64 version.)

Once again, this is not a problem as long as the images remain bad, thumbnail-generation-wise. But it does become a problem if they become processable, for example if/when we upgrade the image libraries to handle the extra flavors of TIFF. At that point, the presence of the 0-size cache prevents regeneration, even after the previewimagefail flag is reset.

I tested this by replacing one of the existing bad tiffs on disk with a tiff known to be processable, adjusting the filesize in the db accordingly and resetting previewimagefail to false, then reloading the page.

There should be an easy way to both avoid leaving that 0-size cache behind AND delete any existing 0-size caches detected by the new code from the last commits. Is there?

Please use your judgement re: whether this is worth any extra dev. effort. If this is remotely non-trivial for whatever reason, and you feel we should instead say that it's the responsibility of the admin to manually delete these 0-size caches when processing fixes become available, I can accept that.

landreev · 2024-04-29T15:24:52Z

@stevenwinship
P.S. In that future situation, where we expand our TIFF and/or other format coverage and tell the admins to try generating the thumbnails again, we will already be asking them to do something manually for the affected files: identify them, then use the reset API (/api/admin/clearThumbnailFailureFlag) to clear the previewimagefail flag on them. So that would be an argument that adding "... and you need to find any 0-size *.thumb* files in storage and delete them..." to the release note would not be that big of a deal, possibly? I'm seriously open to feedback on this.
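
For filesystem storage, that manual step could look roughly like the Java sketch below. It assumes the cached thumbnails sit under the files directory and contain ".thumb" in their names (as with the .thumb64 files mentioned above); `ZeroSizeThumbFinder` and its `--delete` argument are hypothetical. It only lists matches unless `--delete` is given. S3 and other remote stores would need their own listing and are not covered.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class ZeroSizeThumbFinder {
    /** Lists regular files under root whose name contains ".thumb" and whose size is 0. */
    static List<Path> findZeroSizeThumbs(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().contains(".thumb"))
                    .filter(p -> {
                        try {
                            return Files.size(p) == 0;
                        } catch (IOException e) {
                            throw new UncheckedIOException(e);
                        }
                    })
                    .collect(Collectors.toList());
        }
    }

    // Usage: java ZeroSizeThumbFinder <files-directory> [--delete]
    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        boolean delete = args.length > 1 && args[1].equals("--delete");
        for (Path p : findZeroSizeThumbs(root)) {
            System.out.println(p);
            if (delete) {
                Files.delete(p);
            }
        }
    }
}
```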

landreev · 2024-04-29T15:37:05Z

@stevenwinship
Probably unnecessary at this point, since I clarified this part during standup, but this was my hacky way to test the "image becomes processable" case:

ls -l ~/Downloads/known_good_image.tiff
-rw-r--r--@ 1 landreev  staff  2266171 Apr 29 10:27 /Users/landreev/Downloads/known_good_image.tiff
# replace one of the existing "bad" TIFFs:
cp ~/Downloads/known_good_image.tiff /usr/local/payara6/glassfish/domains/domain1/files/10.70122/FK2/HCTR3R/18f2a0f4c3d-c90ea25532ae

... then update the file size in the db (I think this is necessary for the whole thing to work) and reset the flag (could've used the API, but did so in the db):

UPDATE datafile SET filesize=2266171 WHERE id=9045;
UPDATE dvobject SET previewimagefail=FALSE WHERE id=9045;

... then reload the dataset page

landreev · 2024-04-29T20:13:08Z

src/main/java/edu/harvard/iq/dataverse/dataset/DatasetUtil.java

+                    // ImageThumbConverter fixed the DataFile
+                    // Now we need to update dataset since this is a bad logo
+                    DatasetServiceBean datasetService = CDI.current().select(DatasetServiceBean.class).get();
+                    datasetService.removeDatasetThumbnail(dataset);


@stevenwinship Just to follow up on the Slack conversation:
datasetService.removeDatasetThumbnail(dataset); does NOT, and is not supposed to, remove the cached thumbnails associated with a specific DataFile. We may want to come up with a better, less confusing name for the method, something like datasetService.clearDatasetThumbnail(dataset);, because that's what it does: the purpose of the method is to make sure the Dataset no longer has a designated thumbnail. If that designated thumbnail was a "custom logo", it will attempt to delete the physical file (because once we stop using it as the dataset thumbnail, there's no other use for it). But if it was one of the DataFile-level thumbnails, it will only clear the dataset-level thumbnail designation; it normally has no reason to delete the caches, because they would normally still be useful elsewhere on the page.

I do believe that it is entirely safe to just leave the code above as is; I don't think you need to do anything special to delete these 0-size caches here, in the dataset-level thumbnail method; instead, those should be deleted in the datafile-level method, in ImageThumbConverter.getImageThumbnailAsInputStream(thumbnailFile.getStorageIO(),ImageThumbConverter.DEFAULT_DATASETLOGO_SIZE);
where you are checking for that size == 0.

I hope this makes sense!

Please note that I corrected an important typo in the above. When I was talking about changing the method name, I meant to say I wanted to name it clearDatasetThumbnail(), instead of remove...
I just typed the same name twice earlier ...

(I'll go ahead and do that)
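
To make the distinction above concrete, here is a simplified Java model of what a method like clearDatasetThumbnail() is described as doing. The classes (`DatasetModel`, `DataFileRef`, `DatasetThumbnails`), the in-memory `storage` set, and the "9045.thumb64" cache name are hypothetical stand-ins, not the Dataverse entities or DatasetServiceBean.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical, simplified model; not the Dataverse entity classes.
class DataFileRef {
    final long id;
    DataFileRef(long id) { this.id = id; }
}

class DatasetModel {
    DataFileRef thumbnailFile; // a DataFile designated as the dataset thumbnail
    String customLogoPath;     // an uploaded custom logo, if any
}

class DatasetThumbnails {
    // Stands in for physical storage: the set of stored file names.
    final Set<String> storage = new HashSet<>();

    /** Clears the dataset-level thumbnail designation. */
    void clearDatasetThumbnail(DatasetModel dataset) {
        if (dataset.customLogoPath != null) {
            // A custom logo has no other use once it stops being the
            // dataset thumbnail, so its physical file is deleted.
            storage.remove(dataset.customLogoPath);
            dataset.customLogoPath = null;
        }
        // A DataFile-level thumbnail is only un-designated; its cached
        // thumbnails stay in storage because other parts of the page use them.
        dataset.thumbnailFile = null;
    }

    public static void main(String[] args) {
        DatasetThumbnails svc = new DatasetThumbnails();
        svc.storage.add("9045.thumb64");
        DatasetModel ds = new DatasetModel();
        ds.thumbnailFile = new DataFileRef(9045);
        svc.clearDatasetThumbnail(ds);
        System.out.println(ds.thumbnailFile + " " + svc.storage); // null [9045.thumb64]
    }
}
```

Under this model, deleting a DataFile's 0-size caches belongs in the DataFile-level code path, as suggested above, rather than in clearDatasetThumbnail().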

…ro-size files are not left behind when rescaling fails; and to erase such when they are encountered for legacy files; renamed a method to better reflect what it does; and dropped TIF from the list of formats supported for custom logos. Nothing too serious. It's still a mess, don't get me wrong, but it should be ready to be merged, for now. #10352

landreev · 2024-04-30T13:56:13Z

@qqmyers Yesterday I offered to add a couple of last-minute improvements to @stevenwinship's PR. I'm fairly confident they are not going to break anything, but for the sake of due process, if you have a moment before standup, please take a look at commit 6d1f70a and see if anything seems off. I'm planning to merge it otherwise.

The only functional changes are for removing broken/zero size caches. The rest are comments and such.

qqmyers

Still looks good.

landreev · 2024-04-30T16:54:09Z

@qqmyers @stevenwinship
This isn't directly relevant to the QA of the fixes in the PR, but I just wanted to share this, as an illustration of how fragile and unpredictable things are in the thumbnail-generating framework:
The bug that started this, with the unprocessable TIFFs still being marked as previewimageavailable=TRUE etc., was NOT happening with direct upload: the previewimagefail flag was properly set then. So this difference in the workflow (whether the rescaling was first attempted before or after the file was permanently saved) was enough to determine whether the flags were properly set and/or saved. (I'm guessing that was the difference, that is.)

I'll take any further discussion of things like this to the refactoring issue #10515.
(merging the PR)

github-actions · 2024-04-30T17:25:50Z

📦 Pushed preview images as

ghcr.io/gdcc/dataverse:graceful-failure-mode-for-TIFF-images-failure

ghcr.io/gdcc/configbaker:graceful-failure-mode-for-TIFF-images-failure

🚢 See on GHCR. Use by referencing with full name as printed above, mind the registry name.

stevenwinship self-assigned this Apr 19, 2024

stevenwinship added the "Size: 30" (A percentage of a sprint. 21 hours. (formerly size:33)) and "Type: Bug" (a defect) labels Apr 19, 2024

qqmyers requested changes Apr 19, 2024

stevenwinship requested a review from qqmyers April 19, 2024 19:19

stevenwinship removed their assignment Apr 19, 2024

qqmyers reviewed Apr 19, 2024

qqmyers approved these changes Apr 19, 2024

landreev self-assigned this Apr 19, 2024

stevenwinship mentioned this pull request Apr 22, 2024

Simplify thumbnail handling code and swap the image generation library #10515

Closed

pdurbin mentioned this pull request Apr 22, 2024

Some tiff files don't generate thumbnails #10317

Closed

landreev assigned stevenwinship and unassigned landreev Apr 23, 2024

stevenwinship added 2 commits April 26, 2024 17:08

catch failures with tiff file thumbnail generation

5d6d016

adding IT test

ee0be1b

stevenwinship added 4 commits April 26, 2024 17:08

replace hard coded strings with Bundle.properties

1321da7

replace hard coded strings with Bundle.properties

8489327

fix test using string compare

a4fa8f3

adding fix to pre existing thumbnails

4c78209

stevenwinship force-pushed the graceful-failure-mode-for-TIFF-images-failure branch from 66f56df to 4c78209 April 26, 2024 21:08

stevenwinship assigned landreev and unassigned stevenwinship Apr 26, 2024

landreev reviewed Apr 29, 2024

qqmyers approved these changes Apr 30, 2024

a few typos. #10352

1efdf9c

landreev merged commit e615050 into develop Apr 30, 2024
18 of 19 checks passed

pdurbin added this to the 6.3 milestone May 15, 2024

pdurbin mentioned this pull request May 23, 2024

quiesce missing thumbnail NPE #10181

Closed

landreev mentioned this pull request Sep 3, 2024

Fix for publishing breaks designated Dataset thumbnail, messes up Collection page #10820

Merged

DS-INRAE mentioned this pull request Nov 19, 2024

PDF datafile without thumbnail results in (logged) 500 error #10179

Closed
