
UNF Recalculation Endpoint #3589

Closed
djbrooke opened this issue Jan 24, 2017 · 12 comments
djbrooke (Contributor) commented Jan 24, 2017

This dataset in production has no UNF listed: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/26935

But it has 3 files with UNFs, including this one: https://dataverse.harvard.edu/file.xhtml?fileId=2491887&version=7.2

It's not missing for all datasets, as this recently created dataset has a UNF listed:
https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/S2VGJ1

djbrooke added the ready label Jan 24, 2017
scolapasta (Contributor) commented:

We should investigate the why, of course, but to fix these it should be fairly straightforward to add a simple admin API to generate a UNF (this should not create a new version, since the UNF should have been generated before).

djbrooke changed the title from "Some Dataset UNFs missing" to "UNF Endpoint" Jan 25, 2017
djbrooke changed the title from "UNF Endpoint" to "UNF Recalculation Endpoint" Jan 25, 2017
djbrooke (Contributor, Author) commented:

We identified a few potential tasks in sprint planning today.

  • Create the new endpoint
  • Determine the permissions (who can access the endpoint)
  • Determine version implications (this is a change to the file/dataset, but how should it be reflected?)
  • Once this is implemented and available in production, we'll need to identify the files/datasets that should have this recalculated

pdurbin (Member) commented Jan 26, 2017

Also, in yesterday's meeting it was clarified that the cause of the missing UNFs was #2327 (fixed in Dataverse 4.5 via pull request #3226), so these datasets were not migrated; they were created post-4.0. It sounds like @scolapasta has a SQL query in mind to identify which datasets have been affected in an installation of Dataverse, and we should provide this query as part of this issue. I'd actually be in favor of adding an API endpoint that iterates through all datasets and checks for anomalies such as missing UNFs, but if it's more expedient for now to simply provide the query, that's a good start.
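
Pending the actual query, here is a sketch of the kind of SQL involved. The table and column names (datasetversion, unf) are assumptions about the Dataverse schema, not @scolapasta's query, and the statement is only echoed as a dry run; piping it to psql against the installation's database would run it for real.

```shell
# Hypothetical sketch of a query to list dataset versions with no UNF.
# Table/column names are assumptions about the Dataverse schema; this only
# echoes the SQL as a dry run. To execute, pipe the output to psql.
QUERY='SELECT id, dataset_id FROM datasetversion WHERE unf IS NULL;'
echo "$QUERY"
```

A real version of this would also need to exclude versions that legitimately have no tabular files and therefore no UNF.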

pdurbin added the in progress label and removed the ready label Jan 26, 2017
pdurbin self-assigned this Jan 26, 2017
pdurbin added a commit that referenced this issue Jan 27, 2017
pdurbin (Member) commented Jan 27, 2017

I created pull request #3605 and associated it with this issue, which I moved to Code Review at https://waffle.io/IQSS/dataverse

pdurbin (Member) commented Feb 6, 2017

@kcondon here are some expected messages you may see from the new "Dataset Integrity" API I just documented in 08d50a4, assuming you mess with the data in the database a bit. 😄

curl http://localhost:8080/api/admin/datasets/integrity

{
  "status": "OK",
  "data": {
    "numProblems": 2,
    "problems": [
      {
        "datasetVersionId": 1,
        "message": "Dataset version 1.0 (datasetVersionId 1) from doi:10.5072/FK2/PEBY2F has a UNF (foo) but shouldn't!"
      },
      {
        "datasetVersionId": 62,
        "message": "Dataset version DRAFT (datasetVersionId 62) from doi:10.5072/FK2/QJDRVI doesn't have a UNF but should!"
      }
    ]
  }
}

curl -X POST http://localhost:8080/api/admin/datasets/integrity/62/fixunf

{
  "status": "OK",
  "data": {
    "message": "New UNF value saved (UNF:6:x10r+Q9EK6aF/BMi+eKzGw==). Reindexing dataset."
  }
}
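
To turn the integrity response above into input for the fixunf endpoint, the affected datasetVersionIds can be extracted with standard tools. A sketch, using a heredoc as a stand-in for the saved curl response:

```shell
# Stand-in for: curl -s http://localhost:8080/api/admin/datasets/integrity > /tmp/integrity.json
cat > /tmp/integrity.json <<'EOF'
{"data": {"problems": [{"datasetVersionId": 1}, {"datasetVersionId": 62}]}}
EOF
# Pull out just the numeric ids, one per line.
grep -o '"datasetVersionId": *[0-9]*' /tmp/integrity.json | grep -o '[0-9]*$'
```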

pdurbin (Member) commented Feb 6, 2017

@kcondon I improved the wording in the API Guide based on your feedback: d4daced

kcondon (Contributor) commented Feb 6, 2017

OK, basic functionality works as described:

The integrity check reports both cases as described: it flags a version that has a UNF when it shouldn't, and a version that should have a UNF but doesn't. The fixunf endpoint only fixes the missing-UNF case, as designed; for a version that has a UNF it shouldn't, it says:
{"status":"OK","data":{"message":"Dataset version (id=91) already has a UNF. Blank the UNF value in the database if you must change it."}}

I'm now testing against a copy of the production database, but so far the integrity check dies after about 40 minutes.
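
For the "already has a UNF" case, the message above suggests blanking the value in the database first. A hypothetical sketch of that step, where the table and column names are assumptions about the schema and the id 91 is the one from the message; the statement is only echoed as a dry run, and would be piped to psql to actually run:

```shell
# Hypothetical: blank a version's UNF so a subsequent fixunf call recalculates
# it. Table/column names are schema assumptions; echoed only as a dry run.
STMT='UPDATE datasetversion SET unf = NULL WHERE id = 91;'
echo "$STMT"
```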

kcondon (Contributor) commented Feb 6, 2017

It throws a 500 error after running for nearly 40 minutes, with no output until the error:
[2017-02-06T17:51:19.245-0500] [glassfish 4.1] [WARNING] [] [javax.enterprise.web.core] [tid: _ThreadID=75 _
ThreadName=http-listener-1(46)] [timeMillis: 1486421479245] [levelValue: 900] [[
Servlet.service() for servlet edu.harvard.iq.dataverse.api.ApiConfiguration threw exception
javax.json.stream.JsonGenerationException: Generating incomplete JSON

pdurbin (Member) commented Feb 7, 2017

@kcondon as of cc4158c the check immediately returns "Dataset integrity check has begun. See log at ../logs/dataset_integrity_2017-02-07T10-13-46.txt" (or whatever the timestamp is). That file gets written to glassfish/domains/domain1/logs/dataset_integrity_2017-02-07T10-13-46.txt, for example. If this solution doesn't work, I suspect we'll need to look into a native query.

kcondon (Contributor) commented Feb 7, 2017

I tested the new output method; it still fails after around 40 minutes:

[2017-02-07T16:29:02.073-0500] [glassfish 4.1] [WARNING] [] [edu.harvard.iq.dataverse.dataaccess.ImageThumbConverter] [tid: _ThreadID=71 _ThreadName=http-listener-1(40)] [timeMillis: 1486502942073] [levelValue: 900] [[
getImageThumbAsBase64: Failed to open FileAccess object for DataFile id 2776072]]

[2017-02-07T16:37:20.597-0500] [glassfish 4.1] [WARNING] [] [javax.enterprise.web.core] [tid: _ThreadID=48 _ThreadName=http-listener-1(17)] [timeMillis: 1486503440597] [levelValue: 900] [[
Servlet.service() for servlet edu.harvard.iq.dataverse.api.ApiConfiguration threw exception
java.lang.InterruptedException
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1302)
at jersey.repackaged.com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:292)
at jersey.repackaged.com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
at org.glassfish.jersey.servlet.internal.ResponseWriter.getResponseContext(ResponseWriter.java:273)
at org.glassfish.jersey.servlet.internal.ResponseWriter.writeResponseStatusAndHeaders(ResponseWriter.java:148)
at org.glassfish.jersey.server.ServerRuntime$Responder$1.getOutputStream(ServerRuntime.java:611)
at org.glassfish.jersey.message.internal.CommittingOutputStream.commitStream(CommittingOutputStream.java:200)
at org.glassfish.jersey.message.internal.CommittingOutputStream.flushBuffer(CommittingOutputStream.java:305)
at org.glassfish.jersey.message.internal.CommittingOutputStream.commit(CommittingOutputStream.java:261)
at org.glassfish.jersey.message.internal.CommittingOutputStream.close(CommittingOutputStream.java:276)
at java.io.FilterOutputStream.close(FilterOutputStream.java:159)
at sun.nio.cs.StreamEncoder.implClose(StreamEncoder.java:320)
at sun.nio.cs.StreamEncoder.close(StreamEncoder.java:149)
at java.io.OutputStreamWriter.close(OutputStreamWriter.java:233)
at org.glassfish.json.JsonGeneratorImpl.close(JsonGeneratorImpl.java:513)
at org.glassfish.json.JsonWriterImpl.close(JsonWriterImpl.java:150)
at org.glassfish.json.jaxrs.JsonStructureBodyWriter.writeTo(JsonStructureBodyWriter.java:119)
at org.glassfish.json.jaxrs.JsonStructureBodyWriter.writeTo(JsonStructureBodyWriter.java:69)
at org.glassfish.jersey.message.internal.WriterInterceptorExecutor$TerminalWriterInterceptor.invokeWriteTo(WriterInterceptorExecutor.java:263)
at org.glassfish.jersey.message.internal.WriterInterceptorExecutor$TerminalWriterInterceptor.aroundWriteTo(WriterInterceptorExecutor.java:250)
at org.glassfish.jersey.message.internal.WriterInterceptorExecutor.proceed(WriterInterceptorExecutor.java:162)
at org.glassfish.jersey.server.internal.Jso

kcondon (Contributor) commented Feb 7, 2017

Also, we discussed whether harvested datasets should be checked, since we don't have the ability to update them and we're not sure that UNF info is harvested (i.e., at the dataset or file level).

pdurbin (Member) commented Feb 8, 2017

@kcondon sorry for all the bugs in the GET http://$SERVER/api/admin/datasets/integrity endpoint. @scolapasta @landreev and I decided to remove it entirely, which I did in 019bbb3. I made no changes to the other endpoint, which is still documented as POST http://$SERVER/api/admin/datasets/integrity/{datasetVersionId}/fixunf but I did pull the latest from the develop branch into this pull request. There were no merge conflicts.

See the note from @djbrooke above that says, "Once this is implemented and available in production, we'll need to identify the files/datasets that should have this recalculated". This task still needs to be done. I assume someone will write a SQL script or something.
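
Once such a list of affected datasetVersionIds exists, the remaining fixunf endpoint could be driven by a small script. A sketch: the server URL and the ids (62 and 91, taken from the examples above) are stand-ins, and only the URLs are printed as a dry run; replacing echo with curl -X POST would perform the fixes.

```shell
# Hypothetical driver: POST each affected datasetVersionId to fixunf.
# Dry run: prints the URLs instead of calling curl -X POST on each.
SERVER="http://localhost:8080"
for id in 62 91; do
  url="$SERVER/api/admin/datasets/integrity/$id/fixunf"
  echo "$url"
done
```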

pdurbin assigned pdurbin and landreev and unassigned pdurbin and landreev Feb 8, 2017
kcondon added a commit that referenced this issue Feb 8, 2017
kcondon self-assigned this Feb 8, 2017
kcondon closed this as completed Feb 8, 2017
kcondon removed the Status: QA label Feb 8, 2017