
Published dataset inaccessible after metadata update #10880

Closed
jarulsamy opened this issue Sep 25, 2024 · 12 comments
Labels
Type: Bug a defect

Comments

@jarulsamy

jarulsamy commented Sep 25, 2024

What steps does it take to reproduce the issue?

  • When does this issue occur?

We had a user update the metadata of an existing published dataset. When they committed the change, it appeared to be successful; however, navigating to the dataset now produces an error. Specifically, they added another related publication URL.

Editing other metadata fields and re-publishing does not cause this issue.

  • Which page(s) does it occur on?

On the dataset page (viewing files / metadata).

  • What happens?

There is an internal server error displayed with very little information:

[Screenshot: browser showing a generic "Internal Server Error" page]

Looking at the logs, there is this large stack trace emitted when a user tries to navigate to the dataset:

[2024-09-25T12:04:03.101-0600] [Payara 6.2024.6] [WARNING] [] [jakarta.enterprise.web] [tid: _ThreadID=103 _ThreadName=http-thread-pool::jk-connector(2)] [timeMillis: 1727287443101] [levelValue: 900] [[
  StandardWrapperValve[Faces Servlet]: Servlet.service() for servlet Faces Servlet threw exception
java.lang.NullPointerException: Cannot invoke "java.util.List.iterator()" because "dsFieldTypeInputLevels" is null
	at edu.harvard.iq.dataverse.DatasetPage.updateDatasetFieldInputLevels(DatasetPage.java:1848)
	at edu.harvard.iq.dataverse.DatasetPage.init(DatasetPage.java:2126)
	at edu.harvard.iq.dataverse.DatasetPage.init(DatasetPage.java:1926)
	at jdk.internal.reflect.GeneratedMethodAccessor1465.invoke(Unknown Source)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:569)
	at org.glassfish.expressly.util.ReflectionUtil.invokeMethod(ReflectionUtil.java:186)
	at org.glassfish.expressly.parser.AstValue.invoke(AstValue.java:253)
	at org.glassfish.expressly.MethodExpressionImpl.invoke(MethodExpressionImpl.java:248)
	at org.jboss.weld.module.web.util.el.ForwardingMethodExpression.invoke(ForwardingMethodExpression.java:40)
	at org.jboss.weld.module.web.el.WeldMethodExpression.invoke(WeldMethodExpression.java:50)
	at com.sun.faces.facelets.el.TagMethodExpression.invoke(TagMethodExpression.java:70)
	at com.sun.faces.application.ActionListenerImpl.getNavigationOutcome(ActionListenerImpl.java:74)
	at com.sun.faces.application.ActionListenerImpl.processAction(ActionListenerImpl.java:62)
	at jakarta.faces.component.UIViewAction.broadcast(UIViewAction.java:506)
	at jakarta.faces.component.UIViewRoot.broadcastEvents(UIViewRoot.java:858)
	at jakarta.faces.component.UIViewRoot.processApplication(UIViewRoot.java:1332)
	at com.sun.faces.lifecycle.InvokeApplicationPhase.execute(InvokeApplicationPhase.java:56)
	at com.sun.faces.lifecycle.Phase.doPhase(Phase.java:72)
	at com.sun.faces.lifecycle.LifecycleImpl.execute(LifecycleImpl.java:159)
	at jakarta.faces.webapp.FacesServlet.executeLifecyle(FacesServlet.java:691)
	at jakarta.faces.webapp.FacesServlet.service(FacesServlet.java:449)
	at org.apache.catalina.core.StandardWrapper.service(StandardWrapper.java:1554)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:331)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:211)
	at org.glassfish.tyrus.servlet.TyrusServletFilter.doFilter(TyrusServletFilter.java:83)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:253)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:211)
	at org.ocpsoft.rewrite.servlet.RewriteFilter.doFilter(RewriteFilter.java:226)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:253)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:211)
	at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:257)
	at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:166)
	at org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:757)
	at org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:577)
	at com.sun.enterprise.web.WebPipeline.invoke(WebPipeline.java:99)
	at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:158)
	at org.apache.catalina.connector.CoyoteAdapter.doService(CoyoteAdapter.java:372)
	at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:239)
	at com.sun.enterprise.v3.services.impl.ContainerMapper$HttpHandlerCallable.call(ContainerMapper.java:520)
	at com.sun.enterprise.v3.services.impl.ContainerMapper.service(ContainerMapper.java:217)
	at org.glassfish.grizzly.http.server.HttpHandler.runService(HttpHandler.java:174)
	at org.glassfish.grizzly.http.server.HttpHandler.doHandle(HttpHandler.java:153)
	at org.glassfish.grizzly.http.server.HttpServerFilter.handleRead(HttpServerFilter.java:196)
	at org.glassfish.grizzly.filterchain.ExecutorResolver$9.execute(ExecutorResolver.java:88)
	at org.glassfish.grizzly.filterchain.DefaultFilterChain.executeFilter(DefaultFilterChain.java:246)
	at org.glassfish.grizzly.filterchain.DefaultFilterChain.executeChainPart(DefaultFilterChain.java:178)
	at org.glassfish.grizzly.filterchain.DefaultFilterChain.execute(DefaultFilterChain.java:118)
	at org.glassfish.grizzly.filterchain.DefaultFilterChain.process(DefaultFilterChain.java:96)
	at org.glassfish.grizzly.ProcessorExecutor.execute(ProcessorExecutor.java:51)
	at org.glassfish.grizzly.nio.transport.TCPNIOTransport.fireIOEvent(TCPNIOTransport.java:510)
	at org.glassfish.grizzly.strategies.AbstractIOStrategy.fireIOEvent(AbstractIOStrategy.java:82)
	at org.glassfish.grizzly.strategies.WorkerThreadIOStrategy.run0(WorkerThreadIOStrategy.java:83)
	at org.glassfish.grizzly.strategies.WorkerThreadIOStrategy$WorkerThreadRunnable.run(WorkerThreadIOStrategy.java:101)
	at org.glassfish.grizzly.threadpool.AbstractThreadPool$Worker.doWork(AbstractThreadPool.java:535)
	at org.glassfish.grizzly.threadpool.AbstractThreadPool$Worker.run(AbstractThreadPool.java:515)
	at java.base/java.lang.Thread.run(Thread.java:840)
]]
  • To whom does it occur (all users, curators, superusers)?

This occurs for all users attempting to view this particular dataset. Viewing other datasets works properly.

  • What did you expect to happen?

I expected the dataset information to be displayed correctly just like any other dataset.

Which version of Dataverse are you using?

v6.3 build 1607-8c99a74

Any related open or closed issues to this bug report?

Not that I can find.

Are you thinking about creating a pull request for this issue?

Not at this time, though I think I tracked the error down to this particular function (specifically line 1849; I believe dsFieldTypeInputLevels is null, and the attempted iteration throws the NullPointerException):

private void updateDatasetFieldInputLevels() {
    Long dvIdForInputLevel = ownerId;
    // OPTIMIZATION (?): replaced "dataverseService.find(ownerId)" with
    // simply dataset.getOwner()... saves us a few lookups.
    // TODO: could there possibly be any reason we want to look this
    // dataverse up by the id here?? -- L.A. 4.2.1
    if (!dataset.getOwner().isMetadataBlockRoot()) {
        dvIdForInputLevel = dataset.getOwner().getMetadataRootId();
    }
    /* ---------------------------------------------------------
       Map to hold DatasetFields
       Format: { DatasetFieldType.id : DatasetField }
       --------------------------------------------------------- */
    // Initialize Map
    Map<Long, DatasetField> mapDatasetFields = new HashMap<>();
    // Populate Map
    for (DatasetField dsf : workingVersion.getFlatDatasetFields()) {
        if (dsf.getDatasetFieldType().getId() != null) {
            mapDatasetFields.put(dsf.getDatasetFieldType().getId(), dsf);
        }
    }
    /* ---------------------------------------------------------
       Retrieve List of DataverseFieldTypeInputLevel objects
       Use the DatasetFieldType ids which are the Map's keys
       --------------------------------------------------------- */
    List<Long> idList = new ArrayList<>(mapDatasetFields.keySet());
    List<DataverseFieldTypeInputLevel> dsFieldTypeInputLevels =
            dataverseFieldTypeInputLevelService.findByDataverseIdAndDatasetFieldTypeIdList(dvIdForInputLevel, idList);
    /* ---------------------------------------------------------
       Iterate through List of DataverseFieldTypeInputLevel objects
       Call "setInclude" on its related DatasetField object
       --------------------------------------------------------- */
    for (DataverseFieldTypeInputLevel oneDSFieldTypeInputLevel : dsFieldTypeInputLevels) {
        if (oneDSFieldTypeInputLevel != null) {
            // Is the DatasetField in the hash? hash format: { DatasetFieldType.id : DatasetField }
            DatasetField dsf = mapDatasetFields.get(oneDSFieldTypeInputLevel.getDatasetFieldType().getId());
            if (dsf != null) {
                // Yes, call "setInclude"
                dsf.setInclude(oneDSFieldTypeInputLevel.isInclude());
                // remove from hash
                mapDatasetFields.remove(oneDSFieldTypeInputLevel.getDatasetFieldType().getId());
            }
        }
    }
    /* ---------------------------------------------------------
       Iterate through any DatasetField objects remaining in the hash
       Call "setInclude(true)" on each one
       --------------------------------------------------------- */
    for (DatasetField dsf : mapDatasetFields.values()) {
        if (dsf != null) {
            dsf.setInclude(true);
        }
    }
} // end: updateDatasetFieldInputLevels

It seems that my database may be missing some reference somehow. I'm not really sure; some insight from a developer would be much appreciated. I can share queries from my DB, as well as any other info, as needed.

Thanks.

@jarulsamy jarulsamy added the Type: Bug a defect label Sep 25, 2024
@qqmyers
Member

qqmyers commented Sep 25, 2024

You may be the second known victim of the bug in the "Update-Current-Version" functionality that we recently announced. If so, it would unfortunately mean that the metadata for the latest version is lost and would have to be restored from a backup or re-entered.

The way to verify would be to run select * from datasetfield where datasetversion_id = <id of the version>. If it returns no rows, it is likely the bug we reported.

I think @scolapasta discovered a way to restore the version to a state where the missing metadata can be re-entered via the UI, which could be easier than trying to restore the datasetfield entries from a backup directly. Hopefully he can report here or in the email.

If the datasetfields are still in the db, it's still possible you've hit some other issue. If so, we can help figure it out.

@scolapasta
Contributor

Yes, the method to restore via the UI* is to go manually to the upload file page (using the db id of the dataset) and upload a file to create a draft. You can then view this draft via the UI to a) re-add the metadata and b) delete the dummy file you uploaded.

  • It might require a small db change too: we saw an issue where the subject was being stored both as a controlled vocabulary value and as a text value of "N/A". In this case you would just need to manually delete the corresponding datasetfieldvalue.

@jarulsamy
Author

@qqmyers & @scolapasta thanks for the info. I realize I need to subscribe to the mailing list, I missed the bug info. I'll plan to deploy the patched version as soon as I can.

I do have daily backups so I should be able to restore the affected dataset. How do I identify the datasetversion_id from the query you suggested?

@qqmyers
Member

qqmyers commented Sep 25, 2024

The dataset table has authority and identifier columns for the dataset persistent id, so you can find the dataset id; the datasetversion table then has a dataset_id column, so you can find the version(s) associated with the dataset.
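Putting those two lookups together, a sketch of the join (the authority/identifier values are placeholders; column placement follows the dvobject-based query shown later in this thread):

```sql
-- Placeholder PID parts; substitute the dataset's actual authority/identifier.
SELECT v.id AS datasetversion_id, v.versionnumber, v.minorversionnumber, v.versionstate
FROM dvobject o
JOIN datasetversion v ON v.dataset_id = o.id
WHERE o.authority = '10.12345'
  AND o.identifier = 'ABC/XYZ';
```

The resulting datasetversion_id values are what the earlier datasetfield query expects.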

@jarulsamy
Author

I confirmed that this query:

select * from datasetfield where datasetversion_id = <id of the version>

does not return any rows. So I suspect that this is the "Update-Current-Version" bug you mentioned. To resolve this from the backup, would I just copy the rows with the matching datasetversion_id from the backup to the running database?

I could also try the UI option, but I would prefer not requiring the user to re-enter their metadata.

@qqmyers
Member

qqmyers commented Sep 25, 2024

Unfortunately, it's more complex than that, and since we haven't seen any other instances except the one that was handled manually, we don't have a canned query to use. DatasetFields have associated DatasetFieldValues, there are DatasetFieldCompoundValues with more DatasetFields, some values are ControlledVocabularyValues, etc.

If you have the ability to instantiate the backup db somewhere, doing that and either inspecting the UI or looking in the json or OAI-ORE exports might be an easier way to find all of the values, though someone would still have to reenter the data via the UI (or API).
Which makes me wonder: the exports are all cached, so the export files may still exist from before the bug hit. If so, getting them from the file system/S3, or perhaps even by calling the API (look at the URL used to get the exports for another dataset and then swap in the persistent identifier), might be the easiest way to get the values.
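For the API route, a sketch of fetching a cached export (the server and PID values are placeholders; `dataverse_json` is one of the standard Dataverse exporter names):

```shell
# Placeholder values; substitute your own installation and dataset PID.
SERVER="https://dataverse.example.edu"
PID="doi:10.12345/ABC/XYZ"

# Dataverse's metadata export endpoint serves the cached export files:
URL="$SERVER/api/datasets/export?exporter=dataverse_json&persistentId=$PID"
echo "$URL"
# Fetch and keep a copy, e.g.:
#   curl -s "$URL" > recovered-metadata.json
```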

@jarulsamy
Author

Well, it appears I don't have a backup of the metadata anyway. The user had just created the dataset, published it, then attempted to update the metadata; we have nightly backups, so unfortunately there's no way to recover it from my end. Given the complexity @qqmyers mentions, it was probably better just to have the user re-enter the metadata anyway. I used the UI trick @scolapasta detailed to make the dataset editable again, then had the user re-add their metadata. In case anyone else runs into the bug, these are the exact steps I took.

  1. Backup Dataverse and DB.

  2. Upgrade to the patched version of 6.3.

    wget 'https://github.com/donsizemore/dataverse_backports/releases/download/6.3_10797_10820/dataverse-6.3_10797_10820.war'
    mv dataverse-6.3_10797_10820.war dataverse-6.3.war
    asadmin list-applications
    asadmin undeploy dataverse
    asadmin deploy dataverse-6.3.war
    
  3. Make sure the web interface comes up correctly, then restart Payara (just to be sure).

    systemctl restart payara
    

    Again, validate that the web UI is back up.

  4. Query the DB for the ID of the dataset using this query:

    SELECT * FROM public.dvobject
    WHERE
      authority = '10.15786'
    AND
      identifier = '20.500.11919/7166'
  5. Go to the web UI, find some other published dataset, and select it. "Edit Dataset" -> "Files (Upload)".

  6. Edit the URL with the ID of the dataset from before. In my case, it was this:

    https://dataverse.arcc.uwyo.edu/editdatafiles.xhtml?datasetId=260&mode=UPLOAD
    
  7. Upload any temporary file (I just used a text file) and save changes.

  8. The dataset became viewable again, with the banner "This draft version has incomplete metadata that needs to be edited before it can be published." I had the user update the metadata, remove my temporary text file, and republish just fine.

Thank you very much @qqmyers and @scolapasta for the quick help. This was easier to solve than I anticipated. Thankfully, we're still migrating a lot of our datasets over to our new Dataverse instance, so this was just a minor bump in the road :).

@qqmyers
Member

qqmyers commented Sep 26, 2024

Glad you were able to recover and thanks for the step-by-step instructions for others!

@scolapasta
Contributor

@jarulsamy did you check the subject? (Is it showing correctly in the UI?)

@jarulsamy
Author

@jarulsamy did you check the subject? (Is it showing correctly in the UI?)

Yes, it does show correctly in the UI after re-adding the correct metadata and re-publishing.

@scolapasta
Contributor

OK, great! Glad to see you don't seem to be having the same issue we saw (with both the value and the controlled vocabulary value being set). If you decide to check it directly in the db and need help with the queries, let us know.

In the meantime, I'll go ahead and close this issue.

@bikramj
Contributor

bikramj commented Jan 24, 2025

FYI, we encountered this bug in v6.2 recently. We tried the dummy file upload option and are now able to see a new empty DRAFT version. Thank you for the workaround.
