Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

File Reingest API #4865

Closed
djbrooke opened this issue Jul 17, 2018 · 9 comments
Closed

File Reingest API #4865

djbrooke opened this issue Jul 17, 2018 · 9 comments
Assignees

Comments

@djbrooke
Copy link
Contributor

We've added support for new versions of Stata in #2301. Following the release of the code for #2301, we should reingest the files in production that had previously failed ingest due to being an unsupported Stata version. An API endpoint instead of a db script seems the best way to go for community and reusability purposes.

This should not be a version change and it's OK if the citation gets a UNF added or changed as a result of the reingest.

@landreev you mentioned you wanted to put in a few notes as well

@djbrooke djbrooke added this to the 4.9.2 - Stata Upgrades, etc. milestone Jul 19, 2018
landreev added a commit that referenced this issue Jul 30, 2018
…k it would need a little more in order to move this into "done"; will update the github issue with details. (#4865)
@landreev
Copy link
Contributor

Status update:
I checked in the initial implementation from last week.
It's working but I realized I need to add a little more before I consider it done:

  • I'm adding API tests;
  • I need to add some extra code for re-checking the mime type of the data file; for example, the uningested Stata v.14 files in the prod. db are listed as application/x-stata; but they need to be re-identified as application/x-stata-1[345] in order to be associated with the "new stata" ingest plugin.
  • The current system of running tab. ingest asynchronously hides all the status info from the caller. For example, if you try to ingest a file, but we skip it because of the size, this status info will only be available in the server log. Since this is the decision we make early on, before we attempt anything else, it should be easy to communicate this back to the API (or page user), and inform the user.

Also, I didn't like using the term "reingest" for this. Strictly speaking, "reingest" would be attempting to ingest something as tabular data AGAIN; i.e. running ingest on something that's not yet a tabular file. (I see a use case for that too - for example, if an updated ingest plugin fixes some ingest issues that need to be corrected...). So I would save the word "reingest" for that. I called this api call /api/files/{id}/ingestAsTabular
If anyone has a better name in mind/has strong opinions on this, please let me know.

@oscardssmith
Copy link
Contributor

Why wouldn't both of these cases be the same API? Maybe it should be called tryIngest or something with the behavior that if it is ingested, it will drop the data and re-ingest, and if not ingested it will try to complete an ingest.

landreev added a commit that referenced this issue Aug 1, 2018
…ype was not properly identified previously.

Added extra diagnostics (ref #4865)
@landreev
Copy link
Contributor

landreev commented Aug 1, 2018

Yeah, I decided to use the same API for both. And to call it "reingest" after all.
It will require a "force" option, in order to run it on a datafile that's already been ingested as tabular.

@landreev
Copy link
Contributor

landreev commented Aug 6, 2018

For QA: An easy way to upload stata, etc. files without ingesting is tabular is to set the limit (:TabularIngestSizeLimit) to something very low.
Then you can reset the limit, and test the API:

/api/files/{FILEID}/reingest

You have to be superuser.

@oscardssmith
Copy link
Contributor

Since ingest limits can change, should we remove the code that checks if the format is one that we've changed ingest process for?

@landreev
Copy link
Contributor

landreev commented Aug 6, 2018

Not sure I got that, sorry - remove what code?

@oscardssmith
Copy link
Contributor

Never mind, I must have imagined this functionality.

@landreev
Copy link
Contributor

landreev commented Aug 6, 2018

let me fix the already-ingested-as-tabular check real quick...

@landreev
Copy link
Contributor

landreev commented Aug 6, 2018

This is documented in the Native API guide.
(The formatting of the entire "Files" section of the Native API guide was a bit messed up; section headings were missing, etc. - but I fixed it)

@landreev landreev removed their assignment Aug 6, 2018
@kcondon kcondon closed this as completed Aug 6, 2018
@kcondon kcondon removed the Status: QA label Aug 6, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants