-
Notifications
You must be signed in to change notification settings - Fork 500
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
4865 file reingest api #4914
4865 file reingest api #4914
Conversation
try { | ||
dataFile = findDataFileOrDie(id); | ||
} catch (WrappedResponse ex) { | ||
dataFile = null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not just throw the error here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was in case findDataFileOrDie() returns null, without throwing an exception.
Which I guess it should never be the case. As the name of the method suggests. :)
} | ||
|
||
dataFile.getIngestRequest().setForceTypeCheck(true); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't this just be ingestrequest?
@@ -503,6 +503,89 @@ public int compare(DataFile d1, DataFile d2) { | |||
} | |||
} | |||
|
|||
public String startIngestJobForSingleFile(DataFile dataFile, AuthenticatedUser user) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that we have this function, should we simplify startIngestJobs
to be
public void startIngestJobs(Dataset dataset, AuthenticatedUser user) {
for (DataFile dataFile : dataset.getFiles()) {
if (dataFile.isIngestScheduled()) {
startIngestJobForSingleFile(dataFile, user);
}
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, no - that would result in a different behavior. When a user uploads 10 ingestable files, startIngestJobs will schedule the ingest as one queued JMS job, for the entire batch of 10 files. We want to keep the dataset locked for as long as the entire batch is being processed. With 10 individual queued jobs this would be harder to control. The UNF recalculation would be slightly trickier too; there may be other potential issues.
At the same time, these 2 methods do have more duplicated code than strictly necessary. I could definitely get away with using the old one (startIngestJobs) for both ingest and reingest, with just slight modifications. I just felt like writing a method from scratch, for some reason.
As a reviewer, if you feel this is ugly/potentially confusing to a future developer, I'm willing to quickly do that and merge these 2 methods into one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If they were both nice small functions, I'd say don't bother, but this adds another ~100 lines of duplicate complicated code, so I think it's probably a good idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, done. There's now a startIngestJobs(); and an extra small startIngestJobsForDataset(), that doesn't duplicate any code.
// typed as "application/x-stata". So we want to run the new-and-improved | ||
// DTA check that will re-identify the specific DTA flavor. | ||
// If similar cases are introduced in the future, the affected formats will | ||
// need to be added to the list. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not just retest all? it's not expensive (I think).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I think you're right on this. It could actually be useful to retest everything; in case the type detection code has been improved since the last ingest attempt.
New Contributors
Welcome! New contributors should at least glance at CONTRIBUTING.md, especially the section on pull requests where we encourage you to reach out to other developers before you start coding. Also, please note that we measure code coverage and prefer you write unit tests. Pull requests can still be reviewed without tests or completion of the checklist outlined below. Thanks!
Related Issues
Pull Request Checklist