Add support for Vision BatchAnnotateFiles endpoint #966
Thanks for the contribution! It looks very reasonable. I just added a few minor comments.
spring-cloud-gcp-vision/src/main/java/com/google/cloud/spring/vision/CloudVisionTemplate.java
public String extractTextFromFile(Resource fileResource, String mimeType) {
  AnnotateFileResponse response = analyzeFile(fileResource, mimeType, Type.TEXT_DETECTION);

  List<AnnotateImageResponse> annotateImageResponses = response.getResponsesList();
The 10 lines of code below are the same as what we need to do for extractTextFromImage and should be refactored into a shared private utility method.
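A minimal sketch of the kind of refactoring suggested here, under stated assumptions: `PageResponse` is a hypothetical stand-in for the Vision client's response type, and `textOrThrow` is an invented helper name, not code from this PR. It only illustrates both extraction methods delegating the error check and text lookup to one private method.

```java
import java.util.List;
import java.util.stream.Collectors;

public class SharedExtractionSketch {

  /** Hypothetical stand-in for a per-image or per-page response; not the real Vision type. */
  static final class PageResponse {
    final String text;
    final String errorMessage; // null when processing succeeded

    PageResponse(String text, String errorMessage) {
      this.text = text;
      this.errorMessage = errorMessage;
    }
  }

  /** Image case: a single response is expected. */
  static String extractTextFromImage(PageResponse response) {
    return textOrThrow(response);
  }

  /** File case: one response per page, each handled by the same helper. */
  static List<String> extractTextFromFile(List<PageResponse> responses) {
    return responses.stream()
        .map(SharedExtractionSketch::textOrThrow)
        .collect(Collectors.toList());
  }

  /** Shared private logic: fail on an error response, otherwise return its text. */
  private static String textOrThrow(PageResponse response) {
    if (response.errorMessage != null) {
      throw new IllegalStateException("Text extraction failed: " + response.errorMessage);
    }
    return response.text;
  }
}
```

Whether the file variant returns one string or one string per page is a separate design choice; the shared helper works for either.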
Would you be able to address this comment?
With the latest changes, the 10 lines below are no longer exactly the same: the return value is now a list of strings instead of a single string. Do you think there are still some lines left that are worth moving into a separate utility method?
This actually got me thinking: for simplicity of use, maybe we should just return a single concatenated string, separated by newlines or by a customizable separator that can be configured at the class level. What do you think?
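To make the proposal concrete, here is a rough sketch of a class-level separator. The class name `JoinedTextSketch` and the method `joinPages` are hypothetical and do not come from the PR; the per-page texts are assumed to be already extracted.

```java
import java.util.List;

public class JoinedTextSketch {

  // Separator configured once per instance; newline is the assumed default.
  private final String separator;

  public JoinedTextSketch() {
    this("\n");
  }

  public JoinedTextSketch(String separator) {
    this.separator = separator;
  }

  /** Concatenates the per-page texts into a single string using the configured separator. */
  public String joinPages(List<String> pageTexts) {
    return String.join(separator, pageTexts);
  }
}
```

With this shape, a single-page document yields just that page's text, since there is nothing to join.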
In my opinion, this really depends on what the caller is doing with the extracted text. If you want to process the text page by page, it does not make much sense to merge it into one string. Returning a list of strings leaves it up to the caller how to handle the response.
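As an illustration of this argument, the sketch below assumes the extraction method returns one string per page. `PerPageCallerSketch` and its methods are hypothetical caller code, not part of the PR. One caller works page by page, while another merges the pages itself.

```java
import java.util.List;

public class PerPageCallerSketch {

  /** Caller that needs per-page handling: label each page with its 1-based number. */
  static String labelPages(List<String> pageTexts) {
    StringBuilder out = new StringBuilder();
    for (int i = 0; i < pageTexts.size(); i++) {
      out.append("Page ").append(i + 1).append(": ").append(pageTexts.get(i)).append('\n');
    }
    return out.toString();
  }

  /** Caller that only wants one block of text can merge the pages itself. */
  static String mergePages(List<String> pageTexts) {
    return String.join("\n", pageTexts);
  }

  public static void main(String[] args) {
    List<String> pages = List.of("first page", "second page");
    System.out.print(labelPages(pages));
    System.out.println(mergePages(pages));
  }
}
```

Going from a list to a single string is a one-line join, whereas recovering page boundaries from an already merged string is not generally possible.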
I was just wondering whether the most common use case would be to process only one page anyway. In that case, a list would make the method a bit more verbose to use. Which option seems more convenient for your use case, for example?
I would say that processing one-page PDFs is more common than processing multi-page ones. In our current use case we need to be able to process both single-page and multi-page documents.
Thanks for contributing! The code looks good in general.
I only have one concern about extractTextFromFile(): the file can have multiple pages. Details are in the comment below.
Additionally, I would suggest adding an integration test or a case in the existing sample code if possible.
spring-cloud-gcp-vision/src/main/java/com/google/cloud/spring/vision/CloudVisionTemplate.java
...-cloud-gcp-vision/src/test/java/com/google/cloud/spring/vision/CloudVisionTemplateTests.java
I enhanced the |
I think the code looks good, except for one minor comment that's unresolved.
If it's not too much to ask, two more things would be ideal:
- Can you add an integration test for the sample endpoint?
- Reference documentation in vision.adoc should be updated to describe the new functionality.
Thanks!
The added sample looks great. I made a few minor comments below.
Can you please also update the existing Vision doc to describe the newly added functionality in CloudVisionTemplate? That way it is discoverable by more users.
spring-cloud-gcp-vision/src/main/java/com/google/cloud/spring/vision/CloudVisionTemplate.java
spring-cloud-gcp-vision/src/main/java/com/google/cloud/spring/vision/CloudVisionTemplate.java
...ng-cloud-gcp-samples/spring-cloud-gcp-vision-api-sample/src/main/resources/static/index.html
@florian-3ap Thank you so much for this contribution and your patience with the reviews!
Kudos, SonarCloud Quality Gate passed!
Doc addition looks great to me! Thank you for contributing!
…rm#966) Add support for Vision BatchAnnotateFiles endpoint fixes GoogleCloudPlatform#844
fixes #844