Add support for Vision BatchAnnotateFiles endpoint #966

Merged
58 changes: 57 additions & 1 deletion docs/src/main/asciidoc/vision.adoc
@@ -7,7 +7,7 @@ Spring Cloud GCP provides:

* A convenience starter which automatically configures authentication settings and client objects needed to begin using the https://cloud.google.com/vision/[Google Cloud Vision API].
* `CloudVisionTemplate` which simplifies interactions with the Cloud Vision API.
- ** Allows you to easily send images to the API as Spring Resources.
+ ** Allows you to easily send images, PDF, TIFF and GIF documents to the API as Spring Resources.
** Offers convenience methods for common operations, such as classifying content of an image.
* `DocumentOcrTemplate` which offers convenient methods for running https://cloud.google.com/vision/docs/pdf[optical character recognition (OCR)] on PDF and TIFF documents.

@@ -120,6 +120,62 @@ public void processImage() {
}
----

=== File Analysis

The `CloudVisionTemplate` allows you to easily analyze PDF, TIFF and GIF documents; it provides the following method for interfacing with Cloud Vision:

`public AnnotateFileResponse analyzeFile(Resource fileResource, String mimeType, Feature.Type... featureTypes)`

**Parameters:**

- `Resource fileResource` refers to the Spring Resource of the PDF, TIFF or GIF object you wish to analyze.
Documents with more than 5 pages are not supported.

- `String mimeType` is the MIME type of `fileResource`.
Currently, only `application/pdf`, `image/tiff` and `image/gif` are supported.

- `Feature.Type... featureTypes` refers to a var-arg array of Cloud Vision Features to extract from the document.
A feature refers to a kind of image analysis one wishes to perform on a document, such as label detection, OCR, face detection, etc.
One may specify multiple features to analyze within one request.
A full list of Cloud Vision Features is provided in the https://cloud.google.com/vision/docs/features[Cloud Vision Feature docs].

**Returns:**

- https://cloud.google.com/vision/docs/reference/rpc/google.cloud.vision.v1#google.cloud.vision.v1.AnnotateFileResponse[`AnnotateFileResponse`] contains the results of all the feature analyses that were specified in the request.
For each page of the analyzed document, the response contains an `AnnotateImageResponse` object, which you can retrieve using `annotateFileResponse.getResponsesList()`.
For each feature type that you provide in the request, `AnnotateImageResponse` provides a getter method to get the result of that feature analysis.
For example, if you analyzed a PDF using the `DOCUMENT_TEXT_DETECTION` feature, you would retrieve the results from the response using `annotateImageResponse.getFullTextAnnotation().getText()`.
+
`AnnotateFileResponse` is provided by the Google Cloud Vision libraries; please consult the https://cloud.google.com/vision/docs/reference/rpc/google.cloud.vision.v1#google.cloud.vision.v1.AnnotateFileResponse[RPC reference] or https://googleapis.dev/java/google-cloud-vision/latest/index.html?com/google/cloud/vision/v1/AnnotateFileResponse.html[Javadoc] for more details.
Additionally, you may consult the https://cloud.google.com/vision/docs/[Cloud Vision docs] to familiarize yourself with the concepts and features of the API.

==== Running Text Detection Example

https://cloud.google.com/vision/docs/file-small-batch[Detect text in files] refers to extracting text from small documents such as PDF or TIFF files.
Below is a code sample of how this is done using the Cloud Vision Spring Template.

[source,java]
----
@Autowired
private ResourceLoader resourceLoader;

@Autowired
private CloudVisionTemplate cloudVisionTemplate;

public void processPdf() {
  Resource imageResource = this.resourceLoader.getResource("my_file.pdf");
  AnnotateFileResponse response =
      this.cloudVisionTemplate.analyzeFile(
          imageResource, "application/pdf", Type.DOCUMENT_TEXT_DETECTION);

  response
      .getResponsesList()
      .forEach(
          annotateImageResponse ->
              System.out.println(annotateImageResponse.getFullTextAnnotation().getText()));
}
----

=== Document OCR Template

The `DocumentOcrTemplate` allows you to easily run https://cloud.google.com/vision/docs/pdf[optical character recognition (OCR)] on your PDF and TIFF documents stored in your Google Storage bucket.
@@ -78,4 +78,15 @@ public ModelAndView extractText(String imageUrl, ModelMap map) {

return new ModelAndView("result", map);
}

@GetMapping("/extractTextFromPdf")
public ModelAndView extractTextFromPdf(String pdfUrl, ModelMap map) {
List<String> texts =
this.cloudVisionTemplate.extractTextFromPdf(this.resourceLoader.getResource(pdfUrl));

map.addAttribute("texts", texts);
map.addAttribute("pdfUrl", pdfUrl);

return new ModelAndView("result_pdf", map);
}
}
@@ -31,5 +31,17 @@ <h1>Text Extraction</h1>
</form>
</div>

<div>
<h1>Text Extraction PDF</h1>
<p>Read and extract the text from a small PDF (a maximum of 5 pages is supported):</p>
<form action="/extractTextFromPdf">
Web URL of a PDF to analyze:
<input type="text"
name="pdfUrl"
value="https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf" />
<input type="submit" />
</form>
</div>

</body>
</html>
@@ -0,0 +1,37 @@
<!DOCTYPE html>
<html xmlns:th="https://www.thymeleaf.org">
<head>
<title>Google Cloud Vision Results</title>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
</head>

<style>
.extracted_text {
width: 30em;
min-height: 10em;
font-family: "Lucida Console", Monaco, monospace;
background-color: #f2f2f2;
}

.embed_pdf {
width: 30em;
height: 30em;
}
</style>

<body>
<h1>PDF Analysis Results</h1>

<div th:if="${texts}">
<h3>We think the text inside the pdf is...</h3>
<div th:each="text: ${texts}">
<h2>Page: [[${textStat.index} + 1]]</h2>
<div class="extracted_text">[[${text}]]</div>
</div>
</div>

<div>
<embed th:src="${pdfUrl}" class="embed_pdf" />
</div>
</body>
</html>
@@ -16,15 +16,20 @@

package com.google.cloud.spring.vision;

import com.google.cloud.vision.v1.AnnotateFileRequest;
import com.google.cloud.vision.v1.AnnotateFileResponse;
import com.google.cloud.vision.v1.AnnotateImageRequest;
import com.google.cloud.vision.v1.AnnotateImageResponse;
import com.google.cloud.vision.v1.BatchAnnotateFilesRequest;
import com.google.cloud.vision.v1.BatchAnnotateFilesResponse;
import com.google.cloud.vision.v1.BatchAnnotateImagesRequest;
import com.google.cloud.vision.v1.BatchAnnotateImagesResponse;
import com.google.cloud.vision.v1.Feature;
import com.google.cloud.vision.v1.Feature.Type;
import com.google.cloud.vision.v1.Image;
import com.google.cloud.vision.v1.ImageAnnotatorClient;
import com.google.cloud.vision.v1.ImageContext;
import com.google.cloud.vision.v1.InputConfig;
import com.google.protobuf.ByteString;
import com.google.rpc.Code;
import java.io.IOException;
@@ -41,6 +46,11 @@
*/
public class CloudVisionTemplate {

public static final String READ_BYTES_ERROR_MESSAGE =
"Failed to read bytes from provided resource.";
public static final String EMPTY_RESPONSE_ERROR_MESSAGE =
"Failed to receive valid response Vision APIs; empty response received.";

private final ImageAnnotatorClient imageAnnotatorClient;

public CloudVisionTemplate(ImageAnnotatorClient imageAnnotatorClient) {
@@ -59,6 +69,17 @@ public String extractTextFromImage(Resource imageResource) {
return extractTextFromImage(imageResource, ImageContext.getDefaultInstance());
}

/**
* Extract the text out of a PDF and return the result as a list of strings, one per page.
*
* @param fileResource the PDF one wishes to analyze
* @return the text extracted from the PDF, one string per page
* @throws CloudVisionException if the file could not be read or if text extraction failed
*/
public List<String> extractTextFromPdf(Resource fileResource) {
return extractTextFromFile(fileResource, "application/pdf");
}
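
Because `extractTextFromPdf` returns one string per page, a caller that prefers a single string has to join the pages itself. The following is a minimal caller-side sketch, not part of `CloudVisionTemplate`; the class name `PageTextJoiner` and the sample page texts are hypothetical and stand in for the list the template would return.

```java
import java.util.List;

/** Hypothetical caller-side helper; not part of CloudVisionTemplate. */
final class PageTextJoiner {

  private PageTextJoiner() {}

  /** Joins per-page texts into one string, placing the separator between pages. */
  static String join(List<String> pageTexts, String separator) {
    return String.join(separator, pageTexts);
  }

  public static void main(String[] args) {
    // Stand-in for the list returned by extractTextFromPdf(resource).
    List<String> pages = List.of("First page text", "Second page text");
    System.out.println(join(pages, "\n"));
  }
}
```

Keeping the join on the caller side leaves page-by-page processing possible, while a single-page document still only needs `pages.get(0)`.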

/**
* Extract the text out of an image and return the result as a String.
*
@@ -78,6 +99,35 @@ public String extractTextFromImage(Resource imageResource, ImageContext imageCon
return result;
}

/**
* Extract the text out of a file and return the result as a list of strings, one per page.
*
* @param fileResource the file one wishes to analyze
* @param mimeType the MIME type of the fileResource. Currently, only "application/pdf",
* "image/tiff" and "image/gif" are supported.
* @return the text extracted from the file, one string per page
* @throws CloudVisionException if the file could not be read or if text extraction failed
*/
public List<String> extractTextFromFile(Resource fileResource, String mimeType) {
AnnotateFileResponse response =
analyzeFile(fileResource, mimeType, Type.DOCUMENT_TEXT_DETECTION);

List<AnnotateImageResponse> annotateImageResponses = response.getResponsesList();
Member:
The 10 lines of code below are the same as what we need to do for extractTextFromImage and should be refactored into a shared private utility method.

Member:
Would you be able to address this comment?

Contributor Author:
With the latest changes, the 10 lines below are no longer exactly the same. The return value is now a list of strings instead of a single string. Do you think there are still some lines left that are worth moving into a separate utility method?

Member:
This actually got me thinking: for the sake of simplicity of use, maybe we should just return a single concatenated string, separated by newlines, or maybe by a customizable separator that can be configured at the class level. What do you think?

Contributor Author:
In my opinion, this really depends on what the caller is doing with the extracted text. If you want to process the text page by page, it does not make much sense to merge it into one string. By returning a list of strings, you leave it to the caller to decide how to handle the response.

Member:
I was just wondering if the most common use case would be to process only 1 page anyway. In that case, a list would make it a bit more verbose to use. Which option seems more convenient for your use case, for example?

Contributor Author:
I would say that processing one-page PDFs is more common than processing multiple pages. In our current use case we need to be able to process both single-page and multi-page documents.

if (annotateImageResponses.isEmpty()) {
throw new CloudVisionException(EMPTY_RESPONSE_ERROR_MESSAGE);
}

List<String> result =
annotateImageResponses.stream()
.map(annotateImageResponse -> annotateImageResponse.getFullTextAnnotation().getText())
.collect(Collectors.toList());
if (result.isEmpty() && response.getError().getCode() != Code.OK.getNumber()) {
throw new CloudVisionException(response.getError().getMessage());
}

return result;
}
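
In `extractTextFromFile` as written above, `result` holds exactly one entry per element of `annotateImageResponses`, and that list has already been checked to be non-empty, so the `result.isEmpty()` condition in the final `if` cannot be true and the message from `response.getError()` is never surfaced. One possible arrangement is to inspect the error before checking for an empty list. The sketch below illustrates only that ordering; `FileResult` is a hypothetical stand-in for `AnnotateFileResponse`, and `IllegalStateException` stands in for `CloudVisionException`.

```java
import java.util.List;

/** Stand-alone sketch of an error-first check; all types here are hypothetical stand-ins. */
final class ErrorFirstCheck {

  private ErrorFirstCheck() {}

  /** Minimal stand-in for a file response: error code (0 means OK), message, page texts. */
  static final class FileResult {
    final int errorCode;
    final String errorMessage;
    final List<String> pageTexts;

    FileResult(int errorCode, String errorMessage, List<String> pageTexts) {
      this.errorCode = errorCode;
      this.errorMessage = errorMessage;
      this.pageTexts = pageTexts;
    }
  }

  /** Returns the page texts, reporting a service error before an empty response. */
  static List<String> pageTextsOrThrow(FileResult result) {
    // Check the error first so that its message is not masked by the empty-list check.
    if (result.errorCode != 0) {
      throw new IllegalStateException(result.errorMessage);
    }
    if (result.pageTexts.isEmpty()) {
      throw new IllegalStateException("empty response received");
    }
    return result.pageTexts;
  }
}
```

With this ordering, a failed request reports the service's own message, and the generic empty-response message is reserved for the case where no error was reported but no pages came back.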

/**
* Analyze an image and extract the features of the image specified by {@code featureTypes}.
*
@@ -117,7 +167,7 @@ public AnnotateImageResponse analyzeImage(
try {
imgBytes = ByteString.readFrom(imageResource.getInputStream());
} catch (IOException ex) {
- throw new CloudVisionException("Failed to read image bytes from provided resource.", ex);
+ throw new CloudVisionException(READ_BYTES_ERROR_MESSAGE, ex);
}

Image image = Image.newBuilder().setContent(imgBytes).build();
@@ -143,8 +193,60 @@ public AnnotateImageResponse analyzeImage(
if (!annotateImageResponses.isEmpty()) {
return annotateImageResponses.get(0);
} else {
- throw new CloudVisionException(
-     "Failed to receive valid response Vision APIs; empty response received.");
+ throw new CloudVisionException(EMPTY_RESPONSE_ERROR_MESSAGE);
}
}

/**
* Analyze a file and extract the features specified by {@code featureTypes}.
*
* <p>A feature describes the kind of Cloud Vision analysis one wishes to perform on a file, such
* as text detection, image labelling, facial detection, etc. A full list of feature types can be
* found in {@link Feature.Type}.
*
* @param fileResource the file one wishes to analyze. The Cloud Vision APIs support image formats
* described here: https://cloud.google.com/vision/docs/supported-files. Documents with more
* than 5 pages are not supported.
* @param mimeType the mime type of the fileResource. Currently, only "application/pdf",
* "image/tiff" and "image/gif" are supported.
* @param featureTypes the types of image analysis to perform on the file
* @return the results of the file analysis
* @throws CloudVisionException if the file could not be read or if a malformed response is
* received from the Cloud Vision APIs
*/
public AnnotateFileResponse analyzeFile(
Resource fileResource, String mimeType, Feature.Type... featureTypes) {
ByteString imgBytes;
try {
imgBytes = ByteString.readFrom(fileResource.getInputStream());
} catch (IOException ex) {
throw new CloudVisionException(READ_BYTES_ERROR_MESSAGE, ex);
}

InputConfig inputConfig =
InputConfig.newBuilder().setMimeType(mimeType).setContent(imgBytes).build();

List<Feature> featureList =
Arrays.stream(featureTypes)
.map(featureType -> Feature.newBuilder().setType(featureType).build())
.collect(Collectors.toList());

BatchAnnotateFilesRequest request =
BatchAnnotateFilesRequest.newBuilder()
.addRequests(
AnnotateFileRequest.newBuilder()
.addAllFeatures(featureList)
.setInputConfig(inputConfig)
.build())
.build();

BatchAnnotateFilesResponse response = this.imageAnnotatorClient.batchAnnotateFiles(request);
List<AnnotateFileResponse> annotateFileResponses = response.getResponsesList();

if (!annotateFileResponses.isEmpty()) {
return annotateFileResponses.get(0);
} else {
throw new CloudVisionException(EMPTY_RESPONSE_ERROR_MESSAGE);
}
}
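
Since `analyzeFile` takes the MIME type as a plain string and only `application/pdf`, `image/tiff` and `image/gif` are supported, a caller may want to derive the value from the file name and fail early on anything else. The following is a minimal sketch under that assumption; the class `VisionMimeTypes`, its method, and the extension table are hypothetical and not part of Spring Cloud GCP.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/** Hypothetical helper mapping file extensions to the MIME types analyzeFile accepts. */
final class VisionMimeTypes {

  // Extension table chosen for illustration; only the three MIME types come from the docs.
  private static final Map<String, String> BY_EXTENSION =
      Map.of(
          "pdf", "application/pdf",
          "tif", "image/tiff",
          "tiff", "image/tiff",
          "gif", "image/gif");

  private VisionMimeTypes() {}

  /** Returns the supported MIME type for the file name, or empty if there is none. */
  static Optional<String> forFilename(String filename) {
    int dot = filename.lastIndexOf('.');
    if (dot < 0 || dot == filename.length() - 1) {
      return Optional.empty(); // no extension present
    }
    String extension = filename.substring(dot + 1).toLowerCase(Locale.ROOT);
    return Optional.ofNullable(BY_EXTENSION.get(extension));
  }
}
```

A caller could then pass `VisionMimeTypes.forFilename(name).orElseThrow()` as the `mimeType` argument, so unsupported files are rejected before any request is sent.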
}