Add support for Vision BatchAnnotateFiles endpoint (GoogleCloudPlatfo…

…rm#966) Add support for Vision BatchAnnotateFiles endpoint fixes GoogleCloudPlatform#844
kateryna216 · Mar 3, 2022 · acbe264 · acbe264
1 parent b77c070
commit acbe264
Showing 10 changed files with 342 additions and 7 deletions.
diff --git a/docs/src/main/asciidoc/vision.adoc b/docs/src/main/asciidoc/vision.adoc
@@ -7,7 +7,7 @@ Spring Cloud GCP provides:
 
 * A convenience starter which automatically configures authentication settings and client objects needed to begin using the https://cloud.google.com/vision/[Google Cloud Vision API].
 * `CloudVisionTemplate` which simplifies interactions with the Cloud Vision API.
-** Allows you to easily send images to the API as Spring Resources.
+** Allows you to easily send images, PDF, TIFF and GIF documents to the API as Spring Resources.
 ** Offers convenience methods for common operations, such as classifying content of an image.
 * `DocumentOcrTemplate` which offers convenient methods for running https://cloud.google.com/vision/docs/pdf[optical character recognition (OCR)] on PDF and TIFF documents.
 
@@ -120,6 +120,62 @@ public void processImage() {
 }
 ----
 
+=== File Analysis
+
+The `CloudVisionTemplate` allows you to easily analyze PDF, TIFF and GIF documents; it provides the following method for interfacing with Cloud Vision:
+
+`public AnnotateFileResponse analyzeFile(Resource fileResource, String mimeType, Feature.Type... featureTypes)`
+
+**Parameters:**
+
+- `Resource fileResource` refers to the Spring Resource of the PDF, TIFF or GIF object you wish to analyze.
+Documents with more than 5 pages are not supported.
+
+- `String mimeType` is the mime type of the fileResource.
+Currently, only `application/pdf`, `image/tiff` and `image/gif` are supported.
+
+- `Feature.Type... featureTypes` refers to a var-arg array of Cloud Vision Features to extract from the document.
+A feature refers to a kind of image analysis one wishes to perform on a document, such as label detection, OCR, facial detection, etc.
+One may specify multiple features to analyze within one request.
+A full list of Cloud Vision Features is provided in the https://cloud.google.com/vision/docs/features[Cloud Vision Feature docs].
+
+**Returns:**
+
+- https://cloud.google.com/vision/docs/reference/rpc/google.cloud.vision.v1#google.cloud.vision.v1.AnnotateFileResponse[`AnnotateFileResponse`] contains the results of all the feature analyses that were specified in the request.
+For each page of the analysed document the response will contain an `AnnotateImageResponse` object which you can retrieve using `annotateFileResponse.getResponsesList()`.
+For each feature type that you provide in the request, `AnnotateImageResponse` provides a getter method to get the result of that feature analysis.
+For example, if you analysed a PDF using the `DOCUMENT_TEXT_DETECTION` feature, you would retrieve the results from the response using `annotateImageResponse.getFullTextAnnotation().getText()`.
++
+`AnnotateFileResponse` is provided by the Google Cloud Vision libraries; please consult the https://cloud.google.com/vision/docs/reference/rpc/google.cloud.vision.v1#google.cloud.vision.v1.AnnotateFileResponse[RPC reference] or https://googleapis.dev/java/google-cloud-vision/latest/index.html?com/google/cloud/vision/v1/AnnotateFileResponse.html[Javadoc] for more details.
+Additionally, you may consult the https://cloud.google.com/vision/docs/[Cloud Vision docs] to familiarize yourself with the concepts and features of the API.
+
+==== Running Text Detection Example
+
+https://cloud.google.com/vision/docs/file-small-batch[Detect text in files] refers to extracting text from small documents such as PDF or TIFF files.
+Below is a code sample of how this is done using `CloudVisionTemplate`.
+
+[source,java]
+----
+@Autowired
+private ResourceLoader resourceLoader;
+
+@Autowired
+private CloudVisionTemplate cloudVisionTemplate;
+
+public void processPdf() {
+  Resource fileResource = this.resourceLoader.getResource("my_file.pdf");
+  AnnotateFileResponse response =
+    this.cloudVisionTemplate.analyzeFile(
+        fileResource, "application/pdf", Type.DOCUMENT_TEXT_DETECTION);
+
+  response
+    .getResponsesList()
+    .forEach(
+        annotateImageResponse ->
+            System.out.println(annotateImageResponse.getFullTextAnnotation().getText()));
+}
+----
+
 === Document OCR Template
 
 The `DocumentOcrTemplate` allows you to easily run https://cloud.google.com/vision/docs/pdf[optical character recognition (OCR)] on your PDF and TIFF documents stored in your Google Storage bucket.
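
The `mimeType` argument of `analyzeFile` has to be supplied by the caller. A minimal sketch of one way to derive it from a file name follows. The class `VisionMimeTypes` and its method `fromFileName` are hypothetical and not part of this commit, and the extension mapping is an assumption limited to the three types the documentation above lists as supported.

```java
import java.util.Locale;
import java.util.Optional;

/** Hypothetical helper (not part of this commit): picks a mimeType argument from a file name. */
final class VisionMimeTypes {

  private VisionMimeTypes() {}

  /**
   * Returns the mime type for a file name, or empty if the extension does not map to one of
   * the three types listed as supported: application/pdf, image/tiff and image/gif.
   */
  static Optional<String> fromFileName(String fileName) {
    int dot = fileName.lastIndexOf('.');
    if (dot < 0 || dot == fileName.length() - 1) {
      return Optional.empty(); // no extension to inspect
    }
    // Compare case-insensitively so "scan.TIFF" and "scan.tiff" behave the same.
    String extension = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
    switch (extension) {
      case "pdf":
        return Optional.of("application/pdf");
      case "tif":
      case "tiff":
        return Optional.of("image/tiff");
      case "gif":
        return Optional.of("image/gif");
      default:
        return Optional.empty(); // anything else is treated as unsupported
    }
  }
}
```

A caller could pass the returned value as the `mimeType` argument of `analyzeFile` and reject the file when the result is empty.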

diff --git a/...amples/spring-cloud-gcp-vision-api-sample/src/main/java/com/example/VisionController.java b/...amples/spring-cloud-gcp-vision-api-sample/src/main/java/com/example/VisionController.java
@@ -78,4 +78,15 @@ public ModelAndView extractText(String imageUrl, ModelMap map) {
 
     return new ModelAndView("result", map);
   }
+
+  @GetMapping("/extractTextFromPdf")
+  public ModelAndView extractTextFromPdf(String pdfUrl, ModelMap map) {
+    List<String> texts =
+        this.cloudVisionTemplate.extractTextFromPdf(this.resourceLoader.getResource(pdfUrl));
+
+    map.addAttribute("texts", texts);
+    map.addAttribute("pdfUrl", pdfUrl);
+
+    return new ModelAndView("result_pdf", map);
+  }
 }
diff --git a/...cloud-gcp-samples/spring-cloud-gcp-vision-api-sample/src/main/resources/static/index.html b/...cloud-gcp-samples/spring-cloud-gcp-vision-api-sample/src/main/resources/static/index.html
@@ -31,5 +31,17 @@ <h1>Text Extraction</h1>
     </form>
 </div>
 
+<div>
+    <h1>Text Extraction PDF</h1>
+    <p>Read and extract the text from a small PDF (a maximum of 5 pages is supported):</p>
+    <form action="/extractTextFromPdf">
+        Web URL of a PDF to analyze:
+        <input type="text"
+               name="pdfUrl"
+               value="https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf" />
+        <input type="submit" />
+    </form>
+</div>
+
 </body>
 </html>
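
The form above notes that at most 5 pages are supported. One possible way to plan around that limit for longer documents, sketched here under assumptions, is to group 1-based page numbers into batches of five. The class `PageBatches` is hypothetical and not part of this commit. The commit's `analyzeFile` method has no parameter for selecting pages, so each batch would have to reach a request some other way, for example by splitting the PDF beforehand or through the Vision API's page selection if the client library in use exposes it.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of this commit): groups page numbers into batches of five. */
final class PageBatches {

  /** The per-request page limit stated in the commit's documentation. */
  static final int MAX_PAGES_PER_REQUEST = 5;

  private PageBatches() {}

  /** Splits pages 1..totalPages into consecutive batches of at most MAX_PAGES_PER_REQUEST. */
  static List<List<Integer>> split(int totalPages) {
    if (totalPages < 0) {
      throw new IllegalArgumentException("totalPages must not be negative");
    }
    List<List<Integer>> batches = new ArrayList<>();
    for (int first = 1; first <= totalPages; first += MAX_PAGES_PER_REQUEST) {
      // The last batch may be shorter than the limit.
      int last = Math.min(first + MAX_PAGES_PER_REQUEST - 1, totalPages);
      List<Integer> batch = new ArrayList<>();
      for (int page = first; page <= last; page++) {
        batch.add(page);
      }
      batches.add(batch);
    }
    return batches;
  }
}
```

For a 12-page document, `PageBatches.split(12)` yields three batches: pages 1 to 5, pages 6 to 10, and pages 11 to 12.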
diff --git a/...p-samples/spring-cloud-gcp-vision-api-sample/src/main/resources/templates/result_pdf.html b/...p-samples/spring-cloud-gcp-vision-api-sample/src/main/resources/templates/result_pdf.html
@@ -0,0 +1,37 @@
+<!DOCTYPE html>
+<html xmlns:th="https://www.thymeleaf.org">
+  <head>
+    <title>Google Cloud Vision Results</title>
+    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
+  </head>
+
+  <style>
+    .extracted_text {
+      width: 30em;
+      min-height: 10em;
+      font-family: "Lucida Console", Monaco, monospace;
+      background-color: #f2f2f2;
+    }
+
+    .embed_pdf {
+      width: 30em;
+      height: 30em;
+    }
+  </style>
+
+  <body>
+    <h1>PDF Analysis Results</h1>
+
+    <div th:if="${texts}">
+      <h3>We think the text inside the pdf is...</h3>
+      <div th:each="text: ${texts}">
+        <h2>Page: [[${textStat.index} + 1]]</h2>
+        <div class="extracted_text">[[${text}]]</div>
+      </div>
+    </div>
+
+    <div>
+      <embed th:src="${pdfUrl}" class="embed_pdf" />
+    </div>
+  </body>
+</html>
diff --git a/...ng-cloud-gcp-vision/src/main/java/com/google/cloud/spring/vision/CloudVisionTemplate.java b/...ng-cloud-gcp-vision/src/main/java/com/google/cloud/spring/vision/CloudVisionTemplate.java
@@ -16,15 +16,20 @@
 
 package com.google.cloud.spring.vision;
 
+import com.google.cloud.vision.v1.AnnotateFileRequest;
+import com.google.cloud.vision.v1.AnnotateFileResponse;
 import com.google.cloud.vision.v1.AnnotateImageRequest;
 import com.google.cloud.vision.v1.AnnotateImageResponse;
+import com.google.cloud.vision.v1.BatchAnnotateFilesRequest;
+import com.google.cloud.vision.v1.BatchAnnotateFilesResponse;
 import com.google.cloud.vision.v1.BatchAnnotateImagesRequest;
 import com.google.cloud.vision.v1.BatchAnnotateImagesResponse;
 import com.google.cloud.vision.v1.Feature;
 import com.google.cloud.vision.v1.Feature.Type;
 import com.google.cloud.vision.v1.Image;
 import com.google.cloud.vision.v1.ImageAnnotatorClient;
 import com.google.cloud.vision.v1.ImageContext;
+import com.google.cloud.vision.v1.InputConfig;
 import com.google.protobuf.ByteString;
 import com.google.rpc.Code;
 import java.io.IOException;
@@ -41,6 +46,11 @@
  */
 public class CloudVisionTemplate {
 
+  public static final String READ_BYTES_ERROR_MESSAGE =
+      "Failed to read bytes from provided resource.";
+  public static final String EMPTY_RESPONSE_ERROR_MESSAGE =
+      "Failed to receive valid response Vision APIs; empty response received.";
+
   private final ImageAnnotatorClient imageAnnotatorClient;
 
   public CloudVisionTemplate(ImageAnnotatorClient imageAnnotatorClient) {
@@ -59,6 +69,17 @@ public String extractTextFromImage(Resource imageResource) {
     return extractTextFromImage(imageResource, ImageContext.getDefaultInstance());
   }
 
+  /**
+   * Extract the text out of a pdf and return the result as a list of Strings, one per page.
+   *
+   * @param fileResource the pdf one wishes to analyze
+   * @return the text extracted from the pdf, one string per page
+   * @throws CloudVisionException if the pdf could not be read or if text extraction failed
+   */
+  public List<String> extractTextFromPdf(Resource fileResource) {
+    return extractTextFromFile(fileResource, "application/pdf");
+  }
+
   /**
    * Extract the text out of an image and return the result as a String.
    *
@@ -78,6 +99,35 @@ public String extractTextFromImage(Resource imageResource, ImageContext imageCon
     return result;
   }
 
+  /**
+   * Extract the text out of a file and return the result as a list of Strings, one per page.
+   *
+   * @param fileResource the file one wishes to analyze
+   * @param mimeType the mime type of the fileResource. Currently, only "application/pdf",
+   *     "image/tiff" and "image/gif" are supported.
+   * @return the text extracted from the file, one string per page
+   * @throws CloudVisionException if the file could not be read or if text extraction failed
+   */
+  public List<String> extractTextFromFile(Resource fileResource, String mimeType) {
+    AnnotateFileResponse response =
+        analyzeFile(fileResource, mimeType, Type.DOCUMENT_TEXT_DETECTION);
+
+    List<AnnotateImageResponse> annotateImageResponses = response.getResponsesList();
+    if (annotateImageResponses.isEmpty()) {
+      throw new CloudVisionException(EMPTY_RESPONSE_ERROR_MESSAGE);
+    }
+
+    List<String> result =
+        annotateImageResponses.stream()
+            .map(annotateImageResponse -> annotateImageResponse.getFullTextAnnotation().getText())
+            .collect(Collectors.toList());
+    if (result.isEmpty() && response.getError().getCode() != Code.OK.getNumber()) {
+      throw new CloudVisionException(response.getError().getMessage());
+    }
+
+    return result;
+  }
+
   /**
    * Analyze an image and extract the features of the image specified by {@code featureTypes}.
    *
@@ -117,7 +167,7 @@ public AnnotateImageResponse analyzeImage(
     try {
       imgBytes = ByteString.readFrom(imageResource.getInputStream());
     } catch (IOException ex) {
-      throw new CloudVisionException("Failed to read image bytes from provided resource.", ex);
+      throw new CloudVisionException(READ_BYTES_ERROR_MESSAGE, ex);
     }
 
     Image image = Image.newBuilder().setContent(imgBytes).build();
@@ -143,8 +193,60 @@ public AnnotateImageResponse analyzeImage(
     if (!annotateImageResponses.isEmpty()) {
       return annotateImageResponses.get(0);
     } else {
-      throw new CloudVisionException(
-          "Failed to receive valid response Vision APIs; empty response received.");
+      throw new CloudVisionException(EMPTY_RESPONSE_ERROR_MESSAGE);
+    }
+  }
+
+  /**
+   * Analyze a file and extract the features of the file specified by {@code featureTypes}.
+   *
+   * <p>A feature describes the kind of Cloud Vision analysis one wishes to perform on a file, such
+   * as text detection, image labelling, facial detection, etc. A full list of feature types can be
+   * found in {@link Feature.Type}.
+   *
+   * @param fileResource the file one wishes to analyze. The Cloud Vision APIs support image formats
+   *     described here: https://cloud.google.com/vision/docs/supported-files. Documents with more
+   *     than 5 pages are not supported.
+   * @param mimeType the mime type of the fileResource. Currently, only "application/pdf",
+   *     "image/tiff" and "image/gif" are supported.
+   * @param featureTypes the types of image analysis to perform on the file
+   * @return the results of the file analysis
+   * @throws CloudVisionException if the file could not be read or if a malformed response is
+   *     received from the Cloud Vision APIs
+   */
+  public AnnotateFileResponse analyzeFile(
+      Resource fileResource, String mimeType, Feature.Type... featureTypes) {
+    ByteString imgBytes;
+    try {
+      imgBytes = ByteString.readFrom(fileResource.getInputStream());
+    } catch (IOException ex) {
+      throw new CloudVisionException(READ_BYTES_ERROR_MESSAGE, ex);
+    }
+
+    InputConfig inputConfig =
+        InputConfig.newBuilder().setMimeType(mimeType).setContent(imgBytes).build();
+
+    List<Feature> featureList =
+        Arrays.stream(featureTypes)
+            .map(featureType -> Feature.newBuilder().setType(featureType).build())
+            .collect(Collectors.toList());
+
+    BatchAnnotateFilesRequest request =
+        BatchAnnotateFilesRequest.newBuilder()
+            .addRequests(
+                AnnotateFileRequest.newBuilder()
+                    .addAllFeatures(featureList)
+                    .setInputConfig(inputConfig)
+                    .build())
+            .build();
+
+    BatchAnnotateFilesResponse response = this.imageAnnotatorClient.batchAnnotateFiles(request);
+    List<AnnotateFileResponse> annotateFileResponses = response.getResponsesList();
+
+    if (!annotateFileResponses.isEmpty()) {
+      return annotateFileResponses.get(0);
+    } else {
+      throw new CloudVisionException(EMPTY_RESPONSE_ERROR_MESSAGE);
     }
   }
 }
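
In `extractTextFromFile` above, the check `result.isEmpty() && response.getError().getCode() != Code.OK.getNumber()` runs only after the method has already thrown for an empty response list, and `result` holds one entry per response, so the check appears never to fire. A sketch of an error-first ordering follows. `FileResult` is a hypothetical stand-in for `AnnotateFileResponse`, `IllegalStateException` stands in for `CloudVisionException`, and none of these names come from the commit.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch (not part of this commit): inspect the error before the page list. */
final class ErrorFirstExtraction {

  private ErrorFirstExtraction() {}

  /** Stand-in for a file-level response: per-page texts plus a status (code 0 means OK). */
  static final class FileResult {
    final List<String> pageTexts;
    final int errorCode;
    final String errorMessage;

    FileResult(List<String> pageTexts, int errorCode, String errorMessage) {
      this.pageTexts = pageTexts;
      this.errorCode = errorCode;
      this.errorMessage = errorMessage;
    }
  }

  /** Returns one string per page, surfacing a reported error before anything else. */
  static List<String> extractTexts(FileResult result) {
    // Check the reported status first so the caller sees the service's own message.
    if (result.errorCode != 0) {
      throw new IllegalStateException(result.errorMessage);
    }
    // Only then treat a missing page list as an empty response.
    if (result.pageTexts.isEmpty()) {
      throw new IllegalStateException("empty response received");
    }
    return new ArrayList<>(result.pageTexts);
  }
}
```

With this ordering, a response that carries an error and no pages reports the error message rather than the generic empty-response message.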