Creating composed models in FormRecognizer (Azure#15925)
samvaity authored Oct 14, 2020
1 parent ebdd414 commit 6eed58a
Showing 54 changed files with 5,358 additions and 171 deletions.
@@ -295,6 +295,7 @@
<suppress checks="com.azure.tools.checkstyle.checks.GoodLoggingCheck" files="com.azure.ai.formrecognizer.implementation.Utility.java" />
<suppress checks="com.azure.tools.checkstyle.checks.GoodLoggingCheck" files="com.azure.search.documents.implementation.util.FieldBuilder.java" />
<suppress checks="com.azure.tools.checkstyle.checks.GoodLoggingCheck" files="com.azure.search.documents.SearchFilter.java"/>
<suppress checks="com.azure.tools.checkstyle.checks.GoodLoggingCheck" files="com.azure.ai.formrecognizer.implementation.PrivateFieldAccessHelper.java" />

<suppress checks="com.azure.tools.checkstyle.checks.EnforceFinalFieldsCheck" files="com.azure.ai.formrecognizer.models.FieldValue" />
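
The new `GoodLoggingCheck` suppression for `PrivateFieldAccessHelper`, together with the `UWF_UNWRITTEN_FIELD` exclusions in the next hunk, suggests that the new model properties (such as `modelName`) have getters but no setters and are populated reflectively, so static analysis never sees a write. That reading is an inference from the names, not something this commit states. The sketch below illustrates the pattern with plain `java.lang.reflect`; `FieldWriterSketch` and `ModelInfoSketch` are hypothetical and are not the library's actual helper.

```java
import java.lang.reflect.Field;

// Hypothetical helper (not the SDK's PrivateFieldAccessHelper) that writes a
// private field by name using reflection.
final class FieldWriterSketch {
    private FieldWriterSketch() {
    }

    static void set(Object target, String fieldName, Object value) {
        try {
            Field field = target.getClass().getDeclaredField(fieldName);
            field.setAccessible(true); // allow writing a private field
            field.set(target, value);
        } catch (NoSuchFieldException | IllegalAccessException e) {
            throw new IllegalStateException("Unable to set field '" + fieldName + "'", e);
        }
    }

    public static void main(String[] args) {
        ModelInfoSketch info = new ModelInfoSketch();
        set(info, "modelName", "my composed model");
        System.out.println(info.getModelName()); // prints: my composed model
    }
}

// Hypothetical read-only model: nothing in this class assigns modelName,
// which is the situation SpotBugs reports as UWF_UNWRITTEN_FIELD.
final class ModelInfoSketch {
    private String modelName;

    public String getModelName() {
        return modelName;
    }
}
```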

@@ -1279,6 +1279,61 @@
<Bug pattern="SE_BAD_FIELD" />
</Match>

<Match>
<Class name="com.azure.ai.formrecognizer.training.models.CustomFormModel" />
<Field name="modelName" />
<Bug pattern="UWF_UNWRITTEN_FIELD" />
</Match>

<Match>
<Class name="com.azure.ai.formrecognizer.training.models.CustomFormModel" />
<Field name="customFormModelProperties" />
<Bug pattern="UWF_UNWRITTEN_FIELD" />
</Match>

<Match>
<Class name="com.azure.ai.formrecognizer.training.models.CustomFormModelInfo" />
<Field name="modelName" />
<Bug pattern="UWF_UNWRITTEN_FIELD" />
</Match>

<Match>
<Class name="com.azure.ai.formrecognizer.training.models.CustomFormModelInfo" />
<Field name="customFormModelProperties" />
<Bug pattern="UWF_UNWRITTEN_FIELD" />
</Match>

<Match>
<Class name="com.azure.ai.formrecognizer.training.models.CustomFormSubmodel" />
<Field name="modelId" />
<Bug pattern="UWF_UNWRITTEN_FIELD" />
</Match>

<Match>
<Class name="com.azure.ai.formrecognizer.training.models.CustomFormModelProperties" />
<Field name="isComposed" />
<Bug pattern="UWF_UNWRITTEN_FIELD" />
</Match>

<Match>
<Class name="com.azure.ai.formrecognizer.training.models.TrainingDocumentInfo" />
<Field name="modelId" />
<Bug pattern="UWF_UNWRITTEN_FIELD" />
</Match>

<Match>
<Class name="com.azure.ai.formrecognizer.models.RecognizedForm" />
<Field name="formTypeConfidence" />
<Bug pattern="UWF_UNWRITTEN_FIELD" />
</Match>

<Match>
<Class name="com.azure.ai.formrecognizer.models.RecognizedForm" />
<Field name="modelId" />
<Bug pattern="UWF_UNWRITTEN_FIELD" />
</Match>


<!-- Exception is required to catch, ref: code comment in the OrderbyRowComparer::compare() method -->
<Match>
<Class name="com.azure.cosmos.implementation.query.orderbyquery.OrderbyRowComparer"/>
17 changes: 13 additions & 4 deletions sdk/formrecognizer/azure-ai-formrecognizer/CHANGELOG.md
@@ -1,7 +1,16 @@
# Release History

## 3.1.0-beta.1 (Unreleased)
### New Features
- Added support for creating a composed model from the `FormTrainingClient` by calling `beginCreateComposedModel`.
- Added properties `modelName` and `customFormModelProperties` to types `CustomFormModel` and `CustomFormModelInfo`.
- Added property `modelName` to `TrainingOptions` and new type `CreateComposedModelOptions`.
- Added property `modelId` to `CustomFormSubmodel` and `TrainingDocumentInfo`.
- Added properties `modelId` and `formTypeConfidence` to `RecognizedForm`.
`modelId` has a null value on `RecognizedForm` for prebuilt APIs.

### Breaking changes
- Defaults to the latest supported API version, which currently is `2.1-preview.1`.

## 3.0.2 (2020-10-06)
### Dependency updates
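
With the `RecognizedForm` additions above, a caller can tell which model produced each form (`modelId`, null for prebuilt APIs) and how confident the service is in the chosen form type (`formTypeConfidence`). The sketch below groups results by producing model and flags low-confidence classifications. `FormResultSketch`, the `"prebuilt"` bucket, and the 0.7 threshold are illustrative assumptions; with the SDK you would read the values through `getFormTypeConfidence()` (used in the README change later in this commit) and, presumably, a matching `getModelId()` getter.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ComposedResultsSketch {
    // Bucket name used when modelId is null (results from prebuilt APIs); an assumption.
    static final String PREBUILT_KEY = "prebuilt";

    // Hypothetical stand-in for the values read from a RecognizedForm.
    static final class FormResultSketch {
        final String formType;
        final float formTypeConfidence;
        final String modelId; // null for prebuilt results

        FormResultSketch(String formType, float formTypeConfidence, String modelId) {
            this.formType = formType;
            this.formTypeConfidence = formTypeConfidence;
            this.modelId = modelId;
        }
    }

    // Groups forms by the model that produced them, preserving first-seen order.
    static Map<String, List<FormResultSketch>> groupByModel(List<FormResultSketch> forms) {
        Map<String, List<FormResultSketch>> groups = new LinkedHashMap<>();
        for (FormResultSketch form : forms) {
            String key = form.modelId == null ? PREBUILT_KEY : form.modelId;
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(form);
        }
        return groups;
    }

    // Returns forms whose form-type confidence is below the given threshold.
    static List<FormResultSketch> lowConfidence(List<FormResultSketch> forms, float threshold) {
        List<FormResultSketch> flagged = new ArrayList<>();
        for (FormResultSketch form : forms) {
            if (form.formTypeConfidence < threshold) {
                flagged.add(form);
            }
        }
        return flagged;
    }

    public static void main(String[] args) {
        List<FormResultSketch> forms = List.of(
            new FormResultSketch("invoice", 0.95f, "model-a"),
            new FormResultSketch("purchase-order", 0.55f, "model-b"),
            new FormResultSketch("receipt", 1.0f, null));
        System.out.println(groupByModel(forms).keySet()); // prints: [model-a, model-b, prebuilt]
        System.out.println(lowConfidence(forms, 0.7f).size()); // prints: 1
    }
}
```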
@@ -124,11 +133,11 @@ https://azure.github.io/azure-sdk/releases/latest/java.html.

- It uses the Form Recognizer service `v2.0-preview.1` API.
- Two-client design:
- `FormRecognizerClient` to analyze fields/values on custom forms, receipts, and form content/layout
- `FormTrainingClient` to train custom models (with/without labels), and manage the custom models on your account
- Different analyze methods based on input type: file stream or URL.
- URL input should use the method with suffix `fromUrl`
- Stream methods will automatically detect content-type of the input file if not provided.
- Authentication with API key supported using `AzureKeyCredential("<api_key>")` from `com.azure.core.credential`
- All service errors use the base type: `com.azure.ai.formrecognizer.models.ErrorResponseException`
- Reactive streams support using [Project Reactor](https://projectreactor.io/).
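
The changelog states that stream methods detect the input's content type when none is provided. A common way to do that is to check the file's leading signature ("magic") bytes. The sketch below does this for PDF, JPEG, PNG, and TIFF; the byte signatures are the standard ones, but the class name, the set of types, and returning an empty `Optional` for unrecognized input are assumptions, and the library's own detection may differ.

```java
import java.nio.charset.StandardCharsets;
import java.util.Optional;

final class ContentTypeSniffer {
    private ContentTypeSniffer() {
    }

    // True if data begins with the given signature (each value is an unsigned byte).
    private static boolean startsWith(byte[] data, int... signature) {
        if (data.length < signature.length) {
            return false;
        }
        for (int i = 0; i < signature.length; i++) {
            if ((data[i] & 0xFF) != signature[i]) {
                return false;
            }
        }
        return true;
    }

    // Guesses a content type from the first bytes of a document; empty if unrecognized.
    static Optional<String> detect(byte[] header) {
        if (startsWith(header, 0x25, 0x50, 0x44, 0x46)) { // "%PDF"
            return Optional.of("application/pdf");
        }
        if (startsWith(header, 0xFF, 0xD8, 0xFF)) { // JPEG start-of-image marker
            return Optional.of("image/jpeg");
        }
        if (startsWith(header, 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A)) { // PNG signature
            return Optional.of("image/png");
        }
        if (startsWith(header, 0x49, 0x49, 0x2A, 0x00)          // little-endian TIFF ("II*\0")
                || startsWith(header, 0x4D, 0x4D, 0x00, 0x2A)) { // big-endian TIFF ("MM\0*")
            return Optional.of("image/tiff");
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        byte[] pdfHeader = "%PDF-1.7".getBytes(StandardCharsets.US_ASCII);
        System.out.println(detect(pdfHeader).orElse("unknown")); // prints: application/pdf
    }
}
```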
34 changes: 22 additions & 12 deletions sdk/formrecognizer/azure-ai-formrecognizer/README.md
@@ -81,14 +81,14 @@ resource, or by running the following Azure CLI command to get the key from the
az cognitiveservices account keys list --resource-group <your-resource-group-name> --name <your-resource-name>
```
Use the API key as the credential parameter to authenticate the client:
<!-- embedme ./src/samples/java/com/azure/ai/formrecognizer/ReadmeSamples.java#L47-L50 -->
<!-- embedme ./src/samples/java/com/azure/ai/formrecognizer/ReadmeSamples.java#L49-L52 -->
```java
FormRecognizerClient formRecognizerClient = new FormRecognizerClientBuilder()
.credential(new AzureKeyCredential("{key}"))
.endpoint("{endpoint}")
.buildClient();
```
<!-- embedme ./src/samples/java/com/azure/ai/formrecognizer/ReadmeSamples.java#L57-L60 -->
<!-- embedme ./src/samples/java/com/azure/ai/formrecognizer/ReadmeSamples.java#L59-L62 -->
```java
FormTrainingClient formTrainingClient = new FormTrainingClientBuilder()
.credential(new AzureKeyCredential("{key}"))
@@ -98,7 +98,7 @@ FormTrainingClient formTrainingClient = new FormTrainingClientBuilder()

The Azure Form Recognizer client library provides a way to **rotate the existing key**.

<!-- embedme ./src/samples/java/com/azure/ai/formrecognizer/ReadmeSamples.java#L67-L73 -->
<!-- embedme ./src/samples/java/com/azure/ai/formrecognizer/ReadmeSamples.java#L69-L75 -->
```java
AzureKeyCredential credential = new AzureKeyCredential("{key}");
FormRecognizerClient formRecognizerClient = new FormRecognizerClientBuilder()
@@ -137,7 +137,7 @@ Authorization is easiest using [DefaultAzureCredential][wiki_identity]. It finds
running environment. For more information about using Azure Active Directory authorization with Form Recognizer, please
refer to [the associated documentation][aad_authorization].

<!-- embedme ./src/samples/java/com/azure/ai/formrecognizer/ReadmeSamples.java#L80-L84 -->
<!-- embedme ./src/samples/java/com/azure/ai/formrecognizer/ReadmeSamples.java#L82-L86 -->
```java
TokenCredential credential = new DefaultAzureCredentialBuilder().build();
FormRecognizerClient formRecognizerClient = new FormRecognizerClientBuilder()
@@ -168,6 +168,7 @@ A `CustomFormModel` is returned indicating the fields the model will extract, as
each field. See the [service's documents][fr_train_with_labels] for a more detailed explanation.
- Managing models created in your account. See example [Manage models](#manage-your-models).
- Copying a custom model from one Form Recognizer resource to another.
- Creating a composed model from a collection of existing trained models with labels.

Please note that models can also be trained using a graphical user interface such as the [Form Recognizer Labeling Tool][fr_labeling_tool].
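
The new bullet says a composed model is built from existing models trained with labels. A caller could check that requirement before starting the operation; the sketch below does so over a hypothetical `ModelSummarySketch` type. The `Status` values and the rule that every model must be `READY` are assumptions for illustration, and the service still performs its own validation.

```java
import java.util.ArrayList;
import java.util.List;

final class ComposeInputSketch {
    // Assumed model lifecycle states for this sketch.
    enum Status { CREATING, READY, INVALID }

    // Hypothetical summary of an existing custom model.
    static final class ModelSummarySketch {
        final String modelId;
        final boolean trainedWithLabels;
        final Status status;

        ModelSummarySketch(String modelId, boolean trainedWithLabels, Status status) {
            this.modelId = modelId;
            this.trainedWithLabels = trainedWithLabels;
            this.status = status;
        }
    }

    // Returns the IDs to pass to a compose call, or throws if any candidate is unsuitable.
    static List<String> modelIdsToCompose(List<ModelSummarySketch> candidates) {
        if (candidates.isEmpty()) {
            throw new IllegalArgumentException("At least one model is required");
        }
        List<String> ids = new ArrayList<>();
        for (ModelSummarySketch model : candidates) {
            if (!model.trainedWithLabels) {
                throw new IllegalArgumentException(model.modelId + " was not trained with labels");
            }
            if (model.status != Status.READY) {
                throw new IllegalArgumentException(model.modelId + " is not ready (" + model.status + ")");
            }
            ids.add(model.modelId);
        }
        return ids;
    }

    public static void main(String[] args) {
        List<String> ids = modelIdsToCompose(List.of(
            new ModelSummarySketch("model-a", true, Status.READY),
            new ModelSummarySketch("model-b", true, Status.READY)));
        System.out.println(ids); // prints: [model-a, model-b]
    }
}
```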

@@ -195,7 +196,7 @@ The following section provides several code snippets covering some of the most c
### Recognize Forms Using a Custom Model
Recognize name/value pairs and table data from forms. These models are trained with your own data,
so they're tailored to your forms. You should only recognize forms of the same form type that the custom model was trained on.
<!-- embedme ./src/samples/java/com/azure/ai/formrecognizer/ReadmeSamples.java#L88-L104 -->
<!-- embedme ./src/samples/java/com/azure/ai/formrecognizer/ReadmeSamples.java#L90-L107 -->
```java
String formUrl = "{form_url}";
String modelId = "{custom_trained_model_id}";
@@ -208,6 +209,7 @@ for (int i = 0; i < recognizedForms.size(); i++) {
RecognizedForm form = recognizedForms.get(i);
System.out.printf("----------- Recognized custom form info for page %d -----------%n", i);
System.out.printf("Form type: %s%n", form.getFormType());
System.out.printf("Form type confidence: %.2f%n", form.getFormTypeConfidence());
form.getFields().forEach((label, formField) ->
System.out.printf("Field %s has value %s with confidence score of %f.%n", label,
formField.getValueData().getText(),
@@ -218,7 +220,7 @@ for (int i = 0; i < recognizedForms.size(); i++) {

### Recognize Content
Recognize text and table structures, along with their bounding box coordinates, from documents.
<!-- embedme ./src/samples/java/com/azure/ai/formrecognizer/ReadmeSamples.java#L113-L136 -->
<!-- embedme ./src/samples/java/com/azure/ai/formrecognizer/ReadmeSamples.java#L116-L139 -->
```java
// recognize form content using file input stream
File form = new File("local/file_path/filename.png");
@@ -252,7 +254,7 @@ can be found [here][service_recognize_receipt].
See [StronglyTypedRecognizedForm][strongly_typed_sample] for a suggested approach to extract
information from receipts.

<!-- embedme ./src/samples/java/com/azure/ai/formrecognizer/ReadmeSamples.java#L140-L196-->
<!-- embedme ./src/samples/java/com/azure/ai/formrecognizer/ReadmeSamples.java#L143-L199-->
```java
String receiptUrl = "https://docs.microsoft.com/azure/cognitive-services/form-recognizer/media"
+ "/contoso-allinone.jpg";
@@ -317,16 +319,21 @@ for (int i = 0; i < receiptPageResults.size(); i++) {
Train a machine-learned model on your own form type. The resulting model will be able to recognize values from the types of forms it was trained on.
Provide a container SAS URL to your Azure Storage Blob container where you're storing the training documents. See details on setting this up
in the [service quickstart documentation][quickstart_training].
<!-- embedme ./src/samples/java/com/azure/ai/formrecognizer/ReadmeSamples.java#L200-L220 -->
<!-- embedme ./src/samples/java/com/azure/ai/formrecognizer/ReadmeSamples.java#L203-L229 -->
```java
String trainingFilesUrl = "{SAS_URL_of_your_container_in_blob_storage}";
SyncPoller<FormRecognizerOperationResult, CustomFormModel> trainingPoller =
formTrainingClient.beginTraining(trainingFilesUrl, false);
formTrainingClient.beginTraining(trainingFilesUrl,
false,
new TrainingOptions()
.setModelName("my model trained without labels"),
Context.NONE);

CustomFormModel customFormModel = trainingPoller.getFinalResult();

// Model Info
System.out.printf("Model Id: %s%n", customFormModel.getModelId());
System.out.printf("Model name given by user: %s%n", customFormModel.getModelName());
System.out.printf("Model Status: %s%n", customFormModel.getModelStatus());
System.out.printf("Training started on: %s%n", customFormModel.getTrainingStartedOn());
System.out.printf("Training completed on: %s%n%n", customFormModel.getTrainingCompletedOn());
@@ -335,6 +342,7 @@ System.out.println("Recognized Fields:");
// Loop through the submodels, which contain the fields they were trained on
// Since the given training documents are unlabeled, we still group them but they do not have a label.
customFormModel.getSubmodels().forEach(customFormSubmodel -> {
System.out.printf("Submodel Id: %s%n", customFormSubmodel.getModelId());
// Since the training data is unlabeled, we are unable to return the accuracy of this model
customFormSubmodel.getFields().forEach((field, customFormModelField) ->
System.out.printf("Field: %s Field Label: %s%n",
@@ -344,7 +352,7 @@ customFormModel.getSubmodels().forEach(customFormSubmodel -> {

### Manage your models
Manage the custom models in your Form Recognizer account.
<!-- embedme ./src/samples/java/com/azure/ai/formrecognizer/ReadmeSamples.java#L224-L252 -->
<!-- embedme ./src/samples/java/com/azure/ai/formrecognizer/ReadmeSamples.java#L233-L261 -->
```java
// First, we see how many custom models we have, and what our limit is
AccountProperties accountProperties = formTrainingClient.getAccountProperties();
@@ -385,7 +393,7 @@ to provide an invalid file source URL an `HttpResponseException` would be raised
In the following code snippet, the error is handled
gracefully by catching the exception and displaying additional information about the error.

<!-- embedme ./src/samples/java/com/azure/ai/formrecognizer/ReadmeSamples.java#L259-L263 -->
<!-- embedme ./src/samples/java/com/azure/ai/formrecognizer/ReadmeSamples.java#L268-L272 -->
```java
try {
formRecognizerClient.beginRecognizeContentFromUrl("invalidSourceUrl");
@@ -420,7 +428,7 @@ These code samples show common scenario operations with the Azure Form Recognize
#### Async APIs
All the examples shown so far have been using synchronous APIs, but we provide full support for async APIs as well.
You'll need to use `FormRecognizerAsyncClient`:
<!-- embedme ./src/samples/java/com/azure/ai/formrecognizer/ReadmeSamples.java#L270-L273 -->
<!-- embedme ./src/samples/java/com/azure/ai/formrecognizer/ReadmeSamples.java#L279-L282 -->
```java
FormRecognizerAsyncClient formRecognizerAsyncClient = new FormRecognizerClientBuilder()
.credential(new AzureKeyCredential("{key}"))
@@ -436,6 +444,7 @@ FormRecognizerAsyncClient formRecognizerAsyncClient = new FormRecognizerClientBu
* Train a model with labels: [TrainModelWithLabelsAsync][train_labeled_model_async]
* Manage custom models: [ManageCustomModelsAsync][manage_custom_models_async]
* Copy a model between Form Recognizer resources: [CopyModelAsync][copy_model_async]
* Create a composed model from a collection of models trained with labels: [CreateComposedModelAsync][create_composed_model_async]
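
Creating a composed model, like training, is a long-running operation: the synchronous README samples start it with a `begin*` method and then block on `getFinalResult()`. The sketch below shows the underlying poll-until-terminal idea using only the JDK. It is not azure-core's `SyncPoller` implementation; `OperationState`, the fixed interval, and the attempt limit are assumptions.

```java
import java.time.Duration;
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;

final class PollingSketch {
    // Assumed operation states for a long-running job.
    enum OperationState { NOT_STARTED, IN_PROGRESS, SUCCEEDED, FAILED }

    // Calls checkStatus until it reports a terminal state, sleeping between attempts.
    static OperationState pollUntilDone(Supplier<OperationState> checkStatus, Duration interval, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            OperationState state = checkStatus.get();
            if (state == OperationState.SUCCEEDED || state == OperationState.FAILED) {
                return state;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(interval.toMillis());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // keep the interrupt visible to callers
                    throw new IllegalStateException("Interrupted while polling", e);
                }
            }
        }
        throw new IllegalStateException("Operation not finished after " + maxAttempts + " attempts");
    }

    public static void main(String[] args) {
        // Simulated service responses for a training or compose operation.
        Iterator<OperationState> responses = List.of(
            OperationState.NOT_STARTED, OperationState.IN_PROGRESS, OperationState.SUCCEEDED).iterator();
        System.out.println(pollUntilDone(responses::next, Duration.ofMillis(10), 5)); // prints: SUCCEEDED
    }
}
```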

### Additional documentation

@@ -465,6 +474,7 @@ This project has adopted the [Microsoft Open Source Code of Conduct][coc]. For m
[coc_faq]: https://opensource.microsoft.com/codeofconduct/faq/
[coc_contact]: mailto:opencode@microsoft.com
[create_new_resource]: https://docs.microsoft.com/azure/cognitive-services/cognitive-services-apis-create-account?tabs=multiservice%2Cwindows#create-a-new-azure-cognitive-services-resource
[create_composed_model_async]: https://github.com/Azure/azure-sdk-for-java/blob/f90e98012c042179bbedec11b77614b62088d2e1/sdk/formrecognizer/azure-ai-formrecognizer/src/samples/java/com/azure/ai/formrecognizer/CreateComposedModelAsync.java
[differentiate_custom_forms_with_labeled_and_unlabeled_models]: https://github.com/Azure/azure-sdk-for-java/blob/master/sdk/formrecognizer/azure-ai-formrecognizer/src/samples/java/com/azure/ai/formrecognizer/AdvancedDiffLabeledUnlabeledData.java
[form_recognizer_account]: https://docs.microsoft.com/azure/cognitive-services/cognitive-services-apis-create-account?tabs=multiservice%2Cwindows
[form_recognizer_async_client]: https://github.com/Azure/azure-sdk-for-java/blob/master/sdk/formrecognizer/azure-ai-formrecognizer/src/main/java/com/azure/ai/formrecognizer/FormRecognizerAsyncClient.java
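
Most of the README edits in this commit only move `embedme` line ranges (for example, `#L47-L50` becomes `#L49-L52`), presumably because lines were added to `ReadmeSamples.java` above the referenced snippets. The sketch below shows how such a `path#Lstart-Lend` reference maps to a 1-based, inclusive slice of a file; `EmbedmeRangeSketch` and its parsing rules are illustrative and are not the embedme tool's implementation.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class EmbedmeRangeSketch {
    // Matches references such as "path/File.java#L49-L52".
    private static final Pattern REFERENCE = Pattern.compile("^(.+)#L(\\d+)-L(\\d+)$");

    // Returns the 1-based, inclusive line range {start, end} of a reference.
    static int[] parseRange(String reference) {
        Matcher m = REFERENCE.matcher(reference);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a line-range reference: " + reference);
        }
        int start = Integer.parseInt(m.group(2));
        int end = Integer.parseInt(m.group(3));
        if (start < 1 || end < start) {
            throw new IllegalArgumentException("Invalid range: " + reference);
        }
        return new int[] {start, end};
    }

    // Extracts the referenced lines from the file content.
    static List<String> slice(List<String> fileLines, String reference) {
        int[] range = parseRange(reference);
        if (range[1] > fileLines.size()) {
            throw new IllegalArgumentException("Range exceeds file length: " + reference);
        }
        return fileLines.subList(range[0] - 1, range[1]); // convert to 0-based, end-exclusive
    }

    public static void main(String[] args) {
        List<String> lines = List.of("a", "b", "c", "d", "e");
        System.out.println(slice(lines, "Sample.java#L2-L4")); // prints: [b, c, d]
    }
}
```

Adding two lines above a snippet shifts its reference by two, which matches the first few range changes in this README diff.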