Added API to inject custom serializer for point read and query #38997

@FabianMeiswinkel FabianMeiswinkel commented Feb 28, 2024

Description

The goal of this PR is to add a new API that allows customers to extend the built-in serialization. Today, the Cosmos DB client has multiple methods where the customer can specify the POJO type of the documents/items to be returned. For example...

// Copyright (c) Microsoft Corporation. All rights reserved.
// Licensed under the MIT License.
package com.azure.cosmos;

public class CosmosAsyncContainer {
    [...]

    /**
     * Reads an item by itemId.
     * <br/>
     * This operation is used to retrieve a single item from a container based on its unique identifier (ID) and partition key.
     * The readItem operation provides direct access to a specific item using its unique identifier, which consists of the item's ID and the partition key value. This operation is efficient for retrieving a known item by its ID and partition key without the need for complex querying.
     * <p>
     * After subscription the operation will be performed.
     * The {@link Mono} upon successful completion will contain an item response with the read item.
     * <!-- src_embed com.azure.cosmos.CosmosAsyncContainer.readItem -->
     * <pre>
     * &#47;&#47; Read an item
     * cosmosAsyncContainer.readItem&#40;passenger.getId&#40;&#41;, new PartitionKey&#40;passenger.getId&#40;&#41;&#41;, Passenger.class&#41;
     *     .flatMap&#40;response -&gt; Mono.just&#40;response.getItem&#40;&#41;&#41;&#41;
     *     .subscribe&#40;passengerItem -&gt; System.out.println&#40;passengerItem&#41;, throwable -&gt; &#123;
     *         CosmosException cosmosException = &#40;CosmosException&#41; throwable;
     *         cosmosException.printStackTrace&#40;&#41;;
     *     &#125;&#41;;
     * &#47;&#47; ...
     * </pre>
     * <!-- end com.azure.cosmos.CosmosAsyncContainer.readItem -->
     *
     * @param <T> the type parameter.
     * @param itemId the item id.
     * @param partitionKey the partition key.
     * @param itemType the item type.
     * @return a {@link Mono} containing the Cosmos item response with the read item or an error.
     */
    public <T> Mono<CosmosItemResponse<T>> readItem(String itemId, PartitionKey partitionKey, Class<T> itemType) {
        return readItem(itemId, partitionKey, ModelBridgeInternal.createCosmosItemRequestOptions(partitionKey), itemType);
    }

    /**
     * Reads an item by itemId using a configured {@link CosmosItemRequestOptions}.
     * <br/>
     * This operation is used to retrieve a single item from a container based on its unique identifier (ID) and partition key.
     * The readItem operation provides direct access to a specific item using its unique identifier, which consists of the item's ID and the partition key value. This operation is efficient for retrieving a known item by its ID and partition key without the need for complex querying.
     * <p>
     * After subscription the operation will be performed.
     * The {@link Mono} upon successful completion will contain a Cosmos item response with the read item.
     *
     * @param <T> the type parameter.
     * @param itemId the item id.
     * @param partitionKey the partition key.
     * @param options the request (Optional) {@link CosmosItemRequestOptions}.
     * @param itemType the item type.
     * @return a {@link Mono} containing the Cosmos item response with the read item or an error.
     */
    public <T> Mono<CosmosItemResponse<T>> readItem(
        String itemId, PartitionKey partitionKey,
        CosmosItemRequestOptions options, Class<T> itemType) {
        [...]
    }

    [...]
}

Today, serialization into the POJO type happens with Jackson's ObjectMapper and preset serialization settings. The only way around this right now is to use com.fasterxml.jackson.databind.node.ObjectNode as the POJO type. This still does the JSON parsing with the predefined settings, but customers can use their own ObjectMapper to customize the actual serialization/deserialization. Even this approach has drawbacks: custom payload transformations are not easily possible, for example putting the JSON payload into an envelope, extracting id, partition key and maybe some queryable properties, and compressing the remaining part. So the goal of this PR is to add an API that allows those payload transformations and also allows overriding serialization settings per operation or client-wide.

The limitation that the JSON parsing settings cannot be modified is maintained. This is needed to make sure that the JSON payload stored in the database is readable in the service. Cosmos DB is, and stays, a JSON database after all.

CosmosItemSerializer trait

A Cosmos DB item is represented as Map<String, Object>. Why a generic map rather than Jackson's ObjectNode or similar? The goal is to allow serialization to use different JSON stacks (Gson instead of Jackson, etc.), so an abstraction of a JSON tree is needed, ideally one for which efficient conversions exist. Map<String, Object> is a reasonably good compromise here.

// Copyright (c) Microsoft Corporation. All rights reserved.
// Licensed under the MIT License.

package com.azure.cosmos;

/**
 * The {@link CosmosItemSerializer} allows customizing the serialization of Cosmos Items - either to transform payload (for
 * example wrap/unwrap in custom envelopes) or use custom serialization settings or json serializer stacks.
 */
public abstract class CosmosItemSerializer {
    private final static ObjectMapper objectMapper = Utils.getSimpleObjectMapper();

    /**
     * Gets the default Cosmos item serializer. This serializer is used by default when no custom serializer is
     * specified on request options or the {@link CosmosClientBuilder}
     */
    public final static CosmosItemSerializer DEFAULT_SERIALIZER = new DefaultCosmosItemSerializer();

    /**
     * Used to instantiate subclasses
     */
    protected CosmosItemSerializer() {
    }

    /**
     * Used to serialize a POJO into a json tree
     * @param item the POJO to be serialized
     * @return the json tree that will be used as payload in Cosmos DB items
     * @param <T> The type of the POJO
     */
    public abstract <T> Map<String, Object> serialize(T item);

    /**
     * Used to deserialize the json tree stored in the Cosmos DB item as a POJO
     * @param jsonNodeMap the json tree from the Cosmos DB item
     * @param classType The type of the POJO
     * @return The deserialized POJO
     * @param <T> The type of the POJO
     */
    public abstract <T> T deserialize(Map<String, Object> jsonNodeMap, Class<T> classType);
}
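
To make the envelope use case concrete, here is a minimal sketch of a serializer that keeps id at the top level and nests everything else under a payload property. It is not taken from the PR: ItemSerializerSketch is a local stand-in that only mirrors the two abstract methods shown above (so the example compiles without the SDK), and Passenger, EnvelopeSerializer and the "payload" property name are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Local stand-in mirroring the two abstract methods of the CosmosItemSerializer
// shown above, so this sketch compiles without the SDK on the classpath.
abstract class ItemSerializerSketch {
    public abstract <T> Map<String, Object> serialize(T item);
    public abstract <T> T deserialize(Map<String, Object> jsonNodeMap, Class<T> classType);
}

// Hypothetical POJO used only for illustration.
final class Passenger {
    final String id;
    final String name;

    Passenger(String id, String name) {
        this.id = id;
        this.name = name;
    }
}

// Hypothetical serializer: "id" stays at the top level, the rest goes under "payload".
final class EnvelopeSerializer extends ItemSerializerSketch {
    @Override
    public <T> Map<String, Object> serialize(T item) {
        Passenger p = (Passenger) item; // the sketch handles a single type only
        Map<String, Object> payload = new LinkedHashMap<>();
        payload.put("name", p.name);

        Map<String, Object> envelope = new LinkedHashMap<>();
        envelope.put("id", p.id);
        envelope.put("payload", payload);
        return envelope;
    }

    @Override
    public <T> T deserialize(Map<String, Object> jsonNodeMap, Class<T> classType) {
        @SuppressWarnings("unchecked")
        Map<String, Object> payload = (Map<String, Object>) jsonNodeMap.get("payload");
        Passenger p = new Passenger((String) jsonNodeMap.get("id"), (String) payload.get("name"));
        return classType.cast(p); // throws ClassCastException for any other requested type
    }
}
```

A real implementation would extend CosmosItemSerializer itself and would typically delegate the POJO-to-map conversion to a JSON library instead of copying fields by hand.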

Per-client serialization defaults

In the CosmosClientBuilder a new API is added, which allows specifying the default serialization options. If not specified, the default serialization settings are the same as the preset settings today.

// Copyright (c) Microsoft Corporation. All rights reserved.
// Licensed under the MIT License.
package com.azure.cosmos;

@ServiceClientBuilder(serviceClients = {CosmosClient.class, CosmosAsyncClient.class})
public class CosmosClientBuilder implements
    TokenCredentialTrait<CosmosClientBuilder>,
    AzureKeyCredentialTrait<CosmosClientBuilder>,
    EndpointTrait<CosmosClientBuilder> {
    /**
     * Sets a custom serializer that should be used for conversion between POJOs and Json payload stored in the
     * Cosmos DB service. The custom serializer can also be specified in request options. If defined here and
     * in request options the serializer defined in request options will be used.
     * @param serializer the custom serializer to be used for payload transformations
     * @return current CosmosClientBuilder
     */
    public CosmosClientBuilder setCustomSerializer(CosmosItemSerializer serializer) {
        this.defaultCustomSerializer = serializer;

        return this;
    }

    [...]
}

Per-operation serialization overrides

Like many other configuration settings, the serialization behavior can be overridden in request options. When no serialization settings are specified in request options, the default of the client is used. If specified, the request options always override whatever is defined at the client level.

The request-option level override of serialization settings is possible in the following classes:

  • com.azure.cosmos.models.CosmosItemRequestOptions (all point operations - including patch)
  • com.azure.cosmos.models.CosmosQueryRequestOptions (for queries and read all)
  • com.azure.cosmos.models.CosmosChangeFeedRequestOptions (for change feed pull model)
  • com.azure.cosmos.models.CosmosReadManyRequestOptions (for read many API)
  • com.azure.cosmos.models.CosmosBatchRequestOptions (for transactional batch)
  • com.azure.cosmos.models.CosmosBulkExecutionOptions (for bulk operations)

All of the above request options would expose a getCustomSerializer and setCustomSerializer method like below...

// Copyright (c) Microsoft Corporation. All rights reserved.
// Licensed under the MIT License.
package com.azure.cosmos.models;

public class CosmosItemRequestOptions {

    /**
     * Gets the custom item serializer defined for this instance of request options
     * @return the custom item serializer
     */
    public CosmosItemSerializer getCustomSerializer() {
        return this.customSerializer;
    }

    /**
     * Allows specifying a custom item serializer to be used for this operation. If the serializer
     * on the request options is null, the serializer on CosmosClientBuilder is used. If both serializers
     * are null (the default), an internal Jackson ObjectMapper is used for serialization/deserialization.
     * @param itemSerializerOverride the custom item serializer for this operation
     * @return the CosmosItemRequestOptions.
     */
    public CosmosItemRequestOptions setCustomSerializer(CosmosItemSerializer itemSerializerOverride) {
        this.customSerializer = itemSerializerOverride;

        return this;
    }

    [...]
}
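
The precedence described above (request options first, then the client-level setting, then the built-in default) can be sketched as a small resolution function. This only illustrates the rule as stated in the description; the SDK's internal resolution code is not shown in this PR, and SerializerRef and SerializerResolution are hypothetical names.

```java
// Hypothetical stand-in for a serializer; only object identity matters in this sketch.
final class SerializerRef {
    final String name;

    SerializerRef(String name) {
        this.name = name;
    }
}

final class SerializerResolution {
    // Stand-in for CosmosItemSerializer.DEFAULT_SERIALIZER.
    static final SerializerRef DEFAULT = new SerializerRef("default");

    // Request options win over the client-level setting; null means "not set".
    static SerializerRef effective(SerializerRef fromRequestOptions, SerializerRef fromClientBuilder) {
        if (fromRequestOptions != null) {
            return fromRequestOptions;
        }
        if (fromClientBuilder != null) {
            return fromClientBuilder;
        }
        return DEFAULT;
    }
}
```

With this rule, passing null on the request options is the same as not overriding at all: the operation falls back to whatever was configured on the client.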

Sequencing in azure-cosmos-encryption

Custom serialization can also be used with azure-cosmos-encryption. One question worth calling out here is the order of processing:

  • OriginalPayload --> Encryption --> Custom serialization or OriginalPayload --> custom serialization --> encryption

To keep the encryption policy configuration based on the structure in the database, the latter order (OriginalPayload --> custom serialization --> encryption) is used. If the custom serialization wraps the payload in an envelope, for example, the encryption policy is based on the structure in the database (with the envelope), so the custom serialization needs to be applied first, before any encryption.
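
The chosen order can be illustrated with a toy write pipeline. Everything here is a hypothetical sketch: wrap stands in for a custom serializer that adds an envelope, and encryptProperty is a placeholder that swaps a value for a marker string (it performs no real cryptography and is not the azure-cosmos-encryption API). The point is only that the property named in the policy ("payload" in the usage below) exists only after the envelope has been applied.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class WritePipelineSketch {
    // Step 1 (hypothetical custom serialization): wrap the original payload in an envelope.
    static Map<String, Object> wrap(Map<String, Object> original) {
        Map<String, Object> envelope = new LinkedHashMap<>();
        envelope.put("id", original.get("id"));
        envelope.put("payload", original);
        return envelope;
    }

    // Step 2 (toy stand-in for encryption, NOT real cryptography): replace the value of
    // one top-level property of the stored structure with a placeholder marker.
    static Map<String, Object> encryptProperty(Map<String, Object> stored, String property) {
        Map<String, Object> result = new LinkedHashMap<>(stored);
        if (result.containsKey(property)) {
            result.put(property, "<encrypted>");
        }
        return result;
    }

    // Order described above: custom serialization first, then encryption.
    static Map<String, Object> write(Map<String, Object> original, String encryptedProperty) {
        return encryptProperty(wrap(original), encryptedProperty);
    }
}
```

Calling WritePipelineSketch.write(original, "payload") replaces the whole envelope payload while leaving the top-level id readable. Had the placeholder encryption run before wrap, a policy naming "payload" would have found nothing to protect.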

All SDK Contribution checklist:

  • The pull request does not introduce [breaking changes]
  • CHANGELOG is updated for new features, bug fixes or other significant changes.
  • I have read the contribution guidelines.

General Guidelines and Best Practices

  • Title of the pull request is clear and informative.
  • There are a small number of commits, each of which have an informative message. This means that previously merged commits do not appear in the history of the PR. For more information on cleaning up the commits in your PR, see this page.

Testing Guidelines

  • Pull request includes test coverage for the included changes.

@FabianMeiswinkel (Member Author)

/azp run java - cosmos - tests

Azure Pipelines successfully started running 1 pipeline(s).

azure-sdk commented Feb 28, 2024

API change check

APIView has identified API level changes in this PR and created following API reviews.

com.azure:azure-cosmos
com.azure:azure-cosmos-encryption

@FabianMeiswinkel (Member Author)

/azp run java - cosmos - tests

@FabianMeiswinkel (Member Author)

/azp run java - cosmos - spark

Azure Pipelines successfully started running 1 pipeline(s).

@xinlian12 (Member) left a comment

LGTM, great changes, thanks

FabianMeiswinkel and others added 8 commits April 22, 2024 21:24
…xception.java

Co-authored-by: Abhijeet Mohanty <mabhijeet1995@gmail.com>
…mos/encryption/CosmosEncryptionAsyncContainer.java

Co-authored-by: Annie Liang <64233642+xinlian12@users.noreply.github.com>
…mos/encryption/CosmosEncryptionAsyncContainer.java

Co-authored-by: Annie Liang <64233642+xinlian12@users.noreply.github.com>
@FabianMeiswinkel (Member Author)

/azp run java - cosmos - tests

@FabianMeiswinkel (Member Author)

/azp run java - cosmos - spark

Azure Pipelines successfully started running 1 pipeline(s).

@FabianMeiswinkel FabianMeiswinkel merged commit 98ea926 into Azure:main Apr 23, 2024
83 checks passed
mssfang added a commit that referenced this pull request Apr 23, 2024
* Added API to inject custom serializer for point read and query (#38997)

* Added API to inject custom serializer for point read and query

* Update CosmosItemSerializer.java

* Adding CreateItemCustomSerializer

* Update DatabaseAccount.java

* Update JsonSerializable.java

* Fixing primitive serialization

* Update PrimitiveJsonNodeMap.java

* Update ItemBulkOperation.java

* Iterating on fix

* Fixing build issues

* Update CosmosItemSerializer.java

* Fixing test failures

* Attempting to fix flakiness of Change feed split tests

* Update Utils.java

* Iterating on test coverage

* Iterating on test coverage

* Update Utils.java

* Update EncryptionUtils.java

* Iterating on test coverage

* Update Utils.java

* Added upsert to test coverage

* Update ImplementationBridgeHelpers.java

* Update Utils.java

* Added Bulk test coverage

* Update ItemBulkOperation.java

* Update CosmosItemSerializer.java

* Adding batch and change feed test coverage

* Update ItemBatchOperation.java

* Update SimpleSerializationTest.java

* Update ParallelDocumentQueryTest.java

* Adding remaining test coverage

* Update CosmosEncryptionItemSerializerTest.java

* Added change log

* Update TestSuiteBase.java

* Update TestSuiteBase.java

* Fixing JavaDoc break

* NITs

* Update CosmosEncryptionAsyncClient.java

* Fixing test failures

* Exclduing netty-all in Spark 3.1 and 3.2

* Reacting to code review feeedback

* Update sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/CosmosException.java

Co-authored-by: Abhijeet Mohanty <mabhijeet1995@gmail.com>

* Update sdk/cosmos/azure-cosmos-encryption/src/main/java/com/azure/cosmos/encryption/CosmosEncryptionAsyncContainer.java

Co-authored-by: Annie Liang <64233642+xinlian12@users.noreply.github.com>

* Update sdk/cosmos/azure-cosmos-encryption/src/main/java/com/azure/cosmos/encryption/CosmosEncryptionAsyncContainer.java

Co-authored-by: Annie Liang <64233642+xinlian12@users.noreply.github.com>

* Reacting to code review feedback

* Update FeedResponse.java

---------

Co-authored-by: Abhijeet Mohanty <mabhijeet1995@gmail.com>
Co-authored-by: Annie Liang <64233642+xinlian12@users.noreply.github.com>

@@ -252,29 +243,61 @@ public void remove(String propertyName) {
  * @param value the value of the property.
  */
 @SuppressWarnings({"unchecked", "rawtypes"})
-public <T> void set(String propertyName, T value) {
+public <T> void set(String propertyName, T value, CosmosItemSerializer itemSerializer) {

We found this is a breaking change when upgrading azure-cosmos from 4.58 to 4.59, since it requires an extra parameter, though it's always CosmosItemSerializer.DEFAULT_SERIALIZER.
Was this API breaking change omitted from the release notes?

Member Author

It is not a breaking change, because this is an internal API. The implementation package is not exported and is not part of the public surface area. Breaking changes in the implementation package can and will happen rather frequently, even in hotfix versions, and it is simply a bug to take a direct dependency on it. Modules are not technically enforced in Java 8, but more recent Java versions would even prevent your app from using these APIs.
