KafkaV2SourceConnector #38748
API change check: APIView has identified API-level changes in this PR and created the following API reviews: com.azure:azure-cosmos

/azp run java - cosmos - tests

Azure Pipelines successfully started running 1 pipeline(s).
Thanks Annie - looks great!
Flaky tests: openConnectionsAndInitCachesWithCosmosClient_And_PerContainerConnectionPoolSize_ThroughProactiveContainerInitConfig_WithTimeout

/check-enforcer override
This reverts commit 30835d9.

* Revert "KafkaV2SinkConnector (#38973)". This reverts commit 6c983ab.
* Revert "UsingTestContainerForKafkaIntegrationTests (#38884)". This reverts commit 12bec49.
* Revert "KafkaV2SourceConnector (#38748)". This reverts commit 30835d9.
* Revert one more change.
* Revert change.

Co-authored-by: annie-mac <xinlian@microsoft.com>
Feature request #38768
This PR adds the v2 version of the Kafka Cosmos DB source connector.

High-level design

In the V1 version, the source connector uses the Change Feed Processor (CFP) to read changes (push model). In V2, we have decided to go with a pull model for the following benefits:
`CosmosDBSourceConnector` is the major entry class. The most important methods to implement in this class are:

- `taskClass()` -> returns the source task class.
- `start()` -> called when the connector is started. As a dynamic connector, we need to monitor changes happening on the containers (containers being added/removed/recreated, container splits/merges). When changes are detected, task reconfiguration is requested so new task configs can be recalculated. A `MetadataMonitorThread` is started to monitor such changes.
- `taskConfigs()` -> returns a list of task configurations. Each task will try to copy data from 1...N feed ranges. All the feed ranges from all monitored containers are allocated to the task buckets (min(feedRanges, maxTasks)) in round-robin fashion; feed ranges from the same container are allocated into different task buckets.

Besides the data-copy task units, we have to maintain certain necessary metadata: the list of container RIDs being monitored and the effective feed ranges for each monitored container. This metadata is needed so we can resume from the correct offset when a split/merge happens. It is persisted in the `_cosmos.metadata.topic` topic, or any topic the customer has configured. The metadata is calculated when the connector starts (either first-time initialization or restart), and one of the tasks is assigned to write it to the metadata topic.
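The round-robin bucket allocation described above can be sketched as follows. This is a minimal illustration, not the actual connector code: the `String`-keyed map and the `"container/feedRange"` labels are placeholders for the real container and feed-range types.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class TaskBucketAllocation {
    // Distribute all feed ranges of all monitored containers into
    // min(totalFeedRanges, maxTasks) task buckets, round-robin.
    public static List<List<String>> allocate(Map<String, List<String>> feedRangesByContainer,
                                              int maxTasks) {
        int totalFeedRanges = feedRangesByContainer.values().stream()
                .mapToInt(List::size).sum();
        int bucketCount = Math.min(totalFeedRanges, maxTasks);
        List<List<String>> buckets = new ArrayList<>();
        for (int i = 0; i < bucketCount; i++) {
            buckets.add(new ArrayList<>());
        }
        int next = 0;
        // Walking container by container means consecutive feed ranges of the
        // same container land in consecutive (hence different) buckets.
        for (Map.Entry<String, List<String>> entry : feedRangesByContainer.entrySet()) {
            for (String feedRange : entry.getValue()) {
                buckets.get(next % bucketCount).add(entry.getKey() + "/" + feedRange);
                next++;
            }
        }
        return buckets;
    }
}
```

With 2 feed ranges and maxTasks = 4, only 2 buckets are created, so no task sits idle with an empty assignment.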
`CosmosSourceTask` is the main entry class for the Kafka task. As mentioned previously, each task needs to pull data for 1...N feed ranges, and we loop through those feed ranges in round-robin fashion.
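The per-task pull loop can be sketched like this. It is a simplified stand-in for the task's poll cycle, not the real `CosmosSourceTask`: `drainChanges` is a hypothetical placeholder for the actual change feed request against one feed range.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.function.Function;

public class RoundRobinPoller {
    private final Deque<String> feedRanges = new ArrayDeque<>();
    // Placeholder: fetches the pending changes for one feed range.
    private final Function<String, List<String>> drainChanges;

    public RoundRobinPoller(List<String> assignedFeedRanges,
                            Function<String, List<String>> drainChanges) {
        this.feedRanges.addAll(assignedFeedRanges);
        this.drainChanges = drainChanges;
    }

    // Each poll serves the head feed range, then re-queues it at the tail,
    // so every assigned range is drained in turn.
    public List<String> poll() {
        String next = feedRanges.poll();
        if (next == null) {
            return List.of();
        }
        List<String> records = drainChanges.apply(next);
        feedRanges.offer(next);
        return records;
    }
}
```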
Tests
Config
- `kafka.connect.cosmos.accountEndpoint` -> No default. Cosmos DB account endpoint URI.
- `kafka.connect.cosmos.accountKey` -> No default. Cosmos DB account key.
- `kafka.connect.cosmos.useGatewayMode` -> Default `false`. Flag to indicate whether to use gateway mode.
- `kafka.connect.cosmos.preferredRegionsList` -> Default empty list. Preferred regions list to be used for a multi-region Cosmos DB account. This is a comma-separated value (e.g. `[East US, West US]` or `East US, West US`); the provided preferred regions are used as a hint. You should use a Kafka cluster collocated with your Cosmos DB account and pass the Kafka cluster region as the preferred region. See the list of Azure regions here.
- `kafka.connect.cosmos.applicationName` -> Default empty string. Will be added as the userAgent suffix.
- `kafka.connect.cosmos.source.database.name` -> No default. Cosmos DB database name.
- `kafka.connect.cosmos.source.containers.includeAll` -> Default `false`. Flag to indicate whether to read from all containers.
- `kafka.connect.cosmos.source.containers.includedList` -> Default empty list. Containers to be monitored. This config is ignored if `kafka.connect.cosmos.source.containers.includeAll` is `true`.
- `kafka.connect.cosmos.source.containers.topicMap` -> Default empty list. A comma-delimited list of Kafka topics mapped to Cosmos containers, for example `topic1#con1,topic2#con2`. By default, the container name is used as the name of the Kafka topic to publish data to; use this property to override the default.
- `kafka.connect.cosmos.source.changeFeed.startFrom` -> Default `Beginning`. Change feed start-from setting: `Now`, `Beginning`, or a certain point in time (UTC), for example `2020-02-10T14:15:03`.
- `kafka.connect.cosmos.source.changeFeed.mode` -> Default `LatestVersion`. Change feed mode (`LatestVersion` or `AllVersionsAndDeletes`).
- `kafka.connect.cosmos.source.changeFeed.maxItemCountHint` -> Default `1000`. The maximum number of documents returned in a single change feed request. The number of items received might be higher than the specified value if multiple items are changed by the same transaction.
- `kafka.connect.cosmos.source.metadata.poll.delay.ms` -> Default `300000` (5 minutes). The interval for polling for metadata changes (container splits/merges, containers being added/removed/recreated). When changes are detected, the tasks are reconfigured.
- `kafka.connect.cosmos.source.metadata.storage.topic` -> Default `_cosmos.metadata.topic`. The name of the topic where the metadata is stored. The metadata topic is created if it does not already exist; otherwise the pre-created topic is used.
- `kafka.connect.cosmos.source.messageKey.enabled` -> Default `true`. Whether to set the Kafka record message key.
- `kafka.connect.cosmos.source.messageKey.field` -> Default `id`. The field to use as the message key.
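Putting the configs above together, a source connector registration submitted to the Kafka Connect REST API might look like the sketch below. All values are illustrative; the connector class name is taken from the file paths in this PR, and the placeholders in angle brackets must be replaced with real account details.

```json
{
  "name": "cosmosdb-source-connector-v2",
  "config": {
    "connector.class": "com.azure.cosmos.kafka.connect.CosmosDBSourceConnector",
    "tasks.max": "4",
    "kafka.connect.cosmos.accountEndpoint": "https://<account>.documents.azure.com:443/",
    "kafka.connect.cosmos.accountKey": "<account-key>",
    "kafka.connect.cosmos.source.database.name": "sampleDatabase",
    "kafka.connect.cosmos.source.containers.includedList": "con1,con2",
    "kafka.connect.cosmos.source.containers.topicMap": "topic1#con1,topic2#con2",
    "kafka.connect.cosmos.source.changeFeed.startFrom": "Beginning",
    "kafka.connect.cosmos.source.changeFeed.mode": "LatestVersion"
  }
}
```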