
Support for JSONConverter in sink connector #71

Merged · 15 commits into master from 62-kafka-connect-data-format · May 10, 2022

Conversation

@tomblench tomblench (Contributor) commented Apr 13, 2022

Checklist

  • Tick to sign-off your agreement to the Developer Certificate of Origin (DCO) 1.1
  • Added tests for code changes or test/build only changes
  • Updated the change log file (CHANGES.md|CHANGELOG.md) or test/build only changes
  • Completed the PR template below:

Description

See #62

Approach

Support JSONConverter by expecting values from Kafka sink tasks to be either a Java `Map` or a Kafka `Struct`:

  • If `Map`, pass through to `batchWrite` directly
  • If `Struct`, convert to a `Map` using the new `StructToMapConverter` (sketched below)

Add documentation in README.
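
A minimal sketch of that branching, for illustration only; the `structToMap` method here is a simplified stand-in for the new `StructToMapConverter` (it recurses into nested Structs but ignores arrays and logical types):

```java
import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.connect.data.Field;
import org.apache.kafka.connect.data.Struct;
import org.apache.kafka.connect.errors.ConnectException;
import org.apache.kafka.connect.sink.SinkRecord;

public class SinkValueSketch {

    /** Normalise a sink record value into a Map suitable for batchWrite. */
    @SuppressWarnings("unchecked")
    static Map<String, Object> toMap(SinkRecord record) {
        Object value = record.value();
        if (value instanceof Map) {
            // JsonConverter without a schema envelope delivers a plain java.util.Map: pass through
            return (Map<String, Object>) value;
        } else if (value instanceof Struct) {
            // schema-backed converters deliver a Struct: convert it field by field
            return structToMap((Struct) value);
        }
        throw new ConnectException("Unsupported record value type: "
                + (value == null ? "null" : value.getClass().getName()));
    }

    /** Simplified stand-in for StructToMapConverter: copies each schema field into a Map. */
    static Map<String, Object> structToMap(Struct struct) {
        Map<String, Object> map = new HashMap<>();
        for (Field field : struct.schema().fields()) {
            Object fieldValue = struct.get(field);
            map.put(field.name(),
                    fieldValue instanceof Struct ? structToMap((Struct) fieldValue) : fieldValue);
        }
        return map;
    }
}
```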

Schema & API Changes

  • "No change"

Security and Privacy

  • "No change"

Testing

See added StructToMapConverterTests
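
As an illustration of the kind of assertion such tests make (not the actual test code; this reuses the hypothetical `structToMap` sketch from the Approach section above):

```java
import static org.junit.Assert.assertEquals;

import java.util.Map;

import org.apache.kafka.connect.data.Schema;
import org.apache.kafka.connect.data.SchemaBuilder;
import org.apache.kafka.connect.data.Struct;
import org.junit.Test;

public class StructToMapSketchTest {

    @Test
    public void convertsSimpleStruct() {
        // build a small schema-backed Struct and check it round-trips into a Map
        Schema schema = SchemaBuilder.struct()
                .field("_id", Schema.STRING_SCHEMA)
                .field("count", Schema.INT32_SCHEMA)
                .build();
        Struct struct = new Struct(schema)
                .put("_id", "doc1")
                .put("count", 42);

        Map<String, Object> map = SinkValueSketch.structToMap(struct);

        assertEquals("doc1", map.get("_id"));
        assertEquals(42, map.get("count"));
    }
}
```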

Monitoring and Logging

Added slf4j-simple to view log output when running tests

@tomblench tomblench force-pushed the 62-kafka-connect-data-format branch from 28305f7 to f03cd16 on April 26, 2022 07:35
@@ -116,6 +112,7 @@ public static JSONArray batchWrite(Map<String, String> props, JSONArray data)
result.put(jsonResult);
}
} catch (Exception e) {
LOG.error("Exception caught in batchWrite()", e);
Contributor Author (tomblench):

This may need to be revisited in another PR - the worrying thing is that we were just swallowing exceptions from the Cloudant client, which I had managed to trigger with a misconfigured test.

Member (ricellis):

Agreed, there needs to be a separate look at error handling to conform to the behaviour of the built-in `errors.tolerance=all` and `none` flags (`all` implies silently ignoring bad messages, so I guess that's all we have right now!).

Contributor (emlaver):

Should we open a ticket specifically for investigating and improving error handling?

Member (ricellis):

I made a note in my error handling epic. Strictly speaking we should iterate the result and push each failed document/message to the DLQ (or whatever error handling is configured), but I'm OK with us improving that later.
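
For reference, a sketch of the built-in Kafka Connect sink error-handling settings being discussed; the DLQ topic name is only an example:

```
errors.tolerance=all
errors.deadletterqueue.topic.name=cloudant-sink-errors
errors.deadletterqueue.context.headers.enable=true
errors.log.enable=true
```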

```
value.converter.schemas.enable=true
```

#### Converter configuration: sink connector
Contributor Author (tomblench):

The source connector converter needs covering when we do the PR for that work. My intention is that we support JsonConverter on both source and sink, which simplifies things (as mentioned above, it's the default anyway, so there's no need to explicitly set it in config).
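
For illustration only, explicit sink-side JsonConverter settings would look something like the following (since JsonConverter is the Kafka Connect default, this doesn't strictly need to be set):

```
value.converter=org.apache.kafka.connect.json.JsonConverter
# true: values with a schema envelope arrive at the sink task as Structs
# false: plain JSON values arrive as java.util.Maps
value.converter.schemas.enable=true
```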

@@ -88,13 +92,17 @@ public void testReplicateAll() throws Exception {
// - no offset
for (SourceRecord record : records) {

// source task returns strings but sink task expects structs or maps
// in a real kafka instance this would be fixed by using appropriate converters
Map recordValue = gson.fromJson((String)record.value(), Map.class);
Contributor Author (tomblench):

This is a bit awkward, but it's needed until we do the PR to support JsonConverter in the source connector, at which point this line can be removed since the source will return us a Map.

Contributor Author (tomblench):

@ricellis your idea of enabling schemas here (to return a struct/map instead of a string) didn't work because it caused various limits to be exceeded (memory, HTTP request size): the test payloads are complex, resulting in a huge inline schema per doc.

Member (ricellis):

> the test payloads are complex resulting in a huge inline schema per doc.

I'm fine with doing something other than the schemas.enable approach, but honestly this is a little concerning to me; yes, they do have 100 or so properties, but they don't look that complex. Maybe this is something we need to cover in QA to get a better handle on what is stressing it.

@@ -60,7 +63,8 @@ protected void setUp() throws Exception {
data = new JSONArray(tokener);

// Load data into the source database (create if it does not exist)
JavaCloudantUtil.batchWrite(sourceProperties, data);
JavaCloudantUtil.batchWrite(sourceProperties,
Contributor Author (tomblench):

There are a few instances of this awkward mapping from org.json in the test code. I've tried to make the changes as minimal as possible, but would love to get rid of that library altogether at a later date.

Member (ricellis):

Agreed, I've already got this noted down for later work. There are 3 different JSON libs hanging around in various places and we should narrow that down to use only the one brought by Kafka itself, or the cloudant-java-sdk, or maybe both (as they'll both be there anyway), but we definitely shouldn't rely on an extra third one.

@tomblench tomblench changed the title from "WIP - support for JSONConverter" to "Support for JSONConverter in sink connector" on Apr 26, 2022
README.md Outdated

Assume these settings in a file `connect-standalone.properties` or `connect-distributed.properties`.
Usually the kafka distribution defaults (`connect-(standalone|distributed).properties`) are as follows:
Contributor (emlaver):

Do we need to expand on this if we say that the values below are usually the defaults? Would it be any better if we said:
"The Kafka distribution defaults are typically as follows:"

Contributor Author (tomblench):

Done in ac8b55d

@ricellis ricellis (Member) left a comment:

Looks good on the whole, just a few minor suggestions.

@ricellis ricellis (Member) left a comment:

I am +1 now, but a couple of minor things

@ricellis ricellis added this to the 0.100.next milestone May 6, 2022
@emlaver emlaver (Contributor) left a comment:

Looks good!

@tomblench tomblench merged commit 4964afc into master May 10, 2022
@ricellis ricellis deleted the 62-kafka-connect-data-format branch June 13, 2022 16:41