From 151e45871f7cfcf6f1310c0d3b33be443e590df6 Mon Sep 17 00:00:00 2001
From: Johnny Schmidt
Date: Wed, 7 Aug 2024 13:12:33 -0700
Subject: [PATCH] S3 1.0 Breaking Change Documentation

---
 airbyte-cdk/java/airbyte-cdk/README.md | 575 +++++++++---------
 .../async/buffers/BufferManager.kt | 2 +-
 .../src/main/resources/version.properties | 2 +-
 ..._types_coerced_schemaless_messages_out.txt | 6 +-
 ...matic_types_coerced_schemaless_schema.json | 16 +
 .../problematic_types_configured_catalog.json | 16 +
 ...atic_types_disjoint_union_messages_out.txt | 6 +-
 ...oblematic_types_disjoint_union_schema.json | 16 +
 .../v0/problematic_types_messages_in.txt | 6 +-
 .../destination/s3/S3ConsumerFactory.kt | 12 +-
 .../s3/avro/JsonToAvroSchemaConverter.kt | 2 +-
 .../s3/jsonschema/AirbyteJsonSchemaType.kt | 3 +
 .../s3/jsonschema/JsonRecordIdentityMapper.kt | 4 +
 .../s3/jsonschema/JsonRecordMapper.kt | 2 +
 .../s3/jsonschema/JsonSchemaIdentityMapper.kt | 4 +
 .../s3/jsonschema/JsonSchemaMapper.kt | 2 +
 .../s3/jsonschema/JsonSchemaUnionMerger.kt | 20 +
 .../parquet/JsonSchemaParquetPreprocessor.kt | 4 +
 .../s3/avro/JsonSchemaTransformerTest.kt | 27 +-
 .../test/resources/avro/complex_schema.json | 9 +
 .../type_conversion_test_cases_v1.json | 6 +-
 .../connectors/destination-s3/build.gradle | 10 +-
 .../connectors/destination-s3/metadata.yaml | 12 +-
 .../destinations/s3-migrations.md | 209 +++++++
 docs/integrations/destinations/s3.md | 59 +-
 .../json-avro-conversion.md | 178 +++---
 docusaurus/sidebars.js | 7 +-
 27 files changed, 793 insertions(+), 422 deletions(-)
 create mode 100644 docs/integrations/destinations/s3-migrations.md

diff --git a/airbyte-cdk/java/airbyte-cdk/README.md b/airbyte-cdk/java/airbyte-cdk/README.md
index 95896561200c8..0d9c329668551 100644
--- a/airbyte-cdk/java/airbyte-cdk/README.md
+++ b/airbyte-cdk/java/airbyte-cdk/README.md
@@ -172,290 +172,291 @@ corresponds to that version.
### Java CDK -| Version | Date | Pull Request | Subject | -|:-----------|:-----------|:------------------------------------------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------| -| 0.44.4 | 2024-08-08 | [\#43410](https://github.com/airbytehq/airbyte/pull/43330) | Better logs for counting info to state message. | -| 0.44.3 | 2024-08-07 | [\#43330](https://github.com/airbytehq/airbyte/pull/43330) | make TypingDedupingTest aware of column name renaming. | -| 0.44.3 | 2024-08-07 | [\#43329](https://github.com/airbytehq/airbyte/pull/43329) | move generationIdHandling to its own class. | -| 0.44.2 | 2024-08-06 | [\#42869](https://github.com/airbytehq/airbyte/pull/42869) | Add logs about counting info to state message. | -| 0.44.1 | 2024-08-01 | [\#42550](https://github.com/airbytehq/airbyte/pull/42550) | Fix error on reporting counts. | -| 0.44.0 | 2024-08-01 | [\#42405](https://github.com/airbytehq/airbyte/pull/42405) | s3-destinations: Use async framework, adapt to support refreshes | -| 0.43.6 | 2024-07-30 | [\#42540](https://github.com/airbytehq/airbyte/pull/42540) | Fix generationId handling for destinations | -| 0.43.6 | 2024-07-30 | [\#42514](https://github.com/airbytehq/airbyte/pull/42514) | Add tests around generationId handling for destinations. | -| 0.43.4 | 2024-07-28 | [\#42839](https://github.com/airbytehq/airbyte/pull/42839) | Fix error translation framework to not rethrow ConfigErrorException and TransientErrorException. | -| 0.43.3 | 2024-07-22 | [\#42417](https://github.com/airbytehq/airbyte/pull/42417) | Handle null exception message in ConnectorExceptionHandler. 
| -| 0.43.2 | 2024-07-22 | [\#42431](https://github.com/airbytehq/airbyte/pull/42431) | Filter out debezium message change events | -| 0.43.1 | 2024-07-22 | [\#41622](https://github.com/airbytehq/airbyte/pull/41622) | Fix null safety bug in debezium event processing | -| 0.43.0 | 2024-07-17 | [\#41954](https://github.com/airbytehq/airbyte/pull/41954) | fix refreshes for connectors using the old SqlOperations | -| 0.43.0 | 2024-07-17 | [\#42017](https://github.com/airbytehq/airbyte/pull/42017) | bump postgres-jdbc version | -| 0.43.0 | 2024-07-17 | [\#42015](https://github.com/airbytehq/airbyte/pull/42015) | wait until migration before creating the Writeconfig objects | -| 0.43.0 | 2024-07-17 | [\#41953](https://github.com/airbytehq/airbyte/pull/41953) | add generationId and syncId to SqlOperations functions | -| 0.43.0 | 2024-07-17 | [\#41952](https://github.com/airbytehq/airbyte/pull/41952) | rename and add fields in WriteConfig | -| 0.43.0 | 2024-07-17 | [\#41951](https://github.com/airbytehq/airbyte/pull/41951) | remove nullables in JdbcBufferedConsumerFactory | -| 0.43.0 | 2024-07-17 | [\#41950](https://github.com/airbytehq/airbyte/pull/41950) | remove unused classes | -| 0.42.2 | 2024-07-21 | [\#42122](https://github.com/airbytehq/airbyte/pull/42122) | Support for Debezium resync and shutdown scenarios. | -| 0.42.2 | 2024-07-04 | [\#40208](https://github.com/airbytehq/airbyte/pull/40208) | Implement a new connector error handling and translation framework | -| 0.41.8 | 2024-07-18 | [\#42068](https://github.com/airbytehq/airbyte/pull/42068) | Add analytics message for WASS occurrence. | -| 0.41.7 | 2024-07-17 | [\#42055](https://github.com/airbytehq/airbyte/pull/42055) | Add debezium heartbeat timeout back to shutdown debezium. | -| 0.41.6 | 2024-07-17 | [\#41996](https://github.com/airbytehq/airbyte/pull/41996) | Fix java interop compilation issue in Config/TransientErrorException. 
| -| 0.41.5 | 2024-07-16 | [\#42011] (https://github.com/airbytehq/airbyte/pull/42011) | Async consumer accepts null default namespace | -| 0.41.4 | 2024-07-15 | [\#41959](https://github.com/airbytehq/airbyte/pull/41959) | Allow setting `internal_message` in Config/TransientErrorException. Destinations: shorten error message for INCOMPLETE stream status. | -| 0.41.3 | 2024-07-15 | [\#41680](https://github.com/airbytehq/airbyte/pull/41680) | Fix: CompletableFutures.allOf now handles empty list and `Throwable` | -| 0.41.2 | 2024-07-12 | [\#40567](https://github.com/airbytehq/airbyte/pull/40567) | Fix BaseSqlGenerator test case (generation_id support); update minimum platform version for refreshes support. | -| 0.41.1 | 2024-07-11 | [\#41212](https://github.com/airbytehq/airbyte/pull/41212) | Improve debezium logging. | -| 0.41.0 | 2024-07-11 | [\#38240](https://github.com/airbytehq/airbyte/pull/38240) | Sources : Changes in CDC interfaces to support WASS algorithm | -| 0.40.11 | 2024-07-08 | [\#41041](https://github.com/airbytehq/airbyte/pull/41041) | Destinations: Fix truncate refreshes incorrectly discarding data if successful attempt had 0 records | -| 0.40.10 | 2024-07-05 | [\#40719](https://github.com/airbytehq/airbyte/pull/40719) | Update test to refrlect isResumable field in catalog | -| 0.40.9 | 2024-07-01 | [\#39473](https://github.com/airbytehq/airbyte/pull/39473) | minor changes around error logging and testing | -| 0.40.8 | 2024-07-01 | [\#40499](https://github.com/airbytehq/airbyte/pull/40499) | Make JdbcDatabase SQL statement logging optional; add generation_id support to JdbcSqlGenerator | -| 0.40.7 | 2024-07-01 | [\#40516](https://github.com/airbytehq/airbyte/pull/40516) | Remove dbz hearbeat. 
| -| ~~0.40.6~~ | | | (this version does not exist) | -| 0.40.5 | 2024-06-26 | [\#40517](https://github.com/airbytehq/airbyte/pull/40517) | JdbcDatabase.executeWithinTransaction allows disabling SQL statement logging | -| 0.40.4 | 2024-06-18 | [\#40254](https://github.com/airbytehq/airbyte/pull/40254) | Destinations: Do not throw on unrecognized airbyte message type (ignore message instead) | -| 0.40.3 | 2024-06-18 | [\#39526](https://github.com/airbytehq/airbyte/pull/39526) | Destinations: INCOMPLETE stream status is a TRANSIENT error rather than SYSTEM | -| 0.40.2 | 2024-06-18 | [\#39552](https://github.com/airbytehq/airbyte/pull/39552) | Destinations: Throw error if the ConfiguredCatalog has no streams | -| 0.40.1 | 2024-06-14 | [\#39349](https://github.com/airbytehq/airbyte/pull/39349) | Source stats for full refresh streams | -| 0.40.0 | 2024-06-17 | [\#38622](https://github.com/airbytehq/airbyte/pull/38622) | Destinations: Implement refreshes logic in AbstractStreamOperation | -| 0.39.0 | 2024-06-17 | [\#38067](https://github.com/airbytehq/airbyte/pull/38067) | Destinations: Breaking changes for refreshes (fail on INCOMPLETE stream status; ignore OVERWRITE sync mode) | -| 0.38.3 | 2024-06-25 | [\#40499](https://github.com/airbytehq/airbyte/pull/40499) | (backport) Make JdbcDatabase SQL statement logging optional; add generation_id support to JdbcSqlGenerator | -| 0.38.2 | 2024-06-14 | [\#39460](https://github.com/airbytehq/airbyte/pull/39460) | Bump postgres JDBC driver version | -| 0.38.1 | 2024-06-13 | [\#39445](https://github.com/airbytehq/airbyte/pull/39445) | Sources: More CDK changes to handle big initial snapshots. 
| -| 0.38.0 | 2024-06-11 | [\#39405](https://github.com/airbytehq/airbyte/pull/39405) | Sources: Debezium properties manager interface changed to accept a list of streams to scope to | -| 0.37.1 | 2024-06-10 | [\#38075](https://github.com/airbytehq/airbyte/pull/38075) | Destinations: Track stream statuses in async framework | -| 0.37.0 | 2024-06-10 | [\#38121](https://github.com/airbytehq/airbyte/pull/38121) | Destinations: Set default namespace via CatalogParser | -| 0.36.8 | 2024-06-07 | [\#38763](https://github.com/airbytehq/airbyte/pull/38763) | Increase Jackson message length limit | -| 0.36.7 | 2024-06-06 | [\#39220](https://github.com/airbytehq/airbyte/pull/39220) | Handle null messages in ConnectorExceptionUtil | -| 0.36.6 | 2024-06-05 | [\#39106](https://github.com/airbytehq/airbyte/pull/39106) | Skip write to storage with 0 byte file | -| 0.36.5 | 2024-06-01 | [\#38792](https://github.com/airbytehq/airbyte/pull/38792) | Throw config exception if no selectable table exists in user provided schemas | -| 0.36.4 | 2024-05-31 | [\#38824](https://github.com/airbytehq/airbyte/pull/38824) | Param marked as non-null to nullable in JdbcDestinationHandler for NPE fix | -| 0.36.2 | 2024-05-29 | [\#38538](https://github.com/airbytehq/airbyte/pull/38357) | Exit connector when encountering a config error. 
| -| 0.36.0 | 2024-05-29 | [\#38358](https://github.com/airbytehq/airbyte/pull/38358) | Plumb generation_id / sync_id to destinations code | -| 0.35.16 | 2024-06-25 | [\#40517](https://github.com/airbytehq/airbyte/pull/40517) | (backport) JdbcDatabase.executeWithinTransaction allows disabling SQL statement logging | -| 0.35.15 | 2024-05-31 | [\#38824](https://github.com/airbytehq/airbyte/pull/38824) | Param marked as non-null to nullable in JdbcDestinationHandler for NPE fix | -| 0.35.14 | 2024-05-28 | [\#38738](https://github.com/airbytehq/airbyte/pull/38738) | make ThreadCreationInfo cast as nullable | -| 0.35.13 | 2024-05-28 | [\#38632](https://github.com/airbytehq/airbyte/pull/38632) | minor changes to allow conversion of snowflake tests to kotlin | -| 0.35.12 | 2024-05-23 | [\#38638](https://github.com/airbytehq/airbyte/pull/38638) | Minor change to support Snowflake conversion to Kotlin | -| 0.35.11 | 2024-05-23 | [\#38357](https://github.com/airbytehq/airbyte/pull/38357) | This release fixes an error on the previous release. | -| 0.35.10 | 2024-05-23 | [\#38357](https://github.com/airbytehq/airbyte/pull/38357) | Add shared code for db sources stream status trace messages and testing. 
| -| 0.35.9 | 2024-05-23 | [\#38586](https://github.com/airbytehq/airbyte/pull/38586) | code cleanup | -| 0.35.9 | 2024-05-23 | [\#37583](https://github.com/airbytehq/airbyte/pull/37583) | code cleanup | -| 0.35.9 | 2024-05-23 | [\#37555](https://github.com/airbytehq/airbyte/pull/37555) | code cleanup | -| 0.35.9 | 2024-05-23 | [\#37540](https://github.com/airbytehq/airbyte/pull/37540) | code cleanup | -| 0.35.9 | 2024-05-23 | [\#37539](https://github.com/airbytehq/airbyte/pull/37539) | code cleanup | -| 0.35.9 | 2024-05-23 | [\#37538](https://github.com/airbytehq/airbyte/pull/37538) | code cleanup | -| 0.35.9 | 2024-05-23 | [\#37537](https://github.com/airbytehq/airbyte/pull/37537) | code cleanup | -| 0.35.9 | 2024-05-23 | [\#37518](https://github.com/airbytehq/airbyte/pull/37518) | code cleanup | -| 0.35.8 | 2024-05-22 | [\#38572](https://github.com/airbytehq/airbyte/pull/38572) | Add a temporary static method to decouple SnowflakeDestination from AbstractJdbcDestination | -| 0.35.7 | 2024-05-20 | [\#38357](https://github.com/airbytehq/airbyte/pull/38357) | Decouple create namespace from per stream operation interface. 
| -| 0.35.6 | 2024-05-17 | [\#38107](https://github.com/airbytehq/airbyte/pull/38107) | New interfaces for Destination connectors to plug into AsyncStreamConsumer | -| 0.35.5 | 2024-05-17 | [\#38204](https://github.com/airbytehq/airbyte/pull/38204) | add assume-role authentication to s3 | -| 0.35.2 | 2024-05-13 | [\#38104](https://github.com/airbytehq/airbyte/pull/38104) | Handle transient error messages | -| 0.35.0 | 2024-05-13 | [\#38127](https://github.com/airbytehq/airbyte/pull/38127) | Destinations: Populate generation/sync ID on StreamConfig | -| 0.34.4 | 2024-05-10 | [\#37712](https://github.com/airbytehq/airbyte/pull/37712) | make sure the exceptionHandler always terminates | -| 0.34.3 | 2024-05-10 | [\#38095](https://github.com/airbytehq/airbyte/pull/38095) | Minor changes for databricks connector | -| 0.34.1 | 2024-05-07 | [\#38030](https://github.com/airbytehq/airbyte/pull/38030) | Add support for transient errors | -| 0.34.0 | 2024-05-01 | [\#37712](https://github.com/airbytehq/airbyte/pull/37712) | Destinations: Remove incremental T+D | -| 0.33.2 | 2024-05-03 | [\#37824](https://github.com/airbytehq/airbyte/pull/37824) | improve source acceptance tests | -| 0.33.1 | 2024-05-03 | [\#37824](https://github.com/airbytehq/airbyte/pull/37824) | Add a unit test for cursor based sync | -| 0.33.0 | 2024-05-03 | [\#36935](https://github.com/airbytehq/airbyte/pull/36935) | Destinations: Enable non-safe-casting DV2 tests | -| 0.32.0 | 2024-05-03 | [\#36929](https://github.com/airbytehq/airbyte/pull/36929) | Destinations: Assorted DV2 changes for mysql | -| 0.31.7 | 2024-05-02 | [\#36910](https://github.com/airbytehq/airbyte/pull/36910) | changes for destination-snowflake | -| 0.31.6 | 2024-05-02 | [\#37746](https://github.com/airbytehq/airbyte/pull/37746) | debuggability improvements. 
| -| 0.31.5 | 2024-04-30 | [\#37758](https://github.com/airbytehq/airbyte/pull/37758) | Set debezium max retries to zero | -| 0.31.4 | 2024-04-30 | [\#37754](https://github.com/airbytehq/airbyte/pull/37754) | Add DebeziumEngine notification log | -| 0.31.3 | 2024-04-30 | [\#37726](https://github.com/airbytehq/airbyte/pull/37726) | Remove debezium retries | -| 0.31.2 | 2024-04-30 | [\#37507](https://github.com/airbytehq/airbyte/pull/37507) | Better error messages when switching between global/per-stream modes. | -| 0.31.0 | 2024-04-26 | [\#37584](https://github.com/airbytehq/airbyte/pull/37584) | Update S3 destination deps to exclude zookeeper and hadoop-yarn-common | -| 0.30.11 | 2024-04-25 | [\#36899](https://github.com/airbytehq/airbyte/pull/36899) | changes for bigQuery destination. | -| 0.30.10 | 2024-04-24 | [\#37541](https://github.com/airbytehq/airbyte/pull/37541) | remove excessive logging | -| 0.30.9 | 2024-04-24 | [\#37477](https://github.com/airbytehq/airbyte/pull/37477) | remove unnecessary logs | -| 0.30.7 | 2024-04-23 | [\#37477](https://github.com/airbytehq/airbyte/pull/37477) | fix kotlin warnings in core CDK submodule | -| 0.30.7 | 2024-04-23 | [\#37484](https://github.com/airbytehq/airbyte/pull/37484) | fix kotlin warnings in dependencies CDK submodule | -| 0.30.7 | 2024-04-23 | [\#37479](https://github.com/airbytehq/airbyte/pull/37479) | fix kotlin warnings in azure-destination, datastore-{bigquery,mongo,postgres} CDK submodules | -| 0.30.7 | 2024-04-23 | [\#37481](https://github.com/airbytehq/airbyte/pull/37481) | fix kotlin warnings in destination CDK submodules | -| 0.30.7 | 2024-04-23 | [\#37482](https://github.com/airbytehq/airbyte/pull/37482) | fix kotlin warnings in db-sources CDK submodule | -| 0.30.6 | 2024-04-19 | [\#37442](https://github.com/airbytehq/airbyte/pull/37442) | Destinations: Rename File format related classes to be agnostic of S3 | -| 0.30.3 | 2024-04-12 | [\#37106](https://github.com/airbytehq/airbyte/pull/37106) | 
Destinations: Simplify constructors in `AsyncStreamConsumer` | -| 0.30.2 | 2024-04-12 | [\#36926](https://github.com/airbytehq/airbyte/pull/36926) | Destinations: Remove `JdbcSqlOperations#formatData`; misc changes for java interop | -| 0.30.1 | 2024-04-11 | [\#36919](https://github.com/airbytehq/airbyte/pull/36919) | Fix regression in sources conversion of null values | -| 0.30.0 | 2024-04-11 | [\#36974](https://github.com/airbytehq/airbyte/pull/36974) | Destinations: Pass config to jdbc sqlgenerator; allow cascade drop | -| 0.29.13 | 2024-04-10 | [\#36981](https://github.com/airbytehq/airbyte/pull/36981) | DB sources : Emit analytics for data type serialization errors. | -| 0.29.12 | 2024-04-10 | [\#36973](https://github.com/airbytehq/airbyte/pull/36973) | Destinations: Make flush batch size configurable for JdbcInsertFlush | -| 0.29.11 | 2024-04-10 | [\#36865](https://github.com/airbytehq/airbyte/pull/36865) | Sources : Remove noisy log line. | -| 0.29.10 | 2024-04-10 | [\#36805](https://github.com/airbytehq/airbyte/pull/36805) | Destinations: Enhance CatalogParser name collision handling; add DV2 tests for long identifiers | -| 0.29.9 | 2024-04-09 | [\#36047](https://github.com/airbytehq/airbyte/pull/36047) | Destinations: CDK updates for raw-only destinations | -| 0.29.8 | 2024-04-08 | [\#36868](https://github.com/airbytehq/airbyte/pull/36868) | Destinations: s3-destinations Compilation fixes for connector | -| 0.29.7 | 2024-04-08 | [\#36768](https://github.com/airbytehq/airbyte/pull/36768) | Destinations: Make destination state fetch/commit logic more resilient to errors | -| 0.29.6 | 2024-04-05 | [\#36577](https://github.com/airbytehq/airbyte/pull/36577) | Do not send system_error trace message for config exceptions. 
| -| 0.29.5 | 2024-04-05 | [\#36620](https://github.com/airbytehq/airbyte/pull/36620) | Missed changes - open for extension for destination-postgres | -| 0.29.3 | 2024-04-04 | [\#36759](https://github.com/airbytehq/airbyte/pull/36759) | Minor fixes. | -| 0.29.3 | 2024-04-04 | [\#36706](https://github.com/airbytehq/airbyte/pull/36706) | Enabling spotbugs for s3-destination. | -| 0.29.3 | 2024-04-03 | [\#36705](https://github.com/airbytehq/airbyte/pull/36705) | Enabling spotbugs for db-sources. | -| 0.29.3 | 2024-04-03 | [\#36704](https://github.com/airbytehq/airbyte/pull/36704) | Enabling spotbugs for datastore-postgres. | -| 0.29.3 | 2024-04-03 | [\#36703](https://github.com/airbytehq/airbyte/pull/36703) | Enabling spotbugs for gcs-destination. | -| 0.29.3 | 2024-04-03 | [\#36702](https://github.com/airbytehq/airbyte/pull/36702) | Enabling spotbugs for db-destinations. | -| 0.29.3 | 2024-04-03 | [\#36701](https://github.com/airbytehq/airbyte/pull/36701) | Enabling spotbugs for typing_and_deduping. | -| 0.29.3 | 2024-04-03 | [\#36612](https://github.com/airbytehq/airbyte/pull/36612) | Enabling spotbugs for dependencies. | -| 0.29.5 | 2024-04-05 | [\#36577](https://github.com/airbytehq/airbyte/pull/36577) | Do not send system_error trace message for config exceptions. | -| 0.29.3 | 2024-04-04 | [\#36759](https://github.com/airbytehq/airbyte/pull/36759) | Minor fixes. | -| 0.29.3 | 2024-04-04 | [\#36706](https://github.com/airbytehq/airbyte/pull/36706) | Enabling spotbugs for s3-destination. | -| 0.29.3 | 2024-04-03 | [\#36705](https://github.com/airbytehq/airbyte/pull/36705) | Enabling spotbugs for db-sources. | -| 0.29.3 | 2024-04-03 | [\#36704](https://github.com/airbytehq/airbyte/pull/36704) | Enabling spotbugs for datastore-postgres. | -| 0.29.3 | 2024-04-03 | [\#36703](https://github.com/airbytehq/airbyte/pull/36703) | Enabling spotbugs for gcs-destination. 
| -| 0.29.3 | 2024-04-03 | [\#36702](https://github.com/airbytehq/airbyte/pull/36702) | Enabling spotbugs for db-destinations. | -| 0.29.3 | 2024-04-03 | [\#36701](https://github.com/airbytehq/airbyte/pull/36701) | Enabling spotbugs for typing_and_deduping. | -| 0.29.3 | 2024-04-03 | [\#36612](https://github.com/airbytehq/airbyte/pull/36612) | Enabling spotbugs for dependencies. | -| 0.29.2 | 2024-04-04 | [\#36845](https://github.com/airbytehq/airbyte/pull/36772) | Changes to make source-mongo compileable | -| 0.29.1 | 2024-04-03 | [\#36772](https://github.com/airbytehq/airbyte/pull/36772) | Changes to make source-mssql compileable | -| 0.29.0 | 2024-04-02 | [\#36759](https://github.com/airbytehq/airbyte/pull/36759) | Build artifact publication changes and fixes. | -| 0.28.21 | 2024-04-02 | [\#36673](https://github.com/airbytehq/airbyte/pull/36673) | Change the destination message parsing to use standard java/kotlin classes. Adds logging to catch empty lines. | -| 0.28.20 | 2024-04-01 | [\#36584](https://github.com/airbytehq/airbyte/pull/36584) | Changes to make source-postgres compileable | -| 0.28.19 | 2024-03-29 | [\#36619](https://github.com/airbytehq/airbyte/pull/36619) | Changes to make destination-postgres compileable | -| 0.28.19 | 2024-03-29 | [\#36588](https://github.com/airbytehq/airbyte/pull/36588) | Changes to make destination-redshift compileable | -| 0.28.19 | 2024-03-29 | [\#36610](https://github.com/airbytehq/airbyte/pull/36610) | remove airbyte-api generation, pull depdendency jars instead | -| 0.28.19 | 2024-03-29 | [\#36611](https://github.com/airbytehq/airbyte/pull/36611) | disable spotbugs for CDK tes and testFixtures tasks | -| 0.28.18 | 2024-03-28 | [\#36606](https://github.com/airbytehq/airbyte/pull/36574) | disable spotbugs for CDK tes and testFixtures tasks | -| 0.28.18 | 2024-03-28 | [\#36574](https://github.com/airbytehq/airbyte/pull/36574) | Fix ContainerFactory | -| 0.28.18 | 2024-03-27 | 
[\#36570](https://github.com/airbytehq/airbyte/pull/36570) | Convert missing s3-destinations tests to Kotlin | -| 0.28.18 | 2024-03-27 | [\#36446](https://github.com/airbytehq/airbyte/pull/36446) | Convert dependencies submodule to Kotlin | -| 0.28.18 | 2024-03-27 | [\#36445](https://github.com/airbytehq/airbyte/pull/36445) | Convert functional out Checked interfaces to kotlin | -| 0.28.18 | 2024-03-27 | [\#36444](https://github.com/airbytehq/airbyte/pull/36444) | Use apache-commons classes in our Checked functional interfaces | -| 0.28.18 | 2024-03-27 | [\#36467](https://github.com/airbytehq/airbyte/pull/36467) | Convert #36465 to Kotlin | -| 0.28.18 | 2024-03-27 | [\#36473](https://github.com/airbytehq/airbyte/pull/36473) | Convert convert #36396 to Kotlin | -| 0.28.18 | 2024-03-27 | [\#36439](https://github.com/airbytehq/airbyte/pull/36439) | Convert db-destinations submodule to Kotlin | -| 0.28.18 | 2024-03-27 | [\#36438](https://github.com/airbytehq/airbyte/pull/36438) | Convert db-sources submodule to Kotlin | -| 0.28.18 | 2024-03-26 | [\#36437](https://github.com/airbytehq/airbyte/pull/36437) | Convert gsc submodule to Kotlin | -| 0.28.18 | 2024-03-26 | [\#36421](https://github.com/airbytehq/airbyte/pull/36421) | Convert typing-deduping submodule to Kotlin | -| 0.28.18 | 2024-03-26 | [\#36420](https://github.com/airbytehq/airbyte/pull/36420) | Convert s3-destinations submodule to Kotlin | -| 0.28.18 | 2024-03-26 | [\#36419](https://github.com/airbytehq/airbyte/pull/36419) | Convert azure submodule to Kotlin | -| 0.28.18 | 2024-03-26 | [\#36413](https://github.com/airbytehq/airbyte/pull/36413) | Convert postgres submodule to Kotlin | -| 0.28.18 | 2024-03-26 | [\#36412](https://github.com/airbytehq/airbyte/pull/36412) | Convert mongodb submodule to Kotlin | -| 0.28.18 | 2024-03-26 | [\#36411](https://github.com/airbytehq/airbyte/pull/36411) | Convert datastore-bigquery submodule to Kotlin | -| 0.28.18 | 2024-03-26 | 
[\#36205](https://github.com/airbytehq/airbyte/pull/36205) | Convert core/main to Kotlin | -| 0.28.18 | 2024-03-26 | [\#36204](https://github.com/airbytehq/airbyte/pull/36204) | Convert core/test to Kotlin | -| 0.28.18 | 2024-03-26 | [\#36190](https://github.com/airbytehq/airbyte/pull/36190) | Convert core/testFixtures to Kotlin | -| 0.28.0 | 2024-03-26 | [\#36514](https://github.com/airbytehq/airbyte/pull/36514) | Bump CDK version to 0.28.0 | -| 0.27.7 | 2024-03-26 | [\#36466](https://github.com/airbytehq/airbyte/pull/36466) | Destinations: fix support for case-sensitive fields in destination state. | -| 0.27.6 | 2024-03-26 | [\#36432](https://github.com/airbytehq/airbyte/pull/36432) | Sources support for AirbyteRecordMessageMeta during reading source data types. | -| 0.27.5 | 2024-03-25 | [\#36461](https://github.com/airbytehq/airbyte/pull/36461) | Destinations: Handle case-sensitive columns in destination state handling. | -| 0.27.4 | 2024-03-25 | [\#36333](https://github.com/airbytehq/airbyte/pull/36333) | Sunset DebeziumSourceDecoratingIterator. | -| 0.27.1 | 2024-03-22 | [\#36296](https://github.com/airbytehq/airbyte/pull/36296) | Destinations: (async framework) Do not log invalid message data. | -| 0.27.0 | 2024-03-21 | [\#36364](https://github.com/airbytehq/airbyte/pull/36364) | Sources: Increase debezium initial record wait time to 40 minute. | -| 0.26.1 | 2024-03-19 | [\#35599](https://github.com/airbytehq/airbyte/pull/35599) | Sunset SourceDecoratingIterator. | -| 0.26.0 | 2024-03-19 | [\#36263](https://github.com/airbytehq/airbyte/pull/36263) | Improve conversion of debezium Date type for some edge case in mssql. 
| -| 0.25.0 | 2024-03-18 | [\#36203](https://github.com/airbytehq/airbyte/pull/36203) | Wiring of Transformer to StagingConsumerFactory and JdbcBufferedConsumerFactory; import changes for Kotlin conversion; State message logs to debug | -| 0.24.1 | 2024-03-13 | [\#36022](https://github.com/airbytehq/airbyte/pull/36022) | Move log4j2-test.xml to test fixtures, away from runtime classpath. | -| 0.24.0 | 2024-03-13 | [\#35944](https://github.com/airbytehq/airbyte/pull/35944) | Add `_airbyte_meta` in raw table and test fixture updates | -| 0.23.20 | 2024-03-12 | [\#36011](https://github.com/airbytehq/airbyte/pull/36011) | Debezium configuration for conversion of null value on a column with default value. | -| 0.23.19 | 2024-03-11 | [\#35904](https://github.com/airbytehq/airbyte/pull/35904) | Add retries to the debezium engine. | -| 0.23.18 | 2024-03-07 | [\#35899](https://github.com/airbytehq/airbyte/pull/35899) | Null check when retrieving destination state | -| 0.23.16 | 2024-03-06 | [\#35842](https://github.com/airbytehq/airbyte/pull/35842) | Improve logging in debezium processing. | -| 0.23.15 | 2024-03-05 | [\#35827](https://github.com/airbytehq/airbyte/pull/35827) | improving the Junit interceptor. | -| 0.23.14 | 2024-03-05 | [\#35739](https://github.com/airbytehq/airbyte/pull/35739) | Add logging to the CDC queue size. Fix the ContainerFactory. | -| 0.23.13 | 2024-03-04 | [\#35774](https://github.com/airbytehq/airbyte/pull/35774) | minor changes to the CDK test fixtures. | -| 0.23.12 | 2024-03-01 | [\#35767](https://github.com/airbytehq/airbyte/pull/35767) | introducing a timeout for java tests. 
| -| 0.23.11 | 2024-03-01 | [\#35313](https://github.com/airbytehq/airbyte/pull/35313) | Preserve timezone offset in CSV writer for destinations | -| 0.23.10 | 2024-03-01 | [\#35303](https://github.com/airbytehq/airbyte/pull/35303) | Migration framework with DestinationState for softReset | -| 0.23.9 | 2024-02-29 | [\#35720](https://github.com/airbytehq/airbyte/pull/35720) | various improvements for tests TestDataHolder | -| 0.23.8 | 2024-02-28 | [\#35529](https://github.com/airbytehq/airbyte/pull/35529) | Refactor on state iterators | -| 0.23.7 | 2024-02-28 | [\#35376](https://github.com/airbytehq/airbyte/pull/35376) | Extract typereduper migrations to separte method | -| 0.23.6 | 2024-02-26 | [\#35647](https://github.com/airbytehq/airbyte/pull/35647) | Add a getNamespace into TestDataHolder | -| 0.23.5 | 2024-02-26 | [\#35512](https://github.com/airbytehq/airbyte/pull/35512) | Remove @DisplayName from all CDK tests. | -| 0.23.4 | 2024-02-26 | [\#35507](https://github.com/airbytehq/airbyte/pull/35507) | Add more logs into TestDatabase. | -| 0.23.3 | 2024-02-26 | [\#35495](https://github.com/airbytehq/airbyte/pull/35495) | Fix Junit Interceptor to print better stacktraces | -| 0.23.2 | 2024-02-22 | [\#35385](https://github.com/airbytehq/airbyte/pull/35342) | Bugfix: inverted logic of disableTypeDedupe flag | -| 0.23.1 | 2024-02-22 | [\#35527](https://github.com/airbytehq/airbyte/pull/35527) | reduce shutdow timeouts | -| 0.23.0 | 2024-02-22 | [\#35342](https://github.com/airbytehq/airbyte/pull/35342) | Consolidate and perform upfront gathering of DB metadata state | -| 0.21.4 | 2024-02-21 | [\#35511](https://github.com/airbytehq/airbyte/pull/35511) | Reduce CDC state compression limit to 1MB | -| 0.21.3 | 2024-02-20 | [\#35394](https://github.com/airbytehq/airbyte/pull/35394) | Add Junit progress information to the test logs | -| 0.21.2 | 2024-02-20 | [\#34978](https://github.com/airbytehq/airbyte/pull/34978) | Reduce log noise in NormalizationLogParser. 
| -| 0.21.1 | 2024-02-20 | [\#35199](https://github.com/airbytehq/airbyte/pull/35199) | Add thread names to the logs. | -| 0.21.0 | 2024-02-16 | [\#35314](https://github.com/airbytehq/airbyte/pull/35314) | Delete S3StreamCopier classes. These have been superseded by the async destinations framework. | -| 0.20.9 | 2024-02-15 | [\#35240](https://github.com/airbytehq/airbyte/pull/35240) | Make state emission to platform inside state manager itself. | -| 0.20.8 | 2024-02-15 | [\#35285](https://github.com/airbytehq/airbyte/pull/35285) | Improve blobstore module structure. | -| 0.20.7 | 2024-02-13 | [\#35236](https://github.com/airbytehq/airbyte/pull/35236) | output logs to files in addition to stdout when running tests | -| 0.20.6 | 2024-02-12 | [\#35036](https://github.com/airbytehq/airbyte/pull/35036) | Add trace utility to emit analytics messages. | -| 0.20.5 | 2024-02-13 | [\#34869](https://github.com/airbytehq/airbyte/pull/34869) | Don't emit final state in SourceStateIterator there is an underlying stream failure. | -| 0.20.4 | 2024-02-12 | [\#35042](https://github.com/airbytehq/airbyte/pull/35042) | Use delegate's isDestinationV2 invocation in SshWrappedDestination. | -| 0.20.3 | 2024-02-09 | [\#34580](https://github.com/airbytehq/airbyte/pull/34580) | Support special chars in mysql/mssql database name. | -| 0.20.2 | 2024-02-12 | [\#35111](https://github.com/airbytehq/airbyte/pull/35144) | Make state emission from async framework synchronized. | -| 0.20.1 | 2024-02-11 | [\#35111](https://github.com/airbytehq/airbyte/pull/35111) | Fix GlobalAsyncStateManager stats counting logic. | -| 0.20.0 | 2024-02-09 | [\#34562](https://github.com/airbytehq/airbyte/pull/34562) | Add new test cases to BaseTypingDedupingTest to exercise special characters. | -| 0.19.0 | 2024-02-01 | [\#34745](https://github.com/airbytehq/airbyte/pull/34745) | Reorganize CDK module structure. 
| -| 0.18.0 | 2024-02-08 | [\#33606](https://github.com/airbytehq/airbyte/pull/33606) | Add updated Initial and Incremental Stream State definitions for DB Sources. | -| 0.17.1 | 2024-02-08 | [\#35027](https://github.com/airbytehq/airbyte/pull/35027) | Make state handling thread safe in async destination framework. | -| 0.17.0 | 2024-02-08 | [\#34502](https://github.com/airbytehq/airbyte/pull/34502) | Enable configuring async destination batch size. | -| 0.16.6 | 2024-02-07 | [\#34892](https://github.com/airbytehq/airbyte/pull/34892) | Improved testcontainers logging and support for unshared containers. | -| 0.16.5 | 2024-02-07 | [\#34948](https://github.com/airbytehq/airbyte/pull/34948) | Fix source state stats counting logic | -| 0.16.4 | 2024-02-01 | [\#34727](https://github.com/airbytehq/airbyte/pull/34727) | Add future based stdout consumer in BaseTypingDedupingTest | -| 0.16.3 | 2024-01-30 | [\#34669](https://github.com/airbytehq/airbyte/pull/34669) | Fix org.apache.logging.log4j:log4j-slf4j-impl version conflicts. | -| 0.16.2 | 2024-01-29 | [\#34630](https://github.com/airbytehq/airbyte/pull/34630) | expose NamingTransformer to sub-classes in destinations JdbcSqlGenerator. | -| 0.16.1 | 2024-01-29 | [\#34533](https://github.com/airbytehq/airbyte/pull/34533) | Add a safe method to execute DatabaseMetadata's Resultset returning queries. | -| 0.16.0 | 2024-01-26 | [\#34573](https://github.com/airbytehq/airbyte/pull/34573) | Untangle Debezium harness dependencies. | -| 0.15.2 | 2024-01-25 | [\#34441](https://github.com/airbytehq/airbyte/pull/34441) | Improve airbyte-api build performance. | -| 0.15.1 | 2024-01-25 | [\#34451](https://github.com/airbytehq/airbyte/pull/34451) | Async destinations: Better logging when we fail to parse an AirbyteMessage | -| 0.15.0 | 2024-01-23 | [\#34441](https://github.com/airbytehq/airbyte/pull/34441) | Removed connector registry and micronaut dependencies. 
| -| 0.14.2 | 2024-01-24 | [\#34458](https://github.com/airbytehq/airbyte/pull/34458) | Handle case-sensitivity in sentry error grouping | -| 0.14.1 | 2024-01-24 | [\#34468](https://github.com/airbytehq/airbyte/pull/34468) | Add wait for process to be done before ending sync in destination BaseTDTest | -| 0.14.0 | 2024-01-23 | [\#34461](https://github.com/airbytehq/airbyte/pull/34461) | Revert non backward compatible signature changes from 0.13.1 | -| 0.13.3 | 2024-01-23 | [\#34077](https://github.com/airbytehq/airbyte/pull/34077) | Denote if destinations fully support Destinations V2 | -| 0.13.2 | 2024-01-18 | [\#34364](https://github.com/airbytehq/airbyte/pull/34364) | Better logging in mongo db source connector | -| 0.13.1 | 2024-01-18 | [\#34236](https://github.com/airbytehq/airbyte/pull/34236) | Add postCreateTable hook in destination JdbcSqlGenerator | -| 0.13.0 | 2024-01-16 | [\#34177](https://github.com/airbytehq/airbyte/pull/34177) | Add `useExpensiveSafeCasting` param in JdbcSqlGenerator methods; add JdbcTypingDedupingTest fixture; other DV2-related changes | -| 0.12.1 | 2024-01-11 | [\#34186](https://github.com/airbytehq/airbyte/pull/34186) | Add hook for additional destination specific checks to JDBC destination check method | -| 0.12.0 | 2024-01-10 | [\#33875](https://github.com/airbytehq/airbyte/pull/33875) | Upgrade sshd-mina to 2.11.1 | -| 0.11.5 | 2024-01-10 | [\#34119](https://github.com/airbytehq/airbyte/pull/34119) | Remove wal2json support for postgres+debezium. | -| 0.11.4 | 2024-01-09 | [\#33305](https://github.com/airbytehq/airbyte/pull/33305) | Source stats in incremental syncs | -| 0.11.3 | 2023-01-09 | [\#33658](https://github.com/airbytehq/airbyte/pull/33658) | Always fail when debezium fails, even if it happened during the setup phase. 
| -| 0.11.2 | 2024-01-09 | [\#33969](https://github.com/airbytehq/airbyte/pull/33969) | Destination state stats implementation | -| 0.11.1 | 2024-01-04 | [\#33727](https://github.com/airbytehq/airbyte/pull/33727) | SSH bastion heartbeats for Destinations | -| 0.11.0 | 2024-01-04 | [\#33730](https://github.com/airbytehq/airbyte/pull/33730) | DV2 T+D uses Sql struct to represent transactions; other T+D-related changes | -| 0.10.4 | 2023-12-20 | [\#33071](https://github.com/airbytehq/airbyte/pull/33071) | Add the ability to parse JDBC parameters with another delimiter than '&' | -| 0.10.3 | 2024-01-03 | [\#33312](https://github.com/airbytehq/airbyte/pull/33312) | Send out count in AirbyteStateMessage | -| 0.10.1 | 2023-12-21 | [\#33723](https://github.com/airbytehq/airbyte/pull/33723) | Make memory-manager log message less scary | -| 0.10.0 | 2023-12-20 | [\#33704](https://github.com/airbytehq/airbyte/pull/33704) | JdbcDestinationHandler now properly implements `getInitialRawTableState`; reenable SqlGenerator test | -| 0.9.0 | 2023-12-18 | [\#33124](https://github.com/airbytehq/airbyte/pull/33124) | Make Schema Creation Separate from Table Creation, exclude the T&D module from the CDK | -| 0.8.0 | 2023-12-18 | [\#33506](https://github.com/airbytehq/airbyte/pull/33506) | Improve async destination shutdown logic; more JDBC async migration work; improve DAT test schema handling | -| 0.7.9 | 2023-12-18 | [\#33549](https://github.com/airbytehq/airbyte/pull/33549) | Improve MongoDB logging. | -| 0.7.8 | 2023-12-18 | [\#33365](https://github.com/airbytehq/airbyte/pull/33365) | Emit stream statuses more consistently | -| 0.7.7 | 2023-12-18 | [\#33434](https://github.com/airbytehq/airbyte/pull/33307) | Remove LEGACY state | -| 0.7.6 | 2023-12-14 | [\#32328](https://github.com/airbytehq/airbyte/pull/33307) | Add schema less mode for mongodb CDC. Fixes for non standard mongodb id type. 
| -| 0.7.4 | 2023-12-13 | [\#33232](https://github.com/airbytehq/airbyte/pull/33232) | Track stream record count during sync; only run T+D if a stream had nonzero records or the previous sync left unprocessed records. | -| 0.7.3 | 2023-12-13 | [\#33369](https://github.com/airbytehq/airbyte/pull/33369) | Extract shared JDBC T+D code. | -| 0.7.2 | 2023-12-11 | [\#33307](https://github.com/airbytehq/airbyte/pull/33307) | Fix DV2 JDBC type mappings (code changes in [\#33307](https://github.com/airbytehq/airbyte/pull/33307)). | -| 0.7.1 | 2023-12-01 | [\#33027](https://github.com/airbytehq/airbyte/pull/33027) | Add the abstract DB source debugger. | -| 0.7.0 | 2023-12-07 | [\#32326](https://github.com/airbytehq/airbyte/pull/32326) | Destinations V2 changes for JDBC destinations | -| 0.6.4 | 2023-12-06 | [\#33082](https://github.com/airbytehq/airbyte/pull/33082) | Improvements to schema snapshot error handling + schema snapshot history scope (scoped to configured DB). | -| 0.6.2 | 2023-11-30 | [\#32573](https://github.com/airbytehq/airbyte/pull/32573) | Update MSSQLConverter to enforce 6-digit microsecond precision for timestamp fields | -| 0.6.1 | 2023-11-30 | [\#32610](https://github.com/airbytehq/airbyte/pull/32610) | Support DB initial sync using binary as primary key. | -| 0.6.0 | 2023-11-30 | [\#32888](https://github.com/airbytehq/airbyte/pull/32888) | JDBC destinations now use the async framework | -| 0.5.3 | 2023-11-28 | [\#32686](https://github.com/airbytehq/airbyte/pull/32686) | Better attribution of debezium engine shutdown due to heartbeat. | -| 0.5.1 | 2023-11-27 | [\#32662](https://github.com/airbytehq/airbyte/pull/32662) | Debezium initialization wait time will now read from initial setup time. | -| 0.5.0 | 2023-11-22 | [\#32656](https://github.com/airbytehq/airbyte/pull/32656) | Introduce TestDatabase test fixture, refactor database source test base classes. 
| -| 0.4.11 | 2023-11-14 | [\#32526](https://github.com/airbytehq/airbyte/pull/32526) | Clean up memory manager logs. | -| 0.4.10 | 2023-11-13 | [\#32285](https://github.com/airbytehq/airbyte/pull/32285) | Fix UUID codec ordering for MongoDB connector | -| 0.4.9 | 2023-11-13 | [\#32468](https://github.com/airbytehq/airbyte/pull/32468) | Further error grouping improvements for DV2 connectors | -| 0.4.8 | 2023-11-09 | [\#32377](https://github.com/airbytehq/airbyte/pull/32377) | source-postgres tests: skip dropping database | -| 0.4.7 | 2023-11-08 | [\#31856](https://github.com/airbytehq/airbyte/pull/31856) | source-postgres: support for infinity date and timestamps | -| 0.4.5 | 2023-11-07 | [\#32112](https://github.com/airbytehq/airbyte/pull/32112) | Async destinations framework: Allow configuring the queue flush threshold | -| 0.4.4 | 2023-11-06 | [\#32119](https://github.com/airbytehq/airbyte/pull/32119) | Add STANDARD UUID codec to MongoDB debezium handler | -| 0.4.2 | 2023-11-06 | [\#32190](https://github.com/airbytehq/airbyte/pull/32190) | Improve error deinterpolation | -| 0.4.1 | 2023-11-02 | [\#32192](https://github.com/airbytehq/airbyte/pull/32192) | Add 's3-destinations' CDK module. | -| 0.4.0 | 2023-11-02 | [\#32050](https://github.com/airbytehq/airbyte/pull/32050) | Fix compiler warnings. | -| 0.3.0 | 2023-11-02 | [\#31983](https://github.com/airbytehq/airbyte/pull/31983) | Add deinterpolation feature to AirbyteExceptionHandler. | -| 0.2.4 | 2023-10-31 | [\#31807](https://github.com/airbytehq/airbyte/pull/31807) | Handle case of debezium update and delete of records in mongodb. | -| 0.2.3 | 2023-10-31 | [\#32022](https://github.com/airbytehq/airbyte/pull/32022) | Update Debezium version from 2.20 -> 2.4.0. | -| 0.2.2 | 2023-10-31 | [\#31976](https://github.com/airbytehq/airbyte/pull/31976) | Debezium tweaks to make tests run faster. 
| -| 0.2.0 | 2023-10-30 | [\#31960](https://github.com/airbytehq/airbyte/pull/31960) | Hoist top-level gradle subprojects into CDK. | -| 0.1.12 | 2023-10-24 | [\#31674](https://github.com/airbytehq/airbyte/pull/31674) | Fail sync when Debezium does not shut down properly. | -| 0.1.11 | 2023-10-18 | [\#31486](https://github.com/airbytehq/airbyte/pull/31486) | Update constants in AdaptiveSourceRunner. | -| 0.1.9 | 2023-10-12 | [\#31309](https://github.com/airbytehq/airbyte/pull/31309) | Use toPlainString() when handling BigDecimals in PostgresConverter | -| 0.1.8 | 2023-10-11 | [\#31322](https://github.com/airbytehq/airbyte/pull/31322) | Cap log line length to 32KB to prevent loss of records | -| 0.1.7 | 2023-10-10 | [\#31194](https://github.com/airbytehq/airbyte/pull/31194) | Deallocate unused per stream buffer memory when empty | -| 0.1.6 | 2023-10-10 | [\#31083](https://github.com/airbytehq/airbyte/pull/31083) | Fix precision of numeric values in async destinations | -| 0.1.5 | 2023-10-09 | [\#31196](https://github.com/airbytehq/airbyte/pull/31196) | Update typo in CDK (CDN_LSN -> CDC_LSN) | -| 0.1.4 | 2023-10-06 | [\#31139](https://github.com/airbytehq/airbyte/pull/31139) | Reduce async buffer | -| 0.1.1 | 2023-09-28 | [\#30835](https://github.com/airbytehq/airbyte/pull/30835) | JDBC destinations now avoid staging area name collisions by using the raw table name as the stage name. (previously we used the stream name as the stage name) | -| 0.1.0 | 2023-09-27 | [\#30445](https://github.com/airbytehq/airbyte/pull/30445) | First launch, including shared classes for all connectors. | -| 0.0.2 | 2023-08-21 | [\#28687](https://github.com/airbytehq/airbyte/pull/28687) | Version bump only (no other changes). | -| 0.0.1 | 2023-08-08 | [\#28687](https://github.com/airbytehq/airbyte/pull/28687) | Initial release for testing. 
| +| Version | Date | Pull Request | Subject | +|:------------|:-----------|:-------------------------------------------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------| +| 0.44.5 | 2024-08-09 | [\#43374](https://github.com/airbytehq/airbyte/pull/43374) | S3 destination V2 fields, conversion improvements, bugfixes | +| 0.44.4 | 2024-08-08 | [\#43410](https://github.com/airbytehq/airbyte/pull/43330) | Better logs for counting info to state message. | +| 0.44.3 | 2024-08-07 | [\#43330](https://github.com/airbytehq/airbyte/pull/43330) | make TypingDedupingTest aware of column name renaming. | +| 0.44.3 | 2024-08-07 | [\#43329](https://github.com/airbytehq/airbyte/pull/43329) | move generationIdHandling to its own class. | +| 0.44.2 | 2024-08-06 | [\#42869](https://github.com/airbytehq/airbyte/pull/42869) | Add logs about counting info to state message. | +| 0.44.1 | 2024-08-01 | [\#42550](https://github.com/airbytehq/airbyte/pull/42550) | Fix error on reporting counts. | +| 0.44.0 | 2024-08-01 | [\#42405](https://github.com/airbytehq/airbyte/pull/42405) | s3-destinations: Use async framework, adapt to support refreshes | +| 0.43.6 | 2024-07-30 | [\#42540](https://github.com/airbytehq/airbyte/pull/42540) | Fix generationId handling for destinations | +| 0.43.6 | 2024-07-30 | [\#42514](https://github.com/airbytehq/airbyte/pull/42514) | Add tests around generationId handling for destinations. | +| 0.43.4 | 2024-07-28 | [\#42839](https://github.com/airbytehq/airbyte/pull/42839) | Fix error translation framework to not rethrow ConfigErrorException and TransientErrorException. | +| 0.43.3 | 2024-07-22 | [\#42417](https://github.com/airbytehq/airbyte/pull/42417) | Handle null exception message in ConnectorExceptionHandler. 
| +| 0.43.2 | 2024-07-22 | [\#42431](https://github.com/airbytehq/airbyte/pull/42431) | Filter out debezium message change events | +| 0.43.1 | 2024-07-22 | [\#41622](https://github.com/airbytehq/airbyte/pull/41622) | Fix null safety bug in debezium event processing | +| 0.43.0 | 2024-07-17 | [\#41954](https://github.com/airbytehq/airbyte/pull/41954) | fix refreshes for connectors using the old SqlOperations | +| 0.43.0 | 2024-07-17 | [\#42017](https://github.com/airbytehq/airbyte/pull/42017) | bump postgres-jdbc version | +| 0.43.0 | 2024-07-17 | [\#42015](https://github.com/airbytehq/airbyte/pull/42015) | wait until migration before creating the Writeconfig objects | +| 0.43.0 | 2024-07-17 | [\#41953](https://github.com/airbytehq/airbyte/pull/41953) | add generationId and syncId to SqlOperations functions | +| 0.43.0 | 2024-07-17 | [\#41952](https://github.com/airbytehq/airbyte/pull/41952) | rename and add fields in WriteConfig | +| 0.43.0 | 2024-07-17 | [\#41951](https://github.com/airbytehq/airbyte/pull/41951) | remove nullables in JdbcBufferedConsumerFactory | +| 0.43.0 | 2024-07-17 | [\#41950](https://github.com/airbytehq/airbyte/pull/41950) | remove unused classes | +| 0.42.2 | 2024-07-21 | [\#42122](https://github.com/airbytehq/airbyte/pull/42122) | Support for Debezium resync and shutdown scenarios. | +| 0.42.2 | 2024-07-04 | [\#40208](https://github.com/airbytehq/airbyte/pull/40208) | Implement a new connector error handling and translation framework | +| 0.41.8 | 2024-07-18 | [\#42068](https://github.com/airbytehq/airbyte/pull/42068) | Add analytics message for WASS occurrence. | +| 0.41.7 | 2024-07-17 | [\#42055](https://github.com/airbytehq/airbyte/pull/42055) | Add debezium heartbeat timeout back to shutdown debezium. | +| 0.41.6 | 2024-07-17 | [\#41996](https://github.com/airbytehq/airbyte/pull/41996) | Fix java interop compilation issue in Config/TransientErrorException. 
| +| 0.41.5 | 2024-07-16 | [\#42011](https://github.com/airbytehq/airbyte/pull/42011) | Async consumer accepts null default namespace | +| 0.41.4 | 2024-07-15 | [\#41959](https://github.com/airbytehq/airbyte/pull/41959) | Allow setting `internal_message` in Config/TransientErrorException. Destinations: shorten error message for INCOMPLETE stream status. | +| 0.41.3 | 2024-07-15 | [\#41680](https://github.com/airbytehq/airbyte/pull/41680) | Fix: CompletableFutures.allOf now handles empty list and `Throwable` | +| 0.41.2 | 2024-07-12 | [\#40567](https://github.com/airbytehq/airbyte/pull/40567) | Fix BaseSqlGenerator test case (generation_id support); update minimum platform version for refreshes support. | +| 0.41.1 | 2024-07-11 | [\#41212](https://github.com/airbytehq/airbyte/pull/41212) | Improve debezium logging. | +| 0.41.0 | 2024-07-11 | [\#38240](https://github.com/airbytehq/airbyte/pull/38240) | Sources: Changes in CDC interfaces to support WASS algorithm | +| 0.40.11 | 2024-07-08 | [\#41041](https://github.com/airbytehq/airbyte/pull/41041) | Destinations: Fix truncate refreshes incorrectly discarding data if successful attempt had 0 records | +| 0.40.10 | 2024-07-05 | [\#40719](https://github.com/airbytehq/airbyte/pull/40719) | Update test to reflect isResumable field in catalog | +| 0.40.9 | 2024-07-01 | [\#39473](https://github.com/airbytehq/airbyte/pull/39473) | minor changes around error logging and testing | +| 0.40.8 | 2024-07-01 | [\#40499](https://github.com/airbytehq/airbyte/pull/40499) | Make JdbcDatabase SQL statement logging optional; add generation_id support to JdbcSqlGenerator | +| 0.40.7 | 2024-07-01 | [\#40516](https://github.com/airbytehq/airbyte/pull/40516) | Remove dbz heartbeat.
| +| ~~0.40.6~~ | | | (this version does not exist) | +| 0.40.5 | 2024-06-26 | [\#40517](https://github.com/airbytehq/airbyte/pull/40517) | JdbcDatabase.executeWithinTransaction allows disabling SQL statement logging | +| 0.40.4 | 2024-06-18 | [\#40254](https://github.com/airbytehq/airbyte/pull/40254) | Destinations: Do not throw on unrecognized airbyte message type (ignore message instead) | +| 0.40.3 | 2024-06-18 | [\#39526](https://github.com/airbytehq/airbyte/pull/39526) | Destinations: INCOMPLETE stream status is a TRANSIENT error rather than SYSTEM | +| 0.40.2 | 2024-06-18 | [\#39552](https://github.com/airbytehq/airbyte/pull/39552) | Destinations: Throw error if the ConfiguredCatalog has no streams | +| 0.40.1 | 2024-06-14 | [\#39349](https://github.com/airbytehq/airbyte/pull/39349) | Source stats for full refresh streams | +| 0.40.0 | 2024-06-17 | [\#38622](https://github.com/airbytehq/airbyte/pull/38622) | Destinations: Implement refreshes logic in AbstractStreamOperation | +| 0.39.0 | 2024-06-17 | [\#38067](https://github.com/airbytehq/airbyte/pull/38067) | Destinations: Breaking changes for refreshes (fail on INCOMPLETE stream status; ignore OVERWRITE sync mode) | +| 0.38.3 | 2024-06-25 | [\#40499](https://github.com/airbytehq/airbyte/pull/40499) | (backport) Make JdbcDatabase SQL statement logging optional; add generation_id support to JdbcSqlGenerator | +| 0.38.2 | 2024-06-14 | [\#39460](https://github.com/airbytehq/airbyte/pull/39460) | Bump postgres JDBC driver version | +| 0.38.1 | 2024-06-13 | [\#39445](https://github.com/airbytehq/airbyte/pull/39445) | Sources: More CDK changes to handle big initial snapshots. 
| +| 0.38.0 | 2024-06-11 | [\#39405](https://github.com/airbytehq/airbyte/pull/39405) | Sources: Debezium properties manager interface changed to accept a list of streams to scope to | +| 0.37.1 | 2024-06-10 | [\#38075](https://github.com/airbytehq/airbyte/pull/38075) | Destinations: Track stream statuses in async framework | +| 0.37.0 | 2024-06-10 | [\#38121](https://github.com/airbytehq/airbyte/pull/38121) | Destinations: Set default namespace via CatalogParser | +| 0.36.8 | 2024-06-07 | [\#38763](https://github.com/airbytehq/airbyte/pull/38763) | Increase Jackson message length limit | +| 0.36.7 | 2024-06-06 | [\#39220](https://github.com/airbytehq/airbyte/pull/39220) | Handle null messages in ConnectorExceptionUtil | +| 0.36.6 | 2024-06-05 | [\#39106](https://github.com/airbytehq/airbyte/pull/39106) | Skip write to storage with 0 byte file | +| 0.36.5 | 2024-06-01 | [\#38792](https://github.com/airbytehq/airbyte/pull/38792) | Throw config exception if no selectable table exists in user provided schemas | +| 0.36.4 | 2024-05-31 | [\#38824](https://github.com/airbytehq/airbyte/pull/38824) | Param marked as non-null to nullable in JdbcDestinationHandler for NPE fix | +| 0.36.2 | 2024-05-29 | [\#38538](https://github.com/airbytehq/airbyte/pull/38357) | Exit connector when encountering a config error. 
| +| 0.36.0 | 2024-05-29 | [\#38358](https://github.com/airbytehq/airbyte/pull/38358) | Plumb generation_id / sync_id to destinations code | +| 0.35.16 | 2024-06-25 | [\#40517](https://github.com/airbytehq/airbyte/pull/40517) | (backport) JdbcDatabase.executeWithinTransaction allows disabling SQL statement logging | +| 0.35.15 | 2024-05-31 | [\#38824](https://github.com/airbytehq/airbyte/pull/38824) | Param marked as non-null to nullable in JdbcDestinationHandler for NPE fix | +| 0.35.14 | 2024-05-28 | [\#38738](https://github.com/airbytehq/airbyte/pull/38738) | make ThreadCreationInfo cast as nullable | +| 0.35.13 | 2024-05-28 | [\#38632](https://github.com/airbytehq/airbyte/pull/38632) | minor changes to allow conversion of snowflake tests to kotlin | +| 0.35.12 | 2024-05-23 | [\#38638](https://github.com/airbytehq/airbyte/pull/38638) | Minor change to support Snowflake conversion to Kotlin | +| 0.35.11 | 2024-05-23 | [\#38357](https://github.com/airbytehq/airbyte/pull/38357) | This release fixes an error on the previous release. | +| 0.35.10 | 2024-05-23 | [\#38357](https://github.com/airbytehq/airbyte/pull/38357) | Add shared code for db sources stream status trace messages and testing. 
| +| 0.35.9 | 2024-05-23 | [\#38586](https://github.com/airbytehq/airbyte/pull/38586) | code cleanup | +| 0.35.9 | 2024-05-23 | [\#37583](https://github.com/airbytehq/airbyte/pull/37583) | code cleanup | +| 0.35.9 | 2024-05-23 | [\#37555](https://github.com/airbytehq/airbyte/pull/37555) | code cleanup | +| 0.35.9 | 2024-05-23 | [\#37540](https://github.com/airbytehq/airbyte/pull/37540) | code cleanup | +| 0.35.9 | 2024-05-23 | [\#37539](https://github.com/airbytehq/airbyte/pull/37539) | code cleanup | +| 0.35.9 | 2024-05-23 | [\#37538](https://github.com/airbytehq/airbyte/pull/37538) | code cleanup | +| 0.35.9 | 2024-05-23 | [\#37537](https://github.com/airbytehq/airbyte/pull/37537) | code cleanup | +| 0.35.9 | 2024-05-23 | [\#37518](https://github.com/airbytehq/airbyte/pull/37518) | code cleanup | +| 0.35.8 | 2024-05-22 | [\#38572](https://github.com/airbytehq/airbyte/pull/38572) | Add a temporary static method to decouple SnowflakeDestination from AbstractJdbcDestination | +| 0.35.7 | 2024-05-20 | [\#38357](https://github.com/airbytehq/airbyte/pull/38357) | Decouple create namespace from per stream operation interface. 
| +| 0.35.6 | 2024-05-17 | [\#38107](https://github.com/airbytehq/airbyte/pull/38107) | New interfaces for Destination connectors to plug into AsyncStreamConsumer | +| 0.35.5 | 2024-05-17 | [\#38204](https://github.com/airbytehq/airbyte/pull/38204) | add assume-role authentication to s3 | +| 0.35.2 | 2024-05-13 | [\#38104](https://github.com/airbytehq/airbyte/pull/38104) | Handle transient error messages | +| 0.35.0 | 2024-05-13 | [\#38127](https://github.com/airbytehq/airbyte/pull/38127) | Destinations: Populate generation/sync ID on StreamConfig | +| 0.34.4 | 2024-05-10 | [\#37712](https://github.com/airbytehq/airbyte/pull/37712) | make sure the exceptionHandler always terminates | +| 0.34.3 | 2024-05-10 | [\#38095](https://github.com/airbytehq/airbyte/pull/38095) | Minor changes for databricks connector | +| 0.34.1 | 2024-05-07 | [\#38030](https://github.com/airbytehq/airbyte/pull/38030) | Add support for transient errors | +| 0.34.0 | 2024-05-01 | [\#37712](https://github.com/airbytehq/airbyte/pull/37712) | Destinations: Remove incremental T+D | +| 0.33.2 | 2024-05-03 | [\#37824](https://github.com/airbytehq/airbyte/pull/37824) | improve source acceptance tests | +| 0.33.1 | 2024-05-03 | [\#37824](https://github.com/airbytehq/airbyte/pull/37824) | Add a unit test for cursor based sync | +| 0.33.0 | 2024-05-03 | [\#36935](https://github.com/airbytehq/airbyte/pull/36935) | Destinations: Enable non-safe-casting DV2 tests | +| 0.32.0 | 2024-05-03 | [\#36929](https://github.com/airbytehq/airbyte/pull/36929) | Destinations: Assorted DV2 changes for mysql | +| 0.31.7 | 2024-05-02 | [\#36910](https://github.com/airbytehq/airbyte/pull/36910) | changes for destination-snowflake | +| 0.31.6 | 2024-05-02 | [\#37746](https://github.com/airbytehq/airbyte/pull/37746) | debuggability improvements. 
| +| 0.31.5 | 2024-04-30 | [\#37758](https://github.com/airbytehq/airbyte/pull/37758) | Set debezium max retries to zero | +| 0.31.4 | 2024-04-30 | [\#37754](https://github.com/airbytehq/airbyte/pull/37754) | Add DebeziumEngine notification log | +| 0.31.3 | 2024-04-30 | [\#37726](https://github.com/airbytehq/airbyte/pull/37726) | Remove debezium retries | +| 0.31.2 | 2024-04-30 | [\#37507](https://github.com/airbytehq/airbyte/pull/37507) | Better error messages when switching between global/per-stream modes. | +| 0.31.0 | 2024-04-26 | [\#37584](https://github.com/airbytehq/airbyte/pull/37584) | Update S3 destination deps to exclude zookeeper and hadoop-yarn-common | +| 0.30.11 | 2024-04-25 | [\#36899](https://github.com/airbytehq/airbyte/pull/36899) | changes for bigQuery destination. | +| 0.30.10 | 2024-04-24 | [\#37541](https://github.com/airbytehq/airbyte/pull/37541) | remove excessive logging | +| 0.30.9 | 2024-04-24 | [\#37477](https://github.com/airbytehq/airbyte/pull/37477) | remove unnecessary logs | +| 0.30.7 | 2024-04-23 | [\#37477](https://github.com/airbytehq/airbyte/pull/37477) | fix kotlin warnings in core CDK submodule | +| 0.30.7 | 2024-04-23 | [\#37484](https://github.com/airbytehq/airbyte/pull/37484) | fix kotlin warnings in dependencies CDK submodule | +| 0.30.7 | 2024-04-23 | [\#37479](https://github.com/airbytehq/airbyte/pull/37479) | fix kotlin warnings in azure-destination, datastore-{bigquery,mongo,postgres} CDK submodules | +| 0.30.7 | 2024-04-23 | [\#37481](https://github.com/airbytehq/airbyte/pull/37481) | fix kotlin warnings in destination CDK submodules | +| 0.30.7 | 2024-04-23 | [\#37482](https://github.com/airbytehq/airbyte/pull/37482) | fix kotlin warnings in db-sources CDK submodule | +| 0.30.6 | 2024-04-19 | [\#37442](https://github.com/airbytehq/airbyte/pull/37442) | Destinations: Rename File format related classes to be agnostic of S3 | +| 0.30.3 | 2024-04-12 | [\#37106](https://github.com/airbytehq/airbyte/pull/37106) | 
Destinations: Simplify constructors in `AsyncStreamConsumer` | +| 0.30.2 | 2024-04-12 | [\#36926](https://github.com/airbytehq/airbyte/pull/36926) | Destinations: Remove `JdbcSqlOperations#formatData`; misc changes for java interop | +| 0.30.1 | 2024-04-11 | [\#36919](https://github.com/airbytehq/airbyte/pull/36919) | Fix regression in sources conversion of null values | +| 0.30.0 | 2024-04-11 | [\#36974](https://github.com/airbytehq/airbyte/pull/36974) | Destinations: Pass config to jdbc sqlgenerator; allow cascade drop | +| 0.29.13 | 2024-04-10 | [\#36981](https://github.com/airbytehq/airbyte/pull/36981) | DB sources : Emit analytics for data type serialization errors. | +| 0.29.12 | 2024-04-10 | [\#36973](https://github.com/airbytehq/airbyte/pull/36973) | Destinations: Make flush batch size configurable for JdbcInsertFlush | +| 0.29.11 | 2024-04-10 | [\#36865](https://github.com/airbytehq/airbyte/pull/36865) | Sources : Remove noisy log line. | +| 0.29.10 | 2024-04-10 | [\#36805](https://github.com/airbytehq/airbyte/pull/36805) | Destinations: Enhance CatalogParser name collision handling; add DV2 tests for long identifiers | +| 0.29.9 | 2024-04-09 | [\#36047](https://github.com/airbytehq/airbyte/pull/36047) | Destinations: CDK updates for raw-only destinations | +| 0.29.8 | 2024-04-08 | [\#36868](https://github.com/airbytehq/airbyte/pull/36868) | Destinations: s3-destinations Compilation fixes for connector | +| 0.29.7 | 2024-04-08 | [\#36768](https://github.com/airbytehq/airbyte/pull/36768) | Destinations: Make destination state fetch/commit logic more resilient to errors | +| 0.29.6 | 2024-04-05 | [\#36577](https://github.com/airbytehq/airbyte/pull/36577) | Do not send system_error trace message for config exceptions. 
| +| 0.29.5 | 2024-04-05 | [\#36620](https://github.com/airbytehq/airbyte/pull/36620) | Missed changes - open for extension for destination-postgres | +| 0.29.3 | 2024-04-04 | [\#36759](https://github.com/airbytehq/airbyte/pull/36759) | Minor fixes. | +| 0.29.3 | 2024-04-04 | [\#36706](https://github.com/airbytehq/airbyte/pull/36706) | Enabling spotbugs for s3-destination. | +| 0.29.3 | 2024-04-03 | [\#36705](https://github.com/airbytehq/airbyte/pull/36705) | Enabling spotbugs for db-sources. | +| 0.29.3 | 2024-04-03 | [\#36704](https://github.com/airbytehq/airbyte/pull/36704) | Enabling spotbugs for datastore-postgres. | +| 0.29.3 | 2024-04-03 | [\#36703](https://github.com/airbytehq/airbyte/pull/36703) | Enabling spotbugs for gcs-destination. | +| 0.29.3 | 2024-04-03 | [\#36702](https://github.com/airbytehq/airbyte/pull/36702) | Enabling spotbugs for db-destinations. | +| 0.29.3 | 2024-04-03 | [\#36701](https://github.com/airbytehq/airbyte/pull/36701) | Enabling spotbugs for typing_and_deduping. | +| 0.29.3 | 2024-04-03 | [\#36612](https://github.com/airbytehq/airbyte/pull/36612) | Enabling spotbugs for dependencies. | +| 0.29.5 | 2024-04-05 | [\#36577](https://github.com/airbytehq/airbyte/pull/36577) | Do not send system_error trace message for config exceptions. | +| 0.29.3 | 2024-04-04 | [\#36759](https://github.com/airbytehq/airbyte/pull/36759) | Minor fixes. | +| 0.29.3 | 2024-04-04 | [\#36706](https://github.com/airbytehq/airbyte/pull/36706) | Enabling spotbugs for s3-destination. | +| 0.29.3 | 2024-04-03 | [\#36705](https://github.com/airbytehq/airbyte/pull/36705) | Enabling spotbugs for db-sources. | +| 0.29.3 | 2024-04-03 | [\#36704](https://github.com/airbytehq/airbyte/pull/36704) | Enabling spotbugs for datastore-postgres. | +| 0.29.3 | 2024-04-03 | [\#36703](https://github.com/airbytehq/airbyte/pull/36703) | Enabling spotbugs for gcs-destination. 
| +| 0.29.3 | 2024-04-03 | [\#36702](https://github.com/airbytehq/airbyte/pull/36702) | Enabling spotbugs for db-destinations. | +| 0.29.3 | 2024-04-03 | [\#36701](https://github.com/airbytehq/airbyte/pull/36701) | Enabling spotbugs for typing_and_deduping. | +| 0.29.3 | 2024-04-03 | [\#36612](https://github.com/airbytehq/airbyte/pull/36612) | Enabling spotbugs for dependencies. | +| 0.29.2 | 2024-04-04 | [\#36845](https://github.com/airbytehq/airbyte/pull/36772) | Changes to make source-mongo compileable | +| 0.29.1 | 2024-04-03 | [\#36772](https://github.com/airbytehq/airbyte/pull/36772) | Changes to make source-mssql compileable | +| 0.29.0 | 2024-04-02 | [\#36759](https://github.com/airbytehq/airbyte/pull/36759) | Build artifact publication changes and fixes. | +| 0.28.21 | 2024-04-02 | [\#36673](https://github.com/airbytehq/airbyte/pull/36673) | Change the destination message parsing to use standard java/kotlin classes. Adds logging to catch empty lines. | +| 0.28.20 | 2024-04-01 | [\#36584](https://github.com/airbytehq/airbyte/pull/36584) | Changes to make source-postgres compileable | +| 0.28.19 | 2024-03-29 | [\#36619](https://github.com/airbytehq/airbyte/pull/36619) | Changes to make destination-postgres compileable | +| 0.28.19 | 2024-03-29 | [\#36588](https://github.com/airbytehq/airbyte/pull/36588) | Changes to make destination-redshift compileable | +| 0.28.19 | 2024-03-29 | [\#36610](https://github.com/airbytehq/airbyte/pull/36610) | remove airbyte-api generation, pull dependency jars instead | +| 0.28.19 | 2024-03-29 | [\#36611](https://github.com/airbytehq/airbyte/pull/36611) | disable spotbugs for CDK test and testFixtures tasks | +| 0.28.18 | 2024-03-28 | [\#36606](https://github.com/airbytehq/airbyte/pull/36574) | disable spotbugs for CDK test and testFixtures tasks | +| 0.28.18 | 2024-03-28 | [\#36574](https://github.com/airbytehq/airbyte/pull/36574) | Fix ContainerFactory | +| 0.28.18 | 2024-03-27 | 
[\#36570](https://github.com/airbytehq/airbyte/pull/36570) | Convert missing s3-destinations tests to Kotlin | +| 0.28.18 | 2024-03-27 | [\#36446](https://github.com/airbytehq/airbyte/pull/36446) | Convert dependencies submodule to Kotlin | +| 0.28.18 | 2024-03-27 | [\#36445](https://github.com/airbytehq/airbyte/pull/36445) | Convert functional out Checked interfaces to kotlin | +| 0.28.18 | 2024-03-27 | [\#36444](https://github.com/airbytehq/airbyte/pull/36444) | Use apache-commons classes in our Checked functional interfaces | +| 0.28.18 | 2024-03-27 | [\#36467](https://github.com/airbytehq/airbyte/pull/36467) | Convert #36465 to Kotlin | +| 0.28.18 | 2024-03-27 | [\#36473](https://github.com/airbytehq/airbyte/pull/36473) | Convert #36396 to Kotlin | +| 0.28.18 | 2024-03-27 | [\#36439](https://github.com/airbytehq/airbyte/pull/36439) | Convert db-destinations submodule to Kotlin | +| 0.28.18 | 2024-03-27 | [\#36438](https://github.com/airbytehq/airbyte/pull/36438) | Convert db-sources submodule to Kotlin | +| 0.28.18 | 2024-03-26 | [\#36437](https://github.com/airbytehq/airbyte/pull/36437) | Convert gcs submodule to Kotlin | +| 0.28.18 | 2024-03-26 | [\#36421](https://github.com/airbytehq/airbyte/pull/36421) | Convert typing-deduping submodule to Kotlin | +| 0.28.18 | 2024-03-26 | [\#36420](https://github.com/airbytehq/airbyte/pull/36420) | Convert s3-destinations submodule to Kotlin | +| 0.28.18 | 2024-03-26 | [\#36419](https://github.com/airbytehq/airbyte/pull/36419) | Convert azure submodule to Kotlin | +| 0.28.18 | 2024-03-26 | [\#36413](https://github.com/airbytehq/airbyte/pull/36413) | Convert postgres submodule to Kotlin | +| 0.28.18 | 2024-03-26 | [\#36412](https://github.com/airbytehq/airbyte/pull/36412) | Convert mongodb submodule to Kotlin | +| 0.28.18 | 2024-03-26 | [\#36411](https://github.com/airbytehq/airbyte/pull/36411) | Convert datastore-bigquery submodule to Kotlin | +| 0.28.18 | 2024-03-26 | 
[\#36205](https://github.com/airbytehq/airbyte/pull/36205) | Convert core/main to Kotlin | +| 0.28.18 | 2024-03-26 | [\#36204](https://github.com/airbytehq/airbyte/pull/36204) | Convert core/test to Kotlin | +| 0.28.18 | 2024-03-26 | [\#36190](https://github.com/airbytehq/airbyte/pull/36190) | Convert core/testFixtures to Kotlin | +| 0.28.0 | 2024-03-26 | [\#36514](https://github.com/airbytehq/airbyte/pull/36514) | Bump CDK version to 0.28.0 | +| 0.27.7 | 2024-03-26 | [\#36466](https://github.com/airbytehq/airbyte/pull/36466) | Destinations: fix support for case-sensitive fields in destination state. | +| 0.27.6 | 2024-03-26 | [\#36432](https://github.com/airbytehq/airbyte/pull/36432) | Sources support for AirbyteRecordMessageMeta during reading source data types. | +| 0.27.5 | 2024-03-25 | [\#36461](https://github.com/airbytehq/airbyte/pull/36461) | Destinations: Handle case-sensitive columns in destination state handling. | +| 0.27.4 | 2024-03-25 | [\#36333](https://github.com/airbytehq/airbyte/pull/36333) | Sunset DebeziumSourceDecoratingIterator. | +| 0.27.1 | 2024-03-22 | [\#36296](https://github.com/airbytehq/airbyte/pull/36296) | Destinations: (async framework) Do not log invalid message data. | +| 0.27.0 | 2024-03-21 | [\#36364](https://github.com/airbytehq/airbyte/pull/36364) | Sources: Increase debezium initial record wait time to 40 minute. | +| 0.26.1 | 2024-03-19 | [\#35599](https://github.com/airbytehq/airbyte/pull/35599) | Sunset SourceDecoratingIterator. | +| 0.26.0 | 2024-03-19 | [\#36263](https://github.com/airbytehq/airbyte/pull/36263) | Improve conversion of debezium Date type for some edge case in mssql. 
| +| 0.25.0 | 2024-03-18 | [\#36203](https://github.com/airbytehq/airbyte/pull/36203) | Wiring of Transformer to StagingConsumerFactory and JdbcBufferedConsumerFactory; import changes for Kotlin conversion; State message logs to debug | +| 0.24.1 | 2024-03-13 | [\#36022](https://github.com/airbytehq/airbyte/pull/36022) | Move log4j2-test.xml to test fixtures, away from runtime classpath. | +| 0.24.0 | 2024-03-13 | [\#35944](https://github.com/airbytehq/airbyte/pull/35944) | Add `_airbyte_meta` in raw table and test fixture updates | +| 0.23.20 | 2024-03-12 | [\#36011](https://github.com/airbytehq/airbyte/pull/36011) | Debezium configuration for conversion of null value on a column with default value. | +| 0.23.19 | 2024-03-11 | [\#35904](https://github.com/airbytehq/airbyte/pull/35904) | Add retries to the debezium engine. | +| 0.23.18 | 2024-03-07 | [\#35899](https://github.com/airbytehq/airbyte/pull/35899) | Null check when retrieving destination state | +| 0.23.16 | 2024-03-06 | [\#35842](https://github.com/airbytehq/airbyte/pull/35842) | Improve logging in debezium processing. | +| 0.23.15 | 2024-03-05 | [\#35827](https://github.com/airbytehq/airbyte/pull/35827) | improving the Junit interceptor. | +| 0.23.14 | 2024-03-05 | [\#35739](https://github.com/airbytehq/airbyte/pull/35739) | Add logging to the CDC queue size. Fix the ContainerFactory. | +| 0.23.13 | 2024-03-04 | [\#35774](https://github.com/airbytehq/airbyte/pull/35774) | minor changes to the CDK test fixtures. | +| 0.23.12 | 2024-03-01 | [\#35767](https://github.com/airbytehq/airbyte/pull/35767) | introducing a timeout for java tests. 
| +| 0.23.11 | 2024-03-01 | [\#35313](https://github.com/airbytehq/airbyte/pull/35313) | Preserve timezone offset in CSV writer for destinations | +| 0.23.10 | 2024-03-01 | [\#35303](https://github.com/airbytehq/airbyte/pull/35303) | Migration framework with DestinationState for softReset | +| 0.23.9 | 2024-02-29 | [\#35720](https://github.com/airbytehq/airbyte/pull/35720) | various improvements for tests TestDataHolder | +| 0.23.8 | 2024-02-28 | [\#35529](https://github.com/airbytehq/airbyte/pull/35529) | Refactor on state iterators | +| 0.23.7 | 2024-02-28 | [\#35376](https://github.com/airbytehq/airbyte/pull/35376) | Extract typereduper migrations to separate method | +| 0.23.6 | 2024-02-26 | [\#35647](https://github.com/airbytehq/airbyte/pull/35647) | Add a getNamespace into TestDataHolder | +| 0.23.5 | 2024-02-26 | [\#35512](https://github.com/airbytehq/airbyte/pull/35512) | Remove @DisplayName from all CDK tests. | +| 0.23.4 | 2024-02-26 | [\#35507](https://github.com/airbytehq/airbyte/pull/35507) | Add more logs into TestDatabase. | +| 0.23.3 | 2024-02-26 | [\#35495](https://github.com/airbytehq/airbyte/pull/35495) | Fix Junit Interceptor to print better stacktraces | +| 0.23.2 | 2024-02-22 | [\#35385](https://github.com/airbytehq/airbyte/pull/35342) | Bugfix: inverted logic of disableTypeDedupe flag | +| 0.23.1 | 2024-02-22 | [\#35527](https://github.com/airbytehq/airbyte/pull/35527) | reduce shutdown timeouts | +| 0.23.0 | 2024-02-22 | [\#35342](https://github.com/airbytehq/airbyte/pull/35342) | Consolidate and perform upfront gathering of DB metadata state | +| 0.21.4 | 2024-02-21 | [\#35511](https://github.com/airbytehq/airbyte/pull/35511) | Reduce CDC state compression limit to 1MB | +| 0.21.3 | 2024-02-20 | [\#35394](https://github.com/airbytehq/airbyte/pull/35394) | Add Junit progress information to the test logs | +| 0.21.2 | 2024-02-20 | [\#34978](https://github.com/airbytehq/airbyte/pull/34978) | Reduce log noise in NormalizationLogParser. 
| +| 0.21.1 | 2024-02-20 | [\#35199](https://github.com/airbytehq/airbyte/pull/35199) | Add thread names to the logs. | +| 0.21.0 | 2024-02-16 | [\#35314](https://github.com/airbytehq/airbyte/pull/35314) | Delete S3StreamCopier classes. These have been superseded by the async destinations framework. | +| 0.20.9 | 2024-02-15 | [\#35240](https://github.com/airbytehq/airbyte/pull/35240) | Make state emission to platform inside state manager itself. | +| 0.20.8 | 2024-02-15 | [\#35285](https://github.com/airbytehq/airbyte/pull/35285) | Improve blobstore module structure. | +| 0.20.7 | 2024-02-13 | [\#35236](https://github.com/airbytehq/airbyte/pull/35236) | output logs to files in addition to stdout when running tests | +| 0.20.6 | 2024-02-12 | [\#35036](https://github.com/airbytehq/airbyte/pull/35036) | Add trace utility to emit analytics messages. | +| 0.20.5 | 2024-02-13 | [\#34869](https://github.com/airbytehq/airbyte/pull/34869) | Don't emit final state in SourceStateIterator there is an underlying stream failure. | +| 0.20.4 | 2024-02-12 | [\#35042](https://github.com/airbytehq/airbyte/pull/35042) | Use delegate's isDestinationV2 invocation in SshWrappedDestination. | +| 0.20.3 | 2024-02-09 | [\#34580](https://github.com/airbytehq/airbyte/pull/34580) | Support special chars in mysql/mssql database name. | +| 0.20.2 | 2024-02-12 | [\#35111](https://github.com/airbytehq/airbyte/pull/35144) | Make state emission from async framework synchronized. | +| 0.20.1 | 2024-02-11 | [\#35111](https://github.com/airbytehq/airbyte/pull/35111) | Fix GlobalAsyncStateManager stats counting logic. | +| 0.20.0 | 2024-02-09 | [\#34562](https://github.com/airbytehq/airbyte/pull/34562) | Add new test cases to BaseTypingDedupingTest to exercise special characters. | +| 0.19.0 | 2024-02-01 | [\#34745](https://github.com/airbytehq/airbyte/pull/34745) | Reorganize CDK module structure. 
| +| 0.18.0 | 2024-02-08 | [\#33606](https://github.com/airbytehq/airbyte/pull/33606) | Add updated Initial and Incremental Stream State definitions for DB Sources. | +| 0.17.1 | 2024-02-08 | [\#35027](https://github.com/airbytehq/airbyte/pull/35027) | Make state handling thread safe in async destination framework. | +| 0.17.0 | 2024-02-08 | [\#34502](https://github.com/airbytehq/airbyte/pull/34502) | Enable configuring async destination batch size. | +| 0.16.6 | 2024-02-07 | [\#34892](https://github.com/airbytehq/airbyte/pull/34892) | Improved testcontainers logging and support for unshared containers. | +| 0.16.5 | 2024-02-07 | [\#34948](https://github.com/airbytehq/airbyte/pull/34948) | Fix source state stats counting logic | +| 0.16.4 | 2024-02-01 | [\#34727](https://github.com/airbytehq/airbyte/pull/34727) | Add future based stdout consumer in BaseTypingDedupingTest | +| 0.16.3 | 2024-01-30 | [\#34669](https://github.com/airbytehq/airbyte/pull/34669) | Fix org.apache.logging.log4j:log4j-slf4j-impl version conflicts. | +| 0.16.2 | 2024-01-29 | [\#34630](https://github.com/airbytehq/airbyte/pull/34630) | expose NamingTransformer to sub-classes in destinations JdbcSqlGenerator. | +| 0.16.1 | 2024-01-29 | [\#34533](https://github.com/airbytehq/airbyte/pull/34533) | Add a safe method to execute DatabaseMetadata's Resultset returning queries. | +| 0.16.0 | 2024-01-26 | [\#34573](https://github.com/airbytehq/airbyte/pull/34573) | Untangle Debezium harness dependencies. | +| 0.15.2 | 2024-01-25 | [\#34441](https://github.com/airbytehq/airbyte/pull/34441) | Improve airbyte-api build performance. | +| 0.15.1 | 2024-01-25 | [\#34451](https://github.com/airbytehq/airbyte/pull/34451) | Async destinations: Better logging when we fail to parse an AirbyteMessage | +| 0.15.0 | 2024-01-23 | [\#34441](https://github.com/airbytehq/airbyte/pull/34441) | Removed connector registry and micronaut dependencies. 
| +| 0.14.2 | 2024-01-24 | [\#34458](https://github.com/airbytehq/airbyte/pull/34458) | Handle case-sensitivity in sentry error grouping | +| 0.14.1 | 2024-01-24 | [\#34468](https://github.com/airbytehq/airbyte/pull/34468) | Add wait for process to be done before ending sync in destination BaseTDTest | +| 0.14.0 | 2024-01-23 | [\#34461](https://github.com/airbytehq/airbyte/pull/34461) | Revert non backward compatible signature changes from 0.13.1 | +| 0.13.3 | 2024-01-23 | [\#34077](https://github.com/airbytehq/airbyte/pull/34077) | Denote if destinations fully support Destinations V2 | +| 0.13.2 | 2024-01-18 | [\#34364](https://github.com/airbytehq/airbyte/pull/34364) | Better logging in mongo db source connector | +| 0.13.1 | 2024-01-18 | [\#34236](https://github.com/airbytehq/airbyte/pull/34236) | Add postCreateTable hook in destination JdbcSqlGenerator | +| 0.13.0 | 2024-01-16 | [\#34177](https://github.com/airbytehq/airbyte/pull/34177) | Add `useExpensiveSafeCasting` param in JdbcSqlGenerator methods; add JdbcTypingDedupingTest fixture; other DV2-related changes | +| 0.12.1 | 2024-01-11 | [\#34186](https://github.com/airbytehq/airbyte/pull/34186) | Add hook for additional destination specific checks to JDBC destination check method | +| 0.12.0 | 2024-01-10 | [\#33875](https://github.com/airbytehq/airbyte/pull/33875) | Upgrade sshd-mina to 2.11.1 | +| 0.11.5 | 2024-01-10 | [\#34119](https://github.com/airbytehq/airbyte/pull/34119) | Remove wal2json support for postgres+debezium. | +| 0.11.4 | 2024-01-09 | [\#33305](https://github.com/airbytehq/airbyte/pull/33305) | Source stats in incremental syncs | +| 0.11.3 | 2023-01-09 | [\#33658](https://github.com/airbytehq/airbyte/pull/33658) | Always fail when debezium fails, even if it happened during the setup phase. 
| +| 0.11.2 | 2024-01-09 | [\#33969](https://github.com/airbytehq/airbyte/pull/33969) | Destination state stats implementation | +| 0.11.1 | 2024-01-04 | [\#33727](https://github.com/airbytehq/airbyte/pull/33727) | SSH bastion heartbeats for Destinations | +| 0.11.0 | 2024-01-04 | [\#33730](https://github.com/airbytehq/airbyte/pull/33730) | DV2 T+D uses Sql struct to represent transactions; other T+D-related changes | +| 0.10.4 | 2023-12-20 | [\#33071](https://github.com/airbytehq/airbyte/pull/33071) | Add the ability to parse JDBC parameters with another delimiter than '&' | +| 0.10.3 | 2024-01-03 | [\#33312](https://github.com/airbytehq/airbyte/pull/33312) | Send out count in AirbyteStateMessage | +| 0.10.1 | 2023-12-21 | [\#33723](https://github.com/airbytehq/airbyte/pull/33723) | Make memory-manager log message less scary | +| 0.10.0 | 2023-12-20 | [\#33704](https://github.com/airbytehq/airbyte/pull/33704) | JdbcDestinationHandler now properly implements `getInitialRawTableState`; reenable SqlGenerator test | +| 0.9.0 | 2023-12-18 | [\#33124](https://github.com/airbytehq/airbyte/pull/33124) | Make Schema Creation Separate from Table Creation, exclude the T&D module from the CDK | +| 0.8.0 | 2023-12-18 | [\#33506](https://github.com/airbytehq/airbyte/pull/33506) | Improve async destination shutdown logic; more JDBC async migration work; improve DAT test schema handling | +| 0.7.9 | 2023-12-18 | [\#33549](https://github.com/airbytehq/airbyte/pull/33549) | Improve MongoDB logging. | +| 0.7.8 | 2023-12-18 | [\#33365](https://github.com/airbytehq/airbyte/pull/33365) | Emit stream statuses more consistently | +| 0.7.7 | 2023-12-18 | [\#33434](https://github.com/airbytehq/airbyte/pull/33307) | Remove LEGACY state | +| 0.7.6 | 2023-12-14 | [\#32328](https://github.com/airbytehq/airbyte/pull/33307) | Add schema less mode for mongodb CDC. Fixes for non standard mongodb id type. 
| +| 0.7.4 | 2023-12-13 | [\#33232](https://github.com/airbytehq/airbyte/pull/33232) | Track stream record count during sync; only run T+D if a stream had nonzero records or the previous sync left unprocessed records. | +| 0.7.3 | 2023-12-13 | [\#33369](https://github.com/airbytehq/airbyte/pull/33369) | Extract shared JDBC T+D code. | +| 0.7.2 | 2023-12-11 | [\#33307](https://github.com/airbytehq/airbyte/pull/33307) | Fix DV2 JDBC type mappings (code changes in [\#33307](https://github.com/airbytehq/airbyte/pull/33307)). | +| 0.7.1 | 2023-12-01 | [\#33027](https://github.com/airbytehq/airbyte/pull/33027) | Add the abstract DB source debugger. | +| 0.7.0 | 2023-12-07 | [\#32326](https://github.com/airbytehq/airbyte/pull/32326) | Destinations V2 changes for JDBC destinations | +| 0.6.4 | 2023-12-06 | [\#33082](https://github.com/airbytehq/airbyte/pull/33082) | Improvements to schema snapshot error handling + schema snapshot history scope (scoped to configured DB). | +| 0.6.2 | 2023-11-30 | [\#32573](https://github.com/airbytehq/airbyte/pull/32573) | Update MSSQLConverter to enforce 6-digit microsecond precision for timestamp fields | +| 0.6.1 | 2023-11-30 | [\#32610](https://github.com/airbytehq/airbyte/pull/32610) | Support DB initial sync using binary as primary key. | +| 0.6.0 | 2023-11-30 | [\#32888](https://github.com/airbytehq/airbyte/pull/32888) | JDBC destinations now use the async framework | +| 0.5.3 | 2023-11-28 | [\#32686](https://github.com/airbytehq/airbyte/pull/32686) | Better attribution of debezium engine shutdown due to heartbeat. | +| 0.5.1 | 2023-11-27 | [\#32662](https://github.com/airbytehq/airbyte/pull/32662) | Debezium initialization wait time will now read from initial setup time. | +| 0.5.0 | 2023-11-22 | [\#32656](https://github.com/airbytehq/airbyte/pull/32656) | Introduce TestDatabase test fixture, refactor database source test base classes. 
| +| 0.4.11 | 2023-11-14 | [\#32526](https://github.com/airbytehq/airbyte/pull/32526) | Clean up memory manager logs. | +| 0.4.10 | 2023-11-13 | [\#32285](https://github.com/airbytehq/airbyte/pull/32285) | Fix UUID codec ordering for MongoDB connector | +| 0.4.9 | 2023-11-13 | [\#32468](https://github.com/airbytehq/airbyte/pull/32468) | Further error grouping improvements for DV2 connectors | +| 0.4.8 | 2023-11-09 | [\#32377](https://github.com/airbytehq/airbyte/pull/32377) | source-postgres tests: skip dropping database | +| 0.4.7 | 2023-11-08 | [\#31856](https://github.com/airbytehq/airbyte/pull/31856) | source-postgres: support for infinity date and timestamps | +| 0.4.5 | 2023-11-07 | [\#32112](https://github.com/airbytehq/airbyte/pull/32112) | Async destinations framework: Allow configuring the queue flush threshold | +| 0.4.4 | 2023-11-06 | [\#32119](https://github.com/airbytehq/airbyte/pull/32119) | Add STANDARD UUID codec to MongoDB debezium handler | +| 0.4.2 | 2023-11-06 | [\#32190](https://github.com/airbytehq/airbyte/pull/32190) | Improve error deinterpolation | +| 0.4.1 | 2023-11-02 | [\#32192](https://github.com/airbytehq/airbyte/pull/32192) | Add 's3-destinations' CDK module. | +| 0.4.0 | 2023-11-02 | [\#32050](https://github.com/airbytehq/airbyte/pull/32050) | Fix compiler warnings. | +| 0.3.0 | 2023-11-02 | [\#31983](https://github.com/airbytehq/airbyte/pull/31983) | Add deinterpolation feature to AirbyteExceptionHandler. | +| 0.2.4 | 2023-10-31 | [\#31807](https://github.com/airbytehq/airbyte/pull/31807) | Handle case of debezium update and delete of records in mongodb. | +| 0.2.3 | 2023-10-31 | [\#32022](https://github.com/airbytehq/airbyte/pull/32022) | Update Debezium version from 2.20 -> 2.4.0. | +| 0.2.2 | 2023-10-31 | [\#31976](https://github.com/airbytehq/airbyte/pull/31976) | Debezium tweaks to make tests run faster. 
| +| 0.2.0 | 2023-10-30 | [\#31960](https://github.com/airbytehq/airbyte/pull/31960) | Hoist top-level gradle subprojects into CDK. | +| 0.1.12 | 2023-10-24 | [\#31674](https://github.com/airbytehq/airbyte/pull/31674) | Fail sync when Debezium does not shut down properly. | +| 0.1.11 | 2023-10-18 | [\#31486](https://github.com/airbytehq/airbyte/pull/31486) | Update constants in AdaptiveSourceRunner. | +| 0.1.9 | 2023-10-12 | [\#31309](https://github.com/airbytehq/airbyte/pull/31309) | Use toPlainString() when handling BigDecimals in PostgresConverter | +| 0.1.8 | 2023-10-11 | [\#31322](https://github.com/airbytehq/airbyte/pull/31322) | Cap log line length to 32KB to prevent loss of records | +| 0.1.7 | 2023-10-10 | [\#31194](https://github.com/airbytehq/airbyte/pull/31194) | Deallocate unused per stream buffer memory when empty | +| 0.1.6 | 2023-10-10 | [\#31083](https://github.com/airbytehq/airbyte/pull/31083) | Fix precision of numeric values in async destinations | +| 0.1.5 | 2023-10-09 | [\#31196](https://github.com/airbytehq/airbyte/pull/31196) | Update typo in CDK (CDN_LSN -> CDC_LSN) | +| 0.1.4 | 2023-10-06 | [\#31139](https://github.com/airbytehq/airbyte/pull/31139) | Reduce async buffer | +| 0.1.1 | 2023-09-28 | [\#30835](https://github.com/airbytehq/airbyte/pull/30835) | JDBC destinations now avoid staging area name collisions by using the raw table name as the stage name. (previously we used the stream name as the stage name) | +| 0.1.0 | 2023-09-27 | [\#30445](https://github.com/airbytehq/airbyte/pull/30445) | First launch, including shared classes for all connectors. | +| 0.0.2 | 2023-08-21 | [\#28687](https://github.com/airbytehq/airbyte/pull/28687) | Version bump only (no other changes). | +| 0.0.1 | 2023-08-08 | [\#28687](https://github.com/airbytehq/airbyte/pull/28687) | Initial release for testing. 
| diff --git a/airbyte-cdk/java/airbyte-cdk/core/src/main/kotlin/io/airbyte/cdk/integrations/destination/async/buffers/BufferManager.kt b/airbyte-cdk/java/airbyte-cdk/core/src/main/kotlin/io/airbyte/cdk/integrations/destination/async/buffers/BufferManager.kt index 522947b982eb8..888466d74af39 100644 --- a/airbyte-cdk/java/airbyte-cdk/core/src/main/kotlin/io/airbyte/cdk/integrations/destination/async/buffers/BufferManager.kt +++ b/airbyte-cdk/java/airbyte-cdk/core/src/main/kotlin/io/airbyte/cdk/integrations/destination/async/buffers/BufferManager.kt @@ -26,7 +26,7 @@ constructor( * This probably doesn't belong here, but it's the easiest place where both [BufferEnqueue] and * [io.airbyte.cdk.integrations.destination.async.AsyncStreamConsumer] can both get to it. */ - public val defaultNamespace: String?, + val defaultNamespace: String?, maxMemory: Long = (Runtime.getRuntime().maxMemory() * MEMORY_LIMIT_RATIO).toLong(), ) { @get:VisibleForTesting val buffers: ConcurrentMap diff --git a/airbyte-cdk/java/airbyte-cdk/core/src/main/resources/version.properties b/airbyte-cdk/java/airbyte-cdk/core/src/main/resources/version.properties index cd902125b15e4..cab0c1c2c8b3d 100644 --- a/airbyte-cdk/java/airbyte-cdk/core/src/main/resources/version.properties +++ b/airbyte-cdk/java/airbyte-cdk/core/src/main/resources/version.properties @@ -1 +1 @@ -version=0.44.9 +version=0.44.11 diff --git a/airbyte-cdk/java/airbyte-cdk/db-destinations/src/testFixtures/resources/v0/problematic_types_coerced_schemaless_messages_out.txt b/airbyte-cdk/java/airbyte-cdk/db-destinations/src/testFixtures/resources/v0/problematic_types_coerced_schemaless_messages_out.txt index 801b4960a4255..f49ff80bcf6f5 100644 --- a/airbyte-cdk/java/airbyte-cdk/db-destinations/src/testFixtures/resources/v0/problematic_types_coerced_schemaless_messages_out.txt +++ b/airbyte-cdk/java/airbyte-cdk/db-destinations/src/testFixtures/resources/v0/problematic_types_coerced_schemaless_messages_out.txt @@ -1,3 +1,3 @@ 
-{"schemaless_object":"{\"uuid\":\"38F52396-736D-4B23-B5B4-F504D8894B97\",\"probability\":1.5}","schematized_object":{"id":1,"name":"Joe"},"combined_type":"string1","union_type":10,"schemaless_array":"[10,\"foo\",null,{\"bar\":\"qua\"}]","mixed_array_integer_and_schemaless_object":[15,null,"{\"hello\":\"world\"}"],"array_of_union_integer_and_schemaless_array":[25,null,"[\"goodbye\",\"cruel world\"]"],"union_of_objects_with_properties_identical":{"id":10,"name":"Joe"},"union_of_objects_with_properties_overlapping":{"id":20,"name":"Jane","flagged":true},"union_of_objects_with_properties_nonoverlapping":{"id":30,"name":"Phil","flagged":false,"description":"Very Phil"}, "union_of_objects_with_properties_contradicting": { "id": 1, "name": "Jenny" }, "empty_object": "{}", "object_with_null_properties": "{}"} -{"schemaless_object":"{\"address\":{\"street\":\"113 Hickey Rd\",\"zip\":\"37932\"},\"flags\":[true,false,false]}","schematized_object":{"id":2,"name":"Jane"},"combined_type":20,"union_type":"string2","schemaless_array":"[]","mixed_array_integer_and_schemaless_object":[],"array_of_union_integer_and_schemaless_array":[],"union_of_objects_with_properties_identical":{"id":null,"name":null},"union_of_objects_with_properties_overlapping":{"id":null,"name":null,"flagged":null},"union_of_objects_with_properties_nonoverlapping":{"id":null,"name":null,"flagged":null,"description":null}, "union_of_objects_with_properties_contradicting": { "id": "seal-one-hippity", "name": "James" }, "empty_object": "{\"extra\":\"stuff\"}", "object_with_null_properties": "{\"more\":{\"extra\":\"stuff\"}}"} -{ "schemaless_object": null, "schematized_object": null, "combined_type": null, "union_type": null, "schemaless_array": null, "mixed_array_integer_and_schemaless_object": null, "array_of_union_integer_and_schemaless_array": null, "union_of_objects_with_properties_identical": null, "union_of_objects_with_properties_overlapping": null, "union_of_objects_with_properties_nonoverlapping": null, 
"union_of_objects_with_properties_contradicting":null, "empty_object": null, "object_with_null_properties": null } \ No newline at end of file +{"schemaless_object":"{\"uuid\":\"38F52396-736D-4B23-B5B4-F504D8894B97\",\"probability\":1.5}","schematized_object":{"id":1,"name":"Joe"},"combined_type":"string1","union_type":10,"schemaless_array":"[10,\"foo\",null,{\"bar\":\"qua\"}]","mixed_array_integer_and_schemaless_object":[15,null,"{\"hello\":\"world\"}"],"array_of_union_integer_and_schemaless_array":[25,null,"[\"goodbye\",\"cruel world\"]"],"union_of_objects_with_properties_identical":{"id":10,"name":"Joe"},"union_of_objects_with_properties_overlapping":{"id":20,"name":"Jane","flagged":true},"union_of_objects_with_properties_nonoverlapping":{"id":30,"name":"Phil","flagged":false,"description":"Very Phil"}, "union_of_objects_with_properties_contradicting": { "id": 1, "name": "Jenny" }, "empty_object": "{}", "object_with_null_properties": "{}", "combined_with_null": "foobar", "union_with_null": "barfoo", "combined_nulls": null} +{"schemaless_object":"{\"address\":{\"street\":\"113 Hickey Rd\",\"zip\":\"37932\"},\"flags\":[true,false,false]}","schematized_object":{"id":2,"name":"Jane"},"combined_type":20,"union_type":"string2","schemaless_array":"[]","mixed_array_integer_and_schemaless_object":[],"array_of_union_integer_and_schemaless_array":[],"union_of_objects_with_properties_identical":{"id":null,"name":null},"union_of_objects_with_properties_overlapping":{"id":null,"name":null,"flagged":null},"union_of_objects_with_properties_nonoverlapping":{"id":null,"name":null,"flagged":null,"description":null}, "union_of_objects_with_properties_contradicting": { "id": "seal-one-hippity", "name": "James" }, "empty_object": "{\"extra\":\"stuff\"}", "object_with_null_properties": "{\"more\":{\"extra\":\"stuff\"}}", "combined_with_null": "foobar2", "union_with_null": "barfoo2", "combined_nulls": null} +{ "schemaless_object": null, "schematized_object": null, "combined_type": 
null, "union_type": null, "schemaless_array": null, "mixed_array_integer_and_schemaless_object": null, "array_of_union_integer_and_schemaless_array": null, "union_of_objects_with_properties_identical": null, "union_of_objects_with_properties_overlapping": null, "union_of_objects_with_properties_nonoverlapping": null, "union_of_objects_with_properties_contradicting":null, "empty_object": null, "object_with_null_properties": null, "combined_with_null": null, "union_with_null": null, "combined_nulls": null } \ No newline at end of file diff --git a/airbyte-cdk/java/airbyte-cdk/db-destinations/src/testFixtures/resources/v0/problematic_types_coerced_schemaless_schema.json b/airbyte-cdk/java/airbyte-cdk/db-destinations/src/testFixtures/resources/v0/problematic_types_coerced_schemaless_schema.json index 6ec2df708f1a5..3b7e2c79d07a5 100644 --- a/airbyte-cdk/java/airbyte-cdk/db-destinations/src/testFixtures/resources/v0/problematic_types_coerced_schemaless_schema.json +++ b/airbyte-cdk/java/airbyte-cdk/db-destinations/src/testFixtures/resources/v0/problematic_types_coerced_schemaless_schema.json @@ -138,6 +138,22 @@ }, "object_with_null_properties": { "type": "string" + }, + "combined_with_null": { + "type": ["string", "null"] + }, + "union_with_null": { + "oneOf": [ + { + "type": "string" + }, + { + "type": "null" + } + ] + }, + "combined_nulls": { + "type": "null" } } } diff --git a/airbyte-cdk/java/airbyte-cdk/db-destinations/src/testFixtures/resources/v0/problematic_types_configured_catalog.json b/airbyte-cdk/java/airbyte-cdk/db-destinations/src/testFixtures/resources/v0/problematic_types_configured_catalog.json index 8a026792da0ef..1683d953b90c0 100644 --- a/airbyte-cdk/java/airbyte-cdk/db-destinations/src/testFixtures/resources/v0/problematic_types_configured_catalog.json +++ b/airbyte-cdk/java/airbyte-cdk/db-destinations/src/testFixtures/resources/v0/problematic_types_configured_catalog.json @@ -173,6 +173,22 @@ "object_with_null_properties": { "type": "object", 
"properties": null + }, + "combined_with_null": { + "type": ["string", "null"] + }, + "union_with_null": { + "oneOf": [ + { + "type": "string" + }, + { + "type": "null" + } + ] + }, + "combined_nulls": { + "type": ["null", "null"] } } } diff --git a/airbyte-cdk/java/airbyte-cdk/db-destinations/src/testFixtures/resources/v0/problematic_types_disjoint_union_messages_out.txt b/airbyte-cdk/java/airbyte-cdk/db-destinations/src/testFixtures/resources/v0/problematic_types_disjoint_union_messages_out.txt index 2b9bc04a68077..426c85ad5b845 100644 --- a/airbyte-cdk/java/airbyte-cdk/db-destinations/src/testFixtures/resources/v0/problematic_types_disjoint_union_messages_out.txt +++ b/airbyte-cdk/java/airbyte-cdk/db-destinations/src/testFixtures/resources/v0/problematic_types_disjoint_union_messages_out.txt @@ -1,3 +1,3 @@ -{"schemaless_object":"{\"uuid\":\"38F52396-736D-4B23-B5B4-F504D8894B97\",\"probability\":1.5}","schematized_object":{"id":1,"name":"Joe"},"combined_type":{"type":"string","string":"string1","integer":null},"union_type":{"type":"integer","string":null,"integer":10},"schemaless_array":"[10,\"foo\",null,{\"bar\":\"qua\"}]","mixed_array_integer_and_schemaless_object":[15,null,"{\"hello\":\"world\"}"],"array_of_union_integer_and_schemaless_array":[{"type":"integer","integer":25,"string":null},null,{"type":"string","integer":null,"string":"[\"goodbye\",\"cruel world\"]"}],"union_of_objects_with_properties_identical":{"id":10,"name":"Joe"},"union_of_objects_with_properties_overlapping":{"id":20,"name":"Jane","flagged":true},"union_of_objects_with_properties_nonoverlapping":{"id":30,"name":"Phil","flagged":false,"description":"Very Phil"}, "union_of_objects_with_properties_contradicting": { "id": {"type":"integer","integer":1,"string":null}, "name": "Jenny" }, "empty_object": "{}", "object_with_null_properties": "{}"}} -{"schemaless_object":"{\"address\":{\"street\":\"113 Hickey 
Rd\",\"zip\":\"37932\"},\"flags\":[true,false,false]}","schematized_object":{"id":2,"name":"Jane"},"combined_type":{"type":"integer","string":null,"integer":20},"union_type":{"type":"string","string":"string2","integer":null},"schemaless_array":"[]","mixed_array_integer_and_schemaless_object":[],"array_of_union_integer_and_schemaless_array":[],"union_of_objects_with_properties_identical":{"id":null,"name":null},"union_of_objects_with_properties_overlapping":{"id":null,"name":null,"flagged":null},"union_of_objects_with_properties_nonoverlapping":{"id":null,"name":null,"flagged":null,"description":null}, "union_of_objects_with_properties_contradicting": { "id": {"type":"string","integer":null,"string":"seal-one-hippity"}, "name": "James" }, "empty_object": "{\"extra\":\"stuff\"}", "object_with_null_properties": "{\"more\":{\"extra\":\"stuff\"}}"} -{ "schemaless_object": null, "schematized_object": null, "combined_type": null, "union_type": null, "schemaless_array": null, "mixed_array_integer_and_schemaless_object": null, "array_of_union_integer_and_schemaless_array": null, "union_of_objects_with_properties_identical": null, "union_of_objects_with_properties_overlapping": null, "union_of_objects_with_properties_nonoverlapping": null, "union_of_objects_with_properties_contradicting": null, "empty_object": null, "object_with_null_properties": null } \ No newline at end of file +{"schemaless_object":"{\"uuid\":\"38F52396-736D-4B23-B5B4-F504D8894B97\",\"probability\":1.5}","schematized_object":{"id":1,"name":"Joe"},"combined_type":{"type":"string","string":"string1","integer":null},"union_type":{"type":"integer","string":null,"integer":10},"schemaless_array":"[10,\"foo\",null,{\"bar\":\"qua\"}]","mixed_array_integer_and_schemaless_object":[15,null,"{\"hello\":\"world\"}"],"array_of_union_integer_and_schemaless_array":[{"type":"integer","integer":25,"string":null},null,{"type":"string","integer":null,"string":"[\"goodbye\",\"cruel 
world\"]"}],"union_of_objects_with_properties_identical":{"id":10,"name":"Joe"},"union_of_objects_with_properties_overlapping":{"id":20,"name":"Jane","flagged":true},"union_of_objects_with_properties_nonoverlapping":{"id":30,"name":"Phil","flagged":false,"description":"Very Phil"}, "union_of_objects_with_properties_contradicting": { "id": {"type":"integer","integer":1,"string":null}, "name": "Jenny" }, "empty_object": "{}","object_with_null_properties": "{}", "combined_with_null": "foobar", "union_with_null":"barfoo", "combined_nulls": null }} +{"schemaless_object":"{\"address\":{\"street\":\"113 Hickey Rd\",\"zip\":\"37932\"},\"flags\":[true,false,false]}","schematized_object":{"id":2,"name":"Jane"},"combined_type":{"type":"integer","string":null,"integer":20},"union_type":{"type":"string","string":"string2","integer":null},"schemaless_array":"[]","mixed_array_integer_and_schemaless_object":[],"array_of_union_integer_and_schemaless_array":[],"union_of_objects_with_properties_identical":{"id":null,"name":null},"union_of_objects_with_properties_overlapping":{"id":null,"name":null,"flagged":null},"union_of_objects_with_properties_nonoverlapping":{"id":null,"name":null,"flagged":null,"description":null}, "union_of_objects_with_properties_contradicting": { "id": {"type":"string","integer":null,"string":"seal-one-hippity"}, "name": "James" }, "empty_object": "{\"extra\":\"stuff\"}", "object_with_null_properties": "{\"more\":{\"extra\":\"stuff\"}}", "combined_with_null": "foobar2", "union_with_null": "barfoo2", "combined_nulls": null} +{ "schemaless_object": null, "schematized_object": null, "combined_type": null, "union_type": null, "schemaless_array": null, "mixed_array_integer_and_schemaless_object": null, "array_of_union_integer_and_schemaless_array": null, "union_of_objects_with_properties_identical": null, "union_of_objects_with_properties_overlapping": null, "union_of_objects_with_properties_nonoverlapping": null, "union_of_objects_with_properties_contradicting": 
null, "empty_object": null, "object_with_null_properties": null, "combined_with_null": null, "union_with_null": null, "combined_nulls": null } \ No newline at end of file diff --git a/airbyte-cdk/java/airbyte-cdk/db-destinations/src/testFixtures/resources/v0/problematic_types_disjoint_union_schema.json b/airbyte-cdk/java/airbyte-cdk/db-destinations/src/testFixtures/resources/v0/problematic_types_disjoint_union_schema.json index c3598c1471650..a17f29bdc3d38 100644 --- a/airbyte-cdk/java/airbyte-cdk/db-destinations/src/testFixtures/resources/v0/problematic_types_disjoint_union_schema.json +++ b/airbyte-cdk/java/airbyte-cdk/db-destinations/src/testFixtures/resources/v0/problematic_types_disjoint_union_schema.json @@ -147,6 +147,22 @@ }, "object_with_null_properties": { "type": "string" + }, + "combined_with_null": { + "type": ["string", "null"] + }, + "union_with_null": { + "oneOf": [ + { + "type": "string" + }, + { + "type": "null" + } + ] + }, + "combined_nulls": { + "type": "null" } } } diff --git a/airbyte-cdk/java/airbyte-cdk/db-destinations/src/testFixtures/resources/v0/problematic_types_messages_in.txt b/airbyte-cdk/java/airbyte-cdk/db-destinations/src/testFixtures/resources/v0/problematic_types_messages_in.txt index 8047dc18d3a47..145cf7dfe7937 100644 --- a/airbyte-cdk/java/airbyte-cdk/db-destinations/src/testFixtures/resources/v0/problematic_types_messages_in.txt +++ b/airbyte-cdk/java/airbyte-cdk/db-destinations/src/testFixtures/resources/v0/problematic_types_messages_in.txt @@ -1,5 +1,5 @@ -{"type": "RECORD", "record": {"stream": "problematic_types", "emitted_at": 1602637589100, "data": { "schemaless_object": { "uuid": "38F52396-736D-4B23-B5B4-F504D8894B97", "probability": 1.5 }, "schematized_object": { "id": 1, "name": "Joe" }, "combined_type": "string1", "union_type": 10, "schemaless_array": [ 10, "foo", null, { "bar": "qua" } ], "mixed_array_integer_and_schemaless_object": [ 15, null, { "hello": "world" } ], "array_of_union_integer_and_schemaless_array": 
[ 25, null, ["goodbye", "cruel world"] ], "union_of_objects_with_properties_identical": { "id": 10, "name": "Joe" }, "union_of_objects_with_properties_overlapping": { "id": 20, "name": "Jane", "flagged": true }, "union_of_objects_with_properties_contradicting": { "id": 1, "name": "Jenny" }, "union_of_objects_with_properties_nonoverlapping": { "id": 30, "name": "Phil", "flagged": false, "description":"Very Phil" }, "empty_object": {},"object_with_null_properties": {} } } } -{"type": "RECORD", "record": {"stream": "problematic_types", "emitted_at": 1602637589200, "data": { "schemaless_object": { "address": { "street": "113 Hickey Rd", "zip": "37932" }, "flags": [ true, false, false ] }, "schematized_object": { "id": 2, "name": "Jane" }, "combined_type": 20, "union_type": "string2", "schemaless_array": [], "mixed_array_integer_and_schemaless_object": [ ], "array_of_union_integer_and_schemaless_array": [ ], "union_of_objects_with_properties_identical": { }, "union_of_objects_with_properties_overlapping": {}, "union_of_objects_with_properties_nonoverlapping": {}, "union_of_objects_with_properties_contradicting": { "id": "seal-one-hippity", "name": "James" }, "empty_object": {"extra": "stuff"}, "object_with_null_properties": { "more": { "extra": "stuff" } } } } } -{"type": "RECORD", "record": {"stream": "problematic_types", "emitted_at": 1602637589300, "data": { "schemaless_object": null, "schematized_object": null, "combined_type": null, "union_type": null, "schemaless_array": null, "mixed_array_integer_and_schemaless_object": null, "array_of_union_integer_and_schemaless_array": null, "union_of_objects_with_properties_identical": null, "union_of_objects_with_properties_overlapping": null, "union_of_objects_with_properties_nonoverlapping": null, "empty_object": null, "object_with_null_properties": null } } } +{"type": "RECORD", "record": {"stream": "problematic_types", "emitted_at": 1602637589100, "data": { "schemaless_object": { "uuid": 
"38F52396-736D-4B23-B5B4-F504D8894B97", "probability": 1.5 }, "schematized_object": { "id": 1, "name": "Joe" }, "combined_type": "string1", "union_type": 10, "schemaless_array": [ 10, "foo", null, { "bar": "qua" } ], "mixed_array_integer_and_schemaless_object": [ 15, null, { "hello": "world" } ], "array_of_union_integer_and_schemaless_array": [ 25, null, ["goodbye", "cruel world"] ], "union_of_objects_with_properties_identical": { "id": 10, "name": "Joe" }, "union_of_objects_with_properties_overlapping": { "id": 20, "name": "Jane", "flagged": true }, "union_of_objects_with_properties_contradicting": { "id": 1, "name": "Jenny" }, "union_of_objects_with_properties_nonoverlapping": { "id": 30, "name": "Phil", "flagged": false, "description":"Very Phil" }, "empty_object": {},"object_with_null_properties": {}, "combined_with_null": "foobar", "union_with_null": "barfoo", "combined_nulls": null } } } +{"type": "RECORD", "record": {"stream": "problematic_types", "emitted_at": 1602637589200, "data": { "schemaless_object": { "address": { "street": "113 Hickey Rd", "zip": "37932" }, "flags": [ true, false, false ] }, "schematized_object": { "id": 2, "name": "Jane" }, "combined_type": 20, "union_type": "string2", "schemaless_array": [], "mixed_array_integer_and_schemaless_object": [ ], "array_of_union_integer_and_schemaless_array": [ ], "union_of_objects_with_properties_identical": { }, "union_of_objects_with_properties_overlapping": {}, "union_of_objects_with_properties_nonoverlapping": {}, "union_of_objects_with_properties_contradicting": { "id": "seal-one-hippity", "name": "James" }, "empty_object": {"extra": "stuff"}, "object_with_null_properties": { "more": { "extra": "stuff" } }, "combined_with_null": "foobar2", "union_with_null": "barfoo2", "combined_nulls": null } } } +{"type": "RECORD", "record": {"stream": "problematic_types", "emitted_at": 1602637589300, "data": { "schemaless_object": null, "schematized_object": null, "combined_type": null, "union_type": null, 
"schemaless_array": null, "mixed_array_integer_and_schemaless_object": null, "array_of_union_integer_and_schemaless_array": null, "union_of_objects_with_properties_identical": null, "union_of_objects_with_properties_overlapping": null, "union_of_objects_with_properties_nonoverlapping": null, "empty_object": null, "object_with_null_properties": null, "combined_with_null": null, "union_with_null": null, "combined_nulls": null } } } {"type": "STATE", "state": { "data": {"start_date": "2022-02-14"}}} {"type": "TRACE", "trace": { "type": "STREAM_STATUS", "stream_status": {"stream_descriptor": {"name": "problematic_types"}, "status": "COMPLETE"}, "emitted_at": 1721428636000}} \ No newline at end of file diff --git a/airbyte-cdk/java/airbyte-cdk/s3-destinations/src/main/kotlin/io/airbyte/cdk/integrations/destination/s3/S3ConsumerFactory.kt b/airbyte-cdk/java/airbyte-cdk/s3-destinations/src/main/kotlin/io/airbyte/cdk/integrations/destination/s3/S3ConsumerFactory.kt index 435efd8d0eec0..46fc3ce7d0f56 100644 --- a/airbyte-cdk/java/airbyte-cdk/s3-destinations/src/main/kotlin/io/airbyte/cdk/integrations/destination/s3/S3ConsumerFactory.kt +++ b/airbyte-cdk/java/airbyte-cdk/s3-destinations/src/main/kotlin/io/airbyte/cdk/integrations/destination/s3/S3ConsumerFactory.kt @@ -184,6 +184,16 @@ class S3ConsumerFactory { }, useV2FieldNames = true ) + + // Parquet has significantly higher overhead. This small adjustment + // results in a ~5x performance improvement. + val adjustedMemoryRatio = + if (s3Config.formatConfig!!.format == FileUploadFormat.PARQUET) { + memoryRatio * 0.6 // ie 0.5 => 0.3 + } else { + memoryRatio + } + return AsyncStreamConsumer( outputRecordCollector, onStartFunction(storageOps, writeConfigs), @@ -209,7 +219,7 @@ class S3ConsumerFactory { // is simply omitted from the path. 
BufferManager( defaultNamespace = null, - maxMemory = (Runtime.getRuntime().maxMemory() * memoryRatio).toLong() + maxMemory = (Runtime.getRuntime().maxMemory() * adjustedMemoryRatio).toLong() ), workerPool = Executors.newFixedThreadPool(nThreads) ) diff --git a/airbyte-cdk/java/airbyte-cdk/s3-destinations/src/main/kotlin/io/airbyte/cdk/integrations/destination/s3/avro/JsonToAvroSchemaConverter.kt b/airbyte-cdk/java/airbyte-cdk/s3-destinations/src/main/kotlin/io/airbyte/cdk/integrations/destination/s3/avro/JsonToAvroSchemaConverter.kt index 6d7e92b99c043..b9e68bdea0877 100644 --- a/airbyte-cdk/java/airbyte-cdk/s3-destinations/src/main/kotlin/io/airbyte/cdk/integrations/destination/s3/avro/JsonToAvroSchemaConverter.kt +++ b/airbyte-cdk/java/airbyte-cdk/s3-destinations/src/main/kotlin/io/airbyte/cdk/integrations/destination/s3/avro/JsonToAvroSchemaConverter.kt @@ -184,7 +184,7 @@ class JsonToAvroSchemaConverter { addStringToLogicalTypes, ) val nextStep = fieldBuilder.type(parsed) - if (parsed.isUnion) { + if (parsed.isUnion || parsed == NULL_SCHEMA) { nextStep.withDefault(null) } else { nextStep.noDefault() diff --git a/airbyte-cdk/java/airbyte-cdk/s3-destinations/src/main/kotlin/io/airbyte/cdk/integrations/destination/s3/jsonschema/AirbyteJsonSchemaType.kt b/airbyte-cdk/java/airbyte-cdk/s3-destinations/src/main/kotlin/io/airbyte/cdk/integrations/destination/s3/jsonschema/AirbyteJsonSchemaType.kt index a48346835b1eb..3721de16a0e1b 100644 --- a/airbyte-cdk/java/airbyte-cdk/s3-destinations/src/main/kotlin/io/airbyte/cdk/integrations/destination/s3/jsonschema/AirbyteJsonSchemaType.kt +++ b/airbyte-cdk/java/airbyte-cdk/s3-destinations/src/main/kotlin/io/airbyte/cdk/integrations/destination/s3/jsonschema/AirbyteJsonSchemaType.kt @@ -10,6 +10,7 @@ import com.fasterxml.jackson.databind.node.ObjectNode import io.airbyte.commons.jackson.MoreMappers enum class AirbyteJsonSchemaType { + NULL, BOOLEAN, INTEGER, NUMBER, @@ -30,6 +31,7 @@ enum class AirbyteJsonSchemaType { fun 
matchesValue(tree: JsonNode): Boolean { return when (this) { + NULL -> tree.isNull BOOLEAN -> tree.isBoolean INTEGER -> tree.isIntegralNumber || tree.isInt || tree.isBigInteger NUMBER -> @@ -97,6 +99,7 @@ enum class AirbyteJsonSchemaType { val airbyteType = schema["airbyte_type"]?.asText() return when (typeStr) { + "null" -> NULL "boolean" -> BOOLEAN "integer" -> INTEGER "number" -> { diff --git a/airbyte-cdk/java/airbyte-cdk/s3-destinations/src/main/kotlin/io/airbyte/cdk/integrations/destination/s3/jsonschema/JsonRecordIdentityMapper.kt b/airbyte-cdk/java/airbyte-cdk/s3-destinations/src/main/kotlin/io/airbyte/cdk/integrations/destination/s3/jsonschema/JsonRecordIdentityMapper.kt index bc7e061179d51..5df551d7d840f 100644 --- a/airbyte-cdk/java/airbyte-cdk/s3-destinations/src/main/kotlin/io/airbyte/cdk/integrations/destination/s3/jsonschema/JsonRecordIdentityMapper.kt +++ b/airbyte-cdk/java/airbyte-cdk/s3-destinations/src/main/kotlin/io/airbyte/cdk/integrations/destination/s3/jsonschema/JsonRecordIdentityMapper.kt @@ -11,6 +11,10 @@ import com.fasterxml.jackson.databind.node.ObjectNode import io.airbyte.commons.jackson.MoreMappers open class JsonRecordIdentityMapper : JsonRecordMapper() { + override fun mapNull(record: JsonNode?, schema: ObjectNode): JsonNode? { + return record?.deepCopy() + } + override fun mapBoolean(record: JsonNode?, schema: ObjectNode): JsonNode? 
{ return record?.deepCopy() } diff --git a/airbyte-cdk/java/airbyte-cdk/s3-destinations/src/main/kotlin/io/airbyte/cdk/integrations/destination/s3/jsonschema/JsonRecordMapper.kt b/airbyte-cdk/java/airbyte-cdk/s3-destinations/src/main/kotlin/io/airbyte/cdk/integrations/destination/s3/jsonschema/JsonRecordMapper.kt index d5f6b2e4594c1..52450eca16b4a 100644 --- a/airbyte-cdk/java/airbyte-cdk/s3-destinations/src/main/kotlin/io/airbyte/cdk/integrations/destination/s3/jsonschema/JsonRecordMapper.kt +++ b/airbyte-cdk/java/airbyte-cdk/s3-destinations/src/main/kotlin/io/airbyte/cdk/integrations/destination/s3/jsonschema/JsonRecordMapper.kt @@ -12,6 +12,7 @@ abstract class JsonRecordMapper { val schemaType = AirbyteJsonSchemaType.fromJsonSchema(schema) return when (schemaType) { + AirbyteJsonSchemaType.NULL -> mapNull(record, schema) AirbyteJsonSchemaType.BOOLEAN -> mapBoolean(record, schema) AirbyteJsonSchemaType.INTEGER -> mapInteger(record, schema) AirbyteJsonSchemaType.NUMBER -> mapNumber(record, schema) @@ -34,6 +35,7 @@ abstract class JsonRecordMapper { } } + abstract fun mapNull(record: JsonNode?, schema: ObjectNode): R abstract fun mapBoolean(record: JsonNode?, schema: ObjectNode): R abstract fun mapInteger(record: JsonNode?, schema: ObjectNode): R abstract fun mapNumber(record: JsonNode?, schema: ObjectNode): R diff --git a/airbyte-cdk/java/airbyte-cdk/s3-destinations/src/main/kotlin/io/airbyte/cdk/integrations/destination/s3/jsonschema/JsonSchemaIdentityMapper.kt b/airbyte-cdk/java/airbyte-cdk/s3-destinations/src/main/kotlin/io/airbyte/cdk/integrations/destination/s3/jsonschema/JsonSchemaIdentityMapper.kt index 518597ad8db46..1695a61f507a4 100644 --- a/airbyte-cdk/java/airbyte-cdk/s3-destinations/src/main/kotlin/io/airbyte/cdk/integrations/destination/s3/jsonschema/JsonSchemaIdentityMapper.kt +++ b/airbyte-cdk/java/airbyte-cdk/s3-destinations/src/main/kotlin/io/airbyte/cdk/integrations/destination/s3/jsonschema/JsonSchemaIdentityMapper.kt @@ -9,6 +9,10 @@ import 
io.airbyte.commons.jackson.MoreMappers open class JsonSchemaIdentityMapper : JsonSchemaMapper() { + override fun mapNull(schema: ObjectNode): ObjectNode { + return schema.deepCopy() + } + override fun mapObjectWithProperties(schema: ObjectNode): ObjectNode { val newSchema = MoreMappers.initMapper().createObjectNode() val newProperties = MoreMappers.initMapper().createObjectNode() diff --git a/airbyte-cdk/java/airbyte-cdk/s3-destinations/src/main/kotlin/io/airbyte/cdk/integrations/destination/s3/jsonschema/JsonSchemaMapper.kt b/airbyte-cdk/java/airbyte-cdk/s3-destinations/src/main/kotlin/io/airbyte/cdk/integrations/destination/s3/jsonschema/JsonSchemaMapper.kt index 773372dbd7b53..f0e7caf91e1b3 100644 --- a/airbyte-cdk/java/airbyte-cdk/s3-destinations/src/main/kotlin/io/airbyte/cdk/integrations/destination/s3/jsonschema/JsonSchemaMapper.kt +++ b/airbyte-cdk/java/airbyte-cdk/s3-destinations/src/main/kotlin/io/airbyte/cdk/integrations/destination/s3/jsonschema/JsonSchemaMapper.kt @@ -11,6 +11,7 @@ abstract class JsonSchemaMapper { val schemaType = AirbyteJsonSchemaType.fromJsonSchema(schema) return when (schemaType) { + AirbyteJsonSchemaType.NULL -> mapNull(schema) AirbyteJsonSchemaType.OBJECT_WITH_PROPERTIES -> mapObjectWithProperties(schema) AirbyteJsonSchemaType.OBJECT_WITHOUT_PROPERTIES -> mapObjectWithoutProperties(schema) AirbyteJsonSchemaType.ARRAY_WITH_ITEMS -> mapArrayWithItems(schema) @@ -31,6 +32,7 @@ abstract class JsonSchemaMapper { } } + abstract fun mapNull(schema: ObjectNode): ObjectNode abstract fun mapObjectWithProperties(schema: ObjectNode): ObjectNode abstract fun mapObjectWithoutProperties(schema: ObjectNode): ObjectNode abstract fun mapArrayWithItems(schema: ObjectNode): ObjectNode diff --git a/airbyte-cdk/java/airbyte-cdk/s3-destinations/src/main/kotlin/io/airbyte/cdk/integrations/destination/s3/jsonschema/JsonSchemaUnionMerger.kt 
b/airbyte-cdk/java/airbyte-cdk/s3-destinations/src/main/kotlin/io/airbyte/cdk/integrations/destination/s3/jsonschema/JsonSchemaUnionMerger.kt index 5eaf8272ea94b..b8f918b553ea5 100644 --- a/airbyte-cdk/java/airbyte-cdk/s3-destinations/src/main/kotlin/io/airbyte/cdk/integrations/destination/s3/jsonschema/JsonSchemaUnionMerger.kt +++ b/airbyte-cdk/java/airbyte-cdk/s3-destinations/src/main/kotlin/io/airbyte/cdk/integrations/destination/s3/jsonschema/JsonSchemaUnionMerger.kt @@ -72,6 +72,15 @@ class JsonSchemaUnionMerger : JsonSchemaIdentityMapper() { val options = schema["oneOf"] ?: schema["anyOf"] ?: schema["allOf"] for (oldOption in options) { val remappedOldOption = mapSchema(oldOption as ObjectNode) + + // Drop null types from the union. + if ( + AirbyteJsonSchemaType.fromJsonSchema(remappedOldOption) == + AirbyteJsonSchemaType.NULL + ) { + continue + } + if (seenSet.contains(remappedOldOption)) { continue } @@ -94,10 +103,21 @@ class JsonSchemaUnionMerger : JsonSchemaIdentityMapper() { // Special case: only one option remains: this is no longer a union if (newOptions.size() == 1) { return newOptions[0] as ObjectNode + } else if (newOptions.size() == 0) { + // If there are no options, it's because they were all nulls + // Which probably shouldn't happen. 
+ val nullSchema = MoreMappers.initMapper().createObjectNode() + nullSchema.put("type", "null") + return nullSchema } newSchema.replace("oneOf", newOptions) return newSchema } + + override fun mapCombined(schema: ObjectNode): ObjectNode { + val toUnion = super.mapCombined(schema) + return mapUnion(toUnion) + } } diff --git a/airbyte-cdk/java/airbyte-cdk/s3-destinations/src/main/kotlin/io/airbyte/cdk/integrations/destination/s3/parquet/JsonSchemaParquetPreprocessor.kt b/airbyte-cdk/java/airbyte-cdk/s3-destinations/src/main/kotlin/io/airbyte/cdk/integrations/destination/s3/parquet/JsonSchemaParquetPreprocessor.kt index 3bfa9da33b0d7..3ac2b3137552f 100644 --- a/airbyte-cdk/java/airbyte-cdk/s3-destinations/src/main/kotlin/io/airbyte/cdk/integrations/destination/s3/parquet/JsonSchemaParquetPreprocessor.kt +++ b/airbyte-cdk/java/airbyte-cdk/s3-destinations/src/main/kotlin/io/airbyte/cdk/integrations/destination/s3/parquet/JsonSchemaParquetPreprocessor.kt @@ -14,6 +14,10 @@ class JsonSchemaParquetPreprocessor : JsonSchemaIdentityMapper() { companion object { fun typeFieldName(schema: ObjectNode): String { return when (AirbyteJsonSchemaType.fromJsonSchema(schema)) { + AirbyteJsonSchemaType.NULL -> + throw IllegalStateException( + "Null typed fields in disjoint unions not supported" + ) AirbyteJsonSchemaType.BOOLEAN -> "boolean" AirbyteJsonSchemaType.INTEGER -> "integer" AirbyteJsonSchemaType.NUMBER -> "number" diff --git a/airbyte-cdk/java/airbyte-cdk/s3-destinations/src/test/kotlin/io/airbyte/cdk/integrations/destination/s3/avro/JsonSchemaTransformerTest.kt b/airbyte-cdk/java/airbyte-cdk/s3-destinations/src/test/kotlin/io/airbyte/cdk/integrations/destination/s3/avro/JsonSchemaTransformerTest.kt index 0344a5e6ce137..db56fbbcc5d98 100644 --- a/airbyte-cdk/java/airbyte-cdk/s3-destinations/src/test/kotlin/io/airbyte/cdk/integrations/destination/s3/avro/JsonSchemaTransformerTest.kt +++ 
b/airbyte-cdk/java/airbyte-cdk/s3-destinations/src/test/kotlin/io/airbyte/cdk/integrations/destination/s3/avro/JsonSchemaTransformerTest.kt @@ -13,15 +13,19 @@ import org.junit.jupiter.api.Assertions import org.junit.jupiter.api.Test class JsonSchemaTransformerTest { - private fun mangleAltCombined(node: ObjectNode) { + private fun mangleAltCombined( + node: ObjectNode, + type1: String = "integer", + type2: String = "string" + ) { val oneOf = MoreMappers.initMapper().createArrayNode() val option1 = MoreMappers.initMapper().createObjectNode() - option1.put("type", "integer") + option1.put("type", type1) oneOf.add(option1) val option2 = MoreMappers.initMapper().createObjectNode() - option2.put("type", "string") + option2.put("type", type2) oneOf.add(option2) node.remove("type") @@ -41,6 +45,9 @@ class JsonSchemaTransformerTest { // Assert transformedSchema is equal to jsonSchema, accounting for a little normalization transformedSchema.remove("type") mangleAltCombined(jsonSchema["properties"]["combined_type_alt"] as ObjectNode) + mangleAltCombined(jsonSchema["properties"]["combined_null_string"] as ObjectNode, "null") + mangleAltCombined(jsonSchema["properties"]["redundant_null"] as ObjectNode, "null", "null") + Assertions.assertEquals(jsonSchema, transformedSchema) } @@ -122,4 +129,18 @@ class JsonSchemaTransformerTest { val transformedSchema = JsonSchemaUnionMerger().mapSchema(inputSchema) Assertions.assertEquals(outputSchema, transformedSchema) } + + @Test + fun testMergingNulls() { + val inputSchemaStr = javaClass.getResource("/avro/complex_schema.json")?.readText() + val inputSchema = MoreMappers.initMapper().readTree(inputSchemaStr) as ObjectNode + val merged = JsonSchemaUnionMerger().mapSchema(inputSchema) + + val properties = merged["properties"] as ObjectNode + val nullType = MoreMappers.initMapper().createObjectNode().put("type", "null") + val stringType = MoreMappers.initMapper().createObjectNode().put("type", "string") + 
Assertions.assertEquals(properties["null_type"], nullType) + Assertions.assertEquals(properties["redundant_null"], nullType) + Assertions.assertEquals(properties["combined_null_string"], stringType) + } } diff --git a/airbyte-cdk/java/airbyte-cdk/s3-destinations/src/test/resources/avro/complex_schema.json b/airbyte-cdk/java/airbyte-cdk/s3-destinations/src/test/resources/avro/complex_schema.json index eba70dd550441..9ca9a4c6f694e 100644 --- a/airbyte-cdk/java/airbyte-cdk/s3-destinations/src/test/resources/avro/complex_schema.json +++ b/airbyte-cdk/java/airbyte-cdk/s3-destinations/src/test/resources/avro/complex_schema.json @@ -1,5 +1,14 @@ { "properties": { + "null_type": { + "type": "null" + }, + "combined_null_string": { + "type": ["null", "string"] + }, + "redundant_null": { + "type": ["null", "null"] + }, "integer_type": { "type": "integer" }, diff --git a/airbyte-cdk/java/airbyte-cdk/s3-destinations/src/test/resources/parquet/json_schema_converter/type_conversion_test_cases_v1.json b/airbyte-cdk/java/airbyte-cdk/s3-destinations/src/test/resources/parquet/json_schema_converter/type_conversion_test_cases_v1.json index 2ded36eb9ee0b..a9825e7cc5bc0 100644 --- a/airbyte-cdk/java/airbyte-cdk/s3-destinations/src/test/resources/parquet/json_schema_converter/type_conversion_test_cases_v1.json +++ b/airbyte-cdk/java/airbyte-cdk/s3-destinations/src/test/resources/parquet/json_schema_converter/type_conversion_test_cases_v1.json @@ -204,7 +204,11 @@ "jsonFieldSchema": { "$ref": "WellKnownTypes.json#/definitions/TimeWithTimezone" }, - "avroFieldType": ["null", {"type":"long","logicalType":"time-micros"}, "string"] + "avroFieldType": [ + "null", + { "type": "long", "logicalType": "time-micros" }, + "string" + ] }, { "fieldName": "array_field_without_items", diff --git a/airbyte-integrations/connectors/destination-s3/build.gradle b/airbyte-integrations/connectors/destination-s3/build.gradle index 76f4db91107af..49fa99cf82ce0 100644 --- 
a/airbyte-integrations/connectors/destination-s3/build.gradle +++ b/airbyte-integrations/connectors/destination-s3/build.gradle @@ -4,23 +4,19 @@ plugins { } airbyteJavaConnector { - cdkVersionRequired = '0.44.9' + cdkVersionRequired = '0.44.11' features = ['db-destinations', 's3-destinations'] - useLocalCdk = false // TODO: Version CDK, bump required version, and set this to false + useLocalCdk = false } airbyteJavaConnector.addCdkDependencies() application { mainClass = 'io.airbyte.integrations.destination.s3.S3DestinationRunner' - applicationDefaultJvmArgs = ['-XX:+ExitOnOutOfMemoryError', '-XX:MaxRAMPercentage=75.0', '--add-opens', 'java.base/java.lang=ALL-UNNAMED'] + applicationDefaultJvmArgs = ['-XX:+ExitOnOutOfMemoryError', '-XX:MaxRAMPercentage=75.0'] } -run { - standardInput = System.in -} - dependencies { // csv diff --git a/airbyte-integrations/connectors/destination-s3/metadata.yaml b/airbyte-integrations/connectors/destination-s3/metadata.yaml index 1fc2ba920c5c2..60b1beb6c93b8 100644 --- a/airbyte-integrations/connectors/destination-s3/metadata.yaml +++ b/airbyte-integrations/connectors/destination-s3/metadata.yaml @@ -2,7 +2,7 @@ data: connectorSubtype: file connectorType: destination definitionId: 4816b78f-1489-44c1-9060-4b19d5fa9362 - dockerImageTag: 0.6.7 + dockerImageTag: 1.0.0 dockerRepository: airbyte/destination-s3 githubIssueLabel: destination-s3 icon: s3.svg @@ -14,12 +14,18 @@ data: oss: enabled: true releaseStage: generally_available + releases: + breakingChanges: + 1.0.0: + message: > + **This release includes breaking changes, including major revisions to the schema of stored data. 
Do not upgrade without reviewing the migration guide.** + upgradeDeadline: "2024-10-08" resourceRequirements: jobSpecific: - jobType: sync resourceRequirements: - memory_limit: 1Gi - memory_request: 1Gi + memory_limit: 2Gi + memory_request: 2Gi documentationUrl: https://docs.airbyte.com/integrations/destinations/s3 tags: - language:java diff --git a/docs/integrations/destinations/s3-migrations.md b/docs/integrations/destinations/s3-migrations.md new file mode 100644 index 0000000000000..d6529c6f8a2b8 --- /dev/null +++ b/docs/integrations/destinations/s3-migrations.md @@ -0,0 +1,209 @@ +# S3 Migration Guide + +## Upgrading to 1.0.0 + +This version introduces changes to the schema of data written to S3, which make it isomorphic to our [V2 certified database destinations](https://docs.airbyte.com/release_notes/upgrading_to_destinations_v2/#what-is-destinations-v2), as well as various improvements to format conversion. + +* Conversion failures are captured: values not matching the client schema will no longer break syncs using Avro or Parquet formats. +* Changes introduced by the source or platform, such as NULLing bad values or truncating long ones, are visible in the metadata. +* There is improved handling of various types in Avro and Parquet formats, including simplification of date and time types, more human-readable schemaless objects and arrays, and better support for unions in Parquet. +* Sync and generation ids are made available, providing more information for debugging. 
+ +## Schema Changes + +The schema changes are as follows: + +| Old field name | New field name | JSON Type | Avro Type | Parquet Type | Description | +|-----------------------|--------------------------|--------------------|-----------------------------------------------------------|----------------------------------------------|--------------------------------------------------------------------------------------------------------| +| `_airbyte_ab_id` | `_airbyte_raw_id` | string (UUID) | `{ "type": "string", "logicalType": "uuid" }` | `String` | Airbyte-added unique identifier. | +| `_airbyte_emitted_at` | `_airbyte_extracted_at` | integer (epoch ms) | `{ "type": "string", "logicalType": "timestamp-millis" }` | `Int64(logicalType=Timestamp(Milliseconds))` | Time at which the data was extracted from the source. | +| [NONEXISTENT] | `_airbyte_generation_id` | integer | `{ "type": "long" }` | `Int64` | Monotonically-increasing refresh id, if applicable. | +| [NONEXISTENT] | `_airbyte_meta` | object (see below) | Record (see below) | Record (see below) | Additional metadata, including a record of changes made to the data and the sync id. | +| `_airbyte_data` | [UNCHANGED] | optional object | (from client schema) | (from client schema) | Data payload when [flattening is disabled](https://docs.airbyte.com/integrations/destinations/s3#csv). | + +The `_airbyte_meta` field is an object that currently has two fields: + +| Field Name | JSON Type | Description | +|----------------|-----------|--------------------------------------------------------------------------------------------| +| `changes` | list | A record of any changes Airbyte made to the data for compatibility and/or to handle errors | +| `sync_id` | integer | Monotonically-increasing integer representing the current sync | + +The `changes` field is a list of objects, each of which represents a change to the data. 
Each object has the following fields: + +| Field Name | JSON Type | Description | +|------------|------------|------------------------------------------------------------------------------------| +| `field` | string | The name of the field that was affected | +| `change` | string | The type of change (currently only `NULLED` or `TRUNCATED`) | +| `reason` | string | The reason for the change, including its origin (source, platform, or destination) | + +These schemas are subject to change; however, any change that is not backward compatible (i.e., anything other than an additive change) will be accompanied by a breaking change notice. + +## Data Changes + +This version introduces changes to the data types used when writing to S3 in Avro or Parquet. The changes are as follows: + +### Primitive Types + +All primitive types are unchanged: + +| Airbyte JSON Schema Type | Old Avro Type | Old Parquet Type | New Avro Type | New Parquet Type | +|--------------------------|--------------------------|-----------------------|-----------------|------------------| +| string | `["null", "string"]` | `Optional String` | [UNCHANGED] | [UNCHANGED] | +| boolean | `["null", "boolean"]` | `Optional Boolean` | [UNCHANGED] | [UNCHANGED] | +| integer | `["null", "long"]` | `Optional Int64` | [UNCHANGED] | [UNCHANGED] | +| number | `["null", "double"]` | `Optional Double` | [UNCHANGED] | [UNCHANGED] | + +### Date and Time Types + +This change introduces [simplification of the handling of dates, times, and timestamps in Parquet and Avro](https://github.com/airbytehq/airbyte-internal-issues/issues/8973). Formerly, date and time types were stored in Avro as unions of integral logical types and strings, which in Parquet were represented as disjoint records. In practice, the date and time types were always converted, making the unions redundant and the resulting disjoint records confusing. + +Now all time types are converted to integral logical types. 
Values that cannot be converted will be nulled and tracked in `_airbyte_meta.changes[]`. + +#### Resulting Avro Changes + +Avro users should only see changes at the schema level; the resulting data will appear the same. + +| Airbyte Time Type | Old Avro Type | New Avro Type | +|------------------------------|-----------------------------------------------------------------------------|------------------------------------------------| +| date | `["null", { "type": "int", "logicalType": "date" }, "string"]` | `["null", { "type": "int", "logicalType": "date" }]` | +| time without timezone | `["null", { "type": "long", "logicalType": "time-micros" }, "string"]` | `["null", { "type": "long", "logicalType": "time-micros" }]` | +| time with timezone | `["null", { "type": "long", "logicalType": "time-micros" }, "string"]` | `["null", { "type": "long", "logicalType": "time-micros" }]` | +| timestamp without timezone | `["null", { "type": "long", "logicalType": "timestamp-micros" }, "string"]` | `["null", { "type": "long", "logicalType": "timestamp-micros" }]` | +| timestamp with timezone | `["null", { "type": "long", "logicalType": "timestamp-micros" }, "string"]` | `["null", { "type": "long", "logicalType": "timestamp-micros" }]` | + +#### Resulting Parquet Changes + +| Airbyte Time Type | Old Parquet Type | New Parquet Type | +|-------------------------------|---------------------------------------------------------------------------------------------------------------|--------------------------------------------------------| +| date | `Optional Record { member0: Optional Int32(logical_type=Date), member1: Optional String }` | `Optional Int32(logical_type=Date)` | +| time without timezone | `Optional Record { member0: Optional Int64(logical_type=Time(Microseconds)), member1: Optional String }` | `Optional Int64(logical_type=Time(Microseconds))` | +| time with timezone | `Optional Record { member0: Optional Int64(logical_type=Time(Microseconds)), member1: Optional String }` | `Optional Int64(logical_type=Time(Microseconds))` | +| timestamp without timezone | 
`Optional Record { member0: Optional Int64(logical_type=Timestamp(Microseconds)), member1: Optional String }` | `Optional Int64(logical_type=Timestamp(Microseconds))` | +| timestamp with timezone | `Optional Record { member0: Optional Int64(logical_type=Timestamp(Microseconds)), member1: Optional String }` | `Optional Int64(logical_type=Timestamp(Microseconds))` | + +Note: times and timestamps with timezones are converted to UTC. In Parquet, the field metadata `is_adjusted_to_utc` will always be `true`. + +#### Time of Day Conversion Bugfix + +Formerly, for the Airbyte type `time_of_day_with_timezone`, [timezones were not respected when converting](https://github.com/airbytehq/airbyte/issues/43019). This has been fixed. Values are microseconds since midnight. + +| Input Data | Old Output Data | New Output Data | +|--------------------|------------------------------|------------------------------| +| `"12:00:00-01:00"` | `43200000000` (12:00:00 UTC) | `46800000000` (13:00:00 UTC) | +| `"04:00:00+02:00"` | `14400000000` (04:00:00 UTC) | `7200000000` (02:00:00 UTC) | +| `"04:00:00+05:30"` | `14400000000` (04:00:00 UTC) | `81000000000` (22:30:00 UTC) | + +### Object Types + +Formerly, in both Avro and Parquet formats, objects without schemas (`"type": "object"` without `properties`), and properties not listed in the `properties` field, were accumulated in `_airbyte_additional_properties`. 
The new behavior is: + +* undocumented fields in objects with schemas are silently dropped +* objects without schemas are serialized into a JSON string +* `_airbyte_additional_properties` is dropped entirely + +For example, the following input schemas and data will result in the following outputs: + +| Input Schema | Input Data | Output Schema | Output Data | +|-----------------------------------------------------------------------|--------------------------------|-----------------------------------------------------------------------|--------------------------------------| +| `{ "type": "object", "properties": { "id": { "type": "integer" } } }` | `{ "id": 1, "name": "Alice" }` | `{ "type": "object", "properties": { "id": { "type": "integer" } } }` | `{ "id": 1 }` | +| `{ "type": "object" }` | `{ "id": 1, "name": "Alice" }` | `{ "type": "string" }` | `"{\"id\": 1, \"name\": \"Alice\"}"` | + +Note: dropped fields will not appear in `_airbyte_meta.changes[]`. Additionally, an object with null or empty properties (`"properties": {}`) will be treated as a schemaless object. This usually indicates that an upstream source is failing to report its schema properly, and that the data is not actually extraneous. + +### Array Types + +Formerly, arrays without types (`"type": "array"` with no `items` field) were converted to native arrays of string serializations of the underlying types. The same behavior was applied to arrays of union types (`{ "type": "array", "items": { "oneOf": [ /* various types */ ] } }`). + +Now: + +* Arrays without types are serialized into JSON array strings. +* Arrays of unions are treated as arrays of mixed types. 
+ +For example, the following input schemas and data formerly resulted in: + +| Input Schema | Input Data | Old Output Schema | Old Output Data | +|----------------------------------------------------------------------------------------|----------------|------------------------------------------------------|------------------| +| `{ "type": "array", "items": { "type": "integer" } }` | `[1, "Alice"]` | `{ "type": "array", "items": ["null", "integer"] }` | [SYNC FAILED] | +| `{ "type": "array" }` | `[1, "Alice"]` | `{ "type": "array", "items": [ "null", "string" ] }` | `["1", "Alice"]` | +| `{ "type": "array", "items": { "oneOf": [ {"type": "integer"}, {"type": "string"} ] } }` | `[1, "Alice"]` | `{ "type": "array", "items": [ "null", "string" ] }` | `["1", "Alice"]` | +| `{ "type": "array", "items": { "oneOf": [ {"type": "integer"}, {"type": "string"} ] } }` | `[1, false]` | `{ "type": "array", "items": [ "null", "string" ] }` | `["1", "false"]` | + +Now: + +| Input Schema | Input Data | New Output Schema | New Output Data | +|----------------------------------------------------------------------------------------|----------------|-----------------------------------------------------------------|--------------------| +| `{ "type": "array", "items": { "type": "integer" } }` | `[1, "Alice"]` | `{ "type": "array", "items": ["null", "integer"] }` | `[1, null*]` | +| `{ "type": "array" }` | `[1, "Alice"]` | `{ "type": "string" }` | `"[1, \"Alice\"]"` | +| `{ "type": "array", "items": { "oneOf": [ {"type": "integer"}, {"type": "string"} ] } }` | `[1, "Alice"]` | `{ "type": "array", "items": [ "null", "integer", "string" ] }` | `[1, "Alice"]` | +| `{ "type": "array", "items": { "oneOf": [ {"type": "integer"}, {"type": "string"} ] } }` | `[1, false]` | `{ "type": "array", "items": [ "null", "integer", "string" ] }` | `[1, null*]` | + +*The nulled fields represent conversion failures and will appear in `_airbyte_meta.changes[]`.
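
The new array rules can be sketched in a few lines of Python. This is only an illustration of the behavior in the tables above, not the connector's implementation: `convert_array` and `_matches` are hypothetical names, the type checks are deliberately strict (no coercion is attempted), and the `reason` string is a placeholder rather than a real Airbyte reason code.

```python
import json


def _matches(type_name, value):
    """Strict check of a value against a JSON schema type name (hypothetical helper)."""
    if type_name == "integer":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, int) and not isinstance(value, bool)
    if type_name == "string":
        return isinstance(value, str)
    if type_name == "boolean":
        return isinstance(value, bool)
    return False


def convert_array(schema, values):
    """Return (converted_value, changes) for an array field."""
    items = schema.get("items")
    if items is None:
        # Arrays without types are serialized into a JSON array string.
        return json.dumps(values), []
    # A union contributes every option; a plain schema contributes one.
    allowed = [option["type"] for option in items.get("oneOf", [items])]
    converted, changes = [], []
    for index, value in enumerate(values):
        if value is None or any(_matches(name, value) for name in allowed):
            converted.append(value)
        else:
            # Conversion failure: null the element and record the change.
            converted.append(None)
            changes.append({
                "field": f"[{index}]",
                "change": "NULLED",
                "reason": "placeholder: element did not match the item schema",
            })
    return converted, changes
```

With this sketch, `convert_array({"type": "array"}, [1, "Alice"])` yields the string `[1, "Alice"]`, while an integer-only array given `[1, "Alice"]` yields `[1, None]` plus one change entry.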
+ +This behavior will be applied to both Avro and Parquet formats. + +### Union Types (Parquet Only) + +#### Disjoint Record Improvements + +Formerly, unions in Parquet were represented as anonymous disjoint records. For example: + +| Airbyte Union Type | Old Parquet Type | +|----------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------| +| `{"oneOf": [ {"type": "integer"}, {"type": "string"} ] }` | `Optional Record { member0: Optional Int32, member1: Optional String }` | +| `{"oneOf": [ {"type": "boolean"}, {"type": "object", "properties": { "id": { "type": "integer" }, "name": { "type": "string" } } } ] }` | `Optional Record { member0: Optional Boolean, member1: Optional Record { id: Optional Uint32, name: Optional String } }` | + +Now unions will be represented as typed disjoint records with named fields. For example: + +| Airbyte Union Type | New Parquet Type | +|----------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------| +| `{"oneOf": [ {"type": "integer"}, {"type": "string"} ] }` | `Optional Record { type: String, integer: Optional Int32, string: Optional String }` | +| `{"oneOf": [ {"type": "boolean"}, {"type": "object", "properties": { "id": { "type": "integer" }, "name": { "type": "string" } } } ] }` | `Optional Record { type: String, boolean: Optional Boolean, object: Optional Record { id: Optional Uint32, name: Optional String } }` | + +Where: +* Exactly one of the typed fields will be set. +* The `type` field will always be set and will always equal the name of the field that was set.
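
As a rough illustration of the two rules above, the following Python sketch wraps a JSON value in a typed disjoint record. `to_disjoint_record` is a hypothetical helper, not part of the connector. It only covers the basic JSON types and omits the unset fields, which in a real Parquet record would be present as nulls.

```python
def to_disjoint_record(value):
    """Wrap a value as {"type": <field name>, <field name>: value}."""
    # bool must be tested before int because bool is a subclass of int.
    if isinstance(value, bool):
        name = "boolean"
    elif isinstance(value, int):
        name = "integer"
    elif isinstance(value, str):
        name = "string"
    elif isinstance(value, dict):
        name = "object"
    elif isinstance(value, list):
        name = "array"
    else:
        raise TypeError(f"unsupported union member: {type(value).__name__}")
    # Exactly one typed field is set, and `type` names that field.
    return {"type": name, name: value}
```

For instance, `to_disjoint_record(1)` returns `{"type": "integer", "integer": 1}`.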
+ +Following the examples above: + +| Input Data | New Output Data | +|-------------------------------|-----------------------------------------------------------------| +| `1` | `{ "type": "integer", "integer": 1 }` | +| `{"id": 10, "name": "Alice"}` | `{ "type": "object", "object": { "id": 10, "name": "Alice" } }` | + +#### Disjoint Record Type Names + +The following type and field names will be used for each Airbyte type: + +| Airbyte Type | Parquet Type | Field Name | +|--------------------------------------------------------------------------------------------|------------------------------------------------------------------------|------------------------------| +| `{"type": "integer" }` | `Optional Int64` | `integer` | +| `{"type": "string" }` | `Optional String` | `string` | +| `{"type": "boolean" }` | `Optional Boolean` | `boolean` | +| `{"type": "object" }` | `Optional Record` | `object` | +| `{"type": "array" }` | `Optional List` | `array` | +| `{"type": "string", "format": "date" }` | `Optional Int32(logical_type=Date)` | `date` | +| `{"type": "string", "format": "time", "airbyte_type": "time_with_timezone" }` | `Optional Int64(logical_type=Time(Microseconds))` | `time_with_timezone` | +| `{"type": "string", "format": "time", "airbyte_type": "time_without_timezone" }` | `Optional Int64(logical_type=Time(Microseconds))` | `time_without_timezone` | +| `{"type": "string", "format": "date-time", "airbyte_type": "timestamp_with_timezone" }` | `Optional Int64(logical_type=Timestamp(Microseconds))` | `timestamp_with_timezone` | +| `{"type": "string", "format": "date-time", "airbyte_type": "timestamp_without_timezone" }` | `Optional Int64(logical_type=Timestamp(Microseconds))` | `timestamp_without_timezone` | + +#### Merging Like Options in Unions + +The above behavior applies only to unions of distinct types. Unions of the same type will continue to be merged.
If the result is a single option, the union will be demoted to a field of that type. For example: + +| Airbyte Union Type | Avro Type | Parquet Type | +|---------------------------------------------------------------------------------|---------------------------------|-----------------------------------------------------------------------------------------------| +| `{"oneOf": [ {"type": "integer"}, {"type": "integer"} ] }` | `["null", "integer"]` | `Optional Int64` | +| `{"oneOf": [ {"type": "integer"}, {"type": "string"}, {"type": "integer" } ] }` | `["null", "integer", "string"]` | `Optional Record { type: Optional String, integer: Optional Int64, string: Optional String }` | + +Unions of objects will continue to be merged into a single object with a union schema. If this is not possible due to field type conflicts, an exception will be thrown and the sync will fail. + +| Object 1 Properties | Object 2 Properties | Merged Object | +|-------------------------------|---------------------------------|-----------------------------------------------| +| `{id: integer}` | `{name: string}` | `{id: integer, name: string}` | +| `{id: integer, name: string}` | `{id: integer, birthday: date}` | `{id: integer, name: string, birthday: date}` | +| `{id: integer}` | `{id: string}` | [SYNC FAILED] | + +Unions of arrays with different item schemas continue not to be supported. \ No newline at end of file diff --git a/docs/integrations/destinations/s3.md b/docs/integrations/destinations/s3.md index b5899113b082a..317f3e714df37 100644 --- a/docs/integrations/destinations/s3.md +++ b/docs/integrations/destinations/s3.md @@ -14,6 +14,7 @@ If you are using STS Assume Role, you must provide the following: - **Role ARN** + Otherwise, if you are using AWS credentials you must provide the following: - **Access Key ID** @@ -377,14 +378,31 @@ Like most of the other Airbyte destination connectors, usually the output has th an emission timestamp, and the data blob. 
With the CSV output, it is possible to normalize \(flatten\) the data blob to multiple columns. -| Column | Condition | Description | -| :-------------------- | :------------------------------------------------------------------------------------------------ | :----------------------------------------------------------------------- | -| `_airbyte_ab_id` | Always exists | A uuid assigned by Airbyte to each processed record. | -| `_airbyte_emitted_at` | Always exists. | A timestamp representing when the event was pulled from the data source. | -| `_airbyte_data` | When no normalization \(flattening\) is needed, all data reside under this column as a json blob. | | -| root level fields | When root level normalization \(flattening\) is selected, the root level fields are expanded. | | +| Column | Condition | Description | +|:-------------------------|:---------------------------------------------------------------------------------------------------|:----------------------------------------------------------------------------| +| `_airbyte_raw_id` | Always exists. | A uuid assigned by Airbyte to each processed record. | +| `_airbyte_extracted_at` | Always exists. | A timestamp representing when the event was extracted from the data source. | +| `_airbyte_generation_id` | Always exists. | An integer id that increases with each new refresh. | +| `_airbyte_meta` | Always exists. | A structured object containing metadata about the record. | +| `_airbyte_data` | When no normalization \(flattening\) is needed, all data resides under this column as a JSON blob. | | +| root level fields | When root level normalization \(flattening\) is selected, the root level fields are expanded. | | + +The schema for `_airbyte_meta` is: + +| Field Name | Type | Description | +|:-----------|:--------|:----------------------------------------| +| `changes` | list | A list of structured change objects. | +| `sync_id` | integer | An integer identifier for the sync job. 
| + +The schema for a change object is: + +| Field Name | Type | Description | +|:-----------|:-------|:-------------------------------------------------------------------------------------------------------------------------| +| `field` | string | The name of the field that changed. | +| `change` | string | The type of change (eg, `NULLED`, `TRUNCATED`). | +| `reason` | string | The reason for the change, including its system of origin (ie, whether it was a source, destination, or platform error). | -For example, given the following json object from a source: +For example, given the following JSON object from a source: ```json { @@ -398,15 +416,15 @@ For example, given the following json object from a source: With no normalization, the output CSV is: -| `_airbyte_ab_id` | `_airbyte_emitted_at` | `_airbyte_data` | -| :------------------------------------- | :-------------------- | :------------------------------------------------------------- | -| `26d73cde-7eb1-4e1e-b7db-a4c03b4cf206` | 1622135805000 | `{ "user_id": 123, name: { "first": "John", "last": "Doe" } }` | +| `_airbyte_raw_id` | `_airbyte_extracted_at` | `_airbyte_generation_id` | `_airbyte_meta` | `_airbyte_data` | +|:---------------------------------------|:------------------------|:-------------------------|-------------------------------------|:---------------------------------------------------------------| +| `26d73cde-7eb1-4e1e-b7db-a4c03b4cf206` | 1622135805000 | 11 | `{"changes":[], "sync_id": 10111 }` | `{ "user_id": 123, name: { "first": "John", "last": "Doe" } }` | With root level normalization, the output CSV is: -| `_airbyte_ab_id` | `_airbyte_emitted_at` | `user_id` | `name` | -| :------------------------------------- | :-------------------- | :-------- | :----------------------------------- | -| `26d73cde-7eb1-4e1e-b7db-a4c03b4cf206` | 1622135805000 | 123 | `{ "first": "John", "last": "Doe" }` | +| `_airbyte_raw_id` | `_airbyte_extracted_at` | `_airbyte_generation_id` | `_airbyte_meta` | 
`user_id` | `name.first` | `name.last` | +|:---------------------------------------|:------------------------|:-------------------------|-------------------------------------|:---------:|:------------:|:-----------:| +| `26d73cde-7eb1-4e1e-b7db-a4c03b4cf206` | 1622135805000 | 11 | `{"changes":[], "sync_id": 10111 }` | 123 | John | Doe | Output files can be compressed. The default option is GZIP compression. If compression is selected, the output filename will have an extra extension (GZIP: `.csv.gz`). @@ -418,13 +436,15 @@ structure as follows: ```json { - "_airbyte_ab_id": "", - "_airbyte_emitted_at": "", + "_airbyte_raw_id": "", + "_airbyte_extracted_at": "", + "_airbyte_generation_id": "", + "_airbyte_meta": "", "_airbyte_data": "" } ``` -For example, given the following two json objects from a source: +For example, given the following two JSON objects from a source: ```json [ @@ -448,8 +468,8 @@ For example, given the following two json objects from a source: They will be like this in the output file: ```text -{ "_airbyte_ab_id": "26d73cde-7eb1-4e1e-b7db-a4c03b4cf206", "_airbyte_emitted_at": "1622135805000", "_airbyte_data": { "user_id": 123, "name": { "first": "John", "last": "Doe" } } } -{ "_airbyte_ab_id": "0a61de1b-9cdd-4455-a739-93572c9a5f20", "_airbyte_emitted_at": "1631948170000", "_airbyte_data": { "user_id": 456, "name": { "first": "Jane", "last": "Roe" } } } +{ "_airbyte_raw_id": "26d73cde-7eb1-4e1e-b7db-a4c03b4cf206", "_airbyte_extracted_at": "1622135805000", "_airbyte_generation_id": "11", "_airbyte_meta": { "changes": [], "sync_id": 10111 }, "_airbyte_data": { "user_id": 123, "name": { "first": "John", "last": "Doe" } } } +{ "_airbyte_raw_id": "0a61de1b-9cdd-4455-a739-93572c9a5f20", "_airbyte_extracted_at": "1631948170000", "_airbyte_generation_id": "12", "_airbyte_meta": { "changes": [], "sync_id": 10112 }, "_airbyte_data": { "user_id": 456, "name": { "first": "Jane", "last": "Roe" } } } ``` Output files can be compressed.
The default option is GZIP compression. If compression is selected, @@ -514,7 +534,8 @@ To see connector limitations, or troubleshoot your S3 connector, see more [in ou | Version | Date | Pull Request | Subject | |:--------|:-----------|:-----------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------| -| 0.6.7 | 2024-08-11 | [43713](https://github.com/airbytehq/airbyte/issues/43713) | Decreased memory ratio (0.7 -> 0.5) and thread allocation (5 -> 1) for async S3 uploads. | +| 1.0.0 | 2024-08-08 | [42409](https://github.com/airbytehq/airbyte/pull/42409) | Major breaking changes: new destination schema, change capture, Avro/Parquet improvements, bugfixes | +| 0.6.7 | 2024-08-11 | [43713](https://github.com/airbytehq/airbyte/issues/43713) | Decreased memory ratio (0.7 -> 0.5) and thread allocation (5 -> 2) for async S3 uploads. | | 0.6.6 | 2024-08-06 | [43343](https://github.com/airbytehq/airbyte/pull/43343) | Use Kotlin 2.0.0 | | 0.6.5 | 2024-08-01 | [42405](https://github.com/airbytehq/airbyte/pull/42405) | S3 parallelizes workloads, checkpoints, submits counts, support for generationId in metadata for refreshes. | | 0.6.4 | 2024-04-16 | [42006](https://github.com/airbytehq/airbyte/pull/42006) | remove unnecessary zookeeper dependency | diff --git a/docs/understanding-airbyte/json-avro-conversion.md b/docs/understanding-airbyte/json-avro-conversion.md index e2abde02918ba..20b533ecdfcb6 100644 --- a/docs/understanding-airbyte/json-avro-conversion.md +++ b/docs/understanding-airbyte/json-avro-conversion.md @@ -1,18 +1,18 @@ -# Json to Avro Conversion for Blob Storage Destinations +# JSON to Avro Conversion for Blob Storage Destinations -When an Airbyte data stream is synced to the Avro or Parquet format (e.g. 
Parquet on S3), the source Json schema is converted to an Avro schema, then the Json object is converted to an Avro record based on the Avro schema (and further to Parquet if necessary). Because the data stream can come from any data source, the Json to Avro conversion process has the following rules and limitations. +When an Airbyte data stream is synced to the Avro or Parquet format (e.g. Parquet on S3), the source JSON schema is converted to an Avro schema, then the JSON object is converted to an Avro record based on the Avro schema (and further to Parquet if necessary). Because the data stream can come from any data source, the JSON to Avro conversion process has the following rules and limitations. ## Conversion Rules ### Type Mapping -Json schema types are mapped to Avro types as follows: +JSON schema types are mapped to Avro types as follows: -| Json Data Type | Avro Data Type | +| JSON Data Type | Avro Data Type | | :------------: | :------------: | | string | string | | number | double | -| integer | int | +| integer | long | | boolean | boolean | | null | null | | object | record | @@ -20,19 +20,19 @@ Json schema types are mapped to Avro types as follows: ### Nullable Fields -All fields are nullable. For example, a `string` Json field will be typed as `["null", "string"]` in Avro. This is necessary because the incoming data stream may have optional fields. +All fields are nullable. For example, a `string` JSON field will be typed as `["null", "string"]` in Avro. This is necessary because the incoming data stream may have optional fields. ### Built-in Formats -The following built-in Json formats will be mapped to Avro logical types. +The following built-in JSON formats will be mapped to Avro logical types. 
-| Json Type | Json Built-in Format | Avro Type | Avro Logical Type | Meaning | +| JSON Type | JSON Built-in Format | Avro Type | Avro Logical Type | Meaning | | --------- | -------------------- | --------- | ------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------- | | `string` | `date` | `int` | `date` | Number of epoch days from 1970-01-01 ([reference](https://avro.apache.org/docs/current/spec.html#Date)). | | `string` | `time` | `long` | `time-micros` | Number of microseconds after midnight ([reference](https://avro.apache.org/docs/current/spec.html#Time+%28microsecond+precision%29)). | | `string` | `date-time` | `long` | `timestamp-micros` | Number of microseconds from `1970-01-01T00:00:00Z` ([reference](https://avro.apache.org/docs/current/spec.html#Timestamp+%28microsecond+precision%29)). | -In the final Avro schema, these Avro logical type fields will be a union of the logical type and string. The rationale is that the incoming Json objects may contain invalid Json built-in formats. If that's the case, and the conversion from the Json built-in format to Avro built-in format fails, the field will fall back to a string. The extra string type can cause problem for some users in the destination. We may re-evaluate this conversion rule in the future. This issue is tracked [here](https://github.com/airbytehq/airbyte/issues/17011). +In the final Avro schema, these logical type fields will be typed as a union of null and the logical type. Values will be stored in UTC, respecting the timezone where applicable. If the incoming data cannot be converted, the field will be nulled, and the failure will be captured in `_airbyte_meta.changes[]`.
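
To make the UTC rule concrete, here is a minimal Python sketch of converting a `time` string into `time-micros` (microseconds after midnight, UTC). It is an illustration under stated assumptions, not the connector's code: `time_to_utc_micros` is a hypothetical name, times without an offset are assumed to already be in UTC, and a `None` return stands in for nulling the field (recording the entry in `_airbyte_meta.changes[]` is left out).

```python
from datetime import time, timedelta

MICROS_PER_DAY = 24 * 60 * 60 * 1_000_000


def time_to_utc_micros(value):
    """Convert an ISO 8601 time string to microseconds after midnight UTC."""
    try:
        parsed = time.fromisoformat(value)
    except (TypeError, ValueError):
        # Unparseable input: the field would be nulled.
        return None
    local = timedelta(
        hours=parsed.hour,
        minutes=parsed.minute,
        seconds=parsed.second,
        microseconds=parsed.microsecond,
    )
    # Assumption: a time with no offset is treated as UTC.
    offset = parsed.utcoffset() or timedelta(0)
    # Subtract the offset to reach UTC, then wrap around midnight.
    return ((local - offset) // timedelta(microseconds=1)) % MICROS_PER_DAY
```

For example, `time_to_utc_micros("12:00:00-01:00")` returns `46800000000` (13:00:00 UTC), and `time_to_utc_micros("04:00:00+05:30")` wraps past midnight to `81000000000` (22:30:00 UTC).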
**Date** @@ -65,8 +65,7 @@ and the Avro schema is: { "type": "int", "logicalType": "date" - }, - "string" + } ] } ``` @@ -102,8 +101,7 @@ and the Avro schema is: { "type": "long", "logicalType": "time-micros" - }, - "string" + } ] } ``` @@ -139,15 +137,14 @@ and the Avro schema is: { "type": "long", "logicalType": "timestamp-micros" - }, - "string" + } ] } ``` ### Combined Restrictions -Combined restrictions \(`allOf`, `anyOf`, and `oneOf`\) will be converted to type unions. The corresponding Avro schema can be less stringent. For example, the following Json schema +Combined restrictions \(`allOf`, `anyOf`, and `oneOf`\) will be converted to type unions. The corresponding Avro schema can be less stringent. For example, the following JSON schema ```json { @@ -163,6 +160,15 @@ will become this in Avro schema: } ``` +Some union edge cases can result in unexpected behavior or break syncs. + +* A union of two of the same time types (`timestamp_with_timezone` and `timestamp_without_timezone`; or `time_with_timezone` and `time_without_timezone`) will work as expected, but a union of any `time_...` type with any `timestamp_...` type will break the sync. +* A union of a `date` with `time` or `timestamp` type will result in undefined behavior. +* A union of a `string` and a time type will not work as expected. The synced data will always be a string, regardless of whether it is a legal timestamp. +* A union of an `integer` with a time type will not work as expected when processing timestamps. Before version 1.0, this would break the sync. Now it will result in the timestamp being nulled and the failure added to change capture. + +See [this issue](https://github.com/airbytehq/airbyte/issues/43378). + ### Keyword `not` Keyword `not` is not supported, as there is no equivalent validation mechanism in Avro schema. 
@@ -175,7 +181,7 @@ Field name cannot start with a number, so an underscore will be added to those f ### Array Types -For array fields in Json schema, when the `items` property is an array, it means that each element in the array should follow its own schema sequentially. For example, the following specification means the first item in the array should be a string, and the second a number. +For array fields in JSON schema, when the `items` property is an array, it means that each element in the array should follow its own schema sequentially. For example, the following specification means the first item in the array should be a string, and the second a number. ```json { @@ -202,9 +208,9 @@ This is not supported in Avro schema. As a compromise, the converter creates a u } ``` -If the Json array has multiple object items, these objects will be recursively merged into one Avro record. For example, the following Json array expects two different objects. The first object has an `id` field, and second has an `id` and `message` field. Their `id` fields have slightly different types. +If the JSON array has multiple object items, these objects will be recursively merged into one Avro record. For example, the following JSON array expects two different objects. The first object has an `id` field, and second has an `id` and `message` field. Their `id` fields have slightly different types. -Json schema: +JSON schema: ```json { @@ -243,7 +249,7 @@ Json schema: } ``` -Json object: +JSON object: ```json { @@ -265,7 +271,7 @@ Json object: } ``` -After conversion, the two object schemas will be merged into one. Furthermore, the fields under the `id` record, `id_part_1` and `id_part_2`, will also be merged. In this way, all possible valid elements from the Json array can be converted to Avro records. +After conversion, the two object schemas will be merged into one. Furthermore, the fields under the `id` record, `id_part_1` and `id_part_2`, will also be merged. 
In this way, all possible valid elements from the JSON array can be converted to Avro records. Avro schema: @@ -319,7 +325,7 @@ Avro schema: } ``` -Note that `id_part_1` is a union of `int` and `string`, which comes from the first and second `id` definitions, respectively, in the original Json `items` specification. +Note that `id_part_1` is a union of `int` and `string`, which comes from the first and second `id` definitions, respectively, in the original JSON `items` specification. Avro object: @@ -348,9 +354,9 @@ Note that the first object in `array_field` originally does not have a `message` ### Untyped Array -When a Json array field has no `items`, the element in that array field may have any type. However, Avro requires that each array has a clear type specification. To solve this problem, the elements in the array are forced to be `string`s. +When a JSON array field has no `items`, the element in that array field may have any type. However, Avro requires that each array has a clear type specification. To solve this problem, the JSON array is serialized into its string representation. -For example, given the following Json schema and object: +For example, given the following JSON schema and object: ```json { @@ -379,10 +385,7 @@ the corresponding Avro schema and object will be: "name": "identifier", "type": [ "null", - { - "type": "array", - "items": ["null", "string"] - } + "string" ], "default": null } @@ -392,35 +395,26 @@ the corresponding Avro schema and object will be: ```json { - "identifier": ["151", "152", "true", "{\"id\": 153}", null] + "identifier": "[151, 152, true, {\"id\": 153}, null]" } ``` -Note that every non-null element inside the `identifier` array field is converted to string.
- ### Airbyte-Specific Fields Three Airbyte specific fields will be added to each Avro record: | Field | Schema | Document | | :------------------------------- | :----------------- | :-----------------------------------------------------------------------------------------: | -| `_airbyte_ab_id` | `uuid` | [link](http://avro.apache.org/docs/current/spec.html#UUID) | -| `_airbyte_emitted_at` | `timestamp-millis` | [link](http://avro.apache.org/docs/current/spec.html#Timestamp+%28millisecond+precision%29) | -| `_airbyte_additional_properties` | `map` of `string` | See explanation below. | +| `_airbyte_raw_id` | `uuid` | [link](http://avro.apache.org/docs/current/spec.html#UUID) | +| `_airbyte_extracted_at` | `timestamp-millis` | [link](http://avro.apache.org/docs/current/spec.html#Timestamp+%28millisecond+precision%29) | +| `_airbyte_generation_id` | `long` | https://github.com/airbytehq/airbyte/issues/17011 | +| `_airbyte_meta` | `record` | | ### Additional Properties -A Json object can have additional properties of unknown types, which is not compatible with the Avro schema. To solve this problem during Json to Avro object conversion, we introduce a special field: `_airbyte_additional_properties` typed as a nullable `map` from `string` to `string`: - -```json -{ - "name": "_airbyte_additional_properties", - "type": ["null", { "type": "map", "values": "string" }], - "default": null -} -``` +A JSON object can have additional properties of unknown types, which is not compatible with the Avro schema. These properties will be silently dropped. 
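
A minimal Python sketch of the dropping rule, for illustration only: `drop_undeclared` is a hypothetical helper that handles a single level of an object whose schema declares `properties`, and does not recurse into nested objects.

```python
def drop_undeclared(schema, record):
    """Keep only the keys declared under the schema's `properties`."""
    declared = schema.get("properties", {})
    return {key: value for key, value in record.items() if key in declared}
```

Given a schema that declares only `username`, a record with extra `active` and `age` keys is reduced to `{"username": "admin"}`.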
-For example, given the following Json schema: +For example, given the following JSON schema: ```json { @@ -433,7 +427,7 @@ For example, given the following Json schema: } ``` -this Json object +this JSON object ```json { @@ -453,22 +447,15 @@ will be converted to the following Avro object: ```json { - "username": "admin", - "_airbyte_additional_properties": { - "active": "true", - "age": "21", - "auth": "{\"auth_type\":\"ssl\",\"api_key\":\"abcdefg/012345\",\"admin\":false,\"id\":1000}" - } + "username": "admin" } ``` -Note that all fields other than the `username` is moved under `_ab_additional_properties` as serialized strings, including the original object `auth`. - ### Untyped Object -If an `object` field has no `properties` specification, all fields within this `object` will be put into the aforementioned `_airbyte_additional_properties` field. +If an `object` field has no `properties` specification, the entire JSON object will be serialized into its string representation. -For example, given the following Json schema and object: +For example, given the following JSON schema and object: ```json { @@ -488,26 +475,12 @@ the corresponding Avro schema and record will be: ```json { - "type": "record", - "name": "record_without_properties", - "fields": [ - { - "name": "_airbyte_additional_properties", - "type": ["null", { "type": "map", "values": "string" }], - "default": null - } - ] + "type": "string" } ``` ```json -{ - "_airbyte_additional_properties": { - "username": "343-guilty-spark", - "password": "1439", - "active": "true" - } -} +"{\"username\":\"343-guilty-spark\",\"password\":1439,\"active\":true}" ``` ### Untyped Field @@ -516,7 +489,7 @@ Any field without property type specification will default to a `string` field, ## Example -Based on the above rules, here is an overall example.
Given the following JSON schema: ```json { @@ -553,19 +526,61 @@ Its corresponding Avro schema will be: "type": "record", "fields": [ { - "name": "_airbyte_ab_id", + "name": "_airbyte_raw_id", "type": { "type": "string", "logicalType": "uuid" } }, { - "name": "_airbyte_emitted_at", + "name": "_airbyte_extracted_at", "type": { "type": "long", "logicalType": "timestamp-millis" } }, + { + "name": "_airbyte_generation_id", + "type": "long" + }, + { + "name" : "_airbyte_meta", + "type" : { + "type" : "record", + "name" : "_airbyte_meta", + "namespace" : "", + "fields" : [ + { + "name" : "sync_id", + "type" : "long" + }, + { + "name" : "changes", + "type" : { + "type" : "array", + "items" : { + "type" : "record", + "name" : "change", + "fields" : [ + { + "name" : "field", + "type" : "string" + }, + { + "name" : "change", + "type" : "string" + }, + { + "name" : "reason", + "type" : "string" + } + ] + } + } + } + ] + } + }, { "name": "id", "type": ["null", "int"], @@ -589,11 +604,6 @@ Its corresponding Avro schema will be: "type": ["null", "int"], "doc": "_airbyte_original_name:field_with_spécial_character", "default": null - }, - { - "name": "_airbyte_additional_properties", - "type": ["null", { "type": "map", "values": "string" }], - "default": null } ] } @@ -604,22 +614,14 @@ Its corresponding Avro schema will be: "name": "created_at", "type": [ "null", - { "type": "long", "logicalType": "timestamp-micros" }, - "string" + { "type": "long", "logicalType": "timestamp-micros" } ], "default": null - }, - { - "name": "_airbyte_additional_properties", - "type": ["null", { "type": "map", "values": "string" }], - "default": null } ] } ``` -More examples can be found in the Json to Avro conversion [test cases](https://github.com/airbytehq/airbyte/tree/master/airbyte-integrations/bases/base-java-s3/src/test/resources/parquet/json_schema_converter). 
- ## Implementation - Schema conversion: [JsonToAvroSchemaConverter](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/bases/base-java-s3/src/main/java/io/airbyte/integrations/destination/s3/avro/JsonToAvroSchemaConverter.java) diff --git a/docusaurus/sidebars.js b/docusaurus/sidebars.js index d7c97e0ad5d35..1e9bccada8529 100644 --- a/docusaurus/sidebars.js +++ b/docusaurus/sidebars.js @@ -155,11 +155,16 @@ const destinationS3 = { id: "integrations/destinations/s3", }, items: [ + { + type: "doc", + label: "Migration Guide", + id: "integrations/destinations/s3-migrations", + }, { type: "doc", label: "Troubleshooting", id: "integrations/destinations/s3/s3-troubleshooting", - }, + } ], };