Update dependency io.openlineage:openlineage-java to v1.13.1 #2789
This PR contains the following updates:

| Package | Change |
| --- | --- |
| io.openlineage:openlineage-java | `1.9.1` -> `1.13.1` |
## Release Notes

**OpenLineage/OpenLineage (io.openlineage:openlineage-java)**
### v1.13.1
#### Added

- #2609 (@pawel-big-lebowski): Extends the circuit breaker mechanism to contain a global timeout that stops running OpenLineage integration code when a specified amount of time has elapsed.
- **`DataSetEvent` and `JobEvent` in `Transport.emit`** (#2611, @dolfinus): Adds overloads `Transport.emit(OpenLineage.DatasetEvent)` and `Transport.emit(OpenLineage.JobEvent)`, reusing the implementation of `Transport.emit(OpenLineage.RunEvent)`. Please note: `Transport.emit(String)` is now deprecated and will be removed in 1.16.0. See the usage sketch after this list.
- **`GZIP` compression to `HttpTransport`** (#2603, #2604, @dolfinus): Adds a `compression` option to the `HttpTransport` config in the Java and Python clients, with a `gzip` implementation. See the config sketch at the end of this version's notes.
- #2571, #2597, #2598 (@dolfinus): Adds a new `messageKey` option to the `KafkaTransport` config in the Python and Java clients, as well as the Proxy. This option replaces the `localServerId` option, which is now deprecated. The default value is generated using the run id (for `RunEvent`), the job name (for `JobEvent`), or the dataset name (for `DatasetEvent`). The Kafka producer uses this value to distribute messages across topic partitions instead of sending all events to the same partition, allowing full use of Kafka's performance advantages. See the partitioning sketch after this list.
- #2633 (@mobuchowski): Adds a mechanism for forwarding metrics to any Micrometer-compatible implementation for Flink, as has been implemented for Spark. Included: `MeterRegistry`, `CompositeMeterRegistry`, `SimpleMeterRegistry`, and `MicrometerProvider`.
- #2520 (@JDarDagran): Objects specified with JSON Schema needed to be manually developed and checked in Python, leading to many discrepancies, including wrong schema URLs. This adds `datamodel-code-generator` for parsing JSON Schema and generating Pydantic or dataclasses classes, etc. In order to use `attrs` (a more modern alternative to dataclasses) and overcome some limitations of the tool, a number of steps have been added to customize the generated code to meet OpenLineage requirements. Included: updated references to the latest base JSON Schema spec for all child facets. Please note: the newly generated code creates a v2 interface that will be implemented in existing integrations in a future release. The v2 interface introduces some breaking changes: facets are put into separate modules per JSON Schema spec file, some names are changed, and several classes are now `kw_only`.
- #2583 (@pawel-big-lebowski): Creates a `SparkOpenlineageConfig` and `FlinkOpenlineageConfig` for a more uniform configuration experience for the user. Renames `OpenLineageYaml` to `OpenLineageConfig` and modifies the code to use only `OpenLineageConfig` classes. Includes a doc update to mention that both ways can be used interchangeably and that the final configuration will merge all provided values.
- #2613 (@tnazarew): Adds support for `FQCN` as `spark.openlineage.transport.auth.type`.
- #2609 (@pawel-big-lebowski): Implements an optional timeout within the circuit breaker that switches off the OpenLineage integration code.
- #2613 (@tnazarew): Adds a `TokenProviderTypeIdResolver` to handle both `FQCN` and (for backward compatibility) `api_key` types in `spark.openlineage.transport.auth.type`.
- **`SparkConf` & `FlinkConf` approaches** (#2583, @pawel-big-lebowski): Adds support for config entries being provided by both a YAML file and integration-specific configuration (`SparkConf`/`FlinkConf`). Allows each integration to have its own config entries.
- #2533 (@pawel-big-lebowski): Enables configuration entries specifying ownership of the job, resulting in an `OwnershipJobFacet` being attached to job facets.
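A minimal sketch of the new `Transport.emit` overloads from #2611, assuming the client surface mirrors them. The HTTP endpoint, namespace, and job names are placeholders, and the generated-model builder calls (`newJobEventBuilder`, etc.) follow the Java client's conventions but are not verified against this exact release:

```java
import io.openlineage.client.OpenLineage;
import io.openlineage.client.OpenLineageClient;
import io.openlineage.client.transports.HttpTransport;

import java.net.URI;
import java.time.ZonedDateTime;
import java.util.UUID;

public class EmitOverloadsSketch {
  public static void main(String[] args) {
    // Client wired to a local HTTP backend (placeholder URI).
    OpenLineageClient client = OpenLineageClient.builder()
        .transport(HttpTransport.builder().uri("http://localhost:5000").build())
        .build();

    OpenLineage ol = new OpenLineage(URI.create("https://example.com/demo-producer"));

    // Existing path: emit a RunEvent.
    OpenLineage.RunEvent runEvent = ol.newRunEventBuilder()
        .eventType(OpenLineage.RunEvent.EventType.START)
        .eventTime(ZonedDateTime.now())
        .run(ol.newRunBuilder().runId(UUID.randomUUID()).build())
        .job(ol.newJobBuilder().namespace("my-namespace").name("my-job").build())
        .build();
    client.emit(runEvent);

    // New in 1.13.1: JobEvent (and DatasetEvent) can be emitted the same way,
    // reusing the RunEvent transport implementation under the hood.
    OpenLineage.JobEvent jobEvent = ol.newJobEventBuilder()
        .eventTime(ZonedDateTime.now())
        .job(ol.newJobBuilder().namespace("my-namespace").name("my-job").build())
        .build();
    client.emit(jobEvent);
  }
}
```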
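The `messageKey` option above maps onto Kafka's standard key-based partitioning. This sketch uses the plain Kafka producer API rather than OpenLineage classes to show the mechanism; the topic name, key value, and payload are illustrative only, with the key following the `run:{jobNamespace}/{jobName}` default format described in these notes:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class MessageKeySketch {
  public static void main(String[] args) {
    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092");
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

    try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
      // A key in the style of the generated default: run:{jobNamespace}/{jobName}.
      String messageKey = "run:my-namespace/my-job";
      String eventJson = "{\"eventType\": \"START\"}"; // placeholder payload

      // Kafka hashes the key to select a partition, so events sharing a key
      // stay ordered on one partition while different runs spread across
      // partitions instead of all landing on the same one.
      producer.send(new ProducerRecord<>("openlineage.events", messageKey, eventJson));
    }
  }
}
```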
#### Changed
- **`partitionKey` format with the Kafka implementation** (#2620, @dolfinus): Changes the format of the Kinesis `partitionKey` from `{jobNamespace}:{jobName}` to `run:{jobNamespace}/{jobName}` to match the Kafka transport implementation.

#### Fixed
- **`load_config` returns an empty dict instead of `None` when the file is empty** (#2596, @kacpermuda): `utils.load_config()` now returns an empty dict instead of `None` in the case of an empty file, to prevent an `OpenLineageClient` crash.
- #2614 (@dolfinus): Fixes rendering of Javadoc for methods generated by `lombok` annotations by adding a `delombok` step.
- #2599 (@mobuchowski): Fixes an NPE when using the query option while reading from Snowflake.
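For the `GZIP` compression entry above (#2603, #2604), a sketch of what enabling compression programmatically might look like. `HttpConfig` and `HttpTransport` exist in the Java client, and the `compression` config key with value `gzip` is what these notes describe, but the exact setter and enum names below are assumptions rather than verified API:

```java
import io.openlineage.client.OpenLineageClient;
import io.openlineage.client.transports.HttpConfig;
import io.openlineage.client.transports.HttpTransport;

import java.net.URI;

public class GzipHttpTransportSketch {
  public static void main(String[] args) {
    // Field names mirror the YAML config keys (url, compression); the
    // setter and enum constant names are assumed, not verified.
    HttpConfig config = new HttpConfig();
    config.setUrl(URI.create("http://localhost:5050"));
    config.setCompression(HttpConfig.Compression.GZIP); // new in 1.13.1

    OpenLineageClient client = OpenLineageClient.builder()
        .transport(new HttpTransport(config))
        .build();
    // client.emit(...) payloads are now gzip-compressed on the wire.
  }
}
```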
### v1.12.0
#### Added

- **`lineage_job_namespace` and `lineage_job_name` macros** (#2582, @dolfinus): Adds new Airflow macros `lineage_job_namespace()` and `lineage_job_name(task)` that return the Airflow namespace and the Airflow job name, respectively.
- **Nested fields in `SchemaDatasetFacet`** (#2548, @dolfinus): Adds support for nested fields in `SchemaDatasetFacet`. See the sketch after this list.
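A sketch of a nested schema built with the Java client's generated model. The nested `fields` attribute is what #2548 introduces; the builder method names follow the generated-model conventions but are assumptions, and the field names are invented for illustration:

```java
import io.openlineage.client.OpenLineage;

import java.net.URI;
import java.util.List;

public class NestedSchemaSketch {
  public static void main(String[] args) {
    OpenLineage ol = new OpenLineage(URI.create("https://example.com/demo-producer"));

    // Two leaf fields nested under a struct-typed `location` field.
    OpenLineage.SchemaDatasetFacetFields city = ol.newSchemaDatasetFacetFieldsBuilder()
        .name("city").type("string").build();
    OpenLineage.SchemaDatasetFacetFields country = ol.newSchemaDatasetFacetFieldsBuilder()
        .name("country").type("string").build();

    OpenLineage.SchemaDatasetFacetFields location = ol.newSchemaDatasetFacetFieldsBuilder()
        .name("location")
        .type("struct")
        .fields(List.of(city, country)) // nested fields, new in 1.12.0
        .build();

    OpenLineage.SchemaDatasetFacet schema = ol.newSchemaDatasetFacetBuilder()
        .fields(List.of(location))
        .build();
    System.out.println(schema.getFields().get(0).getName()); // location
  }
}
```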
#### Fixed
- **`airflow.macros.lineage_parent_id`** (#2578, @blacklight): Fixes the run format returned by the `lineage_parent_id` Airflow macro and simplifies the format of the `lineage_parent_id` and `lineage_run_id` macros.
- #2591 (@blacklight): `dbt-ol` now propagates the exit code of the underlying dbt process even if no lineage events are emitted.
- #2579 (@JDarDagran): Adds an upper limit on supported versions of Dagster, as the integration is no longer actively maintained and recent releases introduce breaking changes.
- #2585 (@harels): String lookup was not accounting for empty strings, causing a `java.lang.StringIndexOutOfBoundsException`.
- #2624 (@pawel-big-lebowski): Improves the developer experience by fixing issues that caused warnings on build.
- **`pkg_resources` module on Python 3.12** (#2572, @dolfinus): Removes the `pkg_resources` dependency and replaces it with the `packaging` lib.
- **`HashSet` in column-level lineage instead of iterating through `LinkedList`** (#2584, @mobuchowski): Takes advantage of the performance gains available from using `HashSet` for collection lookups. See the sketch after this list.
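A self-contained JDK illustration of the gain referenced in #2584: membership checks against a `LinkedList` are linear scans, while a `HashSet` does constant-time expected lookups. The column names are hypothetical stand-ins for column-level lineage state:

```java
import java.util.HashSet;
import java.util.LinkedList;
import java.util.List;
import java.util.Set;

public class LookupSketch {
  public static void main(String[] args) {
    List<String> columns = new LinkedList<>();
    for (int i = 0; i < 100_000; i++) {
      columns.add("col_" + i);
    }

    // LinkedList#contains walks the list node by node: O(n) per lookup.
    boolean slow = columns.contains("col_99999");

    // HashSet#contains hashes the key: O(1) expected per lookup,
    // which is the gain the change takes advantage of.
    Set<String> columnSet = new HashSet<>(columns);
    boolean fast = columnSet.contains("col_99999");

    System.out.println(slow + " " + fast);
  }
}
```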
### v1.11.3
#### Added

- **`SCRIPT`-type jobs in BigQuery** (#2564, @kacpermuda): For `SCRIPT`-type jobs in BigQuery, no lineage was being extracted because the `SCRIPT` job had no lineage information; it only spawned child jobs that did. With this change, the integration extracts lineage information from the child jobs when dealing with `SCRIPT`-type jobs.
- #2272 (@pawel-big-lebowski): Adds a `spark-interfaces-scala` package that allows lineage extraction to be implemented within Spark extensions (Iceberg, Delta, GCS, etc.). When traversing the query plan, the OpenLineage integration checks whether nodes implement the defined interfaces; if so, the interface methods are used to extract lineage. Refer to the README for more details.
- #2496 (@mobuchowski): Adds a mechanism for forwarding metrics to any Micrometer-compatible implementation. Included: `MeterRegistryFactory`, `MicrometerProvider`, `StatsDMetricsBuilder`, metrics config in the OpenLineage config, and a Java client implementation. See the registry sketch after this list.
- #2528 (@mobuchowski): Adds timers, counters, and additional instrumentation in order to implement Micrometer metrics collection.
- #2556 (@mobuchowski): Adds support for the Spark-BigQuery connector's query input option, which executes a query directly on BigQuery, storing the result in an intermediate dataset and bypassing Spark's computation layer. Because of this, the lineage is retrieved using the SQL parser, similarly to `JDBCRelation`.
- **`SparkPropertyFacetBuilder` to support recording the Spark runtime config** (#2523, @Ruihua98): Modifies `SparkPropertyFacetBuilder` to capture the `RuntimeConfig` of the Spark session, because the existing `SparkPropertyFacet` could only capture the static config of the Spark context. This facet will be added in both RDD-related and SQL-related runs.
- **`fileCount` to dataset stat facets** (#2562, @dolfinus): Adds a `fileCount` field to the `DataQualityMetricsInputDatasetFacet` and `OutputStatisticsOutputDatasetFacet` specifications.
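To make the Micrometer entries above concrete, a minimal sketch of the Micrometer API that such forwarding targets. The registry is Micrometer's own in-memory `SimpleMeterRegistry`, and the meter names are invented for illustration, not the integration's actual metric names:

```java
import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
import io.micrometer.core.instrument.simple.SimpleMeterRegistry;

public class MetricsSketch {
  public static void main(String[] args) {
    // Any Micrometer-compatible registry can be the forwarding target;
    // SimpleMeterRegistry is the in-memory reference implementation.
    MeterRegistry registry = new SimpleMeterRegistry();

    // Hypothetical meter names, for illustration only.
    Counter eventsEmitted = registry.counter("openlineage.emit.attempts");
    Timer buildTimer = registry.timer("openlineage.event.build");

    // Time a unit of work and count it.
    buildTimer.record(eventsEmitted::increment);

    System.out.println(eventsEmitted.count()); // 1.0
    System.out.println(buildTimer.count());    // 1
  }
}
```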
#### Fixed
- **`dbt-ol` should transparently exit with the same exit code as the child `dbt` process** (#2560, @blacklight): Makes `dbt-ol` transparently exit with the same exit code as the child `dbt` process.
- #2531 (@HuangZhenQiu): Disables module metadata generation for Flink to fix the problem of Gradle dependencies on submodules within `openlineage-flink.jar`.
- #2507 (@pawel-big-lebowski): Fixes the class-not-found issue when checking for Cassandra classes. Also fixes the Maven POM dependency on subprojects.
- **`.emit()` method logging & annotations** (#2539, @dolfinus): Updates `OpenLineage.emit` debug messages and annotations.
- #2547 (@dolfinus): When the `OpenLineageSql` class could not load a native library, it returned `None` for all operations. But because the error message was suppressed, the user could not determine the reason.
- #2510 (@mobuchowski): Includes tests and cosmetic improvements.
- #2535 (@pawel-big-lebowski): Changes behavior so `IllegalStateException` is always caught when accessing `SparkSession`.
- #2537 (@pawel-big-lebowski): Fixes the `ClassNotFoundError` occurring on the Databricks runtime and extends the integration test to verify `DatabricksEnvironmentFacet`.
- #2565 (@d-m-h): The `JobMetricsHolder#cleanUp(int)` method now correctly purges unneeded state from both maps.
- **`UnknownEntryFacetListener`** (#2557, @pawel-big-lebowski): Prevents storing state when the facet is disabled, and purges the state after populating run facets.
- **`JDBCOptions(table=...)` containing a subquery** (#2546, @dolfinus): Prevents `openlineage-spark` from producing datasets with names like `database.(select * from table)` for JDBC sources.
- #2563 (@mobuchowski): When a Snowflake job bypasses Spark's computation layer, the SQL parser is now used to get the lineage.
- **`IllegalStateException` when accessing `SparkSession`** (#2535, @pawel-big-lebowski): The `IllegalStateException` was not being caught. See the sketch after this list.
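A sketch of the defensive pattern behind the two #2535 entries, using Spark's real `SparkSession.active()`, which throws `IllegalStateException` when no session is available. The wrapper method is hypothetical and only illustrates the catch-rather-than-crash approach described above:

```java
import org.apache.spark.sql.SparkSession;

public class SessionAccessSketch {
  // Accessing the active session can throw IllegalStateException, for
  // example when no active or default session exists. Catching it lets
  // lineage collection degrade gracefully instead of breaking the job.
  static SparkSession activeSessionOrNull() {
    try {
      return SparkSession.active();
    } catch (IllegalStateException e) {
      return null; // no active session; skip lineage collection
    }
  }
}
```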
### v1.10.2
#### Added

- #2518 (@JDarDagran): Adds the new provider required by the latest version of Dagster.
- #2491 (@HuangZhenQiu): Adds support for hybrid source lineage for users of Kafka and Iceberg sources in backfill use cases.
- #2479 (@HuangZhenQiu): Uses the Cassandra cluster info as the dataset namespace, and combines the keyspace with the table name to form the dataset name.
- #2472 (@HuangZhenQiu): Bumps the Flink JDBC connector version to 3.1.2-1.18 for Flink 1.18.
- **`OpenLineageClientUtils#loadOpenLineageJson(InputStream)` added; `OpenLineageClientUtils#loadOpenLineageYaml(InputStream)` changed** (#2490, @d-m-h): This improves the explicitness of the methods. Previously, `loadOpenLineageYaml(InputStream)` expected the `InputStream` to contain bytes that represented JSON. See the sketch after this list.
- #2486 (@davidjgoss): Adds the status code and body as properties on the thrown exception when a non-success response is encountered in the HTTP transport.
- #2478 (@mattiabertorello): Eases publication of events to MSK with IAM authentication.
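A sketch of the #2490 method pair, with config content inlined from strings for self-containment. The `console` transport type is real; the return type shown matches the 1.10.x client (`OpenLineageYaml`, later renamed to `OpenLineageConfig`, as noted under v1.13.1):

```java
import io.openlineage.client.OpenLineageClientUtils;
import io.openlineage.client.OpenLineageYaml;

import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class ConfigLoadingSketch {
  public static void main(String[] args) {
    // YAML-formatted config goes through loadOpenLineageYaml...
    InputStream yaml = new ByteArrayInputStream(
        "transport:\n  type: console\n".getBytes(StandardCharsets.UTF_8));
    OpenLineageYaml fromYaml = OpenLineageClientUtils.loadOpenLineageYaml(yaml);

    // ...while JSON-formatted config now has its own explicit entry point
    // instead of being smuggled through the YAML method.
    InputStream json = new ByteArrayInputStream(
        "{\"transport\": {\"type\": \"console\"}}".getBytes(StandardCharsets.UTF_8));
    OpenLineageYaml fromJson = OpenLineageClientUtils.loadOpenLineageJson(json);
  }
}
```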
#### Removed

- #2524 (@kacpermuda): Refines the operator's attribute inclusion logic in facets to include only attributes known to be important or compact, ensuring that custom operator attributes with substantial data do not inflate the event size.
#### Fixed
- **`task_instance` copy fails** (#2492, @kacpermuda): Airflow will now proceed without rendering templates if the `task_instance` copy fails in `listener.on_task_instance_running`.
- **`HttpTransport` timeout** (#2475, @pawel-big-lebowski): The existing `timeout` config parameter is ambiguous: the implementation treats the value as a double in seconds, although the documentation claims it is in milliseconds. A new config param, `timeoutInMillis`, has been added. The existing `timeout` has been removed from the docs and will be deprecated in 1.13.
- #2515 (@pawel-big-lebowski): Adds a check for a null context before executing `end(jobEnd)`.
- #2507 (@pawel-big-lebowski): Fixes the class-not-found issue when checking for Cassandra classes. Also fixes the Maven POM dependency on subprojects.
- #2512 (@HuangZhenQiu): Enables the JDBC table name with a schema prefix.
- #2508 (@pawel-big-lebowski): For JDBC, the Flink integration was not adjusted to the OpenLineage naming convention: the code that extracts the dataset namespace/name from the JDBC connection URL lived in the Spark integration. As a solution, this code has been extracted into the Java client and reused by both the Spark and Flink integrations.
- #2507 (@pawel-big-lebowski): Flink was failing when no Cassandra classes were present on the class path. This happened because the `CassandraUtils` class has a static `hasClasses` method but imports Cassandra-related classes in its header. Also, the Flink subproject contained an unnecessary `maven-publish` plugin.
- #2504 (@HuangZhenQiu): The shadow jar of Flink is not minimized, so some internal jars were listed as runtime dependencies. This removes them from the final pom.xml file in the Flink module.
- #2479 (@HuangZhenQiu): Following the namespace definition, we should use `cassandra://host:port`.

## Configuration
📅 Schedule: Branch creation - "every 3 months on the first day of the month" (UTC), Automerge - At any time (no schedule defined).
🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.
♻ Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.
🔕 Ignore: Close this PR and you won't be reminded about this update again.
This PR has been generated by Mend Renovate. View repository job log here.