Update dependency io.openlineage:openlineage-java to v1.13.1 #2789

Merged
wslulciuc merged 1 commit into main from renovate/openlineageversion
Apr 26, 2024

Conversation

renovate[bot]

@renovate renovate bot commented Apr 1, 2024

Mend Renovate

This PR contains the following updates:

Package                          Change
io.openlineage:openlineage-java  1.9.1 -> 1.13.1

Release Notes

OpenLineage/OpenLineage (io.openlineage:openlineage-java)

v1.13.1

Compare Source

Added
  • Java: allow timeout for circuit breakers #2609 @​pawel-big-lebowski
    Extends the circuit breaker mechanism to contain a global timeout that stops running OpenLineage integration code when a specified amount of time has elapsed.
  • Java: handle DataSetEvent and JobEvent in Transport.emit #2611 @​dolfinus
    Adds overloads Transport.emit(OpenLineage.DatasetEvent) and Transport.emit(OpenLineage.JobEvent), reusing the implementation of Transport.emit(OpenLineage.RunEvent). Please note: Transport.emit(String) is now deprecated and will be removed in 1.16.0.
  • Java/Python: add GZIP compression to HttpTransport #2603 #2604 @​dolfinus
    Adds a compression option to HttpTransport config in the Java and Python clients, with gzip implementation.
  • Java/Python/Proxy: properly set Kafka message key #2571 #2597 #2598 @​dolfinus
    Adds a new messageKey option to KafkaTransport config in the Python and Java clients, as well as the Proxy. This option replaces the localServerId option, which is now deprecated. Default value is generated using the run id (for RunEvent), job name (for JobEvent) or dataset name (for DatasetEvent). This value is used by the Kafka producer to distribute messages along topic partitions, instead of sending all the events to the same partition. This allows for full utilization of Kafka performance advantages.
  • Flink: add support for Micrometer metrics #2633 @​mobuchowski
    Adds a mechanism for forwarding metrics to any Micrometer-compatible implementation for Flink as has been implemented for Spark. Included: MeterRegistry, CompositeMeterRegistry, SimpleMeterRegistry, and MicrometerProvider.
  • Python: generate Python facets from JSON schemas #2520 @​JDarDagran
    Objects specified with JSON Schema needed to be manually developed and checked in Python, leading to many discrepancies, including wrong schema URLs. This adds a datamodel-code-generator for parsing JSON Schema and generating Pydantic or dataclasses classes, etc. In order to use attrs (a more modern version of dataclasses) and overcome some limitations of the tool, a number of steps have been added in order to customize code to meet OpenLineage requirements. Included: updated references to the latest base JSON Schema spec for all child facets. Please note: newly generated code creates a v2 interface that will be implemented in existing integrations in a future release. The v2 interface introduces some breaking changes: facets are put into separate modules per JSON Schema spec file, some names are changed, and several classes are now kw_only.
  • Spark/Flink/Java: support YAML config files together with SparkConf/FlinkConf #2583 @​pawel-big-lebowski
    Creates a SparkOpenlineageConfig and FlinkOpenlineageConfig for a more uniform configuration experience for the user. Renames OpenLineageYaml to OpenLineageConfig and modifies the code to use only OpenLineageConfig classes. Includes a doc update to mention that both ways can be used interchangeably and final documentation will merge all values provided.
  • Spark: add custom token provider support #2613 @​tnazarew
    Adds support for FQCN as spark.openlineage.transport.auth.type.
  • Spark: enable timeout for circuit breakers #2609 @​pawel-big-lebowski
    Implements within the circuit breaker an optional timeout that switches off the OpenLineage integration code.
  • Spark: add custom token provider support #2613 @​tnazarew
    Adds a TokenProviderTypeIdResolver to handle both FQCN and (for backward compatibility) api_key types in spark.openlineage.transport.auth.type.
  • Spark/Flink: support YAML config files together with SparkConf & FlinkConf approaches #2583 @​pawel-big-lebowski
    Adds support for config entries being provided by both YAML file and integration-specific configuration (SparkConf/FlinkConf). Allows each integration to have its own config entries.
  • Spark/Flink: job ownership facet #2533 @​pawel-big-lebowski
    Enables configuration entries specifying ownership of the job that will result in an OwnershipJobFacet being attached to job facets.
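To illustrate the GZIP compression entry in the list above, here is a minimal sketch, using only the Python standard library, of what compressing an event payload before an HTTP POST involves. The helper name `build_gzip_request` and the sample event are hypothetical. This is not the OpenLineage client's code, only the general technique: gzip the JSON body and declare it with a `Content-Encoding: gzip` header.

```python
import gzip
import json


def build_gzip_request(event: dict) -> tuple[bytes, dict]:
    """Serialize an event to JSON and gzip the body (hypothetical helper)."""
    raw = json.dumps(event).encode("utf-8")
    body = gzip.compress(raw)
    headers = {
        "Content-Type": "application/json",
        # Tells the receiving server that the body must be gunzipped first.
        "Content-Encoding": "gzip",
    }
    return body, headers


# Hypothetical sample payload, not a complete OpenLineage event.
sample_event = {"eventType": "START", "job": {"namespace": "ns", "name": "job"}}
body, headers = build_gzip_request(sample_event)
```

The receiving side reverses the step with `gzip.decompress(body)` before parsing the JSON.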
Changed
  • Java: sync Kinesis partitionKey format with Kafka implementation #2620 @​dolfinus
    Changes the format of Kinesis partitionKey from {jobNamespace}:{jobName} to run:{jobNamespace}/{jobName} to match the Kafka transport implementation.
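The format change described above can be sketched as two small string builders. The function names are hypothetical and exist only for illustration. The two formats themselves are taken from the release note.

```python
def old_partition_key(job_namespace: str, job_name: str) -> str:
    """Format used before this change: {jobNamespace}:{jobName}."""
    return f"{job_namespace}:{job_name}"


def new_partition_key(job_namespace: str, job_name: str) -> str:
    """Format after this change: run:{jobNamespace}/{jobName}."""
    return f"run:{job_namespace}/{job_name}"


# Hypothetical namespace and job name.
print(old_partition_key("analytics", "daily_load"))  # analytics:daily_load
print(new_partition_key("analytics", "daily_load"))  # run:analytics/daily_load
```

Consumers that group or route records by partition key would need to account for the new prefix and separator.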
Fixed
  • Python: make load_config return an empty dict instead of None when file empty #2596 @​kacpermuda
    utils.load_config() now returns an empty dict instead of None in the case of an empty file to prevent an OpenLineageClient crash.
  • Java: render lombok-generated methods in javadoc #2614 @​dolfinus
    Fixes rendering of javadoc for methods generated by lombok annotations by adding a delombok step.
  • Spark/Snowflake: parse NPE when query option is used and table is empty #2599 @​mobuchowski
    Fixes NPE when using query option when reading from Snowflake.
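The load_config fix above follows a common pattern, sketched here under stated assumptions. `parse_config_text` is a toy stand-in for a real config parser and `load_config_sketch` is a hypothetical name. Neither is the actual `utils.load_config()` implementation. The toy parser returns None for empty input, which is how YAML loaders typically treat an empty document.

```python
from typing import Optional


def parse_config_text(text: str) -> Optional[dict]:
    """Toy 'key: value' parser; returns None when the input is empty."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return None
    return {k.strip(): v.strip() for k, v in (ln.split(":", 1) for ln in lines)}


def load_config_sketch(text: str) -> dict:
    """Always return a dict so callers can call .get() without a None check."""
    parsed = parse_config_text(text)
    return parsed if parsed is not None else {}


# With an empty file, .get() works instead of raising AttributeError on None.
empty_config = load_config_sketch("")
transport = empty_config.get("transport")
```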

v1.12.0

Compare Source

Added
  • Airflow: add lineage_job_namespace and lineage_job_name macros #2582 @​dolfinus
    Adds new Airflow macros lineage_job_namespace(), lineage_job_name(task) that return an Airflow namespace and Airflow job name, respectively.
  • Spec: allow nested struct fields in SchemaDatasetFacet #2548 @​dolfinus
    Adds support for nested fields to SchemaDatasetFacet.
Fixed
  • Airflow: fix format returned by airflow.macros.lineage_parent_id #2578 @​blacklight
    Fixes the run format returned by the lineage_parent_id Airflow macro and simplifies the format of the lineage_parent_id and lineage_run_id macros.
  • Dbt: propagate the dbt return code also when no OpenLineage events are emitted #2591 @​blacklight
    dbt-ol now propagates the exit code of the underlying dbt process even if no lineage events are emitted.
  • Dagster: limit Dagster version to 1.6.9 #2579 @​JDarDagran
    Adds an upper limit on supported versions of Dagster as the integration is no longer actively maintained and recent releases introduce breaking changes.
  • Java: make sure string isn't empty to prevent going out of bounds #2585 @​harels
    String lookup was not accounting for empty strings and causing a java.lang.StringIndexOutOfBoundsException.
  • Java: fix javadoc #2624 @​pawel-big-lebowski
    Improves developer experience by fixing issues resulting in warnings on build.
  • Python: fix missing pkg_resources module on Python 3.12 #2572 @​dolfinus
    Removes pkg_resources dependency and replaces it with the packaging lib.
  • Spark: use HashSet in column-level lineage instead of iterating through LinkedList #2584 @​mobuchowski
    Takes advantage of performance gains available from using HashSet for collection.
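The HashSet entry above rests on a general property of hash-based collections, shown here with Python's `set` as a stand-in for Java's HashSet. The function names and sample data are hypothetical, and this is not the Spark integration's code. Both functions return the same result, but the second avoids rescanning the whole target collection for every candidate.

```python
def find_matches_list(candidates: list, targets: list) -> list:
    """Each 'in' check scans the list: O(len(candidates) * len(targets))."""
    return [c for c in candidates if c in targets]


def find_matches_set(candidates: list, targets: list) -> list:
    """Each 'in' check is a hash lookup: roughly O(len(candidates) + len(targets))."""
    target_set = set(targets)
    return [c for c in candidates if c in target_set]


# Hypothetical column identifiers.
candidates = ["a", "b", "c", "a"]
targets = ["a", "c"]
same = find_matches_list(candidates, targets) == find_matches_set(candidates, targets)
```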

v1.11.3

Compare Source

Added
  • Common: add support for SCRIPT-type jobs in BigQuery #2564 @​kacpermuda
    In the case of SCRIPT-type jobs in BigQuery, no lineage was being extracted because the SCRIPT job had no lineage information - it only spawned child jobs that had that information. With this change, the integration extracts lineage information from child jobs when dealing with SCRIPT-type jobs.
  • Spark: support for built-in lineage extraction #2272 @​pawel-big-lebowski
    This PR adds a spark-interfaces-scala package that allows lineage extraction to be implemented within Spark extensions (Iceberg, Delta, GCS, etc.). When traversing the query plan, the OpenLineage integration checks whether nodes implement the defined interfaces. If so, the interface methods are used to extract lineage. Refer to the README for more details.
  • Spark/Java: add support for Micrometer metrics #2496 @​mobuchowski
    Adds a mechanism for forwarding metrics to any Micrometer-compatible implementation. Included: MeterRegistryFactory, MicrometerProvider, StatsDMetricsBuilder, metrics config in OpenLineage config, and a Java client implementation.
  • Spark: add support for telemetry mechanism #2528 @​mobuchowski
    Adds timers, counters and additional instrumentation in order to implement Micrometer metrics collection.
  • Spark: support query option on table read #2556 @​mobuchowski
    Adds support for the Spark-BigQuery connector's query input option, which executes a query directly on BigQuery, storing the result in an intermediate dataset, bypassing Spark's computation layer. Due to this, the lineage is retrieved using the SQL parser, similarly to JDBCRelation.
  • Spark: change SparkPropertyFacetBuilder to support recording Spark runtime #2523 @​Ruihua98
    Modifies SparkPropertyFacetBuilder to capture the RuntimeConfig of the Spark session because the existing SparkPropertyFacet can only capture the static config of the Spark context. This facet will be added in both RDD-related and SQL-related runs.
  • Spec: add fileCount to dataset stat facets #2562 @​dolfinus
    Adds a fileCount field to DataQualityMetricsInputDatasetFacet and OutputStatisticsOutputDatasetFacet specification.
Fixed
  • dbt: dbt-ol should transparently exit with the same exit code as the child dbt process #2560 @​blacklight
    Makes dbt-ol transparently exit with the same exit code as the child dbt process.
  • Flink: disable module metadata generation #2531 @​HuangZhenQiu
    Disables the module metadata generation for Flink to fix the problem of having gradle dependencies to submodules within openlineage-flink.jar.
  • Flink: fixes to version 1.19 #2507 @​pawel-big-lebowski
    Fixes the class not found issue when checking for Cassandra classes. Also fixes the Maven pom dependency on subprojects.
  • Python: small improvements to .emit() method logging & annotations #2539 @​dolfinus
    Updates OpenLineage.emit debug messages and annotations.
  • SQL: show error message when OpenLineageSql cannot find native library #2547 @​dolfinus
    When the OpenLineageSql class could not load a native library, it returned None for all operations. Because the error message was suppressed, the user could not determine the reason. The error message is now shown.
  • SQL: update code to conform to upstream sqlparser-rs changes #2510 @​mobuchowski
    Includes tests and cosmetic improvements.
  • Spark: fix access to active Spark session #2535 @​pawel-big-lebowski
    Changes behavior so IllegalStateException is always caught when accessing SparkSession.
  • Spark: fix Databricks environment #2537 @​pawel-big-lebowski
    Fixes the ClassNotFoundError occurring on Databricks runtime and extends the integration test to verify DatabricksEnvironmentFacet.
  • Spark: fixed memory leak in JobMetricsHolder #2565 @​d-m-h
    The JobMetricsHolder#cleanUp(int) method now correctly purges unneeded state from both maps.
  • Spark: fixed memory leak in UnknownEntryFacetListener #2557 @​pawel-big-lebowski
    Prevents storing the state when a facet is disabled, purging the state after populating run facets.
  • Spark: fix parsing JDBCOptions(table=...) containing subquery #2546 @​dolfinus
    Prevents openlineage-spark from producing datasets with names like database.(select * from table) for JDBC sources.
  • Spark/Snowflake: support query option via SQL parser #2563 @​mobuchowski
    When a Snowflake job is bypassing Spark's computation layer, now the SQL parser will be used to get the lineage.
  • Spark: always catch IllegalStateException when accessing SparkSession #2535 @​pawel-big-lebowski
    IllegalStateException was not being caught.
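The JDBCOptions(table=...) entry above concerns table options that hold a subquery rather than a plain table name. The release notes do not describe how the fix works, so the following is only a sketch of the problem under stated assumptions: both function names are hypothetical, and the leading-parenthesis check is a crude stand-in for real SQL parsing.

```python
from typing import Optional


def is_subquery(table_option: str) -> bool:
    """Crude check (assumption): a subquery is passed wrapped in parentheses."""
    return table_option.strip().startswith("(")


def naive_dataset_name(database: str, table_option: str) -> Optional[str]:
    """Build 'database.table' only for plain table names.

    Returns None for subqueries to signal that the caller should use a
    SQL parser instead of concatenating the raw option into a name.
    """
    if is_subquery(table_option):
        return None
    return f"{database}.{table_option.strip()}"


plain = naive_dataset_name("database", "table")
sub = naive_dataset_name("database", "(select * from table)")
```

Without the guard, the second call would yield the malformed name `database.(select * from table)` mentioned in the note.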

v1.10.2

Compare Source

Added
  • Dagster: add new provider for version 1.6.10 #2518 @​JDarDagran
    Adds the new provider required by the latest version of Dagster.
  • Flink: support lineage for a hybrid source #2491 @​HuangZhenQiu
    Adds support for hybrid source lineage for users of Kafka and Iceberg sources in backfill use cases.
  • Flink: improve Cassandra lineage metadata #2479 @​HuangZhenQiu
    Uses Cassandra cluster info as the dataset namespace and combines the keyspace with the table name to form the dataset name.
  • Flink: bump Flink JDBC connector version #2472 @​HuangZhenQiu
    Bumps the Flink JDBC connector version to 3.1.2-1.18 for Flink 1.18.
  • Java: add an OpenLineageClientUtils#loadOpenLineageJson(InputStream) method and change the OpenLineageClientUtils#loadOpenLineageYaml(InputStream) method #2490 @​d-m-h
    This improves the explicitness of the methods. Previously, loadOpenLineageYaml(InputStream) wanted the InputStream to contain bytes that represented JSON.
  • Java: add info from the HTTP response to the client exception #2486 @​davidjgoss
    Adds the status code and body as properties on the thrown exception when a non-success response is encountered in the HTTP transport.
  • Python: add support for MSK IAM authentication with a new transport #2478 @​mattiabertorello
    Eases publication of events to MSK with IAM authentication.
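The HTTP response entry in the list above describes attaching the status code and body to the thrown exception. A minimal sketch of that pattern follows. The class `HttpTransportError` and the function `check_response` are hypothetical names for illustration, not the Java client's actual types.

```python
class HttpTransportError(Exception):
    """Hypothetical exception that carries the HTTP response details."""

    def __init__(self, status_code: int, body: str):
        super().__init__(f"HTTP request failed with status {status_code}")
        self.status_code = status_code
        self.body = body


def check_response(status_code: int, body: str) -> None:
    """Raise for any non-success (non-2xx) response, keeping code and body."""
    if not 200 <= status_code < 300:
        raise HttpTransportError(status_code, body)


try:
    check_response(503, "service unavailable")
except HttpTransportError as err:
    # The caller can branch on the status code instead of parsing a message.
    captured = (err.status_code, err.body)
```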
Removed
  • Airflow: remove redundant information from facets #2524 @​kacpermuda
    Refines the operator's attribute inclusion logic in facets to include only those known to be important or compact, ensuring that custom operator attributes with substantial data do not inflate the event size.
Fixed
  • Airflow: proceed without rendering templates if task_instance copy fails #2492 @​kacpermuda
    Airflow will now proceed without rendering templates if task_instance copy fails in listener.on_task_instance_running.
  • Spark: fix the HttpTransport timeout #2475 @​pawel-big-lebowski
    The existing timeout config parameter is ambiguous: the implementation treats the value as a double in seconds, although the documentation claims it is in milliseconds. A new config parameter, timeoutInMillis, has been added. The existing timeout parameter has been removed from the docs and will be deprecated in 1.13.
  • Spark: prevent NPE if the context is null #2515 @​pawel-big-lebowski
    Adds a check for a null context before executing end(jobEnd).
  • Flink: fix class not found issue for Cassandra #2507 @​pawel-big-lebowski
    Fixes the class not found issue when checking for Cassandra classes. Also fixes the Maven POM dependency on subprojects.
  • Flink: refine the JDBC table name #2512 @​HuangZhenQiu
    Enables the JDBC table name with a schema prefix.
  • Flink: fix JDBC dataset naming #2508 @​pawel-big-lebowski
    For JDBC, the Flink integration did not follow the OpenLineage naming convention. The code that extracts the dataset namespace and name from the JDBC connection URL existed only in the Spark integration. As a solution, this code is extracted into the Java client and reused by the Spark and Flink integrations.
  • Flink: fix failure due to missing Cassandra classes #2507 @​pawel-big-lebowski
    Flink is failing when no Cassandra classes are present on the class path. This is happening because of CassandraUtils class which has a static hasClasses method, but it imports Cassandra-related classes in the header. Also, the Flink subproject contains an unnecessary maven-publish plugin.
  • Flink: fix release runtime dependencies #2504 @​HuangZhenQiu
    The shadow jar of Flink is not minimized, so some internal jars are listed as runtime dependencies. This removes them from the final pom.xml file in the Flink module.
  • Spec: improve Cassandra lineage metadata #2479 @​HuangZhenQiu
    Following the namespace definition, we should use cassandra://host:port.
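The Cassandra naming described in these notes (cluster info as namespace in the form cassandra://host:port, keyspace combined with table name as dataset name) can be sketched as follows. The function name is hypothetical, and the "." separator between keyspace and table is an assumption, since the notes do not state it.

```python
def cassandra_dataset(host: str, port: int, keyspace: str, table: str) -> tuple[str, str]:
    """Return (namespace, name) for a Cassandra table (hypothetical helper)."""
    namespace = f"cassandra://{host}:{port}"
    # Assumption: keyspace and table are joined with a dot.
    name = f"{keyspace}.{table}"
    return namespace, name


# Hypothetical connection details.
namespace, name = cassandra_dataset("localhost", 9042, "sales", "orders")
```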

Configuration

📅 Schedule: Branch creation - "every 3 months on the first day of the month" (UTC), Automerge - At any time (no schedule defined).

🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.

Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 Ignore: Close this PR and you won't be reminded about this update again.



This PR has been generated by Mend Renovate. View repository job log here.

netlify bot commented Apr 1, 2024

Deploy Preview for peppy-sprite-186812 canceled.

Name Link
🔨 Latest commit f260c68
🔍 Latest deploy log https://app.netlify.com/sites/peppy-sprite-186812/deploys/662aae99f142e900088d990f

codecov bot commented Apr 1, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 84.47%. Comparing base (c553401) to head (f260c68).

Additional details and impacted files
@@            Coverage Diff            @@
##               main    #2789   +/-   ##
=========================================
  Coverage     84.47%   84.47%           
  Complexity     1429     1429           
=========================================
  Files           251      251           
  Lines          6460     6460           
  Branches        299      299           
=========================================
  Hits           5457     5457           
  Misses          850      850           
  Partials        153      153           


@renovate renovate bot force-pushed the renovate/openlineageversion branch 2 times, most recently from cede554 to d58a172 Compare April 5, 2024 02:26
@renovate renovate bot changed the title fix(deps): update dependency io.openlineage:openlineage-java to v1.10.2 fix(deps): update dependency io.openlineage:openlineage-java to v1.11.3 Apr 5, 2024
@renovate renovate bot force-pushed the renovate/openlineageversion branch from d58a172 to 540e5f3 Compare April 9, 2024 16:12
@renovate renovate bot changed the title fix(deps): update dependency io.openlineage:openlineage-java to v1.11.3 fix(deps): update dependency io.openlineage:openlineage-java to v1.12.0 Apr 9, 2024
@renovate renovate bot force-pushed the renovate/openlineageversion branch 2 times, most recently from 057da1e to 8e086a0 Compare April 17, 2024 18:22
@renovate renovate bot changed the title fix(deps): update dependency io.openlineage:openlineage-java to v1.12.0 Update dependency io.openlineage:openlineage-java to v1.12.0 Apr 17, 2024
@renovate renovate bot force-pushed the renovate/openlineageversion branch 2 times, most recently from a1c0c68 to e105f5d Compare April 22, 2024 23:10
Signed-off-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
@renovate renovate bot force-pushed the renovate/openlineageversion branch from e105f5d to f260c68 Compare April 25, 2024 19:27
@renovate renovate bot changed the title Update dependency io.openlineage:openlineage-java to v1.12.0 Update dependency io.openlineage:openlineage-java to v1.13.1 Apr 25, 2024
@wslulciuc wslulciuc merged commit 7b20098 into main Apr 26, 2024
17 checks passed
@wslulciuc wslulciuc deleted the renovate/openlineageversion branch April 26, 2024 17:42
jonathanpmoraes pushed a commit to nubank/NuMarquez that referenced this pull request Feb 6, 2025
…Project#2789)

Signed-off-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>