Reduce log flood in Python PostCommit flink task #23635

Abacn · 2022-10-14T02:03:27Z

Fixes #23631

Flink runner support log_level_overrides
FlinkPortableRunner respect flinkConfDir pipeline option
Decrease Flink Runner Log Spam for Python PostCommit


codecov · 2022-10-14T02:48:46Z

Codecov Report

Merging #23635 (2a8077b) into master (00e5525) will decrease coverage by 0.00%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master   #23635      +/-   ##
==========================================
- Coverage   73.35%   73.35%   -0.01%     
==========================================
  Files         719      719              
  Lines       95799    95807       +8     
==========================================
+ Hits        70276    70281       +5     
- Misses      24211    24214       +3     
  Partials     1312     1312

Flag	Coverage Δ
python	`83.05% <ø> (-0.01%)`	⬇️

Flags with carried forward coverage won't be shown.

Impacted Files	Coverage Δ
.../python/apache_beam/testing/test_stream_service.py	`88.09% <0.00%> (-4.77%)`	⬇️
...che_beam/runners/interactive/interactive_runner.py	`90.50% <0.00%> (-1.27%)`	⬇️
...apache_beam/typehints/native_type_compatibility.py	`85.52% <0.00%> (-1.06%)`	⬇️
...thon/apache_beam/runners/worker/sdk_worker_main.py	`77.71% <0.00%> (-0.78%)`	⬇️
.../python/apache_beam/typehints/trivial_inference.py	`96.15% <0.00%> (-0.27%)`	⬇️
sdks/python/apache_beam/typehints/opcodes.py	`85.35% <0.00%> (-0.26%)`	⬇️
...on/apache_beam/runners/dataflow/dataflow_runner.py	`80.80% <0.00%> (-0.09%)`	⬇️
sdks/python/apache_beam/typehints/typehints.py	`93.37% <0.00%> (-0.06%)`	⬇️
setup.py	`0.00% <0.00%> (ø)`
sdks/python/apache_beam/typehints/row_type.py	`100.00% <0.00%> (ø)`
... and 9 more


Abacn · 2022-10-17T16:12:44Z

runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkExecutionEnvironments.java

@@ -73,7 +74,7 @@ public static ExecutionEnvironment createBatchExecutionEnvironment(FlinkPipeline
  static ExecutionEnvironment createBatchExecutionEnvironment(
      FlinkPipelineOptions options, List<String> filesToStage, @Nullable String confDir) {

-    LOG.info("Creating a Batch Execution Environment.");
+    LOG.info("Creating a Batch Execution Environment with config {}.", confDir);


Most of the log spam comes from org.apache.flink.runtime while the Flink MiniCluster is initializing. I tested that assigning confDir to a directory whose logging configuration sets org.apache.flink.runtime to WARN works. However, this parameter is currently always null (it seems Beam does not respect the parsed flinkConfDir).

Abacn · 2022-10-18T04:10:43Z

Run Python 3.8 PostCommit

…nner
* Flink runner support log_level_overrides
* FlinkPortableRunner respect flinkConfDir pipeline option
* Enable backslashed quotes in integration test pipeline options
* Decrease Flink Runner Log Spam for Python PostCommit

Abacn · 2022-10-18T04:37:00Z

Run Python 3.8 PostCommit

Abacn · 2022-10-18T13:05:54Z

Logs reduced from ~60 MB to 10 MB: https://ci-beam.apache.org/job/beam_PostCommit_Python38_PR/653/console

Abacn · 2022-10-18T13:08:47Z

Run Java PreCommit

Abacn · 2022-10-18T13:09:04Z

Run Typescript PreCommit

Abacn · 2022-10-18T13:09:16Z

Run SQL_Java17 PreCommit

Abacn · 2022-10-18T13:09:29Z

Run Python PreCommit

Abacn · 2022-10-18T13:09:43Z

Run Java_Kafka_IO_Direct PreCommit

Abacn · 2022-10-18T13:10:59Z

R: @tvalentyn

github-actions · 2022-10-18T14:28:39Z

Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control

Abacn · 2022-10-18T16:01:22Z

Run Java_Spark3_Versions PreCommit

Abacn · 2022-10-18T16:46:20Z

...-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java

+   * Configure log manager's default log level and log level overrides from the sdk harness options,
+   * and return the list of configured loggers.
+   */
+  public static List<java.util.logging.Logger> getConfiguredLoggerFromOptions(


I was trying to find the best place for this piece of code that sets log levels. It is now used by both :sdks:java:harness and :runners:flink. This module seems like a good place, since it already has other public static methods related to environments.

Is SdkHarnessOptions.java accessible from :runners:flink?

I see you do import that file in FlinkPipelineRunner.java, so it must be.

Have you considered it as a home for this helper?

SdkHarnessOptions.java is in sdks:java:core, and yes, it is accessible. I considered it, but I was not sure whether it is a good pattern to put code that has side effects on the environment into an options class. If that sounds reasonable, I could move it there.

Took another look; I agree that it looks somewhat unnatural to modify the global context, especially the root logger. We can ask for a second opinion from someone who works more on the Java SDK.
@kennknowles might have good advice.

tvalentyn · 2022-10-18T20:28:37Z

runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkPipelineRunner.java

    this.filesToStage = filesToStage;
  }

  @Override
  public PortablePipelineResult run(final Pipeline pipeline, JobInfo jobInfo) throws Exception {
    MetricsEnvironment.setMetricsSupported(false);

+    // Apply log levels settings at the beginning of pipeline run


Do we need a similar change in https://github.com/apache/beam/blob/master/runners/spark/src/main/java/org/apache/beam/runners/spark/SparkPipelineRunner.java ? It can be a separate PR.

Yeah, I haven't put it in yet because I haven't tested it on the Spark runner. Filed #23713.

Abacn · 2022-10-19T00:26:46Z

Java PreCommit failed due to known flakes: #21480, #21714

Abacn · 2022-10-19T00:26:58Z

Run Java_Spark3_Versions PreCommit

tvalentyn · 2022-10-19T00:47:48Z

sdks/java/core/src/main/java/org/apache/beam/sdk/options/SdkHarnessOptions.java

+
+    // Use the passed in logging options to configure the various logger levels.
+    if (loggingOptions.getDefaultSdkHarnessLogLevel() != null) {
+      rootLogger.setLevel(


From reading this method, it's not clear why rootLogger is not included in configuredLoggers. Should we modify the root logger separately, for example where this method is called?
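On the root-logger question: one plausible reason a helper like getConfiguredLoggerFromOptions returns the loggers it configured is that java.util.logging's LogManager holds named loggers only weakly, so a level set on a logger that nobody references can be lost after garbage collection, while the root logger is held strongly by LogManager. The sketch below assumes JUL-based logging and an assumed Beam-style level-name mapping, and uses hypothetical names (LogLevelOverrides, toJulLevel, applyOverrides); it is not the code from this PR. It keeps root-logger configuration in the caller, as suggested in the comment above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.logging.Level;
import java.util.logging.Logger;

class LogLevelOverrides {

  // Assumed mapping from Beam-style level names to java.util.logging levels.
  static Level toJulLevel(String level) {
    switch (level) {
      case "TRACE":
        return Level.FINEST;
      case "DEBUG":
        return Level.FINE;
      case "INFO":
        return Level.INFO;
      case "WARN":
        return Level.WARNING;
      case "ERROR":
        return Level.SEVERE;
      case "OFF":
        return Level.OFF;
      default:
        throw new IllegalArgumentException("Unknown log level: " + level);
    }
  }

  /**
   * Hypothetical helper: applies per-logger overrides and returns the loggers it configured.
   * The root logger is deliberately not touched here.
   */
  static List<Logger> applyOverrides(Map<String, String> overrides) {
    List<Logger> configured = new ArrayList<>();
    for (Map.Entry<String, String> entry : overrides.entrySet()) {
      Logger logger = Logger.getLogger(entry.getKey());
      logger.setLevel(toJulLevel(entry.getValue()));
      // LogManager holds named loggers only weakly; the caller must keep these
      // references, or the logger and its level may be garbage collected.
      configured.add(logger);
    }
    return configured;
  }

  public static void main(String[] args) {
    // The caller sets the root logger's default level separately.
    Logger.getLogger("").setLevel(toJulLevel("INFO"));

    Map<String, String> overrides = new LinkedHashMap<>();
    overrides.put("org.apache.flink.runtime", "WARN");
    // Keep the returned list alive for as long as the overrides should apply.
    List<Logger> keepAlive = applyOverrides(overrides);

    for (Logger logger : keepAlive) {
      System.out.println(logger.getName() + " -> " + logger.getLevel());
    }
  }
}
```

In a runner, the returned list would typically be stored in a field for the duration of the pipeline run rather than in a local variable.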

tvalentyn · 2022-10-19T00:56:44Z

...-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java

+   * Configure log manager's default log level and log level overrides from the sdk harness options,
+   * and return the list of configured loggers.
+   */
+  public static List<java.util.logging.Logger> getConfiguredLoggerFromOptions(


Took another look; I agree that it looks somewhat unnatural to modify the global context, especially the root logger. We can ask for a second opinion from someone who works more on the Java SDK.
@kennknowles might have good advice.

kennknowles · 2022-10-19T02:58:46Z

runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkPipelineRunner.java

@@ -67,14 +68,18 @@ public class FlinkPipelineRunner implements PortablePipelineRunner {
  public FlinkPipelineRunner(
      FlinkPipelineOptions pipelineOptions, @Nullable String confDir, List<String> filesToStage) {
    this.pipelineOptions = pipelineOptions;
-    this.confDir = confDir;
+    // confDir takes precedence than pipelineOptions.getFlinkConfDir


Put this in the javadoc. A user needs to know.

Thanks, done. This is a fix; originally pipelineOptions's flinkConfDir did not take effect at all.

Shouldn't pipeline option take precedence though?

At first glance, I understood the confDir parameter in FlinkPipelineRunner's constructor as an override of pipelineOptions.getFlinkConfDir.

One use case is that the command-line arguments of FlinkPipelineRunner's main method (beam/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkPipelineRunner.java, line 160 in 3edff6a)

public static void main(String[] args) throws Exception {

can be passed to FlinkPipelineRunnerConfiguration and finally appear as the confDir parameter here. According to the comment at line 213 in 3edff6a of the same file

+ "These properties will be set to all jobs submitted to Flink and take precedence "

I understand that if confDir is set, it should take precedence.

I found the origin of this flag in https://issues.apache.org/jira/browse/BEAM-14492:

Sometimes it is necessary to be able to set any flink option via PipelineOptions to the runner - mostly when we submit job from vanilla Java, not being run via flink run

#17715

I think the intent is that if a user specifies this via pipeline options, it should take precedence over the predetermined runner configuration.
cc'ing @je-ik just in case and FYI, since you mentioned that originally pipelineOptions's flinkConfDir did not take effect at all.

Thanks! That makes sense to me. confDir is per server and pipelineOptions is per pipeline. Will change.

Thanks, please update the Javadoc as well.

I observed that the option does not take effect when invoking a Python pipeline, so I targeted the code here. It may be related to what @kennknowles mentioned below: there is a "non-portable mode" and a "portable mode", which have different entry points for environment settings.

Running Python pipelines on Flink Runner should always be a portable mode.

I see. Then this is the case where, in portable mode, the confDir setting in pipeline options did not take effect.
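The precedence settled in this thread (the per-pipeline flinkConfDir option wins over the per-server confDir) can be sketched as below. ConfDirResolution and resolveConfDir are hypothetical names, the directory paths are placeholders, and treating null as "not set" is an assumption; the Javadoc illustrates the kind of note the reviewer asked to have documented.

```java
class ConfDirResolution {

  /**
   * Hypothetical helper: returns the Flink configuration directory to use.
   *
   * <p>The per-pipeline flinkConfDir option takes precedence over the per-server confDir
   * (for example, one passed on the job server command line). Returns null if neither is set.
   */
  static String resolveConfDir(String pipelineFlinkConfDir, String serverConfDir) {
    return pipelineFlinkConfDir != null ? pipelineFlinkConfDir : serverConfDir;
  }

  public static void main(String[] args) {
    // Both set: the pipeline option takes precedence.
    System.out.println(resolveConfDir("/tmp/pipeline-conf", "/tmp/server-conf"));
    // Only the server-level directory is set: fall back to it.
    System.out.println(resolveConfDir(null, "/tmp/server-conf"));
  }
}
```

Keeping the rule in one small function makes the precedence easy to state in Javadoc and to cover with a unit test.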

kennknowles · 2022-10-19T03:04:07Z

sdks/python/test-suites/portable/common.gradle

@@ -304,6 +304,13 @@ project.tasks.register("postCommitPy${pythonVersionSuffix}IT") {
        "--environment_type=LOOPBACK",
        "--temp_location=gs://temp-storage-for-end-to-end-tests/temp-it",
        "--flink_job_server_jar=${project(":runners:flink:${latestFlinkVersion}:job-server").shadowJar.archivePath}",
+        '--sdk_harness_log_level_overrides=' +
+            // suppress info level flink.runtime log flood
+            '{\\"org.apache.flink.runtime\\":\\"WARN\\",' +


This basically looks funny to me because the SDK harness is only actually used in portable mode, where there is no org.apache.flink namespace.

So the thing we are changing is when it is run in non-portable mode, and there is not actually any SDK harness.

I don't know a good solution. It is just a naming thing. It would probably be good to have a single flag that works now and also later. Is there not already a --log_level_overrides? I guess then it would be ambiguous whether you are applying it to the runner or to the SDK harness.

I have no solution. This is basically OK with me, but something about it is not perfectly clean.

Maybe @lukecwik has an opinion about which flags should control which log levels. I know that today we have:

- a flag for the Dataflow worker
- a flag for the SDK harness

So unfortunately neither is a good choice for sharing with other runners.

I don't know if any other runner has any flag at all. My issue is that "SDK harness" is a pretty weird name for a concept that most users really don't or shouldn't care about or know exists most of the time.

Yes, we have two flags for log level overrides; the other one is in DataflowWorkerLoggingOptions, but that options class is deprecated. So I reused the existing sdk_harness_log_level_overrides, unless we want to create another option flag. I agree that a generic --log_level_overrides would make sense.

If we decide to create new options for log level overrides, I think that should be tracked in a separate issue.
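For reference, the Gradle hunk above passes the overrides as a JSON object whose quotes are backslash-escaped. A minimal Java sketch of building such a value follows. LogOverrideFlag, toJson and escapeQuotes are hypothetical; the sketch assumes logger names and levels need no further JSON escaping, and assumes (based on the commit "Enable backslashed quotes in integration test pipeline options") that one level of backslash escaping is stripped before the option reaches the pipeline.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

class LogOverrideFlag {

  // Builds a JSON object such as {"org.apache.flink.runtime":"WARN"}.
  // Assumes keys and values contain no characters that need JSON escaping.
  static String toJson(Map<String, String> overrides) {
    StringJoiner joiner = new StringJoiner(",", "{", "}");
    for (Map.Entry<String, String> entry : overrides.entrySet()) {
      joiner.add("\"" + entry.getKey() + "\":\"" + entry.getValue() + "\"");
    }
    return joiner.toString();
  }

  // Backslash-escapes every double quote so the JSON survives one extra
  // layer of quote-aware argument parsing (assumed, see lead-in).
  static String escapeQuotes(String s) {
    return s.replace("\"", "\\\"");
  }

  public static void main(String[] args) {
    Map<String, String> overrides = new LinkedHashMap<>();
    overrides.put("org.apache.flink.runtime", "WARN");
    String json = toJson(overrides);
    System.out.println(json);
    System.out.println("--sdk_harness_log_level_overrides=" + escapeQuotes(json));
  }
}
```

A LinkedHashMap keeps the overrides in insertion order, which keeps the generated flag stable across runs.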

Abacn · 2022-10-19T18:52:20Z

Python PreCommit fails broken pipe on newly added Py310 test

Failed
pytest.internal (from preCommitIT-df-py310)

Failing for the past 1 build (Since #25360 )
Took 0 ms.
Error Message
internal error

Abacn · 2022-10-19T18:53:04Z

Java PreCommit failed on known flakes:

org.apache.beam.runners.fnexecution.control.RemoteExecutionTest.testSplit
org.apache.beam.sdk.io.pulsar.PulsarIOTest.classMethod

tvalentyn · 2022-10-19T19:02:56Z

Python PreCommit fails broken pipe on newly added Py310 test

It's being worked on: #23734

Abacn · 2022-10-19T20:48:50Z

wow all Jenkins tests passed once. Rare

AnandInguva · 2022-10-19T21:07:19Z

wow all Jenkins tests passed once. Rare

Is the PreCommit failure on Python 3.10 flaky?

Abacn · 2022-10-19T21:48:16Z

It is considered flaky, since nothing was fixed between my second-to-last push (failed) and my last push (succeeded).

tvalentyn · 2022-10-20T02:07:19Z

Thanks, @Abacn !

github-actions bot added flink java python runners labels Oct 14, 2022

Abacn force-pushed the postcommitlogspam branch 2 times, most recently from ce9a0a0 to 3c40cda October 14, 2022 02:47

github-actions bot added the build label Oct 14, 2022

Abacn commented Oct 17, 2022


Abacn force-pushed the postcommitlogspam branch 2 times, most recently from 234abe6 to 4135f1d October 17, 2022 22:29

github-actions bot added core and removed build labels Oct 17, 2022

Abacn force-pushed the postcommitlogspam branch 2 times, most recently from 1f55236 to d6eb53f October 17, 2022 23:33

github-actions bot added the build label Oct 17, 2022

Abacn force-pushed the postcommitlogspam branch 3 times, most recently from 7523c2b to 29e396b October 18, 2022 04:00

Abacn changed the title from "Portable log overrides PoC" to "Reduce log flood in Python PostCommit flink task" Oct 18, 2022

Abacn force-pushed the postcommitlogspam branch from 29e396b to f15c3e2 October 18, 2022 04:36

fix name collision warning logs still show

ae44880

Abacn commented Oct 18, 2022


tvalentyn reviewed Oct 18, 2022


Abacn mentioned this pull request Oct 18, 2022

[Feature Request]: Support default log level and log level overrides for runners based on translators #23713

Open

Move getConfiguredLoggerFromOptions to SdkHarnessOptions

3edff6a

Abacn force-pushed the postcommitlogspam branch from 5593f5c to 3edff6a October 18, 2022 22:45

tvalentyn reviewed Oct 19, 2022


github-actions bot removed the core label Oct 19, 2022

kennknowles reviewed Oct 19, 2022


Document flinkConfDir precedence

afe5330

Abacn force-pushed the postcommitlogspam branch from cb81467 to afe5330 October 19, 2022 16:53

pipelineOptions.getFlinkConfDir takes precedence against confDir

2a8077b

Abacn force-pushed the postcommitlogspam branch from 248173b to 2a8077b October 19, 2022 19:16

tvalentyn merged commit 72e27f4 into apache:master Oct 20, 2022

Abacn deleted the postcommitlogspam branch October 20, 2022 03:03

Abacn mentioned this pull request Nov 17, 2022

[collection] environment do not respect flink-conf.yaml. #20709

Open

Abacn mentioned this pull request Dec 13, 2022

[Feature Request]: Support log level override for multi-language pipeline #24647

Closed


Abacn mentioned this pull request Aug 15, 2023

Port SdkHarness log options to Dataflow legacy runner #28010

Merged

