Dataflow sample success is not checked by sample tests #2302

Closed
Ark-kun opened this issue Oct 4, 2019 · 9 comments

@Ark-kun
Contributor

Ark-kun commented Oct 4, 2019

I see that the dataflow notebook fails to run, but the test succeeds:

INFO:root:subprocess:     "message": "Dataflow API has not been used in project 363997316495 before or it is disabled. Enable it by visiting https://console.cloud.google.com/apis/api/dataflow.googleapis.com/overview?project=363997316495 then retry. If you enabled this API recently, wait a few minutes for the action to propagate to our systems and retry.

Traceback (most recent call last):
  File "/usr/local/lib/python2.7/runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/local/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/ml/kfp_component/launcher/__main__.py", line 34, in <module>
    main()
  File "/ml/kfp_component/launcher/__main__.py", line 31, in main
    launch(args.file_or_module, args.args)
  File "kfp_component/launcher/launcher.py", line 45, in launch
    return fire.Fire(module, command=args, name=module.__name__)
  File "/usr/local/lib/python2.7/site-packages/fire/core.py", line 127, in Fire
    component_trace = _Fire(component, args, context, name)
  File "/usr/local/lib/python2.7/site-packages/fire/core.py", line 366, in _Fire
    component, remaining_args)
  File "/usr/local/lib/python2.7/site-packages/fire/core.py", line 542, in _CallCallable
    result = fn(*varargs, **kwargs)
  File "kfp_component/google/dataflow/_launch_python.py", line 75, in launch_python
    sub_process.wait_and_check()
  File "kfp_component/google/dataflow/_process.py", line 40, in wait_and_check
    raise subprocess.CalledProcessError(return_code, self._cmd)
subprocess.CalledProcessError: Command '['python2', '-u', '/tmp/tmpP_xu93/wc.py', '--runner', 'dataflow', '--project', 'ml-pipeline-test', '--staging_location', 'gs://ml-pipeline-test/840668066d16834c6fcceec9d310cb7ab46c541d/sample_test/a018eb542547d95a7c522151ab26e7b7', '--temp_location', 'gs://ml-pipeline-test/840668066d16834c6fcceec9d310cb7ab46c541d/sample_test/a018eb542547d95a7c522151ab26e7b7', '--output', 'gs://ml-pipeline-test/840668066d16834c6fcceec9d310cb7ab46c541d/sample_test/wc/wordcount.out']' returned non-zero exit status 1

https://pantheon.corp.google.com/logs/viewer?interval=NO_LIMIT&project=ml-pipeline-test&organizationId=433637338589&minLogLevel=0&expandAll=false&timestamp=2019-10-03T22:29:54.674000000Z&customFacets=&limitCustomFacetWidth=true&advancedFilter=resource.type%3D%22container%22%0Aresource.labels.cluster_name%3D%22sample-8406680-7464%22%0Aresource.labels.namespace_id%3D%22kubeflow%22%0Aresource.labels.project_id%3D%22ml-pipeline-test%22%0Aresource.labels.zone:%22us-east1-b%22%0Aresource.labels.container_name%3D%22main%22%0Aresource.labels.pod_id%3D%22dataflow-launch-python-pipeline-n979f-4028504103%22&scrollTimestamp=2019-10-03T00:39:04.077845218Z
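For readers following the traceback above: its last frames show a `wait_and_check` call raising `subprocess.CalledProcessError` once the launched command exits with a non-zero status. A minimal sketch of that pattern is below. The `Process` class here is a hypothetical stand-in written for illustration, not the actual `kfp_component/google/dataflow/_process.py` source, and it assumes the command's output can be buffered in memory.

```python
import subprocess
import sys


class Process:
    """Hypothetical wrapper illustrating a wait-and-check pattern."""

    def __init__(self, cmd):
        self._cmd = cmd
        # Merge stderr into stdout so all output is captured in one stream.
        self._proc = subprocess.Popen(
            cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT
        )

    def wait_and_check(self):
        # communicate() reads all output and waits for the process to exit.
        output, _ = self._proc.communicate()
        return_code = self._proc.returncode
        if return_code != 0:
            # Surface the failure to the caller instead of swallowing it.
            raise subprocess.CalledProcessError(
                return_code, self._cmd, output=output
            )
        return output


if __name__ == "__main__":
    try:
        Process([sys.executable, "-c", "import sys; sys.exit(1)"]).wait_and_check()
    except subprocess.CalledProcessError as err:
        print("step failed with exit status", err.returncode)
```

In this sketch the component step itself fails loudly, which matches the log above. Whether that failure reaches the test verdict depends on what the test harness does with the run status.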

@gaoning777
Contributor

Have you retried it after enabling the dataflow API?

@Ark-kun
Contributor Author

Ark-kun commented Oct 4, 2019

> Have you retried it after enabling the dataflow API?

It's from the sample test logs. The issue is not that it fails, but that the failure is not detected.

@gaoning777
Contributor

Could you paste the link to the prow system?

@Ark-kun
Contributor Author

Ark-kun commented Oct 8, 2019

> Could you paste the link to the prow system?

https://prow.k8s.io/view/gcs/kubernetes-jenkins/pr-logs/pull/kubeflow_pipelines/2241/kubeflow-pipeline-sample-test/1179551516474740736
Looks like the test timed out. Maybe I'm wrong about the test succeeding.

@numerology

Interesting... The Dataflow API was not enabled before. In the sample test, the associated pod failed but the overall test passed; see
https://storage.googleapis.com/kubernetes-jenkins/pr-logs/pull/kubeflow_pipelines/2220/kubeflow-pipeline-sample-test/1183783508208783361/build-log.txt

Trying to understand why this failure was not caught.

@numerology

The reason is that run_pipeline was set to False in dataflow.config.yaml,
so the sample test infra does not check the run status of the pipeline.
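To make the effect of that flag concrete, here is a rough sketch of how a boolean like `run_pipeline` can gate the status check. Everything in it (`SampleTestConfig`, `check_sample`, the `get_run_statuses` callable, and the `"Succeeded"` status string) is a hypothetical illustration, not the actual sample test code.

```python
from dataclasses import dataclass


@dataclass
class SampleTestConfig:
    # Hypothetical stand-in for the parsed contents of a sample's config file.
    test_name: str
    run_pipeline: bool = True


def check_sample(config, get_run_statuses):
    """Return True if the sample test would be reported as passing.

    get_run_statuses is a hypothetical callable returning the final
    status string of each pipeline run produced by the sample.
    """
    if not config.run_pipeline:
        # The run status is never consulted, so a failed run still "passes".
        return True
    statuses = list(get_run_statuses())
    # Require at least one run, and require every run to have succeeded.
    return bool(statuses) and all(s == "Succeeded" for s in statuses)
```

Under these assumptions, a run that ends in `"Failed"` is reported as passing when `run_pipeline` is `False` and as failing when it is `True`, which is the behavior described in this thread.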

@gaoning777
Contributor

Enabling the test check: #2387

@numerology

/close
Fixed by #2387

@k8s-ci-robot
Contributor

@numerology: Closing this issue.

In response to this:

> /close
> Fixed by #2387

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
