[air] air_benchmark_xgboost_cpu_10.aws failing #37827

Closed
rickyyx opened this issue Jul 26, 2023 · 10 comments
Labels
bug Something that is supposed to be working; but isn't P2 Important issue, but not time-critical

Comments

@rickyyx
Contributor

rickyyx commented Jul 26, 2023

What happened + What you expected to happen

https://buildkite.com/ray-project/release-tests-branch/builds/1991#0189935f-278b-4d64-8da7-adaa812fc4d4

Versions / Dependencies

2.6.2

Reproduction script

https://buildkite.com/ray-project/release-tests-branch/builds/1991#0189935f-278b-4d64-8da7-adaa812fc4d4

Issue Severity

None

@rickyyx rickyyx added bug Something that is supposed to be working; but isn't triage Needs triage (eg: priority, bug/not-bug, and owning component) release-blocker P0 Issue that blocks the release P0 Issues that should be fixed in short order and removed triage Needs triage (eg: priority, bug/not-bug, and owning component) labels Jul 26, 2023
@xwjiang2010
Contributor

Seeing

Traceback (most recent call last):
  File "workloads/xgboost_benchmark.py", line 59, in run
    super(MyProcess, self).run()
  File "/home/ray/anaconda3/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "workloads/xgboost_benchmark.py", line 86, in run_xgboost_training
    ds = data.read_parquet(data_path)
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/data/read_api.py", line 588, in read_parquet
    return read_datasource(
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/_private/auto_init_hook.py", line 24, in auto_init_wrapper
    return fn(*args, **kwargs)
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/data/read_api.py", line 344, in read_datasource
    ) = ray.get(
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/_private/auto_init_hook.py", line 24, in auto_init_wrapper
    return fn(*args, **kwargs)
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
    return func(*args, **kwargs)
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/_private/worker.py", line 2520, in get
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(ImportError): ray::_get_read_tasks() (pid=961, ip=10.0.34.82)
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/data/read_api.py", line 1928, in _get_read_tasks
    reader = ds.create_reader(**kwargs)
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/data/datasource/parquet_datasource.py", line 170, in create_reader
    return _ParquetDatasourceReader(**kwargs)
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/data/datasource/parquet_datasource.py", line 187, in __init__
    import pyarrow.parquet as pq
  File "/home/ray/anaconda3/lib/python3.8/site-packages/pyarrow/parquet/__init__.py", line 20, in <module>
    from .core import *
  File "/home/ray/anaconda3/lib/python3.8/site-packages/pyarrow/parquet/core.py", line 38, in <module>
    from pyarrow._parquet import (ParquetReader, Statistics,  # noqa
ImportError: cannot import name 'FileEncryptionProperties' from 'pyarrow._parquet' (/home/ray/anaconda3/lib/python3.8/site-packages/pyarrow/_parquet.cpython-38-x86_64-linux-gnu.so)

link

The same thing happened here for the 2.6.1 nightly run. @can-anyscale I assume you already looked into the cause of this failure in your last run? Maybe you can shed some light here?

I also tried cloning the workspace, and import pyarrow.parquet as pq works when tested manually there.
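The traceback above fails inside ray::_get_read_tasks() on a worker process (pid=961, ip=10.0.34.82), while the manual import was run in the workspace, so the two environments may not be the same. A minimal sketch for comparing them follows. The helper name `probe_import` is hypothetical and not part of Ray or pyarrow; it only reports which interpreter and which file a module was loaded from, and it does not identify the root cause by itself.

```python
import importlib
import sys


def probe_import(module_name: str) -> dict:
    """Try to import a module and report where it came from.

    Hypothetical diagnostic helper: it is not part of Ray or pyarrow.
    """
    report = {
        "module": module_name,
        "python": sys.version.split()[0],
        "executable": sys.executable,
    }
    try:
        mod = importlib.import_module(module_name)
    except Exception as exc:  # the thread's failure is an ImportError
        report["ok"] = False
        report["error"] = f"{type(exc).__name__}: {exc}"
        return report

    report["ok"] = True
    report["file"] = getattr(mod, "__file__", None)
    # For a submodule such as pyarrow.parquet, read the version from
    # the top-level package, if it exposes one.
    top_level = sys.modules.get(module_name.split(".")[0])
    report["version"] = getattr(top_level, "__version__", None)
    return report


if __name__ == "__main__":
    # Run locally first; the result can then be compared with a worker.
    print(probe_import("pyarrow.parquet"))
```

To run the same check where the failure occurred, the function could be submitted as a Ray task, for example `ray.get(ray.remote(probe_import).remote("pyarrow.parquet"))`, assuming a running cluster. Differing `executable`, `file`, or `version` values between the driver and the worker would point to an environment mismatch.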

@xwjiang2010
Contributor

Dropping release-blocker since the same failure happened for 2.6.1, but I would still like to understand the cause.

@xwjiang2010 xwjiang2010 added P2 Important issue, but not time-critical and removed release-blocker P0 Issue that blocks the release P0 Issues that should be fixed in short order labels Jul 26, 2023
@can-anyscale
Collaborator

@xwjiang2010 I didn't look at this in 2.6.1; because of the release's urgency, we decided to ship 2.6.1 with this test failing. The test passed in 2.6.0, so it is probably a recent regression.

@rickyyx
Contributor Author

rickyyx commented Jul 26, 2023

2.6.1 is unlikely to have any changes related to this though? These are all the commits in 2.6.1 (not in 2.6.0):

d68bf04 (tag: ray-2.6.1, rickyyx/releases/2.6.2, releases/2.6.2) Update version to 2.6.1 (#37679)
cdbee21 [core][autoscaler] Fix env variable overwrite not able to be used if command itself uses the env (#37675) (#37676)
62b4a0a [serve] Cherry-pick Serve enum to_proto fixes for Python 3.11 (#37660)
f7572b5 (releases/2.6.0) [DOC] Added in new CSAT widget (#37351) (#37487)
e9f1dc4 [air][doc] Update docs to reflect head node syncing deprecation (#37475) (#37568)

@rickyyx
Contributor Author

rickyyx commented Jul 26, 2023

@can-anyscale
Collaborator

Oh, I remember why: that test passed in its BYOD version in 2.6.0, so we determined it was OK.

Here is a BYOD run of that test for 2.6.2: https://buildkite.com/ray-project/release-tests-branch/builds/1993

@rickyyx
Contributor Author

rickyyx commented Jul 26, 2023

https://buildkite.com/ray-project/release-tests-branch/builds/1993 seems to be on master though?

@can-anyscale
Collaborator

can-anyscale commented Jul 26, 2023

@rickyyx Right, because we need to pick up the BYOD version of the test from master (the test is non-BYOD in the release branch). The build has these two environment variables set:

  • BRANCH_TO_TEST="releases/2.6.2"
  • COMMIT_TO_TEST=65017aaeb362769e5c19bb610f4da6d10b69bfd2

so that it tests against the Ray Docker image built for 2.6.2.
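As an illustration only, the sketch below reads those two variables and checks that the commit looks like a full 40-character SHA before it is used. The function name `resolve_test_target` is hypothetical; how the release tooling actually consumes BRANCH_TO_TEST and COMMIT_TO_TEST is not shown in this thread.

```python
import os
import re
from typing import Mapping, Optional, Tuple

# A full Git commit SHA-1 is 40 lowercase hexadecimal characters.
_FULL_SHA = re.compile(r"[0-9a-f]{40}")


def resolve_test_target(
    env: Optional[Mapping[str, str]] = None,
) -> Tuple[str, str]:
    """Return (branch, commit) from BRANCH_TO_TEST and COMMIT_TO_TEST.

    Hypothetical helper for illustration; not part of the Ray tooling.
    """
    env = os.environ if env is None else env
    branch = env.get("BRANCH_TO_TEST", "").strip()
    commit = env.get("COMMIT_TO_TEST", "").strip().lower()
    if not branch:
        raise ValueError("BRANCH_TO_TEST is not set")
    if not _FULL_SHA.fullmatch(commit):
        raise ValueError(f"COMMIT_TO_TEST is not a full SHA: {commit!r}")
    return branch, commit


if __name__ == "__main__":
    # Values quoted in the comment above.
    example_env = {
        "BRANCH_TO_TEST": "releases/2.6.2",
        "COMMIT_TO_TEST": "65017aaeb362769e5c19bb610f4da6d10b69bfd2",
    }
    print(resolve_test_target(example_env))
```

Validating both values up front makes it harder to accidentally test a master build when the intent is to test a release branch image.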

@xwjiang2010
Contributor

The test itself passed, but there is an error with code 79.

Signing off on the release test from the ML team's perspective.

@xwjiang2010 xwjiang2010 removed their assignment Jul 26, 2023
@can-anyscale
Collaborator

Ah yes, error code 79 is my fault; fixing in #37842. The test is fine.

@rickyyx rickyyx closed this as completed Jul 31, 2023