
[air] simple xgboost script swallowing error #29097

Closed
richardliaw opened this issue Oct 5, 2022 · 0 comments · Fixed by #29143
Labels
bug (Something that is supposed to be working; but isn't) · P0 (Issues that should be fixed in short order) · release-blocker (P0 Issue that blocks the release)

Comments

@richardliaw (Contributor)

What happened + What you expected to happen

One of the two trials errors out, but the traceback below only reports that a Ray actor died; the underlying cause is swallowed. I am expecting an error message that tells me what I actually need to fix.

(base) ➜  learning-ray-private git:(main) ✗ python /Users/rliaw/dev/learning-ray-private/code/_test.py
Usage stats collection is disabled.
2022-10-05 15:41:07,920	INFO worker.py:1515 -- Started a local Ray instance. View the dashboard at http://127.0.0.1:8265
== Status ==
Current time: 2022-10-05 15:41:13 (running for 00:00:02.88)
Memory usage on this node: 24.8/64.0 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/16 CPUs, 0/0 GPUs, 0.0/37.03 GiB heap, 0.0/2.0 GiB objects
Result logdir: /Users/rliaw/ray_results/XGBoostTrainer_2022-10-05_15-41-10
Number of trials: 2/2 (1 PENDING, 1 RUNNING)


(XGBoostTrainer pid=33826) /Users/rliaw/miniconda3/lib/python3.7/site-packages/xgboost_ray/main.py:423: UserWarning: `num_actors` in `ray_params` is smaller than 2 (1). XGBoost will NOT be distributed!
(XGBoostTrainer pid=33826)   f"`num_actors` in `ray_params` is smaller than 2 "
(XGBoostTrainer pid=33826) 2022-10-05 15:41:16,171	INFO main.py:980 -- [RayXGBoost] Created 1 new actors (1 total actors). Waiting until actors are ready for training.
(XGBoostTrainer pid=33831) /Users/rliaw/miniconda3/lib/python3.7/site-packages/xgboost_ray/main.py:423: UserWarning: `num_actors` in `ray_params` is smaller than 2 (1). XGBoost will NOT be distributed!
(XGBoostTrainer pid=33831)   f"`num_actors` in `ray_params` is smaller than 2 "
(XGBoostTrainer pid=33831) 2022-10-05 15:41:17,011	INFO main.py:980 -- [RayXGBoost] Created 1 new actors (1 total actors). Waiting until actors are ready for training.
== Status ==
Current time: 2022-10-05 15:41:18 (running for 00:00:07.90)
Memory usage on this node: 25.1/64.0 GiB
Using FIFO scheduling algorithm.
Resources requested: 6.0/16 CPUs, 0/0 GPUs, 0.0/37.03 GiB heap, 0.0/2.0 GiB objects
Result logdir: /Users/rliaw/ray_results/XGBoostTrainer_2022-10-05_15-41-10
Number of trials: 2/2 (2 RUNNING)


(_RemoteRayXGBoostActor pid=33848) 2022-10-05 15:41:19,165	WARNING __init__.py:192 -- DeprecationWarning: `ray.worker.get_resource_ids` is a private attribute and access will be removed in a future Ray version.
(XGBoostTrainer pid=33826) 2022-10-05 15:41:19,279	INFO main.py:1025 -- [RayXGBoost] Starting XGBoost training.
(_RemoteRayXGBoostActor pid=33848) [15:41:19] task [xgboost.ray]:140307860138704 got new rank 0
(XGBoostTrainer pid=33826) 2022-10-05 15:41:19,464	INFO elastic.py:155 -- Actor status: 1 alive, 0 dead (1 total)
2022-10-05 15:41:19,644	ERROR trial_runner.py:990 -- Trial XGBoostTrainer_cf40b_00000: Error processing event.
ray.exceptions.RayTaskError(RuntimeError): ray::_Inner.train() (pid=33826, ip=127.0.0.1, repr=XGBoostTrainer)
  File "/Users/rliaw/miniconda3/lib/python3.7/site-packages/ray/tune/trainable/trainable.py", line 353, in train
    raise skip_exceptions(e) from None
  File "/Users/rliaw/miniconda3/lib/python3.7/site-packages/ray/tune/trainable/function_trainable.py", line 328, in entrypoint
    self._status_reporter.get_checkpoint(),
  File "/Users/rliaw/miniconda3/lib/python3.7/site-packages/ray/train/base_trainer.py", line 475, in _trainable_func
    super()._trainable_func(self._merged_config, reporter, checkpoint_dir)
  File "/Users/rliaw/miniconda3/lib/python3.7/site-packages/ray/tune/trainable/function_trainable.py", line 651, in _trainable_func
    output = fn()
  File "/Users/rliaw/miniconda3/lib/python3.7/site-packages/ray/train/base_trainer.py", line 390, in train_func
    trainer.training_loop()
  File "/Users/rliaw/miniconda3/lib/python3.7/site-packages/ray/train/gbdt_trainer.py", line 263, in training_loop
    **config,
  File "/Users/rliaw/miniconda3/lib/python3.7/site-packages/ray/train/xgboost/xgboost_trainer.py", line 84, in _train
    return xgboost_ray.train(**kwargs)
  File "/Users/rliaw/miniconda3/lib/python3.7/site-packages/xgboost_ray/main.py", line 1508, in train
    ) from exc
RuntimeError: A Ray actor died during training and the maximum number of retries (0) is exhausted.
The trial XGBoostTrainer_cf40b_00000 errored with parameters={'scaling_config': {'trainer_resources': None, 'num_workers': 1, 'use_gpu': False, 'resources_per_worker': {'CPU': 2, 'GPU': 0}, 'placement_strategy': 'PACK', '_max_cpu_fraction_per_node': None}, 'preprocessor': StandardScaler(columns=['X', 'Y']), 'params': {'objective': 'binary:logistic', 'tree_method': 'approx', 'eval_metric': ['logloss', 'error'], 'eta': 0.04707815901985832, 'subsample': 0.5116661725221274, 'max_depth': 4}}. Error file: /Users/rliaw/ray_results/XGBoostTrainer_2022-10-05_15-41-10/XGBoostTrainer_cf40b_00000_0_eta=0.0471,max_depth=4,subsample=0.5117,preprocessor=StandardScaler_columns_X_Y,num_workers=1_2022-10-05_15-41-11/error.txt
(_RemoteRayXGBoostActor pid=33857) 2022-10-05 15:41:20,082	WARNING __init__.py:192 -- DeprecationWarning: `ray.worker.get_resource_ids` is a private attribute and access will be removed in a future Ray version.
(XGBoostTrainer pid=33831) 2022-10-05 15:41:20,195	INFO main.py:1025 -- [RayXGBoost] Starting XGBoost training.
(_RemoteRayXGBoostActor pid=33857) [15:41:20] task [xgboost.ray]:140228942076880 got new rank 0
Trial XGBoostTrainer_cf40b_00001 reported train-logloss=0.693147,train-error=0.5 with parameters={'scaling_config': {'trainer_resources': None, 'num_workers': 1, 'use_gpu': False, 'resources_per_worker': {'CPU': 2, 'GPU': 0}, 'placement_strategy': 'PACK', '_max_cpu_fraction_per_node': None}, 'preprocessor': MinMaxScaler(columns=['X', 'Y']), 'params': {'objective': 'binary:logistic', 'tree_method': 'approx', 'eval_metric': ['logloss', 'error'], 'eta': 0.00015355604372622395, 'subsample': 0.7226451783101197, 'max_depth': 1}}.
Trial XGBoostTrainer_cf40b_00001 reported train-logloss=0.693147,train-error=0.5,should_checkpoint=True with parameters={'scaling_config': {'trainer_resources': None, 'num_workers': 1, 'use_gpu': False, 'resources_per_worker': {'CPU': 2, 'GPU': 0}, 'placement_strategy': 'PACK', '_max_cpu_fraction_per_node': None}, 'preprocessor': MinMaxScaler(columns=['X', 'Y']), 'params': {'objective': 'binary:logistic', 'tree_method': 'approx', 'eval_metric': ['logloss', 'error'], 'eta': 0.00015355604372622395, 'subsample': 0.7226451783101197, 'max_depth': 1}}. This trial completed.
2022-10-05 15:41:22,222	INFO tensorboardx.py:270 -- Removed the following hyperparameter values when logging to tensorboard: {'preprocessor': MinMaxScaler(columns=['X', 'Y'])}
(XGBoostTrainer pid=33831) 2022-10-05 15:41:22,183	INFO main.py:1519 -- [RayXGBoost] Finished XGBoost training on training data with total N=2 in 5.20 seconds (1.99 pure XGBoost training time).
== Status ==
Current time: 2022-10-05 15:41:22 (running for 00:00:11.34)
Memory usage on this node: 24.6/64.0 GiB
Using FIFO scheduling algorithm.
Resources requested: 0/16 CPUs, 0/0 GPUs, 0.0/37.03 GiB heap, 0.0/2.0 GiB objects
Result logdir: /Users/rliaw/ray_results/XGBoostTrainer_2022-10-05_15-41-10
Number of trials: 2/2 (1 ERROR, 1 TERMINATED)
+----------------------------+------------+-----------------+--------------+--------------------+--------------------+----------------------+------------------------+--------+------------------+-----------------+---------------+
| Trial name                 | status     | loc             |   params/eta |   params/max_depth |   params/subsample | preprocessor         |   scaling_config/num_w |   iter |   total time (s) |   train-logloss |   train-error |
|                            |            |                 |              |                    |                    |                      |                 orkers |        |                  |                 |               |
|----------------------------+------------+-----------------+--------------+--------------------+--------------------+----------------------+------------------------+--------+------------------+-----------------+---------------|
| XGBoostTrainer_cf40b_00001 | TERMINATED | 127.0.0.1:33831 |  0.000153556 |                  1 |           0.722645 | MinMaxScaler(co_e790 |                      1 |     11 |          5.26433 |        0.693147 |           0.5 |
| XGBoostTrainer_cf40b_00000 | ERROR      | 127.0.0.1:33826 |  0.0470782   |                  4 |           0.511666 | StandardScaler(_ee50 |                      1 |        |                  |                 |               |
+----------------------------+------------+-----------------+--------------+--------------------+--------------------+----------------------+------------------------+--------+------------------+-----------------+---------------+
Number of errored trials: 1
+----------------------------+--------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Trial name                 |   # failures | error file                                                                                                                                                                                                           |
|----------------------------+--------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| XGBoostTrainer_cf40b_00000 |            1 | /Users/rliaw/ray_results/XGBoostTrainer_2022-10-05_15-41-10/XGBoostTrainer_cf40b_00000_0_eta=0.0471,max_depth=4,subsample=0.5117,preprocessor=StandardScaler_columns_X_Y,num_workers=1_2022-10-05_15-41-11/error.txt |
+----------------------------+--------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

2022-10-05 15:41:22,341	ERROR tune.py:758 -- Trials did not complete: [XGBoostTrainer_cf40b_00000]
2022-10-05 15:41:22,341	INFO tune.py:763 -- Total run time: 11.56 seconds (11.33 seconds for the tuning loop).

Versions / Dependencies

Python 3.9, macOS, Ray master (nightly wheel)

absl-py==0.15.0
aiobotocore==1.2.2
aiohttp==3.7.4.post0
aiohttp-cors==0.7.0
aiohttp-middlewares==1.1.0
aioitertools==0.7.1
aioredis==1.3.1
aiorwlock==1.3.0
aiosignal==1.2.0
alabaster==0.7.12
alembic==1.4.1
altair==4.2.0
analytics-python==1.4.0
antlr4-python3-runtime==4.8
anyscale==0.5.33
appdirs==1.4.4
appnope==0.1.0
asciimatics==1.12.0
asgiref==3.5.2
asn1crypto==1.2.0
astunparse==1.6.3
async-timeout==3.0.1
atari-py==0.2.6
attrs==19.3.0
autograd==1.4
autopep8==1.5.4
aws-parallelcluster==2.11.1
aws-sam-translator==1.26.0
aws-xray-sdk==2.6.0
ax-platform==0.1.14
azure-core==1.6.0
azure-storage-blob==12.3.2
Babel==2.8.0
backcall==0.2.0
bayesian-optimization==1.2.0
bcrypt==3.2.2
beautifulsoup4==4.8.2
better-exceptions==0.3.3
black==19.10b0
bleach==3.1.5
blessed==1.19.1
blessings==1.7
blis==0.7.5
blist==1.3.6
bokeh==2.2.1
boto==2.49.0
boto3==1.16.52
botocore==1.19.52
botorch==0.3.1
Box2D==2.3.10
cached-property==1.5.2
cachetools==4.1.0
catalogue==1.0.0
catboost==1.0.6
certifi==2021.10.8
cffi==1.15.1
cfn-lint==0.35.0
chardet @ file:///opt/concourse/worker/volumes/live/92aa0fea-cec8-43da-4d77-2574e78c5981/volume/chardet_1605303181342/work
clang==5.0
-e git+https://github.com/clearbit/clearbit-python.git@a810c34f401376d063e26a4f0b007255b44d8489#egg=clearbit
click==7.1.1
cliff==3.5.0
cloudpickle==1.3.0
cma==2.7.0
cmaes==0.7.0
cmd2==1.4.0
colorama==0.4.3
colorful==0.5.4
colorlog==4.1.0
commonmark==0.9.1
conda==4.8.2
conda-pack==0.6.0
conda-package-handling==1.6.0
configparser==5.0.0
ConfigSpace==0.4.16
criticality-score==1.0.7
cryptography @ file:///opt/concourse/worker/volumes/live/dfab5924-fd99-4f24-7939-7dbf6f1585f3/volume/cryptography_1639414584824/work
cycler==0.10.0
cymem==2.0.6
Cython==0.29
dask==2.20.0
dask-glm==0.2.0
dask-ml==1.5.0
databricks-cli==0.11.0
dataclasses==0.6
datasets==1.1.2
decorator==4.4.2
defusedxml==0.6.0
Deprecated==1.2.10
-e git+https://github.com/huggingface/diffusers@28f730520ef341dbd2125eb3e248bebaf6830514#egg=diffusers
dill==0.3.2
discourse==0.1.2
distlib==0.3.4
distributed==2.20.0
dm-tree==0.1.5
docker==4.2.2
docker-pycreds==0.4.0
docopt==0.6.2
docutils==0.15.2
dragonfly-opt==0.1.5
ecdsa==0.14.1
en-core-web-sm @ https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.3.1/en_core_web_sm-2.3.1.tar.gz
entrypoints==0.3
et-xmlfile==1.0.1
expiringdict==1.2.1
fastapi==0.63.0
feather-format==0.4.1
ffmpy==0.3.0
filelock==3.7.1
flake8==3.7.7
flake8-comprehensions==3.2.3
flake8-quotes==3.2.0
flaky==3.7.0
Flask==1.1.2
flatbuffers==1.12
frozenlist==1.3.0
fsspec==0.8.4
ftfy==6.1.1
funcsigs==1.0.2
future==0.18.2
gast==0.4.0
gensim==3.8.3
geocoder==1.38.1
gitdb==4.0.5
GitPython==3.1.2
glob2==0.7
google==2.0.3
google-api-core==1.22.0
google-api-python-client==1.10.0
google-auth==1.20.0
google-auth-httplib2==0.0.4
google-auth-oauthlib==0.4.1
google-pasta==0.2.0
googleapis-common-protos==1.52.0
gorilla==0.3.0
gpustat==1.0.0
GPy==1.9.9
gpytorch==1.7.0
gql==0.2.0
gradio==3.3.1
graphql-core==1.1
graphviz==0.8.4
greenlet==1.1.2
grip==4.5.2
grpcio==1.43.0
gunicorn==20.0.4
gym==0.17.2
gym-minigrid==1.0.1
h11==0.9.0
h5py==3.1.0
halo==0.0.31
HeapDict==1.0.1
HEBO==0.3.2
hiredis==1.1.0
horovod==0.21.1
hpbandster==0.7.4
httplib2==0.18.1
httptools==0.1.1
httpx==1.0.0b0
huggingface-hub==0.9.1
hydra-colorlog==0.1.4
hydra-core==0.11.3
hyperopt==0.2.4
idna==2.8
imageio==2.9.0
imagesize==1.2.0
importlab==0.5.1
importlib-metadata==3.3.0
ipaddress==1.0.23
ipdb==0.13.3
ipykernel==5.3.0
ipython==7.16.1
ipython-genutils==0.2.0
ipywidgets==7.5.1
isodate==0.6.0
itsdangerous==1.1.0
jdcal==1.4.1
jedi==0.17.1
Jinja2==2.11.2
jmespath==0.10.0
joblib==0.15.1
json5==0.9.5
jsondiff==1.1.2
jsonpatch==1.25
jsonpickle==1.4.1
jsonpointer==2.0
jsonschema==3.2.0
junit-xml==1.9
jupyter==1.0.0
jupyter-client==6.1.5
jupyter-console==6.1.0
jupyter-core==4.6.3
jupyterlab==2.2.6
jupyterlab-server==1.2.0
kaggle==1.5.10
keras==2.6.0
Keras-Preprocessing==1.1.2
keyring==21.5.0
kiwisolver==1.2.0
kubernetes==11.0.0
libclang==14.0.1
lightgbm==3.1.1
lightgbm-ray==0.1.5
llvmlite==0.33.0
locket==0.2.0
log-symbols==0.0.14
-e git+https://github.com/uber/ludwig/@82ade15e35f3f28f7a18afe8eed75c3dc5da10ab#egg=ludwig
lux==0.5.1
lux-api==0.2.3
lux-widget @ file:///Users/rliaw/miniconda3/share/jupyter/lab/staging/node_modules/luxwidget
lxml==4.5.2
lz4==3.1.0
Mako==1.1.3
Markdown==3.2.2
markdown-it-py==2.1.0
MarkupSafe==1.1.1
matplotlib==3.2.2
mccabe==0.6.1
mistune==0.8.4
mlflow==1.12.1
mock==4.0.2
more-itertools==8.2.0
moto==1.3.14
msgpack==1.0.0
msrest==0.6.17
multidict==4.7.5
multipledispatch==0.6.0
multiprocess==0.70.10
murmurhash==1.0.6
mxnet==1.6.0
mypy==0.782
mypy-extensions==0.4.3
nbconvert==5.6.1
nbformat==5.0.7
netifaces==0.10.9
networkx==2.4
nevergrad==0.4.1.post4
nlp==0.4.0
nltk==3.6.5
notebook==6.0.3
numba==0.50.1
numexpr==2.7.1
numpy==1.21.6
nvidia-ml-py==11.495.46
nvidia-ml-py3==7.352.0
oauth2client==4.1.3
oauthlib==3.1.0
omegaconf==2.1.0
opencensus==0.7.10
opencensus-context==0.1.1
opencv-python==4.2.0.34
opencv-python-headless==4.3.0.36
openpyxl==3.0.5
opt-einsum==3.3.0
optuna==2.3.0
orjson==3.8.0
packaging==21.3
pandas==1.3.5
pandocfilters==1.4.2
parameterized==0.7.4
paramiko==2.11.0
paramz==0.9.5
parso==0.7.0
partd==1.1.0
path-and-address==2.0.1
pathlib==1.0.1
pathspec==0.8.1
pathtools==0.1.2
patsy==0.5.1
pbr==5.4.5
PettingZoo==1.4.2
pexpect==4.8.0
pickle5==0.0.11
pickleshare==0.7.5
Pillow==7.1.2
pipdeptree==1.0.0
pkginfo==1.6.1
plac==1.1.3
platformdirs==2.5.2
plotly==4.9.0
pluggy==0.13.1
present==0.6.0
preshed==3.0.6
prettytable==0.7.2
prometheus-client==0.8.0
prometheus-flask-exporter==0.14.1
promise==2.3
prompt-toolkit==3.0.5
protobuf==3.15.6
psutil==5.7.0
psycopg2==2.7.7
ptpython==3.0.3
ptyprocess==0.6.0
pulsar==2.0.2
pulsar-odm==0.7.0
py==1.8.1
py-spy==0.3.4
pyaml==20.4.0
pyarrow==6.0.1
pyasn1==0.4.8
pyasn1-modules==0.2.8
pybuildkite==1.2.1
pycodestyle==2.6.0
pycosat==0.6.3
pycparser @ file:///tmp/build/80754af9/pycparser_1636541352034/work
pycryptodome==3.15.0
pydantic==1.9.1
pydeps==1.9.7
pydub==0.25.1
pyfiglet==0.8.post1
pyflakes==2.1.1
pyflyby==1.6.8
PyFunctional==1.3.0
PyGithub==1.54.1
pyglet==1.5.0
Pygments==2.6.1
PyJWT==1.7.1
pymongo==3.10.1
pymoo==0.5.0
PyNaCl==1.4.0
pyOpenSSL @ file:///tmp/build/80754af9/pyopenssl_1635333100036/work
pyparsing==2.4.6
pyperclip==1.8.1
Pyro4==4.80
pyrsistent==0.15.7
PySocks @ file:///opt/concourse/worker/volumes/live/ef943889-94fc-4539-798d-461c60b77804/volume/pysocks_1605305801690/work
pytest==5.4.3
pytest-asyncio==0.14.0
pytest-remotedata==0.3.2
pytest-rerunfailures==9.1.1
pytest-sugar==0.9.4
pytest-timeout==1.4.1
python-dateutil==2.8.1
python-editor==1.0.4
python-gitlab==2.5.0
python-jose==3.2.0
python-multipart==0.0.5
python-slugify==4.0.1
pytorch-lightning==1.0.2
pytorch-lightning-bolts==0.2.5
pytorch-tabnet==3.1.1
pytz==2020.1
PyWavelets==1.1.1
pywren==0.4.0
PyYAML==5.4.1
pyzmq==19.0.1
qtconsole==4.7.5
QtPy==1.9.0
querystring-parser==1.2.4
ratelim==0.1.6
ray @ https://s3-us-west-2.amazonaws.com/ray-wheels/latest/ray-3.0.0.dev0-cp37-cp37m-macosx_10_15_intel.whl
-e git+https://github.com/anyscale/hackathon-2022-automl@801febef4234e726a41e41d714616eb1d87af8c6#egg=ray_automl
-e git+https://github.com/ray-project/ray-sklearn@0dee83b2188330d1100d5000d1595c9252d75129#egg=ray_sklearn
readme-renderer==28.0
recommonmark==0.6.0
redis==3.5.3
regex==2021.11.10
requests==2.22.0
requests-oauthlib==1.3.0
requests-toolbelt==0.9.1
responses==0.10.16
retrying==1.3.3
rfc3986==1.4.0
rich==12.5.1
rsa==3.4.2
ruamel-yaml-conda @ file:///opt/concourse/worker/volumes/live/da6f10aa-e617-4894-45a9-cfdf5da681c3/volume/ruamel_yaml_1616016690897/work
s3transfer==0.3.3
sacremoses==0.0.43
scikit-image==0.17.2
scikit-learn==0.24.0
scikit-optimize==0.8.1
scipy==1.4.1
seaborn==0.11.2
Send2Trash==1.5.0
sentencepiece==0.1.91
sentry-sdk==0.14.4
serpent==1.30.2
shellingham==1.5.0
shortuuid==1.0.1
sigopt==5.7.0
six==1.15.0
sklearn==0.0
skorch==0.9.0
smart-open==2.0.0
smmap==3.0.4
snowballstemmer==2.0.0
sortedcontainers==2.2.2
soupsieve==2.0
spacy==2.3.7
Sphinx==3.0.4
sphinx-click==2.3.2
sphinx-copybutton==0.2.12
sphinx-gallery==0.7.0
sphinx-jsonschema==1.15
sphinx-rtd-theme==0.5.0
sphinx-version-warning==1.1.2
sphinxcontrib-applehelp==1.0.2
sphinxcontrib-devhelp==1.0.2
sphinxcontrib-htmlhelp==1.0.3
sphinxcontrib-jsmath==1.0.1
sphinxcontrib-qthelp==1.0.3
sphinxcontrib-serializinghtml==1.1.4
spinners==0.0.24
SQLAlchemy==1.3.13
sqlparse==0.3.1
srsly==1.0.5
sshpubkeys==3.1.0
sshtunnel==0.4.0
starlette==0.13.6
statsmodels==0.11.1
stdlib-list==0.7.0
stevedore==3.3.0
subprocess32==3.5.4
tables==3.6.1
tabulate==0.8.7
tblib==1.6.0
tensorboard==2.6.0
tensorboard-data-server==0.6.1
tensorboard-plugin-wit==1.6.0.post3
tensorboardX==2.0
tensorflow==2.6.2
tensorflow-estimator==2.6.0
tensorflow-io-gcs-filesystem==0.26.0
tensorflow-probability==0.10.0
termcolor==1.1.0
terminado==0.8.3
testfixtures==6.14.1
testpath==0.4.4
text-unidecode==1.3
texthero @ git+https://github.com/jbesomi/texthero.git@caf87e3038598f8038b37b560f5186414f4974e7
tfa-nightly==0.12.0.dev20200820045606
thinc==7.4.5
threadpoolctl==2.1.0
thrift==0.13.0
tifffile==2020.11.18
timm==0.1.30
tokenizers==0.12.1
toml==0.10.1
toolz==0.10.0
torch==1.12.0
torchvision==0.8.2
tornado==6.0.4
tqdm @ file:///tmp/build/80754af9/tqdm_1635330843403/work
traitlets==4.3.3
transformers==4.21.2
-e git+https://github.com/ray-project/tune-sklearn/@ba9949140d79b36b9bd63e95ab2d9809052ff6f0#egg=tune_sklearn
twine==3.2.0
typed-ast==1.4.1
typeguard==2.10.0
typer==0.4.1
typing-extensions==3.7.4.3
Unidecode==1.3.2
uritemplate==3.0.1
urllib3==1.25.11
uvicorn==0.16.0
uvloop==0.14.0
virtualenv==20.14.1
wandb==0.9.6
wasabi==0.9.0
watchdog==0.10.2
watchtower==1.0.0
wcwidth==0.2.5
webencodings==0.5.1
websocket-client==0.57.0
websockets==8.1
Werkzeug==1.0.1
widgetsnbextension==3.5.1
wordcloud==1.8.1
wrapt==1.12.1
xgboost==1.3.0.post0
xgboost-ray @ git+http://github.com/ray-project/xgboost_ray.git@a84a446562bd1fafe54bee9797b6b04abae9a916
xlrd==1.2.0
xlwt==1.3.0
xmltodict==0.12.0
xxhash==2.0.0
yapf==0.23.0
yarl==1.4.2
yaspin==1.0.0
zict==2.0.0
zipp==3.1.0
zoopt==0.4.1

Reproduction script

import ray

from ray.air.config import ScalingConfig
from ray import tune
from ray.data.preprocessors import StandardScaler, MinMaxScaler


dataset = ray.data.from_items(
    [{"X": 1.0, "Y": 2.0}, {"X": 4.0, "Y": 0.0}]
)
prep_v1 = StandardScaler(columns=["X", "Y"])
prep_v2 = MinMaxScaler(columns=["X", "Y"])

param_space = {
    "scaling_config": ScalingConfig(
        num_workers=tune.grid_search([1]),
        resources_per_worker={
            "CPU": 2,
            "GPU": 0,
        },
    ),
    "preprocessor": tune.grid_search([prep_v1, prep_v2]),
    "params": {
        "objective": "binary:logistic",
        "tree_method": "approx",
        "eval_metric": ["logloss", "error"],
        "eta": tune.loguniform(1e-4, 1e-1),
        "subsample": tune.uniform(0.5, 1.0),
        "max_depth": tune.randint(1, 9),
    },
}

from ray.train.xgboost import XGBoostTrainer
from ray.air.config import RunConfig
from ray.tune import Tuner


trainer = XGBoostTrainer(
    params={},
    run_config=RunConfig(verbose=2),
    preprocessor=None,
    scaling_config=None,
    label_column="Y",
    datasets={"train": dataset}
)

tuner = Tuner(
    trainer,
    param_space=param_space,
)

results = tuner.fit()
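
A minimal sketch (not part of the original repro) of how the failed trial's error object could be inspected after the run, assuming this Ray version's `ResultGrid` and `Result` expose per-trial errors via an `error` attribute:

# Hypothetical continuation of the script above (not in the original report):
# surface whatever error Tune recorded for each trial.
for i in range(len(results)):
    result = results[i]
    if result.error:
        print(f"trial {i} failed with: {result.error!r}")
        # The full traceback is also written to error.txt in the trial's
        # log directory (the path shown in the status output above).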

Issue Severity

High: It blocks me from completing my task.

richardliaw added the bug, release-blocker, and P0 labels on Oct 5, 2022
richardliaw added this to the Ray 2.1 milestone on Oct 5, 2022
matthewdeng added the air label on Oct 5, 2022
krfricke added a commit that referenced this issue on Oct 7, 2022
We currently raise `skip_exceptions(e) from None` to reduce the stacktrace output of failing functions. However, in Python this means the exception context is swallowed completely: even if `skip_exceptions(e)` returns an exception that carries a context, the `from None` takes precedence.
The solution here is to extract the cause manually from the exception returned by `skip_exceptions(e)` and re-raise from that context. The tests still pass (so for regular cases the traceback remains compact), but the repro script in #29097 now reveals the actual cause of the error.

Signed-off-by: Kai Fricke <kai@anyscale.com>
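
The behaviour the commit describes is plain Python exception chaining. A minimal sketch for illustration — `train_step` and this `skip_exceptions` body are stand-ins, not Ray Tune's actual implementation:

# Illustration of how `from None` swallows the root cause, and how
# re-raising from the extracted cause restores it.

def skip_exceptions(exc):
    # Stand-in helper: wrap the original error in a shorter exception
    # that carries it as its cause.
    wrapped = RuntimeError("A Ray actor died during training")
    wrapped.__cause__ = exc
    return wrapped

def train_step():
    raise ValueError("the actual root cause raised inside the training function")

def run_before_fix():
    # `from None` takes precedence over the cause set inside
    # skip_exceptions, so the ValueError never appears in the traceback.
    try:
        train_step()
    except Exception as e:
        raise skip_exceptions(e) from None

def run_after_fix():
    # Extract the cause manually and raise from it, so the traceback
    # shows "The above exception was the direct cause of ...".
    try:
        train_step()
    except Exception as e:
        wrapped = skip_exceptions(e)
        raise wrapped from wrapped.__cause__

Calling `run_before_fix()` prints only the RuntimeError, mirroring the swallowed error in the repro script above; `run_after_fix()` also prints the ValueError as its direct cause, which is the behaviour this fix restores.
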
WeichenXu123 pushed a commit to WeichenXu123/ray that referenced this issue Dec 19, 2022
…29143)
