
[air] simple xgboost script swallowing error #29097

Closed
richardliaw opened this issue Oct 5, 2022 · 0 comments · Fixed by #29143
Labels
bug (Something that is supposed to be working; but isn't) · P0 (Issues that should be fixed in short order) · release-blocker (P0 Issue that blocks the release)

Comments

@richardliaw (Contributor)

What happened + What you expected to happen

One of the two trials errors out, but the traceback below only reports that a Ray actor died; the underlying cause is swallowed. I am expecting an error message that tells me what I actually need to fix.

(base) ➜  learning-ray-private git:(main) ✗ python /Users/rliaw/dev/learning-ray-private/code/_test.py
Usage stats collection is disabled.
2022-10-05 15:41:07,920	INFO worker.py:1515 -- Started a local Ray instance. View the dashboard at http://127.0.0.1:8265
== Status ==
Current time: 2022-10-05 15:41:13 (running for 00:00:02.88)
Memory usage on this node: 24.8/64.0 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/16 CPUs, 0/0 GPUs, 0.0/37.03 GiB heap, 0.0/2.0 GiB objects
Result logdir: /Users/rliaw/ray_results/XGBoostTrainer_2022-10-05_15-41-10
Number of trials: 2/2 (1 PENDING, 1 RUNNING)


(XGBoostTrainer pid=33826) /Users/rliaw/miniconda3/lib/python3.7/site-packages/xgboost_ray/main.py:423: UserWarning: `num_actors` in `ray_params` is smaller than 2 (1). XGBoost will NOT be distributed!
(XGBoostTrainer pid=33826)   f"`num_actors` in `ray_params` is smaller than 2 "
(XGBoostTrainer pid=33826) 2022-10-05 15:41:16,171	INFO main.py:980 -- [RayXGBoost] Created 1 new actors (1 total actors). Waiting until actors are ready for training.
(XGBoostTrainer pid=33831) /Users/rliaw/miniconda3/lib/python3.7/site-packages/xgboost_ray/main.py:423: UserWarning: `num_actors` in `ray_params` is smaller than 2 (1). XGBoost will NOT be distributed!
(XGBoostTrainer pid=33831)   f"`num_actors` in `ray_params` is smaller than 2 "
(XGBoostTrainer pid=33831) 2022-10-05 15:41:17,011	INFO main.py:980 -- [RayXGBoost] Created 1 new actors (1 total actors). Waiting until actors are ready for training.
== Status ==
Current time: 2022-10-05 15:41:18 (running for 00:00:07.90)
Memory usage on this node: 25.1/64.0 GiB
Using FIFO scheduling algorithm.
Resources requested: 6.0/16 CPUs, 0/0 GPUs, 0.0/37.03 GiB heap, 0.0/2.0 GiB objects
Result logdir: /Users/rliaw/ray_results/XGBoostTrainer_2022-10-05_15-41-10
Number of trials: 2/2 (2 RUNNING)


(_RemoteRayXGBoostActor pid=33848) 2022-10-05 15:41:19,165	WARNING __init__.py:192 -- DeprecationWarning: `ray.worker.get_resource_ids` is a private attribute and access will be removed in a future Ray version.
(XGBoostTrainer pid=33826) 2022-10-05 15:41:19,279	INFO main.py:1025 -- [RayXGBoost] Starting XGBoost training.
(_RemoteRayXGBoostActor pid=33848) [15:41:19] task [xgboost.ray]:140307860138704 got new rank 0
(XGBoostTrainer pid=33826) 2022-10-05 15:41:19,464	INFO elastic.py:155 -- Actor status: 1 alive, 0 dead (1 total)
2022-10-05 15:41:19,644	ERROR trial_runner.py:990 -- Trial XGBoostTrainer_cf40b_00000: Error processing event.
ray.exceptions.RayTaskError(RuntimeError): ray::_Inner.train() (pid=33826, ip=127.0.0.1, repr=XGBoostTrainer)
  File "/Users/rliaw/miniconda3/lib/python3.7/site-packages/ray/tune/trainable/trainable.py", line 353, in train
    raise skip_exceptions(e) from None
  File "/Users/rliaw/miniconda3/lib/python3.7/site-packages/ray/tune/trainable/function_trainable.py", line 328, in entrypoint
    self._status_reporter.get_checkpoint(),
  File "/Users/rliaw/miniconda3/lib/python3.7/site-packages/ray/train/base_trainer.py", line 475, in _trainable_func
    super()._trainable_func(self._merged_config, reporter, checkpoint_dir)
  File "/Users/rliaw/miniconda3/lib/python3.7/site-packages/ray/tune/trainable/function_trainable.py", line 651, in _trainable_func
    output = fn()
  File "/Users/rliaw/miniconda3/lib/python3.7/site-packages/ray/train/base_trainer.py", line 390, in train_func
    trainer.training_loop()
  File "/Users/rliaw/miniconda3/lib/python3.7/site-packages/ray/train/gbdt_trainer.py", line 263, in training_loop
    **config,
  File "/Users/rliaw/miniconda3/lib/python3.7/site-packages/ray/train/xgboost/xgboost_trainer.py", line 84, in _train
    return xgboost_ray.train(**kwargs)
  File "/Users/rliaw/miniconda3/lib/python3.7/site-packages/xgboost_ray/main.py", line 1508, in train
    ) from exc
RuntimeError: A Ray actor died during training and the maximum number of retries (0) is exhausted.
The trial XGBoostTrainer_cf40b_00000 errored with parameters={'scaling_config': {'trainer_resources': None, 'num_workers': 1, 'use_gpu': False, 'resources_per_worker': {'CPU': 2, 'GPU': 0}, 'placement_strategy': 'PACK', '_max_cpu_fraction_per_node': None}, 'preprocessor': StandardScaler(columns=['X', 'Y']), 'params': {'objective': 'binary:logistic', 'tree_method': 'approx', 'eval_metric': ['logloss', 'error'], 'eta': 0.04707815901985832, 'subsample': 0.5116661725221274, 'max_depth': 4}}. Error file: /Users/rliaw/ray_results/XGBoostTrainer_2022-10-05_15-41-10/XGBoostTrainer_cf40b_00000_0_eta=0.0471,max_depth=4,subsample=0.5117,preprocessor=StandardScaler_columns_X_Y,num_workers=1_2022-10-05_15-41-11/error.txt
(_RemoteRayXGBoostActor pid=33857) 2022-10-05 15:41:20,082	WARNING __init__.py:192 -- DeprecationWarning: `ray.worker.get_resource_ids` is a private attribute and access will be removed in a future Ray version.
(XGBoostTrainer pid=33831) 2022-10-05 15:41:20,195	INFO main.py:1025 -- [RayXGBoost] Starting XGBoost training.
(_RemoteRayXGBoostActor pid=33857) [15:41:20] task [xgboost.ray]:140228942076880 got new rank 0
Trial XGBoostTrainer_cf40b_00001 reported train-logloss=0.693147,train-error=0.5 with parameters={'scaling_config': {'trainer_resources': None, 'num_workers': 1, 'use_gpu': False, 'resources_per_worker': {'CPU': 2, 'GPU': 0}, 'placement_strategy': 'PACK', '_max_cpu_fraction_per_node': None}, 'preprocessor': MinMaxScaler(columns=['X', 'Y']), 'params': {'objective': 'binary:logistic', 'tree_method': 'approx', 'eval_metric': ['logloss', 'error'], 'eta': 0.00015355604372622395, 'subsample': 0.7226451783101197, 'max_depth': 1}}.
Trial XGBoostTrainer_cf40b_00001 reported train-logloss=0.693147,train-error=0.5,should_checkpoint=True with parameters={'scaling_config': {'trainer_resources': None, 'num_workers': 1, 'use_gpu': False, 'resources_per_worker': {'CPU': 2, 'GPU': 0}, 'placement_strategy': 'PACK', '_max_cpu_fraction_per_node': None}, 'preprocessor': MinMaxScaler(columns=['X', 'Y']), 'params': {'objective': 'binary:logistic', 'tree_method': 'approx', 'eval_metric': ['logloss', 'error'], 'eta': 0.00015355604372622395, 'subsample': 0.7226451783101197, 'max_depth': 1}}. This trial completed.
2022-10-05 15:41:22,222	INFO tensorboardx.py:270 -- Removed the following hyperparameter values when logging to tensorboard: {'preprocessor': MinMaxScaler(columns=['X', 'Y'])}
(XGBoostTrainer pid=33831) 2022-10-05 15:41:22,183	INFO main.py:1519 -- [RayXGBoost] Finished XGBoost training on training data with total N=2 in 5.20 seconds (1.99 pure XGBoost training time).
== Status ==
Current time: 2022-10-05 15:41:22 (running for 00:00:11.34)
Memory usage on this node: 24.6/64.0 GiB
Using FIFO scheduling algorithm.
Resources requested: 0/16 CPUs, 0/0 GPUs, 0.0/37.03 GiB heap, 0.0/2.0 GiB objects
Result logdir: /Users/rliaw/ray_results/XGBoostTrainer_2022-10-05_15-41-10
Number of trials: 2/2 (1 ERROR, 1 TERMINATED)
+----------------------------+------------+-----------------+--------------+--------------------+--------------------+----------------------+------------------------+--------+------------------+-----------------+---------------+
| Trial name                 | status     | loc             |   params/eta |   params/max_depth |   params/subsample | preprocessor         |   scaling_config/num_w |   iter |   total time (s) |   train-logloss |   train-error |
|                            |            |                 |              |                    |                    |                      |                 orkers |        |                  |                 |               |
|----------------------------+------------+-----------------+--------------+--------------------+--------------------+----------------------+------------------------+--------+------------------+-----------------+---------------|
| XGBoostTrainer_cf40b_00001 | TERMINATED | 127.0.0.1:33831 |  0.000153556 |                  1 |           0.722645 | MinMaxScaler(co_e790 |                      1 |     11 |          5.26433 |        0.693147 |           0.5 |
| XGBoostTrainer_cf40b_00000 | ERROR      | 127.0.0.1:33826 |  0.0470782   |                  4 |           0.511666 | StandardScaler(_ee50 |                      1 |        |                  |                 |               |
+----------------------------+------------+-----------------+--------------+--------------------+--------------------+----------------------+------------------------+--------+------------------+-----------------+---------------+
Number of errored trials: 1
+----------------------------+--------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Trial name                 |   # failures | error file                                                                                                                                                                                                           |
|----------------------------+--------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| XGBoostTrainer_cf40b_00000 |            1 | /Users/rliaw/ray_results/XGBoostTrainer_2022-10-05_15-41-10/XGBoostTrainer_cf40b_00000_0_eta=0.0471,max_depth=4,subsample=0.5117,preprocessor=StandardScaler_columns_X_Y,num_workers=1_2022-10-05_15-41-11/error.txt |
+----------------------------+--------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

2022-10-05 15:41:22,341	ERROR tune.py:758 -- Trials did not complete: [XGBoostTrainer_cf40b_00000]
2022-10-05 15:41:22,341	INFO tune.py:763 -- Total run time: 11.56 seconds (11.33 seconds for the tuning loop).

Versions / Dependencies

Python 3.9, macOS, Ray master (nightly wheel)

absl-py==0.15.0
aiobotocore==1.2.2
aiohttp==3.7.4.post0
aiohttp-cors==0.7.0
aiohttp-middlewares==1.1.0
aioitertools==0.7.1
aioredis==1.3.1
aiorwlock==1.3.0
aiosignal==1.2.0
alabaster==0.7.12
alembic==1.4.1
altair==4.2.0
analytics-python==1.4.0
antlr4-python3-runtime==4.8
anyscale==0.5.33
appdirs==1.4.4
appnope==0.1.0
asciimatics==1.12.0
asgiref==3.5.2
asn1crypto==1.2.0
astunparse==1.6.3
async-timeout==3.0.1
atari-py==0.2.6
attrs==19.3.0
autograd==1.4
autopep8==1.5.4
aws-parallelcluster==2.11.1
aws-sam-translator==1.26.0
aws-xray-sdk==2.6.0
ax-platform==0.1.14
azure-core==1.6.0
azure-storage-blob==12.3.2
Babel==2.8.0
backcall==0.2.0
bayesian-optimization==1.2.0
bcrypt==3.2.2
beautifulsoup4==4.8.2
better-exceptions==0.3.3
black==19.10b0
bleach==3.1.5
blessed==1.19.1
blessings==1.7
blis==0.7.5
blist==1.3.6
bokeh==2.2.1
boto==2.49.0
boto3==1.16.52
botocore==1.19.52
botorch==0.3.1
Box2D==2.3.10
cached-property==1.5.2
cachetools==4.1.0
catalogue==1.0.0
catboost==1.0.6
certifi==2021.10.8
cffi==1.15.1
cfn-lint==0.35.0
chardet @ file:///opt/concourse/worker/volumes/live/92aa0fea-cec8-43da-4d77-2574e78c5981/volume/chardet_1605303181342/work
clang==5.0
-e git+https://github.com/clearbit/clearbit-python.git@a810c34f401376d063e26a4f0b007255b44d8489#egg=clearbit
click==7.1.1
cliff==3.5.0
cloudpickle==1.3.0
cma==2.7.0
cmaes==0.7.0
cmd2==1.4.0
colorama==0.4.3
colorful==0.5.4
colorlog==4.1.0
commonmark==0.9.1
conda==4.8.2
conda-pack==0.6.0
conda-package-handling==1.6.0
configparser==5.0.0
ConfigSpace==0.4.16
criticality-score==1.0.7
cryptography @ file:///opt/concourse/worker/volumes/live/dfab5924-fd99-4f24-7939-7dbf6f1585f3/volume/cryptography_1639414584824/work
cycler==0.10.0
cymem==2.0.6
Cython==0.29
dask==2.20.0
dask-glm==0.2.0
dask-ml==1.5.0
databricks-cli==0.11.0
dataclasses==0.6
datasets==1.1.2
decorator==4.4.2
defusedxml==0.6.0
Deprecated==1.2.10
-e git+https://github.com/huggingface/diffusers@28f730520ef341dbd2125eb3e248bebaf6830514#egg=diffusers
dill==0.3.2
discourse==0.1.2
distlib==0.3.4
distributed==2.20.0
dm-tree==0.1.5
docker==4.2.2
docker-pycreds==0.4.0
docopt==0.6.2
docutils==0.15.2
dragonfly-opt==0.1.5
ecdsa==0.14.1
en-core-web-sm @ https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.3.1/en_core_web_sm-2.3.1.tar.gz
entrypoints==0.3
et-xmlfile==1.0.1
expiringdict==1.2.1
fastapi==0.63.0
feather-format==0.4.1
ffmpy==0.3.0
filelock==3.7.1
flake8==3.7.7
flake8-comprehensions==3.2.3
flake8-quotes==3.2.0
flaky==3.7.0
Flask==1.1.2
flatbuffers==1.12
frozenlist==1.3.0
fsspec==0.8.4
ftfy==6.1.1
funcsigs==1.0.2
future==0.18.2
gast==0.4.0
gensim==3.8.3
geocoder==1.38.1
gitdb==4.0.5
GitPython==3.1.2
glob2==0.7
google==2.0.3
google-api-core==1.22.0
google-api-python-client==1.10.0
google-auth==1.20.0
google-auth-httplib2==0.0.4
google-auth-oauthlib==0.4.1
google-pasta==0.2.0
googleapis-common-protos==1.52.0
gorilla==0.3.0
gpustat==1.0.0
GPy==1.9.9
gpytorch==1.7.0
gql==0.2.0
gradio==3.3.1
graphql-core==1.1
graphviz==0.8.4
greenlet==1.1.2
grip==4.5.2
grpcio==1.43.0
gunicorn==20.0.4
gym==0.17.2
gym-minigrid==1.0.1
h11==0.9.0
h5py==3.1.0
halo==0.0.31
HeapDict==1.0.1
HEBO==0.3.2
hiredis==1.1.0
horovod==0.21.1
hpbandster==0.7.4
httplib2==0.18.1
httptools==0.1.1
httpx==1.0.0b0
huggingface-hub==0.9.1
hydra-colorlog==0.1.4
hydra-core==0.11.3
hyperopt==0.2.4
idna==2.8
imageio==2.9.0
imagesize==1.2.0
importlab==0.5.1
importlib-metadata==3.3.0
ipaddress==1.0.23
ipdb==0.13.3
ipykernel==5.3.0
ipython==7.16.1
ipython-genutils==0.2.0
ipywidgets==7.5.1
isodate==0.6.0
itsdangerous==1.1.0
jdcal==1.4.1
jedi==0.17.1
Jinja2==2.11.2
jmespath==0.10.0
joblib==0.15.1
json5==0.9.5
jsondiff==1.1.2
jsonpatch==1.25
jsonpickle==1.4.1
jsonpointer==2.0
jsonschema==3.2.0
junit-xml==1.9
jupyter==1.0.0
jupyter-client==6.1.5
jupyter-console==6.1.0
jupyter-core==4.6.3
jupyterlab==2.2.6
jupyterlab-server==1.2.0
kaggle==1.5.10
keras==2.6.0
Keras-Preprocessing==1.1.2
keyring==21.5.0
kiwisolver==1.2.0
kubernetes==11.0.0
libclang==14.0.1
lightgbm==3.1.1
lightgbm-ray==0.1.5
llvmlite==0.33.0
locket==0.2.0
log-symbols==0.0.14
-e git+https://github.com/uber/ludwig/@82ade15e35f3f28f7a18afe8eed75c3dc5da10ab#egg=ludwig
lux==0.5.1
lux-api==0.2.3
lux-widget @ file:///Users/rliaw/miniconda3/share/jupyter/lab/staging/node_modules/luxwidget
lxml==4.5.2
lz4==3.1.0
Mako==1.1.3
Markdown==3.2.2
markdown-it-py==2.1.0
MarkupSafe==1.1.1
matplotlib==3.2.2
mccabe==0.6.1
mistune==0.8.4
mlflow==1.12.1
mock==4.0.2
more-itertools==8.2.0
moto==1.3.14
msgpack==1.0.0
msrest==0.6.17
multidict==4.7.5
multipledispatch==0.6.0
multiprocess==0.70.10
murmurhash==1.0.6
mxnet==1.6.0
mypy==0.782
mypy-extensions==0.4.3
nbconvert==5.6.1
nbformat==5.0.7
netifaces==0.10.9
networkx==2.4
nevergrad==0.4.1.post4
nlp==0.4.0
nltk==3.6.5
notebook==6.0.3
numba==0.50.1
numexpr==2.7.1
numpy==1.21.6
nvidia-ml-py==11.495.46
nvidia-ml-py3==7.352.0
oauth2client==4.1.3
oauthlib==3.1.0
omegaconf==2.1.0
opencensus==0.7.10
opencensus-context==0.1.1
opencv-python==4.2.0.34
opencv-python-headless==4.3.0.36
openpyxl==3.0.5
opt-einsum==3.3.0
optuna==2.3.0
orjson==3.8.0
packaging==21.3
pandas==1.3.5
pandocfilters==1.4.2
parameterized==0.7.4
paramiko==2.11.0
paramz==0.9.5
parso==0.7.0
partd==1.1.0
path-and-address==2.0.1
pathlib==1.0.1
pathspec==0.8.1
pathtools==0.1.2
patsy==0.5.1
pbr==5.4.5
PettingZoo==1.4.2
pexpect==4.8.0
pickle5==0.0.11
pickleshare==0.7.5
Pillow==7.1.2
pipdeptree==1.0.0
pkginfo==1.6.1
plac==1.1.3
platformdirs==2.5.2
plotly==4.9.0
pluggy==0.13.1
present==0.6.0
preshed==3.0.6
prettytable==0.7.2
prometheus-client==0.8.0
prometheus-flask-exporter==0.14.1
promise==2.3
prompt-toolkit==3.0.5
protobuf==3.15.6
psutil==5.7.0
psycopg2==2.7.7
ptpython==3.0.3
ptyprocess==0.6.0
pulsar==2.0.2
pulsar-odm==0.7.0
py==1.8.1
py-spy==0.3.4
pyaml==20.4.0
pyarrow==6.0.1
pyasn1==0.4.8
pyasn1-modules==0.2.8
pybuildkite==1.2.1
pycodestyle==2.6.0
pycosat==0.6.3
pycparser @ file:///tmp/build/80754af9/pycparser_1636541352034/work
pycryptodome==3.15.0
pydantic==1.9.1
pydeps==1.9.7
pydub==0.25.1
pyfiglet==0.8.post1
pyflakes==2.1.1
pyflyby==1.6.8
PyFunctional==1.3.0
PyGithub==1.54.1
pyglet==1.5.0
Pygments==2.6.1
PyJWT==1.7.1
pymongo==3.10.1
pymoo==0.5.0
PyNaCl==1.4.0
pyOpenSSL @ file:///tmp/build/80754af9/pyopenssl_1635333100036/work
pyparsing==2.4.6
pyperclip==1.8.1
Pyro4==4.80
pyrsistent==0.15.7
PySocks @ file:///opt/concourse/worker/volumes/live/ef943889-94fc-4539-798d-461c60b77804/volume/pysocks_1605305801690/work
pytest==5.4.3
pytest-asyncio==0.14.0
pytest-remotedata==0.3.2
pytest-rerunfailures==9.1.1
pytest-sugar==0.9.4
pytest-timeout==1.4.1
python-dateutil==2.8.1
python-editor==1.0.4
python-gitlab==2.5.0
python-jose==3.2.0
python-multipart==0.0.5
python-slugify==4.0.1
pytorch-lightning==1.0.2
pytorch-lightning-bolts==0.2.5
pytorch-tabnet==3.1.1
pytz==2020.1
PyWavelets==1.1.1
pywren==0.4.0
PyYAML==5.4.1
pyzmq==19.0.1
qtconsole==4.7.5
QtPy==1.9.0
querystring-parser==1.2.4
ratelim==0.1.6
ray @ https://s3-us-west-2.amazonaws.com/ray-wheels/latest/ray-3.0.0.dev0-cp37-cp37m-macosx_10_15_intel.whl
-e git+https://github.com/anyscale/hackathon-2022-automl@801febef4234e726a41e41d714616eb1d87af8c6#egg=ray_automl
-e git+https://github.com/ray-project/ray-sklearn@0dee83b2188330d1100d5000d1595c9252d75129#egg=ray_sklearn
readme-renderer==28.0
recommonmark==0.6.0
redis==3.5.3
regex==2021.11.10
requests==2.22.0
requests-oauthlib==1.3.0
requests-toolbelt==0.9.1
responses==0.10.16
retrying==1.3.3
rfc3986==1.4.0
rich==12.5.1
rsa==3.4.2
ruamel-yaml-conda @ file:///opt/concourse/worker/volumes/live/da6f10aa-e617-4894-45a9-cfdf5da681c3/volume/ruamel_yaml_1616016690897/work
s3transfer==0.3.3
sacremoses==0.0.43
scikit-image==0.17.2
scikit-learn==0.24.0
scikit-optimize==0.8.1
scipy==1.4.1
seaborn==0.11.2
Send2Trash==1.5.0
sentencepiece==0.1.91
sentry-sdk==0.14.4
serpent==1.30.2
shellingham==1.5.0
shortuuid==1.0.1
sigopt==5.7.0
six==1.15.0
sklearn==0.0
skorch==0.9.0
smart-open==2.0.0
smmap==3.0.4
snowballstemmer==2.0.0
sortedcontainers==2.2.2
soupsieve==2.0
spacy==2.3.7
Sphinx==3.0.4
sphinx-click==2.3.2
sphinx-copybutton==0.2.12
sphinx-gallery==0.7.0
sphinx-jsonschema==1.15
sphinx-rtd-theme==0.5.0
sphinx-version-warning==1.1.2
sphinxcontrib-applehelp==1.0.2
sphinxcontrib-devhelp==1.0.2
sphinxcontrib-htmlhelp==1.0.3
sphinxcontrib-jsmath==1.0.1
sphinxcontrib-qthelp==1.0.3
sphinxcontrib-serializinghtml==1.1.4
spinners==0.0.24
SQLAlchemy==1.3.13
sqlparse==0.3.1
srsly==1.0.5
sshpubkeys==3.1.0
sshtunnel==0.4.0
starlette==0.13.6
statsmodels==0.11.1
stdlib-list==0.7.0
stevedore==3.3.0
subprocess32==3.5.4
tables==3.6.1
tabulate==0.8.7
tblib==1.6.0
tensorboard==2.6.0
tensorboard-data-server==0.6.1
tensorboard-plugin-wit==1.6.0.post3
tensorboardX==2.0
tensorflow==2.6.2
tensorflow-estimator==2.6.0
tensorflow-io-gcs-filesystem==0.26.0
tensorflow-probability==0.10.0
termcolor==1.1.0
terminado==0.8.3
testfixtures==6.14.1
testpath==0.4.4
text-unidecode==1.3
texthero @ git+https://github.com/jbesomi/texthero.git@caf87e3038598f8038b37b560f5186414f4974e7
tfa-nightly==0.12.0.dev20200820045606
thinc==7.4.5
threadpoolctl==2.1.0
thrift==0.13.0
tifffile==2020.11.18
timm==0.1.30
tokenizers==0.12.1
toml==0.10.1
toolz==0.10.0
torch==1.12.0
torchvision==0.8.2
tornado==6.0.4
tqdm @ file:///tmp/build/80754af9/tqdm_1635330843403/work
traitlets==4.3.3
transformers==4.21.2
-e git+https://github.com/ray-project/tune-sklearn/@ba9949140d79b36b9bd63e95ab2d9809052ff6f0#egg=tune_sklearn
twine==3.2.0
typed-ast==1.4.1
typeguard==2.10.0
typer==0.4.1
typing-extensions==3.7.4.3
Unidecode==1.3.2
uritemplate==3.0.1
urllib3==1.25.11
uvicorn==0.16.0
uvloop==0.14.0
virtualenv==20.14.1
wandb==0.9.6
wasabi==0.9.0
watchdog==0.10.2
watchtower==1.0.0
wcwidth==0.2.5
webencodings==0.5.1
websocket-client==0.57.0
websockets==8.1
Werkzeug==1.0.1
widgetsnbextension==3.5.1
wordcloud==1.8.1
wrapt==1.12.1
xgboost==1.3.0.post0
xgboost-ray @ git+http://github.com/ray-project/xgboost_ray.git@a84a446562bd1fafe54bee9797b6b04abae9a916
xlrd==1.2.0
xlwt==1.3.0
xmltodict==0.12.0
xxhash==2.0.0
yapf==0.23.0
yarl==1.4.2
yaspin==1.0.0
zict==2.0.0
zipp==3.1.0
zoopt==0.4.1

Reproduction script

import ray

from ray.air.config import ScalingConfig
from ray import tune
from ray.data.preprocessors import StandardScaler, MinMaxScaler


dataset = ray.data.from_items(
    [{"X": 1.0, "Y": 2.0}, {"X": 4.0, "Y": 0.0}]
)
prep_v1 = StandardScaler(columns=["X", "Y"])
prep_v2 = MinMaxScaler(columns=["X", "Y"])

param_space = {
    "scaling_config": ScalingConfig(
        num_workers=tune.grid_search([1]),
        resources_per_worker={
            "CPU": 2,
            "GPU": 0,
        },
    ),
    "preprocessor": tune.grid_search([prep_v1, prep_v2]),
    "params": {
        "objective": "binary:logistic",
        "tree_method": "approx",
        "eval_metric": ["logloss", "error"],
        "eta": tune.loguniform(1e-4, 1e-1),
        "subsample": tune.uniform(0.5, 1.0),
        "max_depth": tune.randint(1, 9),
    },
}

from ray.train.xgboost import XGBoostTrainer
from ray.air.config import RunConfig
from ray.tune import Tuner


trainer = XGBoostTrainer(
    params={},
    run_config=RunConfig(verbose=2),
    preprocessor=None,
    scaling_config=None,
    label_column="Y",
    datasets={"train": dataset}
)

tuner = Tuner(
    trainer,
    param_space=param_space,
)

results = tuner.fit()
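
A minimal sketch (not part of the original repro) of how the failed trial's error object could be inspected after the run, assuming this Ray version's `ResultGrid` and `Result` expose per-trial errors via an `error` attribute:

# Hypothetical continuation of the script above (not in the original report):
# surface whatever error Tune recorded for each trial.
for i in range(len(results)):
    result = results[i]
    if result.error:
        print(f"trial {i} failed with: {result.error!r}")
        # The full traceback is also written to error.txt in the trial's
        # log directory (the path shown in the status output above).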

Issue Severity

High: It blocks me from completing my task.

richardliaw added the bug, release-blocker, and P0 labels on Oct 5, 2022
richardliaw added this to the Ray 2.1 milestone on Oct 5, 2022
matthewdeng added the air label on Oct 5, 2022
krfricke added a commit that referenced this issue on Oct 7, 2022
We currently raise `skip_exceptions(e) from None` to reduce the stacktrace output of failing functions. However, in Python this means the exception context is swallowed completely: even if `skip_exceptions(e)` returns an exception that carries a context, the `from None` takes precedence.
The solution here is to extract the cause manually from the exception returned by `skip_exceptions(e)` and re-raise from that context. The tests still pass (so for regular cases the traceback remains compact), but the repro script in #29097 now reveals the actual cause of the error.

Signed-off-by: Kai Fricke <kai@anyscale.com>
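
The behaviour the commit describes is plain Python exception chaining. A minimal sketch for illustration — `train_step` and this `skip_exceptions` body are stand-ins, not Ray Tune's actual implementation:

# Illustration of how `from None` swallows the root cause, and how
# re-raising from the extracted cause restores it.

def skip_exceptions(exc):
    # Stand-in helper: wrap the original error in a shorter exception
    # that carries it as its cause.
    wrapped = RuntimeError("A Ray actor died during training")
    wrapped.__cause__ = exc
    return wrapped

def train_step():
    raise ValueError("the actual root cause raised inside the training function")

def run_before_fix():
    # `from None` takes precedence over the cause set inside
    # skip_exceptions, so the ValueError never appears in the traceback.
    try:
        train_step()
    except Exception as e:
        raise skip_exceptions(e) from None

def run_after_fix():
    # Extract the cause manually and raise from it, so the traceback
    # shows "The above exception was the direct cause of ...".
    try:
        train_step()
    except Exception as e:
        wrapped = skip_exceptions(e)
        raise wrapped from wrapped.__cause__

Calling `run_before_fix()` prints only the RuntimeError, mirroring the swallowed error in the repro script above; `run_after_fix()` also prints the ValueError as its direct cause, which is the behaviour this fix restores.
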
WeichenXu123 pushed a commit to WeichenXu123/ray that referenced this issue Dec 19, 2022
…29143)
