MedianstopAssessor silently crashes dispatcher when reporting multiple metrics #2117

arvoelke · 2020-03-03T23:25:26Z

Short summary about the issue/question:

When reporting multiple metrics, e.g.,

nni.report_intermediate_result({'default': loss, 'other_metric': some_other_value})

the MedianstopAssessor raises an error that ends up silently crashing the dispatcher, which makes the WebUI unresponsive and stops the experiment from making any progress. See dispatcher.log:

[03/03/2020, 05:34:16 PM] WARNING (medianstop_Assessor/Thread-2) incorrect data type or value:
[03/03/2020, 05:34:16 PM] ERROR (medianstop_Assessor/Thread-2) float() argument must be a string or a number, not 'collections.OrderedDict'
Traceback (most recent call last):
  File "/home/arvoelke/anaconda3/envs/*/lib/python3.7/site-packages/nni/medianstop_assessor/medianstop_assessor.py", line 95, in assess_trial
    num_trial_history = [float(ele) for ele in trial_history]
  File "/home/arvoelke/anaconda3/envs/*/lib/python3.7/site-packages/nni/medianstop_assessor/medianstop_assessor.py", line 95, in <listcomp>
    num_trial_history = [float(ele) for ele in trial_history]
TypeError: float() argument must be a string or a number, not 'collections.OrderedDict'
[03/03/2020, 05:34:16 PM] ERROR (nni.msg_dispatcher/Thread-2) Assessor error
[03/03/2020, 05:34:16 PM] ERROR (nni.msg_dispatcher/Thread-2) local variable 'num_trial_history' referenced before assignment
Traceback (most recent call last):
  File "/home/arvoelke/anaconda3/envs/*/lib/python3.7/site-packages/nni/msg_dispatcher.py", line 206, in _handle_intermediate_metric_data
    result = self.assessor.assess_trial(trial_job_id, ordered_history)
  File "/home/arvoelke/anaconda3/envs/*/lib/python3.7/site-packages/nni/medianstop_assessor/medianstop_assessor.py", line 103, in assess_trial
    self._update_data(trial_job_id, num_trial_history)
UnboundLocalError: local variable 'num_trial_history' referenced before assignment
[03/03/2020, 05:34:16 PM] ERROR (nni.msg_dispatcher_base/Thread-2) local variable 'result' referenced before assignment
Traceback (most recent call last):
  File "/home/arvoelke/anaconda3/envs/*/lib/python3.7/site-packages/nni/msg_dispatcher_base.py", line 90, in command_queue_worker
    self.process_command(command, data)
  File "/home/arvoelke/anaconda3/envs/*/lib/python3.7/site-packages/nni/msg_dispatcher_base.py", line 149, in process_command
    command_handlers[command](data)
  File "/home/arvoelke/anaconda3/envs/*/lib/python3.7/site-packages/nni/msg_dispatcher.py", line 139, in handle_report_metric_data
    self._handle_intermediate_metric_data(data)
  File "/home/arvoelke/anaconda3/envs/*/lib/python3.7/site-packages/nni/msg_dispatcher.py", line 211, in _handle_intermediate_metric_data
    if isinstance(result, bool):
UnboundLocalError: local variable 'result' referenced before assignment
[03/03/2020, 05:34:17 PM] INFO (nni.msg_dispatcher_base/MainThread) Dispatcher exiting...
[03/03/2020, 05:34:20 PM] INFO (nni.msg_dispatcher_base/MainThread) Terminated by NNI manager
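The log shows a cascade. The `float()` failure is caught and logged ("incorrect data type or value:"), but execution continues, so the next statement reads a local that was never assigned and raises `UnboundLocalError`. The dispatcher then repeats the same pattern with `result`. The sketch below reproduces that pattern in plain Python, without NNI. It is a simplified illustration inferred from the traceback, not NNI's actual source, and the function names `assess_logged_and_continue` and `assess_fail_fast` are hypothetical.

```python
import logging
from collections import OrderedDict

logging.basicConfig(level=logging.WARNING)
logger = logging.getLogger("medianstop_sketch")


def assess_logged_and_continue(trial_history):
    """Mimics the pattern in the log: the except branch only logs."""
    try:
        num_trial_history = [float(ele) for ele in trial_history]
    except (TypeError, ValueError) as error:
        logger.warning("incorrect data type or value:")
        logger.error(error)
    # If float() failed above, num_trial_history was never bound, so this
    # line raises UnboundLocalError and hides the original TypeError.
    return len(num_trial_history)


def assess_fail_fast(trial_history):
    """Alternative: no catch-and-continue, so the TypeError reaches the caller."""
    num_trial_history = [float(ele) for ele in trial_history]
    return len(num_trial_history)


# One intermediate result reported as a dict, as in this issue.
history = [OrderedDict(default=0.5, other_metric=1.0)]

for assess in (assess_logged_and_continue, assess_fail_fast):
    try:
        assess(history)
    except (TypeError, UnboundLocalError) as error:
        print(assess.__name__, "->", type(error).__name__)
# assess_logged_and_continue -> UnboundLocalError
# assess_fail_fast -> TypeError
```

Letting the exception propagate keeps the real cause (the `TypeError`) at the point of failure instead of a misleading follow-up error; how NNI's dispatcher should handle assessor failures is a separate design decision that this sketch does not settle.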

How to reproduce it:

Call nni.report_intermediate_result({'default': loss, 'other_metric': some_other_value}) while using the MedianstopAssessor.

nni Environment:

nni version: master
nni mode(local|pai|remote): local
OS: Ubuntu 18.04
python version: Python 3.7.6
is conda or virtualenv used?: conda
is running in docker?: no

Anything else we need to know:

The root of the issue is line 95 of nni/src/sdk/pynni/nni/medianstop_assessor/medianstop_assessor.py (at commit 9987014):

num_trial_history = [float(ele) for ele in trial_history]
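Line 95 assumes every entry of `trial_history` is a number, but with dict metrics each entry arrives as an `OrderedDict` (as the traceback shows), so `float()` fails. Below is a minimal sketch of a conversion that accepts both forms, assuming the scalar the assessor should use is stored under the `'default'` key used in the report call above. `to_scalar_history` is a hypothetical helper, not an NNI API.

```python
from collections import OrderedDict


def to_scalar_history(trial_history, scalar_key="default"):
    """Convert reported intermediate results to floats.

    Hypothetical helper: dict entries (OrderedDict is a dict subclass)
    contribute their `scalar_key` value; anything else goes straight to
    float(), matching the original list comprehension.
    """
    scalars = []
    for ele in trial_history:
        if isinstance(ele, dict):
            ele = ele[scalar_key]  # KeyError if the key was not reported
        scalars.append(float(ele))
    return scalars


history = [0.9, OrderedDict(default=0.7, other_metric=12.0)]
print(to_scalar_history(history))  # [0.9, 0.7]
```

Used in place of the list comprehension in the `FixedMedianstopAssessor` work-around, a helper like this would also keep working if some trials report plain numbers instead of dicts.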

My work-around is to create a file called assessor.py that contains:

from nni.medianstop_assessor import MedianstopAssessor


class FixedMedianstopAssessor(MedianstopAssessor):

    def assess_trial(self, trial_job_id, trial_history):
        trial_history = [float(ele['default']) for ele in trial_history]
        return super().assess_trial(trial_job_id, trial_history)

and then change the assessor in the experiment config to:

assessor:
  codeDir: .
  classFileName: assessor.py
  className: FixedMedianstopAssessor
  classArgs:
    optimize_mode: ...


QuanluZhang · 2020-03-04T01:03:24Z

@arvoelke thanks for reporting this issue. It seems the built-in assessors do not support dict metrics yet. We will fix this problem very soon.

arvoelke · 2020-03-04T23:52:34Z

Great, thanks. I'm also not sure whether this is a separate issue, but there appears to be some interaction with the tuner's includeIntermediateResults: true setting. My work-around is fine when it is false, but when it is true, an early stop triggers the following in dispatcher.log:

ERROR (nni.msg_dispatcher_base/Thread-1) The input was of non-string type "<class 'collections.OrderedDict'>" in `json_tricks.load(s)`. Bytes cannot be automatically decoding since the encoding is not known. Recommended way is to instead encode the bytes to a string and pass that string to `load(s)`, for example bytevar.encode("utf-8") if utf-8 is the encoding. Alternatively you can force an attempt by passing conv_str_byte=True, but this may cause decoding issues.
Traceback (most recent call last):
  File "/home/arvoelke/anaconda3/envs/*i/lib/python3.7/site-packages/nni/msg_dispatcher_base.py", line 90, in command_queue_worker
    self.process_command(command, data)
  File "/home/arvoelke/anaconda3/envs/*/lib/python3.7/site-packages/nni/msg_dispatcher_base.py", line 149, in process_command
    command_handlers[command](data)
  File "/home/arvoelke/anaconda3/envs/*/lib/python3.7/site-packages/nni/msg_dispatcher.py", line 134, in handle_report_metric_data
    data['value'] = json_tricks.loads(data['value'])
  File "/home/arvoelke/anaconda3/envs/*/lib/python3.7/site-packages/json_tricks/nonp.py", line 210, in loads
    .format(type(string)))
TypeError: The input was of non-string type "<class 'collections.OrderedDict'>" in `json_tricks.load(s)`. Bytes cannot be automatically decoding since the encoding is not known. Recommended way is to instead encode the bytes to a string and pass that string to `load(s)`, for example bytevar.encode("utf-8") if utf-8 is the encoding. Alternatively you can force an attempt by passing conv_str_byte=True, but this may cause decoding issues.

I can't find the root of the error in the logs. However, it is triggered at the exact same time that nni.report_intermediate_result({'default': loss, 'other_metric': some_other_value}) is called and an early stop occurs (according to trial logs). For now I'll revert to using includeIntermediateResults: false since I'd like to be able to monitor these additional intermediate metrics.
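The `TypeError` says `json_tricks.loads` received an `OrderedDict` rather than a string, which suggests the metric value had already been decoded once before `handle_report_metric_data` tried to decode it again. As a sketch of the general defensive pattern only (not how #2121 actually fixed it), decoding can be made idempotent by checking the type first. This sketch swaps in the standard-library `json` module for `json_tricks`, and `decode_metric_value` is a hypothetical name, not NNI code.

```python
import json
from collections import OrderedDict


def decode_metric_value(value):
    """Decode a metric value only if it is still a JSON string.

    Hypothetical helper: an already-decoded dict or number is returned
    unchanged, so calling it twice on the same value is safe.
    """
    if isinstance(value, (str, bytes)):
        return json.loads(value, object_pairs_hook=OrderedDict)
    return value


data = {"value": '{"default": 0.42, "other_metric": 7}'}
data["value"] = decode_metric_value(data["value"])  # first pass: str -> OrderedDict
data["value"] = decode_metric_value(data["value"])  # second pass: unchanged
print(data["value"]["default"])  # 0.42
```

The type check is what makes the function idempotent; it assumes a legitimately decoded metric is never itself a string.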

QuanluZhang · 2020-03-05T03:54:57Z

@arvoelke thanks for reporting this issue. This is a bug, and it is fixed in #2121.

QuanluZhang · 2020-03-06T13:37:07Z

Merged, closing.

arvoelke mentioned this issue Mar 3, 2020

Can I report multiple metrics to the tuner? #1183 (closed)

QuanluZhang self-assigned this Mar 4, 2020

QuanluZhang linked a pull request Mar 5, 2020 that will close this issue

make assessors support metric data in dict #2121 (merged)

QuanluZhang closed this as completed Mar 6, 2020

arvoelke mentioned this issue Mar 9, 2020

Reporting multiple metrics and then resuming crashes SMAC tuner #2140 (closed)

scarlett2018 added the "user raised", "Assessor", and "bug" labels Apr 14, 2020
