This repository has been archived by the owner on Sep 18, 2024. It is now read-only.

MedianstopAssessor silently crashes dispatcher when reporting multiple metrics #2117

Closed
arvoelke opened this issue Mar 3, 2020 · 4 comments · Fixed by #2121
Labels: bug, user raised

Comments

@arvoelke

arvoelke commented Mar 3, 2020

Short summary about the issue/question:

When reporting multiple metrics, e.g.,

nni.report_intermediate_result({'default': loss, 'other_metric': some_other_value})

the MedianstopAssessor raises an error, which ends up silently crashing the dispatcher (this makes the WebUI unresponsive and stops the experiment from making any progress). See dispatcher.log:

[03/03/2020, 05:34:16 PM] WARNING (medianstop_Assessor/Thread-2) incorrect data type or value:
[03/03/2020, 05:34:16 PM] ERROR (medianstop_Assessor/Thread-2) float() argument must be a string or a number, not 'collections.OrderedDict'
Traceback (most recent call last):
  File "/home/arvoelke/anaconda3/envs/*/lib/python3.7/site-packages/nni/medianstop_assessor/medianstop_assessor.py", line 95, in assess_trial
    num_trial_history = [float(ele) for ele in trial_history]
  File "/home/arvoelke/anaconda3/envs/*/lib/python3.7/site-packages/nni/medianstop_assessor/medianstop_assessor.py", line 95, in <listcomp>
    num_trial_history = [float(ele) for ele in trial_history]
TypeError: float() argument must be a string or a number, not 'collections.OrderedDict'
[03/03/2020, 05:34:16 PM] ERROR (nni.msg_dispatcher/Thread-2) Assessor error
[03/03/2020, 05:34:16 PM] ERROR (nni.msg_dispatcher/Thread-2) local variable 'num_trial_history' referenced before assignment
Traceback (most recent call last):
  File "/home/arvoelke/anaconda3/envs/*/lib/python3.7/site-packages/nni/msg_dispatcher.py", line 206, in _handle_intermediate_metric_data
    result = self.assessor.assess_trial(trial_job_id, ordered_history)
  File "/home/arvoelke/anaconda3/envs/*/lib/python3.7/site-packages/nni/medianstop_assessor/medianstop_assessor.py", line 103, in assess_trial
    self._update_data(trial_job_id, num_trial_history)
UnboundLocalError: local variable 'num_trial_history' referenced before assignment
[03/03/2020, 05:34:16 PM] ERROR (nni.msg_dispatcher_base/Thread-2) local variable 'result' referenced before assignment
Traceback (most recent call last):
  File "/home/arvoelke/anaconda3/envs/*/lib/python3.7/site-packages/nni/msg_dispatcher_base.py", line 90, in command_queue_worker
    self.process_command(command, data)
  File "/home/arvoelke/anaconda3/envs/*/lib/python3.7/site-packages/nni/msg_dispatcher_base.py", line 149, in process_command
    command_handlers[command](data)
  File "/home/arvoelke/anaconda3/envs/*/lib/python3.7/site-packages/nni/msg_dispatcher.py", line 139, in handle_report_metric_data
    self._handle_intermediate_metric_data(data)
  File "/home/arvoelke/anaconda3/envs/*/lib/python3.7/site-packages/nni/msg_dispatcher.py", line 211, in _handle_intermediate_metric_data
    if isinstance(result, bool):
UnboundLocalError: local variable 'result' referenced before assignment
[03/03/2020, 05:34:17 PM] INFO (nni.msg_dispatcher_base/MainThread) Dispatcher exiting...
[03/03/2020, 05:34:20 PM] INFO (nni.msg_dispatcher_base/MainThread) Terminated by NNI manager
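
The log above is consistent with a handler that catches the TypeError, logs it, and then goes on to use a variable that was never assigned. Below is a minimal sketch of that failure pattern. It is not the NNI source: `assess_trial_sketch` and `run_with_dict_metrics` are hypothetical names, and the assumption that the error is caught and execution continues is inferred only from the order of the log lines.

```python
import logging
from collections import OrderedDict

logger = logging.getLogger("medianstop_sketch")


def assess_trial_sketch(trial_history):
    """Hypothetical stand-in that mimics the pattern suggested by the traceback."""
    try:
        # Works for scalar metrics, raises TypeError for dict metrics.
        num_trial_history = [float(ele) for ele in trial_history]
    except (TypeError, ValueError) as error:
        # Assumed behaviour: the error is logged but execution continues.
        logger.warning("incorrect data type or value:")
        logger.error(error)
    # If the conversion failed, num_trial_history was never bound,
    # so this line raises UnboundLocalError.
    return len(num_trial_history)


def run_with_dict_metrics():
    """Feed a dict metric to the sketch and return the name of the exception raised."""
    history = [OrderedDict([("default", 0.5), ("other_metric", 1.0)])]
    try:
        assess_trial_sketch(history)
    except UnboundLocalError as error:
        return type(error).__name__
    return None
```

With scalar metrics the sketch returns normally; with a dict metric the first error is swallowed and a second, less informative one surfaces, which matches the chain of errors in the log.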

How to reproduce it:

Call nni.report_intermediate_result({'default': loss, 'other_metric': some_other_value}) while using the MedianstopAssessor.

nni Environment:

  • nni version: master
  • nni mode(local|pai|remote): local
  • OS: Ubuntu 18.04
  • python version: Python 3.7.6
  • is conda or virtualenv used?: conda
  • is running in docker?: no

Anything else we need to know:

The root of the issue is on this line:

num_trial_history = [float(ele) for ele in trial_history]

My work-around is to create a file called assessor.py that contains:

from nni.medianstop_assessor import MedianstopAssessor


class FixedMedianstopAssessor(MedianstopAssessor):

    def assess_trial(self, trial_job_id, trial_history):
        trial_history = [float(ele['default']) for ele in trial_history]
        return super().assess_trial(trial_job_id, trial_history)

and then change the assessor in the experiment config to:

assessor:
  codeDir: .
  classFileName: assessor.py
  className: FixedMedianstopAssessor
  classArgs:
    optimize_mode: ...
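
The work-around above assumes every intermediate result is a dict with a 'default' key. A slightly more tolerant variant, sketched here without any NNI imports, accepts both plain numbers and dict metrics. The helper names `extract_default_metric` and `to_numeric_history` are hypothetical and not part of NNI.

```python
from collections.abc import Mapping


def extract_default_metric(value, key="default"):
    """Return the metric stored under `key` for dict metrics, else float(value)."""
    if isinstance(value, Mapping):
        # Dict-style metric, e.g. {'default': loss, 'other_metric': ...}.
        return float(value[key])
    # Scalar metric (number or numeric string).
    return float(value)


def to_numeric_history(trial_history):
    """Convert a mixed history of scalars and dict metrics to a list of floats."""
    return [extract_default_metric(ele) for ele in trial_history]
```

In the subclass, `trial_history = to_numeric_history(trial_history)` could stand in for the list comprehension, so the same assessor would also work for trials that report a single number.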
@QuanluZhang
Contributor

@arvoelke thanks for reporting this issue. It seems the built-in assessors do not yet support dict metrics. We will fix this problem very soon.

@arvoelke
Author

arvoelke commented Mar 4, 2020

Great, thanks. Also not sure if this is a different issue, but there appears to be some interaction with the tuner's includeIntermediateResults: true setting. My work-around is fine when it is false, but when true an early stop triggers the following in dispatcher.log:

ERROR (nni.msg_dispatcher_base/Thread-1) The input was of non-string type "<class 'collections.OrderedDict'>" in `json_tricks.load(s)`. Bytes cannot be automatically decoding since the encoding is not known. Recommended way is to instead encode the bytes to a string and pass that string to `load(s)`, for example bytevar.encode("utf-8") if utf-8 is the encoding. Alternatively you can force an attempt by passing conv_str_byte=True, but this may cause decoding issues.
Traceback (most recent call last):
  File "/home/arvoelke/anaconda3/envs/*i/lib/python3.7/site-packages/nni/msg_dispatcher_base.py", line 90, in command_queue_worker
    self.process_command(command, data)
  File "/home/arvoelke/anaconda3/envs/*/lib/python3.7/site-packages/nni/msg_dispatcher_base.py", line 149, in process_command
    command_handlers[command](data)
  File "/home/arvoelke/anaconda3/envs/*/lib/python3.7/site-packages/nni/msg_dispatcher.py", line 134, in handle_report_metric_data
    data['value'] = json_tricks.loads(data['value'])
  File "/home/arvoelke/anaconda3/envs/*/lib/python3.7/site-packages/json_tricks/nonp.py", line 210, in loads
    .format(type(string)))
TypeError: The input was of non-string type "<class 'collections.OrderedDict'>" in `json_tricks.load(s)`. Bytes cannot be automatically decoding since the encoding is not known. Recommended way is to instead encode the bytes to a string and pass that string to `load(s)`, for example bytevar.encode("utf-8") if utf-8 is the encoding. Alternatively you can force an attempt by passing conv_str_byte=True, but this may cause decoding issues.

I can't find the root of the error in the logs. However, it is triggered at the exact same time that nni.report_intermediate_result({'default': loss, 'other_metric': some_other_value}) is called and an early stop occurs (according to trial logs). For now I'll revert to using includeIntermediateResults: false since I'd like to be able to monitor these additional intermediate metrics.
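
One possible reading of the traceback is that data['value'] had already been decoded into an OrderedDict by the time `json_tricks.loads` was called on it, so the same value was decoded twice. That is only a guess from the error message. The sketch below illustrates the idea of a decode step that is safe to repeat, using the standard-library `json` module as a stand-in for `json_tricks`; `decode_metric_value` and `decode_twice` are hypothetical helpers, not NNI functions.

```python
import json
from collections import OrderedDict


def decode_metric_value(value):
    """Decode `value` only if it is still a JSON string; otherwise return it unchanged."""
    if isinstance(value, str):
        return json.loads(value, object_pairs_hook=OrderedDict)
    # Already decoded (for example an OrderedDict): leave it alone.
    return value


def decode_twice(raw):
    """Apply the decode step twice to the same message, as a repeated handler might."""
    data = {"value": raw}
    data["value"] = decode_metric_value(data["value"])
    data["value"] = decode_metric_value(data["value"])
    return data["value"]
```

Without the `isinstance` check, the second call would pass an OrderedDict to the JSON loader and raise a TypeError, similar in spirit to the one in the log.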

QuanluZhang linked a pull request Mar 5, 2020 that will close this issue
@QuanluZhang
Contributor

@arvoelke thanks for reporting this issue. This is a bug, and it is fixed in #2121.

@QuanluZhang
Contributor

merged, close
