Skip to content
This repository has been archived by the owner on Sep 18, 2024. It is now read-only.

[nas] fix issue introduced by the trial recovery feature #5109

Merged
merged 92 commits into from
Oct 12, 2022

Conversation

QuanluZhang
Copy link
Contributor

@QuanluZhang QuanluZhang commented Sep 5, 2022

fix bugs in #4931

@QuanluZhang QuanluZhang marked this pull request as ready for review September 5, 2022 13:28
pass
previous_max_param_id = self.recover_parameter_id(data)
self.parameters_count = previous_max_param_id
self._advisor_initialized = True
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible that handld_add_customized_trial is never called and _advisor_initialized is never set to true?

Copy link
Contributor Author

@QuanluZhang QuanluZhang Sep 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point! handle_add_customized_trial is called when experiment is resumed (even no trial should be recovered), but will not be called when experiment is created, this is a bug introduced by me...
I moved this flag to handle_request_trial_jobs, which means if trial is not requested, send_trial will be blocked. And "request trial" will always be sent by nnimanager

@@ -41,7 +45,11 @@ def send_trial(parameters: dict, placement_constraint=None) -> int:
Send a new trial. Executed on tuner end.
Return a ID that is the unique identifier for this trial.
"""
return get_advisor().send_trial(parameters, placement_constraint)
advisor = get_advisor()
while not advisor.initialized:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest putting this into RetiariiAdvisor.send_trial.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

@ultmaster
Copy link
Contributor

ultmaster commented Sep 6, 2022

/azp run integration test - local - linux

@azure-pipelines
Copy link
Contributor

No pipelines are associated with this pull request.

1 similar comment
@azure-pipelines
Copy link
Contributor

No pipelines are associated with this pull request.

@ultmaster
Copy link
Contributor

/azp run integration test - remote - linux to linux

@azure-pipelines
Copy link
Contributor

Azure Pipelines could not run because the pipeline triggers exclude this branch/path.

@ultmaster
Copy link
Contributor

/azp run integration test - local - linux

@azure-pipelines
Copy link
Contributor

No pipelines are associated with this pull request.

@Louis-J Louis-J merged commit bcc640c into microsoft:master Oct 12, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants