-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Fix remote reuse bugs #2981
Fix remote reuse bugs #2981
Changes from 44 commits
5982baa
5653624
e5ee726
1cb2349
e53d4ec
99b0c08
fe60ba5
52aa6ad
533c504
8f024fa
6a96403
7afab04
13bb9f2
6e1d822
5a91061
4faeb06
21aa3b5
ffa8c6b
4c49f60
1057ad2
0a52185
8b20632
f07f3f4
939b1d4
1f78669
7b7cadc
96d41e4
04f5645
beafde2
c3c6135
c0674dd
0e1bb25
4a6e1d4
9ed3cf4
4e57606
257627f
d2967e5
9880f79
bdbfa6e
0dd7c97
5cd22f0
7e54be7
4dc2a41
5cc6898
620e4ec
fbf758a
7c80d45
706b1b0
cb69b81
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -137,10 +137,13 @@ def is_running(self): | |
def kill(self, trial_id=None): | ||
if trial_id == self.id or trial_id is None: | ||
if self.process is not None: | ||
nni_log(LogType.Info, "%s: killing trial" % self.name) | ||
for child in psutil.Process(self.process.pid).children(True): | ||
child.kill() | ||
self.process.kill() | ||
try: | ||
nni_log(LogType.Info, "%s: killing trial" % self.name) | ||
for child in psutil.Process(self.process.pid).children(True): | ||
child.kill() | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If there is try catch, it's better to try catch for each kill. So that one fail won't effect others. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This error is used to catch psutil.Process(), not child kill. In some kind of scene, trial has already exited, and kill() command is sent later. Will throw process not exist error. |
||
self.process.kill() | ||
except Exception as ex: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can we catch a more specific exception? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ok, add a NoSuchProcess. |
||
nni_log(LogType.Error, "kill trial %s failed: %s " % (trial_id, str(ex))) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It likes a clean up, don't need error level log. debug or info is enough. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. psutil.NoSuchProcess is expected exit issue, use info level. For other kinds of unexpected issue, I think use error level is better. |
||
self.cleanup() | ||
|
||
def cleanup(self): | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need to release environment resource here, why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a case that environment is submitted, but it starts slowly and hasn't start process and create pid file, the system call refresh function to read pid file, it will cause
no such file
exception.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, it looks you don't need to check this file. Check if environment.isRunnerReady, then check the file. It will depend on first initialized message. And you set env status to running is too earlier. In remote, it's better to wait isRunnerReady first, then check file status, and set to running, success or failed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, if the environment is failed to start, the isRunnerReady will always be false, but we need to refresh env status to failed here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do you know it failed to start? You may can wait the pid file when initializing, instead of set env to running directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check process return code to detect if env is failed to start. Added detecting logic for pid file exist.