This repository has been archived by the owner on Sep 18, 2024. It is now read-only.

WebUI displayed with no content #266

Closed
FalinLei opened this issue Oct 24, 2018 · 21 comments
Labels
bug Something isn't working user raised WebUI

Comments

@FalinLei

FalinLei commented Oct 24, 2018

My server runs Ubuntu 16.04, and I run an NGC TensorFlow Docker container on it.
After I started the experiment, it reported the WebUI URLs as http://10.10.62.8:8080 and http://127.0.0.1:8080. The first one did not respond, so I mapped port 8080 to port 1018 on my computer.
image
This is the output of the command "nnictl trial ls"; it shows that all trials succeeded.
image
But the WebUI shows no content.

@QuanluZhang
Contributor

In v0.2, NNI requires two ports: 51188 (REST server) and 8080 (WebUI). Mapping 8080 to 1018 lets you open the WebUI, but the WebUI then sends requests to the REST server through port 51188, so you also need to map the container's port 51188 to the host's port 51188.
In our upcoming v0.3, the REST server and the WebUI are merged and use a single port, 8080 (configurable).
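A minimal Python sketch of the v0.2 requirement above: both container ports must be published, and, per the comment, the REST server port is mapped to the same port number on the host. The helper names are hypothetical; only the `-p HOST:CONTAINER` flag syntax comes from `docker run`.

```python
from typing import Dict, List

# Container ports NNI v0.2 needs, per the comment above:
# 8080 for the WebUI and 51188 for the REST server.
NNI_V02_PORTS = {"webui": 8080, "restserver": 51188}


def docker_port_flags(host_to_container: Dict[int, int]) -> List[str]:
    """Hypothetical helper: turn {host_port: container_port} into `docker run` -p flags."""
    flags: List[str] = []
    for host_port, container_port in sorted(host_to_container.items()):
        flags += ["-p", f"{host_port}:{container_port}"]
    return flags


def missing_container_ports(host_to_container: Dict[int, int]) -> List[int]:
    """Return the required NNI v0.2 container ports that are not published to the host."""
    published = set(host_to_container.values())
    return sorted(p for p in NNI_V02_PORTS.values() if p not in published)


# Setup from the original report: only the WebUI port was mapped (container 8080 -> host 1018).
original = {1018: 8080}
# Setup suggested above: also publish the REST server port as host port 51188.
fixed = {1018: 8080, 51188: 51188}

print(missing_container_ports(original))
print(" ".join(docker_port_flags(fixed)))
```

With the original mapping, the check reports 51188 as unpublished, which matches the symptom of a WebUI that loads but shows no data.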

@FalinLei
Author

It works. Thank you!

@FalinLei
Author

I ran the example in the mnist-annotation folder. After all trials finished training, the log file contains test accuracy information.
image
However, in the "Trial Status" tab the "loss/accuracy" of the trials is blank,
image
and the Intermediate Result Graph is blank too.
image
The "Optimization Process" and "Hyper Parameter" tabs are also blank and show "no data".
image
image

@QuanluZhang
Contributor

QuanluZhang commented Oct 24, 2018

  1. Please provide NNI's log by running nnictl log stderr, so we can see whether there are any errors or warnings.
  2. Run nnictl trial ls to check whether the final result is displayed. If it is, the problem comes from the WebUI.
  3. Use your browser's developer tools (for example, Chrome's Inspect) to check whether any error occurs when loading the page.

@FalinLei
Author

image

@QuanluZhang
Contributor

There is no error in the backend, so please try steps 2 and 3 that I listed above.

@QuanluZhang
Contributor

Also, could you open http://yourIP:51188/api/v1/nni/trial-jobs to see whether the final results exist?
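The same check can be scripted. This is a rough Python sketch using only the standard library; it assumes the endpoint returns a JSON list of trial-job objects with an "id" field and a final-result field named "finalMetricData". Both field names are assumptions, so compare them against what your NNI version actually returns.

```python
import json
import urllib.request
from typing import Any, Dict, List

# Assumption: name of the field that holds a trial's final result.
# Verify it against the JSON your NNI version returns.
FINAL_KEY = "finalMetricData"


def fetch_trial_jobs(base_url: str, timeout: float = 10.0) -> List[Dict[str, Any]]:
    """GET <base_url>/api/v1/nni/trial-jobs and decode the JSON body."""
    url = base_url.rstrip("/") + "/api/v1/nni/trial-jobs"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def jobs_without_final_result(jobs: List[Dict[str, Any]]) -> List[str]:
    """Return the ids of jobs whose (assumed) final-result field is missing or empty."""
    return [job.get("id", "?") for job in jobs if not job.get(FINAL_KEY)]
```

Usage: `jobs_without_final_result(fetch_trial_jobs("http://yourIP:51188"))`, with yourIP replaced by the host address. An empty list means every trial reported a final result, which points the investigation at the WebUI rather than the backend.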

@FalinLei
Author

Thank you -_-

The command nnictl trial ls displays the results of all trials.
image

And there is one error when loading the page.
image

http://yourIP:51188/api/v1/nni/trial-jobs also shows that the final results exist.
image

@QuanluZhang
Contributor

QuanluZhang commented Oct 24, 2018

Thanks. Could you also paste the content of ~/nni/experiments/<EXPERIMENT-ID>/log/nnimanager.log and ~/nni/experiments/<EXPERIMENT-ID>/log/dispatcher.log?

@QuanluZhang
Contributor

QuanluZhang commented Oct 24, 2018

Thanks. This is not the full log of ~/nni/experiments/<EXPERIMENT-ID>/log/nnimanager.log. Maybe GitHub does not allow pasting that much data. Could you attach the log files by dragging and dropping them?

@FalinLei
Author

nnimanager.log
dispatcher.log

@microsoft microsoft deleted a comment from FalinLei Oct 24, 2018
@QuanluZhang
Contributor

You used the example mnist_annotation/, right? Did you modify the code? Could you try the unmodified example mnist/ to check whether it has the same problem?

@FalinLei
Author

Because the mnist_annotation/ example didn't work, I switched to mnist/. I didn't modify the code.
I just tried both of them again, and both have the same problem.

@QuanluZhang
Contributor

Could you check the files under ~/nni/experiments/<EXPERIMENT-ID>/trials/<TRIAL-ID>/.nni/? There should be two files, metrics and state. Could you paste their contents?

@FalinLei
Author

metrics file
ME000112{"sequence": 0, "value": 0.11349999904632568, "parameter_id": 3, "trial_job_id": "ftua9", "type": "PERIODICAL"}
ME000112{"sequence": 1, "value": 0.11349999904632568, "parameter_id": 3, "trial_job_id": "ftua9", "type": "PERIODICAL"}
ME000107{"sequence": 0, "value": 0.09799999743700027, "parameter_id": 3, "trial_job_id": "ftua9", "type": "FINAL"}

state file
0 1540372598378
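Judging only from the sample above, each record in the metrics file appears to be the tag ME, a zero-padded six-digit byte length, and then that many bytes of JSON. The lengths 112 and 107 match the JSON text plus a trailing newline. The parser below is a sketch built on that inferred layout, not NNI's own reader, and the function name is hypothetical. It confirms that the FINAL record was written to disk.

```python
import json
from typing import Any, Dict, List


def parse_metrics(data: bytes) -> List[Dict[str, Any]]:
    """Parse records of the form b"ME" + 6-digit length + <length> bytes of JSON.

    The layout is inferred from the sample metrics file in this thread; the
    length appears to count the JSON text plus its trailing newline.
    """
    records: List[Dict[str, Any]] = []
    pos = 0
    while pos < len(data):
        # Tolerate stray whitespace between records.
        if data[pos:pos + 1].isspace():
            pos += 1
            continue
        if data[pos:pos + 2] != b"ME":
            raise ValueError(f"unexpected record tag at byte {pos}: {data[pos:pos + 2]!r}")
        length = int(data[pos + 2:pos + 8].decode("ascii"))
        start = pos + 8
        payload = data[start:start + length]
        if len(payload) < length:
            raise ValueError(f"truncated record at byte {pos}")
        # json.loads ignores the trailing newline inside the payload.
        records.append(json.loads(payload.decode("utf-8")))
        pos = start + length
    return records


# The metrics file pasted above, reassembled as bytes.
SAMPLE = (
    b'ME000112{"sequence": 0, "value": 0.11349999904632568, "parameter_id": 3, '
    b'"trial_job_id": "ftua9", "type": "PERIODICAL"}\n'
    b'ME000112{"sequence": 1, "value": 0.11349999904632568, "parameter_id": 3, '
    b'"trial_job_id": "ftua9", "type": "PERIODICAL"}\n'
    b'ME000107{"sequence": 0, "value": 0.09799999743700027, "parameter_id": 3, '
    b'"trial_job_id": "ftua9", "type": "FINAL"}\n'
)

final_values = [r["value"] for r in parse_metrics(SAMPLE) if r["type"] == "FINAL"]
print(final_values)
```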

@QuanluZhang
Contributor

Hi @FalinLei , could you tell us how to get the docker image that you used, so that we can reproduce the problem?

@FalinLei
Author

OK, the docker image that I used is nvcr.io/nvidia/tensorflow:18.06-py3. You can pull it from https://ngc.nvidia.com/registry/nvidia-tensorflow.

@QuanluZhang
Contributor

QuanluZhang commented Oct 24, 2018

Hi @FalinLei, this problem is caused by the tail-stream package not working inside the Docker container: the metrics file changes, but tail-stream does not notice the change. We will fix this problem very soon. Thanks for your help.
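The thread does not say how the eventual fix changed the way the metrics file is followed. As a generic illustration of a reader that does not depend on change notifications, here is a Python sketch that compares the file size against a saved offset and returns only the newly appended bytes. The class name is hypothetical, and this is not NNI's implementation.

```python
import os


class PollingTail:
    """Hypothetical polling reader: returns bytes appended to a file since the last poll.

    It checks the file size on every call instead of waiting for a
    change notification, so it still works if notifications never arrive.
    """

    def __init__(self, path: str) -> None:
        self.path = path
        self.offset = 0  # number of bytes already returned to the caller

    def poll(self) -> bytes:
        try:
            size = os.path.getsize(self.path)
        except FileNotFoundError:
            return b""  # the trial has not created the file yet
        if size < self.offset:
            self.offset = 0  # file was truncated or replaced; start over
        if size == self.offset:
            return b""
        with open(self.path, "rb") as f:
            f.seek(self.offset)
            chunk = f.read(size - self.offset)
        self.offset += len(chunk)
        return chunk
```

A caller would invoke poll() on a timer (for example, once per second) and pass the returned bytes to a parser for the ME-prefixed records. The trade-off is a small, fixed polling delay in exchange for not relying on file-system events inside the container.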

@QuanluZhang
Contributor

QuanluZhang commented Oct 25, 2018

Hi @FalinLei, after double-checking the problem, we suspect it is caused by the host kernel version. Could you tell us the kernel version of your host machine? Please run uname -a, lsb_release -a, and docker -v on that host machine.
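For collecting the same facts from a script, a rough Python sketch follows. It uses platform.uname() as a stand-in for uname -a (the fields match, though the formatting differs) and shells out to lsb_release and docker only if they are installed. The function names are hypothetical.

```python
import platform
import shutil
import subprocess
from typing import Dict, List


def run_if_available(cmd: List[str]) -> str:
    """Run a command and return its output, or a note if the tool is missing or hangs."""
    if shutil.which(cmd[0]) is None:
        return f"{cmd[0]}: not installed"
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=10)
    except subprocess.TimeoutExpired:
        return f"{cmd[0]}: timed out"
    return (result.stdout or result.stderr).strip()


def host_report() -> Dict[str, str]:
    """Roughly the information from `uname -a`, `lsb_release -a`, and `docker -v`."""
    u = platform.uname()
    return {
        "kernel": f"{u.system} {u.node} {u.release} {u.version} {u.machine}",
        "lsb_release": run_if_available(["lsb_release", "-a"]),
        "docker": run_if_available(["docker", "-v"]),
    }


for key, value in host_report().items():
    print(f"{key}: {value}")
```

Run it on the host, not inside the container: the container shares the host kernel, but lsb_release would describe the container's distribution, and docker is usually not installed there.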

@FalinLei
Author

uname -a
Linux sugon96 4.4.0-87-generic #110-Ubuntu SMP Tue Jul 18 12:55:35 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 16.04.3 LTS
Release: 16.04
Codename: xenial
docker -v
Docker version 18.03.1-ce, build 9ee9f40

@scarlett2018 scarlett2018 added question Further information is requested v0.2 labels Oct 26, 2018
@QuanluZhang
Contributor

Hi @FalinLei, after further investigation, we found that this problem can only be reproduced on specific combinations of kernel version and Docker version. We pushed a hotfix to v0.2 (#273) that resolves the issue.

@scarlett2018 scarlett2018 added user raised WebUI and removed question Further information is requested labels Apr 14, 2020
@scarlett2018 scarlett2018 added the bug Something isn't working label Apr 16, 2020

3 participants