This repository has been archived by the owner on Sep 18, 2024. It is now read-only.

Got Segmentation fault when running amc_search.py #4000

Closed
twmht opened this issue Aug 2, 2021 · 7 comments

@twmht
Contributor

twmht commented Aug 2, 2021

Hi,

I get a segmentation fault when running amc_search.py.

After debugging with faulthandler, it seems that the error is caused by scipy:

Traceback (most recent call last):
  File "tools/amc_search.py", line 187, in <module>
    pruner.compress()
  File "/home/shared/nfs/acer-share/bushido/third_party/nni/nni/algorithms/compression/pytorch/pruning/amc/amc_pruner.py", line 210, in compress
    self.train(self.ddpg_args.train_episode, self.agent, self.env, self.output_dir)
  File "/home/shared/nfs/acer-share/bushido/third_party/nni/nni/algorithms/compression/pytorch/pruning/amc/amc_pruner.py", line 229, in train
    action = agent.select_action(observation, episode = episode)
  File "/home/shared/nfs/acer-share/bushido/third_party/nni/nni/algorithms/compression/pytorch/pruning/amc/lib/agent.py", line 186, in select_action
    action = self.sample_from_truncated_normal_distribution(lower = self.lbound, upper = self.rbound, mu = action, sigma = delta)
  File "/home/shared/nfs/acer-share/bushido/third_party/nni/nni/algorithms/compression/pytorch/pruning/amc/lib/agent.py", line 230, in sample_from_truncated_normal_distribution
    return stats.truncnorm.rvs((lower-mu)/sigma, (upper-mu)/sigma, loc = mu, scale = sigma, size = size)
  File "/home/acer/.pyenv/versions/pytorch/lib/python3.7/site-packages/scipy/stats/_distn_infrastructure.py", line 966, in rvs
    raise ValueError("Domain error in arguments.")
ValueError: Domain error in arguments.

My scipy version is 1.4.1 and my nni version is v2.3.

Any idea?

@linbinskn
Contributor

Have you modified the code of amc_search.py? With the same setup, I cannot reproduce the error here.

@twmht
Contributor Author

twmht commented Aug 10, 2021

@linbinskn

No modification except I used my own dataset.

@twmht
Contributor Author

twmht commented Aug 10, 2021

@linbinskn

The error comes from stats.truncnorm.rvs. Which version of scipy did you use?

@linbinskn
Contributor

I tried 1.4.1 and everything seems fine.

@twmht
Contributor Author

twmht commented Aug 13, 2021

@linbinskn
Contributor

Exploding gradients will lead to NaN values.
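
A minimal sketch of how such a NaN could be caught before it reaches stats.truncnorm.rvs. If mu or sigma is NaN, the standardized bounds (lower-mu)/sigma and (upper-mu)/sigma are NaN as well, and scipy rejects them with the "Domain error in arguments." message seen in the traceback. The helper name check_truncnorm_params is hypothetical and not part of nni or scipy, and the sketch assumes scalar parameters with finite bounds.

```python
import math


def check_truncnorm_params(lower, upper, mu, sigma):
    """Validate truncated-normal parameters and return the standardized bounds.

    Hypothetical helper for illustration; not part of nni or scipy.
    Assumes scalar inputs and finite lower/upper bounds.
    """
    values = {"lower": lower, "upper": upper, "mu": mu, "sigma": sigma}
    for name, value in values.items():
        # NaN and +/-inf both fail this check.
        if not math.isfinite(value):
            raise ValueError(f"{name} is not finite: {value!r}")
    if sigma <= 0:
        raise ValueError(f"sigma must be positive, got {sigma!r}")
    if lower >= upper:
        raise ValueError(f"lower ({lower!r}) must be below upper ({upper!r})")
    # Standardized bounds, in the same form the failing call computes them.
    return (lower - mu) / sigma, (upper - mu) / sigma


# A NaN action (for example from an exploding gradient) is reported by name
# instead of surfacing later as a generic domain error.
try:
    check_truncnorm_params(0.0, 1.0, float("nan"), 0.5)
except ValueError as err:
    print(err)
```

Calling a check like this right before the sampling call would show which argument went bad, which helps tell a NaN in the actor output apart from a bad sigma.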

@twmht
Contributor Author

twmht commented Aug 20, 2021

@linbinskn

I switched from torch 1.8.0 to torch 1.7.0 and the error is gone. There might be some incompatibility between torch 1.8.0 and nni v2.3.
