Trainer hangs after the last epoch, and Tester raises an error #259

Closed
Zhiyu-Chen opened this issue Dec 8, 2019 · 2 comments


Zhiyu-Chen commented Dec 8, 2019

I'm running matching_esim.py. After the last epoch, trainer.train() never returns and just hangs.

Using a validation set, or calling tester.test(), also raises an error:

Traceback (most recent call last):
  File "", line 2, in
  File "/home/colozoy/.conda/envs/p3/lib/python3.7/site-packages/fastNLP/core/tester.py", line 165, in test
    pred_dict = self._data_forward(self._predict_func, batch_x)
  File "/home/colozoy/.conda/envs/p3/lib/python3.7/site-packages/fastNLP/core/tester.py", line 213, in _data_forward
    y = self._predict_func_wrapper(**x)
  File "/home/colozoy/.conda/envs/p3/lib/python3.7/site-packages/fastNLP/core/_parallel_utils.py", line 91, in wrapper
    outputs = parallel_apply(replicas, func_name, inputs, kwargs, device_ids[:len(replicas)])
  File "/home/colozoy/.conda/envs/p3/lib/python3.7/site-packages/fastNLP/core/_parallel_utils.py", line 71, in parallel_apply
    raise output
  File "/home/colozoy/.conda/envs/p3/lib/python3.7/site-packages/fastNLP/core/_parallel_utils.py", line 47, in worker
    output = getattr(module, func_name)(*input, **kwargs)
  File "/home/colozoy/.conda/envs/p3/lib/python3.7/site-packages/fastNLP/models/snli.py", line 118, in predict
    pred = self.forward(**kwargs)[Const.OUTPUT].argmax(-1)
  File "/home/colozoy/.conda/envs/p3/lib/python3.7/site-packages/fastNLP/models/snli.py", line 84, in forward
    a = self.rnn(a0, mask1.byte())  # a: [B, PL, 2 * H]
  File "/home/colozoy/.conda/envs/p3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/colozoy/.conda/envs/p3/lib/python3.7/site-packages/fastNLP/models/snli.py", line 167, in forward
    self.rnn.flatten_parameters()
  File "/home/colozoy/.conda/envs/p3/lib/python3.7/site-packages/torch/nn/modules/rnn.py", line 120, in flatten_parameters
    self.batch_first, bool(self.bidirectional))
RuntimeError: set_storage is not allowed on a Tensor created from .data or .detach().
If your intent is to change the metadata of a Tensor (such as sizes / strides / storage / storage_offset)
without autograd tracking the change, remove the .data / .detach() call and wrap the change in a `with torch.no_grad():` block.
For example, change:
    x.data.set_(y)
to:
    with torch.no_grad():
        x.set_(y)

@Zhiyu-Chen
Author

The training hang has suddenly gone away for no apparent reason.
The tester bug is still there, though. It may be a DataParallel problem, i.e. a PyTorch bug: pytorch/pytorch#21108
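
One way to probe the DataParallel hypothesis is to evaluate with the underlying single-device model rather than the parallel wrapper. The sketch below is untested against this script: it relies only on the fact that torch.nn.DataParallel keeps the wrapped model in its `.module` attribute. `unwrap_parallel`, `FakeParallel`, and `TinyModel` are hypothetical names introduced for illustration so the snippet runs without PyTorch or a GPU.

```python
def unwrap_parallel(model):
    """Return the model inside a DataParallel-style wrapper.

    torch.nn.DataParallel stores the wrapped model as `.module`.
    Objects without that attribute are returned unchanged.
    Caveat: a model that defines its own `module` attribute would be
    unwrapped by mistake; with PyTorch available, an isinstance check
    against torch.nn.DataParallel is the safer test.
    """
    return getattr(model, "module", model)


class FakeParallel:
    """Hypothetical stand-in for torch.nn.DataParallel (demo only)."""

    def __init__(self, module):
        self.module = module


class TinyModel:
    """Hypothetical stand-in for the ESIM model (demo only)."""

    def predict(self, x):
        return x


inner = TinyModel()
wrapped = FakeParallel(inner)

# The unwrapped object is the original single-device model.
single_device_model = unwrap_parallel(wrapped)
```

If the error disappears when the tester is given the unwrapped model, that would point at the multi-GPU path rather than the model code itself.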

@xuyige
Member

xuyige commented Dec 12, 2019

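Hello, and thank you for the issue.
The code in this file was written against version 1.0.0, and some problems appeared after updating to 1.3.0, so I made a small modification to it earlier. For now I recommend running this script with version 1.2.0 or below. I haven't used 1.3.1 myself, so I'm not sure about it; 1.3.0 is confirmed as a version where this problem can occur.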
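
The version advice above can be turned into a small guard at the top of a script. This is a minimal sketch using only the standard library. The thread does not say which package the numbers refer to (they match PyTorch releases of that period, but that reading is an assumption), so the helper takes a plain version string; `parse_version` and `is_recommended` are hypothetical names.

```python
import re

# Highest version the maintainer recommends for this script.
MAX_RECOMMENDED = (1, 2, 0)


def parse_version(version):
    """Extract (major, minor, patch) from strings such as '1.3.0+cu100'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognised version string: %r" % version)
    return tuple(int(part) for part in match.groups())


def is_recommended(version):
    """True if the version is at or below the recommended maximum."""
    return parse_version(version) <= MAX_RECOMMENDED


for candidate in ("1.0.0", "1.2.0", "1.3.0"):
    status = "ok" if is_recommended(candidate) else "may hit the tester error"
    print(candidate, status)
```

In a real script the string would come from the installed package's `__version__` attribute rather than a hard-coded list.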

xuyige closed this as completed Dec 13, 2019