Why does this warning appear during training? An extremely long string of red warnings #3353

Closed
3 tasks done
loxoo6 opened this issue Jul 4, 2023 · 46 comments
Assignees
Labels
bug Something isn't working GoodFirstIssue

Comments

@loxoo6

loxoo6 commented Jul 4, 2023

Search before asking

Describe the Bug

Uploading image.png…

Environment

paddlepaddle: 2.3.2
paddleseg: 2.7

Bug description confirmation

  • I confirm that the bug replication steps, code change instructions, and environment information have been provided, and the problem can be reproduced.

Are you willing to submit a PR?

  • I'd like to help by submitting a PR!
@loxoo6 loxoo6 added the bug Something isn't working label Jul 4, 2023
@Asthestarsfalll
Contributor

Hello, the image link is broken.

@gitlonglong

> Hello, the image link is broken.

I ran into this too. It looks like this:
I0706 13:09:12.768538 3772 eager_method.cc:140] Warning:: 0D Tensor cannot be used as 'Tensor.numpy()[0]' . In order to avoid this problem, 0D Tensor will be changed to 1D numpy currently, but it's not correct and will be removed in release 2.6. For Tensor contain only one element, Please modify 'Tensor.numpy()[0]' to 'float(Tensor)' as soon as possible, otherwise 'Tensor.numpy()[0]' will raise error in release 2.6.
I0706 13:09:13.075042 3772 eager_method.cc:140] Warning:: 0D Tensor cannot be used as 'Tensor.numpy()[0]' . In order to avoid this problem, 0D Tensor will be changed to 1D numpy currently, but it's not correct and will be removed in release 2.6. For Tensor contain only one element, Please modify 'Tensor.numpy()[0]' to 'float(Tensor)' as soon as possible, otherwise 'Tensor.numpy()[0]' will raise error in release 2.6.
I0706 13:09:13.382442 3772 eager_method.cc:140] Warning:: 0D Tensor cannot be used as 'Tensor.numpy()[0]' . In order to avoid this problem, 0D Tensor will be changed to 1D numpy currently, but it's not correct and will be removed in release 2.6. For Tensor contain only one element, Please modify 'Tensor.numpy()[0]' to 'float(Tensor)' as soon as possible, otherwise 'Tensor.numpy()[0]' will raise error in release 2.6.
I0706 13:09:13.688205 3772 eager_method.cc:140] Warning:: 0D Tensor cannot be used as 'Tensor.numpy()[0]' . In order to avoid this problem, 0D Tensor will be changed to 1D numpy currently, but it's not correct and will be removed in release 2.6. For Tensor contain only one element, Please modify 'Tensor.numpy()[0]' to 'float(Tensor)' as soon as possible, otherwise 'Tensor.numpy()[0]' will raise error in release 2.6.
I0706 13:09:13.996582 3772 eager_method.cc:140] Warning:: 0D Tensor cannot be used as 'Tensor.numpy()[0]' . In order to avoid this problem, 0D Tensor will be changed to 1D numpy currently, but it's not correct and will be removed in release 2.6. For Tensor contain only one element, Please modify 'Tensor.numpy()[0]' to 'float(Tensor)' as soon as possible, otherwise 'Tensor.numpy()[0]' will raise error in release 2.6.
I0706 13:09:14.305588 3772 eager_method.cc:140] Warning:: 0D Tensor cannot be used as 'Tensor.numpy()[0]' . In order to avoid this problem, 0D Tensor will be changed to 1D numpy currently, but it's not correct and will be removed in release 2.6. For Tensor contain only one element, Please modify 'Tensor.numpy()[0]' to 'float(Tensor)' as soon as possible, otherwise 'Tensor.numpy()[0]' will raise error in release 2.6.
I0706 13:09:14.611194 3772 eager_method.cc:140] Warning:: 0D Tensor cannot be used as 'Tensor.numpy()[0]' . In order to avoid this problem, 0D Tensor will be changed to 1D numpy currently, but it's not correct and will be removed in release 2.6. For Tensor contain only one element, Please modify 'Tensor.numpy()[0]' to 'float(Tensor)' as soon as possible, otherwise 'Tensor.numpy()[0]' will raise error in release 2.6.
I0706 13:09:14.919446 3772 eager_method.cc:140] Warning:: 0D Tensor cannot be used as 'Tensor.numpy()[0]' . In order to avoid this problem, 0D Tensor will be changed to 1D numpy currently, but it's not correct and will be removed in release 2.6. For Tensor contain only one element, Please modify 'Tensor.numpy()[0]' to 'float(Tensor)' as soon as possible, otherwise 'Tensor.numpy()[0]' will raise error in release 2.6.
I0706 13:09:15.227041 3772 eager_method.cc:140] Warning:: 0D Tensor cannot be used as 'Tensor.numpy()[0]' . In order to avoid this problem, 0D Tensor will be changed to 1D numpy currently, but it's not correct and will be removed in release 2.6. For Tensor contain only one element, Please modify 'Tensor.numpy()[0]' to 'float(Tensor)' as soon as possible, otherwise 'Tensor.numpy()[0]' will raise error in release 2.6.
I0706 13:09:15.533654 3772 eager_method.cc:140] Warning:: 0D Tensor cannot be used as 'Tensor.numpy()[0]' . In order to avoid this problem, 0D Tensor will be changed to 1D numpy currently, but it's not correct and will be removed in release 2.6. For Tensor contain only one element, Please modify 'Tensor.numpy()[0]' to 'float(Tensor)' as soon as possible, otherwise 'Tensor.numpy()[0]' will raise error in release 2.6.
2023-07-06 13:09:15 [INFO] [TRAIN] epoch: 1, iter: 60/40000, loss: 1.8980, lr: 0.009987, batch_cost: 0.3071, reader_cost: 0.00020, ips: 13.0255 samples/sec | ETA 03:24:25

But my code does not use Tensor.numpy()[0] anywhere.

@loxoo6
Author

loxoo6 commented Jul 7, 2023

Downgrading to a slightly older version fixes it:
paddlepaddle/paddle:2.4.2-gpu-cuda11.7-cudnn8.4-trt8.4

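@shiyutang
Collaborator

For reference: PaddlePaddle/PaddleOCR#10302

@shiyutang
Collaborator

The replies above have fully answered the question. If you have new questions, feel free to open an issue at any time or keep replying under this one.
We have launched an ISSUE challenge event for the PaddlePaddle suites; interested developers are welcome to join: PaddlePaddle/PaddleOCR#10223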

@Asthestarsfalll
Contributor

Asthestarsfalll commented Jul 7, 2023

> I ran into this too. It looks like this: [the repeated 0D Tensor warning log and the [TRAIN] line posted by @gitlonglong above]
>
> But my code does not use Tensor.numpy()[0] anywhere.

Which model are you training? This is only a warning: the current version implicitly converts a 0-D tensor to a 1-D tensor, and versions from 2.6 onward will raise an error instead. It does not actually affect training.

@gitlonglong

> Which model are you training? This is only a warning: the current version implicitly converts a 0-D tensor to a 1-D tensor, and versions from 2.6 onward will raise an error instead. It does not actually affect training.

I hit the error when training on the VOC12 dataset with PaddleSeg directly in AI Studio. Versions 2.5 and above all show this warning; I have not tried 2.4 or below. With 'export FLAGS_set_to_1d=False' the warning disappears, but validation then fails.
The error turned out to come from taking element 0 of a single number rather than of a list:
1. In /home/aistudio/PaddleSeg/paddleseg/core/train.py, change 'avg_loss += loss.numpy()[0]' to 'avg_loss += loss.numpy()'.
2. Likewise, change 'avg_loss_list = [l[0] / log_iters for l in avg_loss_list]' to 'avg_loss_list = [l / log_iters for l in avg_loss_list]'.
3. In metrics.py, change 'pred_area.append(paddle.sum(paddle.cast(pred_i, "int32")))' to 'pred_area.append(paddle.sum(paddle.cast(pred_i, "int32")).unsqueeze(0))'.
Only then does it run normally.
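
The first two changes above come down to a scalar (0-D) value being indexed as if it were a one-element list. A minimal sketch of that failure mode follows. It uses NumPy as a stand-in for Paddle tensors, which is an assumption made so the snippet runs without Paddle installed; the variable names are illustrative and not taken from PaddleSeg.

```python
import numpy as np

# A 0-D array plays the role of the scalar loss that loss.numpy() yields
# once the implicit 0-D -> 1-D conversion is switched off.
loss_0d = np.array(1.898)
assert loss_0d.ndim == 0

# Indexing a 0-D array with [0] fails, which is why 'loss.numpy()[0]'
# and 'l[0]' had to be changed.
try:
    loss_0d[0]
    indexing_failed = False
except IndexError:
    indexing_failed = True

# float() converts the 0-D value directly; for a 1-D array the single
# element is taken first and then converted.
as_float_0d = float(loss_0d)
as_float_1d = float(np.array([1.898])[0])

print(indexing_failed, as_float_0d, as_float_1d)
```

This mirrors the advice in the warning text itself, which asks for 'Tensor.numpy()[0]' to be replaced with 'float(Tensor)'.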

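I downloaded PaddleSeg as-is. My only change was in /home/aistudio/PaddleSeg/configs/deeplabv3p/deeplabv3p_resnet50_os8_voc12aug_512x512_40k.yml, where I changed 'base: '../base/pascal_voc12aug.yml'' to 'base: '../base/pascal_voc12.yml''.
I have not tried other datasets yet; this is what happens on VOC12.

Also, when using 'pascal_voc12aug.yml', I need to run /home/aistudio/PaddleSeg/tools/voc_augment.py first, right? Whenever I run it I get a network connection error and benchmark.tgz cannot be downloaded. I am considering downloading it in a browser and uploading it as a dataset.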

@Asthestarsfalll
Contributor

@gitlonglong

  1. Which PaddleSeg version are you using? The .numpy()[0] problem should have been fixed in this PR;
  2. You can look for benchmark.tgz among the AI Studio datasets. I found one; see whether it works for you. Just add it to your data: https://aistudio.baidu.com/aistudio/datasetdetail/65497.

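@gitlonglong

> @gitlonglong
>
>   1. Which PaddleSeg version are you using? The .numpy()[0] problem should have been fixed in this PR;
>   2. You can look for benchmark.tgz among the AI Studio datasets. I found one; see whether it works for you. Just add it to your data: https://aistudio.baidu.com/aistudio/datasetdetail/65497.

@Asthestarsfalll
OK, I will try it, thanks!
I started with 2.6 and later switched to 2.8. I also checked again: 2.7 and 2.8 did fix this bug, but with the VOC dataset there is still an error at this line in train.py: avg_loss_list = [l[0] / log_iters for l in avg_loss_list] (l[0] raises an error; changing it to l works).
metrics.py also raises an error. It only runs on VOC12 after the change I described above.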

@Asthestarsfalll
Contributor

@gitlonglong If it is convenient, could you share your AI Studio project? I will look into where the problem is.

@shiyutang shiyutang reopened this Jul 10, 2023
@a-strong-python

I have this problem too, with PaddleSeg 2.8.

I0712 14:24:21.198009 19039 eager_method.cc:140] Warning:: 0D Tensor cannot be used as 'Tensor.numpy()[0]' . In order to avoid this problem, 0D Tensor will be changed to 1D numpy currently, but it's not correct and will be removed in release 2.6. For Tensor contain only one element, Please modify 'Tensor.numpy()[0]' to 'float(Tensor)' as soon as possible, otherwise 'Tensor.numpy()[0]' will raise error in release 2.6.
I0712 14:24:21.626497 19039 eager_method.cc:140] Warning:: 0D Tensor cannot be used as 'Tensor.numpy()[0]' . In order to avoid this problem, 0D Tensor will be changed to 1D numpy currently, but it's not correct and will be removed in release 2.6. For Tensor contain only one element, Please modify 'Tensor.numpy()[0]' to 'float(Tensor)' as soon as possible, otherwise 'Tensor.numpy()[0]' will raise error in release 2.6.
I0712 14:24:21.626669 19039 eager_method.cc:140] Warning:: 0D Tensor cannot be used as 'Tensor.numpy()[0]' . In order to avoid this problem, 0D Tensor will be changed to 1D numpy currently, but it's not correct and will be removed in release 2.6. For Tensor contain only one element, Please modify 'Tensor.numpy()[0]' to 'float(Tensor)' as soon as possible, otherwise 'Tensor.numpy()[0]' will raise error in release 2.6.
I0712 14:24:21.626710 19039 eager_method.cc:140] Warning:: 0D Tensor cannot be used as 'Tensor.numpy()[0]' . In order to avoid this problem, 0D Tensor will be changed to 1D numpy currently, but it's not correct and will be removed in release 2.6. For Tensor contain only one element, Please modify 'Tensor.numpy()[0]' to 'float(Tensor)' as soon as possible, otherwise 'Tensor.numpy()[0]' will raise error in release 2.6.
I0712 14:24:22.062738 19039 eager_method.cc:140] Warning:: 0D Tensor cannot be used as 'Tensor.numpy()[0]' . In order to avoid this problem, 0D Tensor will be changed to 1D numpy currently, but it's not correct and will be removed in release 2.6. For Tensor contain only one element, Please modify 'Tensor.numpy()[0]' to 'float(Tensor)' as soon as possible, otherwise 'Tensor.numpy()[0]' will raise error in release 2.6.
I0712 14:24:22.062984 19039 eager_method.cc:140] Warning:: 0D Tensor cannot be used as 'Tensor.numpy()[0]' . In order to avoid this problem, 0D Tensor will be changed to 1D numpy currently, but it's not correct and will be removed in release 2.6. For Tensor contain only one element, Please modify 'Tensor.numpy()[0]' to 'float(Tensor)' as soon as possible, otherwise 'Tensor.numpy()[0]' will raise error in release 2.6.
I0712 14:24:22.063066 19039 eager_method.cc:140] Warning:: 0D Tensor cannot be used as 'Tensor.numpy()[0]' . In order to avoid this problem, 0D Tensor will be changed to 1D numpy currently, but it's not correct and will be removed in release 2.6. For Tensor contain only one element, Please modify 'Tensor.numpy()[0]' to 'float(Tensor)' as soon as possible, otherwise 'Tensor.numpy()[0]' will raise error in release 2.6.

@a-strong-python

When the program reaches the iteration at which the model is saved for the first time, it raises the error below:
2023-07-12 14:26:44 [INFO] [TRAIN] epoch: 17, iter: 500/10000, loss: 1.8821, lr: 0.000972, batch_cost: 0.4265, reader_cost: 0.30989, ips: 14.0681 samples/sec | ETA 01:07:31
2023-07-12 14:26:44 [INFO] Start evaluating (total_samples: 45, total_iters: 45)...
Traceback (most recent call last):
File "/home/aistudio/PaddleSeg/tools/train.py", line 195, in
main(args)
File "/home/aistudio/PaddleSeg/tools/train.py", line 170, in main
train(
File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddleseg/core/train.py", line 315, in train
mean_iou, acc, _, _, _ = evaluate(
File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddleseg/core/val.py", line 161, in evaluate
intersect_area, pred_area, label_area = metrics.calculate_area(
File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddleseg/utils/metrics.py", line 57, in calculate_area
pred_area = paddle.concat(pred_area)
File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddle/tensor/manipulation.py", line 1121, in concat
return _C_ops.concat(input, axis)
ValueError: (InvalidArgument) The axis is expected to be in range of [0, 0), but got 0
[Hint: Expected axis >= -rank && axis < rank == true, but received axis >= -rank && axis < rank:0 != true:1.] (at ../paddle/phi/infermeta/multiary.cc:961)

@a-strong-python

2023-07-12 16:31:44 [INFO] [TRAIN] epoch: 12, iter: 500/1000, loss: 0.3831, lr: 0.000537, batch_cost: 0.1358, reader_cost: 0.07347, ips: 29.4629 samples/sec | ETA 00:01:07
2023-07-12 16:31:44 [INFO] Start evaluating (total_samples: 45, total_iters: 45)...
Traceback (most recent call last):
File "/home/aistudio/PaddleSeg/tools/train.py", line 195, in
main(args)
File "/home/aistudio/PaddleSeg/tools/train.py", line 170, in main
train(
File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddleseg/core/train.py", line 315, in train
mean_iou, acc, _, _, _ = evaluate(
File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddleseg/core/val.py", line 161, in evaluate
intersect_area, pred_area, label_area = metrics.calculate_area(
File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddleseg/utils/metrics.py", line 57, in calculate_area
pred_area = paddle.concat(pred_area)
File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddle/tensor/manipulation.py", line 1121, in concat
return _C_ops.concat(input, axis)
ValueError: (InvalidArgument) The axis is expected to be in range of [0, 0), but got 0
[Hint: Expected axis >= -rank && axis < rank == true, but received axis >= -rank && axis < rank:0 != true:1.] (at ../paddle/phi/infermeta/multiary.cc:961)

@Asthestarsfalll
Contributor

> [evaluation traceback posted by @a-strong-python above, ending in:] ValueError: (InvalidArgument) The axis is expected to be in range of [0, 0), but got 0

It looks like every element of pred_area is a 0-dim tensor. You could open a new issue for this.

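@ToddBear
Collaborator

The replies above have fully answered the question. If you have new questions, feel free to open an issue at any time or keep replying under this one.
We have launched an ISSUE challenge event for the PaddlePaddle suites; interested developers are welcome to join: PaddlePaddle/PaddleOCR#10223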

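@gitlonglong

> @gitlonglong If it is convenient, could you share your AI Studio project? I will look into where the problem is.

@Asthestarsfalll
You do not need my AI Studio project. Just create any project; it should take about 10 minutes to set up, and running it will show the problem.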

@gitlonglong

> [evaluation traceback posted by @a-strong-python above, ending in:] ValueError: (InvalidArgument) The axis is expected to be in range of [0, 0), but got 0

@a-strong-python
Make the change I described above, adding one dimension, and it runs successfully. Like this: pred_area.append(paddle.sum(paddle.cast(pred_i, "int32")).unsqueeze(0))
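
Why the extra dimension helps can be seen in a small sketch. It uses NumPy in place of Paddle, which is an assumption made so that it runs without Paddle installed: np.concatenate rejects 0-D inputs in much the same way as the paddle.concat call in the traceback, and reshape(1) stands in for unsqueeze(0). The per-class counts are made-up numbers.

```python
import numpy as np

# Hypothetical per-class pixel counts, each a 0-D array like the result
# of summing over a mask.
areas_0d = [np.array(12), np.array(7), np.array(30)]

# Concatenation needs an axis to join along, and 0-D inputs have none.
try:
    np.concatenate(areas_0d)
    concat_failed = False
except ValueError:
    concat_failed = True

# Giving every element one dimension (shape (1,)) makes the join valid.
areas_1d = [a.reshape(1) for a in areas_0d]
pred_area = np.concatenate(areas_1d)

# np.stack is an alternative that accepts 0-D inputs directly.
stacked = np.stack(areas_0d)

print(concat_failed, pred_area.tolist(), stacked.tolist())
```

Either route produces a 1-D result with one entry per class; the thread's fix corresponds to the reshape variant.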

@a-strong-python

I made all of those changes and it still fails. I am running the official example directly and do not know what went wrong.

@Asthestarsfalll
Contributor

Asthestarsfalll commented Aug 16, 2023

@gitlonglong @a-strong-python

The paddleseg package on PyPI seems to be the problem; installing from source does not have this issue.

@KKWY0909

@gitlonglong
I installed version 2.8 from source and made the changes you described, but this warning still appears when training on my own dataset.

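@gitlonglong

> @gitlonglong I installed version 2.8 from source and made the changes you described, but this warning still appears when training on my own dataset.

@KKWY0909
After installing from source, the functions that get called are the installed, fixed copy rather than the code you modified. There are two ways around this: reinstall after every change, or use a lower version such as 2.6, where each change takes effect directly.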
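
One way to check which copy of a package Python will actually import is to look up its location. The sketch below uses only the standard library; module_location is a hypothetical helper name, and json is used as the example so that the snippet runs anywhere. Passing "paddleseg" instead would show whether the import resolves to site-packages (as in the traceback earlier in this thread) or to the cloned source tree.

```python
import importlib.util
from typing import Optional


def module_location(name: str) -> Optional[str]:
    """Return the file a top-level module would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None  # module is not installed / not on sys.path
    return spec.origin


# Example with a standard-library package; replace with "paddleseg" to
# see which copy a training script would pick up.
print(module_location("json"))
print(module_location("surely_not_an_installed_module_xyz"))
```

If the reported path points into site-packages, edits made in the cloned repository will not take effect until the package is reinstalled.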

@Ericgone

> For reference: PaddlePaddle/PaddleOCR#10302

This reference link is of no value at all!!! It is not about that warning in the first place.

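@Asthestarsfalll
Contributor

Asthestarsfalll commented Aug 24, 2023

@Ericgone
This is only a warning: the current version implicitly converts a 0-D tensor to a 1-D tensor, and versions from 2.6 onward will raise an error instead. It does not actually affect training.

You can ignore the warning with export FLAGS_set_to_1d=False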
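
The same flag can also be set from inside a Python script. This is a sketch under the assumption that Paddle reads FLAGS_* environment variables when it is first imported, so the assignment has to happen before import paddle; the paddle import is left as a comment so that the snippet runs without Paddle installed.

```python
import os

# Equivalent of 'export FLAGS_set_to_1d=False' for the current process.
# Assumption: this must run before Paddle is imported for the first time.
os.environ["FLAGS_set_to_1d"] = "False"

# import paddle  # import only after the flag has been set

print(os.environ["FLAGS_set_to_1d"])
```

As reported earlier in this thread, with the flag off, code that still indexes a 0-D result with [0] fails outright instead of warning.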

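@TerryBryant

> @Ericgone This is only a warning: the current version implicitly converts a 0-D tensor to a 1-D tensor, and versions from 2.6 onward will raise an error instead. It does not actually affect training.
>
> You can ignore the warning with export FLAGS_set_to_1d=False

This warning fills up the entire training log. Can it be limited to appearing only once?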

@Asthestarsfalll
Contributor

> This warning fills up the entire training log. Can it be limited to appearing only once?

  1. Try export FLAGS_set_to_1d=False
  2. Switch to a lower version of paddle
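
Neither option limits the warning to a single occurrence. If the log is only read after the fact, repeated lines can be collapsed in post-processing. The following is a hypothetical helper, not a Paddle or PaddleSeg feature. It assumes the glog-style prefix seen in this thread ("I0706 13:09:12.768538 3772 eager_method.cc:140] ") and treats lines as duplicates when the text after that prefix matches; the sample messages are shortened.

```python
import re
from typing import Iterable, Iterator

# Matches a glog-style prefix: severity letter + MMDD, time, thread id,
# and "file:line]". The pattern is inferred from the log lines in this thread.
GLOG_PREFIX = re.compile(r"^[IWEF]\d{4} \d{2}:\d{2}:\d{2}\.\d+\s+\d+ \S+:\d+\] ")


def collapse_repeated(lines: Iterable[str]) -> Iterator[str]:
    """Yield each glog message only the first time it appears.

    Lines without a glog prefix (e.g. PaddleSeg [TRAIN] lines) pass through.
    """
    seen = set()
    for line in lines:
        match = GLOG_PREFIX.match(line)
        if match is None:
            yield line
            continue
        message = line[match.end():]
        if message not in seen:
            seen.add(message)
            yield line


sample = [
    "I0706 13:09:12.768538 3772 eager_method.cc:140] Warning:: 0D Tensor cannot be used",
    "I0706 13:09:13.075042 3772 eager_method.cc:140] Warning:: 0D Tensor cannot be used",
    "2023-07-06 13:09:15 [INFO] [TRAIN] epoch: 1, iter: 60/40000",
]
filtered = list(collapse_repeated(sample))
print(len(filtered))
```

This only tidies the log file; it does not change what Paddle prints while training.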

@TerryBryant

TerryBryant commented Aug 30, 2023

>   1. Try export FLAGS_set_to_1d=False
>   2. Switch to a lower version of paddle

Method 1: Failed, NCCL error ../paddle/fluid/distributed/collective/process_group_nccl.cc:660 'internal error'
LAUNCH INFO 2023-08-30 13:52:47,376 Exit code 1
Method 2: downgrading from paddle 2.5.1 to 2.4.2 worked, thanks.

@henryccl

henryccl commented Sep 20, 2023

> You can ignore the warning with export FLAGS_set_to_1d=False

I hit the same problem when training on my own dataset with PaddleSeg (I am using the official PaddleSeg-release-2.8 with no code changes apart from the dataset configuration), but this command had no effect. Changing paddle from 2.5 to 2.4 in AI Studio did not solve it either.
[image]

@KKWY0909

@henryccl You could try PaddleSeg-release-2.8.1.

@henryccl

> @henryccl You could try PaddleSeg-release-2.8.1.

Thanks! With 2.8.1 the error is indeed gone.

@jason660519

Where can I install 2.8.1? Whatever I install, I end up with 2.8.0...

@JiFfeng-Yu

How do I download PaddleSeg-release-2.8.1?

@JiFfeng-Yu

> Where can I install 2.8.1? Whatever I install, I end up with 2.8.0...

Same here. Everything I find is PaddleSeg-release-2.8.0.

@Asthestarsfalll
Contributor

Clone it from GitHub, then install from the local copy.

@henryccl

> Where can I install 2.8.1? Whatever I install, I end up with 2.8.0...
>
> Same here. Everything I find is PaddleSeg-release-2.8.0.

[image]

@Lee6384

Lee6384 commented Nov 6, 2023

> Thanks! With 2.8.1 the error is indeed gone.

Hello, why do I still get this long string of errors with 2.8.1? Are any other steps needed?

@Lee6384

Lee6384 commented Nov 6, 2023

> [evaluation traceback posted by @a-strong-python above, ending in:] ValueError: (InvalidArgument) The axis is expected to be in range of [0, 0), but got 0

Hello, I also ran into both of the problems you posted above. How did you solve them in the end?

@yuzu16

yuzu16 commented Dec 30, 2023

Hello, has this been solved?

@Lee6384

Lee6384 commented Jan 2, 2024

> Hello, has this been solved?

It is a version issue. Both paddleseg and paddlepaddle need to be adjusted; another thread describes some of the specific steps.

@yuzu16

yuzu16 commented Jan 2, 2024

Could you point me to the link about adjusting the versions? Thanks.

@yuzu16

yuzu16 commented Jan 3, 2024

> [evaluation traceback posted by @a-strong-python above, ending in:] ValueError: (InvalidArgument) The axis is expected to be in range of [0, 0), but got 0

Hello, has this error been solved? Could you tell me how you solved it?

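@yuzu16

yuzu16 commented Jan 3, 2024

> It is a version issue. Both paddleseg and paddlepaddle need to be adjusted; another thread describes some of the specific steps.

OK, got it. So it is just a version issue? Could you point me to the link about changing the versions?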

@Lee6384

Lee6384 commented Jan 3, 2024

> Hello, has this error been solved? Could you tell me how you solved it?

#3408 has a version-based fix. I also hit the error you posted above, and it resolves it.

@yuzu16

yuzu16 commented Jan 3, 2024

> #3408 has a version-based fix. I also hit the error you posted above, and it resolves it.

OK, thanks.

@yuzu16

yuzu16 commented Jan 3, 2024

> When the program reaches the iteration at which the model is saved for the first time, it raises the error below: [evaluation traceback posted by @a-strong-python above, ending in:] ValueError: (InvalidArgument) The axis is expected to be in range of [0, 0), but got 0

Hello, has this problem been solved? How did you solve it?

@Unlicensed-driver-ljx

pocr WARNING: The shape of model params head.before_gtc.1.fc.weight [480, 384] not matched with loaded params head.before_gtc.1.fc.weight [1024, 384] ! Has this been solved?

@TingquanGao
Collaborator

Thanks for this issue. As it has been inactive for a long time, we will close it. If you have any questions, please feel free to reopen it or open a new issue, and we will follow up and resolve it.
