I confirm that the bug reproduction steps, code change instructions, and environment information have been provided, and that the problem can be reproduced.
Are you willing to submit a PR?
I'd like to help by submitting a PR!
Search before asking
Describe the Bug
2023-07-24 09:40:35 [INFO]
------------Environment Information-------------
platform: Linux-4.15.0-140-generic-x86_64-with-debian-stretch-sid
Python: 3.7.4 (default, Aug 13 2019, 20:35:49) [GCC 7.3.0]
Paddle compiled with cuda: True
NVCC: Build cuda_11.2.r11.2/compiler.29618528_0
cudnn: 8.2
GPUs used: 1
CUDA_VISIBLE_DEVICES: None
GPU: ['GPU 0: Tesla V100-SXM2-32GB']
GCC: gcc (Ubuntu 7.5.0-3ubuntu1~16.04) 7.5.0
PaddleSeg: 2.7.0
PaddlePaddle: 2.3.2
OpenCV: 4.1.1
2023-07-24 09:40:35 [INFO]
---------------Config Information---------------
batch_size: 4
iters: 160000
loss:
  coef:
  - 1
  - 1
  - 1
  types:
  - type: OhemCrossEntropyLoss
  - type: OhemCrossEntropyLoss
  - type: OhemCrossEntropyLoss
lr_scheduler:
  end_lr: 0.0
  learning_rate: 0.01
  power: 0.9
  type: PolynomialDecay
model:
  backbone:
    in_channels: 3
    output_stride: 8
    pretrained: https://bj.bcebos.com/paddleseg/dygraph/resnet18_vd_ssld_v2.tar.gz
    type: ResNet18_vd
  num_classes: 2
  type: BiseNetV1
optimizer:
  type: sgd
  weight_decay: 0.0005
train_dataset:
  dataset_root: /home/aistudio/PaddleSeg/data
  img_channels: 3
  mode: train
  num_classes: 2
  train_path: /home/aistudio/PaddleSeg/data/train_list.txt
  transforms:
  - max_scale_factor: 2.0
    min_scale_factor: 0.5
    scale_step_size: 0.25
    type: ResizeStepScaling
  - crop_size:
    - 512
    - 512
    type: RandomPaddingCrop
  - type: RandomHorizontalFlip
  - type: RandomDistort
  - type: Normalize
  type: Dataset
val_dataset:
  dataset_root: /home/aistudio/PaddleSeg/data
  img_channels: 3
  mode: val
  num_classes: 2
  transforms:
  - type: Normalize
  type: Dataset
  val_path: /home/aistudio/PaddleSeg/data/val_list.txt
W0724 09:40:35.476796 7755 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 11.2, Runtime API Version: 11.2
W0724 09:40:35.476837 7755 gpu_resources.cc:91] device: 0, cuDNN Version: 8.2.
2023-07-24 09:40:36 [INFO] Loading pretrained model from https://bj.bcebos.com/paddleseg/dygraph/resnet18_vd_ssld_v2.tar.gz
Connecting to https://bj.bcebos.com/paddleseg/dygraph/resnet18_vd_ssld_v2.tar.gz
Downloading resnet18_vd_ssld_v2.tar.gz
[==================================================] 100.00%
Uncompress resnet18_vd_ssld_v2.tar.gz
[==================================================] 100.00%
2023-07-24 09:40:38 [INFO] There are 115/115 variables loaded into ResNet_vd.
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/nn/layer/norm.py:654: UserWarning: When training, we now always track global mean and variance.
  "When training, we now always track global mean and variance.")
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/dygraph/math_op_patch.py:278: UserWarning: The dtype of left and right variables are not the same, left dtype is paddle.float32, but right dtype is paddle.int64, the right dtype will convert to paddle.float32
  format(lhs_dtype, rhs_dtype, lhs_dtype))
2023-07-24 09:40:55 [INFO] [TRAIN] epoch: 1, iter: 10/160000, loss: 3.4389, lr: 0.009999, batch_cost: 1.6020, reader_cost: 1.16270, ips: 2.4968 samples/sec | ETA 71:11:48
2023-07-24 09:41:10 [INFO] [TRAIN] epoch: 1, iter: 20/160000, loss: 5.1390, lr: 0.009999, batch_cost: 1.4837, reader_cost: 1.32117, ips: 2.6959 samples/sec | ETA 65:56:05
2023-07-24 09:41:25 [INFO] [TRAIN] epoch: 1, iter: 30/160000, loss: 2.9624, lr: 0.009998, batch_cost: 1.5491, reader_cost: 1.39400, ips: 2.5821 samples/sec | ETA 68:50:10
2023-07-24 09:41:40 [INFO] [TRAIN] epoch: 1, iter: 40/160000, loss: 2.4292, lr: 0.009998, batch_cost: 1.4487, reader_cost: 1.30202, ips: 2.7611 samples/sec | ETA 64:22:17
2023-07-24 09:41:55 [INFO] [TRAIN] epoch: 1, iter: 50/160000, loss: 2.5561, lr: 0.009997, batch_cost: 1.5167, reader_cost: 1.35702, ips: 2.6373 samples/sec | ETA 67:23:18
2023-07-24 09:41:55 [INFO] Start evaluating (total_samples: 30, total_iters: 30)...
Traceback (most recent call last):
  File "tools/train.py", line 262, in <module>
    main(args)
  File "tools/train.py", line 257, in main
    to_static_training=cfg.to_static_training)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddleseg/core/train.py", line 289, in train
    **test_config)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddleseg/core/val.py", line 165, in evaluate
    ignore_index=eval_dataset.ignore_index)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddleseg/utils/metrics.py", line 43, in calculate_area
    label.shape))
ValueError: Shape of `pred` and `label should be equal, but there are [1, 4032, 2272] and [1, 4032, 2268].
terminate called without an active exception
C++ Traceback (most recent call last):
No stack trace in paddle, may be caused by external reasons.
Error Message Summary:
FatalError: `Process abort signal` is detected by the operating system.
[TimeInfo: *** Aborted at 1690162917 (unix time) try "date -d @1690162917" if you are using GNU date ***]
[SignalInfo: *** SIGABRT (@0x3e800001e4b) received by PID 7755 (TID 0x7f04ccaae700) from PID 7755 ***]
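The two shapes differ only in width: 2272 for `pred` versus 2268 for `label`. The height 4032 is a multiple of 8, while 2268 is not (2268 / 8 = 283.5). A plausible explanation, assumed here and not verified against the BiseNetV1 source, is that with `output_stride: 8` each of three stride-2 layers rounds up (ceil(n / 2)), giving 2268 → 1134 → 567 → 284 at 1/8 resolution, and that the logits are then upsampled by a fixed factor of 8 rather than to the input size, giving 284 × 8 = 2272. The helper names below are hypothetical; the sketch only reproduces that arithmetic.

```python
def stride2_out(n: int) -> int:
    """Output length of a 3x3, stride-2, padding-1 conv or pool layer.

    floor((n + 2*1 - 3) / 2) + 1, which equals ceil(n / 2) for n >= 1.
    """
    return (n + 2 * 1 - 3) // 2 + 1


def fixed_factor_round_trip(n: int, num_downsamples: int = 3) -> int:
    """Downsample `n` through `num_downsamples` stride-2 layers, then upsample
    by the fixed factor 2**num_downsamples (8 for output_stride 8)."""
    size = n
    for _ in range(num_downsamples):
        size = stride2_out(size)
    return size * 2 ** num_downsamples


if __name__ == "__main__":
    # Sizes taken from the failing validation sample in the traceback.
    for name, n in [("height", 4032), ("width", 2268)]:
        out = fixed_factor_round_trip(n)
        print(f"{name}: {n} -> {out} (divisible by 8: {n % 8 == 0})")
```

Under these assumptions the height round-trips unchanged while the width grows by 4 pixels, which matches the two shapes in the error. Training is presumably unaffected because `RandomPaddingCrop` feeds 512 × 512 crops, and 512 is a multiple of 8.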
Environment
The config file is:
_base_: '../_base_/cityscapes.yml'
batch_size: 4
iters: 160000
model:
  type: BiseNetV1
  backbone:
    type: ResNet18_vd
    output_stride: 8
    pretrained: https://bj.bcebos.com/paddleseg/dygraph/resnet18_vd_ssld_v2.tar.gz
train_dataset:
  type: Dataset
  dataset_root: /home/aistudio/PaddleSeg/data
  train_path: /home/aistudio/PaddleSeg/data/train_list.txt
  num_classes: 2
  mode: train
  transforms:
    - type: ResizeStepScaling
      min_scale_factor: 0.5
      max_scale_factor: 2.0
      scale_step_size: 0.25
    - type: RandomPaddingCrop
      crop_size: [512, 512]
    - type: RandomHorizontalFlip
    - type: RandomDistort
    - type: Normalize
val_dataset:
  type: Dataset
  dataset_root: /home/aistudio/PaddleSeg/data
  val_path: /home/aistudio/PaddleSeg/data/val_list.txt
  num_classes: 2
  mode: val
  transforms:
    - type: Normalize
optimizer:
  type: sgd
  weight_decay: 0.0005
loss:
  types:
    - type: OhemCrossEntropyLoss
    - type: OhemCrossEntropyLoss
    - type: OhemCrossEntropyLoss
  coef: [1, 1, 1]
lr_scheduler:
  type: PolynomialDecay
  learning_rate: 0.01
  end_lr: 0.0
  power: 0.9
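Because `val_dataset` applies only `Normalize`, validation images reach the model at their original size, such as the 4032 × 2268 sample in the traceback. One possible workaround, sketched here in NumPy, is to pad each input on the bottom and right to a multiple of 32 (enough for both the 1/8 output and the deeper 1/32 features) and crop the logits back to the original size before taking the argmax, so that `pred` and `label` have the same shape. `pad_to_multiple` and `crop_to_original` are hypothetical helpers, not PaddleSeg APIs, and the sketch assumes the model's output has the same spatial size as a padded input whose sides are multiples of 32.

```python
import numpy as np


def pad_to_multiple(img: np.ndarray, multiple: int = 32, value: float = 0.0):
    """Pad an H x W (x C) image on the bottom/right so H and W become multiples of `multiple`.

    Returns the padded image and the original (H, W) so predictions can be cropped back.
    """
    h, w = img.shape[:2]
    pad_h = (-h) % multiple  # rows to add at the bottom
    pad_w = (-w) % multiple  # columns to add on the right
    pad_width = [(0, pad_h), (0, pad_w)] + [(0, 0)] * (img.ndim - 2)
    padded = np.pad(img, pad_width, mode="constant", constant_values=value)
    return padded, (h, w)


def crop_to_original(logits: np.ndarray, orig_hw):
    """Crop N x C x H x W logits back to the original (H, W).

    Taking the top-left block is valid only because padding was added bottom/right.
    """
    h, w = orig_hw
    return logits[..., :h, :w]


if __name__ == "__main__":
    img = np.zeros((4032, 2268, 3), dtype=np.uint8)  # same size as the failing sample
    padded, orig_hw = pad_to_multiple(img, 32)
    print(padded.shape)  # (4032, 2272, 3)

    # Stand-in for the model output on the padded image (batch 1, 2 classes).
    fake_logits = np.zeros((1, 2) + padded.shape[:2], dtype=np.float32)
    pred = crop_to_original(fake_logits, orig_hw).argmax(axis=1)
    print(pred.shape)  # (1, 4032, 2268)
```

In PaddleSeg this logic would still have to be wired into the evaluation path or a custom transform; the sketch only shows the shape bookkeeping.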
Runtime environment:
aistudio
paddlepaddle==2.3.3
paddleseg==2.7.0
python3