
Training loss fails to converge #176

Closed · kyouma9s opened this issue Jun 20, 2021 · 37 comments

@kyouma9s commented Jun 20, 2021

Hi,

Thanks for your excellent work. I have a problem when training on a custom dataset.

Following this link, I used NVISII to generate a 50k-image dataset: 40k images for training and 10k for testing. Here are some examples.

0000

Although there are many sugar boxes in this picture, my goal is to train on the master chef can. Unlike datasets created by others, my training images contain many duplicated instances of the object.

A typical JSON annotation looks as follows:

{
            "bounding_box": {
                "bottom_right": [
                    479,
                    132
                ],
                "top_left": [
                    0,
                    -29
                ]
            },
            "class": "ycb_002_master_chef_can",
            "location": [
                0.17170387506484985,
                0.1325557380914688,
                -0.7930967211723328
            ],
            "name": "ycb_002_master_chef_can_1623881842972",
            "projected_cuboid": [
                [
                    511.2225818634033,
                    224.55554008483887
                ],
                [
                    557.5224995613098,
                    169.38842296600342
                ],
                [
                    480.2271890640259,
                    113.93750667572021
                ],
                [
                    435.78068017959595,
                    172.70424842834473
                ],
                [
                    528.2153511047363,
                    250.0400733947754
                ],
                [
                    581.2865257263184,
                    188.86679649353027
                ],
                [
                    494.3824529647827,
                    126.59634590148926
                ],
                [
                    443.631808757782,
                    192.3055601119995
                ]
            ],
            "projected_cuboid_centroid": [
                503.60398054122925,
                180.23793697357178
            ],
            "provenance": "nvisii",
            "quaternion_xyzw": [
                0.3330736458301544,
                0.6829744577407837,
                -0.0770084336400032,
                0.6455056071281433
            ],
            "quaternion_xyzw_world": [
                0.3330736458301544,
                0.6829744577407837,
                -0.0770084336400032,
                0.6455056071281433
            ],
            "segmentation_id": 1,
            "visibility_image": 1
        },

The belief maps generated during training are as follows:

[belief map images 0–8]

The training command looks like this:

python3 train.py --gpuids 0 1 2 3 --data /root/robo_dataset/train --datatest /root/robo_dataset/test --object ycb_002_master_chef_can --batchsize 128 --imagesize 400 --lr 0.0001 --namefile ycb_002_master_chef_can --epochs 120 --loginterval 1 --outf ./out

I also tried activating the --save flag to check the input.

I trained with 4x RTX 3090 for at least 80 epochs; the training loss slowly dropped from 0.07 to 0.03 and then stopped decreasing. I had to stop training because the performance so far did not look promising enough to justify the cost.

Could you give me any suggestions?

loss_train.csv
loss_test.csv
header.txt

@blaine141 @TontonTremblay

kyouma9s changed the title from "Training fails to converge" to "Training loss fails to converge" on Jun 20, 2021
@TontonTremblay (Collaborator) commented

Hello, thank you for the very detailed post. To me it looks like the training has finished. Have you tried running the weights for inference? How does it look, and how do the belief maps for the testing set look? Honestly, I am not sure what is causing your issues; to me it looks like it should work. Looking at your image at the top, I would suggest increasing the samples per pixel to get sharper images.

@kyouma9s (Author) commented

I have not run the inference script because the training loss is much higher than I expected. I see you mentioned earlier that the training loss is less than 0.01 for a fully trained model.

@TontonTremblay (Collaborator) commented Jun 21, 2021 via email

@kyouma9s (Author) commented Jun 23, 2021

Hi,

I tried to run the inference script and this is what I got.

objects found: [{'name': 'ycb_002_master_chef_can', 'location': None, 'quaternion': None, 'cuboid2d': array([None, None, None, None, None, None, None, None,
       (400.8300740499558, 205.00378509782468)], dtype=object), 'projected_points': [None, None, None, None, None, None, None, None, (400.8300740499558, 205.00378509782468)], 'score': 0.19770813}, {'name': 'ycb_002_master_chef_can', 'location': None, 'quaternion': None, 'cuboid2d': array([None, None, None, None, None, None, None, None,
       (110.40288078192779, 226.60441166507223)], dtype=object), 'projected_points': [None, None, None, None, None, None, None, None, (110.40288078192779, 226.60441166507223)], 'score': 0.2823677}]

Most of the returned fields are empty; I think something is wrong.

These are the output belief maps:

beliefMaps

This is the inference script I am trying to use:
https://gist.github.com/kyouma9s/5e518714a8e613b65bfb7330142fde2d

I used the object model from YCB to generate the dataset, but without alignment, and the unit is still meters. However, in the inference script I converted the dimensions to millimetres.

Another interesting thing: when I try to validate my dataset with NVIDIA Dataset Utilities (see the picture below), I can only see the 2D cuboid; the other information is not displayed.

image

@TontonTremblay (Collaborator) commented

Ahhhh, I know what the problem is. The master chef can is a symmetrical object, hence the regression to a circle. You can see that it detects the instances very well in the last belief map. I would suggest that you look into losses for symmetrical objects. If I find something, I will share it with you.

@kyouma9s (Author) commented

Is that loss preventing me from getting any valid quaternion or location data?

@TontonTremblay (Collaborator) commented

image

If you look at that heatmap, you can see there is confusion over where the point is; there are multiple possible locations. Check out the intro and problem statement in this paper for a deeper explanation: https://openaccess.thecvf.com/content_WACV_2020/papers/Ammirato_SymGAN_Orientation_Estimation_without_Annotation_for_Symmetric_Objects_WACV_2020_paper.pdf

@TontonTremblay (Collaborator) commented

Overall your setting is correct, you just picked a hard object to train on.

@kyouma9s (Author) commented

Overall your setting is correct, you just picked a hard object to train on.

Thank you very much for your help. I am a little confused: if the chef can is symmetrical, shouldn't the cuboid and most other models also be symmetrical and therefore hard to train? For example, the sugar box above.

@kyouma9s (Author) commented Jun 23, 2021

I guess it is difficult to detect cylinders rotating around the yaw axis. Do you think a dataset that ignores yaw rotation would help training? How do you deal with cylindrical objects? I see you also provide a weight file for the meat can.

@TontonTremblay (Collaborator) commented

Read the papers I shared: the first one proposes a loss for symmetrical objects; the second one is too complicated to re-implement. Sorry that the original training code does not deal well with symmetries.

Another approach, which I have done for cubes, is to rearrange the keypoints into a camera-canonical view, which is similar to what you are suggesting.
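For illustration, here is a minimal sketch of that keypoint-rearrangement idea. This is not the actual DOPE training code; the function name, the "left-most corner" criterion, and the assumption that the symmetry axis is the object's z axis are all illustrative choices.

# Sketch only: relabel cuboid keypoints to a camera-canonical order for an
# object with an n-fold symmetry about its z axis (the cuboid itself must be
# invariant under that rotation, e.g. a square cross-section).
import numpy as np
from scipy.spatial.transform import Rotation as R

def canonical_cuboid_keypoints(cuboid_3d, rot_obj_to_cam, t_obj_to_cam, K, n_sym=4):
    """cuboid_3d: (8, 3) corners in the object frame; rot_obj_to_cam: scipy Rotation;
    t_obj_to_cam: (3,) translation; K: 3x3 camera intrinsics."""
    best_proj, best_cost = None, np.inf
    for k in range(n_sym):
        # apply one of the appearance-preserving rotations in the object frame
        sym = R.from_euler("z", 2.0 * np.pi * k / n_sym)
        pts_cam = (rot_obj_to_cam * sym).apply(cuboid_3d) + t_obj_to_cam
        proj = (K @ pts_cam.T).T
        proj = proj[:, :2] / proj[:, 2:3]      # perspective divide
        # arbitrary but consistent criterion: among the equivalent labelings,
        # pick the one where corner 0 projects furthest to the left
        cost = proj[0, 0]
        if cost < best_cost:
            best_proj, best_cost = proj, cost
    return best_proj                            # (8, 2) keypoints to supervise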

@kyouma9s (Author) commented

@TontonTremblay

Hello, I found that pointsBelief only uses points from projected_cuboid. Can I assign another 8 points, more representative of the object's shape, to this variable?

@TontonTremblay (Collaborator) commented

Yeah, you can assign whatever you want; this is really just a keypoint detector. You can change the keypoint locations in NVISII as well.
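As a concrete (hypothetical) example of alternative keypoints for a can-shaped object, one could use four points on the top rim and four on the bottom rim instead of the bounding-cuboid corners. The radius and height below are taken from the dimensions quoted later in this thread; the object-frame origin at the can's center with z up is an assumption.

# Hypothetical alternative keypoints for a cylinder-like can: 4 points on the
# top rim and 4 on the bottom rim, in the object frame (origin assumed at the
# can's center, z up, same units as the cuboid dimensions).
import numpy as np

radius, height = 10.25 / 2.0, 14.01              # cm, values from this thread
angles = np.deg2rad([0.0, 90.0, 180.0, 270.0])
top = np.stack([radius * np.cos(angles),
                radius * np.sin(angles),
                np.full(4, height / 2.0)], axis=1)
bottom = top.copy()
bottom[:, 2] = -height / 2.0
custom_keypoints = np.vstack([top, bottom])      # (8, 3); project these to 2D
                                                 # the same way as the cuboid corners

Note that rim points share the same rotational ambiguity as the cuboid corners, so by themselves they do not solve the symmetry problem.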

@kyouma9s (Author) commented Jul 1, 2021

Hi @TontonTremblay

Thank you very much for your help! I fixed the yaw axis of the training images in order to eliminate any yaw rotation. Now the images look like this:

image
image

Do you think this approach will help training by any chance?

@TontonTremblay (Collaborator) commented Jul 2, 2021 via email

@kyouma9s (Author) commented Jul 2, 2021

Hi @TontonTremblay

The roll rotation of chef_can is restricted to 0-90 degrees, the pitch rotation is restricted to 0-90 degrees, and yaw rotation is not allowed.

import random
from squaternion import Quaternion  # assumed: the squaternion package, which provides from_euler(roll, pitch, yaw, degrees=True)

def make_rotation():
    # roll and pitch sampled in [0, 90] degrees; yaw fixed to 0
    new_rot = (
        random.uniform(0, 90),  # Roll
        random.uniform(0, 90),  # Pitch
        0  # Yaw
    )
    q = Quaternion.from_euler(*new_rot, degrees=True)
    return q.x, q.y, q.z, q.w

image

Here are some examples:

image
image
image

Could you check whether these images are suitable for training? Thank you again!

@TontonTremblay (Collaborator) commented

This looks pretty good to me. I would try it; I feel pretty confident it will work. You might find corner cases in real life where it won't work, but overall you won't get the problems you had before.

@kyouma9s (Author) commented Jul 4, 2021

Hi @TontonTremblay

This time the training loss stopped dropping at around 0.01, and I got some interesting results, but the performance was still not as good as expected. I noticed that in some custom datasets that have been trained successfully, the object appears only once in each image. Will this affect training?

[result images 0002–0009]

@TontonTremblay (Collaborator) commented Jul 4, 2021 via email

@kyouma9s (Author) commented Jul 4, 2021

Hi @TontonTremblay

Yes, I produced some testing images that include only a single can. It seems that the bottom circle is recognised well, but the inferred dimensions may be wrong.

[result images 1–10]

@TontonTremblay (Collaborator) commented Jul 4, 2021 via email

@kyouma9s (Author) commented Jul 4, 2021

@TontonTremblay

For now, it is 40k images for training and 10k for testing; each image contains 15 cans.

@TontonTremblay (Collaborator) commented Jul 4, 2021 via email

@kyouma9s (Author) commented Jul 6, 2021

Hi @TontonTremblay

I created a new dataset (50k images for training and 10k for testing) in which each image contains up to three duplicate objects. The cuboid size is also corrected. The training loss dropped to 0.006.
I got some good results, amazing! Do you have any suggestions for the cases where the test fails?

[result images 1–20]

@kyouma9s (Author) commented Jul 6, 2021

I used the collapse tag to hide the images by default so that this issue stays easy to read.

@TontonTremblay (Collaborator) commented

These results look very interesting and they are highly encouraging. Some of these results look pretty good. I am not sure what your end-goal is, so it is hard to help you.

@kyouma9s (Author) commented Jul 6, 2021

These results look very interesting and they are highly encouraging. Some of these results look pretty good. I am not sure what your end-goal is, so it is hard to help you.

I hope the trained network can help a robot detect object poses from an RGB camera.

@mintar (Contributor) commented Jul 7, 2021

Perhaps your cuboid dimensions are wrong. You can get the correct values by looking at the _object_settings.json file from your training data set. If you used the same values as the FAT dataset, it should look like this:

"cuboid_dimensions": [ 10.240400314331055, 14.0177001953125, 10.230899810791016 ]

Can you paste your DOPE config_pose.yaml file here?

@kyouma9s (Author) commented Jul 7, 2021

Hi @mintar

The dataset was generated by NVISII. The dimensions setting looks like this:

"ycb_002_master_chef_can": [10.25, 10.23,14.01]

The first value is the length along the x-axis, the second the length along the y-axis, and the third the length along the z-axis; all values are in centimeters. I aligned the model myself, but the unit was not converted to cm. The last two values are swapped because I did not directly use the model from FAT.

Below is my model.
image

I don't have a config_pose.yaml because I didn't use the ROS node; I modified the inference code to feed the image in and get the output directly.

inference code
import sys
sys.path.append(".")
sys.path.append("..")

import cv2
from src.dope.inference.cuboid import Cuboid3d
from src.dope.inference.cuboid_pnp_solver import CuboidPNPSolver
from src.dope.inference.detector import ModelData, ObjectDetector
import numpy as np
import yaml
from PIL import Image, ImageDraw
import math
from glob import iglob
import os

class Draw(object):
    """Drawing helper class to visualize the neural network output"""

    def __init__(self, im):
        """
        :param im: The image to draw in.
        """
        self.draw = ImageDraw.Draw(im)

    def draw_line(self, point1, point2, line_color, line_width=2):
        """Draws line on image"""
        if point1 is not None and point2 is not None:
            self.draw.line([point1, point2], fill=line_color, width=line_width)

    def draw_dot(self, point, point_color, point_radius):
        """Draws dot (filled circle) on image"""
        if point is not None:
            xy = [
                point[0] - point_radius,
                point[1] - point_radius,
                point[0] + point_radius,
                point[1] + point_radius
            ]
            self.draw.ellipse(xy,
                              fill=point_color,
                              outline=point_color
                              )

    def draw_cube(self, points, color=(255, 255, 255)):
        """
        Draws cube with a thick solid line across
        the front top edge and an X on the top face.
        """

        # draw front
        self.draw_line(points[0], points[1], (0, 0, 255))
        self.draw_line(points[1], points[2], (0, 0, 255))
        self.draw_line(points[2], points[3], (0, 0, 255))
        self.draw_line(points[3], points[0], (0, 0, 255))
        self.draw_line(points[0], points[2], (0, 0, 255))
        self.draw_line(points[1], points[3], (0, 0, 255))

        # draw back
        self.draw_line(points[4], points[5],color)
        self.draw_line(points[5], points[6],color)
        self.draw_line(points[6], points[7],color)
        self.draw_line(points[7], points[4],color)

        # draw sides
        self.draw_line(points[1], points[5],color)
        self.draw_line(points[2], points[6],color)
        self.draw_line(points[0], points[4],color)
        self.draw_line(points[3], points[7],color)


def main():

    models = {}
    pnp_solvers = {}

    config_detect = lambda: None
    config_detect.mask_edges = 1
    config_detect.mask_faces = 1
    config_detect.vertex = 1
    config_detect.threshold = 0.001
    config_detect.softmax = 1000
    config_detect.thresh_angle = 0.5
    config_detect.thresh_map = 0.01
    config_detect.sigma = 3
    config_detect.thresh_points = 0.1

    weights = {
        "ycb_002_master_chef_can":"backup/net_ycb_002_master_chef_can_76.pth",
    }

    dimensions = {
    #x,y,z
        "ycb_002_master_chef_can": [10.25, 10.23,14.01]
    }

    draw_colors = {
        "ycb_002_master_chef_can": (13, 255, 128),  # green
    }

    camera_matrix = np.array([[482.84283447265625,    0,         200.0],
                            [  0,      482.84283447265625,       200.0],
                            [  0,      0.,        1.        ]])
    dist_coeffs = np.zeros((4, 1))

    # For each object to detect, load network model, create PNP solver, and start ROS publishers
    for model, weights_url in weights.items():
        models[model] = \
            ModelData(

                model,
                weights_url
            )
        models[model].load_net_model()



        pnp_solvers[model] = \
            CuboidPNPSolver(
                model,
                cuboid3d=Cuboid3d(dimensions[model])
            )
       
    max_images = 50
    got_images = 0
    for pathToImg in iglob(os.path.join("./va", "*.jpg")):
        if got_images >= max_images:
            break
        print(pathToImg)
        got_images+=1
        img = cv2.imread(pathToImg)
        if img is None:
            continue
        #cv2.imshow('img', img)
        #cv2.waitKey(1)
        im = Image.fromarray(img.copy())
        draw = Draw(im)
        height, width, _ = img.shape
        scaling_factor = float(400) / height
        if scaling_factor < 1.0:
            img = cv2.resize(img, (int(scaling_factor * width), int(scaling_factor * height)))
        
        pnp_solvers[model].set_camera_intrinsic_matrix(camera_matrix*scaling_factor)
        pnp_solvers[model].set_dist_coeffs(dist_coeffs)
        
        #print("models: %d" % len(models))
        for m in models:
            # try to detect object
            results, im_belief = ObjectDetector.detect_object_in_image(models[m].net, pnp_solvers[m], img, config_detect, grid_belief_debug=True, norm_belief=True, run_sampling=True)

            #print("objects found: {}".format(results))
            cv_imageBelief = np.array(im_belief)
            imageToShow = cv2.resize(cv_imageBelief, dsize=(height, width))  # note: cv2.resize expects dsize as (width, height); this only works here because the images are square
            #cv2.imwrite('be-%s.jpg' % num, imageToShow)
            
            for i_r, result in enumerate(results):
                if None not in result['projected_points']:
                    points2d = []
                    for pair in result['projected_points']:
                        points2d.append(tuple(pair))
                    draw.draw_cube(points2d, draw_colors[m])

            annotated_frame = np.array(im)
            #cv2.imwrite('an-%s.jpg' % num, annotated_frame)
            preview_img = np.concatenate((annotated_frame, imageToShow), axis=1)
            cv2.imwrite('result-%d.jpg' % got_images, preview_img)

    print("end")


if __name__ == '__main__':
    main()

@kyouma9s (Author) commented Jul 7, 2021

I think I have achieved encouraging results. Thank you so much! But I don't know how to improve the performance (precision and recall?) so that I can use the trained network on a robot. Do you have any suggestions?

@mintar (Contributor) commented Jul 8, 2021

I think you just picked a very challenging object due to its rotational symmetry. It also looks like your training images have a different texture (with the red blob and arrow) than your testing images. Is that correct?

@mintar (Contributor) commented Jul 8, 2021

I think I found your bug. You made a mistake while modifying the code in the repo.

You do this:

camera_matrix = np.array([[482.84283447265625,    0,         200.0],
                            [  0,      482.84283447265625,       200.0],
                            [  0,      0.,        1.        ]])

pnp_solvers[model].set_camera_intrinsic_matrix(camera_matrix*scaling_factor)

Instead, this is correct:

camera_matrix = np.array([[482.84283447265625,    0,         200.0],
                            [  0,      482.84283447265625,       200.0],
                            [  0,      0.,        1.        ]])

camera_matrix[:2] *= scaling_factor

pnp_solvers[model].set_camera_intrinsic_matrix(camera_matrix)

If that doesn't solve all your problems, please double check the differences between the official code here and all your modifications.

@kyouma9s (Author) commented Jul 8, 2021

Hi @mintar

Thank you for reviewing my code. Fortunately, because my test and training images are both 400x400, scaling_factor is always 1, so it has no side effect on the intrinsic matrix.

The actual training and testing images have the original texture; the model with the arrow texture was only used to verify that the tool I made for generating synthetic data works properly 😂

@mintar (Contributor) commented Jul 9, 2021

Ok, thanks for checking. Are the images in your comment #176 (comment) a random selection, or did you focus on the failure cases? If it's a random selection, the results are much worse than what can be expected from DOPE.

I've made a small overlay (in Gimp) of one of your images:

dope-overlaid

By the way: I also created PR #178 yesterday, with which you can generate images like this automatically.

From that image we can see that the estimated pose corresponds to the belief maps: The bounding box corners match the maxima in the belief maps. This means that PnP is working well, and so your cuboid dimensions and camera matrix are indeed correct. The fault lies in the network itself, so it has to be something about that particular object and its training data.

It seems to me that the network focuses overly strongly on the round top of the object. It doesn't recognize the object from angles where the top is occluded, and it has a large rotation error around one axis because it attempts to estimate the rotation based on the round top alone, which is hard.

Are the test images drawn from the same distribution as the training images? I.e., are they generated the same way? (Of course they shouldn't be included in the training data set).

@mintar (Contributor) commented Jul 9, 2021

After thinking some more about this, I have a theory about what's going on. You restricted the angles under which the object is shown in the dataset, right? This isn't what you should do, because you want to recognize the object under arbitrary angles in the test set / reality. Instead, you should have arbitrary angles in the training set, but then correct the ground-truth pose that the network is supposed to return for each angle. For example, I had an object (a box) with a 180° rotational symmetry: if you rotate the object by 180° around its z axis, it has the same appearance. So what I did was generate my dataset with unrestricted angles using NDDS and then modify the output. Here's the script:

https://github.com/mintar/ycb_multicam_dataset_tools/blob/f7d2a4d8361aa9376f3cfe9077de2669906b9a0a/flip_symmetrical_objects.py

(Note: In the process of importing the mesh into NDDS, some axes got rotated, so the z axis of the mesh actually becomes the x axis in the dataset.)

If we take textures into account, your object has the same 180° rotational symmetry, because the texture is mirrored on the other side, so you could try using my script unmodified. Of course your object is a bit harder, because if you only see the top or bottom, there is an infinite order of rotational symmetry around the z axis. You might get better results if you modify my script so that it always fully rotates the ground truth pose around the z axis so that the yaw angle becomes 0. It's useful to use nvdu_viz to view your modified dataset and verify that it's correct.
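
As an illustration of that second idea, here is a sketch of the concept (not the linked script). It assumes the symmetry axis is the object's z axis in the annotation frame and uses scipy: decompose the ground-truth rotation into intrinsic X-Y-Z Euler angles and drop the trailing Z angle, which leaves a z-symmetric object looking identical but gives the network a unique pose target.

# Sketch: canonicalize the ground-truth orientation of a z-symmetric object by
# removing the rotation about its own z axis (its appearance is unchanged).
from scipy.spatial.transform import Rotation as R

def zero_symmetry_yaw(quat_xyzw):
    rot = R.from_quat(quat_xyzw)            # scipy expects (x, y, z, w)
    rx, ry, _ = rot.as_euler("XYZ")         # intrinsic X, then Y, then Z angles
    # dropping the last intrinsic Z angle multiplies the rotation by Rz(-yaw)
    # in the object frame, which a z-symmetric can cannot "see"
    return R.from_euler("XYZ", [rx, ry, 0.0]).as_quat()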

@kyouma9s (Author) commented Jul 9, 2021

Hi @mintar

Thanks for your help and the new PR! The images in comment #176 (comment) were generated only to evaluate network performance; they are not part of the training set or the testing set. Both the training set and the test set were generated using the same tool with the same parameters; they only differ in size.

I will continue trying the new method you suggested, and I will report back if there is an interesting update.
