Training loss fails to converge #176
Hello, thank you for the very detailed post. To me it looks like the training has finished. Have you tried running the weights to do inference? How do the belief maps look for the testing set? Honestly, I am not sure what is causing your issues; to me it looks like it should work. Looking at your image at the top, I would suggest increasing the samples per pixel to get sharper images.
I have not run the inference script because the training loss is much higher than I expected. I see you mentioned earlier that the training loss is less than 0.01 for a fully trained model.
Can you try running inference to see?
Hi, I tried to run the inference script and this is what I got.
Most of the returns are empty; I think something is wrong. These are the output belief maps, and this is the inference script I am trying to use. I used the object from YCB to generate the dataset, but without alignment, and the unit is still meters. However, in the inference script, I converted the dimensions to millimetres. Another interesting thing: when I try to validate my dataset using the NVIDIA Dataset Utilities (see the picture below), I can only see the 2D cuboid; other information is not available to display.
Ahhhh, I know what the problem is. The master chef can is a symmetrical object, hence the regression to a circle. You can see that it detects the instances very well in the last belief map. I would suggest that you look into losses for symmetrical objects. If I find something, I will share it with you.
Is that loss preventing me from getting any valid quaternion or location data?
If you look at that heatmap, you can see there is confusion over where the point is; there are multiple possible locations. Check out the intro and problem statement in this paper for a deeper explanation: https://openaccess.thecvf.com/content_WACV_2020/papers/Ammirato_SymGAN_Orientation_Estimation_without_Annotation_for_Symmetric_Objects_WACV_2020_paper.pdf
Overall your setting is correct, you just picked a hard object to train on.
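A common remedy for this kind of ambiguity, sketched below, is to evaluate the loss against every symmetry-equivalent relabeling of the ground-truth keypoints and keep the minimum, so the network is not penalised for choosing a symmetric alternative. This is not DOPE's actual training loss; the function and the permutation setup are illustrative only.

```python
import numpy as np

def symmetry_aware_mse(pred, gt, symmetry_permutations):
    """pred, gt: (K, 2) arrays of 2D keypoints.
    symmetry_permutations: index arrays, each relabeling gt according to one
    rotation in the object's symmetry group; returns the minimum MSE."""
    losses = [np.mean((pred - gt[perm]) ** 2) for perm in symmetry_permutations]
    return min(losses)

# Example: a square top face seen from above has 4-fold symmetry; rotating the
# object by 90 degrees permutes the 4 corner keypoints cyclically.
gt = np.array([[0., 0.], [1., 0.], [1., 1.], [0., 1.]])
perms = [np.roll(np.arange(4), k) for k in range(4)]
pred = gt[np.roll(np.arange(4), 1)]  # network predicted the 90-degree-rotated labeling
print(symmetry_aware_mse(pred, gt, perms))  # prints 0.0: no penalty for a symmetric answer
```

A plain MSE on the same prediction would be nonzero, which is exactly the confusing signal a symmetric object feeds back into training.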
Many thanks for your help. I am a little confused: if the chef can is symmetrical, shouldn't cuboids and most other models also be symmetrical and hard to train? For example, the sugar box above.
I guess it is difficult to detect yaw-rotating cylinders. Do you think a dataset that ignores yaw rotation would help training? How do you deal with these cylindrical objects? I see you also provide a weight file for the meat can.
Read the papers I shared: the first one proposes a loss for symmetrical objects, and the second one is too complicated to re-implement. Sorry that the original training code does not deal well with symmetries. Another approach, which I have done for cubes, is to rearrange the keypoints into a camera-canonical view, which is similar to what you are suggesting.
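The keypoint-rearranging idea can be sketched like this, for a hypothetical 8-corner cuboid with a 180° symmetry about its z axis. The `SYM_180` index map and the left-most tie-break are illustrative assumptions, not the actual DOPE code.

```python
import numpy as np

# Hypothetical symmetry: rotating the box 180 degrees about z swaps the top-face
# corners (0,1,2,3) with (2,3,0,1), and likewise the bottom-face corners (4..7).
SYM_180 = [2, 3, 0, 1, 6, 7, 4, 5]

def canonicalize(projected_corners):
    """projected_corners: (8, 2) array of cuboid corners in image space.
    Returns the corners relabeled so the ordering is a deterministic function
    of the camera view: corner 0 is always the left-most of the two options."""
    alt = projected_corners[SYM_180]
    if alt[0, 0] < projected_corners[0, 0]:
        return alt
    return projected_corners

corners = np.array([[10., 0.], [9, 0], [1, 0], [2, 0],
                    [10, 5], [9, 5], [1, 5], [2, 5]])
canon = canonicalize(corners)
```

Because both symmetric labelings map to the same canonical one, the ground truth the network sees is consistent across views, which removes the ambiguity during training.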
Hello, I found the
Yeah, you can assign whatever you want. Really, this is just a keypoint detector. You can change the location in NVISII as well.
Thank you very much for your help! I fixed the yaw axis of the training images in order to eliminate any yaw rotation. Now the images look like this. Do you think this approach will help training by any chance?
Could you draw the projected cuboid with unique colors for the vertices? If the ones at the back are always at the back, then you should be good to go.
The roll rotation of chef_can is restricted to 0-90 degrees, the pitch rotation is restricted to 0-90 degrees, and yaw rotation is not allowed.
Here are some examples! Could you check whether these images are suitable for training? Thank you again!
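For reference, the restricted sampling described above (roll and pitch in 0-90 degrees, yaw fixed to 0) might be generated like this when producing poses for the renderer. This is illustrative only, not the actual NVISII generation script.

```python
import math
import random

def sample_restricted_quaternion(rng):
    """Sample a rotation with roll, pitch in [0, 90] degrees and yaw fixed
    to 0, returned as a (w, x, y, z) quaternion (intrinsic z-y-x convention)."""
    roll = math.radians(rng.uniform(0.0, 90.0))
    pitch = math.radians(rng.uniform(0.0, 90.0))
    yaw = 0.0  # yaw rotation disallowed
    cr, sr = math.cos(roll / 2), math.sin(roll / 2)
    cp, sp = math.cos(pitch / 2), math.sin(pitch / 2)
    cy, sy = math.cos(yaw / 2), math.sin(yaw / 2)
    w = cr * cp * cy + sr * sp * sy
    x = sr * cp * cy - cr * sp * sy
    y = cr * sp * cy + sr * cp * sy
    z = cr * cp * sy - sr * sp * cy
    return (w, x, y, z)

q = sample_restricted_quaternion(random.Random(0))
```

The sampled quaternion would then be assigned to the object's transform before rendering each frame.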
This looks pretty good to me. I would try it; I feel pretty confident it will work. You might find corner cases in real life where it won't work, but overall you won't get the problems you had before.
This time the training loss stopped dropping at around 0.01, and I got some interesting results. But the performance was still not as good as expected. I noticed that in some custom datasets that have been trained successfully, an object appears only once in each picture. Will this affect training?
Yes, I produced some testing images which only include a single can. It seems that the bottom circle is well recognised, but the inferred dimensions may be wrong.
For now, it is 40k for training and 10k for testing; each image contains 15 cans.
So in the inference code I am pretty sure your cuboid size is not set correctly. You do get some good predictions in some images, though. I have trained with at most 3 cans at the same time, never 15. Maybe you could reduce it to 3-5 and double-check your cuboid size.
I created a new dataset (50k for training and 10k for testing) in which each image contains up to three duplicate objects. The cuboid size is also corrected. The training loss can drop to 0.006.
I used collapse tags to hide the images by default so that this issue stays easy to read.
These results are very interesting and highly encouraging; some of them look pretty good. I am not sure what your end goal is, so it is hard to help you.
I hope the trained network can help the robot detect object poses from an RGB camera.
Perhaps your cuboid dimensions are wrong. You can get the correct values by looking at the `"cuboid_dimensions"` field:

```json
"cuboid_dimensions": [ 10.240400314331055, 14.0177001953125, 10.230899810791016 ]
```

Can you paste your DOPE
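To avoid hard-coding the values, the field can be read straight from one of the exported dataset JSON files. A minimal sketch follows; the `objects` list layout is an assumption about the export format, and an in-memory buffer stands in for the actual file.

```python
import io
import json

# In-memory stand-in for one exported annotation file; the "objects" list
# layout is assumed, the field name comes from the thread above.
sample_file = io.StringIO(json.dumps({
    "objects": [{
        "class": "ycb_002_master_chef_can",
        "cuboid_dimensions": [10.240400314331055,
                              14.0177001953125,
                              10.230899810791016],
    }]
}))

record = json.load(sample_file)
dims = record["objects"][0]["cuboid_dimensions"]
print(dims)  # the extents to pass to Cuboid3d instead of hard-coded values
```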
Hi @mintar, the dataset was generated by NVISII. The dimensions setting looks like this:

The first value is the length on the x axis, the second the length on the y axis, and the third the length on the z axis; all values are in centimeters. I aligned the model myself, but the unit was not converted to cm. The last two values are swapped because I did not directly use the model from FAT. I didn't have an inference script ready, so I wrote this one:

```python
import sys
sys.path.append(".")
sys.path.append("..")

import math
import os
from glob import iglob

import cv2
import numpy as np
import yaml
from PIL import Image, ImageDraw

from src.dope.inference.cuboid import Cuboid3d
from src.dope.inference.cuboid_pnp_solver import CuboidPNPSolver
from src.dope.inference.detector import ModelData, ObjectDetector


class Draw(object):
    """Drawing helper class to visualize the neural network output"""

    def __init__(self, im):
        """
        :param im: The image to draw in.
        """
        self.draw = ImageDraw.Draw(im)

    def draw_line(self, point1, point2, line_color, line_width=2):
        """Draws line on image"""
        if point1 is not None and point2 is not None:
            self.draw.line([point1, point2], fill=line_color, width=line_width)

    def draw_dot(self, point, point_color, point_radius):
        """Draws dot (filled circle) on image"""
        if point is not None:
            xy = [
                point[0] - point_radius,
                point[1] - point_radius,
                point[0] + point_radius,
                point[1] + point_radius
            ]
            self.draw.ellipse(xy, fill=point_color, outline=point_color)

    def draw_cube(self, points, color=(255, 255, 255)):
        """
        Draws cube with a thick solid line across
        the front top edge and an X on the top face.
        """
        # draw front
        self.draw_line(points[0], points[1], (0, 0, 255))
        self.draw_line(points[1], points[2], (0, 0, 255))
        self.draw_line(points[2], points[3], (0, 0, 255))
        self.draw_line(points[3], points[0], (0, 0, 255))
        self.draw_line(points[0], points[2], (0, 0, 255))
        self.draw_line(points[1], points[3], (0, 0, 255))
        # draw back
        self.draw_line(points[4], points[5], color)
        self.draw_line(points[5], points[6], color)
        self.draw_line(points[6], points[7], color)
        self.draw_line(points[7], points[4], color)
        # draw sides
        self.draw_line(points[1], points[5], color)
        self.draw_line(points[2], points[6], color)
        self.draw_line(points[0], points[4], color)
        self.draw_line(points[3], points[7], color)


def main():
    models = {}
    pnp_solvers = {}

    config_detect = lambda: None
    config_detect.mask_edges = 1
    config_detect.mask_faces = 1
    config_detect.vertex = 1
    config_detect.threshold = 0.001
    config_detect.softmax = 1000
    config_detect.thresh_angle = 0.5
    config_detect.thresh_map = 0.01
    config_detect.sigma = 3
    config_detect.thresh_points = 0.1

    weights = {
        "ycb_002_master_chef_can": "backup/net_ycb_002_master_chef_can_76.pth",
    }
    dimensions = {
        # x, y, z
        "ycb_002_master_chef_can": [10.25, 10.23, 14.01]
    }
    draw_colors = {
        "ycb_002_master_chef_can": (13, 255, 128),  # green
    }
    camera_matrix = np.array([[482.84283447265625, 0, 200.0],
                              [0, 482.84283447265625, 200.0],
                              [0, 0., 1.]])
    dist_coeffs = np.zeros((4, 1))

    # For each object to detect, load the network model and create a PnP solver
    for model, weights_url in weights.items():
        models[model] = ModelData(model, weights_url)
        models[model].load_net_model()
        pnp_solvers[model] = CuboidPNPSolver(
            model,
            cuboid3d=Cuboid3d(dimensions[model])
        )

    max_images = 50
    got_images = 0
    for pathToImg in iglob(os.path.join("./va", "*.jpg")):
        if got_images >= max_images:
            break
        print(pathToImg)
        got_images += 1
        img = cv2.imread(pathToImg)
        if img is None:
            continue
        im = Image.fromarray(img.copy())
        draw = Draw(im)
        height, width, _ = img.shape
        scaling_factor = float(400) / height
        if scaling_factor < 1.0:
            img = cv2.resize(img, (int(scaling_factor * width), int(scaling_factor * height)))
        pnp_solvers[model].set_camera_intrinsic_matrix(camera_matrix * scaling_factor)
        pnp_solvers[model].set_dist_coeffs(dist_coeffs)
        for m in models:
            # try to detect the object
            results, im_belief = ObjectDetector.detect_object_in_image(
                models[m].net, pnp_solvers[m], img, config_detect,
                grid_belief_debug=True, norm_belief=True, run_sampling=True)
            cv_imageBelief = np.array(im_belief)
            imageToShow = cv2.resize(cv_imageBelief, dsize=(height, width))
            for i_r, result in enumerate(results):
                if None not in result['projected_points']:
                    points2d = []
                    for pair in result['projected_points']:
                        points2d.append(tuple(pair))
                    draw.draw_cube(points2d, draw_colors[m])
            annotated_frame = np.array(im)
            preview_img = np.concatenate((annotated_frame, imageToShow), axis=1)
            cv2.imwrite('result-%d.jpg' % got_images, preview_img)
    print("end")


if __name__ == '__main__':
    main()
```
I think I have obtained encouraging results. Thank you so much! But I don't know how to improve the performance (precision and recall?) so that I can use the trained network on the robot. Do you have any suggestions?
I think you just picked a very challenging object due to its rotational symmetry. It also looks like your training images have a different texture (with the red blob and arrow) than your testing images, is that correct?
I think I found your bug. You made a mistake while modifying the code in the repo. You do this:

```python
camera_matrix = np.array([[482.84283447265625, 0, 200.0],
                          [0, 482.84283447265625, 200.0],
                          [0, 0., 1.]])
pnp_solvers[model].set_camera_intrinsic_matrix(camera_matrix * scaling_factor)
```

Instead, this is correct:

```python
camera_matrix = np.array([[482.84283447265625, 0, 200.0],
                          [0, 482.84283447265625, 200.0],
                          [0, 0., 1.]])
camera_matrix[:2] *= scaling_factor
pnp_solvers[model].set_camera_intrinsic_matrix(camera_matrix)
```

If that doesn't solve all your problems, please double-check the differences between the official code here and all your modifications.
Hi @mintar, thank you for reviewing my code. Fortunately, because my test and training images are both 400x400, the scaling factor is 1.0, so that bug has no effect. The actual training and testing images have the original texture; the model with the arrow texture is only used to verify that the tool I made to generate synthetic data works properly 😂
Ok, thanks for checking. Are the images in your comment #176 (comment) a random selection, or did you focus on the failure cases? If it's a random selection, the results are much worse than what can be expected from DOPE.

I've made a small overlay (in Gimp) of one of your images. By the way: I also created PR #178 yesterday, with which you can generate images like this automatically. From that image we can see that the estimated pose corresponds to the belief maps: the bounding box corners match the maxima in the belief maps. This means that PnP is working well, and so your cuboid dimensions and camera matrix are indeed correct. The fault lies in the network itself, so it has to be something about that particular object and its training data.

It seems to me that the network focuses overly strongly on the round top of the object. It doesn't recognize the object from angles where the top is occluded, and it has a large rotation error around one axis because it attempts to estimate the rotation based on the round top alone, which is hard.

Are the test images drawn from the same distribution as the training images? I.e., are they generated the same way? (Of course they shouldn't be included in the training data set.)
After thinking some more about this, I have a theory about what's going on. You restricted the angles under which the object is shown in the dataset, right? This isn't what you should do, because you want to recognize the object under arbitrary angles in the test set / reality. Instead, you should have arbitrary angles in the training set, but then correct the ground truth pose that the network is supposed to return for that angle.

For example, I had an object (a box) with a 180° rotational symmetry: if you rotate the object by 180° around its z axis, it has the same appearance. So what I did was generate my dataset with unrestricted angles using NDDS and then modify the output. Here's the script: (Note: in the process of importing the mesh into NDDS some axes got rotated, so the z axis in the mesh actually becomes the x axis in the dataset.)

If we take textures into account, your object has the same 180° rotational symmetry, because the texture is mirrored on the other side, so you could try using my script unmodified. Of course your object is a bit harder, because if you only see the top or bottom, there is an infinite order of rotational symmetry around the z axis. You might get better results if you modify my script so that it always fully rotates the ground truth pose around the z axis so that the yaw angle becomes 0. It's useful to use
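The z-axis correction described above can be sketched as follows. This is a stand-in for the script mentioned in the comment, not the script itself: it decomposes the rotation with yaw as the last factor and cancels that yaw by rotating the object about its own z axis, which leaves an equivalent pose for a rotationally symmetric object.

```python
import numpy as np

def rot_x(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[1., 0, 0], [0, c, -s], [0, s, c]])

def rot_y(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, 0, s], [0, 1., 0], [-s, 0, c]])

def rot_z(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1.]])

def remove_symmetry_yaw(R):
    """Decompose R as rot_x(roll) @ rot_y(pitch) @ rot_z(yaw) and cancel the
    yaw by a rotation about the object's own z axis. For an object with full
    rotational symmetry about z, the canonical pose looks identical."""
    yaw = np.arctan2(-R[0, 1], R[0, 0])  # valid while |pitch| < 90 degrees
    return R @ rot_z(-yaw)

R = rot_x(0.3) @ rot_y(0.5) @ rot_z(1.1)  # some arbitrary ground-truth rotation
R_canon = remove_symmetry_yaw(R)          # equals rot_x(0.3) @ rot_y(0.5)
```

Because the correction multiplies on the right (in the object's body frame), the canonical pose and the original one render to the exact same image for a yaw-symmetric object, so the network sees a consistent ground truth.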
Hi @mintar, thanks for your help and the new PR! The images in the comment #176 (comment) were generated only to evaluate network performance; they are not part of the training set or the testing set. Both the training set and the test set were generated using the same tool with the same parameters; they differ only in number. I will continue trying the new method you suggest, and I will report back if there is an interesting update.
Hi,
Thanks for your excellent work. I have a problem when training custom dataset.
I referred to this link and used NVISII to make a 50k dataset: 40k for training and 10k for testing. Here are some examples.
Although there are many sugar boxes in this picture, my goal is to train the beef can. Unlike the datasets created by others, there are a lot of duplicated objects in my training pictures.
A typical JSON file looks as follows
The belief maps generated during training are as follows
The training command looks like this.
I also tried activating the `--save` flag to check the input. I trained with 4x RTX 3090 for at least 80 epochs; the training loss slowly dropped from 0.07 to 0.03 and then stopped decreasing. I had to stop training because the performance so far did not justify the cost.
Could you give me any suggestions by any chance?
loss_train.csv
loss_test.csv
header.txt
@blaine141 @TontonTremblay