-
Notifications
You must be signed in to change notification settings - Fork 7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cleaning up Ops Boxes and Losses 🧹 #5979
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know I shouldn't be snooping around on draft PRs. I've added couple of comments but feel free to ignore if you think it's too early. 😄
torchvision/ops/ciou_loss.py
Outdated
@@ -12,6 +13,9 @@ def complete_box_iou_loss( | |||
) -> torch.Tensor: | |||
|
|||
""" | |||
# Original Implementation from | |||
https://github.com/facebookresearch/detectron2/blob/main/detectron2/layers/losses.py |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would leave this as a comment on the source. Long unrendered URLs are not particularly helpful for the documentation. We should make the same change on diou.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeap. That looks ugly and it's on multiple places. If you want bring a separate quick PR that moves attributions on the main part of the methods to avoid the issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One option is to embed the link e.g.
Original Implementation from Detectron2
But the docstring should at least start by describing what the object is, even if it's very obvious from its name already. So I would suggest to write something like
"""Complete Box IoU Loss.
Implementation is adapted from `Detectron2 <https://github.com/facebookresearch/detectron2/blob/main/detectron2/layers/losses.py>`__.
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's how it's currently over main branch. I would suggest adding 3 words in end
implementation adapted from
Detectron2
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a great description but the preview is a bit long @oke-aditya . It's best to skip a line after the first sentence (there's even a PEP for that) to keep the preview is short and to-the-point.
Multi-line docstrings consist of a summary line just like a one-line docstring, followed by a blank line, followed by a more elaborate description. The summary line may be used by automatic indexing tools; it is important that it fits on one line and is separated from the rest of the docstring by a blank line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will be tackling this in seperate PR anyways to unify all stuff. I feel we need bit more revamp for Ops docs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should definitely provide attribution to other projects when we use code from them. Let's keep the reference on the code as a comment on the main body of the method similar to what we do in most places instead of placing a link. We can review this on the future and make changes in a coordinated manner.
@@ -67,3 +67,33 @@ def split_normalization_params( | |||
else: | |||
other_params.extend(p for p in module.parameters() if p.requires_grad) | |||
return norm_params, other_params | |||
|
|||
|
|||
def _upcast(t: Tensor) -> Tensor: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that previous code used different versions of upcast. More specifically boxes permitted integers while losses didn't.
@pmeier Since the test_ops.py file is growing very big. 1.5k+ lines. I think it's time to take out common items. So I have created a new test file for losses. Should I do the same for IoU ? Also cc @NicolasHug |
# Protects from numerical overflows in multiplications by upcasting to the equivalent higher type | ||
if t.dtype not in (torch.float32, torch.float64): | ||
return t.float() | ||
return t |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not the same _upcast
as _utils
. It doesn't support integers and converts everything to floats. Could you please review all the places where the giou_loss._upcast()
was used and ensure the output will be a float? Basically box ops are OK to maintain things as integers but not losses.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I do understand that I have not used the _upcast_if_not_float
. It's intentional, :) Since I would like to know why
box ops are OK to maintain things as integers but not losses.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The _upcast
on boxes was introduced for operators that estimate information related to a box. Things like the box area for instance. If you put integer boxes and you ask for the area of a box, you kind of expecting you receive the value in integers (that's debatable but that's how the operator worked). The issue was that if you used too small of a precision, the area estimation would overflow. So this method upcasts math operations to a space where it's safe to do multiplications without risking overflowing for most applications.
On the losses side, I'm not aware of any application that does things on integer space. Not only that but doing reduction == "mean"
will break things. So we need to be careful to definitely not support ints in losses. Similar care might be needed for some box operators. Area might still make sense to return as integer but I'm not 100% sure if that's the case with all the IoU metrics we deal here.
So I think it's important prior merging this PR, to make an explicit decision of what has to support integers and what doesn't, handle it appropriately and add tests and xfails to ensure we are not breaking anything.
from common_utils import cpu_and_gpu | ||
from torchvision import ops | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Making a class, inheriting and the calling method is also possible. For now is this fine?
assert_empty_loss(ops.distance_box_iou_loss, dtype, device) | ||
|
||
|
||
class TestFocalLoss: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we are testing losses here, I felt to add this here . To avoid confusion between files.
|
||
|
||
def get_boxes(dtype, device): | ||
box1 = torch.tensor([-1, -1, 1, 1], dtype=dtype, device=device) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not super happy with this choice of box. Since this is actually invalid input
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree with the concern.
Detectron2 used the same set of boxes. see this.
I think we should use valid input boxes.
That being said, should we also check if the input boxes have non-negative values?
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We cannot assert here as it will lead to cuda call and cause trouble.
I'm not sure if we can use torch._assert_async
either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, ig this situation is similar to #5776 (comment)
@oke-aditya Shall we convert to draft and ping us when you are happy to review again? |
Yep. :) |
Not sure if I simplified the IoU tests but at least I reduced the code complexity from previous tests. I basically did not want to override a class that handles both the box_area and the IoUs. So I wrote separate test for box_area and then used a class to simplify IoU tests. i feel this keeps code bit readable and clear. Still need to get rid of _generate(). One way is to use pytest.fixture or other is to use a few global variables. |
@datumbox @NicolasHug as discussed with @pmeier . I will split this PR into 2 Parts. The first part will cleanup the code. The second part will cleanup the tests 😃 This will make merging easier. |
Fixes #5976
Work in Progress. Opened PR to see if I keep passing tests :)