
Commit

Merge branch 'main' into port/erase
pmeier committed Aug 30, 2023
2 parents fe3e45a + a06df0d commit 31d6f2b
Showing 12 changed files with 297 additions and 281 deletions.
45 changes: 28 additions & 17 deletions docs/source/transforms.rst
@@ -33,20 +33,33 @@ tasks (image classification, detection, segmentation, video classification).
from torchvision import tv_tensors
img = torch.randint(0, 256, size=(3, H, W), dtype=torch.uint8)
bboxes = torch.randint(0, H // 2, size=(3, 4))
bboxes[:, 2:] += bboxes[:, :2]
bboxes = tv_tensors.BoundingBoxes(bboxes, format="XYXY", canvas_size=(H, W))
boxes = torch.randint(0, H // 2, size=(3, 4))
boxes[:, 2:] += boxes[:, :2]
boxes = tv_tensors.BoundingBoxes(boxes, format="XYXY", canvas_size=(H, W))
# The same transforms can be used!
img, bboxes = transforms(img, bboxes)
img, boxes = transforms(img, boxes)
# And you can pass arbitrary input structures
output_dict = transforms({"image": img, "bboxes": bboxes})
output_dict = transforms({"image": img, "boxes": boxes})
Transforms are typically passed as the ``transform`` or ``transforms`` argument
to the :ref:`Datasets <datasets>`.
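
For instance, a minimal sketch of handing a v2 pipeline to a dataset. The
dataset choice, the ``root`` path, and the exact transforms below are
illustrative assumptions (a recent torchvision with the v2 API is assumed), not
part of this commit:

.. code-block:: python

    import torch
    from torchvision import datasets
    from torchvision.transforms import v2

    # An example v2 pipeline: convert PIL -> tensor, augment, then scale to float.
    transforms = v2.Compose([
        v2.ToImage(),                           # PIL image -> tv_tensors.Image
        v2.RandomResizedCrop(size=(224, 224), antialias=True),
        v2.RandomHorizontalFlip(p=0.5),
        v2.ToDtype(torch.float32, scale=True),  # uint8 [0, 255] -> float32 [0, 1]
    ])

    # The pipeline is passed through the ``transform`` argument of the dataset.
    dataset = datasets.CIFAR10(root="data", download=True, transform=transforms)
    img, label = dataset[0]  # the image comes back already transformed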

.. TODO: Reader guide, i.e. what to read depending on what you're looking for
.. TODO: add link to getting started guide here.
Start here
----------

Whether you're new to Torchvision transforms, or you're already experienced with
them, we encourage you to start with
:ref:`sphx_glr_auto_examples_transforms_plot_transforms_getting_started.py` in
order to learn more about what can be done with the new v2 transforms.

Then, browse the sections below on this page for general information and
performance tips. The available transforms and functionals are listed in the
:ref:`API reference <v2_api_ref>`.

More information and tutorials can also be found in our :ref:`example gallery
<gallery>`, e.g. :ref:`sphx_glr_auto_examples_transforms_plot_transforms_e2e.py`
or :ref:`sphx_glr_auto_examples_transforms_plot_custom_transforms.py`.

.. _conventions:

@@ -98,25 +111,21 @@ advantages compared to the v1 ones (in ``torchvision.transforms``):

- They can transform images **but also** bounding boxes, masks, or videos. This
provides support for tasks beyond image classification: detection, segmentation,
video classification, etc.
video classification, etc. See
:ref:`sphx_glr_auto_examples_transforms_plot_transforms_getting_started.py`
and :ref:`sphx_glr_auto_examples_transforms_plot_transforms_e2e.py`.
- They support more transforms like :class:`~torchvision.transforms.v2.CutMix`
and :class:`~torchvision.transforms.v2.MixUp`.
and :class:`~torchvision.transforms.v2.MixUp` (a minimal usage sketch follows this list). See
:ref:`sphx_glr_auto_examples_transforms_plot_cutmix_mixup.py`.
- They're :ref:`faster <transforms_perf>`.
- They support arbitrary input structures (dicts, lists, tuples, etc.).
- Future improvements and features will be added to the v2 transforms only.
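
As a concrete illustration of the CutMix/MixUp bullet above, here is a minimal
sketch; the batch shapes and ``NUM_CLASSES`` are assumptions for the example.
These transforms operate on whole batches, typically right after the
``DataLoader``:

.. code-block:: python

    import torch
    from torchvision.transforms import v2

    NUM_CLASSES = 10  # assumed for this sketch

    # Randomly apply either CutMix or MixUp to each batch.
    cutmix_or_mixup = v2.RandomChoice([
        v2.CutMix(num_classes=NUM_CLASSES),
        v2.MixUp(num_classes=NUM_CLASSES),
    ])

    images = torch.rand(8, 3, 224, 224)            # dummy batch of float images
    labels = torch.randint(0, NUM_CLASSES, (8,))   # integer class labels
    images, labels = cutmix_or_mixup(images, labels)
    print(labels.shape)  # labels are now soft: torch.Size([8, 10])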

.. TODO: Add link to e2e example for first bullet point.
These transforms are **fully backward compatible** with the v1 ones, so if
you're already using transforms from ``torchvision.transforms``, all you need to
do is update the import to ``torchvision.transforms.v2``. In terms of
output, there might be negligible differences due to implementation differences.
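
A minimal sketch of that migration (the transform chosen below is arbitrary):

.. code-block:: python

    # Before (v1):
    # from torchvision import transforms
    # After (v2): only the import changes, call sites stay the same.
    from torchvision.transforms import v2 as transforms

    flip = transforms.RandomHorizontalFlip(p=0.5)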

To learn more about the v2 transforms, check out
:ref:`sphx_glr_auto_examples_transforms_plot_transforms_getting_started.py`.

.. TODO: make sure link is still good!!
.. note::

The v2 transforms are still BETA, but at this point we do not expect
@@ -184,7 +193,7 @@ This is very much like the :mod:`torch.nn` package which defines both classes
and functional equivalents in :mod:`torch.nn.functional`.

The functionals support PIL images, pure tensors, or :ref:`TVTensors
<tv_tensors>`, e.g. both ``resize(image_tensor)`` and ``resize(bboxes)`` are
<tv_tensors>`, e.g. both ``resize(image_tensor)`` and ``resize(boxes)`` are
valid.
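
For example, a minimal sketch of the functional ``resize`` dispatching on both
an image tensor and bounding boxes; the sizes and coordinates below are made up
for illustration:

.. code-block:: python

    import torch
    from torchvision import tv_tensors
    from torchvision.transforms.v2 import functional as F

    image = torch.randint(0, 256, size=(3, 64, 76), dtype=torch.uint8)
    boxes = tv_tensors.BoundingBoxes(
        [[10.0, 15.0, 25.0, 35.0]], format="XYXY", canvas_size=(64, 76)
    )

    out_image = F.resize(image, size=(32, 38), antialias=True)
    out_boxes = F.resize(boxes, size=(32, 38))  # coordinates are rescaled to match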

.. note::
@@ -248,6 +257,8 @@ be derived from ``torch.nn.Module``.

See also: :ref:`sphx_glr_auto_examples_others_plot_scripted_tensor_transforms.py`.

.. _v2_api_ref:

V2 API reference - Recommended
------------------------------

10 changes: 7 additions & 3 deletions docs/source/tv_tensors.rst
@@ -7,9 +7,13 @@ TVTensors

TVTensors are :class:`torch.Tensor` subclasses which the v2 :ref:`transforms
<transforms>` use under the hood to dispatch their inputs to the appropriate
lower-level kernels. Most users do not need to manipulate TVTensors directly and
can simply rely on dataset wrapping - see e.g.
:ref:`sphx_glr_auto_examples_transforms_plot_transforms_e2e.py`.
lower-level kernels. Most users do not need to manipulate TVTensors directly.

Refer to
:ref:`sphx_glr_auto_examples_transforms_plot_transforms_getting_started.py` for
an introduction to TVTensors, or
:ref:`sphx_glr_auto_examples_transforms_plot_tv_tensors.py` for more advanced
info.
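
For a quick feel of what a TVTensor is, here is a minimal sketch (shapes and
values are arbitrary); each object below behaves like a regular tensor but
carries the metadata that the v2 transforms need:

.. code-block:: python

    import torch
    from torchvision import tv_tensors

    image = tv_tensors.Image(torch.randint(0, 256, (3, 32, 32), dtype=torch.uint8))
    boxes = tv_tensors.BoundingBoxes(
        [[0, 0, 10, 10]], format="XYXY", canvas_size=(32, 32)
    )
    mask = tv_tensors.Mask(torch.zeros(32, 32, dtype=torch.uint8))

    print(isinstance(image, torch.Tensor))  # True: TVTensors are Tensor subclasses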

.. autosummary::
:toctree: generated/
2 changes: 2 additions & 0 deletions gallery/README.rst
@@ -1,2 +1,4 @@
.. _gallery:

Examples and tutorials
======================
13 changes: 13 additions & 0 deletions gallery/transforms/plot_transforms_e2e.py
@@ -166,3 +166,16 @@
print(f"{[type(target) for target in targets] = }")
for name, loss_val in loss_dict.items():
print(f"{name:<20}{loss_val:.3f}")

# %%
# Training References
# -------------------
#
# From there, you can check out the `torchvision references
# <https://github.com/pytorch/vision/tree/main/references>`_ where you'll find
# the actual training scripts we use to train our models.
#
# **Disclaimer** The code in our references is more complex than what you'll
# need for your own use-cases: this is because we're supporting different
# backends (PIL, tensors, TVTensors) and different transforms namespaces (v1 and
# v2). So don't be afraid to simplify and only keep what you need.
2 changes: 2 additions & 0 deletions gallery/transforms/plot_transforms_getting_started.py
@@ -217,6 +217,8 @@
# can still be transformed by some transforms like
# :class:`~torchvision.transforms.v2.SanitizeBoundingBoxes`!).
#
# .. _transforms_datasets_intercompatibility:
#
# Transforms and Datasets intercompatibility
# ------------------------------------------
#
62 changes: 0 additions & 62 deletions test/test_transforms_v2.py
@@ -449,68 +449,6 @@ def test__get_params(self, fill, side_range):
assert 0 <= params["padding"][3] <= (side_range[1] - 1) * h


class TestRandomCrop:
def test_assertions(self):
with pytest.raises(ValueError, match="Please provide only two dimensions"):
transforms.RandomCrop([10, 12, 14])

with pytest.raises(TypeError, match="Got inappropriate padding arg"):
transforms.RandomCrop([10, 12], padding="abc")

with pytest.raises(ValueError, match="Padding must be an int or a 1, 2, or 4"):
transforms.RandomCrop([10, 12], padding=[-0.7, 0, 0.7])

with pytest.raises(TypeError, match="Got inappropriate fill arg"):
transforms.RandomCrop([10, 12], padding=1, fill="abc")

with pytest.raises(ValueError, match="Padding mode should be either"):
transforms.RandomCrop([10, 12], padding=1, padding_mode="abc")

@pytest.mark.parametrize("padding", [None, 1, [2, 3], [1, 2, 3, 4]])
@pytest.mark.parametrize("size, pad_if_needed", [((10, 10), False), ((50, 25), True)])
def test__get_params(self, padding, pad_if_needed, size):
h, w = size = (24, 32)
image = make_image(size)

transform = transforms.RandomCrop(size, padding=padding, pad_if_needed=pad_if_needed)
params = transform._get_params([image])

if padding is not None:
if isinstance(padding, int):
pad_top = pad_bottom = pad_left = pad_right = padding
elif isinstance(padding, list) and len(padding) == 2:
pad_left = pad_right = padding[0]
pad_top = pad_bottom = padding[1]
elif isinstance(padding, list) and len(padding) == 4:
pad_left, pad_top, pad_right, pad_bottom = padding

h += pad_top + pad_bottom
w += pad_left + pad_right
else:
pad_left = pad_right = pad_top = pad_bottom = 0

if pad_if_needed:
if w < size[1]:
diff = size[1] - w
pad_left += diff
pad_right += diff
w += 2 * diff
if h < size[0]:
diff = size[0] - h
pad_top += diff
pad_bottom += diff
h += 2 * diff

padding = [pad_left, pad_top, pad_right, pad_bottom]

assert 0 <= params["top"] <= h - size[0] + 1
assert 0 <= params["left"] <= w - size[1] + 1
assert params["height"] == size[0]
assert params["width"] == size[1]
assert params["needs_pad"] is any(padding)
assert params["padding"] == padding


class TestGaussianBlur:
def test_assertions(self):
with pytest.raises(ValueError, match="Kernel size should be a tuple/list of two integers"):
21 changes: 0 additions & 21 deletions test/test_transforms_v2_consistency.py
@@ -304,26 +304,6 @@ def __init__(
],
closeness_kwargs={"rtol": 1e-5, "atol": 1e-5},
),
ConsistencyConfig(
v2_transforms.RandomCrop,
legacy_transforms.RandomCrop,
[
ArgsKwargs(12),
ArgsKwargs((15, 17)),
NotScriptableArgsKwargs(11, padding=1),
ArgsKwargs(11, padding=[1]),
ArgsKwargs((8, 13), padding=(2, 3)),
ArgsKwargs((14, 9), padding=(0, 2, 1, 0)),
ArgsKwargs(36, pad_if_needed=True),
ArgsKwargs((7, 8), fill=1),
NotScriptableArgsKwargs(5, fill=(1, 2, 3)),
ArgsKwargs(12),
NotScriptableArgsKwargs(15, padding=2, padding_mode="edge"),
ArgsKwargs(17, padding=(1, 0), padding_mode="reflect"),
ArgsKwargs(8, padding=(3, 0, 0, 1), padding_mode="symmetric"),
],
make_images_kwargs=dict(DEFAULT_MAKE_IMAGES_KWARGS, sizes=[(26, 26), (18, 33), (29, 22)]),
),
ConsistencyConfig(
v2_transforms.RandomPerspective,
legacy_transforms.RandomPerspective,
@@ -558,7 +538,6 @@ def test_call_consistency(config, args_kwargs):
(v2_transforms.RandomResizedCrop, ArgsKwargs(make_image(), scale=[0.3, 0.7], ratio=[0.5, 1.5])),
(v2_transforms.ColorJitter, ArgsKwargs(brightness=None, contrast=None, saturation=None, hue=None)),
(v2_transforms.GaussianBlur, ArgsKwargs(0.3, 1.4)),
(v2_transforms.RandomCrop, ArgsKwargs(make_image(size=(61, 47)), output_size=(19, 25))),
(v2_transforms.RandomPerspective, ArgsKwargs(23, 17, 0.5)),
(v2_transforms.AutoAugment, ArgsKwargs(5)),
]
57 changes: 0 additions & 57 deletions test/test_transforms_v2_functional.py
@@ -576,63 +576,6 @@ def _compute_affine_matrix(angle_, translate_, scale_, shear_, center_):
return true_matrix


@pytest.mark.parametrize("device", cpu_and_cuda())
@pytest.mark.parametrize(
"format",
[tv_tensors.BoundingBoxFormat.XYXY, tv_tensors.BoundingBoxFormat.XYWH, tv_tensors.BoundingBoxFormat.CXCYWH],
)
@pytest.mark.parametrize(
"top, left, height, width, expected_bboxes",
[
[8, 12, 30, 40, [(-2.0, 7.0, 13.0, 27.0), (38.0, -3.0, 58.0, 14.0), (33.0, 38.0, 44.0, 54.0)]],
[-8, 12, 70, 40, [(-2.0, 23.0, 13.0, 43.0), (38.0, 13.0, 58.0, 30.0), (33.0, 54.0, 44.0, 70.0)]],
],
)
def test_correctness_crop_bounding_boxes(device, format, top, left, height, width, expected_bboxes):

# Expected bboxes computed using Albumentations:
# import numpy as np
# from albumentations.augmentations.crops.functional import crop_bbox_by_coords, normalize_bbox, denormalize_bbox
# expected_bboxes = []
# for in_box in in_boxes:
# n_in_box = normalize_bbox(in_box, *size)
# n_out_box = crop_bbox_by_coords(
# n_in_box, (left, top, left + width, top + height), height, width, *size
# )
# out_box = denormalize_bbox(n_out_box, height, width)
# expected_bboxes.append(out_box)

format = tv_tensors.BoundingBoxFormat.XYXY
canvas_size = (64, 76)
in_boxes = [
[10.0, 15.0, 25.0, 35.0],
[50.0, 5.0, 70.0, 22.0],
[45.0, 46.0, 56.0, 62.0],
]
in_boxes = torch.tensor(in_boxes, device=device)
if format != tv_tensors.BoundingBoxFormat.XYXY:
in_boxes = convert_bounding_box_format(in_boxes, tv_tensors.BoundingBoxFormat.XYXY, format)

expected_bboxes = clamp_bounding_boxes(
tv_tensors.BoundingBoxes(expected_bboxes, format="XYXY", canvas_size=canvas_size)
).tolist()

output_boxes, output_canvas_size = F.crop_bounding_boxes(
in_boxes,
format,
top,
left,
canvas_size[0],
canvas_size[1],
)

if format != tv_tensors.BoundingBoxFormat.XYXY:
output_boxes = convert_bounding_box_format(output_boxes, format, tv_tensors.BoundingBoxFormat.XYXY)

torch.testing.assert_close(output_boxes.tolist(), expected_bboxes)
torch.testing.assert_close(output_canvas_size, canvas_size)


@pytest.mark.parametrize("device", cpu_and_cuda())
def test_correctness_vertical_flip_segmentation_mask_on_fixed_input(device):
mask = torch.zeros((3, 3, 3), dtype=torch.long, device=device)
