-
Notifications
You must be signed in to change notification settings - Fork 7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor download tests #7546
refactor download tests #7546
Conversation
@@ -84,47 +78,45 @@ def inner_wrapper(request, *args, **kwargs): | |||
|
|||
@contextlib.contextmanager | |||
def log_download_attempts( | |||
urls_and_md5s=None, | |||
file="utils", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See 1. in top comment.
@@ -84,47 +78,45 @@ def inner_wrapper(request, *args, **kwargs): | |||
|
|||
@contextlib.contextmanager | |||
def log_download_attempts( | |||
urls_and_md5s=None, | |||
file="utils", | |||
patch=True, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See 2. in top comment.
urls_and_md5s=None, | ||
file="utils", | ||
patch=True, | ||
mock_auxiliaries=None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See 3. in top comment.
download_url_mocks = [] | ||
download_file_from_google_drive_mocks = [] | ||
for module in [dataset_module, "utils"]: | ||
maybe_add_mock(module=module, name="download_url", stack=stack, lst=download_url_mocks) | ||
maybe_add_mock( | ||
module=module, | ||
name="download_file_from_google_drive", | ||
stack=stack, | ||
lst=download_file_from_google_drive_mocks, | ||
) | ||
maybe_add_mock(module=module, name="extract_archive", stack=stack) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Resolution for 1. in top comment
@pytest.mark.parametrize(**make_parametrize_kwargs(itertools.chain())) | ||
def test_file_downloads_correctly(url, md5): | ||
retry(lambda: assert_file_downloads_correctly(url, md5)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was never run due to the empty parametrization
lambda: datasets.Places365(ROOT, split=split, small=small, download=True), | ||
name=f"Places365, {split}, {'small' if small else 'large'}", | ||
file="places365", | ||
return itertools.chain.from_iterable( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a style nit: Instead of itertools.chain(*[])
we now use itertools.chain.from_iterable([])
which is made for this very specific use case.
return itertools.chain( | ||
*[ | ||
collect_download_configs( | ||
lambda: datasets.Places365(ROOT, split=split, small=small, download=True), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We ever only used the callable to parametrize the dataset class call. Thus, we can simplify this by accepting the class as well as the args and kwargs. This also allows us to automatically detect the dataset module, since this is available through the class. Note that we can't use functools.partial
here for this reason.
name=f"Places365, {split}, {'small' if small else 'large'}", | ||
file="places365", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both of them can now be auto-detected
kitti(), | ||
places365(), | ||
) | ||
def stanford_cars(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Driveby for #7545 (comment). Issue was not detected since we didn't test for it before. I'm wondering if we should add a test that checks if all datasets that have a download
parameter are present here. We could also just check this manually now and enforce it through the review if a new dataset is added. However, this is what we were supposed to do before and it seems we didn't do it.
@pytest.mark.parametrize( | ||
**make_parametrize_kwargs( | ||
itertools.chain( | ||
sbu(), # https://github.com/pytorch/vision/issues/7005 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Due to "mixed download usage" (see 1. in top comment), we never detected that this download works again through this test. The issue is long closed, but the test never failed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @pmeier , this is mostly a stamp, I only looked at your comments. Didn't check the code in details.
Ci is toasted in general, but the download tests are green: https://github.com/pytorch/vision/actions/runs/4860402390/jobs/8664179225 |
Hey @pmeier! You merged this PR, but no labels were added. The list of valid labels is available at https://github.com/pytorch/vision/blob/main/.github/process_commit.py |
Reviewed By: vmoens Differential Revision: D45522828 fbshipit-source-id: 8aaef29d49b656035c8c859036c252499a6145c4
Our current download tests for the datasets are overcomplicated in some parts:
torchvision.datasets.utils
or in any other module (usually the module in which the dataset is defined). This is problematic, because some datasets mixdownload_url
anddownload_and_extract_archive
. Meaning, we have to patch download functionality in multiple places.extract_archive
. This is never used.This PR does the following to resolve the points above:
Since the diff is pretty large, I'll highlight the important parts inline.