draft creating datasets for sentinelbench #2604

Draft: wants to merge 1 commit into main

Conversation

wangyi111
Contributor

This is the first in a series of draft PRs to add datasets from SentinelBench. This PR starts with the SenBench-Cloud-S3 dataset for cloud segmentation. It also introduces a wrapper that dynamically loads a SentinelBench dataset by name. The wrapper is only a rough idea, and I'm not sure whether it's a good one.
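
For illustration only, the load-by-name wrapper could be sketched as a small name-to-class registry. This is not the code in this PR: the registry, the `register` decorator, the `load_senbench` function, the `'cloud_s3'` key, and the constructor arguments are all hypothetical, and the dataset class is a stand-in with no real loading logic.

```python
from typing import Any, Callable

# Hypothetical registry mapping SentinelBench dataset names to dataset classes.
_REGISTRY: dict[str, type] = {}


def register(name: str) -> Callable[[type], type]:
    """Class decorator that records a dataset class under the given name."""

    def decorator(cls: type) -> type:
        _REGISTRY[name] = cls
        return cls

    return decorator


@register('cloud_s3')
class SenBenchCloudS3:
    """Stand-in for the cloud segmentation dataset (real loading omitted)."""

    def __init__(self, root: str = 'data', split: str = 'train') -> None:
        self.root = root
        self.split = split


def load_senbench(name: str, **kwargs: Any) -> Any:
    """Instantiate the dataset registered under `name`, forwarding kwargs."""
    try:
        cls = _REGISTRY[name]
    except KeyError:
        known = ', '.join(sorted(_REGISTRY))
        raise ValueError(f'Unknown dataset {name!r}; known: {known}') from None
    return cls(**kwargs)
```

With this shape, `load_senbench('cloud_s3', split='val')` returns a `SenBenchCloudS3` instance, and an unregistered name fails early with a message listing the known names.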

@github-actions github-actions bot added datasets Geospatial or benchmark datasets testing Continuous integration testing labels Feb 23, 2025
@wangyi111
Contributor Author

@adamjstewart

@adamjstewart
Collaborator

My first thoughts on seeing these:

  • Each dataset itself is a lot of code, so it makes sense to have each one in a separate file
  • We may want some kind of base class containing shared code (e.g., download, extract, len)
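
A minimal sketch of what such a shared base class might look like, using only the standard library. Everything here is an assumption for illustration: the class name `SenBenchBase`, the `url` and `filename` attributes, the zip archive format, and the `images/` directory are hypothetical and do not describe the actual SentinelBench data layout or TorchGeo's own download utilities.

```python
import os
import urllib.request
import zipfile
from abc import ABC, abstractmethod


class SenBenchBase(ABC):
    """Hypothetical base class holding logic shared by all SenBench datasets."""

    url = ''  # archive URL, set by each subclass (placeholder)
    filename = ''  # archive file name, set by each subclass (placeholder)

    def __init__(self, root: str = 'data', download: bool = False) -> None:
        self.root = root
        archive = os.path.join(root, self.filename)
        if not os.path.exists(archive):
            if not download:
                raise FileNotFoundError(f'{archive} not found; pass download=True')
            self._download(archive)
        self._extract(archive)
        self.files = self._list_files()

    def _download(self, archive: str) -> None:
        """Fetch the archive from `url` into the root directory."""
        os.makedirs(self.root, exist_ok=True)
        urllib.request.urlretrieve(self.url, archive)

    def _extract(self, archive: str) -> None:
        """Unpack the archive next to itself (assumes a zip file)."""
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(self.root)

    @abstractmethod
    def _list_files(self) -> list[str]:
        """Return the sample paths for this dataset."""

    def __len__(self) -> int:
        return len(self.files)


class SenBenchCloudS3(SenBenchBase):
    filename = 'cloud_s3.zip'  # assumed archive name

    def _list_files(self) -> list[str]:
        # Assumed layout: all samples live in an images/ directory.
        image_dir = os.path.join(self.root, 'images')
        return sorted(os.path.join(image_dir, f) for f in os.listdir(image_dir))
```

Each concrete dataset would then only define its archive metadata and how to enumerate its samples, while download, extraction, and `__len__` live in one place.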

To keep torchgeo/datasets/ from becoming too cluttered, I would propose a torchgeo/datasets/senbench/ directory. This could look like:

torchgeo/datasets/
    __init__.py: from .senbench import *
    senbench/
        __init__.py: from * import *, wrapper dataset
        base.py: shared base class if necessary
        cloud_s3.py: class SenBenchCloudS3
        lulc_s3.py: class SenBenchLC100SegS3
        ...

Here, we wouldn't actually use import *; we would import each individual class from each file.

This will be our first big meta dataset/benchmarking suite. Since it's the first, we have more flexibility, but also more decisions to make. Let's mull this over for a bit and I'll consult the other TorchGeo devs.

@adamjstewart
Collaborator

We could also organize the files by level or sensor. I need to look at the full table again to think about how to organize them.

@wangyi111
Contributor Author

This sounds like a good idea!
