draft creating datasets for sentinelbench #2604

wangyi111 · 2025-02-23T20:30:44Z

This is a draft for a series of PRs that add datasets from SentinelBench. This specific PR starts with the SenBench-Cloud-S3 dataset for cloud segmentation. It also introduces a wrapper that dynamically loads a SentinelBench dataset by its name. The wrapper is just a rough idea; I'm not sure yet whether it's a good one.
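To make the wrapper idea concrete, here is a minimal sketch of name-based loading using a small registry. Everything in it is an assumption for illustration: `register`, `load_senbench`, the `_REGISTRY` dict, and the constructor arguments are hypothetical names, and the class is a placeholder rather than the code in this PR.

```python
from typing import Any, Callable

# Hypothetical registry mapping dataset names to dataset classes.
_REGISTRY: dict[str, type] = {}


def register(name: str) -> Callable[[type], type]:
    """Class decorator that records a dataset class under a name."""

    def decorator(cls: type) -> type:
        _REGISTRY[name] = cls
        return cls

    return decorator


@register("SenBench-Cloud-S3")
class SenBenchCloudS3:
    """Placeholder standing in for the real dataset class."""

    def __init__(self, root: str = "data", split: str = "train") -> None:
        self.root = root
        self.split = split


def load_senbench(name: str, **kwargs: Any) -> Any:
    """Instantiate the dataset registered under ``name``."""
    if name not in _REGISTRY:
        known = ", ".join(sorted(_REGISTRY))
        raise ValueError(f"Unknown dataset {name!r}; available: {known}")
    return _REGISTRY[name](**kwargs)


# Usage: the caller only needs the dataset name.
ds = load_senbench("SenBench-Cloud-S3", root="data/senbench", split="val")
```

A registry keeps the wrapper itself free of per-dataset `if`/`elif` branches, so adding a dataset only requires decorating its class. Whether that is preferable to importing the classes directly is exactly the open question here.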

wangyi111 · 2025-02-23T20:33:39Z

@adamjstewart

adamjstewart · 2025-02-23T23:15:59Z

My first thoughts on seeing these:

Each dataset itself is a lot of code, so it makes sense to have each one in a separate file
We may want some kind of base class containing shared code (e.g., download, extract, len)
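A rough sketch of what such a base class could hold, assuming each dataset lives in its own subdirectory under `root`. The names `SenBenchBase`, `directory`, `_verify`, `_download`, and `_load_files` are hypothetical, and the download step is left as a stub because the hosting details are not settled in this thread. In TorchGeo the base would presumably subclass an existing dataset class rather than `ABC` alone.

```python
import os
from abc import ABC, abstractmethod


class SenBenchBase(ABC):
    """Hypothetical base class holding code shared by all SenBench datasets."""

    # Subdirectory of ``root`` holding the extracted data; set by subclasses.
    directory: str = ""

    def __init__(self, root: str = "data", download: bool = False) -> None:
        self.root = root
        self.download = download
        self._verify()
        self.files = self._load_files()

    def _verify(self) -> None:
        """Check that the data exists, downloading it if requested."""
        if os.path.isdir(os.path.join(self.root, self.directory)):
            return
        if not self.download:
            raise FileNotFoundError(
                f"Dataset not found in {self.root!r}; pass download=True"
            )
        self._download()

    def _download(self) -> None:
        """Stub: a real implementation would fetch and extract the archive."""
        raise NotImplementedError("download is not implemented in this sketch")

    @abstractmethod
    def _load_files(self) -> list[str]:
        """Return the sample files for this dataset."""

    def __len__(self) -> int:
        return len(self.files)


class SenBenchCloudS3(SenBenchBase):
    """Placeholder subclass: only dataset-specific details live here."""

    directory = "cloud_s3"

    def _load_files(self) -> list[str]:
        path = os.path.join(self.root, self.directory)
        return sorted(os.path.join(path, name) for name in os.listdir(path))
```

With this split, each per-dataset file only defines its directory name and how to list and read its samples, while verification, download, and `__len__` are written once.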

So as to avoid torchgeo/datasets/ becoming too cluttered, I would propose a torchgeo/datasets/senbench/ directory. This could look like:

torchgeo/datasets/
    __init__.py: from senbench import *
    senbench/
        __init__.py: from * import *, wrapper dataset
        base.py: shared base class if necessary
        cloud_s3.py: class SenBenchCloudS3
        lulc_s3.py: class SenBenchLC100SegS3
        ...

Here, we wouldn't actually use import *; we would import each individual class from each file.

This will be our first big meta dataset/benchmarking suite. Since it's the first, we have more flexibility, but also more decisions to make. Let's mull this over for a bit and I'll consult the other TorchGeo devs.

adamjstewart · 2025-02-24T08:34:45Z

Could also organize subfiles by level or sensor. Need to look at the full table again to think about how to organize.

wangyi111 · 2025-02-24T10:49:56Z

This sounds like a good idea!

draft creating datasets for sentinelbench

2ab82d6

github-actions bot added the datasets (Geospatial or benchmark datasets) and testing (Continuous integration testing) labels on Feb 23, 2025
