This repository contains the code used in the following paper:
Khallaghi, S., Abedi, R., Abou Ali, H., Asipunu, M., Alatise, I., Ha, N., Luo, B., Mai, C., Song, L., Wussah, A., Xiong, S., Zhang, Q., & Estes, L. (2024). Generalization enhancement strategies to enable cross-year cropland mapping with convolutional neural networks trained using historical samples. arXiv.
The package currently supports these semantic segmentation models:
- U-Net
- DeepLab v3
- DeepLab v3+
- Pyramid Scene Parsing Network (PSPNet)
- ExFuse
- Global Convolutional Network (GCN)
To run the repo you need to:

- Modify `config/default_config.yaml` to suit your project's parameters.
- Follow the step-by-step guide in `notebooks/train_prediction_main.ipynb` to execute the workflow.
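For orientation, here is a minimal sketch of loading the configuration in Python, assuming PyYAML is installed; the actual parameter names live in `config/default_config.yaml`, so this only shows the loading step, not the package's API:

```python
import yaml

# Load the project configuration; edit config/default_config.yaml first
# so the parameters match your project.
with open("config/default_config.yaml") as f:
    config = yaml.safe_load(f)

# Inspect the available parameter names before running the notebook
# (assumes the file parses to a mapping).
print(sorted(config.keys()))
```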
Our data preparation protocol is designed to work with composite images whose labels are sorted into specific groups for training efficiency.

Under the current protocol, the project uses two image composites of 2022 × 2022 pixels, built separately from growing-season and off-season time series. Each composite covers one tile (2000 × 2000 pixels) plus a buffer of 11 pixels on each side (see the sketch after the list below). The labels, however, are on a 200 × 200 grid and are sorted into 4 groups:
- `label_group = 0`: labels that have not been reviewed
- `label_group = 2`: labels with both positive and negative categories, where the correctly classified positive category is between 65% and 80%
- `label_group = 3`: labels with both positive and negative categories, where the correctly classified positive category is over 80%
- `label_group = 4`: labels with only negative categories, but whose overall accuracy is 100%
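The composite size follows from the tile size and buffer: 2000 + 2 × 11 = 2022 pixels per side. Below is a small sketch of that arithmetic, together with one way to keep only the higher-quality label groups when assembling a training set; the CSV name and the choice of groups are illustrative assumptions, not something the repo prescribes:

```python
import pandas as pd

TILE_SIZE = 2000                          # tile width/height in pixels
BUFFER = 11                               # buffer added on each side
COMPOSITE_SIZE = TILE_SIZE + 2 * BUFFER   # = 2022, matching the composites

# Hypothetical filtering: keep only the higher-accuracy label groups (3 and 4).
# The catalog file name and the group choice are illustrative.
catalog = pd.read_csv("catalog_train_val.csv")
high_quality = catalog[catalog["label_group"].isin([3, 4])]
print(f"kept {len(high_quality)} of {len(catalog)} labels")
```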
The `deeplearner` package uses CSV files to load data, so two catalogs are required in addition to the raw images and labels: one for training and validation, and one for prediction:
- catalog for training and validation

  It contains at least 4 groups of columns:

  - columns for image directories, each either a path relative to the data folder or a full AWS S3 path starting with `s3://`
  - a column for label directories, likewise either a relative path or a full AWS S3 path starting with `s3://`
  - a column named `usage`, where the usage value is `train` or `validate`
  - a column named `label_group`
  Here's an example of the table format, where `dir_gs` and `dir_os` are directories to images and `dir_label` is the directory to labels:

  | name | usage | dir_gs | dir_os | dir_label | label_group |
  |------|-------|--------|--------|-----------|-------------|
  | GH0242195 | train | images/planet/nonfix/GS/tile539785_736815_736967.tif | images/planet/nonfix/OS/tile539785_737029_737118.tif | labels/semantic_segmentation/accurate/GH0242195_3241_5699.tif | 3 |
  | GH0288657 | validate | images/planet/nonfix/GS/tile539959_736815_736967.tif | images/planet/nonfix/OS/tile539959_737029_737118.tif | labels/semantic_segmentation/accurate/GH0288657_3385_5774.tif | 3 |
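  For illustration, here is a minimal pandas sketch that assembles such a catalog programmatically; the row values mirror the example above, and the output CSV name is a placeholder, not a file the repo requires:

  ```python
  import pandas as pd

  # Each row pairs a growing-season image, an off-season image, and a label;
  # paths may be relative to the data folder or full s3:// URIs.
  rows = [
      {
          "name": "GH0242195",
          "usage": "train",
          "dir_gs": "images/planet/nonfix/GS/tile539785_736815_736967.tif",
          "dir_os": "images/planet/nonfix/OS/tile539785_737029_737118.tif",
          "dir_label": "labels/semantic_segmentation/accurate/GH0242195_3241_5699.tif",
          "label_group": 3,
      },
  ]
  pd.DataFrame(rows).to_csv("catalog_train_val.csv", index=False)  # placeholder name
  ```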
- catalog for prediction

  It contains at least 3 groups of columns:

  - columns for image directories, each either a path relative to the data folder or a full AWS S3 path starting with `s3://`
  - two columns for naming the output, where the output is named `score_{col1}_{col2}.tif`
  - a column named `type`, specifying whether each row is a `center` image, whose prediction is written out, or a `neighbor` image
  Here is an example of the table format, using `dir_gs` and `dir_os` as directories to images, and `tile_col` and `tile_row` as the naming columns to keep the naming system consistent with `learner`:

  | tile_col | tile_row | dir_gs | dir_os | type |
  |----------|----------|--------|--------|------|
  | 320 | 560 | images/planet/fix/GS/tile539601_736815_736967.tif | images/planet/fix/OS/tile539601_737029_737118.tif | center |
  | 321 | 560 | images/planet/fix/GS/tile539602_736815_736967.tif | images/planet/fix/OS/tile539602_737029_737118.tif | neighbor |
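  To make the output-naming rule concrete, here is a hedged sketch that reads such a prediction catalog and derives the score file name for each `center` row; the catalog file name is a placeholder:

  ```python
  import pandas as pd

  catalog = pd.read_csv("catalog_predict.csv")  # placeholder file name

  # Only rows marked `center` produce an output file; its name is built
  # from the two naming columns as score_{col1}_{col2}.tif.
  for _, row in catalog[catalog["type"] == "center"].iterrows():
      print(f"score_{row['tile_col']}_{row['tile_row']}.tif")  # e.g. score_320_560.tif
  ```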
The training dataset, along with the CSV catalogs needed to replicate the results in our paper, is available at [dataset access link].