improve readme and bugfix notebooks
biplovbhandari committed Jun 12, 2024
1 parent 6d443c4 commit 9219f70
Showing 4 changed files with 882 additions and 738 deletions. README.md: 209 additions & 14 deletions.
For quickly running this, we have already prepared and exported the training datasets. You can download them as below.
*Note: If you're looking to produce your own datasets, you can follow this [notebook](https://colab.research.google.com/drive/1LCFLeSCu969wIW8TD68-j4hfIu7WiRIK?usp=sharing) which was used to produce these training, testing, and validation datasets provided here.*

```sh
mkdir DATADIR
gsutil -m cp -r gs://dl-book/chapter-1/dnn_planet_wo_indices/* DATADIR
```
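
You can quickly verify the download; the `DATADIR` should contain the training, testing, and validation sub-folders mentioned below:

```sh
ls DATADIR
```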

The parent folder (if you run `gsutil -m cp -r gs://dl-book/chapter-1/* DATADIR`) has several datasets inside it. We use `dnn_planet_wo_indices` here because it is lightweight and much faster to run. If you want to test the U-Net model, you can use the `unet_256x256_planet_wo_indices` folder instead. Each of these has training, testing, and validation sub-folders inside it.

The data directory is constructed as `DATADIR = BASEDIR + DATADIR`. You also need to set `OUTPUT_DIR` (str), the base directory where the outputs will be saved.
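
As a rough sketch of what these could look like in the `config.env` (the paths here are placeholders; point them at wherever you downloaded the data and want the outputs saved):

```
BASEDIR = "/content/"
DATADIR = "DATADIR/"
OUTPUT_DIR = "/content/output/"
```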

You can then change the `MODEL_TYPE` you are running. The default is "unet", so we need to change it to "dnn" since that is the model we will be training.

*Note: the current version does not expose all the model intricacies through the environment file, but future versions may include those depending on the need.*

```
MODEL_TYPE = "unet"
MODEL_TYPE = "dnn"
```

Next, define the `FEATURES` and the `LABELS` variables. This dataset was prepared for the rice mapping application that uses before and during growing-season information. So for this dataset, here are the example `FEATURES`. Note that the `FEATURES` should be in the same format as shown below:
```
FEATURES = "red_before
green_before
blue_before
nir_before
red_during
green_during
blue_during
nir_during"
```

Similarly, since the dataset has a single label, it is going to be as below. If you have multiple labels, be sure to format them in the same way as the `FEATURES` above.

```
LABELS = ["class"]
LABELS = "class"
```
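
For illustration only, a hypothetical multi-label setting (these label names are made up) would be formatted like the `FEATURES`:

```
LABELS = "rice
other_crops"
```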

While the DNN does not need this setting, if you want to run the U-Net model, the `PATCH_SHAPE` for the training datasets needs to be changed as well:

```
PATCH_SHAPE = (256, 256)
```

After all the settings have been set up correctly, here's an example of how to use the `Config` and `ModelTrainer` classes to train the model:

```python
from aces.config import Config
from aces.model_trainer import ModelTrainer

if __name__ == "__main__":
config_file = "config.env"
config = Config(config_file)
trainer = ModelTrainer(config)
trainer.train_model()
config_file = "config.env"
config = Config(config_file)
trainer = ModelTrainer(config)
trainer.train_model()
```

Once the model is finished running, it saves the trained model, evaluates the results on the testing dataset, produces the plots, and saves these along with any other needed items in the `MODEL_DIR`, which is constructed as `OUTPUT_DIR + MODEL_DIR_NAME`.
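
As a rough sketch (the exact contents depend on the run), with `MODEL_DIR_NAME = "dnn_v1"` as an assumed example, the output ends up organized like:

```
OUTPUT_DIR/
└── dnn_v1/                 # MODEL_DIR
    ├── trained-model/      # the saved model, loaded again below
    ├── prediction/         # prediction outputs are written here
    └── ...                 # evaluation results, plots, etc.
```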

For inference, you need to get the images in the right format. To export images the right way, refer to this [notebook](https://github.com/SERVIR/servir-aces/blob/main/notebook/export_image_for_prediction.ipynb); for this exercise, we already have the images prepared in the right way. You can download them in a similar manner to how you downloaded the training datasets.

```sh
mkdir IMAGEDIR
gsutil -m cp -r gs://dl-book/chapter-1/images/* IMAGEDIR
```

There are a few more settings that need to be changed before we run the predictions. One of them is the `OUTPUT_NAME`: this is the name used for the output prediction as a GEE asset, locally (in TF format), and as the GCS output (in TFRecord format). You also need a GCP project (`GCS_PROJECT`) to push the output from GCS to GEE, a GCS bucket (`GCS_BUCKET`) to store your prediction, and finally the `EE_OUTPUT_ASSET`, which is the output path for the prediction asset.

```
OUTPUT_NAME = "prediction_dnn_v1"
GCS_PROJECT = "your-gcs-project"
GCS_BUCKET = "your-bucket"
EE_OUTPUT_ASSET = "your-gee-output-asset-path"
```

You will have to load the configuration settings again for the new configuration to take effect.

```python
from aces.config import Config

config_file = "config.env"
config = Config(config_file, override=True)
```

We can then construct the actual path for the output file using the `OUTPUT_NAME`, and print it to verify:

```python
OUTPUT_IMAGE_FILE = str(config.MODEL_DIR / "prediction" / f"{config.OUTPUT_NAME}.TFRecord")
print(f"OUTPUT_IMAGE_FILE: {OUTPUT_IMAGE_FILE}")
```

Now let's get all the files inside the `IMAGEDIR`, and then separate out our actual image files from the JSON mixer file. The JSON mixer file inside the `IMAGEDIR` is generated when exporting images from GEE as TFRecords; it is a simple JSON file that defines the georeferencing of the patches.

```python
import glob

image_files_list = []
json_file = None

# Directory where the prediction images were downloaded (see the gsutil step above).
IMAGEDIR = "IMAGEDIR"

for f in glob.glob(f"{IMAGEDIR}/*"):
    if f.endswith(".tfrecord.gz"):
        image_files_list.append(f)
    elif f.endswith(".json"):
        json_file = f

# Make sure the files are in the right order.
image_files_list.sort()
```

Next, we will load the trained model and look at the model summary. The trained model is stored within the `trained-model` subdirectory of the `MODEL_DIR`.

```python
import tensorflow as tf

print(f"Loading model from {str(config.MODEL_DIR)}/trained-model")
this_model = tf.keras.models.load_model(f"{str(config.MODEL_DIR)}/trained-model")

print(this_model.summary())
```

Now let's get the relevant info from the JSON mixer file.

```python
import json

with open(json_file, encoding="utf-8") as jm:
    mixer = json.load(jm)

# Get relevant info from the JSON mixer file.
patch_width = mixer["patchDimensions"][0]
patch_height = mixer["patchDimensions"][1]
patches = mixer["totalPatches"]
patch_dimensions_flat = [patch_width * patch_height, 1]
```

Next, let's create a `tf.data.Dataset` from our images:

```python

def parse_image(example_proto):
    columns = [
        tf.io.FixedLenFeature(shape=patch_dimensions_flat, dtype=tf.float32) for _ in config.FEATURES
    ]
    image_features_dict = dict(zip(config.FEATURES, columns))
    return tf.io.parse_single_example(example_proto, image_features_dict)


# Create a dataset from the TFRecord file(s).
image_dataset = tf.data.TFRecordDataset(image_files_list, compression_type="GZIP")
image_dataset = image_dataset.map(parse_image, num_parallel_calls=5)

# Break our long tensors into many little ones.
image_dataset = image_dataset.flat_map(
    lambda features: tf.data.Dataset.from_tensor_slices(features)
)

# Turn the dictionary in each record into a tuple without a label.
image_dataset = image_dataset.map(
    lambda data_dict: (tf.transpose(list(data_dict.values())),)
)

image_dataset = image_dataset.batch(patch_width * patch_height)
```
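
As a quick optional check before predicting, you can inspect the dataset's element spec; each batch should hold one patch worth of pixels:

```python
# The batch dimension shows as None in the spec; at runtime each batch holds
# patch_width * patch_height pixels, each of shape (1, len(config.FEATURES)).
print(image_dataset.element_spec)
```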

Finally, let's perform the prediction.

```python
predictions = this_model.predict(image_dataset, steps=patches, verbose=1)
print(f"predictions shape: {predictions.shape}")
```
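
A small sanity check (assuming, as here, one prediction vector per pixel): the total number of predictions should equal the number of patches times the pixels per patch.

```python
expected = patches * patch_width * patch_height
assert predictions.shape[0] == expected, (
    f"expected {expected} predictions, got {predictions.shape[0]}"
)
```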

Now let's write these predictions to a file.

```python

from pathlib import Path
import numpy as np

# Create the target directory if it doesn't exist
Path(OUTPUT_IMAGE_FILE).parent.mkdir(parents=True, exist_ok=True)

print(f"Writing predictions to {OUTPUT_IMAGE_FILE} ...")
writer = tf.io.TFRecordWriter(OUTPUT_IMAGE_FILE)

# Every patch-worth of predictions we'll dump an example into the output
# file with a single feature that holds our predictions. Since our predictions
# are already in the order of the exported data, the patches we create here
# will also be in the right order.
patch = [[], [], [], [], [], []]

cur_patch = 1

for i, prediction in enumerate(predictions):
    patch[0].append(int(np.argmax(prediction)))
    patch[1].append(prediction[0][0])
    patch[2].append(prediction[0][1])
    patch[3].append(prediction[0][2])
    patch[4].append(prediction[0][3])
    patch[5].append(prediction[0][4])

    if i == 0:
        print(f"prediction.shape: {prediction.shape}")

    if len(patch[0]) == patch_width * patch_height:
        if cur_patch % 100 == 0:
            print(f"Done with patch {cur_patch} of {patches}...")

        example = tf.train.Example(
            features=tf.train.Features(
                feature={
                    "prediction": tf.train.Feature(
                        int64_list=tf.train.Int64List(value=patch[0])),
                    "cropland_etc": tf.train.Feature(
                        float_list=tf.train.FloatList(value=patch[1])),
                    "rice": tf.train.Feature(
                        float_list=tf.train.FloatList(value=patch[2])),
                    "forest": tf.train.Feature(
                        float_list=tf.train.FloatList(value=patch[3])),
                    "urban": tf.train.Feature(
                        float_list=tf.train.FloatList(value=patch[4])),
                    "others_etc": tf.train.Feature(
                        float_list=tf.train.FloatList(value=patch[5])),
                }
            )
        )

        # Write the example to the file and clear our patch array so it's ready
        # for another batch of class ids.
        writer.write(example.SerializeToString())
        patch = [[], [], [], [], [], []]
        cur_patch += 1

writer.close()
```

Now we have written the predictions to the `OUTPUT_IMAGE_FILE`. You can upload this to GEE for visualization; to do so, you first upload it to GCS and then to GEE.

You can upload to GCS using `gsutil`. The `OUTPUT_GCS_PATH` can be any path inside the `GCS_BUCKET` (e.g. `OUTPUT_GCS_PATH = f"gs://{config.GCS_BUCKET}/{config.OUTPUT_NAME}.TFRecord"`):

```sh
gsutil cp "{OUTPUT_IMAGE_FILE}" "{OUTPUT_GCS_PATH}"
```
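
If you are working from a notebook, the same copy can be done from Python; a minimal sketch using `subprocess` (just one of several ways to shell out):

```python
import subprocess

OUTPUT_GCS_PATH = f"gs://{config.GCS_BUCKET}/{config.OUTPUT_NAME}.TFRecord"
subprocess.run(["gsutil", "cp", OUTPUT_IMAGE_FILE, OUTPUT_GCS_PATH], check=True)
```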

Once the file is available on GCS, you can then upload it to Earth Engine using the `earthengine` command line:

```sh
earthengine upload image --asset_id={config.EE_OUTPUT_ASSET}/{config.OUTPUT_NAME} --pyramiding_policy=mode {OUTPUT_GCS_PATH} {json_file}
```
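
The upload runs as an ingestion task, so it may take a moment to appear; you can check its status from the same command line (or in the GEE Tasks tab):

```sh
earthengine task list
```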

***Note: The inference workflow is also available in this [notebook](https://github.com/SERVIR/servir-aces/blob/main/notebook/Rice_Mapping_Bhutan_2021.ipynb); scroll to `Inference using Saved U-Net Model` or `Inference using Saved DNN Model` depending on which model you're using.***

## Contributing
Contributions to ACES are welcome! If you find any issues or have suggestions for improvements, please open an issue or submit a pull request on the GitHub repository.