Brian Blaylock
October 10, 2018
As part of NOAA's Big Data Project, Amazon makes NEXRAD, GOES, HRRR, and other datasets publicly available via Amazon Web Services (AWS). You can use rclone to access and download this data. (You can even use rclone to access personal OneDrive, Google Drive, Box, and other types of cloud storage.)
Rclone can be installed on any platform. It can also be installed easily via conda:

```bash
conda install -c conda-forge rclone
```
After rclone has been downloaded and installed, configure a remote by typing `rclone config`. Then type `n` for a new remote.
Name the remote anything you like, but use a name that will remind you it accesses the public Amazon S3 buckets. I named mine `publicAWS`.
Set Type of Storage as "Amazon S3 Compliant Storage Providers" (option 4 for me).
Set Storage Provider as "Amazon Web Services S3" (option 1 for me).
Leave everything else blank (press Enter for the remaining prompts).
The prompt will ask if something like the following is correct:
```
[publicAWS]
type = s3
provider = AWS
env_auth =
access_key_id =
secret_access_key =
region =
endpoint =
location_constraint =
acl =
server_side_encryption =
storage_class =
```
If it looks right, accept with `y` and exit the setup. This configuration is saved in the `~/.config/rclone/rclone.conf` file.
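If you prefer to skip the interactive prompts, rclone can also create the same remote in a single command with `rclone config create` (a sketch; check `rclone config create --help` for the exact syntax in your version):

```bash
# Non-interactive equivalent of the interactive setup above:
# remote name "publicAWS", type s3, provider AWS, no credentials needed
rclone config create publicAWS s3 provider AWS env_auth false
```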
You will use the remote you just set up to access NOAA's public buckets on Amazon Web Services S3. Below are the names of some of NOAA's public buckets.
| Data | Bucket Name | Documentation |
|---|---|---|
| GOES16 | `noaa-goes16` | link |
| GOES17 | `noaa-goes17` | link |
| NEXRAD | `noaa-nexrad-level2` | link |
| HRRR | `noaa-hrrr-bdp-pds` | link |
Note: bdp-pds stands for Big Data Program Public Data Set
You access the bucket contents with the command `rclone [command and options] [remote name]:[bucket name]`. Documentation for all the commands and options can be found on the rclone website.
```bash
# List bucket directories
rclone lsd publicAWS:noaa-goes16/

# List bucket directories for specific folders
rclone lsd publicAWS:noaa-hrrr-bdp-pds/hrrr.20210101

# List files in bucket
rclone ls publicAWS:noaa-hrrr-bdp-pds/hrrr.20210101/conus

# Copy a file from the bucket to the current directory
rclone copy publicAWS:noaa-goes16/ABI-L2-MCMIPC/2018/283/00/OR_ABI-L2-MCMIPC-M3_G16_s20182830057203_e20182830059576_c20182830100076.nc ./
```
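rclone's filter flags are handy when you want several files at once. As a sketch, the command below copies only the analysis-hour (f00) surface files from one HRRR run; the `--include` pattern is my assumption about the HRRR file naming, and `--dry-run` previews what would be transferred before you commit to the download:

```bash
# Preview which f00 surface files would be copied from one HRRR run
rclone copy publicAWS:noaa-hrrr-bdp-pds/hrrr.20210101/conus ./hrrr \
    --include "*wrfsfcf00*" --dry-run
```

Drop `--dry-run` to actually perform the copy.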
The remote gateway for Pando, the archive at the University of Utah CHPC, is https://pando-rgw01.chpc.utah.edu. You could configure rclone to access HRRR on Pando with:
```
[horelS3]
type = s3
provider = Ceph
env_auth = false
endpoint = https://pando-rgw01.chpc.utah.edu
```
Then look at the bucket with `rclone lsd horelS3:hrrr/`.
I like to use `rclone` to list and access files from within my Python scripts. To run a basic rclone command within Python, you might use `os.system()`:

```python
import os

# Run an rclone copy command from Python (fill in the bucket path you need)
os.system('rclone copy publicAWS:noaa-goes16/.../.../etc. ./this_path_on_my_machine/')
```
If you want to capture the output, such as file names, you might consider using `subprocess`.

```python
import subprocess

# Get output from rclone command
files = subprocess.check_output('rclone lsd publicAWS:noaa-goes16', shell=True)

# Change type from 'bytes' to 'string'
files = files.decode()

# Split files on newlines and remove the empty item at the end
files = files.split('\n')
files.remove('')
```
Alternatively, you can use `subprocess.run()`:

```python
import subprocess

# Capture stdout, decode bytes to str, split into lines,
# then drop the empty item left after the trailing newline
a = subprocess.run(['echo', 'hi'], stdout=subprocess.PIPE).stdout.decode().split('\n')
a.remove('')
```
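The decode/split/strip pattern above repeats, so it can be wrapped in a small helper. A sketch (the function name `rclone_lines` is my own; it works with any command, rclone or otherwise):

```python
import subprocess

def rclone_lines(args):
    """Run a command (e.g. an rclone listing) and return stdout as a list of lines."""
    result = subprocess.run(args, stdout=subprocess.PIPE, check=True)
    # Decode bytes to str, split on newlines, and drop empty lines
    return [line for line in result.stdout.decode().split('\n') if line]

# A command available everywhere, for demonstration;
# for rclone you would call e.g. rclone_lines(['rclone', 'lsd', 'publicAWS:noaa-goes16'])
print(rclone_lines(['echo', 'hi']))  # → ['hi']
```

Passing the command as a list (rather than `shell=True`) avoids shell-quoting problems with file names, and `check=True` raises an error if rclone exits with a non-zero status.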