Check WDPA url also a month forward #811

Merged: 7 commits merged on Dec 29, 2023
doc/release_notes.rst (2 changes: 2 additions, 0 deletions)

```diff
@@ -34,6 +34,8 @@ Upcoming Release
 
 * Rule ``retrieve_irena`` gets updated values for renewable capacities.
 
+* Rule ``retrieve_wdpa`` updated to not only check for current and previous, but also potentially next month's dataset availability.
+
 * Split configuration to enable SMR and SMR CC.
 
 * The configuration setting for country focus weights when clustering the
```
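The release-note entry describes a fixed priority order: try the current month's file first, then the previous month's, then the next month's. A minimal sketch of that search in plain Python, assuming a hypothetical `first_available_url` helper and an injected `exists` predicate so the order can be exercised without network access (in the rule file, `check_file_exists` plays that role by issuing an HTTP HEAD request):

```python
from typing import Callable, Iterable, Optional

# Same URL pattern as in the rule file; {bYYYY} takes a label such as "Jan2024".
URL_PATTERN = "https://d1gam3xoknrgr2.cloudfront.net/current/WDPA_{bYYYY}_Public.zip"


def first_available_url(
    candidates: Iterable[str],
    exists: Callable[[str], bool],
    pattern: str = URL_PATTERN,
) -> Optional[str]:
    """Hypothetical helper: return the first URL that exists, else None.

    Candidates are tried in the order given, so passing
    [current, previous, next] reproduces the priority in the release note.
    """
    for bYYYY in candidates:
        url = pattern.format(bYYYY=bYYYY)
        if exists(url):
            return url
    return None


# Usage: pretend only the previous month's file has been published.
published = {URL_PATTERN.format(bYYYY="Nov2023")}
url = first_available_url(["Dec2023", "Nov2023", "Jan2024"], published.__contains__)
print(url)
# → https://d1gam3xoknrgr2.cloudfront.net/current/WDPA_Nov2023_Public.zip
```

Returning `None` instead of rebinding `url = False` inside the loop keeps the failure case in one place; the caller can still `assert` on the result, as the rule file does.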
rules/retrieve.smk (34 changes: 23 additions, 11 deletions)

```diff
@@ -246,30 +246,42 @@ if config["enable"]["retrieve"]:
 
 
 if config["enable"]["retrieve"]:
-    current_month = datetime.now().strftime("%b")
-    current_year = datetime.now().strftime("%Y")
-    bYYYY = f"{current_month}{current_year}"
+    # Some logic to find the correct file URL
+    # Sometimes files are released delayed or ahead of schedule, check which file is currently available
 
     def check_file_exists(url):
         response = requests.head(url)
         return response.status_code == 200
 
-    url = f"https://d1gam3xoknrgr2.cloudfront.net/current/WDPA_{bYYYY}_Public.zip"
+    # Basic pattern where WDPA files can be found
+    url_pattern = (
+        "https://d1gam3xoknrgr2.cloudfront.net/current/WDPA_{bYYYY}_Public.zip"
+    )
 
-    if not check_file_exists(url):
-        prev_month = (datetime.now() - timedelta(30)).strftime("%b")
-        bYYYY = f"{prev_month}{current_year}"
-        assert check_file_exists(
-            f"https://d1gam3xoknrgr2.cloudfront.net/current/WDPA_{bYYYY}_Public.zip"
-        ), "The file does not exist."
+    # 3-letter month + 4 digit year for current/previous/next month to test
+    current_monthyear = datetime.now().strftime("%b%Y")
+    prev_monthyear = (datetime.now() - timedelta(30)).strftime("%b%Y")
+    next_monthyear = (datetime.now() + timedelta(30)).strftime("%b%Y")
+
+    # Test prioritised: current month -> previous -> next
+    for bYYYY in [current_monthyear, prev_monthyear, next_monthyear]:
```
Review thread on the line `for bYYYY in [current_monthyear, prev_monthyear, next_monthyear]:`

Member: On the same note, if `config["enable"]["retrieve"]` is false, then `bYYYY` does not exist in `add_electricity.smk`. Maybe we can add the same code there?

Member: Good point!

Generally, the downloaded file should drop the month from its filename. Otherwise, a new month would trigger the whole workflow to re-execute and invalidate all existing (intermediate) results.

I will prepare a separate pull request.
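The reviewer's point is that the local download target should not change from month to month, even though the remote filename does. One way to express that, sketched with a hypothetical `stable_local_name` helper (the follow-up pull request's actual approach is not shown in this thread), is to strip the `_<Mon><YYYY>_` token when deriving the local filename:

```python
import re

# Matches the "_<Mon><YYYY>_" token in names such as "WDPA_Jan2024_Public.zip"
MONTH_TOKEN = re.compile(r"_[A-Z][a-z]{2}\d{4}_")


def stable_local_name(url: str) -> str:
    """Hypothetical helper: last path segment of `url` without the month-year token."""
    filename = url.rsplit("/", 1)[-1]
    return MONTH_TOKEN.sub("_", filename, count=1)


for bYYYY in ["Dec2023", "Jan2024"]:
    url = f"https://d1gam3xoknrgr2.cloudfront.net/current/WDPA_{bYYYY}_Public.zip"
    print(stable_local_name(url))  # → WDPA_Public.zip for both months
```

Because the derived name is identical for both months, a workflow keyed on that name would see the same output file after the month changes.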

```diff
+        if check_file_exists(url := url_pattern.format(bYYYY=bYYYY)):
+            break
+        else:
+            # If None of the three URLs are working
+            url = False
+
+    assert (
+        url
+    ), f"No WDPA files found at {url_pattern} for bY='{current_monthyear}, {prev_monthyear}, or {next_monthyear}'"
 
     # Downloading protected area database from WDPA
     # extract the main zip and then merge the contained 3 zipped shapefiles
     # Website: https://www.protectedplanet.net/en/thematic-areas/wdpa
     rule download_wdpa:
         input:
             HTTP.remote(
-                f"d1gam3xoknrgr2.cloudfront.net/current/WDPA_{bYYYY}_Public_shp.zip",
+                url,
                 static=True,
                 keep_local=True,
             ),
```
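The month-year labels in this change come from shifting `datetime.now()` by `timedelta(30)` and formatting the shifted date with `%b%Y`. Formatting the whole shifted date fixes the pre-PR pairing of the previous month's name with the current year, which in January produced `Dec` followed by the new year. A fixed 30-day offset can still drift, though: on the 31st of a month, 30 days back is still the same month, and from January 30 or 31 in a non-leap year, 30 days forward lands in March. The sketch below compares the approaches on fixed dates. `month_shift_label` is a hypothetical calendar-exact alternative, not part of this PR, and the English month abbreviations assume Python's default C locale.

```python
from datetime import date, timedelta


def label(d: date) -> str:
    # 3-letter month + 4-digit year, e.g. "Jan2024" (English names under the C locale)
    return d.strftime("%b%Y")


def old_prev_label(d: date) -> str:
    # Pre-PR approach: previous month's name combined with the *current* year
    return (d - timedelta(30)).strftime("%b") + d.strftime("%Y")


def day_shift_label(d: date, days: int) -> str:
    # Approach in this PR: shift by a fixed number of days, then format
    return label(d + timedelta(days))


def month_shift_label(d: date, months: int) -> str:
    # Hypothetical calendar-exact alternative: shift the (year, month) pair directly
    year, month0 = divmod(d.year * 12 + d.month - 1 + months, 12)
    return label(date(year, month0 + 1, 1))


print(old_prev_label(date(2024, 1, 15)))         # → Dec2024 (wrong year)
print(day_shift_label(date(2024, 1, 15), -30))   # → Dec2023
print(day_shift_label(date(2023, 3, 31), -30))   # → Mar2023 (same as current month)
print(day_shift_label(date(2023, 1, 31), 30))    # → Mar2023 (February skipped)
print(month_shift_label(date(2023, 3, 31), -1))  # → Feb2023
print(month_shift_label(date(2023, 1, 31), 1))   # → Feb2023
```

In the fallback loop, the 30-day drift means that on some days one month is tested twice and another not at all; shifting the month index avoids that at the cost of a few more lines.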