Harmony integration: the story continues. Do Not Merge! #657

Draft pull request: 38 commits to merge into base branch `development`

Commits (38)
0aae498  Add migration notes document (mfisher87, Sep 17, 2024)
8977a44  Document open questions and another assumption (mfisher87, Sep 17, 2024)
7b18097  Add concept_id cached property to Query class (mfisher87, Sep 18, 2024)
4a4adfe  Remove redundant type info in docstring (mfisher87, Sep 18, 2024)
2aed2f0  Remove OBE docstring fragments (mfisher87, Sep 18, 2024)
f4d5c85  Remove 1.x deprecation warning (mfisher87, Sep 18, 2024)
93f5665  Add note about Harmony's ICESat-2 live date (mfisher87, Sep 25, 2024)
9378a8d  Answer open question about multiple Harmony APIs (mfisher87, Sep 25, 2024)
29c3952  Add a quick start to harmony migration (mfisher87, Sep 26, 2024)
95d1e7e  Mention other ongoing icepyx efforts (mfisher87, Sep 27, 2024)
a129884  Init harmony module with class for interacting with API (trey-stafford, Nov 14, 2024)
f039831  WIP init new module for v2 query capabilities (trey-stafford, Nov 14, 2024)
6d21bb0  Fixup `get_concept_id` UAT usage (trey-stafford, Nov 18, 2024)
4019ec3  WIP fixup harmony api module (trey-stafford, Nov 18, 2024)
7473d9d  Create new base class for query objects (trey-stafford, Nov 18, 2024)
07d9ca8  Init implementation of harmony support in v2 query class (trey-stafford, Nov 19, 2024)
948ca37  WIP support polygon subsetting with harmony (trey-stafford, Nov 21, 2024)
0b087ac  Update harmony migration notes for "take2" development (trey-stafford, Dec 5, 2024)
89a4f74  initial refactoring for harmony support (betolink, Jan 25, 2025)
0158168  continue refactoring (betolink, Jan 27, 2025)
ddab53b  [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Feb 4, 2025)
b150dcb  fixing things, deprecating python 3.9 (betolink, Feb 14, 2025)
ee3152a  Merge remote-tracking branch 'origin' into harmony-integration (betolink, Feb 14, 2025)
65d365a  fix merge conflicts (betolink, Feb 14, 2025)
4e93ab6  [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Feb 14, 2025)
904bb60  fix paths (betolink, Feb 14, 2025)
cce5f26  Merge remote-tracking branch 'origin/harmony-integration' into harmon… (betolink, Feb 14, 2025)
58f6e2c  fix types (betolink, Feb 14, 2025)
09ff7ea  [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Feb 14, 2025)
47b4f50  testing CI (betolink, Feb 21, 2025)
38c33a9  Merge pull request #652 from betolink/harmony-integration (betolink, Feb 21, 2025)
67e380d  merge development and resolve conflicts (betolink, Feb 21, 2025)
bbfc94b  [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Feb 21, 2025)
6bbd770  fix orders with no cycles (betolink, Feb 21, 2025)
007b786  Merge branch 'harmony-take2' of github.com:icesat2py/icepyx into harm… (betolink, Feb 21, 2025)
7f11ee8  [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Feb 21, 2025)
f03f5e3  Added "mission" to intro notes (tande-tw, Feb 26, 2025)
8ee2251  Removed "mission" from intro (tande-tw, Feb 26, 2025)
9 changes: 6 additions & 3 deletions .github/workflows/unit_test.yml
@@ -21,7 +21,7 @@ jobs:
strategy:
fail-fast: false
matrix:
python-version: ["3.9", "3.13"] #NOTE: min and max Python versions supported by icepyx
python-version: ["3.11", "3.13"] #NOTE: min and max Python versions supported by icepyx

steps:
- uses: "actions/checkout@v4"
@@ -33,9 +33,12 @@ jobs:
python-version: "${{ matrix.python-version }}"

- name: "Run tests"
env:
EARTHDATA_PASSWORD: "${{ secrets.EARTHDATA_PASSWORD }}"
EARTHDATA_USERNAME: ${{ secrets.EARTHDATA_USERNAME }}
NSIDC_LOGIN: "${{ secrets.EARTHDATA_PASSWORD }}" # remove this
run: |
pytest icepyx/ --verbose --cov app \
--ignore=icepyx/tests/integration
pytest icepyx/tests/unit --verbose --cov app

- name: "Upload coverage report"
uses: "codecov/codecov-action@v5.3.1"
3 changes: 3 additions & 0 deletions .gitignore
@@ -120,6 +120,9 @@ venv.bak/
*.zarr
*.tif

# harmony orders
.order_restart

# data file exception for tracking info
!clones.csv
!views.csv
190 changes: 190 additions & 0 deletions doc/source/HARMONY_MIGRATION_NOTES.md
@@ -0,0 +1,190 @@
## Assumptions that are different in Harmony

* We can't use short name and version with Harmony the way we do with ECS; we have to use a
  concept ID (or DOI), which we need to look up from CMR using the short name and version.
* Differences in Harmony features:
  * Variable subsetting won't be supported on day 1.
  * Reprojection and reformatting won't be supported on day 1.
  * Not all of the ICESat-2 products we currently support will be supported on day 1.
    * <https://nsidc.atlassian.net/wiki/spaces/DAACSW/pages/222593028/ICESat-2+data+sets+and+versions+we+are+supporting+for+Harmony>
* Requests to CMR and ECS share parameters and are made through Python
  `requests`. Support for Harmony will be implemented with `harmony-py`, and
  `earthaccess` will be used for granule search and non-subset orders (an
  `earthaccess` sketch follows this list).
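
A minimal sketch of the planned `earthaccess` role in granule search and full-granule
(non-subset) downloads; the product, version, bounding box, and output path below are
illustrative placeholders, not settled icepyx behavior:

```python
import earthaccess

# Authenticate with Earthdata Login (reads ~/.netrc or environment variables).
earthaccess.login()

# Search CMR for matching granules; all parameters here are placeholders.
granules = earthaccess.search_data(
    short_name="ATL06",
    version="006",
    bounding_box=(-55, 68, -48, 71),        # (west, south, east, north)
    temporal=("2019-02-01", "2019-02-28"),
)

# A non-subset "order" becomes a plain download of the matching granules.
files = earthaccess.download(granules, local_path="./data")
```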


## Getting started on development

### Work so far

Work in progress is on the `harmony-take2` branch.


### OBE Work and "take2"

Matt Fisher began work on implementing support for Harmony in the `harmony`
branch. This depends on the `low-hanging-refactors` branch being merged. A PR is
open.

In addition to this work, refactoring, type checking, and type annotations have
been added to the codebase to support the migration to Harmony. Many of the
refactors ended up breaking large swaths of the code. The `harmony` branch is
OBE (overcome by events) because we decided to take a different approach after
further analysis.

The initial work assumed that icepyx would be making requests directly (e.g., via
the `requests` library) to the Harmony API. Further development revealed
that [harmony-py](https://harmony-py.readthedocs.io/en/main/) should be used to
interact with the Harmony API. Moreover, there is a growing realization that
[earthaccess](https://earthaccess.readthedocs.io/en/latest/) can simplify large
parts of icepyx as well.

As these developments were worked into the existing code, it became clear that
more was being "broken" than added. icepyx contains a lot of parameter-handling
code that ensures inputs are formatted correctly for various APIs. Although this
made sense when icepyx was first developed, `harmony-py` and `earthaccess` can
replace much of it.

Instead of ripping out/refactoring large chunks of existing code in the `query`
and `granules` modules, "take2" (the `harmony-take2` branch) strives to
replicate existing functionality exposed by icepyx through the development of
new classes "from scratch".

For example, the `queryv2` module provides a `QueryV2` class intended to replicate the
functionality of `query.Query`, which was designed to interact with CMR and the
NSIDC EGI ordering system that is being decommissioned.

This approach lets the existing code and tests continue to work as expected
while parallel functionality is developed using `harmony-py` and
`earthaccess`. As this development progresses, tests can be migrated to use the
new class.
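
A hypothetical usage sketch of the parallel class, assuming `QueryV2` mirrors `Query`'s
constructor and the order/status interface shown in the updated notebooks; none of these
names are a settled interface:

```python
import icepyx as ipx

# Hypothetical: assumes QueryV2 is exposed at the package level and mirrors
# Query's (product, spatial_extent, date_range) constructor.
region = ipx.QueryV2(
    "ATL06",
    [-55, 68, -48, 71],
    ["2019-02-01", "2019-02-28"],
)

order = region.order_granules()   # submits a Harmony request behind the scenes
order.status()                    # poll the Harmony job

region.download_granules("./data")
```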


### Familiarize with Harmony

* Check out this amazing notebook provided by Amy Steiker and Patrick Quinn:
  <https://github.com/nasa/harmony/blob/main/docs/Harmony%20API%20introduction.ipynb>
* Review the interactive API documentation:
  <https://harmony.uat.earthdata.nasa.gov/docs/api/> (remove `uat` from the URL once
  Harmony is live with ICESat-2 products, expected early October 2024)
* [harmony-py docs](https://harmony-py.readthedocs.io/en/main/)
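
A minimal sketch of the `harmony-py` workflow, assuming the UAT environment and the
EEDTEST collection mentioned in the next section; the collection, bounding box, and
dates are illustrative:

```python
import datetime as dt

from harmony import BBox, Client, Collection, Request
from harmony.config import Environment

# UAT client; credentials come from the urs.earthdata.nasa.gov entry in ~/.netrc.
client = Client(env=Environment.UAT)

request = Request(
    collection=Collection(id="C1261703129-EEDTEST"),  # test collection; swap in the real concept ID
    spatial=BBox(-55, 68, -48, 71),                   # west, south, east, north
    temporal={
        "start": dt.datetime(2019, 2, 1),
        "stop": dt.datetime(2019, 2, 28),
    },
)

job_id = client.submit(request)
client.wait_for_processing(job_id, show_progress=True)
files = [f.result() for f in client.download_all(job_id, directory="./data", overwrite=True)]
```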


### Watch out for broken assumptions

Two major assumptions baked into the existing code are broken and will require
significant refactoring. The type annotations will help with this process!

* Broken assumption: "We can query with only short_name and version number." Harmony
  requires a unique identifier (concept ID or DOI). E.g.:
  <https://harmony.uat.earthdata.nasa.gov/capabilities?collectionId=C1261703129-EEDTEST>
  (NOTE: UAT query using a collection from a test provider; we should be using the
  `NSIDC_CUAT` provider in real UAT queries and `NSIDC_CPRD` for real prod queries).
  Since we want the user to be able to provide short_name and version, implementing the
  concept ID as a `@cached_property` on `Query` that asks CMR for it makes sense to me
  (see the sketch after this list).
* Broken assumption: Harmony features are equivalent to NSIDC's ECS-based
ordering system. As mentioned above, Harmony will not support variable
subsetting, reprojection, or reformatting for IS2 collections on day 1. In the
future, these features may be implemented in Harmony. For now, we need to
update existing code and user documentation to remove references to these
features.
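
A minimal sketch of that `@cached_property`, assuming a plain CMR collections search via
`requests`; the attribute names, default provider, and error handling are illustrative,
not icepyx's final implementation:

```python
from functools import cached_property

import requests

CMR_COLLECTIONS_URL = "https://cmr.earthdata.nasa.gov/search/collections.json"


class Query:  # illustrative stand-in for icepyx's Query class
    def __init__(self, product: str, version: str, provider: str = "NSIDC_CPRD"):
        self.product = product
        self.version = version
        self.provider = provider

    @cached_property
    def concept_id(self) -> str:
        """Look up the collection concept ID from CMR (cached after the first call)."""
        response = requests.get(
            CMR_COLLECTIONS_URL,
            params={
                "short_name": self.product,
                "version": self.version,
                "provider": self.provider,
            },
            timeout=30,
        )
        response.raise_for_status()
        entries = response.json()["feed"]["entry"]
        if not entries:
            raise ValueError(
                f"No collection found for {self.product} v{self.version} ({self.provider})"
            )
        return entries[0]["id"]
```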


### Don't forget to enhance along the way

* Now that we're ripping things apart and changing parameters, I think it's important to
  replace the TypedDict annotations we're using with Pydantic models. This will enable us
  to better encapsulate validation code that's currently spread around (a minimal sketch
  follows).
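
A minimal sketch of the idea, assuming Pydantic v2; the model and field names are
illustrative, not icepyx's actual parameter schema:

```python
from pydantic import BaseModel, field_validator


class BoundingBox(BaseModel):
    """Illustrative replacement for a TypedDict-style spatial parameter."""

    west: float
    south: float
    east: float
    north: float

    @field_validator("west", "east")
    @classmethod
    def _check_lon(cls, value: float) -> float:
        if not -180 <= value <= 180:
            raise ValueError("longitude must be within [-180, 180]")
        return value

    @field_validator("south", "north")
    @classmethod
    def _check_lat(cls, value: float) -> float:
        if not -90 <= value <= 90:
            raise ValueError("latitude must be within [-90, 90]")
        return value


# Validation now lives with the model instead of being scattered across modules.
bbox = BoundingBox(west=-55, south=68, east=-48, north=71)
```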


## Testing with Harmony


To run the `QueryV2`-specific tests (these place real Harmony orders and wait for
results, which can take a while):


```
pytest icepyx/tests/integration/test_queryv2.py
```

> [!WARNING]
> When running tests with `pytest`, I sometimes got errors related to Earthdata
> login. `conftest.py` is set up to mock out login credentials by editing the
> user's `.netrc`, which leaves behind a broken `.netrc`. I'm not sure how this is
> intended to work, but I resorted to removing those bits from `conftest.py` to get
> things working consistently.
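
For reference, a working Earthdata Login entry in `~/.netrc` generally looks like the
following (placeholder credentials):

```
machine urs.earthdata.nasa.gov
    login your_earthdata_username
    password your_earthdata_password
```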

Be sure to run the type checker periodically as well:

```
pyright
```

Eventually, once the `QueryV2` class has the required features implemented, we
plan to replace the existing `Query` class with it. Ideally we can migrate
other existing tests that are specific to the `Query` class to the new `QueryV2`
class at that time.


## Integrating with other ongoing icepyx work

Harmony is a major breaking change, so we'll be releasing it in icepyx v2.

We know the community wants to break the API in some other ways, so we want to include those in v2 as well!

Jessica is currently determining who can help work on these changes and what that looks like. If you, the
Harmony/ECS migration developer, identify opportunities to easily replace portions of icepyx with `earthaccess`
or other libraries, take advantage of that opportunity.

## FAQ

### Which API?

Harmony has two APIs:

* [OGC Environmental Data Retrieval API](https://harmony.earthdata.nasa.gov/docs/edr-api)
* [OGC Coverages API](https://harmony.earthdata.nasa.gov/docs/api/)

Which should be used, when, and why?


#### "Answer"

Use the [OGC Coverages API](https://harmony.earthdata.nasa.gov/docs/api/)!

> My take is that we ought to focus on the Coverages API for ICESat-2, since we aren’t
> making use of the new parameters. And this is what they primarily support. But I don’t
> have a good handle on whether we ought to pursue the EDR API at any point.
>
> - Amy Steiker

See this thread on EOSDIS Slack for more details:

<https://nsidc.slack.com/archives/CLC2SR1S6/p1716482829956969>
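
For reference, a Coverages API request has roughly the following shape (a hedged
example: the concept ID and subset parameters are placeholders, and `harmony-py`
builds this URL for you):

```
https://harmony.earthdata.nasa.gov/{collectionId}/ogc-api-coverages/1.0.0/collections/all/coverage/rangeset
    ?subset=lat(68:71)
    &subset=lon(-55:-48)
    &subset=time("2019-02-01T00:00:00Z":"2019-02-28T23:59:59Z")
```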


## Remaining tasks

Remaining tasks for "take2" development include:

* Implement support for full-granule orders via `earthaccess` (see the sketch after
  this list).
* Check user inputs against supported Harmony services (e.g., see the `is2ref` module).
* Review documentation and Jupyter notebooks for outdated information:
  * Remove references to the NSIDC ordering service.
  * Remove references to variable subsetting (not currently supported in Harmony, though
    it may be in the future; no definitive plans yet).
  * Update references to OBE concepts like `reqparams` and `subsetparams`.
  * Clean up references to reprojection/reformatting without subsetting.
    * E.g., the [subsetting
      notebook](https://icepyx.readthedocs.io/en/latest/example_notebooks/IS2_data_access2-subsetting.html)
      indicates that reformatting and reprojection are options.
* Migrate existing tests to use the new query class and the harmony-py/earthaccess approach.
* Update code to support cloud access (e.g., see [these docs on how to access data via s3](https://icepyx.readthedocs.io/en/latest/example_notebooks/IS2_cloud_data_access.html)).
  * Ideally the underlying code is updated to use `earthaccess`.
  * There might not be much that needs to be done here.
* Update
  [QUEST](https://icepyx.readthedocs.io/en/latest/example_notebooks/QUEST_argo_data_access.html)-related
  code to enable support for the new query class.
  * It looks like QUEST support is currently limited to [Argo](https://argo.ucsd.edu/about/) data.
* Support passing other subsetting kwargs to harmony-py.
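
A minimal sketch of the cloud-access direction, assuming `earthaccess` handles
authentication and direct S3/HTTPS access; the product, group path, and engine below
are illustrative:

```python
import earthaccess
import xarray as xr

earthaccess.login()

granules = earthaccess.search_data(
    short_name="ATL06",
    version="006",
    bounding_box=(-55, 68, -48, 71),
    temporal=("2019-02-01", "2019-02-28"),
    cloud_hosted=True,
)

# Open granules directly (S3 when running in-region, HTTPS otherwise) without
# downloading them to local disk first.
fileset = earthaccess.open(granules)
ds = xr.open_dataset(fileset[0], group="gt1l/land_ice_segments", engine="h5netcdf")
```
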
6 changes: 3 additions & 3 deletions doc/source/example_notebooks/IS2_cloud_data_access.ipynb
@@ -131,7 +131,7 @@
},
"outputs": [],
"source": [
"reg.order_vars.avail()"
"reg.order_vars"
]
},
{
@@ -359,7 +359,7 @@
],
"metadata": {
"kernelspec": {
"display_name": "icepyx",
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
@@ -373,7 +373,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.7"
"version": "3.11.11"
}
},
"nbformat": 4,
76 changes: 16 additions & 60 deletions doc/source/example_notebooks/IS2_data_access.ipynb
@@ -344,8 +344,14 @@
},
"outputs": [],
"source": [
"# region_a = ipx.Query('ATL06',[-55, 68, -48, 71],['2019-02-01','2019-02-28'], \n",
"# start_time='00:00:00', end_time='23:59:59', version='002')"
"region_a = ipx.Query(short_name, spatial_extent, date_range, \\\n",
" start_time='03:30:00', end_time='21:30:00')\n",
"\n",
"print(region_a.product)\n",
"print(region_a.dates)\n",
"print(region_a.product_version)\n",
"print(region_a.spatial)\n",
"print(region_a.temporal)"
]
},
{
@@ -450,19 +456,6 @@
"region_a.avail_granules(ids=True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": true,
"tags": []
},
"outputs": [],
"source": [
"#print detailed information about the returned search results\n",
"region_a.granules.avail"
]
},
{
"cell_type": "markdown",
"metadata": {
@@ -482,44 +475,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Additional Parameters and Subsetting\n",
"\n",
"Once we have generated our session, we must build the required configuration parameters needed to actually download data. These will tell the system how we want to download the data. As with the CMR search parameters, these will be built automatically when you run `region_a.order_granules()`, but you can also create and view them with `region_a.reqparams`. The default parameters, given below, should work for most users.\n",
"- `page_size` = 2000. This is the number of granules we will request per order.\n",
"- `page_num` = 1. Determine the number of pages based on page size and the number of granules available. If no page_num is specified, this calculation is done automatically to set page_num, which then provides the number of individual orders we will request given the number of granules.\n",
"- `request_mode` = 'async'\n",
"- `agent` = 'NO'\n",
"- `include_meta` = 'Y'\n",
"\n",
"#### More details about the configuration parameters\n",
"`request_mode` is \"asynchronous\" by default, which allows concurrent requests to be queued and processed without the need for a continuous connection between you and the API endpoint.\n",
"In contrast, using a \"synchronous\" `request_mode` means that the request relies on a direct, continuous connection between you and the API endpoint.\n",
"Outputs are directly downloaded, or \"streamed\", to your working directory.\n",
"For this tutorial, we will set the request mode to asynchronous.\n",
"\n",
"**Use the streaming `request_mode` with caution: While it can be beneficial to stream outputs directly to your local directory, note that timeout errors can result depending on the size of the request, and your request will not be queued in the system if NSIDC is experiencing high request volume. For best performance, NSIDC recommends setting `page_size=1` to download individual outputs, which will eliminate extra time needed to zip outputs and will ensure faster processing times per request.**\n",
"\n",
"Recall that we queried the total number and volume of granules prior to applying customization services. `page_size` and `page_num` can be used to adjust the number of granules per request up to a limit of 2000 granules for asynchronous, and 100 granules for synchronous (streaming). For now, let's select 9 granules to be processed in each zipped request. For ATL06, the granule size can exceed 100 MB so we want to choose a granule count that provides us with a reasonable zipped download size. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"print(region_a.reqparams)\n",
"# region_a.reqparams['page_size'] = 9\n",
"# print(region_a.reqparams)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Subsetting\n",
"#### Subsetting **NEEDS EDITING**\n",
"\n",
"In addition to the required parameters (CMRparams and reqparams) that are submitted with our order, for ICESat-2 data products we can also submit subsetting parameters to NSIDC.\n",
"For a deeper dive into subsetting, please see our [Subsetting Tutorial Notebook](https://icepyx.readthedocs.io/en/latest/example_notebooks/IS2_data_access2-subsetting.html), which covers subsetting in more detail, including how to get a list of subsetting options, how to build your list of subsetting parameters, and how to generate a list of desired variables (most datasets have more than 200 variable fields!), including using pre-built default lists (these lists are still in progress and we welcome contributions!).\n",
Expand All @@ -545,7 +501,7 @@
},
"outputs": [],
"source": [
"region_a.subsetparams()"
"region_a.CMRparams"
]
},
{
Expand All @@ -564,8 +520,9 @@
},
"outputs": [],
"source": [
"region_a.order_granules()\n",
"# region_a.order_granules(verbose=True, subset=False, email=False)"
"order = region_a.order_granules()\n",
"# region_a.order_granules(verbose=True, subset=False, email=False)\n",
"order"
]
},
{
Expand All @@ -576,8 +533,7 @@
},
"outputs": [],
"source": [
"#view a short list of order IDs\n",
"region_a.granules.orderIDs"
"order.status()"
]
},
{
Expand Down Expand Up @@ -613,7 +569,7 @@
],
"metadata": {
"kernelspec": {
"display_name": "icepyx",
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
@@ -627,7 +583,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.7"
"version": "3.11.11"
}
},
"nbformat": 4,