Harmony integration: the story continues. Do Not Merge! #657

Draft pull request: 38 commits to merge into base branch `development`

Commits (38)
0aae498  Add migration notes document (mfisher87, Sep 17, 2024)
8977a44  Document open questions and another assumption (mfisher87, Sep 17, 2024)
7b18097  Add concept_id cached property to Query class (mfisher87, Sep 18, 2024)
4a4adfe  Remove redundant type info in docstring (mfisher87, Sep 18, 2024)
2aed2f0  Remove OBE docstring fragments (mfisher87, Sep 18, 2024)
f4d5c85  Remove 1.x deprecation warning (mfisher87, Sep 18, 2024)
93f5665  Add note about Harmony's ICESat-2 live date (mfisher87, Sep 25, 2024)
9378a8d  Answer open question about multiple Harmony APIs (mfisher87, Sep 25, 2024)
29c3952  Add a quick start to harmony migration (mfisher87, Sep 26, 2024)
95d1e7e  Mention other ongoing icepyx efforts (mfisher87, Sep 27, 2024)
a129884  Init harmony module with class for interacting with API (trey-stafford, Nov 14, 2024)
f039831  WIP init new module for v2 query capabilities (trey-stafford, Nov 14, 2024)
6d21bb0  Fixup `get_concept_id` UAT usage (trey-stafford, Nov 18, 2024)
4019ec3  WIP fixup harmony api module (trey-stafford, Nov 18, 2024)
7473d9d  Create new base class for query objects (trey-stafford, Nov 18, 2024)
07d9ca8  Init implementation of harmony support in v2 query class (trey-stafford, Nov 19, 2024)
948ca37  WIP support polygon subsetting with harmony (trey-stafford, Nov 21, 2024)
0b087ac  Update harmony migration notes for "take2" development (trey-stafford, Dec 5, 2024)
89a4f74  initial refactoring for harmony support (betolink, Jan 25, 2025)
0158168  continue refactoring (betolink, Jan 27, 2025)
ddab53b  [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Feb 4, 2025)
b150dcb  fixing things, deprecating python 3.9 (betolink, Feb 14, 2025)
ee3152a  Merge remote-tracking branch 'origin' into harmony-integration (betolink, Feb 14, 2025)
65d365a  fix merge conflicts (betolink, Feb 14, 2025)
4e93ab6  [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Feb 14, 2025)
904bb60  fix paths (betolink, Feb 14, 2025)
cce5f26  Merge remote-tracking branch 'origin/harmony-integration' into harmon… (betolink, Feb 14, 2025)
58f6e2c  fix types (betolink, Feb 14, 2025)
09ff7ea  [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Feb 14, 2025)
47b4f50  testing CI (betolink, Feb 21, 2025)
38c33a9  Merge pull request #652 from betolink/harmony-integration (betolink, Feb 21, 2025)
67e380d  merge development and resolve conflicts (betolink, Feb 21, 2025)
bbfc94b  [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Feb 21, 2025)
6bbd770  fix orders with no cycles (betolink, Feb 21, 2025)
007b786  Merge branch 'harmony-take2' of github.com:icesat2py/icepyx into harm… (betolink, Feb 21, 2025)
7f11ee8  [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Feb 21, 2025)
f03f5e3  Added "mission" to intro notes (tande-tw, Feb 26, 2025)
8ee2251  Removed "mission" from intro (tande-tw, Feb 26, 2025)
9 changes: 6 additions & 3 deletions .github/workflows/unit_test.yml
@@ -21,7 +21,7 @@ jobs:
strategy:
fail-fast: false
matrix:
python-version: ["3.9", "3.13"] #NOTE: min and max Python versions supported by icepyx
python-version: ["3.11", "3.13"] #NOTE: min and max Python versions supported by icepyx

steps:
- uses: "actions/checkout@v4"
@@ -33,9 +33,12 @@ jobs:
python-version: "${{ matrix.python-version }}"

- name: "Run tests"
env:
EARTHDATA_PASSWORD: "${{ secrets.EARTHDATA_PASSWORD }}"
EARTHDATA_USERNAME: ${{ secrets.EARTHDATA_USERNAME }}
NSIDC_LOGIN: "${{ secrets.EARTHDATA_PASSWORD }}" # remove this
run: |
pytest icepyx/ --verbose --cov app \
--ignore=icepyx/tests/integration
pytest icepyx/tests/unit --verbose --cov app

- name: "Upload coverage report"
uses: "codecov/codecov-action@v5.3.1"
3 changes: 3 additions & 0 deletions .gitignore
@@ -120,6 +120,9 @@ venv.bak/
*.zarr
*.tif

# harmony orders
.order_restart

# data file exception for tracking info
!clones.csv
!views.csv
190 changes: 190 additions & 0 deletions doc/source/HARMONY_MIGRATION_NOTES.md
@@ -0,0 +1,190 @@
## Assumptions that are different in Harmony

* We can't use short name and version with Harmony the way we do with ECS; we have to use a
  concept ID (or DOI), which we need to look up from CMR using the short name and version.
* Differences in Harmony features:
  * Variable subsetting won't be supported on day 1.
  * Reprojection and reformatting won't be supported on day 1.
  * Not all of the ICESat-2 products we currently support will be supported on day 1.
    * <https://nsidc.atlassian.net/wiki/spaces/DAACSW/pages/222593028/ICESat-2+data+sets+and+versions+we+are+supporting+for+Harmony>
* Requests to CMR and ECS share parameters and are made through Python
  `requests`. Support for Harmony will be implemented with `harmony-py`, and
  `earthaccess` will be used for granule search and non-subset orders (an
  `earthaccess` sketch follows this list).
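
A minimal sketch of the planned `earthaccess` role in granule search and full-granule
(non-subset) downloads; the product, version, bounding box, and output path below are
illustrative placeholders, not settled icepyx behavior:

```python
import earthaccess

# Authenticate with Earthdata Login (reads ~/.netrc or environment variables).
earthaccess.login()

# Search CMR for matching granules; all parameters here are placeholders.
granules = earthaccess.search_data(
    short_name="ATL06",
    version="006",
    bounding_box=(-55, 68, -48, 71),        # (west, south, east, north)
    temporal=("2019-02-01", "2019-02-28"),
)

# A non-subset "order" becomes a plain download of the matching granules.
files = earthaccess.download(granules, local_path="./data")
```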


## Getting started on development

### Work so far

Work in progress is on the `harmony-take2` branch.


### OBE Work and "take2"

Matt Fisher began work on implementing support for Harmony in the `harmony`
branch. This depends on the `low-hanging-refactors` branch being merged. A PR is
open.

In addition to this work, refactoring, type checking, and type annotations have
been added to the codebase to support the migration to Harmony. Many of the
refactors ended up breaking large swaths of the code. The `harmony` branch is
OBE (overcome by events) because we decided to take a different approach after
further analysis.

The initial work assumed that icepyx would be making requests directly (e.g., via
the `requests` library) to the Harmony API. Further development revealed
that [harmony-py](https://harmony-py.readthedocs.io/en/main/) should be used to
interact with the Harmony API. Moreover, there is a growing realization that
[earthaccess](https://earthaccess.readthedocs.io/en/latest/) can simplify large
parts of icepyx as well.

As these developments were worked into the existing code, it became clear that
more was being "broken" than added. icepyx contains a lot of parameter-handling
code that ensures inputs are formatted correctly for various APIs. Although this
made sense when icepyx was first developed, `harmony-py` and `earthaccess` can
replace much of it.

Instead of ripping out/refactoring large chunks of existing code in the `query`
and `granules` modules, "take2" (the `harmony-take2` branch) strives to
replicate existing functionality exposed by icepyx through the development of
new classes "from scratch".

For example, the `queryv2` module provides a `QueryV2` class intended to replicate the
functionality of `query.Query`, which was designed to interact with CMR and the
NSIDC EGI ordering system that is being decommissioned.

This approach lets the existing code and tests continue to work as expected
while parallel functionality is developed using `harmony-py` and
`earthaccess`. As this development progresses, tests can be migrated to use the
new class.
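
A hypothetical usage sketch of the parallel class, assuming `QueryV2` mirrors `Query`'s
constructor and the order/status interface shown in the updated notebooks; none of these
names are a settled interface:

```python
import icepyx as ipx

# Hypothetical: assumes QueryV2 is exposed at the package level and mirrors
# Query's (product, spatial_extent, date_range) constructor.
region = ipx.QueryV2(
    "ATL06",
    [-55, 68, -48, 71],
    ["2019-02-01", "2019-02-28"],
)

order = region.order_granules()   # submits a Harmony request behind the scenes
order.status()                    # poll the Harmony job

region.download_granules("./data")
```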


### Familiarize with Harmony

* Check out this amazing notebook provided by Amy Steiker and Patrick Quinn:
  <https://github.com/nasa/harmony/blob/main/docs/Harmony%20API%20introduction.ipynb>
* Review the interactive API documentation:
  <https://harmony.uat.earthdata.nasa.gov/docs/api/> (remove `uat` from the URL once
  Harmony is live with ICESat-2 products, expected early October 2024)
* [harmony-py docs](https://harmony-py.readthedocs.io/en/main/)
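
A minimal sketch of the `harmony-py` workflow, assuming the UAT environment and the
EEDTEST collection mentioned in the next section; the collection, bounding box, and
dates are illustrative:

```python
import datetime as dt

from harmony import BBox, Client, Collection, Request
from harmony.config import Environment

# UAT client; credentials come from the urs.earthdata.nasa.gov entry in ~/.netrc.
client = Client(env=Environment.UAT)

request = Request(
    collection=Collection(id="C1261703129-EEDTEST"),  # test collection; swap in the real concept ID
    spatial=BBox(-55, 68, -48, 71),                   # west, south, east, north
    temporal={
        "start": dt.datetime(2019, 2, 1),
        "stop": dt.datetime(2019, 2, 28),
    },
)

job_id = client.submit(request)
client.wait_for_processing(job_id, show_progress=True)
files = [f.result() for f in client.download_all(job_id, directory="./data", overwrite=True)]
```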


### Watch out for broken assumptions

Two major assumptions baked into the existing code are broken and will require
significant refactoring. The type annotations will help with this process!

* Broken assumption: "We can query with only short_name and version number." Harmony
  requires a unique identifier (concept ID or DOI). E.g.:
  <https://harmony.uat.earthdata.nasa.gov/capabilities?collectionId=C1261703129-EEDTEST>
  (NOTE: UAT query using a collection from a test provider; we should be using the
  `NSIDC_CUAT` provider in real UAT queries and `NSIDC_CPRD` for real prod queries).
  Since we want the user to be able to provide short_name and version, implementing the
  concept ID as a `@cached_property` on `Query` that asks CMR for it makes sense to me
  (see the sketch after this list).
* Broken assumption: Harmony features are equivalent to NSIDC's ECS-based
ordering system. As mentioned above, Harmony will not support variable
subsetting, reprojection, or reformatting for IS2 collections on day 1. In the
future, these features may be implemented in Harmony. For now, we need to
update existing code and user documentation to remove references to these
features.
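
A minimal sketch of that `@cached_property`, assuming a plain CMR collections search via
`requests`; the attribute names, default provider, and error handling are illustrative,
not icepyx's final implementation:

```python
from functools import cached_property

import requests

CMR_COLLECTIONS_URL = "https://cmr.earthdata.nasa.gov/search/collections.json"


class Query:  # illustrative stand-in for icepyx's Query class
    def __init__(self, product: str, version: str, provider: str = "NSIDC_CPRD"):
        self.product = product
        self.version = version
        self.provider = provider

    @cached_property
    def concept_id(self) -> str:
        """Look up the collection concept ID from CMR (cached after the first call)."""
        response = requests.get(
            CMR_COLLECTIONS_URL,
            params={
                "short_name": self.product,
                "version": self.version,
                "provider": self.provider,
            },
            timeout=30,
        )
        response.raise_for_status()
        entries = response.json()["feed"]["entry"]
        if not entries:
            raise ValueError(
                f"No collection found for {self.product} v{self.version} ({self.provider})"
            )
        return entries[0]["id"]
```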


### Don't forget to enhance along the way

* Now that we're ripping things apart and changing parameters, I think it's important to
  replace the TypedDict annotations we're using with Pydantic models. This will enable us
  to better encapsulate validation code that's currently spread around (a minimal sketch
  follows).
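
A minimal sketch of the idea, assuming Pydantic v2; the model and field names are
illustrative, not icepyx's actual parameter schema:

```python
from pydantic import BaseModel, field_validator


class BoundingBox(BaseModel):
    """Illustrative replacement for a TypedDict-style spatial parameter."""

    west: float
    south: float
    east: float
    north: float

    @field_validator("west", "east")
    @classmethod
    def _check_lon(cls, value: float) -> float:
        if not -180 <= value <= 180:
            raise ValueError("longitude must be within [-180, 180]")
        return value

    @field_validator("south", "north")
    @classmethod
    def _check_lat(cls, value: float) -> float:
        if not -90 <= value <= 90:
            raise ValueError("latitude must be within [-90, 90]")
        return value


# Validation now lives with the model instead of being scattered across modules.
bbox = BoundingBox(west=-55, south=68, east=-48, north=71)
```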


## Testing with Harmony


To run the `QueryV2`-specific tests (these place real Harmony orders and wait for
results, which can take a while):


```
pytest icepyx/tests/integration/test_queryv2.py
```

> [!WARNING]
> When running tests with `pytest`, I sometimes got errors related to Earthdata
> login. `conftest.py` is set up to mock out login credentials by editing the
> user's `.netrc`, which leaves behind a broken `.netrc`. I'm not sure how this is
> intended to work, but I resorted to removing those bits from `conftest.py` to get
> things working consistently.
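
For reference, a working Earthdata Login entry in `~/.netrc` generally looks like the
following (placeholder credentials):

```
machine urs.earthdata.nasa.gov
    login your_earthdata_username
    password your_earthdata_password
```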

Be sure to run the type checker periodically as well:

```
pyright
```

Eventually, once the `QueryV2` class has the required features implemented, we
plan to replace the existing `Query` class with it. Ideally we can migrate
other existing tests that are specific to the `Query` class to the new `QueryV2`
class at that time.


## Integrating with other ongoing icepyx work

Harmony is a major breaking change, so we'll be releasing it in icepyx v2.

We know the community wants to break the API in some other ways, so we want to include those in v2 as well!

Jessica is currently determining who can help work on these changes and what that looks like. If you, the
Harmony/ECS migration developer, identify opportunities to easily replace portions of icepyx with `earthaccess`
or other libraries, take advantage of that opportunity.

## FAQ

### Which API?

Harmony has two APIs:

* [OGC Environmental Data Retrieval API](https://harmony.earthdata.nasa.gov/docs/edr-api)
* [OGC Coverages API](https://harmony.earthdata.nasa.gov/docs/api/)

Which should be used, when, and why?


#### "Answer"

Use the [OGC Coverages API](https://harmony.earthdata.nasa.gov/docs/api/)!

> My take is that we ought to focus on the Coverages API for ICESat-2, since we aren’t
> making use of the new parameters. And this is what they primarily support. But I don’t
> have a good handle on whether we ought to pursue the EDR API at any point.
>
> - Amy Steiker

See this thread on EOSDIS Slack for more details:

<https://nsidc.slack.com/archives/CLC2SR1S6/p1716482829956969>
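
For reference, a Coverages API request has roughly the following shape (a hedged
example: the concept ID and subset parameters are placeholders, and `harmony-py`
builds this URL for you):

```
https://harmony.earthdata.nasa.gov/{collectionId}/ogc-api-coverages/1.0.0/collections/all/coverage/rangeset
    ?subset=lat(68:71)
    &subset=lon(-55:-48)
    &subset=time("2019-02-01T00:00:00Z":"2019-02-28T23:59:59Z")
```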


## Remaining tasks

Remaining tasks for "take2" development include:

* Implement support for full-granule orders via `earthaccess` (see the sketch after
  this list).
* Check user inputs against supported Harmony services (e.g., see the `is2ref` module).
* Review documentation and Jupyter notebooks for outdated information:
  * Remove references to the NSIDC ordering service.
  * Remove references to variable subsetting (not currently supported in Harmony, though
    it may be in the future; no definitive plans yet).
  * Update references to OBE concepts like `reqparams` and `subsetparams`.
  * Clean up references to reprojection/reformatting without subsetting.
    * E.g., the [subsetting
      notebook](https://icepyx.readthedocs.io/en/latest/example_notebooks/IS2_data_access2-subsetting.html)
      indicates that reformatting and reprojection are options.
* Migrate existing tests to use the new query class and the harmony-py/earthaccess approach.
* Update code to support cloud access (e.g., see [these docs on how to access data via s3](https://icepyx.readthedocs.io/en/latest/example_notebooks/IS2_cloud_data_access.html)).
  * Ideally the underlying code is updated to use `earthaccess`.
  * There might not be much that needs to be done here.
* Update
  [QUEST](https://icepyx.readthedocs.io/en/latest/example_notebooks/QUEST_argo_data_access.html)-related
  code to enable support for the new query class.
  * It looks like QUEST support is currently limited to [Argo](https://argo.ucsd.edu/about/) data.
* Support passing other subsetting kwargs to harmony-py.
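
A minimal sketch of the cloud-access direction, assuming `earthaccess` handles
authentication and direct S3/HTTPS access; the product, group path, and engine below
are illustrative:

```python
import earthaccess
import xarray as xr

earthaccess.login()

granules = earthaccess.search_data(
    short_name="ATL06",
    version="006",
    bounding_box=(-55, 68, -48, 71),
    temporal=("2019-02-01", "2019-02-28"),
    cloud_hosted=True,
)

# Open granules directly (S3 when running in-region, HTTPS otherwise) without
# downloading them to local disk first.
fileset = earthaccess.open(granules)
ds = xr.open_dataset(fileset[0], group="gt1l/land_ice_segments", engine="h5netcdf")
```
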
6 changes: 3 additions & 3 deletions doc/source/example_notebooks/IS2_cloud_data_access.ipynb
@@ -131,7 +131,7 @@
},
"outputs": [],
"source": [
"reg.order_vars.avail()"
"reg.order_vars"
]
},
{
@@ -359,7 +359,7 @@
],
"metadata": {
"kernelspec": {
"display_name": "icepyx",
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
@@ -373,7 +373,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.7"
"version": "3.11.11"
}
},
"nbformat": 4,
76 changes: 16 additions & 60 deletions doc/source/example_notebooks/IS2_data_access.ipynb
@@ -344,8 +344,14 @@
},
"outputs": [],
"source": [
"# region_a = ipx.Query('ATL06',[-55, 68, -48, 71],['2019-02-01','2019-02-28'], \n",
"# start_time='00:00:00', end_time='23:59:59', version='002')"
"region_a = ipx.Query(short_name, spatial_extent, date_range, \\\n",
" start_time='03:30:00', end_time='21:30:00')\n",
"\n",
"print(region_a.product)\n",
"print(region_a.dates)\n",
"print(region_a.product_version)\n",
"print(region_a.spatial)\n",
"print(region_a.temporal)"
]
},
{
@@ -450,19 +456,6 @@
"region_a.avail_granules(ids=True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": true,
"tags": []
},
"outputs": [],
"source": [
"#print detailed information about the returned search results\n",
"region_a.granules.avail"
]
},
{
"cell_type": "markdown",
"metadata": {
@@ -482,44 +475,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Additional Parameters and Subsetting\n",
"\n",
"Once we have generated our session, we must build the required configuration parameters needed to actually download data. These will tell the system how we want to download the data. As with the CMR search parameters, these will be built automatically when you run `region_a.order_granules()`, but you can also create and view them with `region_a.reqparams`. The default parameters, given below, should work for most users.\n",
"- `page_size` = 2000. This is the number of granules we will request per order.\n",
"- `page_num` = 1. Determine the number of pages based on page size and the number of granules available. If no page_num is specified, this calculation is done automatically to set page_num, which then provides the number of individual orders we will request given the number of granules.\n",
"- `request_mode` = 'async'\n",
"- `agent` = 'NO'\n",
"- `include_meta` = 'Y'\n",
"\n",
"#### More details about the configuration parameters\n",
"`request_mode` is \"asynchronous\" by default, which allows concurrent requests to be queued and processed without the need for a continuous connection between you and the API endpoint.\n",
"In contrast, using a \"synchronous\" `request_mode` means that the request relies on a direct, continuous connection between you and the API endpoint.\n",
"Outputs are directly downloaded, or \"streamed\", to your working directory.\n",
"For this tutorial, we will set the request mode to asynchronous.\n",
"\n",
"**Use the streaming `request_mode` with caution: While it can be beneficial to stream outputs directly to your local directory, note that timeout errors can result depending on the size of the request, and your request will not be queued in the system if NSIDC is experiencing high request volume. For best performance, NSIDC recommends setting `page_size=1` to download individual outputs, which will eliminate extra time needed to zip outputs and will ensure faster processing times per request.**\n",
"\n",
"Recall that we queried the total number and volume of granules prior to applying customization services. `page_size` and `page_num` can be used to adjust the number of granules per request up to a limit of 2000 granules for asynchronous, and 100 granules for synchronous (streaming). For now, let's select 9 granules to be processed in each zipped request. For ATL06, the granule size can exceed 100 MB so we want to choose a granule count that provides us with a reasonable zipped download size. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"print(region_a.reqparams)\n",
"# region_a.reqparams['page_size'] = 9\n",
"# print(region_a.reqparams)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Subsetting\n",
"#### Subsetting **NEEDS EDITING**\n",
"\n",
"In addition to the required parameters (CMRparams and reqparams) that are submitted with our order, for ICESat-2 data products we can also submit subsetting parameters to NSIDC.\n",
"For a deeper dive into subsetting, please see our [Subsetting Tutorial Notebook](https://icepyx.readthedocs.io/en/latest/example_notebooks/IS2_data_access2-subsetting.html), which covers subsetting in more detail, including how to get a list of subsetting options, how to build your list of subsetting parameters, and how to generate a list of desired variables (most datasets have more than 200 variable fields!), including using pre-built default lists (these lists are still in progress and we welcome contributions!).\n",
Expand All @@ -545,7 +501,7 @@
},
"outputs": [],
"source": [
"region_a.subsetparams()"
"region_a.CMRparams"
]
},
{
Expand All @@ -564,8 +520,9 @@
},
"outputs": [],
"source": [
"region_a.order_granules()\n",
"# region_a.order_granules(verbose=True, subset=False, email=False)"
"order = region_a.order_granules()\n",
"# region_a.order_granules(verbose=True, subset=False, email=False)\n",
"order"
]
},
{
Expand All @@ -576,8 +533,7 @@
},
"outputs": [],
"source": [
"#view a short list of order IDs\n",
"region_a.granules.orderIDs"
"order.status()"
]
},
{
Expand Down Expand Up @@ -613,7 +569,7 @@
],
"metadata": {
"kernelspec": {
"display_name": "icepyx",
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
@@ -627,7 +583,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.7"
"version": "3.11.11"
}
},
"nbformat": 4,