Update datasets to use API #6126
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@ Coverage Diff @@
## master #6126 +/- ##
==========================================
- Coverage 99.38% 99.38% -0.01%
==========================================
Files 450 451 +1
Lines 42619 42670 +51
==========================================
+ Hits 42359 42408 +49
- Misses 260 262 +2
….com:PennyLaneAI/pennylane into sc-70918-pennylane-oss-uses-new-datasets-api
What's the decision regarding the old `list_datasets`? If we're deprecating it, it should raise a deprecation warning, and both the changelog and `docs/development/deprecations.rst` should mention this deprecation.
We'll be maintaining |
LGTM!
….com:PennyLaneAI/pennylane into sc-70918-pennylane-oss-uses-new-datasets-api
Thanks for testing this out and leaving this useful feedback @DSGuala! To address your points:
I've updated this to give a general error message:
Since retrieving the specific values available for each parameter presents a similar issue to the problem with updating
This could be quite involved given the new formatting of the download IDs. Users can also specify these paths manually if desired. Given that this is lower priority and time is currently constrained, this could be updated in a follow-up if we still want it.
Good catch, this was overlooked. A fix has been added now, thanks!
Looks good!
Looks good to me! Thanks for addressing everything quickly 🙏
**Context:**
Datasets are currently downloaded by hitting the S3 bucket (via CloudFront) where the `.h5` files are stored. This PR updates the way datasets are downloaded by directing download requests to the Software Cloud managed Datasets Service.

**Description of the Change:**
The `qml.data.load` function now queries the Datasets API in order to download datasets. Additionally:
- Updates parameter formatting for the new API
- Removes URL escaping functionality since the download URLs retrieved from the API are already escaped

**Benefits:**
- Removes almost all dependency on the `foldermap` and `data_struct` files, alleviating the need to manually manage them.
  > ⚠️ There is a lingering `foldermap` dependency for the `list_datasets()` function which will likely be removed in `0.40.0`
- Facilitates tracking dataset downloads for analytics.

**Possible Drawbacks:**
- Introduces network dependency for accessing the external API.

**Related GitHub Issues:**

---------

Co-authored-by: Paul Finlay <50180049+doctorperceptron@users.noreply.github.com>
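
As a rough illustration of why the client-side URL escaping had to go once the API started returning pre-escaped download URLs, the sketch below shows what happens when an already-escaped path is escaped a second time. It uses only the standard library's `urllib.parse`; the dataset path is a made-up example, not a real download ID, and this is not the code that was removed from PennyLane.

```python
from urllib.parse import quote, unquote

# Hypothetical dataset path containing spaces (an assumption for
# illustration only; real download paths come from the Datasets API).
raw_path = "qchem/H2/STO-3G/0.742/H2 STO-3G 0.742.h5"

# Escaping once turns each space into "%20" and leaves "/" untouched.
escaped_once = quote(raw_path)

# Escaping an already-escaped path also escapes the "%" itself,
# so "%20" becomes "%2520" and the URL no longer points at the file.
escaped_twice = quote(escaped_once)

print(escaped_once)
print(escaped_twice)

# Decoding the double-escaped path once only recovers the escaped form,
# not the original path.
assert unquote(escaped_twice) == escaped_once
assert unquote(escaped_once) == raw_path
```

If the server hands back URLs in the `escaped_once` form, the client should use them as-is; applying `quote` again would produce the `escaped_twice` form.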