chore(datasets): Remove tracking datasets which are used in Kedro Viz Experiment Tracking #969

Merged
ravi-kumar-pilla merged 5 commits into main from chore/remove-et-ds
Jan 8, 2025

Conversation

Contributor

@ravi-kumar-pilla ravi-kumar-pilla commented Dec 17, 2024

Description

Resolves kedro-org/kedro#4370

NOTE: This should be merged after kedro-datasets 6.0.0 release

Development notes

  • Remove docs references
  • Remove json_dataset and metrics_dataset
  • Remove pytests related to tracking datasets
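A downstream project can check whether the removal affects it with something like the sketch below. It assumes the tracking datasets lived under the module path `kedro_datasets.tracking` (an assumption, not stated in this PR); the helper name `module_available` is hypothetical.

```python
import importlib.util


def module_available(dotted_name: str) -> bool:
    """Return True if the module can be located without importing it fully."""
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except ModuleNotFoundError:
        # find_spec raises when a parent package (e.g. kedro_datasets) is missing.
        return False


# Assumed module path for the removed tracking datasets.
REMOVED_MODULE = "kedro_datasets.tracking"

if module_available(REMOVED_MODULE):
    print(f"{REMOVED_MODULE} is still importable in this environment.")
else:
    print(f"{REMOVED_MODULE} is not available; catalog entries using it need updating.")
```

Running this before and after upgrading kedro-datasets gives a quick signal of whether any catalog entries still depend on the removed datasets.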

Checklist

  • Opened this PR as a 'Draft Pull Request' if it is work-in-progress
  • Updated the documentation to reflect the code changes
  • Updated jsonschema/kedro-catalog-X.XX.json if necessary
  • Added a description of this change in the relevant RELEASE.md file
  • Added tests to cover my changes
  • Received approvals from at least half of the TSC (required for adding a new, non-experimental dataset)

@ravi-kumar-pilla ravi-kumar-pilla changed the title from "Remove tracking datasets which are used in Kedro Viz Experiment Tracking" to "chore(datasets): Remove tracking datasets which are used in Kedro Viz Experiment Tracking" Dec 17, 2024
@ravi-kumar-pilla ravi-kumar-pilla marked this pull request as ready for review December 18, 2024 18:28
Contributor

@ankatiyar ankatiyar left a comment

👋🏾

Member

@DimedS DimedS left a comment

Thanks, @ravi-kumar-pilla!

@ravi-kumar-pilla
Contributor Author

Hi @ankatiyar and @DimedS,

The docs build is failing because Dask deprecated the usage of dask.dataframe and shifted to dask-expr. As a temporary fix in this PR, I have updated the docstring to point to the stable docs link. I will create another ticket if we decide to migrate to dask-expr. Let me know your thoughts.

Thank you

@ankatiyar
Contributor

@ravi-kumar-pilla The docs fix seems fine for now, as long as the dataset itself is working. We can create an issue for the migration!

@DimedS
Member

DimedS commented Jan 7, 2025

> @ravi-kumar-pilla The docs fix seems fine for now, if the dataset itself is working fine. We can create an issue for the migration!

I agree, thanks @ravi-kumar-pilla and @ankatiyar!

@ravi-kumar-pilla
Contributor Author

Hi @ankatiyar and @DimedS,

Thanks for the response. After further reading, I found that Dask-Expr is a new backend for Dask DataFrame that provides query optimization and other performance improvements. Starting with Dask version 2024.3, it became the default backend.

I don't think we need to migrate anything on our side unless we plan to opt out of the optimization by setting the config below:

import dask

# Opt out of the dask-expr query planner; set this before dask.dataframe is first imported.
dask.config.set({'dataframe.query-planning': False})

Note

The above config is also slated for removal because of the issues mentioned here

Per the Dask docs, they have actually updated the docs link, as mentioned here, so the fix in this PR is sufficient.
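The version cutoff described above can be expressed as a small check. This is only a sketch: the helper name `query_planning_is_default` is hypothetical, and the 2024.3 threshold is taken from this comment rather than verified against the Dask changelog.

```python
def query_planning_is_default(dask_version: str) -> bool:
    """Return True if this Dask version is assumed to default to dask-expr.

    Assumption from this thread: dask-expr became the default DataFrame
    backend starting with Dask 2024.3.
    """
    # Dask uses calendar versioning (YYYY.M.patch), so year and month suffice.
    year, month = (int(part) for part in dask_version.split(".")[:2])
    return (year, month) >= (2024, 3)


for version in ("2023.12.1", "2024.2.1", "2024.3.0", "2025.1.0"):
    print(version, query_planning_is_default(version))
```

With Dask installed, the same helper can be called as `query_planning_is_default(dask.__version__)`.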

Thank you

@ravi-kumar-pilla ravi-kumar-pilla merged commit 159e0a3 into main Jan 8, 2025
12 of 13 checks passed
@ravi-kumar-pilla ravi-kumar-pilla deleted the chore/remove-et-ds branch January 8, 2025 00:48
CF-FHB-X pushed a commit to CF-FHB-X/kedro-plugins that referenced this pull request Feb 18, 2025
… Experiment Tracking (kedro-org#969)

* remove et related kedro datasets

* update release note and static json schema

* temporary doc fix

Signed-off-by: Richard Asselin <richard.asselin@gmail.com>
ElenaKhaustova added a commit that referenced this pull request Feb 21, 2025
* build(datasets): Release 6.0.0 (#968)

release draft

Signed-off-by: Richard Asselin <richard.asselin@gmail.com>

* chore(datasets): Remove tracking datasets which are used in Kedro Viz Experiment Tracking (#969)

* remove et related kedro datasets

* update release note and static json schema

* temporary doc fix

Signed-off-by: Richard Asselin <richard.asselin@gmail.com>

* docs(datasets): Move to linkcode extension (#985)

Move to linkcode extension

Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>
Signed-off-by: Richard Asselin <richard.asselin@gmail.com>

* fix(datasets): Fix polars.CSVDataset `save` on Windows (#979)

* test csv win

Signed-off-by: ravi_kumar_pilla <ravi_kumar_pilla@mckinsey.com>

* change ci yaml for testing

Signed-off-by: ravi_kumar_pilla <ravi_kumar_pilla@mckinsey.com>

* change ci yaml for testing

Signed-off-by: ravi_kumar_pilla <ravi_kumar_pilla@mckinsey.com>

* add default encoding when opening file

* revert workflow tests

Signed-off-by: ravi_kumar_pilla <ravi_kumar_pilla@mckinsey.com>

* fix lint

Signed-off-by: ravi_kumar_pilla <ravi_kumar_pilla@mckinsey.com>

* update release note

* update release note

---------

Signed-off-by: ravi_kumar_pilla <ravi_kumar_pilla@mckinsey.com>
Signed-off-by: Richard Asselin <richard.asselin@gmail.com>

* feat(all): Replace trufflehog with detect-secrets (#983)

* Removed trufflehog

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Updated github actions per plugin

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Updated release notes

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Updated validate-pr check scopes

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Updated lint command

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Added key to trigger check

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Updated GH action to track per plugin

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Removed secret

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Updated GH for kedro-datasets

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Updated secrets baseline

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

---------

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>
Signed-off-by: Richard Asselin <richard.asselin@gmail.com>

* build(datasets): use intersphinx over type_targets (#801)

Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>
Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
Co-authored-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
Signed-off-by: Richard Asselin <richard.asselin@gmail.com>

* fix(datasets): Add parameter to enable/disable lazy saving for `PartitionedDataset` (#978)

* Replaced callable check

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Updated lazy_save test

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Added test_callable_save

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Fixed lint

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Fixed docs links

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Fixed all docs links

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Updated release notes

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Fixed all docs links

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Fixed typo

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Added argument to disable lazy saving

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Removed save function argument

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Updated unit test

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Fixed lint

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Updated related docs

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Revert test changes

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Updated baseline

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Updated release notes

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Updated release notes

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Updated docstrings

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

---------

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>
Signed-off-by: Richard Asselin <richard.asselin@gmail.com>

* fix(datasets): use kwarg for Ibis `read_*` methods (#1005)

* fix(datasets): use kwarg for Ibis `read_*` methods

Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>

* Update RELEASE.md

Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>

---------

Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>
Signed-off-by: Richard Asselin <richard.asselin@gmail.com>

* build(datasets): pin PyArrow until `19.0.1` is out (#1006)

* build(datasets): pin PyArrow until `19.0.1` is out

Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>

* chore(datasets): exclude `19.0.0` instead of bound

Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>

---------

Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>
Co-authored-by: Merel Theisen <49397448+merelcht@users.noreply.github.com>
Signed-off-by: Richard Asselin <richard.asselin@gmail.com>

* build(datasets): update list of extras for Ibis 10 (#1003)

* build(datasets): update list of extras for Ibis 10

Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>

* Update RELEASE.md

Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>

---------

Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>
Signed-off-by: Richard Asselin <richard.asselin@gmail.com>

* chore: remove internal devtools from release notes (#1004)

* chore: remove internal devtools from release notes

Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>

* chore: remove internal devtools from release notes

Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>

* chore: remove internal devtools from release notes

Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>

* chore: remove internal devtools from release notes

Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>

---------

Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>
Co-authored-by: Merel Theisen <49397448+merelcht@users.noreply.github.com>
Signed-off-by: Richard Asselin <richard.asselin@gmail.com>

* 998: Fixed case where MemoryDatasets in catalog wouldn't trigger `_is_memory_dataset`

Signed-off-by: Richard Asselin <richard.asselin@gmail.com>

* 998: Tests to ensure that MemoryDatasets are passed in mocked data catalog

Signed-off-by: Richard Asselin <richard.asselin@gmail.com>

* 998: Changelog

Signed-off-by: Richard Asselin <richard.asselin@gmail.com>

* 998: Linting fixes

Signed-off-by: Richard Asselin <richard.asselin@gmail.com>

* build(datasets): update list of extras for Ibis 10 (#1003)

* build(datasets): update list of extras for Ibis 10

Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>

* Update RELEASE.md

Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>

---------

Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>
Signed-off-by: Richard <CF-FHB-X@users.noreply.github.com>
Signed-off-by: Richard Asselin <richard.asselin@gmail.com>

* chore: remove internal devtools from release notes (#1004)

* chore: remove internal devtools from release notes

Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>

* chore: remove internal devtools from release notes

Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>

* chore: remove internal devtools from release notes

Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>

* chore: remove internal devtools from release notes

Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>

---------

Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>
Co-authored-by: Merel Theisen <49397448+merelcht@users.noreply.github.com>
Signed-off-by: Richard <CF-FHB-X@users.noreply.github.com>
Signed-off-by: Richard Asselin <richard.asselin@gmail.com>

* 998: Tests to ensure that MemoryDatasets are passed in mocked data catalog

Signed-off-by: Richard <CF-FHB-X@users.noreply.github.com>
Signed-off-by: Richard Asselin <richard.asselin@gmail.com>

* 998: Linting fixes

Signed-off-by: Richard <CF-FHB-X@users.noreply.github.com>
Signed-off-by: Richard Asselin <richard.asselin@gmail.com>

* 998: Changed function according to PR comments

Signed-off-by: Richard <CF-FHB-X@users.noreply.github.com>
Signed-off-by: Richard Asselin <richard.asselin@gmail.com>

* 998: Tests to ensure that MemoryDatasets are passed in mocked data catalog

Signed-off-by: Richard Asselin <richard.asselin@gmail.com>

* 998: Linting fixes

Signed-off-by: Richard Asselin <richard.asselin@gmail.com>
Signed-off-by: Richard <CF-FHB-X@users.noreply.github.com>
Signed-off-by: Richard Asselin <richard.asselin@gmail.com>

* 998: Tweaked release

Signed-off-by: Richard Asselin <richard.asselin@gmail.com>

* Update RELEASE.md

Co-authored-by: ElenaKhaustova <157851531+ElenaKhaustova@users.noreply.github.com>
Signed-off-by: Richard <CF-FHB-X@users.noreply.github.com>

---------

Signed-off-by: Richard Asselin <richard.asselin@gmail.com>
Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>
Signed-off-by: ravi_kumar_pilla <ravi_kumar_pilla@mckinsey.com>
Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>
Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>
Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
Signed-off-by: Richard <CF-FHB-X@users.noreply.github.com>
Co-authored-by: Ravi Kumar Pilla <ravi_kumar_pilla@mckinsey.com>
Co-authored-by: Ankita Katiyar <110245118+ankatiyar@users.noreply.github.com>
Co-authored-by: ElenaKhaustova <157851531+ElenaKhaustova@users.noreply.github.com>
Co-authored-by: Deepyaman Datta <deepyaman.datta@utexas.edu>
Co-authored-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
Co-authored-by: Merel Theisen <49397448+merelcht@users.noreply.github.com>
Development

Successfully merging this pull request may close these issues.

kedro-datasets: remove MetricsTrackingDataset and JSONTrackingDataset
3 participants