Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Decide fate of -viz starters and viz tool #263

Open
2 tasks
astrojuanlu opened this issue Feb 7, 2025 · 10 comments
Open
2 tasks

Decide fate of -viz starters and viz tool #263

astrojuanlu opened this issue Feb 7, 2025 · 10 comments
Assignees

Comments

@astrojuanlu
Copy link
Member

Hi @ankatiyar and @DimedS , I am not sure if it is worth to keep the viz starters after removing ET. The difference I see are the plotly charts. I feel we can keep the viz starters and rename them to spaceflights-pandas and spaceflights-pyspark to get the plots. What do you think ?

Originally posted by @ravi-kumar-pilla in #262 (comment)

About keeping "Viz" as a tool in kedro new, initially the idea was to add make it add kedro-viz as a dependency, but... it was already a dependency of the non -viz starters!

Therefore we need to

  • Agree on having only spaceflights-pandas and spaceflights-pyspark, optionally merging the contents of -viz
  • Agree on what to do with the viz tool in kedro new
@astrojuanlu
Copy link
Member Author

astrojuanlu commented Feb 13, 2025

Difference between picking the Viz tool no/yes, including 1-5 and example pipeline in both cases:

--- tree_no_viz.txt	2025-02-13 10:36:20
+++ tree_yes_viz.txt	2025-02-13 10:36:28
@@ -1,4 +1,4 @@
-kedro-no-viz/
+kedro-yes-viz/
 ├── README.md
 ├── conf
 │   ├── README.md
@@ -6,7 +6,8 @@
 │   │   ├── catalog.yml
 │   │   ├── parameters.yml
 │   │   ├── parameters_data_processing.yml
-│   │   └── parameters_data_science.yml
+│   │   ├── parameters_data_science.yml
+│   │   └── parameters_reporting.yml
 │   ├── local
 │   │   └── credentials.yml
 │   └── logging.yml
@@ -30,7 +31,7 @@
 ├── pyproject.toml
 ├── requirements.txt
 ├── src
-│   └── kedro_no_viz
+│   └── kedro_yes_viz
 │       ├── __init__.py
 │       ├── __main__.py
 │       ├── pipeline_registry.py
@@ -40,7 +41,11 @@
 │       │   │   ├── __init__.py
 │       │   │   ├── nodes.py
 │       │   │   └── pipeline.py
-│       │   └── data_science
+│       │   ├── data_science
+│       │   │   ├── __init__.py
+│       │   │   ├── nodes.py
+│       │   │   └── pipeline.py
+│       │   └── reporting
 │       │       ├── __init__.py
 │       │       ├── nodes.py
 │       │       └── pipeline.py
@@ -53,4 +58,4 @@
     │       └── test_pipeline.py
     └── test_run.py
 
-24 directories, 30 files
+25 directories, 34 files

Relevant content differences between the two:

diff --color=auto -u kedro-no-viz/pyproject.toml kedro-yes-viz/pyproject.toml
--- kedro-no-viz/pyproject.toml	2025-02-13 10:35:55
+++ kedro-yes-viz/pyproject.toml	2025-02-13 10:36:09
@@ -4,7 +4,7 @@
 
 [project]
 requires-python = ">=3.9"
-name = "kedro_no_viz"
+name = "kedro_yes_viz"
 readme = "README.md"
 dynamic = ["version"]
 dependencies = [
@@ -12,13 +12,14 @@
     "jupyterlab>=3.0",
     "notebook",
     "kedro[jupyter]~=0.19.11",
-    "kedro-datasets[pandas-csvdataset, pandas-exceldataset, pandas-parquetdataset]>=3.0",
+    "kedro-datasets[pandas-csvdataset, pandas-exceldataset, pandas-parquetdataset, plotly-plotlydataset, plotly-jsondataset, matplotlib-matplotlibwriter]>=3.0",
     "kedro-viz>=6.7.0",
-    "scikit-learn~=1.5.1"
+    "scikit-learn~=1.5.1",
+    "seaborn~=0.12.1"
 ]
 
 [project.scripts]
-"kedro-no-viz" = "kedro_no_viz.__main__:main"
+"kedro-yes-viz" = "kedro_yes_viz.__main__:main"
 
 [project.entry-points."kedro.hooks"]
diff --color=auto -u kedro-no-viz/requirements.txt kedro-yes-viz/requirements.txt
--- kedro-no-viz/requirements.txt	2025-02-13 10:35:56
+++ kedro-yes-viz/requirements.txt	2025-02-13 10:36:09
@@ -1,7 +1,8 @@
 ipython>=8.10
 jupyterlab>=3.0
-kedro-datasets[pandas-csvdataset, pandas-exceldataset, pandas-parquetdataset]>=3.0
+kedro-datasets[pandas-csvdataset, pandas-exceldataset, pandas-parquetdataset, plotly-plotlydataset, plotly-jsondataset, matplotlib-matplotlibwriter]>=3.0
 kedro-viz>=6.7.0
 kedro[jupyter]~=0.19.11
 notebook
 scikit-learn~=1.5.1
+seaborn~=0.12.1

So given that we were installing Kedro-Viz in both cases anyway and that the content differences are quite minor, I contend that

  • Kedro-Viz should continue to be installed in all cases,
  • we should add information on the README of the starters on how to launch Kedro-Viz ("run kedro viz run and you will see [screenshot] in the browser")
  • while we're at it, we should mention the VS Code extension too,
  • we should add the Plotly and matplotlib datasets as well as seaborn if the user asks for an example pipeline,
  • and the Viz tool should be removed (given that it will be installed anyway)

If I understand correctly, this means merging spaceflights-pandas-viz with spaceflights-pandas, and spaceflights-pyspark-viz with spaceflights-pyspark.

Thoughts?

@astrojuanlu astrojuanlu moved this from To Do to In Progress in Kedro 🔶 Feb 13, 2025
@Huongg
Copy link

Huongg commented Feb 19, 2025

Hey @astrojuanlu , thanks for all the details on this ticket! I had a chat with @ankatiyar yesterday, and I agree that your suggested approach makes a lot of sense for this task. I’m also aligned on merging spaceflights-pandas-viz with spaceflights-pandas, and spaceflights-pyspark-viz with spaceflights-pyspark. Just to confirm, this is a nice-to-have for the ET removal rather than a blocker. I’m happy to take this on for this sprint and aim to complete it alongside the ET removal release.

@merelcht
Copy link
Member

Removing the viz tool is a breaking change and I would personally consider removing/archiving the starters as a breaking change too. We can do the change on the kedro side on the develop branch, but the starters will have to wait until closer to the 1.0.0 release (otherwise it blocks any further releases in the 19 series).

@astrojuanlu
Copy link
Member Author

My understanding was that we were more lenient with starters changes given that they don't affect existing projects, only new ones?

In any case, this is not a requirement for removing Experiment Tracking, as confirmed by @Huongg, so I'm fine deferring it. The question about our policy on breaking changes in starters & tools remains.

@merelcht
Copy link
Member

As an independent change, yes that's true, but since this is linked to the tool flow I'd say it's a bit different. Rule of thumb: removal of user facing stuff is a breaking change for anything in the framework, and so in this case by extend of the feature it's also a breaking change in starters.

@Huongg
Copy link

Huongg commented Feb 19, 2025

Thanks @merelcht ! Since this ticket introduces breaking changes for both the framework and starters, do the following steps align with your plan?

  1. Remove the Viz tool from the develop branch in framework and add a deprecation warning for the spaceflight-pandas-viz and spaceflight-pyspark-viz in kedro starters.
  2. Closer to the 1.0.0 release, deprecate the Viz tool on the main branch in framework, remove the starters, and merge spaceflight-pandas-viz and spaceflight-pyspark-viz with spaceflights-pandas and spaceflights-pyspark in starters

@merelcht
Copy link
Member

  1. Remove the Viz tool from the develop branch in framework and add a deprecation warning for the spaceflight-pandas-viz and spaceflight-pyspark-viz in kedro starters.
  2. Closer to the 1.0.0 release, deprecate the Viz tool on the main branch in framework, remove the starters, and merge spaceflight-pandas-viz and spaceflight-pyspark-viz with spaceflights-pandas and spaceflights-pyspark in starters

There's not really an easy way to add a deprecation warning to the starters themselves. We can add something to the readme, but the actual warning will need to come from the tool flow. My suggestion would be the following steps:

  1. Add a deprecation warning to the Viz tool that can be released in the next 0.19.x release
  2. Add a deprecation notice in the readme of the viz starters
  3. Remove the Viz tool & flow from the develop branch in framework
  4. Remove the viz starters & merge them with the other spaceflights

@Huongg
Copy link

Huongg commented Feb 21, 2025

Hey @astrojuanlu @stephkaiser, after discussing with Merel today, we both believe that instead of introducing a deprecation warning before removing the Viz tool and Viz starter — which would add extra steps and delay the actual removal post-release 1.0.0 — we could leverage the existing warning instead.

In the next 0.19.x release, this would mean:

  1. Removing the Viz starters and merging them with the other spaceflights.
  2. Removing the Viz tool and flow from framework, and updating the current error message. If a user includes the Viz tool in their setup, the message will guide them to select from the available tools (lint, test, log, etc.) and clarify that the Viz tool has been removed, and the viz tool is automatically included when creating a new Kedro project using kedro new. We'll need your help refining the copy to ensure it's clear for users.

Below is the two screenshots of the current error message that appears when users select something not in the available tools list for reference.

Image Image

What do you think of this approach? If we proceed with it, we should be able to complete this alongside the actual release for experiment tracking in Kedro-Viz, and we won’t need to worry about this change after 1.0.0.

cc @merelcht

@astrojuanlu
Copy link
Member Author

astrojuanlu commented Feb 21, 2025

I think removing the Viz tool goes against what @merelcht said earlier. Maybe we could

  • remove the starters,
  • but we keep the Viz/7 tool option...
    • ...in a way that it has no effect (and we explain it to the user)

?

@merelcht
Copy link
Member

Let me clarify a bit: yes, technically this is a breaking change so the above goes against what I said earlier. However, when talking to @Huongg and trying to figure out what the deprecation message should be, we came to the conclusion that adding a deprecation warning to the tool, might be confusing and lead to a worse user experience than just removing it.

The kedro new flow shows:

Tools
1) Lint: Basic linting with Ruff
2) Test: Basic testing with pytest
3) Log: Additional, environment-specific logging options
4) Docs: A Sphinx documentation setup
5) Data Folder: A folder structure for data management
6) PySpark: Configuration for working with PySpark
7) Kedro-Viz: Kedro's native visualisation tool

So adding (deprecated) or something next to 7) Kedro-Viz, might give the impression that all of Viz is deprecated, when it's only the experiment tracking feature.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: In Progress
Development

No branches or pull requests

3 participants