Improve running kedro as part of an automated workflow (CI/CD) #1640
Comments
There are probably interesting repercussions to recommending a workflow like this. Anybody who followed such a workflow would contribute to the telemetry stats, adding a lot of automated runs (no different to adding a I think it would be good to take this opportunity to distinguish CI runs in telemetry. Maybe can read an environment variable; most CI systems (GitHub Actions, CircleCI, Travis CI, etc.) follow the convention of setting I think the one question this leaves is, what do we do if users want to opt out of telemetry in the CI. Is it OK to have this be documented, but not prompted by default, or does this introduce some risk? |
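The CI-detection idea in this comment can be sketched in a few lines. This is a minimal illustration, assuming the conventional variable is named `CI` (the name used in the issue summary); `running_in_ci` and the list of "falsy" values are hypothetical and not part of kedro-telemetry.

```python
import os
from typing import Mapping, Optional

# Values treated as "not set"; this list is an assumption for illustration.
_FALSY = {"", "0", "false", "no"}


def running_in_ci(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True when the conventional CI variable holds a truthy value."""
    env = os.environ if env is None else env
    return env.get("CI", "").strip().lower() not in _FALSY


if __name__ == "__main__":
    print("CI run" if running_in_ci() else "local or interactive run")
```

Passing the environment in as a mapping keeps the check easy to test and leaves open how a telemetry plugin would actually tag such runs.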
Totally agree with everything @deepyaman said here. If we want to recommend in documentation a command that programmatically disables telemetry in CI then let's go for the clearer But using the |
Another thought I had this morning was to enable telemetry by default in CI environments, but to be even more conservative in what's collected by I would advocate as capturing the |
From a data perspective I would still like to see to understand how many Kedro projects are being run with CI, because it helps give us a sense of projects that have moved into production systems. It would be great to figure out how to identify a "user" that is CI but also still observe the telemetry consent that the user set. |
We discussed this issue in a Technical Design session and decided on the following:
To do:
|
Today there was a question about this and I just wanted to confirm that the "Do you opt into usage analytics? [y/N]:" prompt does not block non-interactive terminals. Example:
FROM python:3.10
RUN pip install kedro
WORKDIR /src
RUN echo "project_name: Spaceflights" > kedro.yml
RUN kedro new --config=kedro.yml --starter=spaceflights
WORKDIR /src/spaceflights
RUN pip install -r src/requirements.txt
RUN kedro run
$ docker build . --platform linux/amd64 -t kedro-test:dev
DEPRECATED: The legacy builder is deprecated and will be removed in a future release.
Install the buildx component to build images with BuildKit:
https://docs.docker.com/go/buildx/
Sending build context to Docker daemon 796.3MB
Step 1/8 : FROM python:3.10
---> 700e6de1daef
Step 2/8 : RUN pip install kedro
...
Step 8/8 : RUN kedro run
---> Running in e6b62eb0a4cf
As an open-source project, we collect usage analytics.
We cannot see nor store information contained in a Kedro project.
You can find out more by reading our privacy notice:
https://github.com/kedro-org/kedro-plugins/tree/main/kedro-telemetry#privacy-notice
Do you opt into usage analytics? [y/N]: [06/22/23 07:05:15] WARNING Failed to confirm consent. No data plugin.py:265
was sent to Heap. Exception:
[06/22/23 07:05:15] INFO Kedro project spaceflights session.py:359
As an open-source project, we collect usage analytics.
We cannot see nor store information contained in a Kedro project.
You can find out more by reading our privacy notice:
https://github.com/kedro-org/kedro-plugins/tree/main/kedro-telemetry#privacy-notice
Do you opt into usage analytics? [y/N]: WARNING Failed to confirm consent. No data plugin.py:265
was sent to Heap. Exception:
INFO Loading data from 'companies' data_catalog.py:345
(CSVDataSet)...
INFO Running node: node.py:331
... |
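The log above shows the prompt being printed and then skipped with a "Failed to confirm consent" warning. One way a prompt can behave like that is sketched below. This is a simplified illustration built on Python's built-in `input`, not the plugin's actual code; `confirm_consent` and its `read` parameter are hypothetical names.

```python
from typing import Callable

PROMPT = "Do you opt into usage analytics? [y/N]: "


def confirm_consent(read: Callable[[str], str] = input) -> bool:
    """Ask for consent; treat a closed or missing stdin as "no" instead of blocking."""
    try:
        answer = read(PROMPT)
    except (EOFError, OSError):
        # No terminal attached (e.g. a `docker build` step): nobody can answer.
        return False
    return answer.strip().lower() in {"y", "yes"}
```

A sketch like this only covers the case where stdin is closed. An environment that keeps stdin open but never supplies input would presumably still wait, so each platform has to be checked on its own.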
For me it does keep hanging in a Databricks job, with Kedro==0.18.7 installed. |
Thanks @juanchodpg2, good to know. You can create a |
@juanchodpg2 reports that creating a We might need to bump the priority of this. Getting a solid consent process is a blocker for kedro-org/kedro-plugins#375 |
Let's continue the conversation about kedro-telemetry being blocked on Databricks and other places in kedro-org/kedro-plugins#484 |
Also most CI platforms include the |
My original example on #1640 (comment) had a mistake, it didn't install
FROM python:3.10
RUN pip install kedro kedro-telemetry
WORKDIR /src
RUN kedro new --name=spaceflights --starter=spaceflights-pandas
WORKDIR /src/spaceflights
RUN pip install -r requirements.txt
RUN kedro run
and I confirm the docker build doesn't block:
Each platform needs to be understood separately.
|
Any update on this one ? :) |
Hi @m-gris! To clarify, are you running Kedro in an unattended setting and being blocked by this? What does your setup look like? The best workaround for now is to remove |
Hi @astrojuanlu |
To set clear expectations, #2867 is fine as a workaround but the default |
Getting back to this after discussing it a bit more. We decided to not go with #2867. In the future we want that telemetry prompt to disappear. This means that, whenever the user runs a Kedro command, we will assume a default answer, and we will also add more mechanisms for users to override that answer on top of the current However, we are still in the process of minimising the amount of data we collect to make this more palatable to users, so we are not quite there yet. We will give more details in due time. |
Progress towards opt-out consent is being tracked in other issues. Closing. |
Description
When any kedro command is executed for the first time or in a clean environment (which is often the case in CI/CD), the telemetry prompt gets run. The user then has to answer Yes or No to running telemetry. If no user is involved, e.g. in an automated CI/CD workflow, a hack needs to be put in place to programmatically add a .telemetry file.
Summary of action (Edited on 2023-07-31)
kedro-telemetry to use the CI environment variable to indicate that data comes from a CI environment kedro-plugins#483
.telemetry file (use this issue)
kedro new and kedro run to skip telemetry (e.g. for CI/CD environment) #2867
kedro new command that detects if it's being run in CI and, if so, whether consent has been given; if that's not the case, it will advise the user on how to do this.
Possible Implementation
The above hack works, but isn't well known. An alternative way of solving this issue is by running the command in a form like: yes | kedro new. We need to document how people can accept/deny the telemetry tracking automatically.
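The ".telemetry file" hack described above could be scripted as part of a CI setup step. A minimal sketch, assuming the file lives in the project root and holds a single YAML key named `consent` (the file format is not spelled out in this issue, so treat it as an assumption and check the kedro-telemetry README); `write_telemetry_consent` is a hypothetical helper.

```python
from pathlib import Path


def write_telemetry_consent(project_dir: str, consent: bool) -> Path:
    """Create `<project_dir>/.telemetry` recording the consent answer."""
    path = Path(project_dir) / ".telemetry"
    # Assumed file content: one YAML key named `consent`.
    value = "true" if consent else "false"
    path.write_text(f"consent: {value}\n", encoding="utf-8")
    return path
```

Calling `write_telemetry_consent(".", False)` before the first kedro command in a pipeline would record an explicit "no" without anyone having to answer the prompt.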