docs: add scarf analytics #3773

Open
wants to merge 37 commits into
base: main
Changes from 35 commits
e33db2a
add scarf analytics, add social links in footer
ccmao1130 Feb 6, 2025
e130c17
fix pixel snippet
ccmao1130 Feb 6, 2025
b0a0829
fix pixel snippet
ccmao1130 Feb 6, 2025
4231549
add pixel to copyright
ccmao1130 Feb 6, 2025
9a53281
update pytorch link
ccmao1130 Feb 6, 2025
e341252
footer socials
ccmao1130 Feb 6, 2025
4d07625
add pixel to readme
ccmao1130 Feb 6, 2025
eb3534d
add pixel to sphinx
ccmao1130 Feb 6, 2025
96f2156
add pixel to sphinx, add feedback widget
ccmao1130 Feb 6, 2025
a3f6175
add site author
ccmao1130 Feb 6, 2025
9fbf950
style fix
ccmao1130 Feb 6, 2025
dc58d63
daft repo in footer
ccmao1130 Feb 6, 2025
9f8b043
daft linkedin in footer
ccmao1130 Feb 6, 2025
3cece7c
adding scarf custom telemetry wip
ccmao1130 Feb 7, 2025
eeb508e
adding scarf custom telemetry wip
ccmao1130 Feb 7, 2025
a8cf8e8
connected scarf custom telemetry
ccmao1130 Feb 7, 2025
942d9f4
add opt-out info
ccmao1130 Feb 7, 2025
567a47e
removed requests
ccmao1130 Feb 7, 2025
baa6e62
Merge branch 'main' of https://github.com/Eventual-Inc/Daft into docs…
ccmao1130 Feb 7, 2025
20ac4da
minor fix
ccmao1130 Feb 7, 2025
bd17388
update tests for scarf
ccmao1130 Feb 7, 2025
e3d6df0
style fix
ccmao1130 Feb 7, 2025
f3e7ef5
update broken links
ccmao1130 Feb 7, 2025
4b3f7a6
add tests for py, ray, native runner
ccmao1130 Feb 7, 2025
eea28b9
style fix
ccmao1130 Feb 8, 2025
55e97a3
update scarf telemetry tests
ccmao1130 Feb 8, 2025
59c5f6c
Merge branch 'main' of https://github.com/Eventual-Inc/Daft into docs…
ccmao1130 Feb 8, 2025
e8af732
small fix
ccmao1130 Feb 8, 2025
827099c
style fix
ccmao1130 Feb 8, 2025
2840dc8
add version back
ccmao1130 Feb 8, 2025
1d9c91a
Merge branch 'main' into docs-scarf
ccmao1130 Feb 10, 2025
2c4600d
add social preview)
ccmao1130 Feb 10, 2025
095cfb2
updated scarf analytics to collect per runner, updated tests
ccmao1130 Feb 11, 2025
b934d8c
style fix
ccmao1130 Feb 11, 2025
f58f028
style fix
ccmao1130 Feb 11, 2025
f9c1ce2
Merge branch 'main' of https://github.com/Eventual-Inc/Daft into docs…
ccmao1130 Feb 12, 2025
86b3377
fixes per cory's suggestions
ccmao1130 Feb 12, 2025
12 changes: 7 additions & 5 deletions README.rst
@@ -95,15 +95,17 @@ Here's a list of `good first issues <https://github.com/Eventual-Inc/Daft/issues
Telemetry
---------

To help improve Daft, we collect non-identifiable data.
To help improve Daft, we collect non-identifiable data via our own analytics as well as Scarf (https://scarf.sh).

To disable this behavior, set the following environment variable: ``DAFT_ANALYTICS_ENABLED=0``
To disable this behavior, set the following environment variables:
- ``DAFT_ANALYTICS_ENABLED=0``
- ``SCARF_NO_ANALYTICS=true`` or ``DO_NOT_TRACK=true``

The data that we collect is:

1. **Non-identifiable:** events are keyed by a session ID which is generated on import of Daft
2. **Metadata-only:** we do not collect any of our users’ proprietary code or data
3. **For development only:** we do not buy or sell any user data
1. **Non-identifiable:** Events are keyed by a session ID which is generated on import of Daft
2. **Metadata-only:** We do not collect any of our users’ proprietary code or data
3. **For development only:** We do not buy or sell any user data

Please see our `documentation <https://www.getdaft.io/projects/docs/en/stable/resources/telemetry/>`_ for more details.
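
As a minimal sketch of the opt-out described in this hunk, the variables can also be set from Python before Daft is imported. The helper name `opt_out_of_daft_telemetry` is hypothetical and not part of Daft; the sketch assumes the variables are read at import or runner-setup time, as the `daft/context.py` changes in this PR suggest.

```python
import os


def opt_out_of_daft_telemetry() -> None:
    """Set the documented opt-out variables for the current process (hypothetical helper)."""
    # Disables Daft's own analytics client.
    os.environ["DAFT_ANALYTICS_ENABLED"] = "0"
    # Disables the Scarf ping; DO_NOT_TRACK=true is the documented alternative.
    os.environ["SCARF_NO_ANALYTICS"] = "true"


opt_out_of_daft_telemetry()
# import daft  # import Daft only after the variables are set
```

Setting the same variables in the shell or in a deployment's environment has the same effect and avoids depending on import order.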

1 change: 0 additions & 1 deletion daft/__init__.py
@@ -43,7 +43,6 @@ def refresh_logger() -> None:

__version__ = get_version()


###
# Initialize analytics
###
17 changes: 17 additions & 0 deletions daft/context.py
@@ -3,13 +3,15 @@
import contextlib
import dataclasses
import logging
import os
from typing import TYPE_CHECKING, ClassVar

from daft.daft import IOConfig, PyDaftContext, PyDaftExecutionConfig, PyDaftPlanningConfig
from daft.daft import get_context as _get_context
from daft.daft import set_runner_native as _set_runner_native
from daft.daft import set_runner_py as _set_runner_py
from daft.daft import set_runner_ray as _set_runner_ray
from daft.scarf_telemetry import scarf_telemetry

if TYPE_CHECKING:
    from daft.runners.runner import Runner
@@ -67,12 +69,17 @@ def set_runner_ray(
    max_task_backlog: int | None = None,
    force_client_mode: bool = False,
) -> DaftContext:
    # Scarf Analytics
    scarf_opt_out = os.getenv("SCARF_NO_ANALYTICS") == "true" or os.getenv("DO_NOT_TRACK") == "true"
    scarf_telemetry(scarf_opt_out, runner="ray")

    py_ctx = _set_runner_ray(
        address=address,
        noop_if_initialized=noop_if_initialized,
        max_task_backlog=max_task_backlog,
        force_client_mode=force_client_mode,
    )

    return DaftContext._from_native(py_ctx)


@@ -84,9 +91,14 @@ def set_runner_py(use_thread_pool: bool | None = None) -> DaftContext:
    Returns:
        DaftContext: Daft context after setting the Py runner
    """
    # Scarf Analytics
    scarf_opt_out = os.getenv("SCARF_NO_ANALYTICS") == "true" or os.getenv("DO_NOT_TRACK") == "true"
    scarf_telemetry(scarf_opt_out, runner="py")

    py_ctx = _set_runner_py(
        use_thread_pool=use_thread_pool,
    )

    return DaftContext._from_native(py_ctx)


@@ -98,7 +110,12 @@ def set_runner_native() -> DaftContext:
    Returns:
        DaftContext: Daft context after setting the native runner
    """
    # Scarf Analytics
    scarf_opt_out = os.getenv("SCARF_NO_ANALYTICS") == "true" or os.getenv("DO_NOT_TRACK") == "true"
    scarf_telemetry(scarf_opt_out, runner="native")

    py_ctx = _set_runner_native()

    return DaftContext._from_native(py_ctx)
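
The same opt-out expression appears in all three `set_runner_*` functions in this file. One way to keep the check in a single place is sketched below; `scarf_opted_out` is a hypothetical name, not something this PR defines.

```python
import os


def scarf_opted_out() -> bool:
    """Return True if either Scarf opt-out variable is exactly "true" (hypothetical helper)."""
    # Mirrors the comparison used in the diff: only the literal string "true" opts out.
    return os.getenv("SCARF_NO_ANALYTICS") == "true" or os.getenv("DO_NOT_TRACK") == "true"
```

Because the comparison is an exact string match, values such as `1` or `True` would not count as an opt-out under this logic. Whether to accept them is a design decision for the PR author.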


92 changes: 92 additions & 0 deletions daft/scarf_telemetry.py
@@ -0,0 +1,92 @@
import platform
import urllib.parse
import urllib.request
from typing import Union

from daft import get_build_type, get_version


def scarf_telemetry(scarf_opt_out: bool, runner: str) -> tuple[Union[str, None], Union[str, None]]:
    """Track analytics for Daft usage via Scarf.
    Args:
        scarf_opt_out (bool): Whether the user has opted out of analytics
        runner (str): The runner being used (py, ray, or native)
    Returns:
        tuple[str | None, str | None]: Response status and runner type, or (None, None) if analytics disabled/failed
    """
    version = get_version()
    build_type = get_build_type()

    try:
        # Skip analytics for dev builds or if user opted out
        if build_type == "dev" or scarf_opt_out:
            return None, None

        python_version = ".".join(platform.python_version().split(".")[:2])

        params = {
            "version": version,
            "platform": platform.system(),
            "python": python_version,
            "arch": platform.machine(),
            "runner": runner,
        }

        # Prepare the query string
        query_string = urllib.parse.urlencode(params)

        # Make the GET request
        url = f"https://daft.gateway.scarf.sh/daft-runner?{query_string}"
        with urllib.request.urlopen(url) as response:
            return f"Response status: {response.status}", runner

    except Exception as e:
        return f"Analytics error: {e!s}", None

    return None, None

Codecov / codecov/patch: check warning on line 48 in daft/scarf_telemetry.py
Added line #L48 was not covered by tests
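
To audit what the Scarf request carries, the URL construction from `scarf_telemetry` can be reproduced offline. This is a sketch only: `build_scarf_url` is a hypothetical helper, the version string `0.0.0` is a placeholder, and no network request is made.

```python
import platform
import urllib.parse


def build_scarf_url(version: str, runner: str) -> str:
    """Build the gateway URL with the same parameters as the diff, without sending it."""
    # Only the major.minor Python version is reported, e.g. "3.11".
    python_version = ".".join(platform.python_version().split(".")[:2])
    params = {
        "version": version,
        "platform": platform.system(),
        "python": python_version,
        "arch": platform.machine(),
        "runner": runner,
    }
    return f"https://daft.gateway.scarf.sh/daft-runner?{urllib.parse.urlencode(params)}"


url = build_scarf_url(version="0.0.0", runner="native")
# Decode the query string back into a dict to inspect each reported field.
parsed = urllib.parse.parse_qs(urllib.parse.urlsplit(url).query)
print(parsed)
```

The `urlopen` call in the diff is made without a `timeout` argument, so a slow or unreachable gateway could delay runner setup. Passing `timeout=` to `urllib.request.urlopen` is one option to consider.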


# def scarf_analytics(
# scarf_opt_out: bool, build_type: str, version: str, runner: str
# ) -> tuple[Union[str, None], Union[str, None]]:
# """Track analytics for Daft usage via Scarf.

# Args:
# user_opted_out (bool): Whether the user has opted out of analytics
# build_type (str): The build type from get_build_type()
# version (str): The version from get_version()
# runner (str): The runner being used (py, ray, or native)

# Returns:
# tuple[str | None, str | None]: Response status and runner type, or (None, None) if analytics disabled/failed
# """
# try:
# # Skip analytics for dev builds or if user opted out
# if build_type == "dev" or scarf_opt_out:
# return None, None

# if os.getenv("SCARF_NO_ANALYTICS") != "true" and os.getenv("DO_NOT_TRACK") != "true":
# python_version = ".".join(platform.python_version().split(".")[:2])

# params = {
# "version": version,
# "platform": platform.system(),
# "python": python_version,
# "arch": platform.machine(),
# "runner": runner,
# }

# # Prepare the query string
# query_string = urllib.parse.urlencode(params)

# # Make the GET request
# url = f"https://daft.gateway.scarf.sh/daft-runner?{query_string}"
# with urllib.request.urlopen(url) as response:
# return f"Response status: {response.status}", runner

# except Exception as e:
# return f"Analytics error: {e!s}", None

# return None, None
Comment on lines +50 to +91
Contributor

remove

35 changes: 27 additions & 8 deletions docs/mkdocs.yml
@@ -3,20 +3,17 @@

# Project Information
site_name: Daft Documentation
site_author: Eventual
site_url: https://www.getdaft.io/projects/docs/en/stable/
site_description: >-
site_description: |
Welcome to Daft Documentation! Daft is a unified data engine for data engineering, analytics, and ML/AI.
copyright: '&copy; Copyright 2025, Eventual <img referrerpolicy="no-referrer-when-downgrade" src="https://static.scarf.sh/a.png?x-pxid=2293a436-7808-4c74-9bf3-d3e86e4eed91" />'

# Repository
repo_name: Daft
repo_url: https://github.com/Eventual-Inc/Daft
docs_dir: mkdocs

# Scarf pixel for tracking analytics
# image:
# referrerpolicy: "no-referrer-when-downgrade"
# src: "https://static.scarf.sh/a.png?x-pxid=c9065f3a-a090-4243-8f69-145d5de7bfca"

# Sitemap
nav:
- Daft User Guide:
@@ -97,15 +94,36 @@ theme:

# Additional Configuration
extra:
analytics:
provider: google
property: G-YN4QSRPV0K
feedback:
title: Was this page helpful?
ratings:
- icon: material/emoticon-happy-outline
name: This page was helpful
data: 1
note: >-
Thanks for your feedback!
- icon: material/emoticon-sad-outline
name: This page could be improved
data: 0
note: >-
Thanks for your feedback! Help us improve this page by
<a href="https://github.com/Eventual-Inc/Daft/issues" target="_blank" rel="noopener">submitting an issue</a> on our Daft repo.
social:
- icon: fontawesome/brands/github
link: https://github.com/squidfunk
link: https://github.com/Eventual-Inc/Daft
- icon: fontawesome/brands/slack
link: https://join.slack.com/t/dist-data/shared_invite/zt-2e77olvxw-uyZcPPV1SRchhi8ah6ZCtg
- icon: fontawesome/brands/linkedin
link: https://www.linkedin.com/company/eventualcomputing/
link: https://www.linkedin.com/showcase/daft-dataframe/
- icon: fontawesome/brands/x-twitter
link: https://x.com/daft_dataframe
- icon: fontawesome/brands/youtube
link: https://www.youtube.com/@daftdf
- icon: simple/substack
link: https://blog.getdaft.io/

# This is a macro you should use to refer to paths
# When referring to methods, the syntax is {{ api_path }}/path/to/method
@@ -148,3 +166,4 @@ plugins:
- mkdocs-simple-hooks:
hooks:
on_pre_build: "docs.hooks:make_api_docs"
- social
2 changes: 1 addition & 1 deletion docs/mkdocs/index.md
@@ -37,7 +37,7 @@ Daft is a unified data engine for **data engineering, analytics, and ML/AI**. It
Daft boasts strong integrations with technologies common across these workloads:

* **Cloud Object Storage:** Record-setting I/O performance for integrations with S3 cloud storage, [battle-tested at exabyte-scale at Amazon](https://aws.amazon.com/blogs/opensource/amazons-exabyte-scale-migration-from-apache-spark-to-ray-on-amazon-ec2/)
* **ML/AI Python Ecosystem:** First-class integrations with [PyTorch](https://pytorch.org/>) and [NumPy](https://numpy.org/) for efficient interoperability with your ML/AI stack
* **ML/AI Python Ecosystem:** First-class integrations with [PyTorch](https://pytorch.org/) and [NumPy](https://numpy.org/) for efficient interoperability with your ML/AI stack
* **Data Catalogs/Table Formats:** Capabilities to effectively query table formats such as [Apache Iceberg](https://iceberg.apache.org/), [Delta Lake](https://delta.io/) and [Apache Hudi](https://hudi.apache.org/)
* **Seamless Data Interchange:** Zero-copy integration with [Apache Arrow](https://arrow.apache.org/docs/index.html)
* **Multimodal/ML Data:** Native functionality for data modalities such as tensors, images, URLs, long-form text and embeddings
2 changes: 1 addition & 1 deletion docs/mkdocs/integrations/delta_lake.md
@@ -119,7 +119,7 @@ Here are Delta Lake features that are on our roadmap. Please let us know if you

1. Read support for [deletion vectors](https://docs.delta.io/latest/delta-deletion-vectors.html) ([issue](https://github.com/Eventual-Inc/Daft/issues/1954)).

2. Read support for [column mappings](https://docs.delta.io/latest/delta-column-mapping.html>) ([issue](https://github.com/Eventual-Inc/Daft/issues/1955)).
2. Read support for [column mappings](https://docs.delta.io/latest/delta-column-mapping.html) ([issue](https://github.com/Eventual-Inc/Daft/issues/1955)).

3. Writing new Delta Lake tables ([issue](https://github.com/Eventual-Inc/Daft/issues/1967)).

15 changes: 10 additions & 5 deletions docs/mkdocs/resources/telemetry.md
@@ -1,20 +1,25 @@
# Telemetry

To help core developers improve Daft, we collect non-identifiable statistics on Daft usage in order to better understand how Daft is used, common bugs and performance bottlenecks.
To help core developers improve Daft, we collect non-identifiable statistics on Daft usage in order to better understand how Daft is used, common bugs and performance bottlenecks. Data is collected from a combination of our own analytics and [Scarf](https://scarf.sh).

We take the privacy of our users extremely seriously, and telemetry in Daft is built to be:

1. Easy to opt-out: to disable telemetry, set the following environment variable: `DAFT_ANALYTICS_ENABLED=0`
2. Non-identifiable: events are keyed by a session ID which is generated on import of Daft
3. Metadata-only: we do not collect any of our users' proprietary code or data
1. Easy to opt-out: To disable telemetry, set the following environment variables:

`DAFT_ANALYTICS_ENABLED=0`

`SCARF_NO_ANALYTICS=true` or `DO_NOT_TRACK=true`

2. Non-identifiable: Events are keyed by a session ID which is generated on import of Daft
3. Metadata-only: We do not collect any of our users' proprietary code or data

We **do not** sell or buy any of the data that is collected in telemetry.

!!! info "*Daft telemetry is enabled in versions >= v0.0.21*"

## What data do we collect?

To audit what data is collected, please see the implementation of `AnalyticsClient` in the `daft.analytics` module.
To audit what data is collected, please see the implementation of `AnalyticsClient` in the `daft.analytics` module as well as `scarf_telemetry.py`.

In short, we collect the following:

1 change: 1 addition & 0 deletions docs/sphinx/source/_templates/sections/header.html
@@ -1,4 +1,5 @@
<div class="header-container">
<img referrerpolicy="no-referrer-when-downgrade" src="https://static.scarf.sh/a.png?x-pxid=6f9dd415-f331-4826-97a7-83e94dc0f89e" />
<a href="/"><img class="header-logo" src="{{ pathto('_static/daft-logo.png', 1) }}" alt="Daft logo" /></a>
<nav class="header-nav">
<ul class="header-nav-list">
4 changes: 2 additions & 2 deletions docs/sphinx/source/_templates/sections/mobile-menu.html
@@ -5,12 +5,12 @@
</div>
<ul class="header-nav-list mobile">
<li class="header-nav-listitem mobile">
<a href="https://www.getdaft.io/#get-started">
<a href="https://www.getdaft.io/projects/docs/en/stable/quickstart/index.html">
Get Started
</a>
</li>
<li class="header-nav-listitem mobile">
<a href="../../../index.html">
<a href="https://www.getdaft.io/projects/docs/en/stable/index.html">
User Guide
</a>
</li>