feat: Add a daft dashboard to display queries plans and stats #3790

Open
wants to merge 138 commits into
base: main

Conversation

raunakab
Contributor

@raunakab raunakab commented Feb 11, 2025

Overview

This PR introduces a new (statically built and served) dashboard web-application UI.

We have a simple server that binds to port 3238 and does 3 things:

  • listens to broadcasts from the daft query engine
  • serves an API endpoint
  • serves static HTML/CSS/JS files and assets

The served static HTML files can then be accessed by pointing your browser at http://localhost:3238.
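To make the three responsibilities above concrete, here is a minimal Python sketch of a single server that accepts broadcasts, exposes an API endpoint, and serves a page. It is only an illustration: the server in this PR is written in Rust, and the route names (`/broadcast`, `/api/queries`), the JSON payload, and the in-memory list are all assumptions made for the example.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical routes; the PR description does not name the real ones.
BROADCAST_PATH = "/broadcast"
API_PATH = "/api/queries"


class DashboardHandler(BaseHTTPRequestHandler):
    # In-memory store shared by all requests (illustrative only).
    queries = []

    def _send(self, status, body, content_type):
        self.send_response(status)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_POST(self):
        # Role 1: listen for broadcasts from the query engine.
        if self.path != BROADCAST_PATH:
            self._send(404, b"not found", "text/plain")
            return
        length = int(self.headers.get("Content-Length", 0))
        self.queries.append(json.loads(self.rfile.read(length)))
        self._send(200, b"ok", "text/plain")

    def do_GET(self):
        if self.path == API_PATH:
            # Role 2: serve an API endpoint with the recorded broadcasts.
            body = json.dumps(self.queries).encode("utf-8")
            self._send(200, body, "application/json")
        else:
            # Role 3: serve static content (a placeholder page here).
            self._send(200, b"<html><body>dashboard</body></html>", "text/html")

    def log_message(self, format, *args):
        # Silence per-request logging for the example.
        pass


def start_server(port=3238):
    """Start the sketch server on a background thread and return it."""
    server = ThreadingHTTPServer(("127.0.0.1", port), DashboardHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Passing `port=0` to `start_server` lets the operating system pick a free port, which is convenient when experimenting without colliding with a running dashboard.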

Usage

from daft import dashboard

dashboard.launch()

# rest of your daft queries here

Notes

The server process is launched as an orphaned process. It keeps running even after the script that started it has exited.

You can kill this process (only manually as of now) by running kill -9 $(lsof -t -i :3238).
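Because the server outlives the launching script, it can be handy to check whether something is already listening on the port before launching or killing anything. The helper below is a small sketch using only the standard library; the function name `is_port_in_use` is made up for this example and is not part of daft.

```python
import socket


def is_port_in_use(port, host="127.0.0.1", timeout=0.5):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


if __name__ == "__main__":
    # 3238 is the dashboard port named in the PR description.
    print("dashboard port in use:", is_port_in_use(3238))
```

Note that this only tells you a process is listening on the port, not that the process is the dashboard server.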

Note to reviewers

Although the diff for this PR looks large, most of it consists of generated React UI components (inside of src/daft-dashboard-client).

For backend reviewers, please review src/daft-dashboard-server and daft. For frontend reviewers, please review src/daft-dashboard-client.

Generated files

Please ignore all the generated HTML/CSS/JS files inside of daft/static_dashboard_assets. They are automatically generated by bun run build prior to release.

These files have to be checked into the repo since they need to be released alongside daft wheels.

@github-actions github-actions bot added the feat label Feb 11, 2025

codspeed-hq bot commented Feb 11, 2025

CodSpeed Performance Report

Merging #3790 will improve performances by 11.82%

Comparing dashboard (e63cea3) with main (ca36593)

Summary

⚡ 1 improvements
✅ 26 untouched benchmarks

Benchmarks breakdown

| Benchmark | BASE | HEAD | Change |
|---|---|---|---|
| test_iter_rows_first_row[100 Small Files] | 167 ms | 149.3 ms | +11.82% |

- this is so that from Python, we can serve the embedded static
  dashboard assets from any virtual env
@raunakab raunakab marked this pull request as ready for review February 11, 2025 23:27
@raunakab raunakab requested a review from jaychia February 11, 2025 23:27
Contributor

@jaychia jaychia left a comment


On a high level:

  1. Why is everything in src/? Previously we had stuff in dashboard/, which made sense because the server and TypeScript code don't need to live inside of Daft itself.

  2. How much does this increase the size of the Daft wheel, given that (I'm assuming) all of this is now packaged with it? Do we perhaps need to add a new getdaft[dashboard] extra?

I think to merge this we should think a little harder about the packaging story. Seems like things are all shoved into the main Daft binary today which isn't ideal. Can we lay out how things are packaged in the PR description?
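One rough way to answer the size question is to sum the archive entries under the assets directory in a built wheel, since a wheel is a zip file. The sketch below uses only the standard library; the function name is hypothetical, and the wheel path in the usage note is a placeholder rather than a real artifact name.

```python
import zipfile


def bytes_under_prefix(wheel, prefix):
    """Return (compressed, uncompressed) byte totals for entries under prefix.

    `wheel` may be a path or a binary file-like object containing a zip archive.
    """
    compressed = 0
    uncompressed = 0
    with zipfile.ZipFile(wheel) as zf:
        for info in zf.infolist():
            if info.filename.startswith(prefix):
                compressed += info.compress_size
                uncompressed += info.file_size
    return compressed, uncompressed
```

For example, `bytes_under_prefix("path/to/built.whl", "daft/static_dashboard_assets/")` would report how much of the wheel is taken up by the generated assets mentioned in the PR description.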

@@ -144,11 +144,12 @@ dependencies = [

Contributor


Why is Cargo.lock changing?

Contributor Author


The src/daft-dashboard-server crate uses hyper and some other HTTP libraries to:

  1. Serve the daft broadcast listener
  2. Serve the dashboard API
  3. Serve the dashboard web server

@@ -158,6 +160,45 @@ def _result(self) -> Optional[PartitionSet]:
        else:
            return self._result_cache.value

    def _explain_broadcast(self):
Contributor


This seems like dashboard-specific code. Let's define this outside of dataframe.py if possible, likely some kind of Client exposed by the dashboard module I think.

Contributor Author


Sure, I can update it to live inside of the daft/dashboard.py file.
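As a rough illustration of the suggestion in this thread, the dashboard module could own the details of how a plan is sent, so that dataframe.py only calls a single helper. The sketch below just builds the HTTP request without sending it. The route `/broadcast`, the `{"plan": ...}` payload, and the function name are assumptions for the example, not the interface used in this PR; only the port 3238 comes from the PR description.

```python
import json
from urllib.request import Request

DASHBOARD_URL = "http://localhost:3238"  # port taken from the PR description
BROADCAST_PATH = "/broadcast"  # hypothetical route


def build_broadcast_request(plan, base_url=DASHBOARD_URL):
    """Build (but do not send) a POST request carrying a query plan as JSON."""
    body = json.dumps({"plan": plan}).encode("utf-8")
    return Request(
        base_url + BROADCAST_PATH,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending would then be a matter of passing the request to `urllib.request.urlopen`, ideally with a short timeout so that a missing dashboard never blocks query execution.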

2 participants