
feat: add support for explain analyze #3484

Merged · 6 commits merged from wkalt/explain-analyze into main on Feb 28, 2025
Conversation

@wkalt (Contributor) commented Feb 27, 2025

This adds runtime execution metrics to all of our exec nodes. These metrics can be accessed by calling plan.analyze_plan().
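For context, a minimal usage sketch from the Rust side, assuming the method lands on the scanner as the description suggests; the dataset path, filter expression, and exact signature are illustrative:

use lance::dataset::Dataset;

// Hypothetical usage: open a dataset, build a scan with a filter, then call
// analyze_plan(), which executes the query and reports per-node runtime
// metrics (unlike explain_plan, which only renders the plan).
async fn print_analyze(uri: &str) -> Result<(), Box<dyn std::error::Error>> {
    let dataset = Dataset::open(uri).await?;
    let mut scan = dataset.scan();
    scan.filter("contains(body, 'foobar')")?;
    println!("{}", scan.analyze_plan().await?);
    Ok(())
}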


ACTION NEEDED
Lance follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.

For details on the error please inspect the "PR Title Check" action.

@wkalt force-pushed the wkalt/explain-analyze branch from fde124f to d50c378 on February 27, 2025 04:16
@wkalt changed the title from "Add support for explain analyze" to "feat: Add support for explain analyze" on Feb 27, 2025
@github-actions bot added the "enhancement" (New feature or request) label on Feb 27, 2025
@wkalt (Contributor, Author) commented Feb 27, 2025

Here is an example of current output:

AnalyzeExec verbose=true, metrics=[]
  ProjectionExec: expr=[author@7 as author, author_flair_css_class@8 as author_flair_css_class, author_flair_text@9 as author_flair_text, body@0 as body, can_gild@10 as can_gild, controversiality@1 as controversiality, created_utc@2 as created_utc, distinguished@11 as distinguished, gilded@3 as gilded, id@12 as id, is_submitter@13 as is_submitter, link_id@14 as link_id, parent_id@15 as parent_id, permalink@16 as permalink, retrieved_on@4 as retrieved_on, score@5 as score, stickied@17 as stickied, subreddit@18 as subreddit, subreddit_id@19 as subreddit_id, subreddit_type@20 as subreddit_type], metrics=[output_rows=48, elapsed_compute=24.256µs]
    Take: columns="body, controversiality, created_utc, gilded, retrieved_on, score, _rowid, (author), (author_flair_css_class), (author_flair_text), (can_gild), (distinguished), (id), (is_submitter), (link_id), (parent_id), (permalink), (stickied), (subreddit), (subreddit_id), (subreddit_type)", metrics=[output_rows=48, elapsed_compute=26.6µs]
      CoalesceBatchesExec: target_batch_size=8192, metrics=[output_rows=48, elapsed_compute=350.816µs]
        FilterExec: contains(body@0, foobar), metrics=[output_rows=48, elapsed_compute=11.180169927s]
          LanceScan: uri=home/wyatt/work/lance/python/reddit.lance/data, projection=[body, controversiality, created_utc, gilded, retrieved_on, score], row_id=true, row_addr=false, ordered=true, metrics=[output_rows=10000000, elapsed_compute=316.0889ms]

I think this gets us close but not quite there. I'm going to look at replacing the built-in "BaselineMetrics" with our own thing that captures some additional detail. My goal is to replicate what's available in Postgres, if possible:

  • time to first record (in milliseconds, not ISO date)
  • time to last record (i.e. elapsed_compute)
  • avg row byte width
  • number of rows emitted

I'll keep plugging on this, but I think the current state is ready for review. I need to add a test as well.
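For illustration, a rough sketch of what a custom metrics set along those lines could look like on top of DataFusion's metrics API; the struct and the extra metric names (time_to_first_record, output_bytes) are hypothetical, not necessarily what the PR ends up with:

use datafusion::physical_plan::metrics::{Count, ExecutionPlanMetricsSet, MetricBuilder, Time};

// Hypothetical replacement for BaselineMetrics that also tracks the
// Postgres-style numbers listed above. All names are illustrative.
struct AnalyzeMetrics {
    output_rows: Count,          // number of rows emitted
    elapsed_compute: Time,       // time to last record
    time_to_first_record: Time,  // latency until the first batch is produced
    output_bytes: Count,         // lets us derive average row byte width
}

impl AnalyzeMetrics {
    fn new(metrics: &ExecutionPlanMetricsSet, partition: usize) -> Self {
        Self {
            output_rows: MetricBuilder::new(metrics).output_rows(partition),
            elapsed_compute: MetricBuilder::new(metrics).elapsed_compute(partition),
            time_to_first_record: MetricBuilder::new(metrics)
                .subset_time("time_to_first_record", partition),
            output_bytes: MetricBuilder::new(metrics).counter("output_bytes", partition),
        }
    }
}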

One high-level question:
Does the API seem right? Should analyze_plan be a separate method from explain_plan? In Rust, it seems we have no option to extend explain_plan with an optional "analyze" param, but in Python we can do this. Would that be better? IMO, if we were considering Python in isolation the answer would be yes, but since we care about both Python and Rust it may be better to keep the APIs symmetric.

@wkalt changed the title from "feat: Add support for explain analyze" to "feat: add support for explain analyze" on Feb 27, 2025
@wkalt (Contributor, Author) commented Feb 27, 2025

Cache hit/miss rates would be awesome to incorporate here too. I don't know that I will get to that.

@codecov-commenter commented Feb 27, 2025

Codecov Report

Attention: Patch coverage is 71.63121% with 40 lines in your changes missing coverage. Please review.

Project coverage is 78.45%. Comparing base (9e614b1) to head (484df18).

Files with missing lines Patch % Lines
rust/lance/src/io/exec/fts.rs 18.18% 9 Missing ⚠️
rust/lance/src/io/exec/knn.rs 73.07% 7 Missing ⚠️
rust/lance/src/dataset/scanner.rs 0.00% 6 Missing ⚠️
rust/lance/src/io/exec/scalar_index.rs 70.00% 6 Missing ⚠️
rust/lance/src/io/exec/pushdown_scan.rs 62.50% 3 Missing ⚠️
rust/lance/src/io/exec/rowids.rs 70.00% 3 Missing ⚠️
rust/lance/src/io/exec/scan.rs 90.00% 3 Missing ⚠️
rust/lance/src/io/exec/take.rs 66.66% 3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3484      +/-   ##
==========================================
- Coverage   78.46%   78.45%   -0.02%     
==========================================
  Files         252      252              
  Lines       93800    93917     +117     
  Branches    93800    93917     +117     
==========================================
+ Hits        73604    73679      +75     
- Misses      17201    17245      +44     
+ Partials     2995     2993       -2     
Flag Coverage Δ
unittests 78.45% <71.63%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown.


@westonpace (Contributor) left a comment:

I like the framework and the start.

I like the API, and I actually kind of prefer two separate methods. In my mind they are quite different, because analyze is going to actually execute the query whereas explain does not, so I'm not sure how I feel about distinguishing them with a flag.

Glad to see num_rows. 11s for filter compute is shocking but that does kind of match my gut of what I've been seeing in some contains queries.

Just a few nits about error handling. I'd prefer to make sure we are passing through child errors into parent error messages anytime we don't know exactly what the inner error is.

E.g. if the inner error is "table x did not exist" then it's fine to remap to "grabbing table x from database y did not exist" and drop the inner error. However, when doing a generic mapping to translate from one error type to another we generally have to assume the inner error is the "root cause" and that needs to be included somehow.

These are minor things though so marking approve and I trust your judgement with what you want to do here.

Parameters
----------
verbose : bool, default False
    Use a verbose output format.
Contributor:

I wonder what the difference even is between non-verbose and verbose explain analyze and if it's worth giving users the choice? I know for explain_plan I have always passed verbose=True and have never felt it is too verbose.

Contributor Author:

The outputs look identical to me. Maybe it's some feature we're not using. I'll remove the option.

Comment on lines 2368 to 2375
if let Ok(mut stream) = analyze.execute(0, ctx) {
    while (stream.next().await).is_some() {}
} else {
    return Err(Error::Execution {
        message: "Failed to execute analyze plan".to_string(),
        location: location!(),
    });
}
Contributor:

Why not use map_err and include the error message from analyze.execute?

E.g. `format!("Failed to execute analyze plan: {}", err)`
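Applied to the snippet above, the suggested change might look roughly like this (same identifiers as the quoted code; a sketch only, not the final implementation):

let mut stream = analyze.execute(0, ctx).map_err(|e| Error::Execution {
    message: format!("Failed to execute analyze plan: {}", e),
    location: location!(),
})?;
// Drain the stream so every node's metrics get populated.
while stream.next().await.is_some() {}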

Comment on lines 237 to 239
std::task::Poll::Ready(Some(res)) => std::task::Poll::Ready(Some(
    res.map_err(|e| DataFusionError::External(e.to_string().into())),
)),
Contributor:

Why do we need map_err here? Isn't the error already a DataFusionError?

In fact, do we even need this match statement at all? Can we just do `let poll = this.stream.poll_next(cx);`?
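A sketch of the simplification being suggested, assuming the wrapped stream is a SendableRecordBatchStream (which is Unpin and already yields datafusion::error::Result items), so nothing needs remapping; WrapperStream is a made-up name standing in for the node's stream type:

use std::pin::Pin;
use std::task::{Context, Poll};

use datafusion::arrow::record_batch::RecordBatch;
use datafusion::error::Result as DFResult;
use datafusion::physical_plan::SendableRecordBatchStream;
use futures::{Stream, StreamExt};

struct WrapperStream {
    inner: SendableRecordBatchStream,
}

impl Stream for WrapperStream {
    type Item = DFResult<RecordBatch>;

    fn poll_next(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Option<Self::Item>> {
        // The inner stream already produces DataFusionError, so its items are
        // forwarded as-is with no extra match or map_err.
        self.inner.poll_next_unpin(cx)
    }
}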

@wkalt force-pushed the wkalt/explain-analyze branch from c488bd6 to a1d1abf on February 28, 2025 22:09
@wkalt merged commit 949c6e7 into main on Feb 28, 2025
26 of 30 checks passed
@wkalt deleted the wkalt/explain-analyze branch on February 28, 2025 23:22