Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change from DataSize to long to remove unnecessary data creation #24582

Merged
merged 8 commits into from
Mar 6, 2025

Conversation

shangm2
Copy link
Contributor

@shangm2 shangm2 commented Feb 18, 2025

Description

  1. This is the first step of a series of efforts to make the communication between coordinator and worker slimmer by using simple type instead of DataSize / Duration / DateTime.
  2. Facebook internal test will fail due to this change but the corresponding fix is in facebook repo.

Motivation and Context

  1. From profiling, we see the creation and garbage collection of certain objects can be expensive
Screenshot 2025-02-17 at 16 47 30
  1. These objects are designed to represent DataSize in a human readable way. However they are expensive to create and cause unnecessary allocation.

Impact

Test Plan

  1. passed verifier test: https://www.internalfb.com/intern/presto/verifier/results/?test_id=212504
  2. passed cpp worker verifier: https://our.internmc.facebook.com/intern/presto/verifier/results/?test_id=212506
  3. make sure UI does not break
Screenshot 2025-03-04 at 12 02 54

Contributor checklist

  • Please make sure your submission complies with our contributing guide, in particular code style and commit standards.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.
== RELEASE NOTES ==

General Changes
* Improve scheduling by using long instead of DataSize for critical path.

@shangm2 shangm2 requested a review from a team as a code owner February 18, 2025 00:57
@shangm2 shangm2 requested a review from jaystarshot February 18, 2025 00:57
@prestodb-ci prestodb-ci added the from:Meta PR from Meta label Feb 18, 2025
@shangm2 shangm2 force-pushed the convert_datasize branch 3 times, most recently from 287711e to c35f437 Compare February 18, 2025 06:39
@steveburnett
Copy link
Contributor

Thanks for the release note! Nit suggestion rephrasing to consider, to follow the Order of changes in the Release Notes Guidelines:

== RELEASE NOTES ==

General Changes
* Improve scheduling by using long instead of DataSize for critical path.

@shangm2 shangm2 force-pushed the convert_datasize branch 3 times, most recently from a6c23a2 to 4495d51 Compare February 19, 2025 03:27
@ZacBlanco
Copy link
Contributor

ZacBlanco commented Feb 19, 2025

I have a concern that this will break compatibility with some of the existing tooling around Presto. A good number of JSON blobs get exposed to downstream tools through the query event listener framework. Is there any plan to maintain backwards compatibility or help migrate tools that will break due to this change?

@shangm2
Copy link
Contributor Author

shangm2 commented Feb 19, 2025

I have a concern that this will break compatibility with some of the existing tooling around Presto. A good number of JSON blobs get exposed to downstream tools through the query event listener framework. Is there any plan to maintain backwards compatibility or help migrate tools that will break due to this change?

Hey @ZacBlanco we certainly dont want to break anything. Could you point me to those downstream tools so I can talk to the team and come up with a plan? Thanks a lot.

@ZacBlanco
Copy link
Contributor

At first I was concerned about the event listener, however it seems to have a separate API for the objects that are returned. So that should be good. We do have a few closed-source tools which are used internally to grab information from the coordinator REST API. I think as long as we don't affect the output format of the JSONs at /v1/query/{queryId} and EXPLAIN ANALYZE queries it should be fine as that's what most of our tools use. I see that in BasicQueryStats you've already made sure that the fields serialize to DataSize when converted to JSON. After taking a closer look I think this PR is ok in the current state

@shangm2 shangm2 force-pushed the convert_datasize branch 5 times, most recently from 14e3fa2 to d35269b Compare February 25, 2025 03:33
@shangm2 shangm2 requested a review from a team as a code owner February 25, 2025 03:33
@shangm2 shangm2 force-pushed the convert_datasize branch 5 times, most recently from 31729f8 to 974fbc6 Compare February 25, 2025 05:48
@ZacBlanco
Copy link
Contributor

Sorry I have been preoccupied with some other things. I will try to review again later today or tomorrow

@facebook-github-bot
Copy link
Collaborator

@shangm2 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@shangm2 shangm2 force-pushed the convert_datasize branch 2 times, most recently from 3952248 to a324a2e Compare March 4, 2025 21:12
@shangm2 shangm2 force-pushed the convert_datasize branch from a324a2e to 5968a34 Compare March 4, 2025 21:19
Copy link
Member

@yhwang yhwang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM for the UI part

@facebook-github-bot
Copy link
Collaborator

@shangm2 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Copy link
Contributor

@ZacBlanco ZacBlanco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again for all the updates @shangm2 . If we do end up changing public endpoints (see my comment), I would request that we also make sure to update the existing openapi spec under the presto-openapi module.

I think this change should be a topic to bring up at the upcoming TSC to see how folks feel about it and whether they are OK breaking the REST API compatibility. I will defer to the community to make the final decision, but I would err on the side that we should not break the API. If you can present some compelling performance results at the TSC meeting, it might also help convince the community that this change is worth breaking API compatibility.

How does that sound to you?

this.outputPositions = outputPositions;

this.physicalWrittenDataSize = requireNonNull(physicalWrittenDataSize, "writtenDataSize is null");
checkArgument(physicalWrittenDataSizeInBytes >= 0, "writtenDataSizeInBytes is negative");
Copy link
Contributor

@ZacBlanco ZacBlanco Mar 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it would be worth adding a method in presto-common or similar called checkDataSize with the following implementation:

public long checkDataSize(long size, String name) {
    checkArgument(size >= 0, "%s is negative", name);
    return size;
}

@NikhilCollooru
Copy link
Contributor

Merging this, since we got a go ahead regarding the UI changes.

@NikhilCollooru NikhilCollooru dismissed ZacBlanco’s stale review March 6, 2025 00:13

Shang made the suggested changes and removed changes from spi .

Copy link
Contributor

@amitkdutta amitkdutta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

@NikhilCollooru NikhilCollooru merged commit 6d23641 into prestodb:master Mar 6, 2025
61 of 62 checks passed
@NikhilCollooru
Copy link
Contributor

sorry ! I missed @ZacBlanco latest comment. We will address the comments in the next PR

@NikhilCollooru
Copy link
Contributor

NikhilCollooru commented Mar 6, 2025

I think this change should be a topic to bring up at the upcoming TSC to see how folks feel about it and whether they are OK breaking the REST API compatibility. I will defer to the community to make the final decision, but I would err on the side that we should not break the API. If you can present some compelling performance results at the TSC meeting, it might also help convince the community that this change is worth breaking API compatibility.

We agree, we will make sure to bring this up in the next TSC. Thanks for the review. We will also discuss internally regarding this change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
from:Meta PR from Meta
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants