Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

opentelemetry-sdk: speed up exemplars a bit #4260

Merged
merged 5 commits into from
Dec 2, 2024

Conversation

xrmx
Copy link
Contributor

@xrmx xrmx commented Nov 8, 2024

Description

Use a sparse dict to allocate ExemplarsBucket on demand instead of preallocating all of them in FixedSizeExemplarReservoirABC.

Make the following return around 2X more loops for both trace_based and always_off exemplars filter:

.tox/benchmark-opentelemetry-sdk/bin/pytest opentelemetry-sdk/benchmarks/metrics/ -k 'test_histogram_record_1000[7]'

I get 8% less rounds than with a8aacb0c6f2f06bf19b501d98e62f7c0e667fa4c that is the commit before the introductions of exemplars. Without this we are doing 60% less rounds.

This also adds a new benchmark file that instead of recording random numbers pick the very same bucket bounds entries.

Refs #4243

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration

  • .tox/benchmark-opentelemetry-sdk/bin/pytest opentelemetry-sdk/benchmarks/metrics/ -k 'test_histogram_record_1000[7]'

Does This PR Require a Contrib Repo Change?

  • Yes. - Link to PR:
  • No.

Checklist:

  • Followed the style guidelines of this project
  • Changelogs have been updated
  • Unit tests have been added
  • Documentation has been updated

@xrmx xrmx requested a review from a team as a code owner November 8, 2024 16:27
@xrmx xrmx added the Skip Changelog PRs that do not require a CHANGELOG.md entry label Nov 8, 2024
@aabmass
Copy link
Member

aabmass commented Nov 8, 2024

Does the dict have any downside vs list in the steady state (measuring the same attribute set repeatedly)?

Copy link
Contributor

@fcollonval fcollonval left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @xrmx
This looks good to me.

@fcollonval
Copy link
Contributor

Does the dict have any downside vs list in the steady state (measuring the same attribute set repeatedly)?

Without any proof, I would say accessing a list item by index is faster than a dict value by key. So if there are lots of buckets this may be less optimized; but the number of buckets should never be very high.

@xrmx
Copy link
Contributor Author

xrmx commented Nov 11, 2024

Does the dict have any downside vs list in the steady state (measuring the same attribute set repeatedly)?

Without any proof, I would say accessing a list item by index is faster than a dict value by key. So if there are lots of buckets this may be less optimized; but the number of buckets should never be very high.

With small data sets a list may be faster because it's simpler than the dict but with more items the O(n) list access time would lose against dict O(1) access time.

@aabmass
Copy link
Member

aabmass commented Nov 11, 2024

but the number of buckets should never be very high.

If we don't already have a benchmark for taking a measurement with a pre-existing attribute set, it would be great to add. Maybe with the default buckets and one for an ExponentialHistogram, which defaults to maximum of 160 buckets

@fcollonval
Copy link
Contributor

One important note about the exponential histogram case is that the number of buckets for the exemplar has a lower upper bound for buckets than the histogram:

Base2 Exponential Histogram Aggregation SHOULD use a SimpleFixedSizeExemplarReservoir with a reservoir equal to the smaller of the maximum number of buckets configured on the aggregation or twenty (e.g. min(20, max_buckets)).
Xref: https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/sdk.md#exemplar-defaults

@fcollonval
Copy link
Contributor

the O(n) list access time would lose against dict O(1) access time.

Is it that complexity for search rather than for accessing? If you know the index like in this case, I doubt the complexity of the list is higher.

@xrmx
Copy link
Contributor Author

xrmx commented Nov 12, 2024

the O(n) list access time would lose against dict O(1) access time.

Is it that complexity for search rather than for accessing? If you know the index like in this case, I doubt the complexity of the list is higher.

Got nerdsniped in this and yes the getting a list item looks O(1) too:

PyObject *
PyList_GetItem(PyObject *op, Py_ssize_t i)
{
    if (!PyList_Check(op)) {
        PyErr_BadInternalCall();
        return NULL;
    }
    if (!valid_index(i, Py_SIZE(op))) {
        _Py_DECLARE_STR(list_err, "list index out of range");
        PyErr_SetObject(PyExc_IndexError, &_Py_STR(list_err));
        return NULL;
    }
    return ((PyListObject *)op) -> ob_item[i];
}

So I guess the difference in performance comes just from the lazyness of the ExemplarBucket() instantiations.

@aabmass
Copy link
Member

aabmass commented Nov 14, 2024

If we don't already have a benchmark for taking a measurement with a pre-existing attribute set, it would be great to add.

@xrmx would you be open to adding a benchmark for this to make sure the dict doesn't make this much slower? That seems like the more common case to optimize for vs churning attributes.

@xrmx
Copy link
Contributor Author

xrmx commented Nov 18, 2024

If we don't already have a benchmark for taking a measurement with a pre-existing attribute set, it would be great to add.

@xrmx would you be open to adding a benchmark for this to make sure the dict doesn't make this much slower? That seems like the more common case to optimize for vs churning attributes.

Need to sort out how to write such benchmark 😅

@xrmx
Copy link
Contributor Author

xrmx commented Nov 26, 2024

@aabmass is something like this what you had in mind?

@pytest.mark.parametrize("num_labels", [0, 1, 3, 5, 7])
def test_histogram_record_1000(benchmark, num_labels):
    values = itertools.cycle(range(1, 10))
    def benchmark_histogram_record_1000():
        hist1000.record(next(values))

    benchmark(benchmark_histogram_record_1000)

Before

---------------------------------------------------------------------------------------- benchmark: 5 tests ----------------------------------------------------------------------------------------
Name (time in us)                    Min                Max              Mean            StdDev            Median               IQR             Outliers  OPS (Kops/s)            Rounds  Iterations
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_histogram_record_1000[0]     2.9318 (1.15)     13.4856 (1.32)     3.1635 (1.11)     0.3693 (2.12)     3.1190 (1.11)     0.1350 (1.20)         22;69      316.1018 (0.90)       1022           1
test_histogram_record_1000[1]     2.5900 (1.01)     14.5938 (1.43)     2.8457 (1.00)     0.1850 (1.06)     2.8145 (1.00)     0.1146 (1.02)     3821;2833      351.4110 (1.00)      52373           1
test_histogram_record_1000[3]     2.5583 (1.00)     13.5032 (1.33)     2.8566 (1.01)     0.1873 (1.07)     2.8266 (1.01)     0.1201 (1.07)     3910;2536      350.0655 (0.99)      52902           1
test_histogram_record_1000[5]     2.5555 (1.0)      14.0648 (1.38)     2.8462 (1.00)     0.1965 (1.13)     2.8135 (1.00)     0.1127 (1.0)      3217;2935      351.3447 (1.00)      53338           1
test_histogram_record_1000[7]     2.5723 (1.01)     10.1784 (1.0)      2.8408 (1.0)      0.1742 (1.0)      2.8098 (1.0)      0.1136 (1.01)     4774;3252      352.0196 (1.0)       55710           1

After

---------------------------------------------------------------------------------------- benchmark: 5 tests ----------------------------------------------------------------------------------------
Name (time in us)                    Min                Max              Mean            StdDev            Median               IQR             Outliers  OPS (Kops/s)            Rounds  Iterations
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_histogram_record_1000[0]     2.6487 (1.02)     13.6066 (1.0)      3.0223 (1.06)     0.3527 (1.82)     2.9970 (1.06)     0.2645 (2.56)        100;41      330.8757 (0.95)       2367           1
test_histogram_record_1000[1]     2.6105 (1.00)     14.9934 (1.10)     2.8952 (1.01)     0.2371 (1.22)     2.8554 (1.01)     0.1248 (1.21)     3357;3879      345.4039 (0.99)      65935           1
test_histogram_record_1000[3]     2.6114 (1.00)     17.8963 (1.32)     2.8626 (1.0)      0.1938 (1.0)      2.8303 (1.0)      0.1034 (1.0)      3534;3742      349.3333 (1.0)       53803           1
test_histogram_record_1000[5]     2.6058 (1.00)     24.4612 (1.80)     2.8805 (1.01)     0.2325 (1.20)     2.8433 (1.00)     0.1183 (1.14)     3105;3773      347.1662 (0.99)      55704           1
test_histogram_record_1000[7]     2.5993 (1.0)      18.7624 (1.38)     2.9343 (1.03)     0.2049 (1.06)     2.8927 (1.02)     0.1760 (1.70)     5919;1259      340.7929 (0.98)      59382           1

@aabmass
Copy link
Member

aabmass commented Nov 27, 2024

@aabmass is something like this what you had in mind?

Yes LGTM. If I'm reading those results right, it looks like pretty similar performance in the steady state?

Are you planning to push that new case to this PR? I think it would be good to keep.

@xrmx
Copy link
Contributor Author

xrmx commented Nov 27, 2024

@aabmass is something like this what you had in mind?

Yes LGTM. If I'm reading those results right, it looks like pretty similar performance in the steady state?

That's my understanding as well.

Are you planning to push that new case to this PR? I think it would be good to keep.

Yes, will add new benchmarks functions to the PR.

Copy link
Member

@aabmass aabmass left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving in advance 😃

@xrmx xrmx force-pushed the cheaper-exemplars-reservoir branch from c3cd760 to 8f02f78 Compare November 28, 2024 09:53
xrmx added 2 commits November 28, 2024 11:25
Use a sparse dict to allocate ExemplarsBucket on demand instead of
preallocating all of them in FixedSizeExemplarReservoirABC.

Make the following return around 2X more loops for both trace_based and
always_off exemplars filter:

.tox/benchmark-opentelemetry-sdk/bin/pytest opentelemetry-sdk/benchmarks/metrics/ -k 'test_histogram_record_1000[7]'
@xrmx xrmx force-pushed the cheaper-exemplars-reservoir branch from 8f02f78 to 0070080 Compare November 28, 2024 10:25
Copy link
Member

@emdneto emdneto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Do we need to changelog this?
Noticed that we don't have benchmark tests for exponential histograms, but we can leave for another PR.

@xrmx xrmx removed the Skip Changelog PRs that do not require a CHANGELOG.md entry label Nov 28, 2024
@xrmx xrmx enabled auto-merge (squash) December 2, 2024 08:46
@xrmx xrmx merged commit 72fad82 into open-telemetry:main Dec 2, 2024
354 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants