Introduces Profiling Data Model v2 #239

petethepig · 2023-11-02T05:35:09Z

This is the second version of the Profiling Data Model OTEP. After getting feedback from the greater OTel community, we went back to the drawing board and came up with a new version of the data model. The main difference between the two versions is that the new version is more similar to the original pprof format, which makes it easier to understand and implement. It also has better performance characteristics. We've also incorporated a lot of the feedback we got on the first PR into this OTEP.

Some minor details about the data model are still being discussed and will be fleshed out in future OTEPs. We intend to finalize these details after experimenting with early versions of working client + collector + backend implementations and getting feedback from the community. The goal of this OTEP is to provide a solid foundation for these experiments.

So far we've done a number of things to validate it:

we've written a new profiles proto described in this OTEP
we've documented decisions made along the way in a decision log
we've done benchmarking to refine the data representation (see Benchmarking section in a collector PR)
diff between original pprof and the new proto: link

We're seeking feedback and hoping to get this approved.

For (a lot) more details, see:

OTel Profiling SIG Meeting Notes

mtwo · 2023-11-20T17:20:14Z

@carlosalberto @jack-berg can we mark this as triaged with priority p0?

tigrannajaryan · 2023-11-20T19:00:06Z

The main difference between the two versions is that the new version is more similar to the original pprof format

@petethepig More similar, but not fully backwards compatible with pprof?

(I haven't had a chance to review the OTEP yet, just wanted to understand what to expect before I started reviewing.)

[UPDATE]: I have now reviewed the PR, which mostly answers this question. I still have a follow-up comment/question on pprof compatibility, see below.

tigrannajaryan

How much difference does AnyValue/KeyValue interning make? I would prefer not to have a different type just for profiles. If it is a significant difference we may want to consider adding interning support to existing AnyValue/KeyValue.

text/profiles/0239-profiles-data-model.md

aalexand · 2023-11-28T05:39:29Z

How much difference does AnyValue/KeyValue interning make? I would prefer not to have a different type just for profiles. If it is a significant difference we may want to consider adding interning support to existing AnyValue/KeyValue.

One thing is interning, another thing that pprof labels have is unit support. E.g. int64 bytes can be distinguished from int64 microseconds or count of something. This is used by pprof for using appropriate unit suffixes and scaling the label values appropriately.
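
To make the unit point concrete, here is a minimal sketch of a numeric label that carries a unit, plus a formatter that picks a suffix and scales the value based on that unit. The `NumLabel` class, the `format_label` helper, and the scaling tables are hypothetical names chosen for illustration; this is not pprof's actual formatting code, only an approximation of the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NumLabel:
    """Hypothetical numeric label: an int64 value plus the unit it is measured in."""
    key: str
    value: int
    unit: str  # e.g. "bytes", "microseconds", "count"


# Illustrative scaling tables (largest step first); not taken from pprof.
_BYTE_STEPS = [("GiB", 1024 ** 3), ("MiB", 1024 ** 2), ("KiB", 1024)]
_TIME_STEPS = [("s", 1_000_000), ("ms", 1_000)]  # relative to microseconds


def format_label(label: NumLabel) -> str:
    """Render a label with a suffix and scale chosen from its unit."""
    if label.unit == "bytes":
        steps, base = _BYTE_STEPS, "B"
    elif label.unit == "microseconds":
        steps, base = _TIME_STEPS, "us"
    else:
        # Unknown unit: print the raw value and the unit name unchanged.
        return f"{label.key}={label.value} {label.unit}"
    for suffix, factor in steps:
        if label.value >= factor:
            return f"{label.key}={label.value / factor:.2f}{suffix}"
    return f"{label.key}={label.value}{base}"


# The same int64 value reads very differently depending on the unit.
print(format_label(NumLabel("alloc_size", 2048, "bytes")))       # alloc_size=2.00KiB
print(format_label(NumLabel("latency", 2048, "microseconds")))   # latency=2.05ms
print(format_label(NumLabel("retries", 2048, "count")))          # retries=2048 count
```

Without the `unit` field all three labels would be indistinguishable `int64` values, which is the information a plain `KeyValue` attribute cannot express today.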

petethepig · 2023-11-30T02:23:27Z

@tigrannajaryan

How much difference does AnyValue/KeyValue interning make? I would prefer not to have a different type just for profiles. If it is a significant difference we may want to consider adding interning support to existing AnyValue/KeyValue.

We did some benchmarking for that. Here are the results (it's also in this spreadsheet, "Attribute Represenations Summary" sheet):

And here's the difference between the 3 implementations:

// Extended
message Sample {
  repeated opentelemetry.proto.common.v1.KeyValue attributes = 9;
}


// ExtendedInterned
message Sample {
  repeated opentelemetry.proto.common.v1.KeyValueInterned attributes = 6;
}


// ExtendedLookup
message Profile {
  // lookup table
  repeated opentelemetry.proto.common.v1.KeyValue attributes = 9;
}

message Sample {
  repeated uint64 attributes = 10;
}

To be clear, in the OTEP we're using the ExtendedLookup representation. It does not do the same kind of interning that the original pprof does (that would be ExtendedInterned).

ExtendedInterned uses a string_table for string values and ExtendedLookup uses an attributes lookup table. Extended uses a schema that's more common for other OTEL signals where the attributes are embedded inside other messages (Sample in our case).
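
The difference between the two deduplicating representations can be sketched in a few lines. This is an illustration only: attributes are simplified to string pairs (real `KeyValue` values can be any `AnyValue`), and the helper names `build_lookup` and `intern_strings` as well as the sample data are made up for this example rather than taken from the benchmark code.

```python
from typing import Dict, List, Tuple

Attribute = Tuple[str, str]  # simplified (key, value) pair


def build_lookup(samples: List[List[Attribute]]) -> Tuple[List[Attribute], List[List[int]]]:
    """ExtendedLookup-style: one profile-level attribute table, samples hold indices into it."""
    table: List[Attribute] = []
    index_of: Dict[Attribute, int] = {}
    encoded: List[List[int]] = []
    for attrs in samples:
        indices = []
        for attr in attrs:
            if attr not in index_of:
                index_of[attr] = len(table)
                table.append(attr)
            indices.append(index_of[attr])
        encoded.append(indices)
    return table, encoded


def intern_strings(samples: List[List[Attribute]]) -> Tuple[List[str], List[List[Tuple[int, int]]]]:
    """ExtendedInterned-style: a string table, samples hold (key index, value index) pairs."""
    strings: List[str] = []
    index_of: Dict[str, int] = {}

    def intern(s: str) -> int:
        if s not in index_of:
            index_of[s] = len(strings)
            strings.append(s)
        return index_of[s]

    encoded = [[(intern(k), intern(v)) for k, v in attrs] for attrs in samples]
    return strings, encoded


samples = [
    [("thread.name", "main"), ("pod", "api-1")],
    [("thread.name", "main"), ("pod", "api-1")],
    [("thread.name", "worker"), ("pod", "api-1")],
]

table, by_index = build_lookup(samples)
print(table)     # [('thread.name', 'main'), ('pod', 'api-1'), ('thread.name', 'worker')]
print(by_index)  # [[0, 1], [0, 1], [2, 1]]

strings, by_string = intern_strings(samples)
print(strings)    # ['thread.name', 'main', 'pod', 'api-1', 'worker']
print(by_string)  # [[(0, 1), (2, 3)], [(0, 1), (2, 3)], [(0, 4), (2, 3)]]
```

In the lookup form each sample attribute costs a single integer, while the interned form still needs a key/value pair of indices per attribute. The Extended form would simply repeat the full pairs inside every sample.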

text/profiles/0239-profiles-data-model.md

aalexand · 2024-01-11T15:30:20Z

How much difference does AnyValue/KeyValue interning make? I would prefer not to have a different type just for profiles. If it is a significant difference we may want to consider adding interning support to existing AnyValue/KeyValue.

One thing is interning, another thing that pprof labels have is unit support. E.g. int64 bytes can be distinguished from int64 microseconds or count of something. This is used by pprof for using appropriate unit suffixes and scaling the label values appropriately.

We still need units for labels I think.

tigrannajaryan · 2024-01-11T20:03:35Z

If label/attribute units are critical I suggest that we start this discussion early.

I looked at other signal types and there are a few attribute conventions that would benefit from units (but not many):

Span attributes (all bytes):
message.compressed_size
message.uncompressed_size
messaging.message.body.size
messaging.message.envelope.size
faas.max_memory

Not many metric attributes can have units; I was only able to find one (WAh):
hw.battery.capacity

Similarly just one resource attribute with unit (bytes):
host.cpu.cache.l2.size

There may be others that I missed.

You can file an issue in https://github.com/open-telemetry/opentelemetry-proto to begin the discussion.

This is a follow-up to [OTEP 239: Introduces Profiling Data Model v2](open-telemetry/oteps#239). The main motivation behind this PR is that it will allow us to start experimenting with the profiles proto in opentelemetry-collector. I marked the profiles part as `Experimental` to indicate that this is not a final version of the data model. I copied the proto from the OTEP and moved `pprofextended.proto` from `profiles/v1/alternatives/pprofextended.proto` to just `profiles/v1/pprofextended.proto`. I did this because we no longer have alternative representations, and this will reduce confusion for people outside of the Profiling SIG. The rest of the proto stayed the same. I tested this file with a collector fork and it compiles properly.

For `abc;def` the `locations_start_index` should be `4` as `2` points to `baz`. Follow up of #239 (comment)
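
A small sketch of why the index matters, assuming stacks are stored back to back in one flat array and each sample refers to its stack by a start index plus a length. Only `baz`, `abc;def`, and the values `2` and `4` come from the comment above; the other frames (`foo`, `bar`, `qux`), the `pack_stacks` helper, and the use of frame names in place of integer location indices are assumptions made for readability.

```python
from typing import List, Tuple


def pack_stacks(stacks: List[str]) -> Tuple[List[str], List[Tuple[int, int]]]:
    """Append every stack to one flat array and record (start_index, length) per stack."""
    flat: List[str] = []
    slices: List[Tuple[int, int]] = []
    for stack in stacks:
        frames = stack.split(";")
        slices.append((len(flat), len(frames)))  # start is the current end of the flat array
        flat.extend(frames)
    return flat, slices


# Hypothetical layout in which index 2 holds `baz` and `abc;def` begins at index 4.
flat, slices = pack_stacks(["foo;bar;baz", "qux", "abc;def"])
print(flat)    # ['foo', 'bar', 'baz', 'qux', 'abc', 'def']
print(slices)  # [(0, 3), (3, 1), (4, 2)]

start, length = slices[2]
print(flat[start:start + length])  # ['abc', 'def']
print(flat[2:2 + length])          # ['baz', 'qux'], what a start index of 2 would select
```

A start index that is off by even one position silently selects frames that belong to a different stack, which is why the example in the text needed the correction.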


This is the second version of the Profiling Data Model OTEP. After [we've gotten feedback from the greater OTel community](open-telemetry#237) we went back to the drawing board and came up with a new version of the data model. The main difference between the two versions is that the new version is more similar to the original pprof format, which makes it easier to understand and implement. It also has better performance characteristics. We've also incorporated a lot of the feedback we've gotten on the first PR into this OTEP.

Some minor details about the data model are still being discussed and will be fleshed out in future OTEPs. We intend to finalize these details after doing experiments with early versions of working client + collector + backend implementations and getting feedback from the community. The goal of this OTEP is to provide a solid foundation for these experiments.

So far we've done a number of things to validate it:

* we've written a new profiles proto described in this OTEP
* we've documented decisions made along the way in a [decision log](https://github.com/open-telemetry/opentelemetry-proto-profile/blob/main/opentelemetry/proto/profiles/v1/decision-log.md)
* we've done benchmarking to refine the data representation (see Benchmarking section in a [collector PR](petethepig/opentelemetry-collector#1))
* diff between original pprof and the new proto: [link](open-telemetry/opentelemetry-proto-profile@2cf711b...petethepig:opentelemetry-proto:pprof-experiments#diff-9cb689ea05ecfd2edffc39869eca3282a3f2f45a8e1aa21624b452fa5362d1d2)

We're seeking feedback and hoping to get this approved.

---

For (a lot) more details, see:

* [OTel Profiling SIG Meeting Notes](https://docs.google.com/document/d/19UqPPPlGE83N37MhS93uRlxsP1_wGxQ33Qv6CDHaEp0/edit)

---------

Co-authored-by: Juraci Paixão Kröhling <juraci.github@kroehling.de>
Co-authored-by: Christos Kalkanis <christos.kalkanis@elastic.co>
Co-authored-by: Felix Geisendörfer <felix@felixge.de>
Co-authored-by: Reiley Yang <reyang@microsoft.com>