[receiver/vcenter] Repeated timeseries values fixed #23669

BominRahmani · 2023-06-25T23:39:09Z

Description:
Multiple data points of a metric with identical attribute values (that is, within the same time series) were being emitted with the same timestamps. This PR adds a new resource attribute so that the metrics previously emitted with repeated timestamps are represented more accurately.

djaglowski · 2023-06-26T13:28:49Z

receiver/vcenterreceiver/metrics.go

+				// Increment values of timestamps when finished with all nested values
+				if j == len(val.Value)-1 {
+					for ind, metricMapVal := range metricMap[val.Name] {
+						incrementedTime := time.Duration(metricMapVal.Interval) * time.Duration(j+1) * time.Nanosecond


Can you help me understand what this is doing?

Is vcenter returning multiple data points with the same timestamp and no other differentiating information?

Yes. processHostPerformance() goes through a map with a Value array.

The issue is that several Values with the same name each go through and report their data point values. There seem to be 5 sample timestamps, which are supposed to map 1-1 onto the entries of the inner value array, but because the same name is reported multiple times, the timestamps get reused. The code you mentioned above is just me keeping track of those timestamps and incrementing them, so that they are not reused when they are encountered again. The differentiating factor between these values seems to be the Instance, but the recording of the data doesn't take the difference in Instance into account.

That seems to be the problem. If I'm understanding correctly, we need to differentiate these data points with a resource attribute. In other words, we need to emit multiple resources, each with one of these data points.
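To make the collision concrete, here is a minimal Go sketch. The `sample` and `pointKey` types, the metric name, and the instance names are hypothetical stand-ins for illustration, not the receiver's actual types. It only shows that points sharing a name and timestamp are indistinguishable until the Instance becomes part of their identity.

```go
package main

import "fmt"

// sample is a hypothetical, simplified stand-in for one reported value.
type sample struct {
	Name      string
	Instance  string
	Timestamp int64
	Value     int64
}

// pointKey is the identity used to decide whether two points are "the same".
type pointKey struct {
	Name      string
	Instance  string
	Timestamp int64
}

// countCollisions counts samples whose identity was already seen.
// When withInstance is false, Instance is left out of the identity.
func countCollisions(samples []sample, withInstance bool) int {
	seen := make(map[pointKey]bool)
	collisions := 0
	for _, s := range samples {
		key := pointKey{Name: s.Name, Timestamp: s.Timestamp}
		if withInstance {
			key.Instance = s.Instance
		}
		if seen[key] {
			collisions++
		}
		seen[key] = true
	}
	return collisions
}

func main() {
	// Two hypothetical disks reporting the same metric at the same timestamps.
	samples := []sample{
		{"disk.read.average", "disk-a", 100, 7},
		{"disk.read.average", "disk-a", 120, 9},
		{"disk.read.average", "disk-b", 100, 3},
		{"disk.read.average", "disk-b", 120, 4},
	}
	fmt.Println("collisions without instance:", countCollisions(samples, false))
	fmt.Println("collisions with instance:", countCollisions(samples, true))
}
```

Carrying the Instance as a resource attribute (one resource per instance) is one way to put it into the identity; the sketch does not depend on which attribute level is chosen.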

@schmikei, WDYT?

Sorry for the delay, was on vacation!

It feels like this is best solved by adding a vcenter.host.disk.instance attribute or something similar. What's being reported from the PerformanceManager API for this particular host seems to cover 2 drives, and that is the crux of the case you're showing right now. That seems like the approach that makes the most sense, @BominRahmani; messing with the timestamps seems like the incorrect approach.

djaglowski · 2023-07-25T19:19:41Z

receiver/vcenterreceiver/metrics.go

+			if idx >= 1 && val.Instance != m.Value[idx-1].Instance {
+				v.mb.EmitForResource(metadata.WithVcenterClusterName(clusterName),
+					metadata.WithVcenterHostName(hostname),
+					metadata.WithVcenterVMID(vmUUID),
+					metadata.WithVcenterVMName(vmMain.Name()),
+					metadata.WithVcenterSystemDeviceID(m.Value[idx-1].Instance))
+			}


What is going on here? What are we checking for in the condition and why are we emitting a resource when we find it?

dehaansa · 2023-08-03T13:48:43Z

It looks like the new logic is saying "If we just finished all of the metrics for one Instance, emit those metrics," but I think it's going to emit the metrics for the first item of each instance with the metrics for the previous instance.

Rather than just sorting on line 219, we should create a new map[string][]object and use it to group by instance, then iterate over the map and emit after parsing each grouped list.
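A rough sketch of that grouping, assuming a simplified `series` type in place of the real performance API objects. The names `groupByInstance` and `sortedInstances` are hypothetical, and the print statements stand in for the metrics builder's record and emit calls.

```go
package main

import (
	"fmt"
	"sort"
)

// series is a hypothetical stand-in for one entry of the Value array.
type series struct {
	Name     string
	Instance string
	Values   []int64
}

// groupByInstance buckets every series under its Instance.
func groupByInstance(all []series) map[string][]series {
	groups := make(map[string][]series)
	for _, s := range all {
		groups[s.Instance] = append(groups[s.Instance], s)
	}
	return groups
}

// sortedInstances returns the group keys in sorted order, because Go map
// iteration order is random and emission order should be deterministic.
func sortedInstances(groups map[string][]series) []string {
	keys := make([]string, 0, len(groups))
	for k := range groups {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

func main() {
	all := []series{
		{"disk.read.average", "disk-a", []int64{7, 9}},
		{"disk.read.average", "disk-b", []int64{3, 4}},
		{"disk.write.average", "disk-a", []int64{1, 2}},
		{"disk.write.average", "disk-b", []int64{5, 6}},
	}
	groups := groupByInstance(all)
	for _, inst := range sortedInstances(groups) {
		for _, s := range groups[inst] {
			// Stand-in for recording the data points of one series.
			fmt.Printf("record %s %v\n", s.Name, s.Values)
		}
		// Stand-in for emitting one resource per instance, after all of
		// that instance's series have been recorded.
		fmt.Printf("emit resource for instance %q\n", inst)
	}
}
```

Because the emit happens only after a whole group has been recorded, no instance's first series can end up in the previous instance's resource, which is the failure mode of the boundary check on adjacent indexes.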

schmikei · 2023-08-03T14:12:58Z

Yeah, just chiming in to say that the approach @dehaansa suggests would make what you're trying to do here more readable, @BominRahmani. I would also want to validate more carefully what content the Instance variable holds. I know that, at least for the VM network metrics, the instance can actually be the VM name itself, which is an automatic aggregation of the rest of the network interfaces...

All that is to say that we should do our due diligence: make sure the content of this change is validated and that expected.yaml is reviewed closely, so we know which of the performance metrics are going to be affected.

schmikei · 2023-08-03T14:15:41Z

receiver/vcenterreceiver/metadata.yaml

@@ -31,6 +31,10 @@ resource_attributes:
    description: The instance UUID of the virtual machine.
    enabled: true
    type: string
+  vcenter.system.device.id:
+    description: The unique identifier of the specific hardware or virtual component being utilized in the vCenter environment.
+    enabled: true


Suggested change:

-    enabled: true
+    enabled: false

@djaglowski I know this receiver is still in alpha state, but what are your thoughts on this RA being enabled by default?

I have a slight concern about the cardinality increase for current users, even though information was being dropped previously.

I'm mostly concerned with whether or not data points are uniquely identifiable. If we disable the attribute, then we can't tell the data points apart, so I think it has to correlate with disabling the data points as well.

That said, would you feel better about it if we switch back to having device id as a data point attribute? I've recently been convinced that this is allowable based on #23565 (comment), specifically the suggestion that "We typically put multiple entities in a Resource."

I think the first fix for this case might just be to find and report only the datapoint that refers to the correct instance of a host or VM. I noticed that, at least for most of these metrics, there is a datapoint that correlates correctly to the host system or the VM. So rather than recording every datapoint, we find the one that corresponds to the correct resource and record only that datapoint for the current metrics.

I think further work beyond that would be to effectively record Network Interfaces, Disks, and other fields of the Instance fields on the sampleInfo.

Like for this example with "net.bytesRx.average", we can see that there is an already aggregated value for the instance of the VM along with the corresponding Network Interfaces.

So there is more hierarchy here
| VM
| -- vmnic-0
| -- vmnic-1
| -- vmnic-2
and we have a metric vcenter.vm.network.throughput

So that is an interesting peculiarity of this case. I believe the "correct" approach is to correlate only that datapoint; we can then add more metrics later for the different network interfaces and other instances, and create identifying resource attributes on those.
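As a sketch of "correlate that datapoint only", assuming (per the earlier comment) that the aggregated series carries the VM name as its Instance. The `series` type, the `entitySeries` helper, and the VM name `my-vm` are hypothetical, and whether the aggregate can show up under some other Instance value would need to be validated against real data.

```go
package main

import "fmt"

// series is a hypothetical stand-in for one entry of the Value array.
type series struct {
	Name     string
	Instance string
	Values   []int64
}

// entitySeries keeps only the series whose Instance equals the entity name,
// that is, the already aggregated value for the VM or host itself.
// ASSUMPTION: the aggregate is identified by Instance == entityName.
func entitySeries(all []series, entityName string) []series {
	var out []series
	for _, s := range all {
		if s.Instance == entityName {
			out = append(out, s)
		}
	}
	return out
}

func main() {
	all := []series{
		{"net.bytesRx.average", "my-vm", []int64{60, 90}},
		{"net.bytesRx.average", "vmnic-0", []int64{10, 20}},
		{"net.bytesRx.average", "vmnic-1", []int64{20, 30}},
		{"net.bytesRx.average", "vmnic-2", []int64{30, 40}},
	}
	for _, s := range entitySeries(all, "my-vm") {
		// Only the VM-level series would feed vcenter.vm.network.throughput.
		fmt.Printf("record %s for %s: %v\n", s.Name, s.Instance, s.Values)
	}
}
```

The per-interface series are dropped here rather than recorded, which matches the idea of adding separate metrics for them later.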

Those are just my opinions though and you probably know a tad bit more about the correctness of the data model so I'll defer to you, but I just wanted to write out my thoughts on what seemed like the correct and least destructive approach.

Let me know if those ramblings make sense or where we disagree on the approach. I could imagine an alternative approach that we move away from a metric named vcenter.vm.network.throughput in favor of vcenter.vm.network.interface.throughput, but I'm not sure if that would warrant the removal of vcenter.vm.network.throughput.

If you're confident that we can isolate only one data point which correctly represents the entity, then let's do that.

github-actions bot added the receiver/vcenter label Jun 25, 2023

github-actions bot requested review from djaglowski and schmikei June 25, 2023 23:39

djaglowski reviewed Jun 26, 2023

BominRahmani requested a review from djaglowski June 27, 2023 13:27

BominRahmani force-pushed the bugfix/vcenter-repeated-timeseries branch from 941844e to 2fc1a85 Compare July 10, 2023 17:34

djaglowski reviewed Jul 25, 2023

dehaansa suggested changes Aug 3, 2023

schmikei suggested changes Aug 3, 2023

BominRahmani added 5 commits August 4, 2023 15:39

[receiver/vcenter] fixed timestamps + recreated golden file

d62ec67

added changelog

63af00d

Add attributes for network instance and disk instance

53b6bdd

Add new resource attribute

6d82401

pre-rebase

2d0ab43

BominRahmani force-pushed the bugfix/vcenter-repeated-timeseries branch 3 times, most recently from dc9ce9f to 81e788b Compare August 4, 2023 20:39

only emitted when name matches instance

fa8e3b4

BominRahmani force-pushed the bugfix/vcenter-repeated-timeseries branch from 81e788b to fa8e3b4 Compare August 7, 2023 03:33

schmikei mentioned this pull request Aug 11, 2023

[receiver/vcenter] Metric attributes for host and vm performance metrics #25149

Merged

BominRahmani closed this Aug 11, 2023
