-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[receiver/vcenter] Repeated timeseries values fixed #23669
[receiver/vcenter] Repeated timeseries values fixed #23669
Conversation
receiver/vcenterreceiver/metrics.go
Outdated
// Increment values of timestamps when finished with all nested values | ||
if j == len(val.Value)-1 { | ||
for ind, metricMapVal := range metricMap[val.Name] { | ||
incrementedTime := time.Duration(metricMapVal.Interval) * time.Duration(j+1) * time.Nanosecond |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you help me understand what this is doing?
Is vcenter returning multiple data points with the same timestamp and no other differentiating information?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. processHostPerformance() goes through a map with a Value array (image 1)
The issue is that several Values of the same name
will go through and report their datapoint values. There are 5 example timestamps that are supposed to go 1-1 with the length of the inner value it seems, but since they get reported multiple times, the timestamps get recycled. the above code you mentioned is just me keeping track and incrementing those timestamps so when they get encountered, its not being recycled. The differentiating factor between these values seems to be the Instance
but the recording of the data doesn't take the difference of the Instance into account.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but the recording of the data doesn't take the difference of the Instance into account.
That seems to be the problem. If I'm understanding correctly, we need to differentiate these data points with a resource attribute. In other words, we need to emit multiple resources, each with one of these data points.
@schmikei, WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delay, was on vacation!
It feels like this is best solved by adding a vcenter.host.disk.instance
attribute or something. What's being reported from the PerformanceManager API for this particular host seems to have 2 drives and that is the crux of the case you're showing right now. Seems like that is the approach I think makes the most sense @BominRahmani messing with the timestamps seems like the incorrect approach.
941844e
to
2fc1a85
Compare
receiver/vcenterreceiver/metrics.go
Outdated
if idx >= 1 && val.Instance != m.Value[idx-1].Instance { | ||
v.mb.EmitForResource(metadata.WithVcenterClusterName(clusterName), | ||
metadata.WithVcenterHostName(hostname), | ||
metadata.WithVcenterVMID(vmUUID), | ||
metadata.WithVcenterVMName(vmMain.Name()), | ||
metadata.WithVcenterSystemDeviceID(m.Value[idx-1].Instance)) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is going on here? What are we checking for in the condition and why are we emitting a resource when we find it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like the new logic is saying "If we just finished all of the metrics for one Instance, emit those metrics," but I think it's going to emit the metrics for the first item of each instance with the metrics for the previous instance.
Instead of just sorting on line 219, we should instead create a new map[string][]object
& use it to group by instance, then iterate over the map and emit after parsing each grouped list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah just chiming in and saying that the approach @dehaansa suggests would make what you're trying to do here more readable @BominRahmani. I also would probably want to validate more on if there is a content in the Instance variable. I know at least particularly for the VM network metrics that the instance can actually be the VM name itself which is an auto aggregation of the rest of the network interfaces...
All that to say is that we should do our due diligence and make sure content on this change is validated and the expected.yaml
is strictly looked at to ensure we know which of the performance metrics are going to be affected.
receiver/vcenterreceiver/metrics.go
Outdated
if idx >= 1 && val.Instance != m.Value[idx-1].Instance { | ||
v.mb.EmitForResource(metadata.WithVcenterClusterName(clusterName), | ||
metadata.WithVcenterHostName(hostname), | ||
metadata.WithVcenterVMID(vmUUID), | ||
metadata.WithVcenterVMName(vmMain.Name()), | ||
metadata.WithVcenterSystemDeviceID(m.Value[idx-1].Instance)) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like the new logic is saying "If we just finished all of the metrics for one Instance, emit those metrics," but I think it's going to emit the metrics for the first item of each instance with the metrics for the previous instance.
Instead of just sorting on line 219, we should instead create a new map[string][]object
& use it to group by instance, then iterate over the map and emit after parsing each grouped list.
receiver/vcenterreceiver/metrics.go
Outdated
if idx >= 1 && val.Instance != m.Value[idx-1].Instance { | ||
v.mb.EmitForResource(metadata.WithVcenterClusterName(clusterName), | ||
metadata.WithVcenterHostName(hostname), | ||
metadata.WithVcenterVMID(vmUUID), | ||
metadata.WithVcenterVMName(vmMain.Name()), | ||
metadata.WithVcenterSystemDeviceID(m.Value[idx-1].Instance)) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah just chiming in and saying that the approach @dehaansa suggests would make what you're trying to do here more readable @BominRahmani. I also would probably want to validate more on if there is a content in the Instance variable. I know at least particularly for the VM network metrics that the instance can actually be the VM name itself which is an auto aggregation of the rest of the network interfaces...
All that to say is that we should do our due diligence and make sure content on this change is validated and the expected.yaml
is strictly looked at to ensure we know which of the performance metrics are going to be affected.
@@ -31,6 +31,10 @@ resource_attributes: | |||
description: The instance UUID of the virtual machine. | |||
enabled: true | |||
type: string | |||
vcenter.system.device.id: | |||
description: The unique identifier of the specific hardware or virtual component being utilized in the vCenter environment. | |||
enabled: true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
enabled: true | |
enabled: false |
@djaglowski I know this receiver is still in alpha state, but what's your thoughts on this RA being enabled by default?
I have a slight concern about cardinality increase for current users even if it dropped information previously
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm mostly concerned with whether or not data points are uniquely identifiable. If we disable the attribute, then we can't tell the data points apart, so I think it has to correlate with disabling the data points as well.
That said, would you feel better about it if we switch back to having device id as a data point attribute? I've recently been convinced that this is allowable based on #23565 (comment), specifically the suggestion that "We typically put multiple entities in a Resource."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'm of the opinion that the first fix for this case might just be to find and only report the datapoint referring to the correct instance of a host or VM. I caught that at least for most of these metrics that there exists a datapoint that correlates correctly to the host system and the VM. So rather than recording on every datapoint, we just find the corresponding correct resource and record only that datapoint for the current metrics.
I think further work beyond that would be to effectively record Network Interfaces, Disks, and other fields of the Instance
fields on the sampleInfo.
Like for this example with "net.bytesRx.average", we can see that there is an already aggregated value for the instance of the VM along with the corresponding Network Interfaces.
So there is more hierarchy here
| VM
| -- vmnic-0
| -- vmnic-1
| -- vmnic-2
and we have a metric vcenter.vm.network.throughput
So that is a peculiarity that is interesting in this case. I believe that the "correct" is to correlate that datapoint only and then we can add more metrics later for the different network interfaces and other instances and create identifying resource attributes on those.
Those are just my opinions though and you probably know a tad bit more about the correctness of the data model so I'll defer to you, but I just wanted to write out my thoughts on what seemed like the correct and least destructive approach.
Let me know if those ramblings make sense or where we disagree on the approach. I could imagine an alternative approach that we move away from a metric named vcenter.vm.network.throughput
in favor of vcenter.vm.network.interface.throughput
, but I'm not sure if that would warrant the removal of vcenter.vm.network.throughput
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you're confident that we can isolate only one data point which correctly represents the entity, then let's do that.
dc9ce9f
to
81e788b
Compare
81e788b
to
fa8e3b4
Compare
Description:
There was an issue where multiple same attribute values of a metric (time-series) would have the same timestamps.This PR adds a new resource attribute that allows the metrics that are being emitted with the repeating timestamps in the time-series to be more accurately represented.