[transform] Implementation of transform processing #8252

Closed
3 of 4 tasks
anuraaga opened this issue Mar 3, 2022 · 18 comments · Fixed by #10367
Labels
processor/metricstransform Metrics Transform processor processor/spanmetrics Span Metrics processor

Comments

@anuraaga
Contributor

anuraaga commented Mar 3, 2022

The transform processor currently supports transformation of traces. This issue tracks the next steps:

  • Make function invocation logic generic for use across signals
  • Add metrics data model
  • Add logs data model
  • Add an initial set of transformation functions on top of initial set and keep_keys
@pureklkl
Contributor

pureklkl commented Mar 8, 2022

I just tried this processor. Is there a way to check whether an attribute exists, for example where attr["name"] != nil?
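
For context, an existence check has to distinguish an attribute that is absent from one that is present with an empty value. A minimal Go sketch of that distinction, assuming a plain map in place of the collector's attribute type (the hasAttribute helper is hypothetical and not part of the processor):

```go
package main

import "fmt"

// hasAttribute reports whether key is present in attrs, regardless of its value.
// Hypothetical helper: a plain map stands in for the collector's attribute map.
func hasAttribute(attrs map[string]interface{}, key string) bool {
	_, ok := attrs[key]
	return ok
}

func main() {
	attrs := map[string]interface{}{"name": "", "http.method": "GET"}
	fmt.Println(hasAttribute(attrs, "name"))    // key is present, although its value is empty
	fmt.Println(hasAttribute(attrs, "missing")) // key is absent
}
```

Comparing the looked-up value against an empty string would treat both cases the same, which is why a dedicated nil literal or existence check is useful in conditions.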

@aunshc
Contributor

aunshc commented Mar 10, 2022

@anuraaga does the transform processor support using wildcards for operations with attributes through queries?
For example, set(attributes["http.*"], "/foo")

@anuraaga
Contributor Author

@pureklkl Thanks for the suggestion! Supporting nil as a literal did come up. We didn't add it in the first version, but we will try to get it in soon as it does seem important.

@aunshc Wildcards in path expressions are not currently supported. Could you describe your use case for wildcards? Is it to remove attributes for entire namespaces?

@aunshc
Contributor

aunshc commented Mar 18, 2022

@anuraaga The use case is to perform actions like the insert, update, and delete listed for the attributesprocessor https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/attributesprocessor on all attributes within a namespace, selected with wildcards (e.g. db.*, http.*). Would that be in the scope of this processor?
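
To make the namespace use case concrete, here is a minimal Go sketch of deleting every attribute under a pattern such as http.*. It assumes a plain map for the attributes and treats only a trailing * as a wildcard; matchesNamespace and deleteNamespace are hypothetical names, not functions of the transform processor or the attributesprocessor.

```go
package main

import (
	"fmt"
	"strings"
)

// matchesNamespace reports whether key falls under a pattern such as "http.*".
// Only a trailing "*" is treated as a wildcard in this sketch; any other
// pattern is compared literally.
func matchesNamespace(pattern, key string) bool {
	if strings.HasSuffix(pattern, "*") {
		return strings.HasPrefix(key, strings.TrimSuffix(pattern, "*"))
	}
	return pattern == key
}

// deleteNamespace removes every attribute whose key matches pattern and
// returns how many attributes were removed.
func deleteNamespace(attrs map[string]string, pattern string) int {
	removed := 0
	for key := range attrs {
		if matchesNamespace(pattern, key) {
			delete(attrs, key) // deleting entries while ranging over a map is safe in Go
			removed++
		}
	}
	return removed
}

func main() {
	attrs := map[string]string{
		"http.method": "GET",
		"http.target": "/foo",
		"db.system":   "mysql",
	}
	removed := deleteNamespace(attrs, "http.*")
	fmt.Println(removed, attrs)
}
```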

@pureklkl
Contributor

I need the following features on the metrics side, and I think they are already supported for spans (feel free to correct me):

  • Promote a metric label to a resource
  • Rename a promoted metric label
  • Replace a value for a resource attribute based on a condition
  • Keep selected metric labels only (i.e. only preserve 'foo' and 'bar' labels)

What is the timeline for adding support on the metrics side? If possible, I would also like to contribute to help accelerate the progress.
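
Two of the operations listed above, keeping only selected labels and promoting a label to the resource under a new name, can be sketched in Go as follows. This is an illustration under stated assumptions: plain maps stand in for data point labels and resource attributes, and keepKeys and promote are hypothetical helpers rather than the processor's functions.

```go
package main

import "fmt"

// keepKeys removes every label whose key is not listed in keys.
func keepKeys(labels map[string]string, keys ...string) {
	keep := make(map[string]struct{}, len(keys))
	for _, k := range keys {
		keep[k] = struct{}{}
	}
	for k := range labels {
		if _, ok := keep[k]; !ok {
			delete(labels, k)
		}
	}
}

// promote moves the label named from off the data point and onto the resource
// under the name to. It reports whether the label was present.
func promote(labels, resource map[string]string, from, to string) bool {
	v, ok := labels[from]
	if !ok {
		return false
	}
	resource[to] = v
	delete(labels, from)
	return true
}

func main() {
	labels := map[string]string{"foo": "1", "bar": "2", "baz": "3", "pod": "web-0"}
	resource := map[string]string{}

	promote(labels, resource, "pod", "k8s.pod.name") // promote and rename in one step
	keepKeys(labels, "foo", "bar")                   // drops "baz"

	fmt.Println(labels, resource)
}
```

In a real pipeline, promotion is more involved than this, because data points that share one resource may carry different values for the promoted label.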

@anuraaga
Contributor Author

Hi @pureklkl - indeed I think they are supported or would only require some small tweaks.

I am currently working on a change to make the core function handling logic generic, instead of only working with spans, which will enable adding the metrics data model. Should be able to send it out this week and it will hopefully be relatively mechanical to get metrics in after that.

@TylerHelmuth
Member

I can start working on adding the metrics data model

@anuraaga
Contributor Author

anuraaga commented May 2, 2022

Thanks @TylerHelmuth - just one point about metrics, which you may have already realized: we would scope the transformations to a point, so we need to expose the metrics descriptor as well in the field expressions. This is briefly mentioned in the design doc.

https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/processing.md#telemetry-query-language

Let me know if anything's not clear about that.

@TylerHelmuth
Member

That helps a lot. Where did descriptor originate? I anticipated using metric.name, metric.description, metric.unit, and metric.data for the fields on Metric. Also, how are you thinking we should get access to some of the fields specific to a data type? For example, how should we access Sum.aggregation_temporality or Sum.is_monotonic? It feels like we might need another virtual, or the ability to access the data type via descriptor.

@TylerHelmuth
Member

TylerHelmuth commented May 3, 2022

@anuraaga what do you think will be the best way to do the access functions for the data points themselves since there are multiple? Unlike traces and logs, attributes would apply to all data points on a Metric. Should the getter return a slice of all the attribute maps, with the position in the slice matching the position in the DataPoints slice, and the setter take a slice of attribute maps and set the attributes in the DataPoints slice based on the position?

There is cleanup that should be done, but it could look like this:

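// accessAttributes gets and sets the attribute maps of all data points on a
// Metric, matched by their position in the DataPoints slice.
func accessAttributes() pathGetSetter {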
	return pathGetSetter{
		getter: func(ctx common.TransformContext) interface{} {
			metric := ctx.GetItem().(pmetric.Metric)
			switch metric.DataType() {
			case pmetric.MetricDataTypeGauge:
				dps := metric.Gauge().DataPoints()
				dataPointAttrs := make([]pcommon.Map, dps.Len())
				for i := 0; i < dps.Len(); i++ {
					dataPointAttrs[i] = dps.At(i).Attributes()
				}
				return dataPointAttrs
			case pmetric.MetricDataTypeSum:
				dps := metric.Sum().DataPoints()
				dataPointAttrs := make([]pcommon.Map, dps.Len())
				for i := 0; i < dps.Len(); i++ {
					dataPointAttrs[i] = dps.At(i).Attributes()
				}
				return dataPointAttrs
			case pmetric.MetricDataTypeHistogram:
				dps := metric.Histogram().DataPoints()
				dataPointAttrs := make([]pcommon.Map, dps.Len())
				for i := 0; i < dps.Len(); i++ {
					dataPointAttrs[i] = dps.At(i).Attributes()
				}
				return dataPointAttrs
			case pmetric.MetricDataTypeExponentialHistogram:
				dps := metric.ExponentialHistogram().DataPoints()
				dataPointAttrs := make([]pcommon.Map, dps.Len())
				for i := 0; i < dps.Len(); i++ {
					dataPointAttrs[i] = dps.At(i).Attributes()
				}
				return dataPointAttrs
			case pmetric.MetricDataTypeSummary:
				dps := metric.Summary().DataPoints()
				dataPointAttrs := make([]pcommon.Map, dps.Len())
				for i := 0; i < dps.Len(); i++ {
					dataPointAttrs[i] = dps.At(i).Attributes()
				}
				return dataPointAttrs
			}
			return nil
		},
		setter: func(ctx common.TransformContext, val interface{}) {
			metric := ctx.GetItem().(pmetric.Metric)
			switch metric.DataType() {
			case pmetric.MetricDataTypeGauge:
				if attrs, ok := val.([]pcommon.Map); ok {
					dps := metric.Gauge().DataPoints()
					// Stop at the shorter of the two lengths so At(i) cannot go out of range.
					for i := 0; i < len(attrs) && i < dps.Len(); i++ {
						dps.At(i).Attributes().Clear()
						attrs[i].CopyTo(dps.At(i).Attributes())
					}
				}
			case pmetric.MetricDataTypeSum:
				if attrs, ok := val.([]pcommon.Map); ok {
					dps := metric.Sum().DataPoints()
					for i := 0; i < len(attrs) && i < dps.Len(); i++ {
						dps.At(i).Attributes().Clear()
						attrs[i].CopyTo(dps.At(i).Attributes())
					}
				}
			case pmetric.MetricDataTypeHistogram:
				if attrs, ok := val.([]pcommon.Map); ok {
					dps := metric.Histogram().DataPoints()
					for i := 0; i < len(attrs) && i < dps.Len(); i++ {
						dps.At(i).Attributes().Clear()
						attrs[i].CopyTo(dps.At(i).Attributes())
					}
				}
			case pmetric.MetricDataTypeExponentialHistogram:
				if attrs, ok := val.([]pcommon.Map); ok {
					dps := metric.ExponentialHistogram().DataPoints()
					for i := 0; i < len(attrs) && i < dps.Len(); i++ {
						dps.At(i).Attributes().Clear()
						attrs[i].CopyTo(dps.At(i).Attributes())
					}
				}
			case pmetric.MetricDataTypeSummary:
				if attrs, ok := val.([]pcommon.Map); ok {
					dps := metric.Summary().DataPoints()
					for i := 0; i < len(attrs) && i < dps.Len(); i++ {
						dps.At(i).Attributes().Clear()
						attrs[i].CopyTo(dps.At(i).Attributes())
					}
				}
			}
		},
	}
}

Or would the metricsTransformContext GetItem() return an individual data point instead of a whole Metric, and the access function is only dealing with one data point?

// accessAttributes gets and sets the attributes of a single data point.
func accessAttributes() pathGetSetter {
	return pathGetSetter{
		getter: func(ctx common.TransformContext) interface{} {
			// Bind the asserted value once instead of re-asserting in every case.
			switch dp := ctx.GetItem().(type) {
			case pmetric.NumberDataPoint:
				return dp.Attributes()
			case pmetric.HistogramDataPoint:
				return dp.Attributes()
			case pmetric.ExponentialHistogramDataPoint:
				return dp.Attributes()
			case pmetric.SummaryDataPoint:
				return dp.Attributes()
			}
			return nil
		},
		setter: func(ctx common.TransformContext, val interface{}) {
			// Ignore values that are not attribute maps.
			attrs, ok := val.(pcommon.Map)
			if !ok {
				return
			}
			switch dp := ctx.GetItem().(type) {
			case pmetric.NumberDataPoint:
				dp.Attributes().Clear()
				attrs.CopyTo(dp.Attributes())
			case pmetric.HistogramDataPoint:
				dp.Attributes().Clear()
				attrs.CopyTo(dp.Attributes())
			case pmetric.ExponentialHistogramDataPoint:
				dp.Attributes().Clear()
				attrs.CopyTo(dp.Attributes())
			case pmetric.SummaryDataPoint:
				dp.Attributes().Clear()
				attrs.CopyTo(dp.Attributes())
			}
		},
	}
}
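
For the second option to work, something has to walk each metric and hand its data points to the transformation one at a time. A rough Go sketch of that loop follows; dataPoint, metric, transformContext, and forEachDataPoint are hypothetical stand-ins invented for this illustration, not the pdata or processor types.

```go
package main

import "fmt"

// dataPoint and metric are hypothetical stand-ins for the pdata types.
type dataPoint struct {
	attributes map[string]string
}

type metric struct {
	name   string
	points []*dataPoint
}

// transformContext is what a statement would see in this sketch: one data
// point plus the metric that owns it, so fields such as the name stay reachable.
type transformContext struct {
	item   *dataPoint
	metric *metric
}

// forEachDataPoint invokes fn once per data point of every metric.
func forEachDataPoint(metrics []*metric, fn func(transformContext)) {
	for _, m := range metrics {
		for _, dp := range m.points {
			fn(transformContext{item: dp, metric: m})
		}
	}
}

func main() {
	metrics := []*metric{{
		name: "pod.cpu.usage",
		points: []*dataPoint{
			{attributes: map[string]string{"pod": "a"}},
			{attributes: map[string]string{"pod": "b"}},
		},
	}}

	// Example transformation: copy the metric name onto each data point.
	forEachDataPoint(metrics, func(ctx transformContext) {
		ctx.item.attributes["metric"] = ctx.metric.name
	})

	for _, dp := range metrics[0].points {
		fmt.Println(dp.attributes)
	}
}
```

Carrying the owning metric in the context is one way to keep metric-level fields available while the access functions deal with a single data point.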

@TylerHelmuth
Member

Also, in the design doc the metrics are accessed by name:

create_gauge("pod.cpu.utilized", read_gauge("pod.cpu.usage") / read_gauge("node.cpu.limit"))

In a situation like this, it looks like the read_gauge function needs access to the Gauge() of a metric with a name of "pod.cpu.usage". How would our switch statement handle a situation like this?
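
One possible direction, sketched here only to illustrate the question, is to index the metrics in a batch by name so that a function like read_gauge can look up a metric other than the one currently being processed. The gauge type and the indexByName and readGauge helpers are hypothetical; the design doc does not prescribe this approach.

```go
package main

import "fmt"

// gauge is a hypothetical stand-in for a gauge metric with a single value.
type gauge struct {
	name  string
	value float64
}

// indexByName builds a lookup table so that a named metric can be found
// without relying on the item currently being processed.
func indexByName(gauges []gauge) map[string]gauge {
	idx := make(map[string]gauge, len(gauges))
	for _, g := range gauges {
		idx[g.name] = g
	}
	return idx
}

// readGauge returns the value of the named gauge and whether it exists.
func readGauge(idx map[string]gauge, name string) (float64, bool) {
	g, ok := idx[name]
	return g.value, ok
}

func main() {
	idx := indexByName([]gauge{
		{name: "pod.cpu.usage", value: 0.5},
		{name: "node.cpu.limit", value: 2},
	})

	usage, okUsage := readGauge(idx, "pod.cpu.usage")
	limit, okLimit := readGauge(idx, "node.cpu.limit")
	// Only derive the new value when both inputs exist and the divisor is non-zero.
	if okUsage && okLimit && limit != 0 {
		fmt.Println("pod.cpu.utilized =", usage/limit)
	}
}
```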

@TylerHelmuth
Member

I went forward with the DataPoint approach. I also see now why "descriptor" was chosen as the virtual. The draft PR can be found here. Would love feedback.

@TylerHelmuth
Member

The metrics data model has been merged. I will get started on wiring up the processor to the metrics pipeline.

@TylerHelmuth
Member

TylerHelmuth commented May 16, 2022

@anuraaga for the alpha release of the processor, what functions would you like to see added in addition to set, keep_keys, truncate_all, and limit?
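
For readers unfamiliar with truncate_all and limit, the Go sketch below shows roughly what such functions do to an attribute map. It assumes that truncate_all caps the length of every string value and that limit caps the number of attributes. The helper names, the plain map, and the choice of which keys survive are illustrative assumptions, not the processor's implementation.

```go
package main

import (
	"fmt"
	"sort"
)

// truncateAll shortens every string value to at most maxLen bytes.
// Byte-based slicing can split a multi-byte character; a real implementation
// may want to handle that.
func truncateAll(attrs map[string]string, maxLen int) {
	if maxLen < 0 {
		return
	}
	for k, v := range attrs {
		if len(v) > maxLen {
			attrs[k] = v[:maxLen]
		}
	}
}

// limit keeps at most n attributes. Keys are kept in sorted order purely to
// make this sketch deterministic.
func limit(attrs map[string]string, n int) {
	if n < 0 {
		n = 0
	}
	if len(attrs) <= n {
		return
	}
	keys := make([]string, 0, len(attrs))
	for k := range attrs {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys[n:] {
		delete(attrs, k)
	}
}

func main() {
	attrs := map[string]string{"a": "abcdef", "b": "xy", "c": "123456789"}
	truncateAll(attrs, 3)
	limit(attrs, 2)
	fmt.Println(attrs)
}
```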

@anuraaga
Contributor Author

I think replace_wildcards as described in the design doc would be nice too. Cutting down on the cardinality of data from highly generic instrumentation is an important use case, and one that the other processors don't currently cover IIRC. That would probably be enough for an initial release.
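
To illustrate how a function of this kind reduces cardinality, here is a minimal Go sketch that maps any value matching a wildcard pattern onto one fixed replacement string. The signature, the rule that each * stands for exactly one path segment, and the example paths are assumptions made for illustration; the design doc remains the reference for the intended behaviour.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// replaceWildcards returns replacement when value matches pattern, where each
// "*" in pattern stands for one non-empty path segment. Otherwise value is
// returned unchanged.
func replaceWildcards(pattern, replacement, value string) (string, error) {
	// Escape the pattern, then turn each escaped "*" into a one-segment matcher.
	expr := "^" + strings.ReplaceAll(regexp.QuoteMeta(pattern), `\*`, `[^/]+`) + "$"
	re, err := regexp.Compile(expr)
	if err != nil {
		return value, err
	}
	if re.MatchString(value) {
		return replacement, nil
	}
	return value, nil
}

func main() {
	for _, target := range []string{"/user/42/list/7", "/user/99/list/3", "/health"} {
		out, err := replaceWildcards("/user/*/list/*", "/user/{userId}/list/{listId}", target)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(target, "->", out)
	}
}
```

Every distinct user and list ID collapses into the same attribute value, so the number of unique values no longer grows with the number of users or lists.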

@TylerHelmuth
Member

I'll get a PR out for replace_wildcards this week.

@TylerHelmuth
Member

All required PRs have been merged. Once 0.52.0 is released I'll update the processor's status table.

5 participants