Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support full range of monitored resource types for GCP #114

Closed
ptone opened this issue Nov 10, 2020 · 8 comments
Closed

support full range of monitored resource types for GCP #114

ptone opened this issue Nov 10, 2020 · 8 comments
Labels

Comments

@ptone
Copy link
Member

ptone commented Nov 10, 2020

subdivideGCPTypes only resolves kubernetes and GCE resource types.

func subdivideGCPTypes(labelMap map[string]string) (string, map[string]string) {

This makes this library not usable from other environments such as GAE, or with a generic_task resource with additional lables.

Since the otel resource model doesn't have a special type field, perhaps a "special" label of "__resource_type" can use a literal value, instead of smart matching to a constant?

@nilebox
Copy link
Contributor

nilebox commented Nov 10, 2020

Since the otel resource model doesn't have a special type field, perhaps a "special" label of "__resource_type" can use a literal value, instead of smart matching to a constant?

The way resource types are currently inferred is via semantic conventions, based on the set of present resource attributes/labels

GAE would probably fit into FaaS category in that case: https://github.com/open-telemetry/opentelemetry-specification/blob/master/specification/resource/semantic_conventions/faas.md

@ptone
Copy link
Member Author

ptone commented Nov 10, 2020

This seems to be a two-phased mapping. One is from a bag of attributes, to a semantic model from the links you suggest. But then there is another mapping from the list of resource labels/attributes to a GCP specific monitored resource. The match of attributes is not exact between expected GCP labels, and the otel semantic models.

The list of GCP resources likely can't map in a simple way to the semantic model:
https://cloud.google.com/monitoring/api/resources

The private method I reference is what currently does the latter. It doesn't seem to have any opinion/enforcement of the otel semantic model, neither does resource.

My concern is outside the few resource types implemented in the brute matching algorithm, no other resource types are supported with this library, as the Type field in the GCP API is set solely based on this matching algorithm.

GAE, GCF, and Cloud Run might all fit into the FaaS semantic model for otel, but need GCP specific support in this library.

More context on why collapsing to "global" is bad:
https://cloud.google.com/monitoring/custom-metrics/creating-metrics#global-v-generic

As it stands, I have to back out the use of this library and go back to using GCP proto based client libraries because of this limitation.

@nilebox
Copy link
Contributor

nilebox commented Nov 11, 2020

GAE, GCF, and Cloud Run might all fit into the FaaS semantic model for otel, but need GCP specific support in this library.

The problem here is that if you add a GCP-specific__resource_type field (or similar), you would have to support injecting it in all client SDKs, regardless of which exporter is being used. Or, you'll have to manually inject this attribute in the application code rather than by some auto-detection mechanism.

In other words, resource attributes are normally produced at the "source", whereas exporter just translates them into GCP-specific naming. If there is no __resource_type produced at the source, there is nothing to translate on the exporter side.

While we may make exporter "detect" resource types based on the environment it's running in (e.g. look at GCP metadata or environment variables), this will only work in the case where exporter is running in the same process as the instrumented application.

This is especially a problem if you use OpenTelemetry collector as a middleware, i.e.
App ---(OTLP exporter)---> OpenTelemetry Collector ---(GCP exporter)---> GCP, in which case GCP exporter is running in a different process (and hence, "environment") than the app which produced signals, so it can't "auto-detect" resource types anymore.

Overall this is definitely a valid concern for "semantic conventions" not working for every vendor-specific use case, but it's broader than just adding explicit __resource_type field I'm afraid, and probably needs to be discussed within the Specification SIG.

/cc @jsuereth @kjordy

@ptone
Copy link
Member Author

ptone commented Nov 12, 2020

I agree with the suggestion of __resource_type being a poor ultimate solution, and a hack. I'm looking for some stop-gap workaround for now given the current implementation is private API, so that this library can be used for resource-types, at least generic_task. The "global" resource task does not allow sufficient labels to support proper sharding of time-series writes to stay under service quotas. The result is that this library is currently not usable for many custom metric scenarios with GCP Monitoring.

@ptone
Copy link
Member Author

ptone commented Nov 19, 2020

@ptone
Copy link
Member Author

ptone commented Dec 18, 2020

FWIW - my immediate case in unblocked by #118

However this is no less a semantic solution than what I was proposing, it simply hardcodes a 'namespace' hint as the resource type, in fact mapping one GCP resource type (cloud_run_revision) to a monitoring catchall of generic_task. cloud_run_revision is a platform managed type that does not support the cardinality of an additional key needed for write throughput.

@nilebox
Copy link
Contributor

nilebox commented Dec 20, 2020

@ptone how does that resource with namespace = "cloud_run_revision" get exported to Cloud Monitoring?
Shouldn't there be a code in this repo's metric exporter recognizing this special namespace?

Also, if this convention is not implemented across Go, Java, Python and JS it's not portable across Cloud Run if I understand correctly, e.g. Java application will be instrumented with OpenTelemetry Java SDK, which doesn't have this detector IIUC.

@dashpole dashpole added enhancement New feature or request priority: p3 labels Aug 30, 2022
@damemi
Copy link
Contributor

damemi commented Jan 30, 2023

All platforms are supported that we're aware of, though some map to generic_task

@damemi damemi closed this as completed Jan 30, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants