Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clarify usage of "component" attribute in semantic conventions #336

Closed
tigrannajaryan opened this issue Oct 24, 2019 · 10 comments
Closed

Clarify usage of "component" attribute in semantic conventions #336

tigrannajaryan opened this issue Oct 24, 2019 · 10 comments

Comments

@tigrannajaryan
Copy link
Member

tigrannajaryan commented Oct 24, 2019

Span Semantic conventions currently define an attribute "component". It is unclear what exactly this attribute is intended to reflect.

Semantic conventions define different values of "component" attribute in each of the sections of the document, as if the sections are exclusive of one another, which seems to be an incorrect assumption. A counter-example to that assumption would be a Database system which is accessible via HTTP transport (e.g. https://wiki.postgresql.org/wiki/HTTP_API). In this counter-example it is unclear whether the "component" should be set to "http" or to "postgresql".

Unless there is a clear intent on the meaning of "component" attribute and what value should be populated when the emitting component logically belongs to different kinds of emitters defined in this document I suggest that we remove the attribute altogether. Consumers of attributes which are interested in learning the type of the component that emitted the span can do so by examining the presence or absence of "Required" attribute of appropriate attribute group. For example the presence of "http.method" attribute would mean that the emitter is an "http" component (perhaps in addition to being something else at the same time).

@Oberon00
Copy link
Member

Related: #245

@Oberon00
Copy link
Member

Oberon00 commented Oct 24, 2019

In this counter-example it is unclear whether the "component" should be set to "http" or to "postgresql".

At some point the back end at least will often want to decide whether this Span is a DB or a HTTP span (e.g. to display an icon). So the question is: Should the back end decide this (e.g. using heuristics like HTTP + DB => DB), or should the instrumentation decide this? If the latter, the component attribute is helpful.

One could also make the point that in practice most likely, the Postresql integration would create a Span with DB attributes and the HTTP integration would create a child span with HTTP attributes instead of the attributes being mixed in a single span. I wonder if we can find an example where we really would have mixed span types.

@jmacd
Copy link
Contributor

jmacd commented Oct 24, 2019

My understanding had been that component would equal the name of the named Tracer or Meter.

@tigrannajaryan
Copy link
Member Author

My understanding had been that component would equal the name of the named Tracer or Meter.

The semantic convention requires hard-coded values for "component" field based on the type of the emitter, e.g. "http", so I think it does not align with your understanding (not saying your understanding is wrong - I am not sure what is right here - to me the whole "component" attribute is superfluous).

@jmacd
Copy link
Contributor

jmacd commented Oct 24, 2019

@tedsuo What do you think?

@tedsuo
Copy link
Contributor

tedsuo commented Oct 24, 2019

I think we can resolve this by switching to a structured data format. In this case, the "component" tag becomes the key to a dictionary containing semantically defined keys and values.

The named tracer which is reporting the spans can also be represented as span attributes. For clarity, let's call that dictionary key "reporter" rather than "component."

For example, a python client speaking to mongodb over HTTP might generate a span with the following attributes:

  {
    "reporter": {
      "name": "pymongo-opentelemetry",
      "version": "1.2.3"
    },
    "span":{
       "kind": "client",
     },
    "http_client": {
      "method": "GET",
      "url": "https://example.com:779/path/12314/?q=ddds#123",
      "status_code": "200"
    },
    "db": {
      "type": "mongodb",
      "statement": "db.getCollection('foo').find()",
      "instance": "customers",
    },
    "tcp": {
      "ipv4": "127.0.0.1",
      "port": "80"
    }
  }

In this world, as long as there is no collision between key names, there shouldn't be an issue with a span representing multiple types.

@Oberon00 I understand the issue you bring up: Should instrumentation define a span "type" to make some UI decisions clearer, such as what icon to show? I lean towards no: it would be better for instrumentation to focus on providing unambiguous, objective data.

Subjective decisions, such as wether the icon should be "http" or "client" or "db" might be difficult. Which icon would you pick for an HTTP PUT request to an address which clearly describes an AWS S3 bucket? S3 logo, db icon, client request icon? Best to apply labels like that when the data hits a specific back-end system.

@tigrannajaryan
Copy link
Member Author

@tedsuo the reporter as you defined it makes a lot of sense. That is different from how component is defined today (and I find reporter a lot more useful and meaningful than component).

Subjective decisions, such as wether the icon should be "http" or "client" or "db" might be difficult. Which icon would you pick for an HTTP PUT request to an address which clearly describes an AWS S3 bucket? S3 logo, db icon, client request icon? Best to apply labels like that when the data hits a specific back-end system.

I tend to agree. I think backends should make these decisions and to make such decisions they don't need component attribute (nor probably reporter as you defined it), since they can infer the same (and more) information from the presence of relevant attributes defined by semantic conventions (e.g. http.method or db.type)

That said I disagree with this part:

I think we can resolve this by switching to a structured data format.

I think the structured vs unstructured format is orthogonal to the issue I raised. We can encode reporter data using equivalent flat structure with no loss of semantics and using the same dot notation as currently used by semantic conventions, e.g.:

{
  "reporter.name": "pymongo-opentelemetry",
  "reporter.version": "1.2.3"
}

@jmacd
Copy link
Contributor

jmacd commented Oct 24, 2019

@tigrannajaryan I see no value in having a hard-coded list of component values the way the spec currently is written. @tedsuo you used an example of what I think would come from a named Tracer, not one of the hard-coded values currently in the spec. Are you thinking of this component/reporter name as equal to the name of the Tracer or Meter?

I would propose that we specify a conventional name for the name of of the named Meter or Tracer. It can be reporter if you like, or component. It should have structure, so a .name and a .version, the version having been discussed in https://github.com/open-telemetry/oteps/blob/master/text/0038-version-semantic-attribute.md.

@tedsuo
Copy link
Contributor

tedsuo commented Oct 25, 2019

@jmacd sounds good. I sense that we are all mostly on the same page, so just to summarize what I believe we are suggesting:

  • We can remove the current component tag. Spans do not require a type field other than span_kind.
  • Instead, spans can be composed of multiple "types" at the same time, via the semantic definitions. Attributes for each semantic definition are namespaced. So, all HTTP client attributes are namespaced with http_client. Whether this namespacing is done via dot-notation or structured data is moot.
  • The name value which comes from the TracerFactory was intended to represent the specific package name of the instrumentation library - for example io.opentelemetry.contrib.mongodb or pymongo-opentelemetry. It was not intended to match a component value, such as db. If this TracerFactory data needs to be stapled onto every span, that could be done with a new semantic definition, such as reporter.name and reporter.version.

(BTW, I'm not necessarily arguing we should staple the TracerFactory data onto every span, just pointing out how the data would be modelled.)

@tigrannajaryan
Copy link
Member Author

"component" attribute has been removed from semantic conventions via #447
Closing this as no longer applicable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants