Improve descriptions by mentioning unit of measurement and how the value is reported #21

Closed
ramonsmits opened this issue Jul 25, 2017 · 7 comments · Fixed by #72

@ramonsmits
Member

ramonsmits commented Jul 25, 2017

It would be helpful for an operator to see the unit of measurement in the performance counter description:

image

Also, the descriptions of the metrics can be improved. Take a look at the counters for Logical disk:

image

Two descriptions were taken:

Avg. Disk Write Queue Length is the average number of write requests that were queued for the selected disk during the sample interval.

Current Disk Queue Length is the number of requests outstanding on the disk at the time the performance data is collected. It also includes requests in service at the time of the collection. This is an instantaneous snapshot, not an average over the time interval. Multi-spindle disk devices can have multiple requests that are active at one time, but other concurrent requests are awaiting service. This counter might reflect a transitory high or low queue length, but if there is a sustained load on the disk drive, it is likely that this will be consistently high. Requests experience delays proportional to the length of this queue minus the number of spindles on the disks. For good performance, this difference should average less than two.

Suggested names and descriptions:

  • Critical Time => Current Critical Time
    • Current Critical Time is the duration in rounded seconds for sending and processing the last processed message. This is an instantaneous snapshot, not an average over the time interval.
  • Processing Time => Current Processing Time
    • Current Processing Time is the duration in rounded seconds of the last successfully processed message. This is an instantaneous snapshot, not an average over the time interval.
  • SLA violation countdown => Current SLA violation countdown
    • Current SLA violation countdown is the duration in seconds until the configured Service Level Agreement (SLA) for this endpoint is breached. This is an instantaneous snapshot, not an average over the time interval.
  • Critical Time Average => Avg. Critical Time
    • Avg. Critical Time is the average duration in seconds for sending and processing of all messages during the sample interval.
  • Processing Time Average => Avg. Processing Time
    • Avg. Processing Time is the average duration in seconds of all successfully processed messages during the sample interval.

I don't think the names themselves can be adjusted anymore for the already publicly released version. That means it must be decided, for the new average counters, whether the current Average postfix should become an Avg. prefix.

I think that the descriptions can be updated.

@tmasternak
Member

@ramonsmits

I don't think the names themselves can be adjusted anymore for the already publicly released version.

Correct. So this leaves Critical Time, Processing Time and SLA violation countdown out of scope. For new ones, the name change and unit are covered by #23.

Changing the description is trickier. First, it is taken from the probe description, which gives the definition of e.g. Critical Time. It can only describe the meaning of a single measurement, not the performance counter.

@ramonsmits
Member Author

Average counters:

Avg. {name} - {description} The value is the average duration in seconds during the sample interval.

Current counters:

{name} - {description} The value is in rounded seconds. This is an instantaneous snapshot, not an average over the time interval.

Examples:

  • SLA Violation Countdown - Seconds until the SLA for this endpoint is breached. The value is in rounded seconds. This is an instantaneous snapshot, not an average over the time interval.
  • Processing time - The time it took to successfully process a message. The value is in rounded seconds. This is an instantaneous snapshot, not an average over the time interval.
  • Avg. Processing time - The time it took to successfully process a message. The value is the average duration in seconds during the sample interval.
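
The two templates above can be sketched as a pair of small helpers. This is only an illustration of the proposed wording, not the actual script: the function names `current_counter` and `average_counter` are hypothetical, and the output is plain text rather than a real performance counter registration.

```python
# Fixed sentences taken from the templates proposed in this comment.
CURRENT_SUFFIX = (
    "The value is in rounded seconds. This is an instantaneous snapshot, "
    "not an average over the time interval."
)
AVERAGE_SUFFIX = (
    "The value is the average duration in seconds during the sample interval."
)


def current_counter(name: str, description: str) -> str:
    """Build '{name} - {description} <current suffix>' (hypothetical helper)."""
    return f"{name} - {description} {CURRENT_SUFFIX}"


def average_counter(name: str, description: str) -> str:
    """Build 'Avg. {name} - {description} <average suffix>' (hypothetical helper)."""
    return f"Avg. {name} - {description} {AVERAGE_SUFFIX}"


if __name__ == "__main__":
    probe_description = "The time it took to successfully process a message."
    print(current_counter("Processing time", probe_description))
    print(average_counter("Processing time", probe_description))
```

Keeping the probe description and the reporting sentence separate means the probe text stays about a single measurement, while the suffix says how the value is reported.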

@ramonsmits
Member Author

As discussed, the probes only describe what a single measurement represents, not how the value is converted into a metric. That conversion is up to the implementation of the probe receiver.
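
To illustrate that split, here is a minimal sketch of a receiver that takes single duration measurements and decides on its own how to report them. The class `DurationReceiver` and its methods are hypothetical and do not reflect the actual probe or receiver API; it only shows that "current" and "average" are receiver-side choices over the same measurements.

```python
from statistics import mean
from typing import List, Optional


class DurationReceiver:
    """Hypothetical receiver: turns single measurements (seconds) into metrics."""

    def __init__(self) -> None:
        self._last: Optional[float] = None
        self._interval: List[float] = []

    def record(self, seconds: float) -> None:
        """Accept one measurement, as a probe would report it."""
        self._last = seconds
        self._interval.append(seconds)

    def current(self) -> Optional[int]:
        """Instantaneous snapshot: the last measurement in rounded seconds."""
        return None if self._last is None else round(self._last)

    def average_and_reset(self) -> Optional[float]:
        """Average over the sample interval, then start a new interval."""
        if not self._interval:
            return None
        average = mean(self._interval)
        self._interval.clear()
        return average


if __name__ == "__main__":
    receiver = DurationReceiver()
    for seconds in (0.5, 1.5, 4.0):
        receiver.record(seconds)
    print("current:", receiver.current())
    print("average:", receiver.average_and_reset())
```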

@tmasternak
Member

I propose to close this, as we want to remove the script generators altogether (#46). Proposals from this issue should be used as constant text in the static script we will deliver.

@DavidBoike
Member

@tmasternak as #46 is now closed, we are about to release a version with a static script, with this source.

Are you saying descriptions from this issue description should be comments in the script?

@tmasternak
Member

I think that the original suggestion was to change both the name and the descriptions.

That being said, changing the description was the main part of it. So if we could do that in the release, it would be great.

@ramonsmits
Member Author

Explained to @DavidBoike that the counter names cannot be modified, as that would introduce breaking changes. We probably want counter names to be backward compatible even with unsupported major versions. The counter order is also important and must not be modified!

What can be changed are the counter descriptions. These are not used in any value/index matching.
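
A rough sketch of that rule as a check: names and order of the released counters must stay the same, while descriptions are free to change. The `Counter` type, the `is_backward_compatible` function and the counter order shown are hypothetical, and allowing new counters to be appended at the end is an assumption made for this example, not something decided in this thread.

```python
from typing import List, NamedTuple


class Counter(NamedTuple):
    name: str
    description: str


def is_backward_compatible(released: List[Counter], proposed: List[Counter]) -> bool:
    """Released names must appear first, unchanged and in the same order.

    Descriptions are ignored because they are not used for matching.
    Assumption: counters appended after the released ones are acceptable.
    """
    released_names = [counter.name for counter in released]
    proposed_names = [counter.name for counter in proposed]
    return proposed_names[: len(released_names)] == released_names


if __name__ == "__main__":
    # Illustrative order only; the real counter order is not given here.
    released = [
        Counter("Critical Time", "old text"),
        Counter("Processing Time", "old text"),
    ]
    new_descriptions = [
        Counter("Critical Time", "new text"),
        Counter("Processing Time", "new text"),
    ]
    renamed = [
        Counter("Current Critical Time", "new text"),
        Counter("Processing Time", "new text"),
    ]
    print(is_backward_compatible(released, new_descriptions))
    print(is_backward_compatible(released, renamed))
```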
