Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(stdlib/contrib): Add stdlib/contrib package events #3266

Merged
merged 2 commits into from
Oct 22, 2020
Merged

feat(stdlib/contrib): Add stdlib/contrib package events #3266

merged 2 commits into from
Oct 22, 2020

Conversation

tomhollingworth
Copy link
Contributor

Pull request for a community contributed package.

Done checklist

Why?

My use case is getting the total time in states. The elapsed function would drop a value and associate the duration with the subsequent record and stateDuration would continuously totalize on a function from a starting time. The issue with state duration is that if the same value appears multiple times in a row it's duration in state could be counted multiple times. It also logged the time against the subsequent record. I was previously using a custom flux function, see below, however the performance wasn't great and I'm sure others would benefit from this.

How?

This package contains a function events.duration which iterates on the timeColumn and compares the current record against the next record and calculates a difference between the two. For the last record it compares against the stopColumn unless a stop time is provided.

Basic Example:

import "contrib/tomhollingworth/events"

from(bucket: "example-bucket")
    |> range(start: -24h)
    |> events.duration()

Comparison between elapsed, stateDuration and events.duration

Consider the following dataset of a door opening and closing:

import "csv"

inData = "
#datatype,string,long,dateTime:RFC3339,string,string
#group,false,true,false,false,false
#default,,,,,
,result,table,_time,_value,_field
,,0,2020-01-01T08:00:00Z,Closed,value
,,0,2020-01-01T08:15:00Z,Open,value
,,0,2020-01-01T08:15:08Z,Closed,value
,,0,2020-01-01T08:21:00Z,Open,value
,,0,2020-01-01T08:21:07Z,Closed,value
,,0,2020-01-01T08:24:00Z,Open,value
,,0,2020-01-01T08:24:12Z,Closed,value
"

csv.from(csv: inData)
  |> range(start: 2020-01-01T08:00:00Z, stop: 2020-01-01T08:30:00Z)

|> elapsed() yields the following. The first record is dropped and durations are associated with subsequent records. Totalizing on filter on value and summing the elasped column would have the duration swapped between open and closed.

-,  result, table,               _start,                _stop, _time, _value, _field
+,  result, table,               _start,                _stop, _time, _value, _field,       elasped
-,        ,     0, 2020-01-01T08:00:00Z, 2020-01-01T08:30:00Z, 2020-01-01T08:00:00Z, Closed,  value
-,        ,     0, 2020-01-01T08:00:00Z, 2020-01-01T08:30:00Z, 2020-01-01T08:10:00Z, Closed,  value
+,        ,     0, 2020-01-01T08:00:00Z, 2020-01-01T08:30:00Z, 2020-01-01T08:10:00Z, Closed,  value,          1600
-,        ,     0, 2020-01-01T08:00:00Z, 2020-01-01T08:30:00Z, 2020-01-01T08:15:00Z,   Open,  value
+,        ,     0, 2020-01-01T08:00:00Z, 2020-01-01T08:30:00Z, 2020-01-01T08:15:00Z,   Open,  value,          1300
-,        ,     0, 2020-01-01T08:00:00Z, 2020-01-01T08:30:00Z, 2020-01-01T08:15:08Z, Closed,  value
+,        ,     0, 2020-01-01T08:00:00Z, 2020-01-01T08:30:00Z, 2020-01-01T08:15:08Z, Closed,  value,             8
-,        ,     0, 2020-01-01T08:00:00Z, 2020-01-01T08:30:00Z, 2020-01-01T08:21:00Z,   Open,  value
+,        ,     0, 2020-01-01T08:00:00Z, 2020-01-01T08:30:00Z, 2020-01-01T08:21:00Z,   Open,  value,          1352
-,        ,     0, 2020-01-01T08:00:00Z, 2020-01-01T08:30:00Z, 2020-01-01T08:21:07Z, Closed,  value
+,        ,     0, 2020-01-01T08:00:00Z, 2020-01-01T08:30:00Z, 2020-01-01T08:21:07Z, Closed,  value,             7
-,        ,     0, 2020-01-01T08:00:00Z, 2020-01-01T08:30:00Z, 2020-01-01T08:24:00Z,   Open,  value
+,        ,     0, 2020-01-01T08:00:00Z, 2020-01-01T08:30:00Z, 2020-01-01T08:24:00Z,   Open,  value,           173
-,        ,     0, 2020-01-01T08:00:00Z, 2020-01-01T08:30:00Z, 2020-01-01T08:24:12Z, Closed,  value
+,        ,     0, 2020-01-01T08:00:00Z, 2020-01-01T08:30:00Z, 2020-01-01T08:24:12Z, Closed,  value,            12

|> stateDuration(fn: (r) => true) yields the following. The duration is continuously totalized. For a particular state we could also include that in our stateDuration function, however time is counted twice on subsequent events. Totalizing this would have that time counted twice.

-,  result, table,               _start,                _stop,                _time, _value, _field
+,  result, table,               _start,                _stop,                _time, _value, _field, stateDuration
-,        ,     0, 2020-01-01T08:00:00Z, 2020-01-01T08:30:00Z, 2020-01-01T08:00:00Z, Closed,  value
+,        ,     0, 2020-01-01T08:00:00Z, 2020-01-01T08:30:00Z, 2020-01-01T08:00:00Z, Closed,  value,             0
-,        ,     0, 2020-01-01T08:00:00Z, 2020-01-01T08:30:00Z, 2020-01-01T08:10:00Z, Closed,  value
+,        ,     0, 2020-01-01T08:00:00Z, 2020-01-01T08:30:00Z, 2020-01-01T08:10:00Z, Closed,  value,           600
-,        ,     0, 2020-01-01T08:00:00Z, 2020-01-01T08:30:00Z, 2020-01-01T08:15:00Z,   Open,  value
+,        ,     0, 2020-01-01T08:00:00Z, 2020-01-01T08:30:00Z, 2020-01-01T08:15:00Z,   Open,  value,           900
-,        ,     0, 2020-01-01T08:00:00Z, 2020-01-01T08:30:00Z, 2020-01-01T08:15:08Z, Closed,  value
+,        ,     0, 2020-01-01T08:00:00Z, 2020-01-01T08:30:00Z, 2020-01-01T08:15:08Z, Closed,  value,           908
-,        ,     0, 2020-01-01T08:00:00Z, 2020-01-01T08:30:00Z, 2020-01-01T08:21:00Z,   Open,  value
+,        ,     0, 2020-01-01T08:00:00Z, 2020-01-01T08:30:00Z, 2020-01-01T08:21:00Z,   Open,  value,          1260
-,        ,     0, 2020-01-01T08:00:00Z, 2020-01-01T08:30:00Z, 2020-01-01T08:21:07Z, Closed,  value
+,        ,     0, 2020-01-01T08:00:00Z, 2020-01-01T08:30:00Z, 2020-01-01T08:21:07Z, Closed,  value,          1267
-,        ,     0, 2020-01-01T08:00:00Z, 2020-01-01T08:30:00Z, 2020-01-01T08:24:00Z,   Open,  value
+,        ,     0, 2020-01-01T08:00:00Z, 2020-01-01T08:30:00Z, 2020-01-01T08:24:00Z,   Open,  value,          1440
-,        ,     0, 2020-01-01T08:00:00Z, 2020-01-01T08:30:00Z, 2020-01-01T08:24:12Z, Closed,  value
+,        ,     0, 2020-01-01T08:00:00Z, 2020-01-01T08:30:00Z, 2020-01-01T08:24:12Z, Closed,  value,          1452

|> events.duration() yields the following.

-,  result, table,               _start,                _stop,                _time, _value, _field
+,  result, table,               _start,                _stop,                _time, _value, _field,      duration
-,        ,     0, 2020-01-01T08:00:00Z, 2020-01-01T08:30:00Z, 2020-01-01T08:00:00Z, Closed,  value
+,        ,     0, 2020-01-01T08:00:00Z, 2020-01-01T08:30:00Z, 2020-01-01T08:00:00Z, Closed,  value,           600
-,        ,     0, 2020-01-01T08:00:00Z, 2020-01-01T08:30:00Z, 2020-01-01T08:10:00Z, Closed,  value
+,        ,     0, 2020-01-01T08:00:00Z, 2020-01-01T08:30:00Z, 2020-01-01T08:10:00Z, Closed,  value,           300
-,        ,     0, 2020-01-01T08:00:00Z, 2020-01-01T08:30:00Z, 2020-01-01T08:15:00Z,   Open,  value
+,        ,     0, 2020-01-01T08:00:00Z, 2020-01-01T08:30:00Z, 2020-01-01T08:15:00Z,   Open,  value,             8
-,        ,     0, 2020-01-01T08:00:00Z, 2020-01-01T08:30:00Z, 2020-01-01T08:15:08Z, Closed,  value
+,        ,     0, 2020-01-01T08:00:00Z, 2020-01-01T08:30:00Z, 2020-01-01T08:15:08Z, Closed,  value,           352
-,        ,     0, 2020-01-01T08:00:00Z, 2020-01-01T08:30:00Z, 2020-01-01T08:21:00Z,   Open,  value
+,        ,     0, 2020-01-01T08:00:00Z, 2020-01-01T08:30:00Z, 2020-01-01T08:21:00Z,   Open,  value,             7
-,        ,     0, 2020-01-01T08:00:00Z, 2020-01-01T08:30:00Z, 2020-01-01T08:21:07Z, Closed,  value
+,        ,     0, 2020-01-01T08:00:00Z, 2020-01-01T08:30:00Z, 2020-01-01T08:21:07Z, Closed,  value,           173
-,        ,     0, 2020-01-01T08:00:00Z, 2020-01-01T08:30:00Z, 2020-01-01T08:24:00Z,   Open,  value
+,        ,     0, 2020-01-01T08:00:00Z, 2020-01-01T08:30:00Z, 2020-01-01T08:24:00Z,   Open,  value,            12
-,        ,     0, 2020-01-01T08:00:00Z, 2020-01-01T08:30:00Z, 2020-01-01T08:24:12Z, Closed,  value
+,        ,     0, 2020-01-01T08:00:00Z, 2020-01-01T08:30:00Z, 2020-01-01T08:24:12Z, Closed,  value,           348

Notice how time in duration is associated with the correct value. One can easily now group by _value and sum to get the total time in each state.

Custom Flux Function

As mentioned above I was previously doing something similar to events.duration in a inline flux function but performance wasn't great.

import "math"

stateInDuration = (tables=<-) =>
  tables
    |> sort(columns: ["_time"], desc: true)
    |> map(fn: (r) => ({r with _stateInDuration: (int(v: r._time) - int(v: r._start))/1000000000}))
    |> map(fn: (r) => ({r with _stateInDurationLast: (int(v: r._stop) - int(v: r._time))/1000000000}))
    |> difference(columns: ["_stateInDuration"], keepFirst: true)
    |> map(fn: (r) => ({r with stateInDuration: if exists r._stateInDuration != 0 then math.abs(x: float(v: r._stateInDuration)) else math.abs(x: float(v: r._stateInDurationLast))}))
    |> drop(columns: ["_stateInDuration", "_stateInDurationLast"])
    |> sort(columns: ["_time"], desc: false)

Reviewer Notes

@codecov-io
Copy link

Codecov Report

Merging #3266 into master will decrease coverage by 0.04%.
The diff coverage is 38.63%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #3266      +/-   ##
==========================================
- Coverage   49.63%   49.59%   -0.05%     
==========================================
  Files         344      346       +2     
  Lines       35852    35984     +132     
==========================================
+ Hits        17794    17845      +51     
- Misses      15583    15655      +72     
- Partials     2475     2484       +9     
Impacted Files Coverage Δ
stdlib/contrib/tomhollingworth/events/duration.go 37.69% <37.69%> (ø)
stdlib/contrib/tomhollingworth/events/flux_gen.go 100.00% <100.00%> (ø)
stdlib/test_packages.go 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 09b7dd9...adc4b24. Read the comment docs.

@nathanielc
Copy link
Contributor

@tomhollingworth Thanks! The new events.duration function behavior looks great and I agree it seems like others will find value in it as well. I'll ask someone on the team to take at the code and get back to you.

@nathanielc nathanielc requested review from a team, nathanielc and rockstar and removed request for a team and nathanielc October 21, 2020 16:07
Copy link
Contributor

@rockstar rockstar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for doing this work. I have some very minor copy editing things I'd like to see changed, but those issues are surrounded by excellent documentation. Thank you for taking the time to do that.

* fix(stdlib/contrib): copy editing

* feat(stdlib/contrib): additional test coverage
@tomhollingworth
Copy link
Contributor Author

Thanks for the feedback and opportunity to contribute!

Last commit includes:

  • Updated the copy editing as per review,
  • Added some additional test cases to pass the codecov/patch target

Copy link
Contributor

@rockstar rockstar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much for this work. It is the kind of contribution I'm not only happy to include, but am excited to do so. It's a good example of what good work looks like. Thank you again.

@rockstar rockstar merged commit c6a0809 into influxdata:master Oct 22, 2020
@rockstar
Copy link
Contributor

We release at least once a week, usually on Monday.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants