This repository has been archived by the owner on Mar 6, 2023. It is now read-only.

Add option for limiting the number of logs in a Span #39

Merged (1 commit) on Sep 29, 2016

Conversation

@RaduBerinde (Contributor) commented Sep 26, 2016

The MaxLogsPerSpan option limits the number of logs in a Span. Oldest
events are dropped as necessary. The first log in the finished span indicates how
many were dropped.

The default limit is 100.

Fixes #38.

CC @tschottdorf @bensigelman
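
For context, a minimal sketch of opting into the new limit, assuming the Options/DefaultOptions shape shown in this diff; NewWithOptions and NewInMemoryRecorder are existing basictracer-go constructors as I recall them, and the value 500 is arbitrary:

    package tracing

    import (
        basictracer "github.com/opentracing/basictracer-go"
        opentracing "github.com/opentracing/opentracing-go"
    )

    // newCappedTracer builds a tracer whose spans keep at most 500 log records.
    // Per the review snippet below, a non-positive MaxLogsPerSpan leaves
    // logging unlimited.
    func newCappedTracer() opentracing.Tracer {
        opts := basictracer.DefaultOptions()
        opts.Recorder = basictracer.NewInMemoryRecorder()
        opts.MaxLogsPerSpan = 500 // overrides the default of 100
        return basictracer.NewWithOptions(opts)
    }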

if maxLogs := s.tracer.options.MaxLogsPerSpan; maxLogs > 0 && len(s.raw.Logs) == maxLogs {
// We have too many logs. Overwrite the earliest log (use Logs as a circular
// buffer).
s.raw.Logs[s.numDroppedLogs%maxLogs] = ld
Collaborator

I would like us to drop from the middle of the span log, not from the beginning. A log would read something like:
log 0
log 1
log 2
** dropped 5 events **
log 8
log 9
log 10
log 11

Since this is a significant change, we should plan to do it consistently across basictracer libraries. I'd like to get @bhs and @yurishkuro to comment.

Contributor Author

Interesting. Would that be about half of max before the dropped ones, and half after?
It's easy enough to implement whichever policy on my end; I will wait until there is agreement. It might be easier to refer back to the discussion later if it's in #38 instead of here.

Member

to make this efficient, how about treating the 2nd half of the array as a circular buffer?

Contributor Author

Yes, that's what we would do (if folks agree on that policy).
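
To make that policy concrete, here is a rough sketch of the "keep the head, treat the tail as a circular buffer" idea being discussed (illustrative only; the type, the method name, and the exact numOld/numNew split are assumptions, not this PR's final code):

    package tracing

    import (
        opentracing "github.com/opentracing/opentracing-go"
    )

    // cappedLogs holds a span's log records subject to a maximum count.
    type cappedLogs struct {
        logs       []opentracing.LogRecord
        numDropped int
        maxLogs    int
    }

    // add keeps the first numOld records untouched and treats the remaining
    // numNew slots as a circular buffer, so once the cap is hit the oldest
    // "middle" records are overwritten while the head and the most recent
    // records survive.
    func (c *cappedLogs) add(lr opentracing.LogRecord) {
        if c.maxLogs <= 0 || len(c.logs) < c.maxLogs {
            c.logs = append(c.logs, lr)
            return
        }
        numOld := (c.maxLogs - 1) / 2 // records kept verbatim at the front
        numNew := c.maxLogs - numOld  // circular-buffer region at the back
        c.logs[numOld+(c.numDropped%numNew)] = lr
        c.numDropped++
    }

At Finish time the circular region would still need to be rotated back into chronological order, which is what the rotate loop quoted later in this thread does.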

// dropped event.
}

// The MaxLogsPerSpan limit does not apply to BulkLogData.
Member

why?

Contributor Author

The purpose of limiting is to cap memory usage. If the caller is supplying a huge BulkLogData, they are already storing that in memory. That said, I don't feel strongly - we can make those go through the normal limiting logic just as well.

Member

fair enough - can you add this to the comments?

Contributor Author

Sure, will do.

@@ -87,7 +94,8 @@ type Options struct {
// returned object with a Tracer.
func DefaultOptions() Options {
	return Options{
		ShouldSample:   func(traceID uint64) bool { return traceID%64 == 0 },
		MaxLogsPerSpan: 100,
Member

@RaduBerinde you changed your mind about default being infinite?

Contributor Author

Yes, as per the discussion in #38

}
// Replace the oldest event. This means we are effectively dropping one more
// event.
s.raw.Logs[0].Event = fmt.Sprintf("** dropped %d events **", s.numDroppedLogs+1)
Member

perhaps this makes more sense as a tag than a log

@jmacd (Collaborator) commented Sep 26, 2016

@yurishkuro If it's a tag, then we don't know where the dropped logs span
begins and ends.

On Mon, Sep 26, 2016 at 2:27 PM, Yuri Shkuro (notifications@github.com) wrote:

@yurishkuro commented on this pull request, in span.go (#39 (review)):

    // buffer in place, moving the first pos elements to the back. This
    // algorithm is described in:
    //   http://www.cplusplus.com/reference/algorithm/rotate
    for first, middle, next := 0, pos, pos; first != middle; {
        s.raw.Logs[first], s.raw.Logs[next] = s.raw.Logs[next], s.raw.Logs[first]
        first++
        next++
        if next == len(s.raw.Logs) {
            next = middle
        } else if first == middle {
            middle = next
        }
    }
    // Replace the oldest event. This means we are effectively dropping one more
    // event.
    s.raw.Logs[0].Event = fmt.Sprintf("** dropped %d events **", s.numDroppedLogs+1)

perhaps this makes more sense as a tag than a log



@RaduBerinde (Contributor, Author)

Also, in some cases tags might not be visible everywhere where logs are. A prime example is NetTraceIntegrator which ignores tags (though that should be fixed IMO).

@yurishkuro (Member)

then we don't know where the dropped logs span begins and ends.

hm, if your limit is 10 and you dropped 50 logs, how exactly do you know where the dropped ones begin? My reasoning is that the number of dropped logs is an attribute of the whole span; attaching a timestamp to that information (via a log) does not have a lot of meaning.

@RaduBerinde (Contributor, Author)

I think what was meant was that the message makes it obvious which logs were dropped in relation to the ones which were not dropped. In the current change, the ** dropped ** message comes first so it's clear the older logs were dropped. In the proposal above, it comes in the middle; we definitely need some kind of "separator" between the oldest and the newest logs in that case.

@jmacd (Collaborator) commented Sep 26, 2016

I was thinking that a limit serves two purposes:
(1) limit memory
(2) limit bandwidth
From the service's point of view (and network egress costs) it's useful to have a limit, and I would probably wish for it to be applied to bulk log data as well. Though I suppose if you're using bulk logs, you can limit it yourself.

As for the timestamp, my proposal was that by inserting a log record at the point where logs are dropped, at least the user can see two complete subsections of the log: the beginning up until the drop point, and the drop point to the end. I.e., there are two pieces of information: how many were dropped and which time range was dropped.

@RaduBerinde (Contributor, Author)

Sure, I will apply the limit to the bulk data as well.

Also, is the default limit of 100 ok? (I don't feel strongly about any particular value, I would just avoid setting it too small given that it will change the behavior of current users)

}
// Replace the oldest event. This means we are effectively dropping one more
// event.
s.raw.Logs[0].Event = fmt.Sprintf("** dropped %d events **", s.numDroppedLogs+1)
Contributor

FYI, you have unlucky timing... this is not going to compile at master since the OpenTracing KV logging thing just landed today.

Thankfully, this is a good case for KV logging :)

@RaduBerinde (Contributor, Author)

Are there any objections to @jmacd's proposal to drop the events in the "middle"? If not, I will work on that.

@bhs (Contributor) commented Sep 27, 2016

@RaduBerinde I think @jmacd's "middle-out" (har har har) proposal is the optimal thing if you can spare the time. We did something similar with Dapper's version of this change and it worked well.

@RaduBerinde (Contributor, Author)

Updated to "middle-out" :)

One thing I'm not sure about is how to "format" the "dropped logs" log record (see span.go:214).

@bhs (Contributor) left a comment

NOTE: I didn't even try to review the possible off-by-one cases here. Aside from that caveat, the intent (and the default of 100) LGTM.

numNew := len(s.raw.Logs) - numOld
rotateLogBuffer(s.raw.Logs[numOld:], s.numDroppedLogs%numNew)

// Replace the log in the middle (the earliest "new" log) with information
Contributor

terminology nit: can you do s/earliest/oldest/? It's an awkward term, but the only one that's unambiguous IMO. My brain thinks "early" means "new", though of course in this case they're opposite.

Contributor Author

Sure.

// Keep the timestamp of the last dropped event.
Timestamp: s.raw.Logs[numOld].Timestamp,
Fields: []log.Field{
otlog.String("event", "** dropped events **"),
Contributor

my two cents:

[]log.Field{
    otlog.String("event", "dropped Span logs"),
    otlog.Int("dropped_log_count", s.numDroppedLogs+1),
    otlog.String("component", "basictracer"),
},

The "component" thing is just an idea... I think it'll make the error easier to track down for an unsuspecting tracing system user, though.

Contributor

PS: what's the +1 for here? (I just copied it from you)

Contributor Author

Adding component. The +1 is explained by the comment above: we are overwriting an event, hence dropping one more.

Contributor Author

Moving the +1 right next to the comment.

@RaduBerinde force-pushed the logevent-limit branch 2 times, most recently from fcd1b20 to a67d8c1, on September 27, 2016 at 10:33
@RaduBerinde (Contributor, Author)

I see that Lightstep only shows the value of the event field. Should I include the number of dropped logs in the "event" string (at least for now)?

@bhs (Contributor) commented Sep 27, 2016

@RaduBerinde no, that is a lightstep limitation and it should be fixed on that side.

@bhs (Contributor) commented Sep 27, 2016

This LGTM... I'd like to wait for @yurishkuro and @jmacd to give a 👍 before merging, though.

@@ -7,6 +7,7 @@ import (
opentracing "github.com/opentracing/opentracing-go"
"github.com/opentracing/opentracing-go/ext"
"github.com/opentracing/opentracing-go/log"
otlog "github.com/opentracing/opentracing-go/log"
Member

this is the same as above

Contributor Author

Oops, fixed.

@@ -5,6 +5,7 @@ import (
"reflect"
"testing"

"github.com/cockroachdb/cockroach/util/caller"
@yurishkuro (Member) commented Sep 28, 2016

I'd rather we didn't pull a new dependency purely for a nicer error message that never happens in practice.

Contributor Author

Ah, definitely, didn't mean to do that. goimports is enabling my habits :)

Contributor Author

Switched to runtime.Caller()

@bhs (Contributor) commented Sep 28, 2016

@RaduBerinde you'll need to rebase this one...

@RaduBerinde (Contributor, Author)

Updated. Thanks for the reviews!

@bhs (Contributor) commented Sep 29, 2016

LGTM. I'll give others until tomorrow to raise any other concerns before merging (esp @jmacd since @yurishkuro already approved the change). Thanks again!

The MaxLogsPerSpan option limits the number of logs in a Span. Events are
dropped as necessary; a log in the finished span indicates how many were
dropped.

The default limit is 100.

Fixes opentracing#38.
@bhs merged commit 1b32af2 into opentracing:master on Sep 29, 2016
RaduBerinde added a commit to RaduBerinde/cockroach that referenced this pull request Sep 30, 2016
We added log limits to basictracer in opentracing/basictracer-go#39
This change sets better options for our tracers:
 - drop all logs for spans which we are not recording (those that go to net/trace)
 - increase the default value of the new maxLogsPerSpan setting to 1000

Fixes cockroachdb#9529.
RaduBerinde added a commit to RaduBerinde/cockroach that referenced this pull request Oct 2, 2016
We added log limits to basictracer in opentracing/basictracer-go#39
This change sets better options for our tracers:
 - drop all logs for spans which we are not recording (those that go to net/trace)
 - increase the default value of the new maxLogsPerSpan setting to 1000

Fixes cockroachdb#9529.