tracing support #227
Conversation
This seems too "not invented here".
(we discussed this on a call but I misunderstood some important context) If we were trying to trace across machines, I'd understand maybe doing this ourselves. Given that we're just tracing events locally, this seems way over the top. Why not just use Zap? If the answer is "performance", have you tested that? I really doubt logging is going to be a bottleneck given encrypted transports, signing, verifying, etc.
Unless the concern is bandwidth. That's the one thing this buys us over a general-purpose logger like zap.
trace.go (Outdated)

// Generic event tracer interface
type EventTracer interface {
	Trace(evt interface{})
pb.Message?
Yeah, we could do that to avoid the cast.
Done, it takes a *pb.TraceEvent now.
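For reference, a minimal sketch of what the revised interface might look like after this change (the protobuf import path is an assumption):

```go
package pubsub

import (
	pb "github.com/libp2p/go-libp2p-pubsub/pb"
)

// EventTracer is the generic event tracer interface. Taking the concrete
// *pb.TraceEvent instead of interface{} means implementations no longer
// need to type-assert the event.
type EventTracer interface {
	Trace(evt *pb.TraceEvent)
}
```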
Ok, I've discussed this with raul and looked at it a bit more. This still feels like a lot of code for something so simple, but I understand why we're using protobufs (gRPC & bandwidth).
Yes, we are trying to trace across machines, it's just that only the local tracer has been written so far.
I'm talking about including trace IDs in messages. However, message IDs should give us that anyways.
The peer ID of the originating peer is included in the message.
Rebased on master.
forgotten!
After my initial "why are we doing this" reaction, this LGTM (modulo a few questions/nits).
}

-func (fs *FloodSubRouter) AddPeer(peer.ID, protocol.ID) {}
+func (fs *FloodSubRouter) AddPeer(p peer.ID, proto protocol.ID) {
+	fs.tracer.AddPeer(p, proto)
Should this be in the router or in the part that calls AddPeer?
We could have it in the control substrate; it just felt more natural to do it in the router.
Other than that, there is no particular reason for the choice.
-func (fs *FloodSubRouter) RemovePeer(peer.ID) {}
+func (fs *FloodSubRouter) RemovePeer(p peer.ID) {
+	fs.tracer.RemovePeer(p)
ditto.
Same, it could be in either place; it just felt more natural to do it in the router.
@@ -0,0 +1,290 @@
package pubsub |
Double checking, this is moving to a new repo?
I think we can keep this file here, so that we have a one-stop construction of the pubsub system without requiring an external dependency.
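To illustrate the "one-stop construction" point, here is a hedged sketch of how a consumer might wire a tracer into pubsub without any external dependency. The constructor and option names (NewJSONTracer, WithEventTracer) are assumptions based on this PR's scaffolding, and the libp2p host constructor signature may differ across versions:

```go
package main

import (
	"context"

	libp2p "github.com/libp2p/go-libp2p"
	pubsub "github.com/libp2p/go-libp2p-pubsub"
)

func main() {
	ctx := context.Background()

	// construct a libp2p host (signature may vary by libp2p version)
	h, err := libp2p.New(ctx)
	if err != nil {
		panic(err)
	}

	// write trace events to a local JSON file; constructor name assumed
	tracer, err := pubsub.NewJSONTracer("/tmp/pubsub-trace.json")
	if err != nil {
		panic(err)
	}

	// option name assumed from this PR's scaffolding
	ps, err := pubsub.NewGossipSub(ctx, h, pubsub.WithEventTracer(tracer))
	if err != nil {
		panic(err)
	}
	_ = ps
}
```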
Don't accumulate memory if the tracer is being slow or unavailable, just drop the trace and log.
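A rough sketch of the lossy behaviour that commit describes, assuming a basicTracer with a mutex-protected buffer and a signalling channel (the TraceBufferSize constant, field names, and import path are assumptions): when the remote collector is slow or unreachable, the buffer stops growing and events are dropped with a log line instead.

```go
package pubsub

import (
	"log"
	"sync"

	pb "github.com/libp2p/go-libp2p-pubsub/pb"
)

// TraceBufferSize caps the number of buffered events for lossy tracers
// (the exact value here is an assumption).
const TraceBufferSize = 1 << 16

type basicTracer struct {
	ch    chan struct{}
	mx    sync.Mutex
	buf   []*pb.TraceEvent
	lossy bool
}

// Trace buffers an event for the background writer. If the tracer is lossy
// and the buffer is already full, the event is dropped and logged rather
// than accumulating unbounded memory.
func (t *basicTracer) Trace(evt *pb.TraceEvent) {
	t.mx.Lock()
	if t.lossy && len(t.buf) > TraceBufferSize {
		log.Println("trace buffer overflow; dropping trace event")
	} else {
		t.buf = append(t.buf, evt)
	}
	t.mx.Unlock()

	// wake the writer goroutine without blocking if it is already signalled
	select {
	case t.ch <- struct{}{}:
	default:
	}
}
```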
tracer.go (Outdated)

}

	// wait a bit to accumulate a batch
	time.Sleep(time.Second)
We could end up with a pretty large buffer this way and/or drop messages. Is there a better way to do this?
We could check the size of the buffer and sleep only if the batch is smaller than a threshold.
We can also reduce the sleep time and poll a few times, say up to a second.
I added a check for the size of the batch, and polling every 100ms (for up to 1s) in order to accumulate the buffer in a safer way.
This avoids holding on to memory while we are waiting.
Instead check the batch size and poll every 100ms (up to 1s) until the minimum batch size is accumulated.
That way we don't have to connect every time we open the stream.
nits but LGTM
tracer.go (Outdated)

-	tr := &RemoteTracer{ctx: ctx, host: host, pi: pi, basicTracer: basicTracer{ch: make(chan struct{}, 1), lossy: true}}
+	tr := &RemoteTracer{ctx: ctx, host: host, peer: pi.ID, basicTracer: basicTracer{ch: make(chan struct{}, 1), lossy: true}}
+	for _, addr := range pi.Addrs {
+		host.Peerstore().AddAddr(pi.ID, addr, peerstore.PermanentAddrTTL)
host.Peerstore().AddAddrs(pi.ID, pi.Addrs, ...)?
(we can do them all at once)
Ah, good point; will do.
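For completeness, the single-call form the reviewer suggested might look like this (the helper name and import paths are assumptions):

```go
package pubsub

import (
	"github.com/libp2p/go-libp2p-core/host"
	"github.com/libp2p/go-libp2p-core/peer"
	"github.com/libp2p/go-libp2p-core/peerstore"
)

// addTracerAddrs records all of the remote tracer's addresses in one call,
// instead of looping over pi.Addrs with AddAddr. (hypothetical helper)
func addTracerAddrs(h host.Host, pi peer.AddrInfo) {
	h.Peerstore().AddAddrs(pi.ID, pi.Addrs, peerstore.PermanentAddrTTL)
}
```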
	t.mx.Unlock()
	time.Sleep(100 * time.Millisecond)
	goto again
}
for loop?
t.mx.Lock()
for len(t.buf) < MinTraceBatchSize && time.Now().Before(deadline) {
	t.mx.Unlock()
	time.Sleep(100 * time.Millisecond)
	t.mx.Lock()
}
The allergy to gotos strikes again! Ok, I'll write it as a for loop.
Hey, would’ve liked to review this properly before it was merged to master :-( Didn’t get a chance due to other things, but I’d have appreciated a mention and a heads-up warning that the merge train was leaving.
@raulk you were extremely busy, so I didn't want to burden you.
Currently only the tracing scaffolding, but this should be enough for an initial review pass. TBD: