tracing support #227
Conversation
This seems too "not invented here".
(we discussed this on a call but I misunderstood some important context) If we were trying to trace across machines, I'd understand maybe doing this ourselves. Given that we're just tracing events locally, this seems way over the top. Why not just use Zap? If the answer is "performance", have you tested that? I really doubt logging is going to be a bottleneck given encrypted transports, signing, verifying, etc.
Unless the concern is bandwidth. That's the one thing this buys us over a general-purpose logger like zap.
trace.go (Outdated)

// Generic event tracer interface
type EventTracer interface {
	Trace(evt interface{})
pb.Message?
Yeah, we could do that to avoid the cast.
Done, it takes a *pb.TraceEvent now.
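For reference, a minimal sketch of what the revised interface might look like after this change (the protobuf import path is an assumption):

```go
package pubsub

import (
	pb "github.com/libp2p/go-libp2p-pubsub/pb"
)

// EventTracer is the generic event tracer interface. Taking the concrete
// *pb.TraceEvent instead of interface{} means implementations no longer
// need to type-assert the event.
type EventTracer interface {
	Trace(evt *pb.TraceEvent)
}
```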
Ok, I've discussed this with raul and looked at it a bit more. This still feels like a lot of code for something so simple, but I understand why we're using protobufs (gRPC & bandwidth).
Yes, we are trying to trace across machines, it's just that only the local tracer has been written so far.
I'm talking about including trace IDs in messages. However, message IDs should give us that anyways.
The peer ID of the originating peer is included in the message.
Rebased on master.
forgotten!
After my initial "why are we doing this" reaction, this LGTM (modulo a few questions/nits).
}

-func (fs *FloodSubRouter) AddPeer(peer.ID, protocol.ID) {}
+func (fs *FloodSubRouter) AddPeer(p peer.ID, proto protocol.ID) {
+	fs.tracer.AddPeer(p, proto)
Should this be in the router or in the part that calls AddPeer?
We could have it in the control substrate; it just felt more natural to do it in the router.
Other than that, there is no particular reason for the choice.
-func (fs *FloodSubRouter) RemovePeer(peer.ID) {}
+func (fs *FloodSubRouter) RemovePeer(p peer.ID) {
+	fs.tracer.RemovePeer(p)
ditto.
Same, it could be in either place; it just felt more natural to do it in the router.
@@ -0,0 +1,290 @@
package pubsub |
Double checking, this is moving to a new repo?
I think we can keep this file here, so that we have a one-stop construction of the pubsub system without requiring an external dependency.
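To illustrate the "one-stop construction" point, here is a hedged sketch of how a consumer might wire a tracer into pubsub without any external dependency. The constructor and option names (NewJSONTracer, WithEventTracer) are assumptions based on this PR's scaffolding, and the libp2p host constructor signature may differ across versions:

```go
package main

import (
	"context"

	libp2p "github.com/libp2p/go-libp2p"
	pubsub "github.com/libp2p/go-libp2p-pubsub"
)

func main() {
	ctx := context.Background()

	// construct a libp2p host (signature may vary by libp2p version)
	h, err := libp2p.New(ctx)
	if err != nil {
		panic(err)
	}

	// write trace events to a local JSON file; constructor name assumed
	tracer, err := pubsub.NewJSONTracer("/tmp/pubsub-trace.json")
	if err != nil {
		panic(err)
	}

	// option name assumed from this PR's scaffolding
	ps, err := pubsub.NewGossipSub(ctx, h, pubsub.WithEventTracer(tracer))
	if err != nil {
		panic(err)
	}
	_ = ps
}
```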
Don't accumulate memory if the tracer is being slow or unavailable, just drop the trace and log.
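A rough sketch of the lossy behaviour that commit describes, assuming a basicTracer with a mutex-protected buffer and a signalling channel (the TraceBufferSize constant, field names, and import path are assumptions): when the remote collector is slow or unreachable, the buffer stops growing and events are dropped with a log line instead.

```go
package pubsub

import (
	"log"
	"sync"

	pb "github.com/libp2p/go-libp2p-pubsub/pb"
)

// TraceBufferSize caps the number of buffered events for lossy tracers
// (the exact value here is an assumption).
const TraceBufferSize = 1 << 16

type basicTracer struct {
	ch    chan struct{}
	mx    sync.Mutex
	buf   []*pb.TraceEvent
	lossy bool
}

// Trace buffers an event for the background writer. If the tracer is lossy
// and the buffer is already full, the event is dropped and logged rather
// than accumulating unbounded memory.
func (t *basicTracer) Trace(evt *pb.TraceEvent) {
	t.mx.Lock()
	if t.lossy && len(t.buf) > TraceBufferSize {
		log.Println("trace buffer overflow; dropping trace event")
	} else {
		t.buf = append(t.buf, evt)
	}
	t.mx.Unlock()

	// wake the writer goroutine without blocking if it is already signalled
	select {
	case t.ch <- struct{}{}:
	default:
	}
}
```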
tracer.go (Outdated)

}

	// wait a bit to accumulate a batch
	time.Sleep(time.Second)
We could end up with a pretty large buffer this way and/or drop messages. Is there a better way to do this?
We could check the size of the buffer and sleep only if the batch is smaller than a threshold.
We can also reduce the sleep time and poll a few times, say up to a second.
I added a check for the size of the batch, and polling every 100ms (for up to 1s) in order to accumulate the buffer in a safer way.
This avoids holding on to memory while we are waiting.
Instead check the batch size and poll every 100ms (up to 1s) until the minimum batch size is accumulated.
That way we don't have to connect every time we open the stream.
nits but LGTM
tracer.go (Outdated)

-	tr := &RemoteTracer{ctx: ctx, host: host, pi: pi, basicTracer: basicTracer{ch: make(chan struct{}, 1), lossy: true}}
+	tr := &RemoteTracer{ctx: ctx, host: host, peer: pi.ID, basicTracer: basicTracer{ch: make(chan struct{}, 1), lossy: true}}
+	for _, addr := range pi.Addrs {
+		host.Peerstore().AddAddr(pi.ID, addr, peerstore.PermanentAddrTTL)
host.Peerstore().AddAddrs(pi.ID, pi.Addrs, ...)?
(we can do them all at once)
Ah, good point; will do.
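For completeness, the single-call form the reviewer suggested might look like this (the helper name and import paths are assumptions):

```go
package pubsub

import (
	"github.com/libp2p/go-libp2p-core/host"
	"github.com/libp2p/go-libp2p-core/peer"
	"github.com/libp2p/go-libp2p-core/peerstore"
)

// addTracerAddrs records all of the remote tracer's addresses in one call,
// instead of looping over pi.Addrs with AddAddr. (hypothetical helper)
func addTracerAddrs(h host.Host, pi peer.AddrInfo) {
	h.Peerstore().AddAddrs(pi.ID, pi.Addrs, peerstore.PermanentAddrTTL)
}
```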
	t.mx.Unlock()
	time.Sleep(100 * time.Millisecond)
	goto again
}
for loop?
t.mx.Lock()
for len(t.buf) < MinTraceBatchSize && time.Now().Before(deadline) {
	t.mx.Unlock()
	time.Sleep(100 * time.Millisecond)
	t.mx.Lock()
}
The allergy to gotos strikes again! Ok, I'll write it as a for loop.
Hey, would’ve liked to review this properly before it was merged to master :-( Didn’t get a chance due to other things, but I’d have appreciated a mention and a heads-up warning that the merge train was leaving.
@raulk you were extremely busy, so I didn't want to burden you.
Currently only the tracing scaffolding, but this should be enough for an initial review pass. TBD: