Conversation
This feature didn't work properly, which caused errors and data overwriting before spans were sent.

```
WARNING: DATA RACE
Write at 0x00c4202223d8 by goroutine 18:
  github.com/uber/jaeger-client-go.(*Tracer).newSpan()
      github.com/uber/jaeger-client-go/tracer.go:357 +0xff
  github.com/uber/jaeger-client-go.(*Tracer).startSpanWithOptions()
      github.com/uber/jaeger-client-go/tracer.go:289 +0xbdd
  github.com/uber/jaeger-client-go.(*Tracer).StartSpan()
      github.com/uber/jaeger-client-go/tracer.go:200 +0x17d
  github.com/uber/jaeger-client-go.TestRemoteReporterAppendWithPollAllocator()
```

Signed-off-by: Dmitry Ponomarev <demdxx@gmail.com>
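For illustration, a minimal sketch of the ordering that triggers the race described above (the function and operation names are made up; only the sequence matters, and it assumes the `opentracing-go` import already used by the package):

```go
// raceSketch shows the ordering behind the data race when span pooling
// is enabled: Finish() hands the span to the reporter's async queue and,
// before this fix, the span went back to the pool right away, so the
// next StartSpan could reuse and overwrite the same object while the
// reporter was still serializing it.
func raceSketch(tracer opentracing.Tracer) {
	span := tracer.StartSpan("operation-a")
	span.Finish() // queued for async reporting and (prematurely) recycled

	other := tracer.StartSpan("operation-b") // may receive the recycled *Span
	other.Finish()
}
```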
Codecov Report
```diff
@@            Coverage Diff             @@
##           master     #381      +/-   ##
==========================================
+ Coverage    87.8%   88.27%   +0.47%
==========================================
  Files          54       55       +1
  Lines        3033     3061      +28
==========================================
+ Hits         2663     2702      +39
+ Misses        264      255       -9
+ Partials      106      104       -2
```
Continue to review full report at Codecov.
Thanks for the PR. I think it's a good direction, but I am bothered by so many changes to the tests adding Retain/Release: I don't understand why they are necessary (given that the default allocator is not pooling), and they make the tests harder to read.
```go
tracer.options.poolSpans = poolSpans
if poolSpans {
	tracer.spanAllocator = newSpanSyncPool()
} else {
```
`else` is not needed; the constructor defaults to this anyway.
I tried to preserve the current functionality of `PoolSpans`; otherwise we would need checks inside the option that the allocators are the same, which is more complicated. So I believe the current overhead is easier to understand. If `PoolSpans()` is called several times it won't be a big problem.
@yurishkuro I tried not to break the logic that you have right now. There were two options on how to fix it. After adding it, I will add another test with the Span lifecycle into span_test.go; also, I'm open to suggestions on how this can be solved.
I understand that introducing life cycle management for spans will require changes in the reporters. When a reporter receives a span, it needs to call Retain() and then Release() at the end of async processing. In the case of RemoteReporter, the span can be released after converting it to Thrift. In the case of the in-memory reporter, the span can be released after it is purged from the cache. But the latter should not have any effect on the unit tests, since they rely on the fact that the span is being retained by the in-memory reporter.
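A rough sketch of that flow, using the Retain/Release methods proposed in this PR (the reporter type and its `send` hook are hypothetical, not the actual RemoteReporter code):

```go
// asyncReporter retains the span on receipt so it cannot be recycled
// while it waits in the queue, and releases it only after the span's
// data has been copied out (e.g. converted to Thrift).
type asyncReporter struct {
	queue chan *Span
	send  func(*Span) // serializes and ships the span's data
}

func (r *asyncReporter) Report(span *Span) {
	span.Retain() // hold a reference for the duration of async processing
	select {
	case r.queue <- span:
	default:
		span.Release() // queue full: drop the span and give the reference back
	}
}

func (r *asyncReporter) processQueue() {
	for span := range r.queue {
		r.send(span)   // copy the data out while we still own a reference
		span.Release() // the span may now return to the allocator
	}
}
```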
BTW I am extremely interested in this change. Could you elaborate on your performance goals?
Ok then, I can use
Of course, there is no big overhead, especially because we will use custom
See the pull comments jaegertracing#381 (comment) Signed-off-by: Dmitry Ponomarev <demdxx@gmail.com>
@yurishkuro check this out: demdxx@6b2d873
@demdxx the reason I am concerned with Retain/Release in the tests is that the tests often represent usage by an external user, who interacts with the Jaeger lib via the OpenTracing API, which does not expose Retain/Release.
It only becomes invalid if you pool the span, which most of the tests don't do. The default mode is not to pool spans.
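For reference, pooling only kicks in when it is enabled explicitly on the tracer; this mirrors the option used in the benchmark further down (the service name here is made up, and the sampler/reporter choice is just illustrative):

```go
// Without TracerOptions.PoolSpans(true) the tracer keeps the default,
// non-pooling allocator and finished spans are simply garbage collected.
tracer, closer := NewTracer(
	"example-service",
	NewConstSampler(true),
	NewNullReporter(),
	TracerOptions.PoolSpans(true), // opt in to the sync.Pool-backed allocator
)
defer closer.Close()
```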
* TestRemoteReporterAppendWithPollAllocator => TestRemoteReporterAppendWithPoolAllocator
* Revert tests related to span allocation Retain/Release functions
* See the pull comments jaegertracing#381 (comment)
* Revert the tests to make them work by original logic
* Revert transport and tracer tests

Signed-off-by: Dmitry Ponomarev <demdxx@gmail.com>
reporter.go (Outdated)
```diff
@@ -122,7 +123,14 @@ func (r *InMemoryReporter) GetSpans() []opentracing.Span {
func (r *InMemoryReporter) Reset() {
	r.lock.Lock()
	defer r.lock.Unlock()
	r.spans = nil

	if len(r.spans) > 0 {
```
unnecessary if(), just run the loop
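i.e. something like the following (whether the slice holds `opentracing.Span` values and releases them here is an assumption based on the diff and the Retain/Release API from this PR):

```go
func (r *InMemoryReporter) Reset() {
	r.lock.Lock()
	defer r.lock.Unlock()

	// Ranging over a nil or empty slice is a no-op, so no length check is needed.
	for _, span := range r.spans {
		span.(*Span).Release() // let pooled spans return to the allocator
	}
	r.spans = nil
}
```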
reporter_test.go (Outdated)
```diff
@@ -301,6 +312,10 @@ func (s *fakeSender) Append(span *Span) (int, error) {
	s.mutex.Lock()
	defer s.mutex.Unlock()

	// Validation of span
	if span.tracer == nil {
```
when can this happen? Maybe better to panic?
span.go (Outdated)
```go
// retainCounter used to increase the lifetime of
// the object before return it into the pool.
retainCounter int32
```
atomic vars must be placed at the top of the struct to ensure word alignment
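Roughly along these lines (the field layout and comment are placeholders, not the exact PR diff):

```go
type Span struct {
	// retainCounter is accessed with sync/atomic, so it is kept as the
	// first field: per the sync/atomic docs, 64-bit atomic operations
	// require 64-bit alignment, which is only guaranteed for the first
	// word of an allocated struct on 32-bit platforms.
	retainCounter int32

	// ... the remaining (mutex-protected) fields follow ...
}
```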
span.go (Outdated)
```diff
@@ -225,18 +234,66 @@ func (s *Span) OperationName() string {
	return s.operationName
}

// Retain increases object counter to increase the lifetime of the object
func (s *Span) Retain(count ...int32) *Span {
```
is the vararg argument really necessary?
No, it was needed for the first commit's implementation. Thank you, I will remove it.
span.go (Outdated)
```go
}

if atomic.AddInt32(&s.retainCounter, counter) < 0 {
	if tr := s.tracer; tr != nil {
```
This looks like a race condition to me: the counter can go up after the check.
Also, why do we need to be checking for `s.tracer`? It seems like it's trying to compensate for some other issue. I think returning to the pool should be based purely on the ref counting.
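A sketch of what a purely ref-count-based release could look like (the method bodies are illustrative, not the final PR code; `spanAllocator` is the tracer field added in this PR and `retainCounter` is the field from the diff above):

```go
// The span starts with one implicit reference held by its creator, so
// it is returned to the allocator only when the counter drops below
// zero; there is no separate check after the atomic decrement, which is
// what made the original version racy.
func (s *Span) Retain() *Span {
	atomic.AddInt32(&s.retainCounter, 1)
	return s
}

func (s *Span) Release() {
	if atomic.AddInt32(&s.retainCounter, -1) == -1 {
		s.tracer.spanAllocator.Put(s)
	}
}
```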
span.go (Outdated)
```diff
@@ -174,6 +179,9 @@ func (s *Span) BaggageItem(key string) string {
}

// Finish implements opentracing.Span API
// After finishing of the Span object it returns back to the allocator
```
unless the reporter retains it again
span_allocator.go (Outdated)
```go
	IsPool() bool
}

type spanSyncPool struct {
```
syncPollSpanAllocator
span_allocator.go (Outdated)
```go
	return true
}

type spanSimpleAllocator struct{}
```
simpleSpanAllocator
```go
func (pool spanSimpleAllocator) Put(span *Span) {
	// https://github.com/jaegertracing/jaeger-client-go/pull/381#issuecomment-475904351
	// span.reset()
```
not sure what the link is trying to say. I would add a comment // since finished spans are not reused, no need to reset them
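i.e. a minimal sketch of that allocator with the suggested comment in place (using the `simpleSpanAllocator` name proposed above; not the exact PR code):

```go
type simpleSpanAllocator struct{}

// Get always hands out a fresh span; nothing is ever pooled.
func (simpleSpanAllocator) Get() *Span {
	return &Span{}
}

// Put is effectively a no-op:
// since finished spans are not reused, no need to reset them.
func (simpleSpanAllocator) Put(*Span) {}
```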
span_allocator.go (Outdated)
```go
type SpanAllocator interface {
	Get() *Span
	Put(*Span)
	IsPool() bool
```
why is this needed? It's not used in any biz logic, only in the test, which can just check for type.
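For example, the test could assert on the concrete allocator type instead (the pointer receiver and the `syncPollSpanAllocator` name follow the review suggestions above and are assumptions about the final code; it also assumes the testify `assert` package already used in the tests):

```go
// No IsPool() on the interface: inspect the allocator's concrete type.
_, pooled := tracer.spanAllocator.(*syncPollSpanAllocator)
assert.True(t, pooled, "expected the pooling allocator when PoolSpans is enabled")
```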
Agree
* TestRemoteReporterAppendWithPollAllocator => TestRemoteReporterAppendWithPoolAllocator
* Revert tests related to span allocation Retain/Release functions
* See the pull comments jaegertracing#381 (comment)
* Revert the tests to make them work by original logic
* Revert transport and tracer tests
* Fixes after code review jaegertracing#381 (review)

Signed-off-by: Dmitry Ponomarev <demdxx@gmail.com>
@yurishkuro can we merge the PR?

```go
func BenchmarkSpanCommon(b *testing.B) {
	service := "DOOP"
	tracer, closer := NewTracer(service, NewConstSampler(true), NewNullReporter(), TracerOptions.PoolSpans(true))
	defer closer.Close()
	sub := func(name string, span opentracing.Span, cb func(span opentracing.Span)) {
		subSpan := tracer.StartSpan(name, opentracing.ChildOf(span.Context()))
		if cb != nil {
			cb(subSpan)
		}
		subSpan.Finish()
	}
	b.ReportAllocs()
	b.ResetTimer()
	b.RunParallel(func(bp *testing.PB) {
		for bp.Next() {
			var wg sync.WaitGroup
			span := tracer.StartSpan("base")
			wg.Add(1)
			go sub("sub1", span, func(span1 opentracing.Span) {
				span1.SetTag("level", 1)
				span1.LogEvent("enter.sub1")
				span1.SetOperationName("sub1")
				// span1.(*Span).Retain()
				go sub("sub1.1", span1, func(span11 opentracing.Span) {
					span11.SetTag("level", 2)
					// span1.(*Span).Release()
					wg.Done()
				})
			})
			wg.Add(2)
			go sub("sub2", span, func(span2 opentracing.Span) {
				span2.LogEvent("enter.sub2")
				span2.SetBaggageItem("baggage2", "v1")
				sub("sub2.1", span2, func(span21 opentracing.Span) {
					span21.SetTag("level", 2)
					wg.Done()
				})
				sub("sub2.2", span2, func(span22 opentracing.Span) {
					span22.SetTag("level", 2)
					wg.Done()
				})
			})
			wg.Add(3)
			go sub("sub3", span, func(span3 opentracing.Span) {
				span3.LogEvent("enter.sub3")
				span3.SetBaggageItem("baggage3", "v3")
				// span3.(*Span).Retain()
				go sub("sub3.1", span3, func(span31 opentracing.Span) {
					// span3.(*Span).Release()
					span31.SetTag("level", 2)
					sub("sub3.1.1", span31, func(span311 opentracing.Span) {
						span311.SetTag("level", 3)
						wg.Done()
					})
				})
				sub("sub3.2", span3, func(span32 opentracing.Span) {
					span32.LogEvent("enter.sub3.2")
					wg.Done()
				})
				sub("sub3.3", span3, func(span33 opentracing.Span) {
					span33.LogEvent("enter.sub3.3")
					wg.Done()
				})
			})
			wg.Wait()
			span.Finish()
		}
	})
}
```
LGTM. Sorry for the delay.
Which problem is this PR solving?
Allocation in the pool didn't work properly because the tracer tried to reuse the span object before processing had finished, which caused errors and data overwriting before spans were sent.
Short description of the changes
* `.Retain()` and `.Release()` methods.
* `remoteReporter`, which became the reason for racing in the span allocations.