Conversation
The check “commandQueue.size() == maxQueueSize” is not thread safe; this leads to: java.lang.IllegalStateException: Queue full. It’s better to call .offer() and check the returned value.
When the command queue is full it’s not possible to queue control commands - FlushCommand or CloseCommand. They were throwing exceptions. This change replaces BlockingQueue.add with BlockingQueue.offer to handle this situation. I’ve also modified the test InMemorySender to allow it to block, which makes it easy to fill the RemoteReporter’s command queue in tests.
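For illustration, here is a minimal, self-contained sketch of the difference between the racy size-check/add pattern and the offer-based one described above. The class and method names are made up; only commandQueue, maxQueueSize and the add/offer calls come from the discussion.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Sketch only: a plain Object stands in for the real AppendCommand.
class OfferVsAddSketch {
  static final int maxQueueSize = 100;
  static final BlockingQueue<Object> commandQueue = new ArrayBlockingQueue<>(maxQueueSize);

  // Racy version: another thread can enqueue between size() and add(),
  // so add() may still throw IllegalStateException("Queue full").
  static void reportRacy(Object appendCommand) {
    if (commandQueue.size() == maxQueueSize) {
      return; // drop
    }
    commandQueue.add(appendCommand);
  }

  // offer() checks capacity and inserts atomically; it returns false
  // instead of throwing when the bounded queue is full.
  static void reportSafe(Object appendCommand) {
    if (!commandQueue.offer(appendCommand)) {
      // dropped: the real reporter would bump a "dropped spans" counter here
    }
  }
}
```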
@@ -62,7 +68,7 @@ public void setUp() throws Exception {
   public void testRemoteReporterReport() throws Exception {
     Span span = (Span) tracer.buildSpan("raza").start();
     reporter.report(span);
-    Thread.sleep(5);
+    Thread.sleep(50);
I increased it because it was failing every time on my laptop. There's already a PR to make it right: #177
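As a side note, a deadline-based poll is one way to avoid the fixed sleep altogether; this is only a hedged sketch and not necessarily what #177 does, and getAppended() is assumed here purely for illustration.

```java
import java.util.function.BooleanSupplier;

// Hypothetical test helper: wait for a condition with a deadline instead of a
// fixed Thread.sleep(), so the test is fast on quick machines and still
// tolerant on slow ones.
final class AwaitSketch {
  static void awaitCondition(BooleanSupplier condition, long timeoutMillis)
      throws InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMillis;
    while (!condition.getAsBoolean()) {
      if (System.currentTimeMillis() > deadline) {
        throw new AssertionError("condition not met within " + timeoutMillis + " ms");
      }
      Thread.sleep(1);
    }
  }
}

// usage in the test instead of Thread.sleep(50):
// AwaitSketch.awaitCondition(() -> sender.getAppended().size() == 1, 1000);
```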
Codecov Report
@@ Coverage Diff @@
## master #180 +/- ##
===========================================
+ Coverage 80.85% 81.45% +0.6%
- Complexity 459 463 +4
===========================================
Files 77 77
Lines 1786 1790 +4
Branches 208 210 +2
===========================================
+ Hits 1444 1458 +14
+ Misses 252 244 -8
+ Partials 90 88 -2
Continue to review full report at Codecov.
/**
 * InMemorySender "appends" span only earlier "append" was permitted. Appending is a blocking
did you mean "only if earlier"?
Actually I'd rather change it to:
Sender which stores spans in memory. Appending a new span is a blocking operation unless "permitted". By default Integer.MAX_VALUE "appends" are permitted.
try {
  // best-effort: if we can't add CloseCommand in this time then it probably will never happen
  closeCommandAdded = commandQueue
      .offer(new CloseCommand(), CLOSE_ENQUEUE_TIMEOUT_MILLIS, TimeUnit.MILLISECONDS);
Makes sense. I would only increase the timeout to 1 sec.
done
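With the 1-second timeout applied, the resulting close() presumably looks roughly like the sketch below. CLOSE_ENQUEUE_TIMEOUT_MILLIS and CloseCommand appear in the diff above; the surrounding error handling is an assumption, not the exact merged code.

```java
@Override
public void close() {
  try {
    // best-effort: if CloseCommand can't be enqueued within the timeout
    // (raised to 1 second per the review), it probably never will be
    boolean closeCommandAdded =
        commandQueue.offer(new CloseCommand(), CLOSE_ENQUEUE_TIMEOUT_MILLIS, TimeUnit.MILLISECONDS);
    if (!closeCommandAdded) {
      // the queue stayed full: give up and leave the QueueProcessor thread
      // running instead of throwing or blocking the caller indefinitely
    }
  } catch (InterruptedException e) {
    Thread.currentThread().interrupt(); // offer() with a timeout is interruptible
  }
}
```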
/**
 * InMemorySender "appends" span only if the "append" was permitted. Otherwise append blocks
 * until it is permitted.
this comment is not needed here
}

// When: at this point the queue is full or there is one slot empty (if the worker thread has
// already picked up some command). We add two spans to make sure that we overfill the queue
if the workers are still reading from the queue, then it looks like you have a race condition and no guarantee that the next two commands would actually cause a queue overflow
That's why I changed InMemorySender to become blocking:

// change sender to blocking mode
sender.permitAppend(0);

Thanks to this, nothing is processing the queue - QueueProcessor picks up only 1 AppendCommand and blocks on the append call.
So this test:
1. adds maxQueueSize spans to the queue
2. QueueProcessor takes 1 command and blocks
3. adds an additional span to make the queue full
4. adds one more span, which is dropped without throwing an exception

Point 2 sometimes might not happen (due to scheduling) - then both spans from points 3 and 4 are dropped.
I tried to come up with a simple way of testing the various cases without the possibility of race conditions - adding that semaphore/permits mechanism to InMemorySender was the simplest solution I could find.
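For reference, a minimal sketch of the permit mechanism described above. Only permitAppend and the blocking behaviour come from the discussion; the class name, field names and the rest of the code are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Semaphore;

// Sketch of the blocking test sender: append() must acquire a permit, so
// permitAppend(0) makes every subsequent append block until the test hands
// out more permits. By default appends are effectively unlimited.
class BlockingInMemorySenderSketch {
  private final List<Object> appended = new ArrayList<>();
  private Semaphore permits = new Semaphore(Integer.MAX_VALUE);

  public void permitAppend(int count) {
    permits = new Semaphore(count);
  }

  public int append(Object span) throws InterruptedException {
    permits.acquire(); // blocks while no appends are permitted
    synchronized (appended) {
      appended.add(span);
    }
    return 0; // nothing flushed as part of this append
  }
}
```

With permits set to 0, the QueueProcessor blocks inside append() and the test controls exactly how many commands can leave the queue.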
When the command queue is full, further FlushCommands are not enqueued and the queue length metric does not update. I moved the metric update outside the command - now it updates whenever the scheduled flush() executes.
 }

 @Override
 public void close() {
-  commandQueue.add(new CloseCommand());
+  boolean closeCommandAdded = false;
+  try {
was just having a second look - why do we need a separate try/catch? Seems like you can just call offer in L102
You're right - removed
Thanks, @mabn
This PR:

Currently when RemoteReporter's queue is full, all 3 operations (append, close, flush) throw an exception: java.lang.IllegalStateException: Queue full.

I've changed all 3 to use offer instead, which does not block or throw. It's clearly a good choice for append and flush, but I'm not sure if in the case of close this is the best approach. I made close wait for a short time (100ms) and then just ignore the QueueProcessor thread, but maybe it should throw to indicate that it failed? Or block indefinitely?

There's one more issue I just spotted - updating the "reporterQueueLength" metric happens inside FlushCommand, but when the queue is full this command is almost never queued. E.g. when it has to compete with 99 AppendCommands it will run only once in 100 attempts. This would explain the weird plot I had when max size was set to 100.

To fix that I moved the reporterQueueLength update to the flush() method triggered by the TimerTask.
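A hedged sketch of that metric change: only the reporterQueueLength name, FlushCommand and the TimerTask-driven flush() come from the description above; the gauge API and field names are assumed for illustration.

```java
// Runs on every TimerTask tick: update the queue-length metric here, so it
// keeps reporting even when the queue is full and no FlushCommand can be
// enqueued.
void flush() {
  metrics.reporterQueueLength.update(commandQueue.size()); // gauge API assumed

  // best-effort: if the queue has no room, skip this flush request silently
  commandQueue.offer(new FlushCommand());
}
```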