Skip to content
This repository has been archived by the owner on Jul 1, 2022. It is now read-only.

Handle full queue in RemoteReporter #180

Merged
merged 7 commits into from
May 14, 2017

Conversation

mabn
Copy link
Contributor

@mabn mabn commented May 13, 2017

This PR:

Currently when RemoteReporter's queue is full all 3 operations (append, close, flush) throw the exception:

java.lang.IllegalStateException: Queue full
at java.util.AbstractQueue.add(AbstractQueue.java:98)
at java.util.concurrent.ArrayBlockingQueue.add(ArrayBlockingQueue.java:312)

I've changed all 3 to use offer instead which does not block/throw. It's a clearly good choice for appendand flush, but I'm not sure if in case of close this is the best approach. I made close wait for a short time (100ms) and then just ignore the QueueProcessor thread, but maybe it should throw to indicate that it failed? Or block indefinitely?

There's one more issue I just spotted - updating "reporterQueueLength" metric happens inside FlushCommand - but when the queue is full this command is almost never queued. E.g. when it has compete for a flow with 99 AppendCommands then it will run only once in 100 attempts.
This would explain the weird plot I have when max size was set to 100
Alt text

To fix that I moved reporterQueueLength update to the flush() method triggered by the TimerTask

Marcin Biegan added 2 commits May 13, 2017 17:39
The check “commandQueue.size() == maxQueueSize” is not thread safe, this lead to:
java.lang.IllegalStateException: Queue full

It’s better to .offer() and check the returned value
When the command queue is full it’s not possible to queue control
commands - FlushCommand or CloseCommand. They were throwing exceptions.

This change replaces BLockingQueue.add with BlockingQueue.offer to
handle this situation.

I’ve modified test InMemorySender to allow it to block. This makes it
possible to easily fill the RemoteReporter’s command queue in tests.
@@ -62,7 +68,7 @@ public void setUp() throws Exception {
public void testRemoteReporterReport() throws Exception {
Span span = (Span) tracer.buildSpan("raza").start();
reporter.report(span);
Thread.sleep(5);
Thread.sleep(50);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I increased it because it was failing every time on my laptop. There's already a PR to make it right: #177

@codecov-io
Copy link

codecov-io commented May 13, 2017

Codecov Report

Merging #180 into master will increase coverage by 0.6%.
The diff coverage is 100%.

Impacted file tree graph

@@             Coverage Diff             @@
##             master     #180     +/-   ##
===========================================
+ Coverage     80.85%   81.45%   +0.6%     
- Complexity      459      463      +4     
===========================================
  Files            77       77             
  Lines          1786     1790      +4     
  Branches        208      210      +2     
===========================================
+ Hits           1444     1458     +14     
+ Misses          252      244      -8     
+ Partials         90       88      -2
Impacted Files Coverage Δ Complexity Δ
...java/com/uber/jaeger/reporters/RemoteReporter.java 89.55% <100%> (+8.59%) 8 <5> (+3) ⬆️
.../uber/jaeger/samplers/RemoteControlledSampler.java 92.95% <0%> (+1.4%) 19% <0%> (+1%) ⬆️
...jaeger/reporters/protocols/ThriftUdpTransport.java 86.44% <0%> (+3.38%) 16% <0%> (ø) ⬇️
...c/main/java/com/uber/jaeger/senders/UdpSender.java 74.54% <0%> (+3.63%) 11% <0%> (ø) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 797ef4f...e3653d1. Read the comment docs.


/**
* InMemorySender "appends" span only earlier "append" was permitted. Appending is a blocking
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did you mean "only if earlier"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I'd rather change it to:

Sender which stores spans in memory. Appending a new span is a blocking operation unless "permitted". By default Integer.MAX_VALUE "appends" are permitted.

try {
// best-effort: if we can't add CloseCommand in this time then it probably will never happen
closeCommandAdded = commandQueue
.offer(new CloseCommand(), CLOSE_ENQUEUE_TIMEOUT_MILLIS, TimeUnit.MILLISECONDS);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense. Only I would increase the timeout to 1sec.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


/**
* InMemorySender "appends" span only if the "append" was permitted. Otherwise append blocks
* until it it permitted.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this comment not needed here

}

// When: at this point the queue is full or there is one slot empty (if the worker thread has
// already picked up some command). We add two spans to make sure that we overfill the queue
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the workers are still reading from the queue, then it looks like you have a race condition and no guarantee that the next two commands would actually cause a queue overflow

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's why I changed InMemorySender to become blocking:

// change sender to blocking mode
sender.permitAppend(0);

Thanks to this nothing is processing the queue - QueueProcessor picks only 1 AppendCommand and blocks on the append call.
So this test:

  1. adds maxQueueSize spans to the queue
  2. QueueProcessor takes 1 commands and blocks
  3. adds additional span to make the queue full
  4. adds one more span and it is dropped without throwing exception

Point 2 sometimes might not happen (due to scheduling) - then both span from points 3) and 4) are dropped.

I tried to come with some simple way of testing various cases without possibility of race conditions - adding that semaphore/permits to InMemorySender was the simplest solution I could find.

When the command queue is full further FlushCommands are not enqueued
and queue length metric does not update. I moved it outside the
command - now it updates whenever scheduled flush() executes.
}

@Override
public void close() {
commandQueue.add(new CloseCommand());
boolean closeCommandAdded = false;
try {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

was just having a second look - why do we need a separate try/catch? Seems like you can just call offer in L102

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right - removed

@yurishkuro yurishkuro merged commit 4c06ea6 into jaegertracing:master May 14, 2017
@yurishkuro
Copy link
Member

Thanks, @mabn

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

RemoteReporter throws exception when commandQueue is full Error handing missing for RemoteReporter TimerTask
3 participants