
[SPARK-18838][CORE] Introduce multiple queues in LiveListenerBus #18253

Closed
wants to merge 11 commits into from

Conversation

bOOm-X (Author) commented Jun 9, 2017

What changes were proposed in this pull request?

In this PR, the single queue of the LiveListenerBus is replaced by multiple independent queues:

  • The EventLoggingListener was put in an independent queue (it was the most time-consuming listener).
  • The ExecutorAllocationManager listener was put in an independent queue (it is heavily impacted by event drops on the main queue caused by other listeners, so it is isolated).
  • The UI listeners were put together in one independent queue (one of them was very time consuming, and the UI should not impact the other listeners).
  • The "extraListeners" were put together in one independent queue, for isolation.
  • The StreamingListenerBus was put in an independent queue; this listener is itself a bus, and the processing time of its listeners can be quite significant.

The queue and its processing thread have been extracted from LiveListenerBus into a class, BusQueue. The declarations of most of the methods of ListenerBus have been extracted into a trait, WithListenerBus, which LiveListenerBus implements directly. LiveListenerBus holds the "default" queue, associated with a group of listeners, plus a list of additional queues. The addListener method of WithListenerBus takes a new optional boolean parameter (default value false) to request an independent queue for the listener instead of the default one; this parameter is ignored in the default implementation (in ListenerBus).
A listener which is itself a set of listeners has been added. It keeps the current behavior for a group of dependent listeners, and for the default queue, and it handles the per-listener metrics.
The methods addProcessor and removeProcessor have been added to LiveListenerBus, making it possible to process messages generically at the SparkListenerEvent supertype, in addition to the per-event-type processing of the listener interface.
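As a rough sketch, the resulting API surface looks something like this (signatures simplified; the parameters of addProcessor beyond the function itself are illustrative assumptions, not taken from the patch):

import org.apache.spark.scheduler.{SparkListenerEvent, SparkListenerInterface}

trait WithListenerBus {
  // `independentQueue = true` requests a dedicated queue for this listener
  // instead of the default one.
  def addListener(listener: SparkListenerInterface, independentQueue: Boolean = false): Unit
}

class LiveListenerBus extends WithListenerBus {
  override def addListener(listener: SparkListenerInterface, independentQueue: Boolean): Unit = ???

  // Generic processing at the SparkListenerEvent supertype, without the
  // per-event-type dispatch of the listener interface.
  def addProcessor(processor: SparkListenerEvent => Unit, name: String): Unit = ???
  def removeProcessor(name: String): Unit = ???
}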

How was this patch tested?

Unit tests + manual tests on the cluster.

bOOm-X (Author) commented Jun 13, 2017

@vanzin @cloud-fan Can I have a review on this new PR?

vanzin (Contributor) commented Jun 13, 2017

I'm busy, but I'll get to it eventually. You could at least write a proper commit summary in the meantime.

vanzin (Contributor) commented Jun 13, 2017

ok to test

SparkQA commented Jun 13, 2017

Test build #78001 has finished for PR 18253 at commit fc6f609.

  • This patch fails Scala style tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

bOOm-X (Author) commented Jun 15, 2017

OK, I rebased and updated my commit message.
I will update the PR message and push more commits to use what is done.

SparkQA commented Jun 15, 2017

Test build #78111 has finished for PR 18253 at commit 97cb911.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Jun 15, 2017

Test build #78116 has finished for PR 18253 at commit 18cb952.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Jun 15, 2017

Test build #78117 has finished for PR 18253 at commit 52505d9.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

vanzin (Contributor) commented Jun 15, 2017

I took a quick look and this does indeed look very much like work in progress. I also have a feeling that it's way over-engineered; there are a lot of base classes that are not that interesting, for example:

  • why SynchronousListenerBus? It doesn't seem to add anything interesting over the existing interface.
  • the whole "group of listeners" abstraction seems unnecessary. As far as I see it can be folded into the "listener queue" concept - a queue feeds events to a collection of listeners.

Changing "post" to "postToAll" as part of this change also is adding a lot of unnecessary noise. I'm not a fan of the current class hierarchy of the listener bus and I think that change makes sense, but at the same time it should be done separately since it's distracting here.

I also saw methods that are not fully implemented in the code, so I assume you're still working on this.

I'd also like to see better justification for your custom queue implementation. Have you identified the use of BlockingQueue as a hotspot in the current code? My main worry is the sleep; if you're unlucky, your queue will always be 20ms behind in processing events and may suffer if there's a sudden burst while it's sleeping. You can probably squeeze better performance out of BlockingQueue without having to write your own - e.g. by using drainTo instead of reading events one by one.
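(For illustration, a minimal sketch of that drainTo pattern — the class name and loop details here are assumed, not taken from the PR:)

import java.util.concurrent.{ArrayBlockingQueue, TimeUnit}

import org.apache.spark.scheduler.SparkListenerEvent

class DrainingQueue(capacity: Int) {
  private val queue = new ArrayBlockingQueue[SparkListenerEvent](capacity)

  def offer(event: SparkListenerEvent): Boolean = queue.offer(event)

  // Blocks for the first event, then drains everything already queued in one
  // call, so the queue's lock is taken once per batch instead of once per event.
  def dispatchLoop(post: SparkListenerEvent => Unit): Unit = {
    val batch = new java.util.ArrayList[SparkListenerEvent](128)
    while (!Thread.currentThread().isInterrupted) {
      val first = queue.poll(100, TimeUnit.MILLISECONDS)
      if (first != null) {
        post(first)
        batch.clear()
        queue.drainTo(batch)
        var i = 0
        while (i < batch.size()) {
          post(batch.get(i))
          i += 1
        }
      }
    }
  }
}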

The approach in #16291 had a lot of good things going for it, and mostly needed some clean up (and be modified to only change the live listener bus, and not the replay one). Your current approach seems a lot more complicated than that.

SparkQA commented Jun 15, 2017

Test build #78128 has finished for PR 18253 at commit a436b09.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Jun 15, 2017

Test build #78134 has finished for PR 18253 at commit 9427be0.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Jun 15, 2017

Test build #78136 has finished for PR 18253 at commit d429be6.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Jun 16, 2017

Test build #78138 has finished for PR 18253 at commit 1fe9161.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class SparkListenerBlockUpdated(blockUpdatedInfo: BlockUpdatedInfo)

SparkQA commented Jun 16, 2017

Test build #78139 has finished for PR 18253 at commit 123aa92.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Jun 16, 2017

Test build #78141 has finished for PR 18253 at commit b73b020.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Jun 16, 2017

Test build #78146 has finished for PR 18253 at commit a239f83.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Jun 16, 2017

Test build #78165 has finished for PR 18253 at commit 24721b9.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Jun 16, 2017

Test build #78167 has finished for PR 18253 at commit cafcd96.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Jun 16, 2017

Test build #78168 has finished for PR 18253 at commit 13003b2.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Jun 16, 2017

Test build #78169 has finished for PR 18253 at commit d3d2cbe.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Jun 16, 2017

Test build #78186 has finished for PR 18253 at commit be3d560.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Jun 17, 2017

Test build #78205 has finished for PR 18253 at commit 7a5df2d.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Jun 18, 2017

Test build #78219 has finished for PR 18253 at commit 4596c61.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Jun 18, 2017

Test build #78225 has finished for PR 18253 at commit 965b105.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Jun 18, 2017

Test build #78226 has finished for PR 18253 at commit 16ba70a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

bOOm-X changed the title from [WIP][SPARK-18838][CORE] Introduce multiple queues in LiveListenerBus to [SPARK-18838][CORE] Introduce multiple queues in LiveListenerBus on Jun 20, 2017
bOOm-X (Author) commented Jun 20, 2017

@vanzin OK, it is ready now.

why SynchronousListenerBus? It doesn't seem to add anything interesting over the existing interface.

I removed it.

the whole "group of listeners" abstraction seems unnecessary. As far as I see it can be folded into the "listener queue" concept - a queue feeds events to a collection of listeners.

I simplified it and added usages in the other commits. It is basically useful for holding the metrics, and I need a common way to add a group of dependent listeners to the LiveListenerBus and the ReplayBus.

Changing "post" to "postToAll" as part of this change also is adding a lot of unnecessary noise. I'm not a fan of the current class hierarchy of the listener bus and I think that change makes sense, but at the same time it should be done separately since it's distracting here

100% agree. I removed it.

I'd also like to see better justification for your custom queue implementation. Have you identified the use of BlockingQueue as a hotspot in the current code?

This implementation has two advantages. It is a 1-producer/1-consumer queue, whereas the BlockingQueue supports n producers and m consumers, so it uses much less synchronization. The other advantage (the main one) is that no object is created for each message added to the queue, so it produces much less garbage. The more independent queues we have, the more significant this becomes.
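(For illustration only — not the PR's actual code — the single-producer/single-consumer idea being described reduces to something like this: a pre-allocated array plus two volatile counters, so nothing is allocated per message and no lock is taken.)

class SpscRingBuffer(capacity: Int) {
  private val buf = new Array[AnyRef](capacity)
  @volatile private var head = 0L // next slot to read; written only by the consumer
  @volatile private var tail = 0L // next slot to write; written only by the producer

  def offer(e: AnyRef): Boolean = {
    if (tail - head == capacity) {
      false // queue full: the caller drops the event
    } else {
      buf((tail % capacity).toInt) = e
      tail += 1 // single writer, so a volatile store is enough; no CAS needed
      true
    }
  }

  def poll(): AnyRef = {
    if (head == tail) {
      null // queue empty
    } else {
      val e = buf((head % capacity).toInt)
      head += 1
      e
    }
  }
}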

My main worry is the sleep; if you're unlucky, your queue will always be 20ms behind in processing events and may suffer if there's a sudden burst while it's sleeping.

I changed it to 1 ms instead of 20 ms. This is much less than the average processing time of the fastest listener (around 5 ms for the HeartbeatListener). It just forces the consumer thread to back off when the queue is empty, to give the producer thread a better chance of being scheduled. I can remove it if you want.

vanzin (Contributor) commented Jun 20, 2017

This implementation has two advantages... it uses much less synchronization.

Yes, but have you quantified how much you win with that? If the blocking queue approach has enough throughput for the listener bus, it's safer to use it.

is that no object is created for each message added to the queue

Well, you could use an ArrayBlockingQueue. Then no extra object is allocated either.

Here's a link with numbers for ArrayBlockingQueue:
https://github.com/LMAX-Exchange/disruptor/wiki/Performance-Results

4M ops per sec in the 1P-1C case looks plenty fast for Spark's needs.

I changed it to 1 ms instead of 20 ms.

I think if you really insist on going this route, you should use LockSupport.park/unpark instead of fighting for CPU time like this.
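(A minimal sketch of that park/unpark idea; the queue type here is incidental — any non-blocking queue works:)

import java.util.concurrent.ConcurrentLinkedQueue
import java.util.concurrent.locks.LockSupport

class ParkingBus[T <: AnyRef](process: T => Unit) {
  private val queue = new ConcurrentLinkedQueue[T]()
  @volatile private var consumer: Thread = _

  def post(event: T): Unit = {
    queue.offer(event)
    LockSupport.unpark(consumer) // cheap no-op if the consumer is not parked
  }

  def runConsumer(): Unit = {
    consumer = Thread.currentThread()
    while (!Thread.currentThread().isInterrupted) {
      val event = queue.poll()
      if (event == null) {
        // Sleep until the producer unparks us; if unpark happened first, the
        // pending permit makes park() return immediately, so no wakeup is lost.
        LockSupport.park()
      } else {
        process(event)
      }
    }
  }
}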

bOOm-X (Author) commented Jun 21, 2017

Well, you could use an ArrayBlockingQueue. Then no extra object is allocated either.

Yes, I agree, but you get the synchronization too. I still agree that it should not have a big impact. But using an ArrayBlockingQueue does not simplify the code much: the current implementation is not complicated, not too verbose, and based on a simple pure Scala array. I do not think it has a huge complexity cost compared to the Java ArrayBlockingQueue.

I changed the Thread.sleep to a Thread.yield to be less aggressive about descheduling the thread. Even with very few messages it should not consume too much CPU, and it will be much more reactive when messages burst.

vanzin (Contributor) left a comment

I started reviewing, but again, I noticed the same thing I commented on before. This is way over-engineered. You can do this in a much, much simpler way. There's no need to create all the different abstractions you're adding - the current listener abstraction is enough to achieve what is being proposed here.

All you need is to add a "queue name" parameter to the addListener method, and potentially an "event filter" parameter. Everything else is hidden in the listener implementation, and doesn't need to be exposed to any calling code.

@@ -532,7 +533,10 @@ class SparkContext(config: SparkConf) extends Logging {
new EventLoggingListener(_applicationId, _applicationAttemptId, _eventLogDir.get,
_conf, _hadoopConfiguration)
logger.start()
listenerBus.addListener(logger)
listenerBus.addProcessor(
Contributor:

I'm having a hard time finding the declaration of this method. I can't find it in your code or in the existing master branch. Can you link to it?


Author (bOOm-X):

In LiveListenerBus.scala line 86

@@ -2350,13 +2354,12 @@ class SparkContext(config: SparkConf) extends Logging {
try {
val listenerClassNames: Seq[String] =
conf.get("spark.extraListeners", "").split(',').map(_.trim).filter(_ != "")
for (className <- listenerClassNames) {
// Use reflection to find the right constructor
val extraListeners = listenerClassNames.map{ className =>
Contributor:

.map { className =>

You have a lot of style issues in your code - indentation, spacing, etc. Please read the style section in http://spark.apache.org/contributing.html and try to follow it.

Author (bOOm-X):

Fixed.
I do not understand: the PR passes the Scala style tests. How can I still have style issues?

listener
}
if (extraListeners.nonEmpty) {
val group = new FixGroupOfListener(extraListeners, "extraListeners")
Contributor:

FixGroupOfListener is a bad class name. I'm not even sure what it's supposed to be, but the closest I can think of is ListenerGroup.

But perhaps this shouldn't be exposed at all. If you add a queue name to the listener registration method, you can hide this from callers altogether. That is, if I understood what this class is in the first place.

Then you wouldn't need addIsolatedListener either.

Contributor:

@vanzin are you suggesting making this a call to addProcessor? Or an addListener override? Just trying to understand the code at this stage.

Author (bOOm-X):

First, I think that modifying the existing addListener method is a bad idea; it would impact a lot of code. We want to keep this method with its current behavior (add a listener to the "default" queue) and still be able to add listeners to other queues. I think that adding a String label and matching on it to determine the queue is quite error prone; I prefer a more constrained API.
For the FixGroupOfListener name, I can change it. But I have two kinds of listener groups:

  • FixGroupOfListener: for groups of inter-dependent listeners (like the UI listeners). I can rename it to ListenerImmutableGroup.
  • ModifiableGroupOfListener: for the "default" queue. I can rename it to ListenerGroup.

Are these names OK for you?

Contributor:

First, I think that modifying the existing addListener method is a bad idea; it would impact a lot of code.

That's why overloaded methods exist.

But I have 2 kind of group of listeners

I don't think there's really a distinction between the two types of groups you mention. The "UI group" is just a modifiable group that you don't modify after it's been created.

@@ -227,6 +169,7 @@ private[spark] class EventLoggingListener(
* ".inprogress" suffix.
*/
def stop(): Unit = {
flush()
Contributor:

Shouldn't be necessary (close() does it).

Author (bOOm-X):

Done

override def onExecutorMetricsUpdate(event: SparkListenerExecutorMetricsUpdate): Unit = { }

override def onOtherEvent(event: SparkListenerEvent): Unit = {
def log(event: SparkListenerEvent): Unit = {
if (event.logEvent) {
Contributor:

Since you're adding an event filter, you could perform this check there...

Author (bOOm-X):

To keep the current behavior, it is not simple to move this filtering (if (event.logEvent)) into the event filter: I want to perform it only when the event is not one of the "basic" types. Doing so would make the EventFilter much more complex, whereas here it acts as a "pre-filter" (discarding only part of the events that we do not want to log).

import org.apache.spark.scheduler.bus.ListenerBusQueue.{FixGroupOfListener, ModifiableGroupOfListener}

// For generic message processor (like event logging)
private[scheduler] class ProcessorListenerBusQueue(
Contributor:

First, the name of this file is weird. But more importantly, why are these classes even necessary?

Why can't you have a single queue implementation that manages a group of listeners? Whether the group has a single listener or multiple shouldn't matter - the implementation can be the same.

Author (bOOm-X):

For the name of the file, I can change it! Do you have a better name? I could even move the content of the file (the two concrete implementations) into BusQueue.scala.

I refactored this file a bit. Now I have only two implementations:

  • ProcessorBusQueue: the implementation for generic processors (no dispatch by event type)
  • ListenerBusQueue: the implementation for listeners (with dispatch by event type)

Contributor:

I'm still confused about why you need two implementations. Why doesn't ListenerBusQueue work for everybody? And why shouldn't it?

I need more time to actually grok all this code, but like Wenchen suggested before, this is a big change and it would benefit from a more detailed explanation of exactly how you're organizing the hierarchy of listeners, groups, etc. Your PR description only explains which queues you created, but not any of the changes that were needed to achieve that.

If it makes it easier, you can create a README.md file with a longer explanation of how things are organized (for example, see common/network-common/src/main/java/org/apache/spark/network/crypto, where I added a README to explain details of what that whole body of code is doing).

abellina (Contributor) left a comment

Hi, just some comments as I try to understand the code.

}
logEvent(toLog)
nbMessageProcessed = nbMessageProcessed + 1
if (nbMessageProcessed == FLUSH_FREQUENCY) {
Contributor:

This should be >= FLUSH_FREQUENCY.

Author (bOOm-X):

Done

* This method is thread-safe and can be called in any thread.
*/
final override def addListener(listener: SparkListenerInterface): Unit = {
startStopAddRemoveLock.lock()
Contributor:

this should probably be in a try/finally block, with unlock in the finally.
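(A minimal sketch of the idiom being asked for; class and field names illustrative:)

import java.util.concurrent.locks.ReentrantLock

class ListenerRegistry {
  private val startStopAddRemoveLock = new ReentrantLock()

  def addListener(listener: AnyRef): Unit = {
    startStopAddRemoveLock.lock()
    try {
      // ... mutate listener/queue state here ...
    } finally {
      startStopAddRemoveLock.unlock() // runs even if the body throws
    }
  }
}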

Contributor:

same for other lock/unlocks.

Author (bOOm-X):

Done using Scala Try

} else {
onDropEvent(event)
throw new IllegalStateException("LiveListener bus already started!")
Contributor:

definitely want to unlock before this.

@@ -27,7 +27,12 @@ private[spark] trait SparkListenerBus

protected override def doPostEvent(
listener: SparkListenerInterface,
event: SparkListenerEvent): Unit = {
event: SparkListenerEvent): Unit = SparkListenerEventDispatcher.dispatch(listener, event)
Contributor:

why is this change necessary?

Author (bOOm-X):

I just extracted the dispatch method so that it can be used in GroupOfListenersBusQueue and in SingleListenerBusQueue (in the file ListenerBusQueueImpl.scala).

bOOm-X (Author) commented Aug 1, 2017

retest this please

bOOm-X (Author) commented Aug 7, 2017

@vanzin I simplified the code a lot. There is now only one implementation for the queue and one for the group of listeners. I also removed the extra trait in the listener hierarchy.
Can you take a look?

/**
* Add a generic listener to an isolated pool.
*/
def addProcessor(processor: SparkListenerEvent => Unit,
Contributor:

what's the difference between addProcessor and addListener?

Author (bOOm-X):

With addProcessor you do not have to provide a SparkListenerInterface (i.e. an object with a method per message type), just a generic function that handles SparkListenerEvent (the supertype of every event type). When you do generic processing (see EventLoggingListener for example) it is very convenient, and, as a cherry on top, you avoid the horrible and costly dispatch function, which in this case (generic processing) is a burden.
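(To make the contrast concrete, a sketch — the second argument of addProcessor below is an assumption, since the full signature is not shown in the diff above:)

import org.apache.spark.scheduler._

object RegistrationStyles {
  // addListener style: one override per event type, reached via the dispatch function.
  val listener = new SparkListener {
    override def onJobStart(jobStart: SparkListenerJobStart): Unit = { /* ... */ }
    override def onJobEnd(jobEnd: SparkListenerJobEnd): Unit = { /* ... */ }
  }
  // listenerBus.addListener(listener)

  // addProcessor style: one function over the SparkListenerEvent supertype,
  // with no per-event-type dispatch at all.
  val processor: SparkListenerEvent => Unit = event => { /* e.g. serialize and log it */ }
  // listenerBus.addProcessor(processor, "eventLog") // name argument assumed
}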

Contributor:

Shall we do it in a separate PR?

Author (bOOm-X):

It is just a small technical refactoring which came almost for free with the new queue object. It is also very convenient for handling the asynchronous LiveListenerBus in the tests. I think we can keep it in this PR.

cloud-fan (Contributor) commented

The PR description is good for explaining the new behavior, but can you say more about the implementation?

IMO, we just need to duplicate the event queue for each important listener (like the event logging listener), and non-important listeners can share one event queue, as in the current behavior. Then each event queue is processed by its own thread.

bOOm-X (Author) commented Aug 10, 2017

@cloud-fan PR description updated with some details on the implementation

cloud-fan (Contributor) commented Aug 11, 2017

I think BusQueue is a good abstraction, but we can still simplify other parts. My proposal:

// Do we need a better name, like ListenersGroup? It's very similar to the previous LiveListenerBus.
class BusQueue extends SparkListenerBus {
  val eventQueue = ...
  val listenerThread = ...
}

class LiveListenerBus {
  val listenerGroups: mutable.Map[String, BusQueue] = ...

  def addListener(listener: SparkListenerInterface, groupName: String = "default"): Unit = {
    val group = listenerGroups.getOrElseUpdate(groupName, new BusQueue(...))
    group.addListener(listener)
  }

  def post(event: SparkListenerEvent): Unit = {
    listenerGroups.foreach { case (_, group) =>
      group.post(event)
    }
  }
}
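For what it's worth, usage under this proposal might look like the following (the listener values are hypothetical):

val bus = new LiveListenerBus
bus.addListener(eventLoggingListener, groupName = "eventLog")        // isolated queue
bus.addListener(executorAllocationListener, groupName = "dynamicAllocation")
bus.addListener(heartbeatListener)                                   // shared "default" queue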

vanzin (Contributor) commented Aug 22, 2017

Agree that it still seems like there are too many moving parts here. I don't see a whole lot of difference between BusQueue and GroupOfListener (which is a weird name, btw). A queue can have a bunch of listeners, and a LiveListenerBus can have a bunch of queues. I don't see the need for anything more complicated than that.

WithListenerBus is another thing I don't understand; aside from the really weird name, it seems to be mostly the same thing as ListenerBus.

Also, to reinforce a previous comment: your code has a ton of style issues. It doesn't matter that checkstyle doesn't complain about them; you still have to follow the project's code conventions or we'll be forever pointing out style issues in your code.

bOOm-X (Author) commented Aug 28, 2017

@cloud-fan
In my opinion, having the queues indexed by a string and reflecting that in the API is a bit too error prone here. What you want is to be in an isolated queue; with this kind of API it is easy for another listener to end up in the same queue because of a conflicting label. Conversely, it is easy to end up with two dependent listeners in different queues because of a casing difference or a typo in the label. A map of queues also forces the set of queues to be mutable, which is not really necessary here. And I think it is a good idea to enforce that dependent listeners are put in the same queue; it increases the readability and clarity of the API.

bOOm-X (Author) commented Aug 28, 2017

@vanzin
GroupOfListener is just a set of listeners which handles all the metrics. It decouples the metrics from ListenerBus and LiveListenerBus, and it enforces the dependency model between listeners (one message processed sequentially). I can change its name if you want.
The BusQueue just handles the generic queuing/dequeuing processing, which includes starting and stopping the consumer thread and the queuing strategy (drop if full).

WithListenerBus is indeed just the declaration of the methods implemented in ListenerBus. This declaration is shared between LiveListenerBus and ReplayListenerBus, and the UI listeners are added to them through it. ListenerBus still contains a simple synchronous implementation, used by ReplayListenerBus; LiveListenerBus has its own, based on multiple queues for certain sets of listeners. I can rename this interface if you want.

I will do a pass to try to fix the code style issues.

vanzin (Contributor) commented Sep 5, 2017

I really dislike WithListenerBus - both as a name and as a concept. There's already a ListenerBus trait; if it's not enough or is broken in some way, it should be fixed, instead of being patched by introducing yet more complexity in the hierarchy.

I think part of the confusion here is that the current code is trying to both refactor the ListenerBus hierarchy and add the concept of queues at the same time. From my point of view they're different things and could be done separately. For example, you could add queues to LiveListenerBus only, which is really the only place that matters in the end. Maybe it won't be optimal in its first iteration, but it would be a much easier change to review.

I don't doubt that there's benefit in taking a holistic look into this part of the class hierarchy; but it would be good to do that separately, both so that we can clearly see that the proposed hierarchy makes sense, and so that it's easier to review things. It's easier to wrap your head around the code if it's focused on one problem instead of two.

vanzin (Contributor) commented Sep 6, 2017

@bOOm-X

I pushed some code to my repo: https://github.com/vanzin/spark/tree/SPARK-18838

It is an attempt to do things the way I've been trying to explain. It tries to keep changes as local as possible to LiveListenerBus, creating a few types that implement the grouping and asynchronous behavior. You could do filtering by extending the new AsyncListener, for example, and adding it to the live listener bus.

It's just a proof of concept, so I cut a few corners (like metrics), and I only ran SparkListenerSuite, but I'm just trying to show a different approach that leaves the ListenerBus hierarchy mostly the same as now.

bOOm-X (Author) commented Sep 12, 2017

@vanzin I pushed some comments on your code. I think that trying to keep the exact same class hierarchy leads to very complex code, with many drawbacks.

 The LiveListenerBus can now manage multiple queues for different listeners.
 This will allow its dequeuing rate to increase a lot.
 All the listeners are still added to the main queue, so the behavior is the
 same as the previous one.
 In further commits some listeners will be moved to dedicated queues.

  ## How was this patch tested?
  unit tests + manual tests have been run on the cluster
vanzin (Contributor) commented Sep 12, 2017

You commented on my code, not on the idea. My code was hacked together quickly; it can be cleaned up a lot. Your comments don't prove that separating the refactoring of the listener bus hierarchy from the introduction of queues is impossible or undesirable.

bOOm-X and others added 10 commits September 12, 2017 18:51
The EventLoggingListener is now in a dedicated asynchronous queue.
This listener could represent 50% of the event processing time of the
standard queue.

  ## How was this patch tested?
  unit tests + manual tests have been run on the cluster
The ExecutorAllocationManager is now in a dedicated asynchronous queue.
This listener suffers a lot of event drops; putting it in a dedicated queue
greatly decreases the chance of them.

  ## How was this patch tested?
  unit tests + manual tests have been run on the cluster
The UI event listeners are now in a dedicated asynchronous queue.
This set of listeners could represent 40% of the event processing time,
and calls from the UI no longer block the listener bus.

  ## How was this patch tested?
  unit tests + manual tests have been run on the cluster
The extra listeners are now in a dedicated asynchronous queue,
so they cannot interfere with the execution of the Spark internal listeners.

  ## How was this patch tested?
  unit tests + manual tests have been run on the cluster
The streaming listener, which is itself a bus (for streaming events and
listeners), is now in a dedicated asynchronous queue, so the streaming
listeners run without impact from the other listeners.

  ## How was this patch tested?
  unit tests + manual tests have been run on the cluster
 - wait on an empty queue instead of looping
 - remove WithMultipleListenerBus and replace it by adding a boolean parameter