
[SPARK-20642][core] Store FsHistoryProvider listing data in a KVStore. #18887

Closed
wants to merge 14 commits

Conversation


@vanzin vanzin commented Aug 8, 2017

The application listing is still generated from event logs, but is now stored
in a KVStore instance. By default an in-memory store is used, but a new config
allows setting a local disk path to store the data, in which case a LevelDB
store will be created.

The provider stores things internally using the public REST API types; I believe
this is better going forward since it will make it easier to get rid of the
internal history server API which is mostly redundant at this point.

I also added a finalizer to LevelDBIterator, to make sure that resources are
eventually released. This helps when code iterates but does not exhaust the
iterator, thus not triggering the auto-close code.

HistoryServerSuite was modified to not re-start the history server unnecessarily;
this makes the json validation tests run more quickly.
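
As a quick sketch of that in-memory-vs-disk choice (the helper and file names here are illustrative, not the PR's exact code):

    import java.io.File

    import org.apache.spark.util.kvstore.{InMemoryStore, KVStore, LevelDB}

    // In-memory store by default (the old behavior); a LevelDB-backed store
    // when spark.history.store.path points at a local directory.
    def createListingStore(storePath: Option[String]): KVStore = storePath match {
      case Some(dir) => new LevelDB(new File(dir, "listing.ldb")) // file name is hypothetical
      case None => new InMemoryStore()
    }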


vanzin commented Aug 8, 2017

For context:

@@ -86,7 +86,7 @@ This file is divided into 3 sections:
 </check>

 <check level="error" class="org.scalastyle.scalariform.ObjectNamesChecker" enabled="true">
-  <parameters><parameter name="regex"><![CDATA[[A-Z][A-Za-z]*]]></parameter></parameters>
+  <parameters><parameter name="regex"><![CDATA[(config|[A-Z][A-Za-z]*)]]></parameter></parameters>
vanzin (Contributor Author):
I'm adding this so that we can have an object named config; this seems cleaner than the current approach of having package object config, which requires every constant to be tagged as private, instead of just marking the whole object as private[spark].
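
As a sketch, the pattern being enabled looks like this (the entries shown are illustrative; the point is the single object-level modifier):

    package org.apache.spark.deploy.history

    import org.apache.spark.internal.config.ConfigBuilder

    // One access modifier on the object, instead of one per constant as the
    // package object approach requires.
    private[spark] object config {

      val DEFAULT_LOG_DIR = "file:/tmp/spark-events"

      val EVENT_LOG_DIR = ConfigBuilder("spark.history.fs.logDirectory")
        .stringConf
        .createWithDefault(DEFAULT_LOG_DIR)
    }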

Contributor:

Agree it makes more sense to make the whole thing private[spark], but couldn't you just turn the rule off around this object, like we do for println?

(Also I'm confused why package object config doesn't violate the rule).

vanzin (Contributor Author):

That could be done, but I want this to be the "official" pattern going forward, and having to disable scalastyle every time it's used is kinda ugly and somewhat detracts from the official-ness of it.

Contributor:

OK, makes sense. I wasn't paying enough attention to the actual change at first and thought you were just turning the rule off entirely, but I agree this change makes sense.

btw, I was curious why package object config was OK -- it's just a hard-coded special case: https://github.com/scalastyle/scalastyle/blob/master/src/main/scala/org/scalastyle/scalariform/ClassNamesChecker.scala#L74

Make sure the db is still open before trying to close the iterator,
otherwise it may cause a JVM crash.
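
A sketch of that guard in Scala (the real LevelDBIterator is Java, and the member names below are assumptions for illustration):

    // Finalizer as a safety net for iterators that are never exhausted or
    // closed explicitly; the open-check avoids touching native state after
    // the DB handle has been freed, which is what crashes the JVM.
    class DbIterator(db: Db) extends AutoCloseable {
      @volatile private var closed = false

      override def close(): Unit = if (!closed) {
        closed = true
        if (db.isOpen) db.releaseIterator(this) // skip if the DB was already closed
      }

      override protected def finalize(): Unit = {
        try close() finally super.finalize()
      }
    }

    trait Db { def isOpen: Boolean; def releaseIterator(it: DbIterator): Unit }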

SparkQA commented Aug 9, 2017

Test build #80420 has finished for PR 18887 at commit 842589d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class KVStoreMetadata(
  • case class LogInfo(


SparkQA commented Aug 9, 2017

Test build #80422 has finished for PR 18887 at commit 1f08bd7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


SparkQA commented Aug 9, 2017

Test build #80431 has finished for PR 18887 at commit 1ec1a67.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


vanzin commented Aug 11, 2017

hmm... @squito @ajbozarth @jerryshao

@ajbozarth ajbozarth left a comment

A few questions and some feedback, but the SHS stuff looks good. I'm not well versed in the KVStore/DB side of things so I can't vouch for those, though the code looks fine.

@@ -76,6 +76,14 @@ private[history] case class LoadedAppUI(
private[history] abstract class ApplicationHistoryProvider {

/**
* The number of applications available for listing. Separate method in case it's cheaper
* to get a count than to calculate the whole listing.
ajbozarth (Member):

I'm not sure I follow this reasoning: if the previous way of getting the count was getListing().size, then how does splitting it into its own method speed things up? I don't mind adding a helper function like this, I just don't follow the reasoning in your comment.

vanzin (Contributor Author):

This is an interface, so this was added to allow implementations to override this method if that makes sense.

It just looks like I lost the override in one of my rebases, so let me add that back.

vanzin (Contributor Author):

Actually it doesn't seem like this is used anymore and I can remove it...

new FsApplicationHistoryInfo(app.id, app.name, toRetain.toList))
val maxTime = clock.getTimeMillis() - conf.get(MAX_LOG_AGE_S) * 1000

// Iterate descending over all applications whose oldest attempt is older than the maxAge.
ajbozarth (Member):

maxAge -> maxTime


val attempts = oldApp.attempts.filter(_.info.attemptId != attempt.info.attemptId) ++
List(attempt)
val oldestAttempt = attempts.map(_.info.lastUpdated.getTime()).min
ajbozarth (Member):

Is this val used anywhere?

@@ -742,53 +698,145 @@ private[history] object FsHistoryProvider {
private val APPL_END_EVENT_PREFIX = "{\"Event\":\"SparkListenerApplicationEnd\""

private val LOG_START_EVENT_PREFIX = "{\"Event\":\"SparkListenerLogStart\""

private val CURRENT_VERSION = 1L
ajbozarth (Member):

Current version of?

val fileSize: Long)

private[history] class AttemptInfoWrapper(
val info: v1.ApplicationAttemptInfo,
ajbozarth (Member):

v1?

vanzin (Contributor Author):

Yes, I'm using this syntax because in many places there are conflicting type names in the API package and in other packages.

@@ -31,6 +33,9 @@ class ApplicationInfo private[spark](
val memoryPerExecutorMB: Option[Int],
val attempts: Seq[ApplicationAttemptInfo])

@JsonIgnoreProperties(
value = Array("startTimeEpoch", "endTimeEpoch", "lastUpdatedEpoch"),
ajbozarth (Member):

Will this exclude the Epoch values from the API? If I remember correctly, we added those specifically for the API.

vanzin (Contributor Author):

No, this just avoids trying to deserialize them, which would cause an error because these properties have no setter.
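
A sketch of the idea (the class shown and the allowGetters flag are my assumptions, following standard Jackson behavior):

    import java.util.Date

    import com.fasterxml.jackson.annotation.JsonIgnoreProperties

    // The epoch values are derived getters with no setters: keep writing them
    // to JSON, but ignore them when mapping JSON back onto the class.
    @JsonIgnoreProperties(
      value = Array("startTimeEpoch", "endTimeEpoch", "lastUpdatedEpoch"),
      allowGetters = true)
    class AttemptTimes(val startTime: Date, val endTime: Date, val lastUpdated: Date) {
      def getStartTimeEpoch: Long = startTime.getTime
      def getEndTimeEpoch: Long = endTime.getTime
      def getLastUpdatedEpoch: Long = lastUpdated.getTime
    }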


jerryshao commented Aug 14, 2017

This looks a little too big to understand quickly; I need to download the patch and play with it first :).


SparkQA commented Aug 14, 2017

Test build #80639 has finished for PR 18887 at commit b696f96.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jerryshao jerryshao left a comment

LGTM overall, just some minor comments.

One question about FsHistoryProvider: it looks like we removed some synchronization here. I'm not sure whether our underlying kvstore supports concurrency well, or whether this is no longer an issue?

import org.apache.spark.ui.SparkUI
import org.apache.spark.util.{Clock, SystemClock, ThreadUtils, Utils}
import org.apache.spark.util.kvstore._

/**
* A class that provides application history from event logs stored in the file system.
jerryshao (Contributor):

I see the comment here refers to a removed data structure (though it is folded here); would you please fix the comment?

db.setMetadata(new KVStoreMetadata(CURRENT_LISTING_VERSION, logDir.toString()))
db
} else if (meta.version != CURRENT_LISTING_VERSION ||
!logDir.toString().equals(meta.logDir)) {
jerryshao (Contributor):

Minor: `logDir` here is a String; I think there's no need to call `toString()` here and above.

@@ -220,6 +220,13 @@ The history server can be configured as follows:
Number of threads that will be used by history server to process event logs.
</td>
</tr>
<tr>
<td>spark.history.store.path</td>
jerryshao (Contributor):

Checking the code, it looks like we don't have a default path for the local history store; do we need to add a default value here in the doc?

val LOCAL_STORE_DIR = ConfigBuilder("spark.history.store.path")
    .stringConf
    .createOptional

vanzin (Contributor Author):

I'm using this as a trigger for whether to use the disk store or not; if you set a local store directory, you're using the disk store, otherwise you're using the memory store. I do need to update it in the documentation, though.
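
For illustration, opting into the disk store is then a single setting (the path below is hypothetical):

    import org.apache.spark.SparkConf

    // Unset: in-memory listing store (the default). Set: LevelDB under this path.
    val conf = new SparkConf()
      .set("spark.history.store.path", "/var/lib/spark/shs-listing")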


SparkQA commented Aug 16, 2017

Test build #80701 has finished for PR 18887 at commit 519dab0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@squito squito left a comment

I have one big question -- do you think this should replace the old FsHistoryProvider in the initial version? I feel like we should have one version where the old code is still available, controlled by a feature flag.

Other than that, mostly minor things on the code.

@@ -301,6 +334,7 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock: Clock)
}

override def stop(): Unit = {
listing.close()
squito (Contributor):

If this throws an exception, should we still try to clean up initThread?


-  // List of application logs to be deleted by event log cleaner.
-  private var attemptsToClean = new mutable.ListBuffer[FsApplicationAttemptInfo]
+  private val listing = storePath.map { path =>
squito (Contributor):

given the large initializer, could you add an explicit type annotation to listing to help the reader?

-      prevFileSize < entry.getLen() &&
-      SparkHadoopUtil.get.checkAccessPermission(entry, FsAction.READ)
+      SparkHadoopUtil.get.checkAccessPermission(entry, FsAction.READ) &&
+      recordedFileSize(entry.getPath()) < entry.getLen()
       }
       .flatMap { entry => Some(entry) }
squito (Contributor):

I realize this isn't your change, but what is the point of this? Isn't it a no-op?

var coresPerExecutor: Option[Int] = None
var memoryPerExecutorMB: Option[Int] = None

def toView(attempts: List[AttemptInfoWrapper]): ApplicationInfoWrapper = {
squito (Contributor):

looks like this is only ever called with one AttemptInfoWrapper, so simpler if you remove the List.


vanzin commented Aug 16, 2017

I feel like we should have one version where the old code is still available, controlled by a feature flag.

I'm not sure exactly what you're suggesting. The default behavior is still, as much as possible, the old one: everything is kept in memory. Keeping the exact old code in place would mean forking FsHistoryProvider, which is not something I see as desirable.

Also, forgot to reply to an earlier comment by @jerryshao :

looks like we removed some synchronizations here, I'm not sure if

KVStore instances are thread safe, so a lot of the old synchronization in this class does not apply. There was code before that used non-thread-safe data structures (such as attemptsToClean), which has been replaced with storing things in the KVStore instance instead.


squito commented Aug 16, 2017

I feel like we should have one version where the old code is still available, controlled by a feature flag.

I'm not sure exactly what you're suggesting. The default behavior is still, as much as possible, the old one: everything is kept in memory. Keeping the exact old code in place would mean forking FsHistoryProvider, which is not something I see as desirable.

Long-term, definitely not desirable. My concern is that there is a lot of new code taking effect here, on some important functionality -- we don't want some bug to prevent adoption of 2.3. For one version, you could leave the old one available, rename it to something else, and put a big disclaimer in the code that it's obsolete; as long as things are smooth for 2.3, delete it entirely for 2.4. The HS isn't as core or tricky as the network module, but I'm thinking of this like netty vs. nio.


vanzin commented Aug 16, 2017

There's quite a lot of unit tests that cover this code; forking would mean also making those unit tests run against both versions of the code so that no one breaks anything, and potentially fixing bugs in two different places if they're found.

I'd rather avoid the overhead. The bulk of FsHistoryProvider, which is the part that actually monitors the file system and makes decisions about when to parse things, is mostly unchanged.

And the rest of this project is way more disruptive than this one change.


SparkQA commented Aug 16, 2017

Test build #80750 has finished for PR 18887 at commit dc642bd.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


vanzin commented Aug 17, 2017

retest this please


SparkQA commented Aug 17, 2017

Test build #80757 has finished for PR 18887 at commit dc642bd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


squito commented Aug 17, 2017

Just to keep others in the loop, Marcelo and I talked about this some offline. I think this PR itself is fine, but to me this is an important point in the larger history server project he's doing, so I'm going to take a look at the rest of the changes as well before I feel comfortable merging. Also, he explained why it's extremely complicated to keep both the old & new versions available, which makes sense to me, though I may poke at a version myself just to see how complex it is.


squito commented Aug 21, 2017

I futzed around for a while with trying to keep the old stuff around, and I realized it really would be quite a mess. The biggest problem is that the old REST API is just way too tied into the UI, e.g. https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/status/api/v1/ExecutorListResource.scala#L26 -- probably my fault in that implementation of the REST APIs. You could keep the old code around, but it would involve so much moving and refactoring that it seems pointless.

I'm still looking at the other changes in the larger project, and I'd encourage other reviewers to do the same.


gatorsmile commented Sep 9, 2017

I have the same concern as @squito said above in #18887 (review). Refactoring to keep the old code does not seem pointless to me.

To evaluate the impact, I have a few questions:

  • What is the migration proposal/guide?
  • What is the proposal when users are running multiple different versions of Spark in their production environment, or during the migration procedure?
  • What should users do when they hit serious bugs that we are unable to find at this stage?
  • What are the external behavior changes between the new and the old ones?


vanzin commented Sep 11, 2017

What is the migration proposal/guides?

Not sure what you mean. There's no change in behavior by default, so there's no migration needed.

What should users do when they hit serious bugs that we are unable to find at this stage?

This is always a question when you introduce a change. This particular project makes it a bit worse since it's a bigger change. But the answer is the same: file a bug, we fix it, next release has the fix; if it's really important for you, you can patch your own Spark, or revert to an older release, as many people do when they find blockers.

The changes in this project do not influence whether your application will fail or not; they're isolated to the UI, so at worst it will make things harder to debug until the issues are fixed.

On the other hand, a lot can be mitigated by testing; there's already a lot of test coverage for parts of this code, but the UI is kinda the exception there. So the longer these changes stay in review instead of being committed, the less coverage they'll have in people's day to day testing, actually increasing the risk that some bug might be missed.

What are the external behavior changes between the new and the old ones?

None by default.

@gatorsmile gatorsmile left a comment

I quickly went over the changes and left a few comments. I might need more time to understand the potential impact.


val fileToAppInfo = new ConcurrentHashMap[Path, FsApplicationAttemptInfo]()
private val storePath = conf.get(LOCAL_STORE_DIR)
gatorsmile (Member):

We need a description of storePath or LOCAL_STORE_DIR here, even though we have one in monitoring.md.

.timeConf(TimeUnit.SECONDS)
.createWithDefaultString("7d")

val LOCAL_STORE_DIR = ConfigBuilder("spark.history.store.path")
gatorsmile (Member):

Just want to confirm: except for this, there are no changes to the other parameters, right?

gatorsmile (Member):

It'd be better to document that the default is an in-memory store.

vanzin (Contributor Author):

Correct, other parameters are the same.

} else if (meta.version != CURRENT_LISTING_VERSION || !logDir.equals(meta.logDir)) {
logInfo("Detected mismatched config in existing DB, deleting...")
db.close()
Utils.deleteRecursively(dbPath)
gatorsmile (Member):

If the version does not match, we delete the files?

vanzin (Contributor Author):

Yes; the code will re-create the data from the event logs when that happens.

@@ -742,53 +698,146 @@ private[history] object FsHistoryProvider {
private val APPL_END_EVENT_PREFIX = "{\"Event\":\"SparkListenerApplicationEnd\""

private val LOG_START_EVENT_PREFIX = "{\"Event\":\"SparkListenerLogStart\""

/** Current version of the data written to the listing database. */
private val CURRENT_LISTING_VERSION = 1L
gatorsmile (Member):

I tried to find the definition and usage of CURRENT_LISTING_VERSION, but it seems like this is not discussed. What does this really mean? When will we change this value? Do we have a complete story for this flag?

vanzin (Contributor Author) commented Sep 12, 2017:

I'll add a more verbose comment; but this is basically me punting proper versioning to the next Spark release after this code is added. This version number is a "nuclear option"; if we break the data that is serialized to disk, we increase this number, and the new SHS will delete all old data and re-generate it from event logs.

I'm punting because there is no versioning issue in the first version; there's no existing data that the SHS might try to read.

I plan to take a closer look at versioning after all the initial PRs go in, but leaving this here gives us a choice in case there's a more urgent need to break things.
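
A sketch of how that option plays out at startup, built from the metadata checks visible elsewhere in this diff (the method name is illustrative; KVStoreMetadata and CURRENT_LISTING_VERSION are as defined in the PR):

    import java.io.File

    import org.apache.spark.util.Utils
    import org.apache.spark.util.kvstore.{KVStore, LevelDB}

    def openOrRecreate(dbPath: File, logDir: String): KVStore = {
      var db = new LevelDB(dbPath)
      val meta = db.getMetadata(classOf[KVStoreMetadata])
      if (meta == null) {
        db.setMetadata(new KVStoreMetadata(CURRENT_LISTING_VERSION, logDir))
      } else if (meta.version != CURRENT_LISTING_VERSION || meta.logDir != logDir) {
        // Incompatible data: wipe it and let the listing be rebuilt from event logs.
        db.close()
        Utils.deleteRecursively(dbPath)
        db = new LevelDB(dbPath)
        db.setMetadata(new KVStoreMetadata(CURRENT_LISTING_VERSION, logDir))
      }
      db
    }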


private[history] case class KVStoreMetadata(
val version: Long,
val logDir: String)
gatorsmile (Member):

The two `val` keywords above are redundant.


attempt.attemptId = event.appAttemptId
attempt.startTime = new Date(event.time)
attempt.lastUpdated = new Date(clock.getTimeMillis())
gatorsmile (Member):

clock.getTimeMillis() and event.time are always in the same time zone?

vanzin (Contributor Author):

Both are basically System.currentTimeMillis(), so yes.

.reverse()
.iterator()
.asScala
.map(_.toAppHistoryInfo)
gatorsmile (Member):

Nit: toAppHistoryInfo()

@@ -229,10 +254,22 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock: Clock)
}
}

-  override def getListing(): Iterator[FsApplicationHistoryInfo] = applications.values.iterator
+  override def getListing(): Iterator[ApplicationHistoryInfo] = {
gatorsmile (Member):

The returned order is descending, right? This is not obvious from the code; please add a comment.
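
For reference, a sketch of how a descending read looks with the KVStore view API (the index name is an assumption for illustration):

    import scala.collection.JavaConverters._

    import org.apache.spark.util.kvstore.KVStore

    // Views iterate in index order; reverse() flips that, so the listing
    // comes back newest-first.
    def descendingListing(listing: KVStore): Iterator[ApplicationInfoWrapper] = {
      listing.view(classOf[ApplicationInfoWrapper])
        .index("endTime")
        .reverse()
        .iterator()
        .asScala
    }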

initThread.join()
}
} finally {
listing.close()
gatorsmile (Member):

What might happen if LevelDB is not properly closed?

vanzin (Contributor Author):

It might leave JNI handles open (a.k.a. a memory leak).

Also, this doesn't apply to this change, but later when the UI info is also written to disk, it could prevent the UI db from being replaced with an updated one, since its files would be opened.

-      prevFileSize < entry.getLen() &&
-      SparkHadoopUtil.get.checkAccessPermission(entry, FsAction.READ)
+      SparkHadoopUtil.get.checkAccessPermission(entry, FsAction.READ) &&
+      recordedFileSize(entry.getPath()) < entry.getLen()
gatorsmile (Member):

Can we add a comment to explain what recordedFileSize(entry.getPath()) returns? In the original code the variable name was self-descriptive; the new change no longer has it.

@KVIndexParam logPath: String,
fileSize: Long)

private[history] class AttemptInfoWrapper(
cloud-fan (Contributor) commented Sep 26, 2017:

why this one doesn't have a natural index?

vanzin (Contributor Author):

Because it's not directly written to the store.

.stringConf
.createWithDefault(DEFAULT_LOG_DIR)

val MAX_LOG_AGE_S = ConfigBuilder("spark.history.fs.cleaner.maxAge")
Contributor:

what does the ending S mean? seconds?

vanzin (Contributor Author):

Yes. This pattern is used in a bunch of other places to indicate the unit of time of the config.
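
In a sketch, the convention ties the suffix to the unit the config is read back in (this entry appears elsewhere in this diff):

    import java.util.concurrent.TimeUnit

    import org.apache.spark.internal.config.ConfigBuilder

    // The _S suffix records that reads yield seconds, whatever unit the user wrote.
    val MAX_LOG_AGE_S = ConfigBuilder("spark.history.fs.cleaner.maxAge")
      .timeConf(TimeUnit.SECONDS)
      .createWithDefaultString("7d") // "7d" is parsed and surfaced as 604800 seconds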

@@ -496,7 +517,7 @@ class FsHistoryProviderSuite extends SparkFunSuite with BeforeAndAfter with Matc

     var provider: FsHistoryProvider = null
     try {
-      provider = new FsHistoryProvider(conf)
+      provider = newProvider(conf)
cloud-fan (Contributor) commented Sep 26, 2017:

It seems worse to have a newProvider method and create a lot of diff.

vanzin (Contributor Author):

This was part of some code I only partially reverted. Will revert these.

} catch {
case e: FileNotFoundException => None
Contributor:

we will never hit FileNotFoundException?

vanzin (Contributor Author):

I guess it still can until I remove the current replay code in a later change.

-      appInfo.attempts.find(_.attemptId == attemptId).flatMap { attempt =>
+      val appInfo = load(appId)
+      appInfo.attempts
+        .find { attempt => attempt.info.attemptId == attemptId }
Contributor:

nit: _.info.attemptId == attemptId

-          Some(attempt.lastUpdated), attempt.startTime)
+        SparkUI.createHistoryUI(conf, replayBus, appSecManager, appInfo.info.name,
+          HistoryServer.getAttemptURI(appId, attempt.info.attemptId),
+          Some(attempt.info.lastUpdated.getTime()), attempt.info.startTime.getTime())
Contributor:

Shall we make load return ApplicationHistoryInfo instead of ApplicationInfoWrapper? Then we can reduce a lot of unnecessary diff.

vanzin (Contributor Author):

AttemptInfoWrapper stores information that is not available in the public API and is used by the provider. Later changes also add more fields to ApplicationInfoWrapper that are needed by the provider.

None
}

ui.appSparkVersion = appListener.appSparkVersion.getOrElse("")
Contributor:

add an assert(appListener.appId.isDefined)?

vanzin (Contributor Author):

If the listing exists it's unlikely that this would ever trigger, but sure.

vanzin (Contributor Author):

(This code will also go away in subsequent changes.)

     // scan for modified applications, replay and merge them
-    val logInfos: Seq[FileStatus] = statusList
+    val logInfos = statusList
cloud-fan (Contributor) commented Sep 26, 2017:

Can we just do `val logInfos = Option(fs.listStatus(new Path(logDir))).map(_.toSeq).getOrElse(Nil)`?

app
}

def compareAttemptInfo(a1: AttemptInfoWrapper, a2: AttemptInfoWrapper): Boolean = {
Contributor:

Can we add an isStartedEarlierThan(other) method to AttemptInfoWrapper?

vanzin (Contributor Author):

This is only used here, I'd rather keep the logic local.

val attempt = app.attempts.head

val oldApp = try {
listing.read(classOf[ApplicationInfoWrapper], app.id)
Contributor:

you can call load here

}
}

   /**
-   * Replay the log files in the list and merge the list of old applications with new ones
+   * Replay the given log file, saving the application in the listing db.
cloud-fan (Contributor) commented Sep 26, 2017:

why update the comment? We don't merge anymore?

vanzin (Contributor Author):

No. If you look at the old code, it did a "merge sort" kind of thing to create an updated listing. KVStore sorts things internally, so there's no need for that code anymore -- you just write something to it, and it comes back sorted.
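
A sketch of the write semantics that make the merge unnecessary:

    import org.apache.spark.util.kvstore.KVStore

    // write() inserts or replaces by natural key, and indexed reads come back
    // sorted, so updating the listing is one call with no manual merging.
    def upsertApp(listing: KVStore, app: ApplicationInfoWrapper): Unit = {
      listing.write(app)
    }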

iterator.get.asScala.foreach { app =>
// Applications may have multiple attempts, some of which may not need to be deleted yet.
val (remaining, toDelete) = app.attempts.partition { attempt =>
attempt.info.lastUpdated.getTime() >= maxTime
vanzin (Contributor Author) commented Sep 26, 2017:

This is comparing different things.

The old code compared how much time had passed since the last update.

This code compares the log's last update against the absolute expiration time.

If I make your change, the unit test stops passing (and it hasn't changed from before).
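
In a small sketch, the two formulations are the same predicate rearranged, which is why the behavior is preserved:

    // Old: the elapsed time since the last update exceeds the max age.
    // New: the last update falls before an absolute cutoff (maxTime).
    def isExpired(lastUpdatedMs: Long, nowMs: Long, maxAgeMs: Long): Boolean = {
      val maxTime = nowMs - maxAgeMs
      lastUpdatedMs < maxTime // equivalent to: nowMs - lastUpdatedMs > maxAgeMs
    }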

       appInfo.attempts.find(_.attemptId == attemptId)
     }
+    val count = listing.count(classOf[ApplicationInfoWrapper])
+    s"""|FsHistoryProvider{logdir=$logDir,
Contributor:

one space after |

vanzin (Contributor Author):

That would make the string look weird.


SparkQA commented Sep 26, 2017

Test build #82206 has finished for PR 18887 at commit 5eff2c5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

cloud-fan (Contributor):

LGTM, merging to master!

@asfgit asfgit closed this in 74daf62 Sep 27, 2017
mgaido91 (Contributor):

Sorry, this is causing a checkstyle error for me while building the project, due to the presence of a finalizer method. I can't understand how this was able to pass the Jenkins tests. Does anyone have a clue about that? Thanks.

cloud-fan (Contributor):

hmmm weird, @srowen any ideas?


srowen commented Sep 27, 2017

Oh, is it because SBT doesn't run checkstyle but Maven does, and SBT runs the PR builder? This could be my fault, for recently re-enabling checkstyle; that may be why. For now, I'd at least submit a hotfix to suppress this checkstyle rule.

cloud-fan (Contributor):

Yeah, the PR build uses SBT. But I thought SBT also ran checkstyle before?


srowen commented Sep 27, 2017

I don't think so, because checkstyle is for Java only and SBT won't compile the Java code.
I'd kind of love to stop mixing SBT and Maven, but that's quite a separate thing.

I'll put up a HOTFIX that both fixes the style violation and disables checkstyle again. Now I remember this was the reason it was disabled -- it wasn't because we used to compile Java code separately from the Scala code.

@vanzin vanzin deleted the SPARK-20642 branch September 27, 2017 18:27
ghost pushed a commit to dbtsai/spark that referenced this pull request Sep 27, 2017
## What changes were proposed in this pull request?

Fix finalizer checkstyle violation by just turning it off; re-disable checkstyle as it won't be run by SBT PR builder. See apache#18887 (comment)

## How was this patch tested?

`./dev/lint-java` runs successfully

Author: Sean Owen <sowen@cloudera.com>

Closes apache#19371 from srowen/HotfixFinalizerCheckstlye.