
[SPARK-11419][STREAMING] Parallel recovery for FileBasedWriteAheadLog + minor recovery tweaks #9373

Closed
brkyvz wants to merge 21 commits into apache:master from brkyvz:par-recovery

Conversation

@brkyvz commented Oct 30, 2015

The support for closing WriteAheadLog files after writes was just merged in. Closing every file after a write is a very expensive operation, as it creates many small files on S3. It's not necessary to enable it on HDFS anyway.

However, when you have many small files on S3, recovery takes a very long time. In addition, files start stacking up pretty quickly, and deletes may not be able to keep up, so deletes can also be parallelized.

This PR adds support for the two parallelization steps mentioned above, and also fixes a couple more failures I encountered during recovery.
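
For readers skimming the change, here is a minimal sketch of the parallel-read idea only (not the actual FileBasedWriteAheadLog code; `parallelRecover`, `readSegment`, and `numRecoveryThreads` are placeholder names): the many small log files are read on a bounded, dedicated thread pool instead of one after another.

```scala
import java.util.concurrent.Executors

import scala.concurrent.duration._
import scala.concurrent.{Await, ExecutionContext, Future}

// Sketch: read WAL segment files in parallel on a dedicated, bounded pool.
def parallelRecover[T](files: Seq[String], numRecoveryThreads: Int)
                      (readSegment: String => Seq[T]): Seq[T] = {
  val pool = Executors.newFixedThreadPool(numRecoveryThreads)
  implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
  try {
    // One future per file; the pool bounds how many files are read at once.
    val futures = files.map(file => Future { readSegment(file) })
    Await.result(Future.sequence(futures), 10.minutes).flatten
  } finally {
    pool.shutdown()
  }
}
```

Deletes can be handed to the same kind of dedicated pool so that they no longer block, which is the second parallelization step described above.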

@SparkQA commented Oct 30, 2015

Test build #44665 has finished for PR 9373 at commit be5a2ab.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@brkyvz changed the title from "[SPARK-11419][STREAMING] Par recovery" to "[SPARK-11419][STREAMING] Parallel recovery for FileBasedWriteAheadLog + minor recovery tweaks" on Oct 30, 2015
    if (receiverTracker != null) {
      // First, stop receiving
      receiverTracker.stop(processAllReceivedData)
    }
Contributor Author (brkyvz):
An NPE was thrown here when the streaming context was stopped before recovery completed.

@SparkQA commented Oct 30, 2015

Test build #44666 has finished for PR 9373 at commit 7f8cfe3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -17,6 +17,7 @@
package org.apache.spark.streaming.util

import java.nio.ByteBuffer
import java.util.concurrent.ConcurrentSkipListSet
Contributor Author (brkyvz):
NoteToSelf: Remove

@brkyvz commented Nov 1, 2015

@harishreedharan Here are some benchmark results. For reference, the driver was an r3.2xlarge EC2 instance.

[benchmark results chart]

| Num Threads | Rate (ms / file) | Speed-up    |
|-------------|------------------|-------------|
| 50          | 5.556101934      | 9.004997951 |
| 25          | 5.99898194       | 8.340196225 |
| 8           | 8.692144733      | 5.756080699 |
| 4           | 14.1162362       | 3.544336169 |
| 1           | 50.03268653      | 1           |

@harishreedharan commented:
Did you try HDFS? I'm assuming we'd see similar speed-ups there too, but in that case there are far fewer files, so the cost of setting up the streams is paid only a handful of times.

What I'm wondering is whether we'd ever actually have to deal with that many files in the non-S3 case. This adds extra cost for HDFS or any other FS, no? In those cases the number of files would usually be pretty small, which may make this more expensive.

If this adds only a small cost, or if it becomes faster, then let's keep this.


    oldLogFiles.foreach { logInfo =>
      if (!executionContext.isShutdown) {
        val f = Future { deleteFile(logInfo) }
        Await.ready(f, 1 second)
      }
    }
Contributor:
Again, this should not use the default execution context. Please create a dedicated execution context for this.

Contributor Author (brkyvz):
The execution context was defined implicitly in the class definition. I made it non-implicit for better readability.
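
For context, a rough sketch of the pattern being discussed (the names WalCleanup and wal-cleanup-thread are illustrative, not the actual Spark identifiers): give the delete path its own small, named thread pool and pass its execution context explicitly instead of relying on an implicit or global one.

```scala
import java.util.concurrent.{Executors, ThreadFactory}

import scala.concurrent.{ExecutionContext, Future}

object WalCleanup {
  // One dedicated, named daemon thread for WAL file deletion.
  private val pool = Executors.newSingleThreadExecutor(new ThreadFactory {
    override def newThread(r: Runnable): Thread = {
      val t = new Thread(r, "wal-cleanup-thread")
      t.setDaemon(true)
      t
    }
  })
  private val cleanupExecutionContext = ExecutionContext.fromExecutorService(pool)

  // Passing the execution context explicitly makes it obvious which pool runs the delete.
  def deleteAsync(delete: () => Unit): Future[Unit] =
    Future(delete())(cleanupExecutionContext)

  def shutdown(): Unit = pool.shutdown()
}
```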

@brkyvz commented Nov 8, 2015

@harishreedharan I've been trying to test this patch, but I just couldn't get HDFS to work with Spark using the spark-ec2 scripts. Could you please help me set up a cluster with HDFS so that I can benchmark this?
Basically, I can get HDFS up and running on the cluster, but Spark can't access it. I get the following exception when I use Hadoop 2:

scala> sc.parallelize(1 to 5).saveAsTextFile("hdfs:///trial")
java.io.IOException: Failed on local exception: com.google.protobuf.InvalidProtocolBufferException: Protocol message contained an invalid tag (zero).; Host Details : local host is: "ip-172-31-13-113.us-west-2.compute.internal/172.31.13.113"; destination host is: "ec2-52-32-160-227.us-west-2.compute.amazonaws.com":9000;

That looks like a Protobuf version incompatibility.

I launched the ec2 instances using:

ec2/spark-ec2 --spark-git-repo=https://github.com/brkyvz/spark --spark-version=fc2951f6530bde932a0bc97f430c6c360eb03209 -s 2 --spot-price=0.2 -t m4.large --no-ganglia -i ... -k ... -r us-west-2 --hadoop-major-version 2 launch burak-streaming-test-2

I used to get the following when using --hadoop-major-version 1:

java.io.IOException: Failed on local exception: java.io.EOFException; Host Details : local host is: "ip-172-31-30-170.us-west-2.compute.internal/172.31.30.170"; destination host is: "ec2-52-32-211-30.us-west-2.compute.amazonaws.com":9000; 

@tdas commented Nov 9, 2015

@brkyvz I think there have been issues with Hadoop 2-related stuff in the master branch. Let's talk offline about how to fix it.

@brkyvz commented Nov 10, 2015

@harishreedharan I couldn't test this on HDFS properly. Instead, I enabled the parallelization only when closeFileAfterWrite is enabled, which is when you actually need it. Does that sound okay to you?
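
Roughly, the gating looks like this (an illustrative sketch only; `recoverRecords` and `readFile` are placeholders, and `parallelRecover` refers to the sketch under the PR description above):

```scala
// Sketch: fan out only when closeFileAfterWrite is on, i.e. when recovery has
// to read many small files; otherwise keep the simple sequential path.
def recoverRecords(files: Seq[String], closeFileAfterWrite: Boolean)
                  (readFile: String => Seq[Array[Byte]]): Seq[Array[Byte]] = {
  if (closeFileAfterWrite) {
    parallelRecover(files, numRecoveryThreads = 8)(readFile)
  } else {
    files.flatMap(readFile)
  }
}
```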

@tdas commented Nov 10, 2015

@brkyvz Could you update this PR with master? The batching PR got merged, creating conflicts.

@harishreedharan commented:
@brkyvz Sounds good, sir. The issue you saw looks like a protobuf incompatibility. Did you compile and run against the same Hadoop 2 version (2.2+)?
This patch now LGTM.

@SparkQA commented Nov 10, 2015

Test build #45457 has finished for PR 9373 at commit 0b7279f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Nov 10, 2015

Test build #45467 has finished for PR 9373 at commit c2cafe1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Nov 10, 2015

Test build #45456 has finished for PR 9373 at commit 98da092.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@brkyvz commented Nov 11, 2015

test this please

@SparkQA commented Nov 11, 2015

Test build #45650 has finished for PR 9373 at commit 1ba8340.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Nov 11, 2015

Test build #45648 has finished for PR 9373 at commit 1ba8340.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

  def getMax(): Int = synchronized { max }
}
try {
  val testSeq = 1 to 64
Contributor:
Can you make this 1000 instead of 8 * 8? Just to make sure that we are splitting things correctly.

@SparkQA commented Nov 11, 2015

Test build #45668 has finished for PR 9373 at commit ccf7f5b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -582,6 +620,9 @@ object WriteAheadLogSuite {
      allowBatching: Boolean): Seq[String] = {
    val wal = createWriteAheadLog(logDirectory, closeFileAfterWrite, allowBatching)
    val data = wal.readAll().asScala.map(byteBufferToString).toSeq
    // The thread pool for parallel recovery gets killed with wal.close(). Therefore we need to
    // eagerly compute data, otherwise the lazy computation will fail.
    data.length
Member:
Could you just change toSeq to toArray? toArray will drain the Iterator at once.
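
As an aside on why an eager step is needed at all, here is a small stand-alone illustration (not the Spark test code), assuming the Scala 2.10/2.11 behavior Spark used at the time, where `Iterator#toSeq` builds a lazy Stream while `toArray` drains the iterator immediately:

```scala
// Count how many elements are actually computed from the iterator.
var reads = 0
def records(): Iterator[Int] = Iterator.tabulate(3) { i => reads += 1; i }

val lazySeq = records().toSeq   // Stream: only the head is forced here
println(reads)                  // 1 -- the rest is evaluated later, on traversal

reads = 0
val eager = records().toArray   // drains the whole iterator right away
println(reads)                  // 3
```

If the WAL (and the thread pool backing the read iterator) is closed before the lazy tail is forced, that deferred evaluation is what fails.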

      case ex: Exception =>
        logWarning(s"Error clearing write ahead log file $logInfo", ex)
    }
    def deleteFile(walInfo: LogInfo): Unit = {
Contributor:
nit: why rename this to walInfo?

Contributor:
nit: empty line missing.

Contributor Author (brkyvz):
`logInfo` is Spark's logging method (mixed in from the Logging trait), so reusing it as a parameter name would shadow that method inside the body.
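
A small illustration of the collision (simplified types, not the exact Spark classes): a parameter named `logInfo` would shadow the `logInfo` method mixed in from the Logging trait inside the method body, so the parameter is called `walInfo` instead.

```scala
trait Logging {
  def logInfo(msg: => String): Unit = println(s"INFO $msg")
}

case class LogInfo(startTime: Long, endTime: Long, path: String)

class WalCleaner extends Logging {
  def deleteFile(walInfo: LogInfo): Unit = {
    // If this parameter were named `logInfo`, the call below would try to
    // apply the LogInfo value to a String and fail to compile.
    logInfo(s"Deleting write ahead log file ${walInfo.path}")
  }
}
```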

@tdas commented Nov 12, 2015

@brkyvz A few more comments, plus one pending comment from before about adding more unit tests.
@zsxwing please take a look once again.

@SparkQA commented Nov 12, 2015

Test build #45700 has finished for PR 9373 at commit 7e1829b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Nov 12, 2015

Test build #45712 has finished for PR 9373 at commit dbb31e3.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Nov 12, 2015

Test build #45747 has finished for PR 9373 at commit a31822c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Nov 12, 2015

Test build #45781 has finished for PR 9373 at commit 79e9b03.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@tdas commented Nov 13, 2015

LGTM. Merging this to master and 1.6. Thanks @brkyvz, @zsxwing and @harishreedharan

asfgit pushed a commit that referenced this pull request Nov 13, 2015
[SPARK-11419][STREAMING] Parallel recovery for FileBasedWriteAheadLog + minor recovery tweaks


Author: Burak Yavuz <brkyvz@gmail.com>

Closes #9373 from brkyvz/par-recovery.

(cherry picked from commit 7786f9c)
Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
@asfgit closed this in 7786f9c on Nov 13, 2015
dskrvk pushed a commit to dskrvk/spark that referenced this pull request Nov 13, 2015
[SPARK-11419][STREAMING] Parallel recovery for FileBasedWriteAheadLog + minor recovery tweaks


Author: Burak Yavuz <brkyvz@gmail.com>

Closes apache#9373 from brkyvz/par-recovery.
@brkyvz deleted the par-recovery branch on February 3, 2019