OOM in Rest server when launching a job with large resources #146

ash211 · 2017-02-23T23:01:42Z

Seen while testing an internal application

2017-02-23 23:00:06 WARN  ServletHandler:667 - Error for /v1/submissions/create
java.lang.OutOfMemoryError: Java heap space
	at java.util.Arrays.copyOfRange(Arrays.java:3664)
	at java.lang.String.<init>(String.java:207)
	at java.lang.StringBuilder.toString(StringBuilder.java:407)
	at com.fasterxml.jackson.core.util.TextBuffer.contentsAsString(TextBuffer.java:356)
	at com.fasterxml.jackson.core.json.ReaderBasedJsonParser.getText(ReaderBasedJsonParser.java:235)
	at org.json4s.jackson.JValueDeserializer.deserialize(JValueDeserializer.scala:20)
	at org.json4s.jackson.JValueDeserializer.deserialize(JValueDeserializer.scala:42)
	at org.json4s.jackson.JValueDeserializer.deserialize(JValueDeserializer.scala:35)
	at org.json4s.jackson.JValueDeserializer.deserialize(JValueDeserializer.scala:42)
	at org.json4s.jackson.JValueDeserializer.deserialize(JValueDeserializer.scala:35)
	at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:3736)
	at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:2726)
	at org.json4s.jackson.JsonMethods$class.parse(JsonMethods.scala:20)
	at org.json4s.jackson.JsonMethods$.parse(JsonMethods.scala:50)
	at org.apache.spark.deploy.rest.SubmitRestProtocolMessage$.parseAction(SubmitRestProtocolMessage.scala:112)
	at org.apache.spark.deploy.rest.SubmitRestProtocolMessage$.fromJson(SubmitRestProtocolMessage.scala:130)
	at org.apache.spark.deploy.rest.SubmitRequestServlet.doPost(RestSubmissionServer.scala:283)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
	at org.spark_project.jetty.servlet.ServletHolder.handle(ServletHolder.java:812)
	at org.spark_project.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:587)
	at org.spark_project.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
	at org.spark_project.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
	at org.spark_project.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
	at org.spark_project.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
	at org.spark_project.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
	at org.spark_project.jetty.server.Server.handle(Server.java:499)
	at org.spark_project.jetty.server.HttpChannel.handle(HttpChannel.java:311)
	at org.spark_project.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
	at org.spark_project.jetty.io.AbstractConnection$2.run(AbstractConnection.java:544)
	at org.spark_project.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
	at org.spark_project.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)

ash211 · 2017-02-23T23:01:50Z

cc @mccheah

mccheah · 2017-02-24T02:18:54Z

We don't set the driver memory. That's probably something we should do; it is probably defaulting to something very small right now.

ash211 · 2017-02-24T06:55:26Z

We could set the rest service memory but I think a better approach would be to be more efficient with memory in this tiny little server. So pass around an InputStream and try to scan through the stream only once when needed to deserialize things.

In RestSubmissionServer.scala:282-283 Spark collects an InputStream into a JSON String and then deserializes the String into a SubmitRestProtocolMessage, converting it into a JSON object twice along the way (SubmitRestProtocolMessage.scala:112 and SubmitRestProtocolMessage.scala:144). Holding those copies is pretty lightweight in upstream Spark since they're small, but given that we put the jar and file tarballs into the request in our KubernetesCreateSubmissionRequest, the memory pressure is pushing us over the edge.
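
To illustrate the "scan through the stream only once" idea, here is a minimal sketch using only the JDK. It is not the Spark code: the class and method names are hypothetical, and it assumes the stream carries nothing but base64 text. Pulling a base64 field out of a JSON body incrementally would additionally need a streaming JSON parser, which this sketch leaves out.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper, not part of Spark.
final class StreamingDecodeSketch {
    // Decodes base64 text from `in` and writes the raw bytes to `out` in a
    // single pass. Only a fixed 8 KiB buffer is held in memory, so no
    // full-size String or byte[] copy of the payload is ever built.
    static long decodeBase64Stream(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        try (InputStream decoded = Base64.getDecoder().wrap(in)) {
            int n;
            while ((n = decoded.read(buffer)) != -1) {
                out.write(buffer, 0, n);
                total += n;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a large uploaded resource.
        byte[] payload = "pretend this is a large jar".getBytes(StandardCharsets.UTF_8);
        byte[] encoded = Base64.getEncoder().encode(payload);
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long n = decodeBase64Stream(new ByteArrayInputStream(encoded), sink);
        System.out.println(n + " bytes decoded");
    }
}
```

In a real server the `OutputStream` would more likely be a file than an in-memory sink, so the decoded bytes never sit on the heap all at once.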

What's the right thing to do here?

ash211 · 2017-02-28T21:16:46Z

Workaround up at #161 to make driver submission server's memory configurable while we work out how to handle the large messages more efficiently.

kimoonkim · 2017-03-01T20:51:12Z

In the SIG meeting, I alluded to separate JVM options controlling the max size of single allocations. I think they are -XX:NewSize and -XX:MaxNewSize (documentation link).

ash211 · 2017-03-02T22:53:55Z

Sending the resources as a byte array instead of base64 is something we filed earlier as #81.
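
For a rough sense of why binary helps, the sketch below works out the size of a base64 payload. Padded base64 produces 4 characters for every 3 input bytes. The 100 MiB figure is a made-up example, and the 2 bytes per character figure assumes a char[]-backed String (as on Java 8), which is an assumption about the JVM in use rather than something stated in this thread.

```java
// Hypothetical helper for back-of-the-envelope sizing.
final class Base64Overhead {
    // Number of characters in padded base64 output for n input bytes:
    // 4 * ceil(n / 3).
    static long encodedLength(long n) {
        return 4 * ((n + 2) / 3);
    }

    public static void main(String[] args) {
        long rawBytes = 100L * 1024 * 1024; // hypothetical 100 MiB of jars and files
        long chars = encodedLength(rawBytes);
        System.out.println("raw bytes:            " + rawBytes);
        System.out.println("base64 characters:    " + chars);
        // Assumes 2 bytes per char; each additional in-memory copy of the
        // String costs this much again.
        System.out.println("bytes per String copy: " + 2 * chars);
    }
}
```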

ash211 · 2017-03-02T22:55:39Z

Coming out of the SIG meeting, the remediation plan was to:

- make the rest server heap configurable (Allow setting memory on the driver submission server. #161)
- send as byte array instead of base64 (Upload jars/files to driver in binary instead of base64 #81)
- bounce the request off disk to prevent OOMs
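
One way to read "bounce the request off disk" is to spool the request body to a temporary file before doing anything else with it. The sketch below shows that under stated assumptions: the class name, method name, and temp-file prefix are hypothetical, and nothing here is taken from the eventual fix.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of Spark.
final class SpoolToDiskSketch {
    // Copies the request body to a temp file and returns its path.
    // Files.copy streams through a small internal buffer, so heap use does
    // not grow with the size of the body. The caller deletes the file.
    static Path spool(InputStream body) throws IOException {
        Path tmp = Files.createTempFile("submission-", ".tmp");
        try {
            Files.copy(body, tmp, StandardCopyOption.REPLACE_EXISTING);
            return tmp;
        } catch (IOException e) {
            // Do not leave a partial file behind on failure.
            Files.deleteIfExists(tmp);
            throw e;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] fakeBody = new byte[1 << 20]; // stand-in for a large request body
        Path spooled = spool(new ByteArrayInputStream(fakeBody));
        try {
            System.out.println("spooled " + Files.size(spooled) + " bytes to " + spooled);
        } finally {
            Files.deleteIfExists(spooled);
        }
    }
}
```

Later stages can then reopen the file with `Files.newInputStream` and read it incrementally, so the full payload never has to be resident on the heap.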

ash211 changed the title ~~OOM in Rest server with large resources~~ OOM in Rest server when launching a job with large resources Feb 23, 2017

ash211 mentioned this issue Feb 28, 2017

Allow setting memory on the driver submission server. #161

Merged

ash211 mentioned this issue Mar 17, 2017

Revisit submission process after alpha release #167

Closed

ifilonenko pushed a commit to ifilonenko/spark that referenced this issue Feb 25, 2019

Clean before building the dist with hadoop (apache-spark-on-k8s#146)

4a6274c
