This repository has been archived by the owner on Jan 9, 2020. It is now read-only.

OOM in Rest server when launching a job with large resources #146

Open
ash211 opened this issue Feb 23, 2017 · 7 comments

Comments

@ash211

ash211 commented Feb 23, 2017

Seen while testing an internal application

2017-02-23 23:00:06 WARN  ServletHandler:667 - Error for /v1/submissions/create
java.lang.OutOfMemoryError: Java heap space
	at java.util.Arrays.copyOfRange(Arrays.java:3664)
	at java.lang.String.<init>(String.java:207)
	at java.lang.StringBuilder.toString(StringBuilder.java:407)
	at com.fasterxml.jackson.core.util.TextBuffer.contentsAsString(TextBuffer.java:356)
	at com.fasterxml.jackson.core.json.ReaderBasedJsonParser.getText(ReaderBasedJsonParser.java:235)
	at org.json4s.jackson.JValueDeserializer.deserialize(JValueDeserializer.scala:20)
	at org.json4s.jackson.JValueDeserializer.deserialize(JValueDeserializer.scala:42)
	at org.json4s.jackson.JValueDeserializer.deserialize(JValueDeserializer.scala:35)
	at org.json4s.jackson.JValueDeserializer.deserialize(JValueDeserializer.scala:42)
	at org.json4s.jackson.JValueDeserializer.deserialize(JValueDeserializer.scala:35)
	at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:3736)
	at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:2726)
	at org.json4s.jackson.JsonMethods$class.parse(JsonMethods.scala:20)
	at org.json4s.jackson.JsonMethods$.parse(JsonMethods.scala:50)
	at org.apache.spark.deploy.rest.SubmitRestProtocolMessage$.parseAction(SubmitRestProtocolMessage.scala:112)
	at org.apache.spark.deploy.rest.SubmitRestProtocolMessage$.fromJson(SubmitRestProtocolMessage.scala:130)
	at org.apache.spark.deploy.rest.SubmitRequestServlet.doPost(RestSubmissionServer.scala:283)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
	at org.spark_project.jetty.servlet.ServletHolder.handle(ServletHolder.java:812)
	at org.spark_project.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:587)
	at org.spark_project.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
	at org.spark_project.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
	at org.spark_project.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
	at org.spark_project.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
	at org.spark_project.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
	at org.spark_project.jetty.server.Server.handle(Server.java:499)
	at org.spark_project.jetty.server.HttpChannel.handle(HttpChannel.java:311)
	at org.spark_project.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
	at org.spark_project.jetty.io.AbstractConnection$2.run(AbstractConnection.java:544)
	at org.spark_project.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
	at org.spark_project.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
@ash211
Author

ash211 commented Feb 23, 2017

cc @mccheah

@ash211 changed the title from "OOM in Rest server with large resources" to "OOM in Rest server when launching a job with large resources" on Feb 23, 2017
@mccheah

mccheah commented Feb 24, 2017

We don't set the driver memory. That's probably something we should do; it's probably defaulting to something very small right now.

@ash211
Author

ash211 commented Feb 24, 2017

We could set the REST service's memory, but I think a better approach would be to be more efficient with memory in this tiny little server: pass around an InputStream and scan through the stream only once, when it's needed to deserialize things.

In RestSubmissionServer.scala:282-283, Spark collects an InputStream into a JSON String and then deserializes the String into a SubmitRestProtocolMessage, parsing it into a JSON object twice along the way (SubmitRestProtocolMessage.scala:112 and SubmitRestProtocolMessage.scala:144). Holding those copies is pretty lightweight in upstream Spark since the requests are small, but because we put the jar and file tarballs into the request in our KubernetesCreateSubmissionRequest, the memory pressure is pushing us over the edge.

What's the right thing to do here?
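
To make the single-pass idea concrete, here is a minimal Java sketch that consumes a body from an InputStream in fixed-size chunks instead of collecting it into a String first. `SinglePassBody`, `scanOnce`, and `Summary` are hypothetical names, not Spark code, and the hashing and counting only stand in for real incremental deserialization.

```java
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper (not part of Spark): reads a request body exactly once.
final class SinglePassBody {
    // The only buffer held while reading, so peak memory does not grow with the body.
    static final int BUFFER_SIZE = 8192;

    /** Result of one scan: total bytes seen and their SHA-256 digest. */
    static final class Summary {
        final long length;
        final byte[] sha256;

        Summary(long length, byte[] sha256) {
            this.length = length;
            this.sha256 = sha256;
        }
    }

    static Summary scanOnce(InputStream body) throws IOException {
        MessageDigest digest;
        try {
            digest = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 must be supported by every Java platform implementation.
            throw new IllegalStateException(e);
        }
        byte[] buffer = new byte[BUFFER_SIZE];
        long total = 0;
        int read;
        while ((read = body.read(buffer)) != -1) {
            // Stand-in for incremental deserialization of this chunk.
            digest.update(buffer, 0, read);
            total += read;
        }
        return new Summary(total, digest.digest());
    }
}
```

A real fix would need a streaming JSON parser on top of this so that the large fields are never materialized as a String; the sketch only shows the bounded-buffer read loop.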

@ash211
Author

ash211 commented Feb 28, 2017

A workaround is up at #161 that makes the driver submission server's memory configurable while we work out how to handle the large messages more efficiently.

@kimoonkim
Member

In the SIG meeting, I alluded to separate JVM options that bear on the max size of single allocations. I think they are -XX:NewSize and -XX:MaxNewSize, which size the young generation (documentation link).

@ash211
Author

ash211 commented Mar 2, 2017

Sending the data as a byte array instead of base64 is something we had filed earlier as #81.
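
For a rough sense of what base64 costs, here is a small Java sketch that compares raw and encoded sizes. `Base64Overhead` and its methods are hypothetical names for illustration; the formula assumes standard padded base64 with no line breaks, which is what `java.util.Base64.getEncoder()` produces.

```java
import java.util.Base64;

// Hypothetical helper (not part of Spark): sizes a payload before and after base64.
final class Base64Overhead {
    /** Padded base64 length for rawBytes input bytes: 4 chars per 3-byte group, rounded up. */
    static long encodedLength(long rawBytes) {
        return 4 * ((rawBytes + 2) / 3);
    }

    /** Encodes for real and returns the character count, to cross-check the formula. */
    static int measuredLength(byte[] raw) {
        return Base64.getEncoder().encodeToString(raw).length();
    }
}
```

For example, `Base64Overhead.encodedLength(3_000_000)` returns 4,000,000, so the text form is about a third larger than the binary. On a JVM that stores String contents as two-byte chars (an assumption about the runtime in use), each in-heap copy of that text costs roughly twice its character count in bytes.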

@ash211
Author

ash211 commented Mar 2, 2017

Coming out of the SIG meeting, the remediation plan was to:

  1. Make the REST server heap configurable (#161, "Allow setting memory on the driver submission server").
  2. Send jars/files as a byte array instead of base64 (#81, "Upload jars/files to driver in binary instead of base64").
  3. Bounce the request off disk to prevent OOMs.
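
One way the third item could look is sketched below in Java: the incoming body is copied straight to a temp file, so the heap only ever holds the copy buffer, and later stages reopen the file as a stream. `DiskSpool`, `spool`, `reopen`, and the `submission-` file prefix are hypothetical names chosen for illustration, not what the project implemented.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper (not part of Spark): bounces a request body off disk.
final class DiskSpool {
    /** Copies the body to a new temp file and returns its path; the caller deletes it when done. */
    static Path spool(InputStream body) throws IOException {
        Path tmp = Files.createTempFile("submission-", ".bin");
        try {
            // createTempFile already created the file, so the copy must be allowed to replace it.
            Files.copy(body, tmp, StandardCopyOption.REPLACE_EXISTING);
            return tmp;
        } catch (IOException | RuntimeException e) {
            // Do not leave a partial file behind if the copy fails.
            Files.deleteIfExists(tmp);
            throw e;
        }
    }

    /** Reopens a spooled body for streaming reads. */
    static InputStream reopen(Path spooled) throws IOException {
        return Files.newInputStream(spooled);
    }
}
```

The trade-off is extra disk I/O and the need to clean up temp files, in exchange for heap usage that no longer scales with the size of the uploaded jars and files.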
