Example from “Adv. Analytics with Spark”, Chapter 9 fails #64

Closed
mosicr opened this issue Mar 9, 2016 · 17 comments

@mosicr

mosicr commented Mar 9, 2016

(Running the trials part)

val trials = seedRdd.flatMap(trialReturns(_, numTrials / parallelism, bFactorWeights.value, factorMeans, factorCov))

org.apache.spark.SparkException: Task not serializable

@mosicr
Author

mosicr commented Mar 9, 2016

More detail:
Caused by: java.io.NotSerializableException: org.apache.commons.math3.stat.regression.OLSMultipleLinearRegression
Serialization stack:
- object not serializable (class: org.apache.commons.math3.stat.regression.OLSMultipleLinearRegression, value: org.apache.commons.math3.stat.regression.OLSMultipleLinearRegression@5ed389d6)
- element of array (index: 0)
- array (class [Ljava.lang.Object;, size 1)
- field (class: scala.collection.mutable.ArrayBuffer, name: array, type: class [Ljava.lang.Object;)
- object (class scala.collection.mutable.ArrayBuffer, ArrayBuffer(org.apache.commons.math3.stat.regression.OLSMultipleLinearRegression@5ed389d6))
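
For readers following along: the trace reports a non-serializable object sitting inside an ArrayBuffer that Spark tried to serialize with a task. The same kind of failure can be reproduced outside Spark with plain Java serialization. This is only a minimal sketch; `SerializationCheck`, `Model` and `firstNonSerializable` are hypothetical names, and `Model` is a stand-in for a library class that does not implement `java.io.Serializable`.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}
import scala.collection.mutable.ArrayBuffer

object SerializationCheck {
  // Hypothetical stand-in for a library class that is not Serializable
  // (as the trace reports for OLSMultipleLinearRegression).
  class Model(val weights: Array[Double])

  // Returns None if obj survives Java serialization, or Some(message) with
  // the exception message (the offending class name) if it does not.
  def firstNonSerializable(obj: AnyRef): Option[String] = {
    val out = new ObjectOutputStream(new ByteArrayOutputStream())
    try {
      out.writeObject(obj)
      None
    } catch {
      case e: NotSerializableException => Some(e.getMessage)
    } finally {
      out.close()
    }
  }

  // A buffer of plain numbers serializes; a buffer holding a Model does not.
  val plainBuffer = ArrayBuffer(1.0, 2.0)
  val modelBuffer = ArrayBuffer(new Model(Array(1.0)))
}
```

Calling `SerializationCheck.firstNonSerializable` on each buffer shows which one would trip the same exception.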

@srowen
Collaborator

srowen commented Mar 9, 2016

Thanks for the detail. I suspect I can find a way to edit the code to avoid whatever causes this in all cases.

@srowen
Collaborator

srowen commented Mar 9, 2016

How are you running this -- in the shell? I don't see here how it would become part of something that's serialized.

@mosicr
Author

mosicr commented Mar 9, 2016

Yes, I am running it in the shell.
spark-shell --jars ./jars/nscala-time_2.10-0.2.0.jar ./jars/jfreechart-1.0.14.jar ./jars/breeze_2.11-0.11.2.jar

then
:load /home/zkvsijk/source/MonteCarloSource

MonteCarloSource contains code from Chapter 9.

@srowen
Collaborator

srowen commented Mar 9, 2016

Can you share this source file so I see exactly what you're running?

@mosicr
Author

mosicr commented Mar 9, 2016

MonteCarloSource.scala // source file
MonteCarlo.out // output
MonteCarlo.zip

@srowen
Collaborator

srowen commented Mar 10, 2016

@sryza can I ask you to look at this? I can't find the crude oil TSV data for example, in order to reproduce this from the source code above.

It shouldn't be a problem, but in the shell, something's causing the model objects to get into a closure even though they're not used with Spark. As a guess at a workaround, you might avoid creating the models reference by changing:

val models = stocksReturns.map(linearModel(_, factorFeatures))
val factorWeights = models.map(_.estimateRegressionParameters()).
  toArray

to

val factorWeights = stocksReturns.map(linearModel(_, factorFeatures)).map(_.estimateRegressionParameters()).
  toArray

Shouldn't be necessary but we're trying to avoid issues with the closure that are specific to the shell.
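
To illustrate why dropping the named `models` value can matter, here is a rough analogy in plain Scala, not a description of how spark-shell actually wraps input lines. A function that reads a field captures its enclosing object, and serializing the function then drags every other field along. All names here (`ClosureCaptureDemo`, `Model`, `HoldsModels`, `ChainsCalls`, `serializes`) are hypothetical, and `Model` stands in for the non-serializable regression class.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

object ClosureCaptureDemo {
  // Hypothetical stand-in for OLSMultipleLinearRegression: not Serializable.
  class Model(val weights: Array[Double]) {
    def estimate(): Array[Double] = weights
  }

  // Keeps `models` as a named value, like the original two-step code.
  class HoldsModels extends Serializable {
    val models = Seq(new Model(Array(0.5)))
    val factorWeights = models.map(_.estimate()).toArray
    // Reads the field `factorWeights`, so the function captures `this`,
    // and `this` still holds `models` when it is serialized.
    def closure: Double => Double = x => x * factorWeights(0)(0)
  }

  // Same computation, but the models only exist as temporaries.
  class ChainsCalls extends Serializable {
    val factorWeights = Seq(new Model(Array(0.5))).map(_.estimate()).toArray
    def closure: Double => Double = x => x * factorWeights(0)(0)
  }

  // True if obj can be written with plain Java serialization.
  def serializes(obj: AnyRef): Boolean = {
    val out = new ObjectOutputStream(new ByteArrayOutputStream())
    try {
      out.writeObject(obj)
      true
    } catch {
      case _: NotSerializableException => false
    } finally {
      out.close()
    }
  }
}
```

In this sketch the closure from `HoldsModels` fails to serialize while the one from `ChainsCalls` succeeds, which is the effect the chained one-liner is meant to achieve in the shell.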

@mosicr
Author

mosicr commented Mar 10, 2016

Tried above, fails with:
linearModel: (instrument: Array[Double], factorMatrix: Array[Array[Double]])org.apache.commons.math3.stat.regression.OLSMultipleLinearRegression
<console>:44: error: not found: value toArray
toArray
^

@srowen
Collaborator

srowen commented Mar 10, 2016

Oh, I think you copied and pasted it literally with the line break. It needs to be one command. It's just trying to avoid the models reference.

@mosicr
Author

mosicr commented Mar 10, 2016

It is all in one line:
val factorWeights = stocksReturns.map(linearModel(_, factorFeatures)).map(_.estimateRegressionParameters()).toArray
still getting:
linearModel: (instrument: Array[Double], factorMatrix: Array[Array[Double]])org.apache.commons.math3.stat.regression.OLSMultipleLinearRegression
<console>:72: error: not found: value toArray
toArray
^

@srowen
Collaborator

srowen commented Mar 10, 2016

Hm, that does mean "toArray" has been entered as a statement by itself. Are you certain? The statement here should not be able to generate that error.
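
One way to rule out line splitting entirely, sketched here under assumptions, is to wrap the chain in parentheses: Scala does not treat newlines as statement separators inside matching parentheses, so no line can be read as a statement on its own. `ChainedWeights`, `Model` and the sample data are hypothetical stand-ins so the sketch runs without the book's data or commons-math; the real `linearModel` returns an OLSMultipleLinearRegression.

```scala
object ChainedWeights {
  // Hypothetical stand-in for the regression model used in the chapter.
  class Model(instrument: Array[Double]) {
    def estimateRegressionParameters(): Array[Double] = instrument.map(_ * 2.0)
  }

  // Same shape as the linearModel signature printed in the shell output,
  // but returning the stand-in Model.
  def linearModel(instrument: Array[Double], factorMatrix: Array[Array[Double]]): Model =
    new Model(instrument)

  // Made-up sample inputs, only here so the sketch is runnable.
  val stocksReturns: Seq[Array[Double]] = Seq(Array(1.0, 2.0), Array(3.0))
  val factorFeatures: Array[Array[Double]] = Array(Array(0.0))

  // The outer parentheses make the whole chain a single expression.
  val factorWeights: Array[Array[Double]] = (
    stocksReturns
      .map(linearModel(_, factorFeatures))
      .map(_.estimateRegressionParameters())
      .toArray
  )
}
```

In the shell, only the parenthesized `val factorWeights = ( ... )` statement would be entered, with the real `stocksReturns`, `factorFeatures` and `linearModel` already defined.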

@mosicr
Author

mosicr commented Mar 10, 2016

Yes, I am quite certain:
[screenshot]

@srowen
Collaborator

srowen commented Mar 10, 2016

That code should be valid and equivalent to the existing code. Right? Something else funny must be going on in how it is being entered in the shell.

@srowen
Collaborator

srowen commented Apr 17, 2016

I don't know how to proceed on this one except to make the change I suggested above. It can't hurt, at least. Will do that. At least it would give you an unambiguously working piece of code to compare against.

srowen added a commit that referenced this issue Apr 17, 2016
…th text. And updates Spark to 1.6.1 and updates some plugins
srowen assigned srowen and unassigned sryza Apr 17, 2016
srowen added this to the 1.0.2 milestone Apr 17, 2016
srowen closed this as completed Apr 17, 2016
@mukesh-ranjan

Hello Everyone,

Does this book have a Python version of the code? If so, please help me get a copy.

Thanks & Best Regards
Mukesh Ranjan

@sryza
Owner

sryza commented May 25, 2016

Hi @mukesh-ranjan,

The book only has Scala versions of the code.

-Sandy

@feelmercy

Hi @srowen,
I got this error:
"object not serializable (class: org.apache.commons.math3.stat.regression.OLSMultipleLinearRegression, value: org.apache.commons.math3.stat.regression.OLSMultipleLinearRegression"

You are right, it's the models reference.
I used your code "val factorWeights = stocksReturns.map(linearModel(_, factorFeatures)).map(_.estimateRegressionParameters()).toArray"
and was then able to run
"val trials = seedRdd.flatMap..." successfully.

Thank you
