[SPARK-1157][MLlib] Bug fix: lossHistory should exclude rejection steps, and remove miniBatch #582

dbtsai · 2014-04-28T21:58:13Z

Get the lossHistory from Breeze's API, which already excludes the rejected steps in the line search. Also remove the miniBatch option from LBFGS: quasi-Newton methods approximate the inverse of the Hessian, and that doesn't make sense if the gradients are computed from a varying objective.
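To illustrate what "excluding rejection steps" means, here is a minimal, dependency-free sketch. It is not the code in this PR and does not use Breeze; it runs plain gradient descent with a backtracking line search on a toy objective, and the names `LossHistorySketch`, `run`, `evaluated`, and `accepted` are hypothetical. Recording the loss on every objective evaluation also captures trial points that the line search rejects, whereas recording it once per accepted iterate gives the history one would expect.

```scala
object LossHistorySketch {
  // Toy objective f(x) = 0.5 * x^2 with gradient x (a stand-in for the real cost function).
  def loss(x: Double): Double = 0.5 * x * x
  def grad(x: Double): Double = x

  /** Returns (loss at every evaluation, loss at accepted iterates only). */
  def run(x0: Double, initialStep: Double, iters: Int): (Vector[Double], Vector[Double]) = {
    val evaluated = Vector.newBuilder[Double] // what you get by logging inside the cost function
    val accepted = Vector.newBuilder[Double]  // what you get by logging once per optimizer state
    var x = x0
    var fx = loss(x)
    evaluated += fx
    accepted += fx
    for (_ <- 0 until iters) {
      val g = grad(x)
      var step = initialStep
      var trial = x - step * g
      var fTrial = loss(trial)
      evaluated += fTrial
      // Backtracking: halve the step until the Armijo sufficient-decrease condition holds.
      // Every pass through this loop is a rejected trial step.
      while (fTrial > fx - 1e-4 * step * g * g) {
        step *= 0.5
        trial = x - step * g
        fTrial = loss(trial)
        evaluated += fTrial
      }
      x = trial
      fx = fTrial
      accepted += fx
    }
    (evaluated.result(), accepted.result())
  }
}
```

Calling `LossHistorySketch.run(1.0, 3.0, 2)` deliberately starts with an overly long step so that each iteration has a rejected trial: the `evaluated` sequence then contains increases, while the `accepted` sequence does not.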

AmplabJenkins · 2014-04-28T22:02:57Z

Merged build triggered.

AmplabJenkins · 2014-04-28T22:03:04Z

Merged build started.

AmplabJenkins · 2014-04-28T22:41:40Z

Merged build finished. All automated tests passed.

AmplabJenkins · 2014-04-28T22:41:41Z

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14541/

mengxr · 2014-04-29T05:34:53Z

I think it is good to remove miniBatchFraction from LBFGS's params in this PR, unless someone has a good understanding of the behavior of "stochastic" L-BFGS.

dbtsai · 2014-04-29T22:36:13Z

@mengxr I just hacked together an attempt at a proper "stochastic" L-BFGS, and it kind of works as long as we don't change the objective function. But there is no good way to know which L-BFGS step we are in, which we need in order to keep the objective function the same during the line search step, so I need to do some injection as David suggested. See dbtsai@0c699f2

What do you think now? Should we just remove miniBatchFraction, since RDD sampling is not even efficient?

mengxr · 2014-04-30T00:41:30Z

I prefer removing miniBatchFraction. Those quasi-Newton methods approximate the inverse of the Hessian. That doesn't make sense if the gradients are computed from a varying objective.
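For reference, the argument can be stated with the standard secant condition that quasi-Newton updates are built on. The notation below is textbook notation and does not come from this PR; writing $B_k$ for the mini-batch sampled at iteration $k$ is an assumption made for illustration.

```latex
% Full-batch case: the inverse-Hessian approximation H_{k+1} must satisfy
s_k = x_{k+1} - x_k, \qquad
y_k = \nabla f(x_{k+1}) - \nabla f(x_k), \qquad
H_{k+1}\, y_k = s_k

% Mini-batch case: the two gradients come from different objectives
\tilde{y}_k = \nabla f_{B_{k+1}}(x_{k+1}) - \nabla f_{B_k}(x_k)
```

When $B_{k+1} \neq B_k$, the difference $\tilde{y}_k$ mixes the curvature along $s_k$ with the change in the objective itself, so it is no longer a clean curvature measurement for a single function.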

dbtsai · 2014-04-30T01:21:50Z

Makes sense from the inverse-Hessian point of view. Just remove it!

AmplabJenkins · 2014-04-30T01:22:57Z

Merged build triggered.

AmplabJenkins · 2014-04-30T01:23:03Z

Merged build started.

AmplabJenkins · 2014-04-30T02:01:43Z

Merged build finished. All automated tests passed.

AmplabJenkins · 2014-04-30T02:01:43Z

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14577/

mengxr · 2014-05-06T00:38:26Z

@dbtsai Could you please update the mllib-optimization.md and include an example of L-BFGS?

mengxr · 2014-05-09T00:48:19Z

LGTM. Thanks!

pwendell · 2014-05-09T00:54:47Z

Thanks, merged.

…ps, and remove miniBatch

Getting the lossHistory from Breeze's API which already excludes the rejection steps in line search. Also, remove the miniBatch in LBFGS since those quasi-Newton methods approximate the inverse of Hessian. It doesn't make sense if the gradients are computed from a varying objective.

Author: DB Tsai <dbtsai@alpinenow.com>

Closes #582 from dbtsai/dbtsai-lbfgs-bug and squashes the following commits:

9cc6cf9 [DB Tsai] Removed the miniBatch in LBFGS.
1ba6a33 [DB Tsai] Formatting the code.
d72c679 [DB Tsai] Using Breeze's states to get the loss.

(cherry picked from commit 910a13b)
Signed-off-by: Patrick Wendell <pwendell@gmail.com>



DB Tsai added 2 commits April 28, 2014 13:36

Using Breeze's states to get the loss.

d72c679

Formatting the code.

1ba6a33

Removed the miniBatch in LBFGS.

9cc6cf9

dbtsai changed the title ~~[SPARK-1157][MLlib] Bug fix: lossHistory should be monotonically decresing~~ [SPARK-1157][MLlib] Bug fix: lossHistory should exclude rejection steps, and remove miniBatch Apr 30, 2014

asfgit closed this in 910a13b May 9, 2014

dbtsai deleted the dbtsai-lbfgs-bug branch May 10, 2014 07:34

bzhaoopenstack pushed a commit to bzhaoopenstack/spark that referenced this pull request Sep 11, 2019

Try to fix spark-minikube-k8s job (apache#582)

9b44191

I have tried in all-in-one env, this is OK, lets try.
