[SPARK-1157][MLlib] Bug fix: lossHistory should exclude rejection steps, and remove miniBatch #582

Closed
dbtsai wants to merge 3 commits from dbtsai/dbtsai-lbfgs-bug

dbtsai (Member) commented Apr 28, 2014

This PR gets the lossHistory from Breeze's API, which already excludes the rejected steps in the line search. It also removes miniBatch from LBFGS: quasi-Newton methods approximate the inverse of the Hessian, which doesn't make sense if the gradients are computed from a varying objective.
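
To illustrate the first point, here is a minimal sketch in plain Python. It is not Spark or Breeze code: the function names, the 1-D quadratic, and the step-size constants are all made up for illustration. It runs gradient descent with a backtracking line search and keeps two logs: one appended on every objective evaluation, including trial steps the line search rejects, and one appended only when a step is accepted. Only the second is guaranteed to be non-increasing.

```python
def loss(w):
    # Hypothetical objective: a 1-D quadratic with its minimum at w = 3.
    return (w - 3.0) ** 2


def grad(w):
    return 2.0 * (w - 3.0)


def minimize(w0, iters=5, step0=4.0, shrink=0.5, c=1e-4):
    """Gradient descent with a backtracking (Armijo) line search."""
    w = w0
    every_eval = [loss(w)]  # log written on every objective evaluation
    accepted = [loss(w)]    # log written only when a step is accepted
    for _ in range(iters):
        g = grad(w)
        current = loss(w)
        step = step0
        while True:
            trial = w - step * g
            trial_loss = loss(trial)
            every_eval.append(trial_loss)
            # Accept the trial point only on sufficient decrease.
            if trial_loss <= current - c * step * g * g:
                break
            step *= shrink  # rejected: shrink the step and try again
        w = trial
        accepted.append(trial_loss)
    return w, every_eval, accepted


def is_non_increasing(values):
    return all(a >= b for a, b in zip(values, values[1:]))


w, every_eval, accepted = minimize(0.0)
print(is_non_increasing(accepted))    # accepted-only history
print(is_non_increasing(every_eval))  # history polluted by rejected trials
```

With these constants the first log jumps from 9 to 441 on a rejected trial step, which is the kind of entry lossHistory should not contain.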

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@AmplabJenkins

Merged build finished. All automated tests passed.

@AmplabJenkins

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14541/

mengxr (Contributor) commented Apr 29, 2014

I think it is good to remove miniBatchFraction from LBFGS's params in this PR, unless someone has a good understanding of the behavior of "stochastic" L-BFGS.

dbtsai (Member, Author) commented Apr 29, 2014

@mengxr I just hacked together an attempt at the right "stochastic" L-BFGS, and it kind of works as long as we don't change the objective function. But there is no good way to know which L-BFGS step we are in, which is needed to keep the objective function the same during the line search step, so I need to do some injection as David suggested. See dbtsai@0c699f2

What do you think now? Should we just remove miniBatchFraction, since RDD sampling is not even efficient?

mengxr (Contributor) commented Apr 30, 2014

I prefer removing miniBatchFraction. Quasi-Newton methods approximate the inverse of the Hessian, which doesn't make sense if the gradients are computed from a varying objective.
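
A small numeric sketch of why a varying objective is a problem, in plain Python with made-up data and names (this is not MLlib code). L-BFGS builds its inverse-Hessian approximation from pairs s = w_new - w_old and y = g_new - g_old. If both gradients come from the same objective, y/s reflects that objective's curvature. If they come from two different mini-batches, the pair can even violate the curvature condition s*y > 0 that the update relies on.

```python
def batch_grad(w, batch):
    # Gradient of the mean of 0.5 * (w*x - y)^2 over the batch (1-D least squares).
    return sum((w * x - y) * x for x, y in batch) / len(batch)


# Hypothetical data: two points, split into two one-point "mini-batches".
data = [(1.0, 1.0), (1.0, 5.0)]
batch_a, batch_b = data[:1], data[1:]

w_old, w_new = 0.0, 1.0
s = w_new - w_old

# Same objective at both points: y / s recovers the true curvature (1.0 here).
y_fixed = batch_grad(w_new, data) - batch_grad(w_old, data)

# Different mini-batch at each point: the gradient difference mixes two objectives.
y_mixed = batch_grad(w_new, batch_b) - batch_grad(w_old, batch_a)

print(y_fixed / s)  # curvature estimate from a fixed objective
print(s * y_mixed)  # negative: the curvature condition s*y > 0 fails
```

Both one-point objectives here are convex with curvature 1, yet the mixed pair suggests negative curvature, so an inverse-Hessian estimate built from it would be meaningless.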

dbtsai (Member, Author) commented Apr 30, 2014

Makes sense from the inverse-Hessian point of view. Let's just remove it!

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@dbtsai changed the title from "[SPARK-1157][MLlib] Bug fix: lossHistory should be monotonically decresing" to "[SPARK-1157][MLlib] Bug fix: lossHistory should exclude rejection steps, and remove miniBatch" on Apr 30, 2014

@AmplabJenkins

Merged build finished. All automated tests passed.

@AmplabJenkins

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14577/

mengxr (Contributor) commented May 6, 2014

@dbtsai Could you please update the mllib-optimization.md and include an example of L-BFGS?

mengxr (Contributor) commented May 9, 2014

LGTM. Thanks!

pwendell (Contributor) commented May 9, 2014

Thanks, merged.

asfgit pushed a commit that referenced this pull request May 9, 2014
…ps, and remove miniBatch

This PR gets the lossHistory from Breeze's API, which already excludes the rejected steps in the line search. It also removes miniBatch from LBFGS: quasi-Newton methods approximate the inverse of the Hessian, which doesn't make sense if the gradients are computed from a varying objective.

Author: DB Tsai <dbtsai@alpinenow.com>

Closes #582 from dbtsai/dbtsai-lbfgs-bug and squashes the following commits:

9cc6cf9 [DB Tsai] Removed the miniBatch in LBFGS.
1ba6a33 [DB Tsai] Formatting the code.
d72c679 [DB Tsai] Using Breeze's states to get the loss.
(cherry picked from commit 910a13b)

Signed-off-by: Patrick Wendell <pwendell@gmail.com>

@asfgit closed this in 910a13b on May 9, 2014

@dbtsai deleted the dbtsai-lbfgs-bug branch on May 10, 2014 07:34