[SPARK-1157][MLlib] Bug fix: lossHistory should exclude rejection steps, and remove miniBatch #582
Merged build triggered.
Merged build started.
Merged build finished. All automated tests passed.
I think it is good to remove
@mengxr I did a quick hack trying to implement a proper "stochastic" L-BFGS, and it more or less works as long as we don't change the objective function. But there is no good way to know which L-BFGS iteration we are in, which we would need in order to keep the objective function the same during the line search, so I would need some injection, as David suggested. See dbtsai@0c699f2. What do you think now? Should we just remove miniBatchFraction, since RDD sampling isn't even efficient?
I prefer removing miniBatchFraction. Quasi-Newton methods approximate the inverse of the Hessian, which doesn't make sense if the gradients are computed from a varying objective.
Makes sense from the inverse-Hessian point of view. Let's just remove it!
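To make the inverse-Hessian argument concrete, here is a minimal 1-D sketch in plain Python (not Spark or Breeze code). It assumes two hypothetical mini-batches, each a quadratic 0.5*a*(x - c)^2, and uses the 1-D secant ratio s/y from which BFGS-type updates build their inverse-Hessian estimate (in 1-D the secant condition H*y = s fixes H = s/y). With a fixed objective the ratio recovers the true inverse Hessian. When the two gradients come from different batches, y stops measuring curvature and the curvature condition s*y > 0 fails.

```python
# Hypothetical 1-D illustration: each "mini-batch" objective is 0.5*a*(x - c)**2.
def grad(a, c, x):
    """Gradient of 0.5 * a * (x - c)**2 with respect to x."""
    return a * (x - c)


BATCH_A = (2.0, 1.0)  # hypothetical batch A: a=2, c=1
BATCH_B = (2.0, 3.0)  # hypothetical batch B: a=2, c=3
# The average of the two batch objectives is 0.5*2*(x - 2)**2 plus a constant,
# so its second derivative is 2 and the true inverse Hessian is 0.5.
FULL = (2.0, 2.0)


def secant_inverse_hessian(x0, x1, g0, g1):
    """1-D secant estimate H ~= s / y with s = x1 - x0 and y = g1 - g0.

    Also returns s*y, which must be positive (curvature condition)
    for a BFGS-style update to stay positive definite.
    """
    s, y = x1 - x0, g1 - g0
    return s / y, s * y


x0, x1 = 0.0, 0.5

# Fixed objective: both gradients come from the full objective.
fixed, fixed_sy = secant_inverse_hessian(x0, x1, grad(*FULL, x0), grad(*FULL, x1))
# Varying objective: gradient at x0 from batch A, gradient at x1 from batch B.
mixed, mixed_sy = secant_inverse_hessian(x0, x1, grad(*BATCH_A, x0), grad(*BATCH_B, x1))

print(fixed, fixed_sy > 0)  # 0.5 True
print(mixed, mixed_sy > 0)  # negative estimate, False
```

The step s is identical in both cases; only the source of the gradients changes, and that alone turns a correct curvature estimate into a negative one.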
Merged build triggered.
Merged build started.
Merged build finished. All automated tests passed.
@dbtsai Could you please update the
LGTM. Thanks!
Thanks, merged.
…ps, and remove miniBatch
Getting the lossHistory from Breeze's API which already excludes the rejection steps in line search. Also, remove the miniBatch in LBFGS since those quasi-Newton methods approximate the inverse of Hessian. It doesn't make sense if the gradients are computed from a varying objective.
Author: DB Tsai <dbtsai@alpinenow.com>
Closes #582 from dbtsai/dbtsai-lbfgs-bug and squashes the following commits:
9cc6cf9 [DB Tsai] Removed the miniBatch in LBFGS.
1ba6a33 [DB Tsai] Formatting the code.
d72c679 [DB Tsai] Using Breeze's states to get the loss.
(cherry picked from commit 910a13b)
Signed-off-by: Patrick Wendell <pwendell@gmail.com>
The aim of the Json4s project is to provide a common API for Scala JSON libraries. It is Apache-licensed, easier for downstream distributions to package, and mostly API-compatible with lift-json. Furthermore, the Jackson-backed implementation parses faster than lift-json on all but the smallest inputs.
Author: William Benton <willb@redhat.com>
Closes apache#582 from willb/json4s and squashes the following commits:
7ca62c4 [William Benton] Replace lift-json with json4s-jackson.
Conflicts:
core/src/main/scala/org/apache/spark/deploy/master/ui/ApplicationPage.scala
core/src/main/scala/org/apache/spark/deploy/master/ui/IndexPage.scala
core/src/main/scala/org/apache/spark/ui/JettyUtils.scala
## Upstream SPARK-XXXXX ticket and PR link (if not applicable, explain)
- [[SPARK-26887][SQL][PYTHON][NS] Create datetime.date directly instead of creating datetime64 as intermediate data.](palantir@02f03b7) (merged Feb 18)
- [[SPARK-23836][PYTHON] Add support for StructType return in Scalar Pandas UDF](palantir@e8193ed) (merged Mar 7) <= Main Commit
- [[SPARK-27163][PYTHON] Cleanup and consolidate Pandas UDF functionality](palantir@8d69b8c) (merged Mar 21)
- [[SPARK-27240][PYTHON] Use pandas DataFrame for struct type argument in Scalar Pandas UDF.](palantir@adb3a01) (merged Mar 25)

## What changes were proposed in this pull request?
This is a series of 4 cherry-picks, all related to pandas_udfs. Each cherry-pick is an isolated code change, and all were applied without any conflicts or manual edits. Together they allow pandas_udfs to accept and return StructTypes. This is required by the FoundryML concept that all Stages can act on pandas or PySpark DataFrames interchangeably.
I have tried this in an all-in-one environment and it works; let's try it.
Get the lossHistory from Breeze's API, which already excludes the rejection steps in the line search. Also, remove the miniBatch in LBFGS: quasi-Newton methods approximate the inverse of the Hessian, which doesn't make sense if the gradients are computed from a varying objective.
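The lossHistory part of the fix can be illustrated with a small sketch. It is plain Python, not Breeze's L-BFGS: a gradient descent loop with a backtracking (Armijo) line search stands in for the line search Breeze runs inside each iteration. `eval_log` records every cost-function call, including trials the line search rejects (what recording the loss inside the cost function would capture), while `loss_history` records one value per accepted iterate, analogous to reading the loss from the optimizer's per-iteration states. The function name `minimize_with_backtracking`, the objective f(x) = 2x^2, and all parameter values are hypothetical.

```python
def minimize_with_backtracking(f, grad, x0, iters=3, step0=1.0, shrink=0.3, c1=1e-4):
    """Gradient descent with a backtracking (Armijo) line search (illustrative only).

    Returns (loss_history, eval_log):
      loss_history -- one loss per accepted iterate (rejected trials excluded)
      eval_log     -- every cost evaluation, including rejected line-search trials
    """
    eval_log = []

    def cost(x):
        value = f(x)
        eval_log.append(value)  # logging here also captures rejected trials
        return value

    x = x0
    fx = cost(x)
    loss_history = [fx]
    for _ in range(iters):
        g = grad(x)
        t = step0
        trial = x - t * g
        f_trial = cost(trial)
        # Shrink the step until the sufficient-decrease condition holds.
        while f_trial > fx - c1 * t * g * g and t > 1e-12:
            t *= shrink
            trial = x - t * g
            f_trial = cost(trial)
        x, fx = trial, f_trial   # accept the step
        loss_history.append(fx)  # record only accepted iterates
    return loss_history, eval_log


history, evals = minimize_with_backtracking(lambda x: 2 * x * x, lambda x: 4 * x, x0=1.0)
print(len(history), len(evals))  # 4 7
print(max(evals))                # 18.0, from a rejected trial step
```

Only `loss_history` decreases monotonically here; `eval_log` jumps to 18.0 on the first rejected trial, which is the kind of entry a lossHistory that excludes rejection steps should not contain.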