Revert back to Spark 2.3 #399

tovbinm · 2019-08-30T20:46:16Z

Related issues
We are not ready for Spark 2.4 (#327)

Describe the proposed solution
Reverting to Spark 2.3 for now.
I will raise another PR with the 2.4 upgrade so we can have it ready to go once needed.

Describe alternatives you've considered
N/A

codecov · 2019-08-30T21:23:07Z

Codecov Report

Merging #399 into master will decrease coverage by 12.02%.
The diff coverage is 92.85%.

@@             Coverage Diff             @@
##           master     #399       +/-   ##
===========================================
- Coverage   86.89%   74.87%   -12.03%     
===========================================
  Files         337      337               
  Lines       11076    11054       -22     
  Branches      351      590      +239     
===========================================
- Hits         9625     8277     -1348     
- Misses       1451     2777     +1326

Impacted Files	Coverage Δ
...sforce/op/stages/impl/selector/ModelSelector.scala	`98.18% <ø> (ø)`	⬆️
...sforce/op/stages/OpPipelineStageReaderWriter.scala	`86.2% <ø> (ø)`	⬆️
...sql/execution/datasources/csv/CSVSchemaUtils.scala	`100% <ø> (ø)`	⬆️
...la/com/salesforce/op/utils/spark/RichDataset.scala	`87.09% <ø> (+0.94%)`	⬆️
...m/salesforce/op/stages/OpPipelineStageReader.scala	`59.09% <0%> (+2.56%)`	⬆️
...ce/op/stages/impl/classification/OpLinearSVC.scala	`77.27% <100%> (ø)`	⬆️
...ges/sparkwrappers/specific/OpPredictionModel.scala	`100% <100%> (ø)`	⬆️
...ala/com/salesforce/op/utils/io/csv/CSVToAvro.scala	`87.87% <100%> (ø)`	⬆️
...ages/impl/regression/OpDecisionTreeRegressor.scala	`53.84% <100%> (+3.84%)`	⬆️
...rce/op/stages/impl/regression/OpGBTRegressor.scala	`50% <100%> (-3.34%)`	⬇️
... and 105 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 51037a8...2f25962.

gerashegalov

LGTM

Bug fixes:
- Ensure correct metrics despite model failures on some CV folds [#404](#404)
- Fix flaky `ModelInsight` tests [#395](#395)
- Avoid creating `SparseVector`s for LOCO [#377](#377)

New features / updates:
- Model combiner [#385](#399)
- Added new sample for HousingPrices [#365](#365)
- Test to verify that custom metrics appear in model insight metrics [#387](#387)
- Add `FeatureDistribution` to `SerializationFormat`s [#383](#383)
- Add metadata to `OpStandadrdScaler` to allow for descaling [#378](#378)
- Improve json serde error in `evalMetFromJson` [#380](#380)
- Track mean & standard deviation as metrics for numeric features and for text length of text features [#354](#354)
- Making model selectors robust to failing models [#372](#372)
- Use compact and compressed model json by default [#375](#375)
- Descale feature contribution for Linear Regression & Logistic Regression [#345](#345)

Dependency updates:
- Update tika version [#382](#382)

koertkuipers · 2019-09-15T20:37:09Z

Curious to know why we are not ready for Spark 2.4? I didn't observe any issues.

tovbinm · 2019-09-16T17:28:02Z

The main suite of products that use TransmogrifAI @ Salesforce requires Spark 2.3. Once they are ready to upgrade, we will move to 2.4.

* Revert "Revert back to Spark 2.3 (#399)" This reverts commit 95a77b1.
* Update to Spark 2.4.3 and XGBoost 0.90
* special double serializer fix
* fix serialization
* fix serialization
* docs
* fixed missng value for test
* meta fix
* Updated DecisionTreeNumericMapBucketizer test to deal with the change made to decision tree pruning in Spark 2.4. If nodes are split, but both child nodes lead to the same prediction then the split is pruned away. This updates the test so this doesn't happen for feature 'b'
* fix params meta test
* FIxed failing xgboost test
* ident
* cleanup
* added dataframe reader and writer extensions
* added const
* cherrypick fixes
* added xgboost params + update models to use public predict method
* blarg
* double ser test
* update mleap and spark testing base
* Update README.md
* type fix
* bump minor version
* Update Spark version in the README
* bump version
* Update build.gradle
* Update pom.xml
* set correct json4s version
* upgrade helloworld deps
* upgrade notebook deps on TMog and Spark
* bump to version 0.7.0 for Spark update
* align helloworld dependencies
* align helloworld dependencies
* get -> getOrElse with exception
* fix helloworld compilation
* Spark 2.4.5
* Spark 2.4.5
* Spark 2.4.5
* Update OpTitanicSimple.ipynb
* Update OpIris.ipynb
* Revert "Spark 2.4.5" This reverts commit b3c0a74.
* Revert "Spark 2.4.5" This reverts commit f4ab3fd.
* Revert "Spark 2.4.5" This reverts commit 50d9dfb.
* Revert "Update OpTitanicSimple.ipynb" This reverts commit 3417972.
* Revert "Update OpIris.ipynb" This reverts commit df38bcc.

Co-authored-by: Christopher Suchanek <cris.suchanek@gmail.com>
Co-authored-by: Kevin Moore <jauntbox@gmail.com>
Co-authored-by: Nico de Vos <njdevos@gmail.com>

* Revert "Revert back to Spark 2.3 (#399)" This reverts commit 95a77b1.
* Update to Spark 2.4.3 and XGBoost 0.90
* special double serializer fix
* fix serialization
* fix serialization
* docs
* fixed missng value for test
* meta fix
* Updated DecisionTreeNumericMapBucketizer test to deal with the change made to decision tree pruning in Spark 2.4. If nodes are split, but both child nodes lead to the same prediction then the split is pruned away. This updates the test so this doesn't happen for feature 'b'
* fix params meta test
* FIxed failing xgboost test
* ident
* cleanup
* added dataframe reader and writer extensions
* added const
* cherrypick fixes
* added xgboost params + update models to use public predict method
* blarg
* double ser test
* update mleap and spark testing base
* Update README.md
* type fix
* bump minor version
* Update Spark version in the README
* bump version
* Update build.gradle
* Update pom.xml
* set correct json4s version
* upgrade helloworld deps
* upgrade notebook deps on TMog and Spark
* bump to version 0.7.0 for Spark update
* align helloworld dependencies
* align helloworld dependencies
* get -> getOrElse with exception
* fix helloworld compilation
* style
* WIP release notes
* TMog version bump
* update release notes
* update release notes
* updates to changelog
* updates to changelog
* updates to changelog
* updates to changelog
* updates to changelog
* updates to changelog
* fix changelog
* fix changelog
* keep helloworld on 0.6.1 until release

Co-authored-by: Matthew Tovbin <tovbinm@users.noreply.github.com>
Co-authored-by: Matthew Tovbin <mtovbin@salesforce.com>
Co-authored-by: Christopher Suchanek <cris.suchanek@gmail.com>
Co-authored-by: Kevin Moore <kevinmoore@salesforce.com>
Co-authored-by: Matthew Tovbin <tovbinm@gmail.com>

salesforce-cla · 2020-10-27T07:58:10Z

Thanks for the contribution! Before we can merge this, we need @wsuchy to sign the Salesforce.com Contributor License Agreement.

salesforce-cla · 2020-10-27T07:58:10Z

Thanks for the contribution! It looks like @Jauntbox is an internal user so signing the CLA is not required. However, we need to confirm this.

tovbinm and others added 30 commits May 30, 2019 13:48

Update to Spark 2.4.3 and XGBoost 0.90

f6264a7

special double serializer fix

685d6e1

fix serialization

e62772d

fix serialization

69247ac

docs

330bf50

fixed missng value for test

d6b0723

meta fix

63b77b5

Merge branch 'mt/spark-2.4' of github.com:salesforce/TransmogrifAI into mt/spark-2.4

4e46e31

Updated DecisionTreeNumericMapBucketizer test to deal with the change made to decision tree pruning in Spark 2.4. If nodes are split, but both child nodes lead to the same prediction then the split is pruned away. This updates the test so this doesn't happen for feature 'b'

5a528e1
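
The pruning behaviour this commit message describes can be illustrated with a minimal sketch. This is plain Python and not Spark's own code: the `Leaf`, `Split`, `prune`, and `split_features` names, the thresholds, and the toy tree are all hypothetical, chosen only to show why a test that expects a split on feature 'b' stops passing once equal-prediction siblings are merged.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Leaf:
    prediction: float


@dataclass(frozen=True)
class Split:
    feature: str      # hypothetical feature name, e.g. 'a' or 'b'
    threshold: float
    left: "Node"
    right: "Node"


Node = Union[Leaf, Split]


def prune(node: Node) -> Node:
    """Bottom-up pass: replace a split by a single leaf when both of its
    (already pruned) children are leaves with the same prediction."""
    if isinstance(node, Leaf):
        return node
    left, right = prune(node.left), prune(node.right)
    if (isinstance(left, Leaf) and isinstance(right, Leaf)
            and left.prediction == right.prediction):
        return Leaf(left.prediction)
    return Split(node.feature, node.threshold, left, right)


def split_features(node: Node) -> set:
    """Collect the names of all features the tree still splits on."""
    if isinstance(node, Leaf):
        return set()
    return {node.feature} | split_features(node.left) | split_features(node.right)


# Toy tree (assumed data): the split on 'b' separates two leaves that
# both predict 1.0, so it carries no information for prediction.
tree = Split("a", 0.5,
             Split("b", 1.0, Leaf(1.0), Leaf(1.0)),
             Leaf(0.0))
pruned = prune(tree)
```

In this sketch the split on 'b' is present in `tree` but absent from `pruned`, so any assertion that relies on a split point for 'b' needs test data in which the two children predict different values.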

Merge branch 'mt/spark-2.4' of github.com:salesforce/TransmogrifAI into mt/spark-2.4

5f39603

fix params meta test

0d1a0c0

FIxed failing xgboost test

0a4f906

Merge branch 'mt/spark-2.4' of github.com:salesforce/TransmogrifAI into mt/spark-2.4

660db62

ident

3ecca64

cleanup

507503a

added dataframe reader and writer extensions

348a392

added const

f43cb26

Merge branch 'master' into mt/spark-2.4

4455034

Merge branch 'master' into mt/spark-2.4

a0978bf

Merge branch 'master' of github.com:salesforce/TransmogrifAI into mt/spark-2.4

b27b47a

added xgboost params + update models to use public predict method

6535e4e

blarg

d1d7b9a

double ser test

ac75e15

Merge branch 'master' into mt/spark-2.4

bf056e5

update mleap and spark testing base

ea84a02

Merge branch 'master' into mt/spark-2.4

3e65e4a

Merge branch 'master' into mt/spark-2.4

0bb829d

Merge branch 'master' into mt/spark-2.4

8cfa694

Merge branch 'master' into mt/spark-2.4

fd440b7

Merge branch 'master' into mt/spark-2.4

5936981

tovbinm and others added 3 commits July 16, 2019 04:56

Update README.md

0aa80f9

Merge branch 'master' of github.com:salesforce/TransmogrifAI into mt/spark-2.4

8c8ff88

Revert "Update to Spark 2.4.3 + XGBoost 0.90 + MLeap 0.14 (#327)"

cf2ea05

This reverts commit 3e02bf7.

tovbinm requested review from gerashegalov, Jauntbox, leahmcguire and wsuchy as code owners August 30, 2019 20:46

tovbinm added the ready for review label Aug 30, 2019

added is empty

17a53a5

Merge branch 'master' into mt/revert-spark-2.4

1ad7b1b

gerashegalov approved these changes Sep 3, 2019

Merge branch 'master' into mt/revert-spark-2.4

2f25962

leahmcguire approved these changes Sep 3, 2019

tovbinm merged commit 95a77b1 into master Sep 4, 2019

tovbinm deleted the mt/revert-spark-2.4 branch September 4, 2019 05:10

tovbinm restored the mt/revert-spark-2.4 branch September 4, 2019 05:10

tovbinm added a commit that referenced this pull request Sep 4, 2019

Revert "Revert back to Spark 2.3 (#399)"

77e2229

This reverts commit 95a77b1.

gerashegalov mentioned this pull request Sep 8, 2019

0.6.1 release #403

Merged

tovbinm deleted the mt/revert-spark-2.4 branch September 16, 2019 17:28

salesforce-cla bot added the cla:signed label Feb 8, 2020

salesforce-cla bot removed the cla:signed label Oct 27, 2020

salesforce-cla bot added the cla:missing label Oct 27, 2020
