-
Notifications
You must be signed in to change notification settings - Fork 398
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Spark 2.4 support #402
Spark 2.4 support #402
Conversation
This reverts commit 95a77b1.
Codecov Report
@@ Coverage Diff @@
## master #402 +/- ##
==========================================
- Coverage 86.9% 81.59% -5.32%
==========================================
Files 340 341 +1
Lines 11450 11507 +57
Branches 364 374 +10
==========================================
- Hits 9951 9389 -562
- Misses 1499 2118 +619
Continue to review full report at Codecov.
|
… made to decision tree pruning in Spark 2.4. If nodes are split, but both child nodes lead to the same prediction then the split is pruned away. This updates the test so this doesn't happen for feature 'b'
@@ -128,7 +128,7 @@ Start by picking TransmogrifAI version to match your project dependencies from t | |||
|
|||
| TransmogrifAI Version | Spark Version | Scala Version | Java Version | | |||
|-------------------------------------------------------|:-------------:|:-------------:|:------------:| | |||
| 0.6.2 (unreleased, master) | 2.3 | 2.11 | 1.8 | | |||
| 0.7.0 (unreleased, master) | 2.4 | 2.11 | 1.8 | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO for a future PR: I really hope we start making strides to automating this stuff. We should have here placeholders that are populated from build. Even if it's some quick and dirty version of sed
script
@@ -69,20 +69,16 @@ task copyTemplates(type: Copy) { | |||
fileName.replace(".gradle.template", ".gradle") | |||
} | |||
expand([ | |||
databaseHostname: 'db.company.com', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we absolutely need to change this file ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this was a cleanup, why not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because the PR is big enough, and for easier maintenance I hoped to keep it focussed on the Spark upgrade
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, this has no material difference for the change. You always get a bonus of such with my PRs 😛
@@ -138,7 +138,7 @@ case class AutomaticSchema(recordClassName: String)(dataFile: File) extends Sche | |||
case Some(actualType) => | |||
val newSchema = Schema.create(actualType) | |||
val schemaField = | |||
new Schema.Field(field.name, newSchema, "auto-generated", orgSchemaField.defaultValue) | |||
new Schema.Field(field.name, newSchema, "auto-generated", orgSchemaField.defaultVal()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need this change? defaultValue
is deprecated but still there, technically not required to change until it's dropped entirely
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am keeping the existing functionality as is. We can change the behavior separately.
core/src/main/scala/com/salesforce/op/stages/impl/classification/OpLinearSVC.scala
Show resolved
Hide resolved
core/src/main/scala/com/salesforce/op/stages/impl/classification/OpXGBoostClassifier.scala
Show resolved
Hide resolved
core/src/test/scala/com/salesforce/op/stages/impl/classification/OpClassifierModelTest.scala
Show resolved
Hide resolved
@gerashegalov thanks for reviewing. I hope we can get this merged and released soon. Especially because I don't even see an option now to download Spark lower than 2.4 from the official website, so our users might struggle to run current version of TransmogrifAI. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGETM
@nicodv feel free to address the comments that make sense now (or not) and merge. |
@tovbinm Do you have an ETA on merging this? I'm using Spark 2.4, but would love to get started doing a proposed POC with TransmogrifAI so am waiting excitedly! |
I bumped Spark to 2.4.5 (it was just recently released). Should we bump other dependencies as well, e.g. mleap? |
@tovbinm , could you please revert the commits related to Spark 2.4.5? We're planning to do an internal release of TMog on 2.4.4 soon, and don't want to redo any testing / take on additional risk (even if it's just a patch release). We can do a separate upgrade of TMog to Spark 2.4.5 soon after that in a separate PR. |
@nicodv ok |
Woohooo! |
Made my day! |
Thanks for the contribution! It looks like @Jauntbox is an internal user so signing the CLA is not required. However, we need to confirm this. |
Thanks for the contribution! Before we can merge this, we need @wsuchy to sign the Salesforce.com Contributor License Agreement. |
Related issues
We are currently on Spark 2.3, but 2.4 is out
Describe the proposed solution
Upgrade to Spark 2.4
Describe alternatives you've considered
N/A
Additional Context
This branch to be parked here until we are ready to upgrade to Spark 2.4