-
Notifications
You must be signed in to change notification settings - Fork 397
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Apply DateToUnitCircleTransformer logic in raw feature transformations. #130
Conversation
Thanks for the contribution! Unfortunately we can't verify the commit author(s): marcovivero <m***@s***.com>. One possible solution is to add that email to your GitHub account. Alternatively you can change your commits to another email and force push the change. After getting your commits associated with your GitHub account, refresh the status of this Pull Request. |
Codecov Report
@@ Coverage Diff @@
## master #130 +/- ##
=========================================
Coverage ? 37.87%
=========================================
Files ? 298
Lines ? 9692
Branches ? 552
=========================================
Hits ? 3671
Misses ? 6021
Partials ? 0
Continue to review full report at Codecov.
|
@@ -30,9 +30,14 @@ | |||
|
|||
package com.salesforce.op.filters | |||
|
|||
|
|||
import java.time.{Instant, OffsetDateTime, ZoneOffset} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please do not use java.time
. instead use joda time
@@ -183,4 +200,38 @@ private[filters] object PreparedFeatures { | |||
* @return array of string tokens | |||
*/ | |||
private def tokenize(s: String) = TextTokenizer.Analyzer.analyze(s, Language.Unknown) | |||
|
|||
private def prepareDateValue(timestamp: Long, timePeriod: Option[TimePeriod]): Double = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we please make DateToUnitCircle.convertToRandians
reuable instead? this is essentially the same code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@@ -521,12 +522,15 @@ class OpWorkflow(val uid: String = UID[OpWorkflow]) extends OpWorkflowCore { | |||
maxCorrelation: Double = 0.95, | |||
correlationType: CorrelationType = CorrelationType.Pearson, | |||
protectedFeatures: Array[OPFeature] = Array.empty, | |||
textBinsFormula: (Summary, Int) => Int = RawFeatureFilter.textBinsFormula | |||
protectedJSFeatures: Array[OPFeature] = Array.empty, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- how is
protectedJSFeatures
different fromprotectedFeatures
? - please add docs for
protectedJSFeatures
param
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was already enabled in RawFeatureFilter
, it protects features from JS divergence check. I didn't see it OpWorkflow.withRawFeatureFilter
, just making sure it's here as well.
row: Row, | ||
responses: Array[TransientFeature], | ||
predictors: Array[TransientFeature], | ||
timePeriod: Option[TimePeriod]): PreparedFeatures = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
docs for timePeriod
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
private def prepareFeature[T <: FeatureType]( | ||
name: String, | ||
value: T, | ||
timePeriod: Option[TimePeriod]): Map[FeatureKey, ProcessedSeq] = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
docs for timePeriod
again
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
value match { | ||
case v: Text => v.value | ||
.map(s => Map[FeatureKey, ProcessedSeq]((name, None) -> Left(tokenize(s)))).getOrElse(Map.empty) | ||
case v: Date => v.value.map { timestamp => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder why Text, Date and OPNumeric are handled differently than the other values which require some value to be present case ft@SomeValue(_)
below? @leahmcguire @marcovivero
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this is because all of the values inside case ft@SomeValue(_)
already have ft.value
as sequences, we'll end up needing to do the same operation to unwrap the Option[Double]
.
@@ -93,7 +94,8 @@ class RawFeatureFilter[T] | |||
val correlationType: CorrelationType = CorrelationType.Pearson, | |||
val jsDivergenceProtectedFeatures: Set[String] = Set.empty, | |||
val protectedFeatures: Set[String] = Set.empty, | |||
val textBinsFormula: (Summary, Int) => Int = RawFeatureFilter.textBinsFormula | |||
val textBinsFormula: (Summary, Int) => Int = RawFeatureFilter.textBinsFormula, | |||
val timePeriod: Option[TimePeriod] = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
docs for timePeriod
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also, please add timePeriod
to OpWorkflow.withRawFeatureFilter
as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@@ -157,6 +162,68 @@ class PreparedFeaturesTest extends FlatSpec with TestSparkContext { | |||
testCorrMatrix(allResponseKeys2, CorrelationType.Spearman, expected) | |||
} | |||
|
|||
it should "correctly transform date features when time period is specified" in { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this test has to be split into separate cases or be a for loop not a copy/paste
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done, there's a single function for running each check.
…I-1 into mv/datetime-rff
…into mv/datetime-rff
@@ -157,6 +162,66 @@ class PreparedFeaturesTest extends FlatSpec with TestSparkContext { | |||
testCorrMatrix(allResponseKeys2, CorrelationType.Spearman, expected) | |||
} | |||
|
|||
it should "correctly transform date features when time period DayOfMonth is specified" in { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shorten test names to maybe: transform dates with DayOfMonth time period
|
||
def runDateToUnitCircleTest( | ||
period: TimePeriod, | ||
expected1: Double, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can make expected: Double*
or expected: Seq[Double]
@@ -100,23 +100,31 @@ class DateToUnitCircleTransformer[T <: Date] | |||
} | |||
} | |||
|
|||
private[op] object DateToUnitCircle { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why remove private[op]
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh woops, I needed to use this with repl
at some point
case SomeValue(v: DenseVector) => Map((name, None) -> Right(v.toArray.toSeq)) | ||
case SomeValue(v: SparseVector) => Map((name, None) -> Right(v.indices.map(_.toDouble).toSeq)) | ||
case ft@SomeValue(_) => ft match { | ||
case v: Text => Map((name, None) -> Left(v.value.toSeq.flatMap(tokenize))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thank you! @marcovivero
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm!
I added |
Related issues
There is currently no related issue open for this, this is an enhancement.
Describe the proposed solution
Add the option to apply circular date time transformer in
RawFeatureFilter
to anyDate
,DateTime
,DateList
, orDateMap
feature type. This is the same logic utilized forDateToUnitCircleTransformer
:https://github.com/salesforce/TransmogrifAI/blob/master/core/src/main/scala/com/salesforce/op/stages/impl/feature/DateToUnitCircleTransformer.scala
This is currently turned off by default and should be up to user to turn on in their respective workflows.
Describe alternatives you've considered
We currently use the standard approach to processing
Numeric
types, i.e. just mapping these to doubles. This may lead to issues when comparing different distributions over time.Additional context
Add any other context about the changes here.