Track mean & standard deviation of text length as a metric for text feature #354

Merged: 78 commits from tn/cardinality into master on Aug 2, 2019 (diff below shows changes from 9 commits)

Commits (78):
d504ace
starter code
TuanNguyen27 Jul 2, 2019
9cd2790
spaghetti code
TuanNguyen27 Jul 2, 2019
61c52c7
better place to put avgTextLen
TuanNguyen27 Jul 3, 2019
bf1c283
first fix of unit test
TuanNguyen27 Jul 3, 2019
d64d087
fix most tests
TuanNguyen27 Jul 3, 2019
3f5da2c
fix some styles
TuanNguyen27 Jul 3, 2019
6bb0256
fix more style
TuanNguyen27 Jul 3, 2019
0f91a40
Merge branch 'master' into tn/cardinality
TuanNguyen27 Jul 3, 2019
cebf02d
handling division by zero
TuanNguyen27 Jul 3, 2019
c1e50ca
address comments
TuanNguyen27 Jul 5, 2019
75da25e
adding some doc on how to use text len cardinality
TuanNguyen27 Jul 8, 2019
2d7c233
Merge branch 'master' into tn/cardinality
TuanNguyen27 Jul 8, 2019
9e0e2f9
add default value for avg text len
TuanNguyen27 Jul 8, 2019
47ad700
add docs
TuanNguyen27 Jul 8, 2019
42a47ba
fix scala style
TuanNguyen27 Jul 8, 2019
9082b86
delete extra line
TuanNguyen27 Jul 8, 2019
1c2235e
Merge branch 'master' into tn/cardinality
TuanNguyen27 Jul 9, 2019
387f0ea
remove avgtextLength from doc
TuanNguyen27 Jul 9, 2019
fd64fd8
Merge branch 'tn/cardinality' of https://github.com/salesforce/Transm…
TuanNguyen27 Jul 9, 2019
0b27fed
starter code on moments & textstat
TuanNguyen27 Jul 10, 2019
3d1eea9
fix moments aggregation?
TuanNguyen27 Jul 10, 2019
af3d467
Merge branch 'master' into tn/cardinality
TuanNguyen27 Jul 11, 2019
34da327
still broken
TuanNguyen27 Jul 11, 2019
37ff005
Merge branch 'tn/cardinality' of https://github.com/salesforce/Transm…
TuanNguyen27 Jul 11, 2019
7a69049
Merge branch 'master' into tn/cardinality
leahmcguire Jul 11, 2019
0a0ba98
Merge branch 'master' into tn/cardinality
TuanNguyen27 Jul 12, 2019
35b6fe9
Merge branch 'master' into tn/cardinality
TuanNguyen27 Jul 12, 2019
06ae7e3
finsh semi group adding logic
TuanNguyen27 Jul 15, 2019
88c7726
Merge branch 'master' into tn/cardinality
TuanNguyen27 Jul 18, 2019
273fc8c
removing the old implementation
TuanNguyen27 Jul 19, 2019
6baf8a0
removing redundant code
TuanNguyen27 Jul 22, 2019
d2ad2ec
remove redundant changes to tests
TuanNguyen27 Jul 22, 2019
5c67bd2
remove more extra stuff
TuanNguyen27 Jul 22, 2019
4127a87
bump default value for maxCard here
TuanNguyen27 Jul 22, 2019
c6dff11
make cardinality and moments work across both text and numeric features
TuanNguyen27 Jul 23, 2019
0728855
rename variables
TuanNguyen27 Jul 23, 2019
797d5e6
wip
TuanNguyen27 Jul 24, 2019
3366362
wip
TuanNguyen27 Jul 24, 2019
2b09292
FeatureDistribution update
TuanNguyen27 Jul 24, 2019
f634049
wip
TuanNguyen27 Jul 25, 2019
f5c68a3
Merge branch 'master' into tn/cardinality
TuanNguyen27 Jul 25, 2019
6e22db9
moving cardinality and moments calculation into feature distribution
TuanNguyen27 Jul 25, 2019
34af0eb
Merge branch 'tn/cardinality' of https://github.com/salesforce/Transm…
TuanNguyen27 Jul 25, 2019
0eb5f8d
wip
TuanNguyen27 Jul 25, 2019
10af178
update some compiler error
TuanNguyen27 Jul 25, 2019
0b902a0
update test
TuanNguyen27 Jul 26, 2019
520ac05
fix scala style
TuanNguyen27 Jul 26, 2019
1ea54e5
more fix
TuanNguyen27 Jul 26, 2019
c62d2db
wip
TuanNguyen27 Jul 26, 2019
52c0fc3
fix style error
TuanNguyen27 Jul 26, 2019
315a21a
update test to reflect new members of FeatureDistribution case class
TuanNguyen27 Jul 26, 2019
3c8f196
update printing
TuanNguyen27 Jul 26, 2019
be94fe3
added some tests
TuanNguyen27 Jul 27, 2019
bd4ef29
Update FeatureDistributionTest.scala
TuanNguyen27 Jul 28, 2019
d24d863
fix scala style
TuanNguyen27 Jul 28, 2019
9135987
add docs
TuanNguyen27 Jul 29, 2019
48a742f
move maxCard param to companion object
TuanNguyen27 Jul 29, 2019
3ab4f91
scala style fix
TuanNguyen27 Jul 29, 2019
2825443
fix string conversion
TuanNguyen27 Jul 29, 2019
3973b72
move MaxCardinality to RFF companion object
TuanNguyen27 Jul 29, 2019
5a99fc0
Try fixing test
TuanNguyen27 Jul 29, 2019
9f12d0b
Merge branch 'master' into tn/cardinality
TuanNguyen27 Jul 30, 2019
8f454d1
wip, need to make a bigger dummy dataset
TuanNguyen27 Jul 30, 2019
2fd2c68
Merge branch 'tn/cardinality' of https://github.com/salesforce/Transm…
TuanNguyen27 Jul 30, 2019
de826d5
more idiomatic scala
TuanNguyen27 Jul 30, 2019
8811be5
test Tuple2Semigroup
TuanNguyen27 Jul 30, 2019
6678746
wip
TuanNguyen27 Jul 30, 2019
4a569dc
wip
TuanNguyen27 Jul 30, 2019
d1d7dc0
wip
TuanNguyen27 Jul 30, 2019
29d1925
update test
TuanNguyen27 Jul 31, 2019
d72004f
fix scala style
TuanNguyen27 Jul 31, 2019
6802558
removing verbose lines
TuanNguyen27 Jul 31, 2019
88c082f
clean up test for cardinality and moments
TuanNguyen27 Aug 1, 2019
5b71fca
fix scala style
TuanNguyen27 Aug 1, 2019
85d207f
clean up summation of Option[Moments]
TuanNguyen27 Aug 1, 2019
c9bdc27
Changed the TextStats SemiGroup to a Monoid so that we can make an Op…
Jauntbox Aug 2, 2019
d3c36f2
Fix merge conflict
Jauntbox Aug 2, 2019
4736a9c
Merge branch 'master' into tn/cardinality
TuanNguyen27 Aug 2, 2019
6 changes: 5 additions & 1 deletion core/src/main/scala/com/salesforce/op/OpWorkflow.scala
@@ -524,12 +524,14 @@ class OpWorkflow(val uid: String = UID[OpWorkflow]) extends OpWorkflowCore {
def withRawFeatureFilter[T](
trainingReader: Option[Reader[T]],
scoringReader: Option[Reader[T]],
bins: Int = 100,
bins: Int = 500,
minFillRate: Double = 0.001,
maxFillDifference: Double = 0.90,
maxFillRatioDiff: Double = 20.0,
maxJSDivergence: Double = 0.90,
maxCorrelation: Double = 0.95,
pvalCutoff: Double = 0.05,
minTextLen: Double = 100,
correlationType: CorrelationType = CorrelationType.Pearson,
protectedFeatures: Array[OPFeature] = Array.empty,
protectedJSFeatures: Array[OPFeature] = Array.empty,
@@ -552,6 +554,8 @@ class OpWorkflow(val uid: String = UID[OpWorkflow]) extends OpWorkflowCore {
maxFillRatioDiff = maxFillRatioDiff,
maxJSDivergence = maxJSDivergence,
maxCorrelation = maxCorrelation,
pvalCutoff = pvalCutoff,
minTextLen = minTextLen,
correlationType = correlationType,
protectedFeatures = protectedRawFeatures,
jsDivergenceProtectedFeatures = protectedRawJSFeatures,
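A minimal usage sketch of the two new knobs (trainReader, scoreReader, and prediction are hypothetical names defined elsewhere in a user's workflow; all other thresholds keep their defaults):

// Hypothetical caller-side sketch, not part of this diff.
val workflow = new OpWorkflow()
  .setResultFeatures(prediction)
  .withRawFeatureFilter(
    trainingReader = Option(trainReader),
    scoringReader = Option(scoreReader),
    bins = 500,         // histogram resolution fed into the chi-squared uniformity test
    pvalCutoff = 0.05,  // p-value cutoff for the goodness-of-fit test
    minTextLen = 100    // text features averaging fewer characters become drop candidates
  )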
AllFeatureInformation.scala
@@ -42,6 +42,7 @@ package com.salesforce.op.filters
* 1st level keys correspond to response keys
* 2nd level keys correspond to predictor keys with values being
* null-label leakage corr. value
* @param avgTextLen average length of text features
*/
private[op] case class AllFeatureInformation
(
FeatureDistribution.scala
@@ -38,7 +38,9 @@ import com.salesforce.op.utils.json.EnumEntrySerializer
import com.twitter.algebird.Monoid._
import com.twitter.algebird.Operators._
import com.twitter.algebird.Semigroup
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.feature.HashingTF
import org.apache.spark.mllib.stat.Statistics
import org.json4s.jackson.Serialization
import org.json4s.{DefaultFormats, Formats}

@@ -51,6 +53,7 @@ import scala.util.Try
* @param key map key associated with distribution (when the feature is a map)
* @param count total count of feature seen
* @param nulls number of empties seen in feature
* @param avgTextLen average length of the text (only applicable to text features)
* @param distribution binned counts of feature values (hashed for strings, evenly spaced bins for numerics)
* @param summaryInfo either min and max number of tokens for text data, or splits used for bins for numeric data
* @param `type` feature distribution type: training or scoring
@@ -61,6 +64,7 @@ case class FeatureDistribution
key: Option[String],
count: Long,
nulls: Long,
avgTextLen: Double,
distribution: Array[Double],
summaryInfo: Array[Double],
`type`: FeatureDistributionType = FeatureDistributionType.Training
@@ -91,6 +95,23 @@ case class FeatureDistribution
*/
def fillRate(): Double = if (count == 0L) 0.0 else (count - nulls) / count.toDouble

/**
* Test whether the given distribution is uniform, for detecting uninformative text hashes
*
* @param cutoff p-value cutoff for the chi-squared goodness-of-fit test
* @return true means there is not enough evidence to reject the null hypothesis (the current
*         distribution is uniform), so the feature is likely to be dropped, pending the average
*         text length check. Note that if the hash space is too small relative to the number of
*         distinct values, a meaningful text feature could still appear uniformly distributed.
*         false means the null hypothesis is rejected: the hashed feature does not follow a
*         uniform distribution, but it could still be uninformative
*/
def chiSqUnifTest(cutoff: Double): Boolean = {
// Pearson's chi-squared goodness-of-fit test against the uniform distribution
val vectorizedDistr = Vectors.dense(distribution)
val goodnessOfFitTestResult = Statistics.chiSqTest(vectorizedDistr)
goodnessOfFitTestResult.pValue >= cutoff
}
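// Illustration (assumed toy values, not part of this class): a perfectly flat histogram
// fails to reject uniformity at cutoff 0.05, while a heavily skewed one rejects it:
//   val flat = FeatureDistribution("text", None, 100L, 0L, 12.0, Array(25.0, 25.0, 25.0, 25.0), Array.empty)
//   val skewed = flat.copy(distribution = Array(97.0, 1.0, 1.0, 1.0))
//   flat.chiSqUnifTest(cutoff = 0.05)   // true: indistinguishable from uniform, drop candidate
//   skewed.chiSqUnifTest(cutoff = 0.05) // false: clearly non-uniform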

/**
* Combine feature distributions
*
@@ -100,9 +121,11 @@ case class FeatureDistribution
def reduce(fd: FeatureDistribution): FeatureDistribution = {
checkMatch(fd)
val combinedDist = distribution + fd.distribution
// weighted average of per-row text lengths, guarding against division by zero
val combinedCount = count + fd.count
val combinedAvgTextLen = if (combinedCount > 0) (avgTextLen * count + fd.avgTextLen * fd.count) / combinedCount else 0.0
// summary info can be empty or min max if hist is empty but should otherwise match so take the longest info
val combinedSummary = if (summaryInfo.length > fd.summaryInfo.length) summaryInfo else fd.summaryInfo
FeatureDistribution(name, key, count + fd.count, nulls + fd.nulls, combinedDist, combinedSummary, `type`)
FeatureDistribution(name, key, count + fd.count, nulls + fd.nulls,
combinedAvgTextLen, combinedDist, combinedSummary, `type`)
}
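// Worked example of the weighted combine (assumed toy values): 10 rows averaging 6 chars
// merged with 30 rows averaging 10 chars gives (6 * 10 + 10 * 30) / 40 = 9.0:
//   val a = FeatureDistribution("text", None, 10L, 0L, 6.0, Array(1.0, 2.0), Array.empty)
//   val b = FeatureDistribution("text", None, 30L, 0L, 10.0, Array(3.0, 4.0), Array.empty)
//   a.reduce(b) // count = 40, avgTextLen = 9.0, distribution = [4.0, 6.0]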

/**
@@ -154,6 +177,7 @@ case class FeatureDistribution
"key" -> key,
"count" -> count.toString,
"nulls" -> nulls.toString,
"avgTextLen" -> avgTextLen.toString,
"distribution" -> distribution.mkString("[", ",", "]"),
"summaryInfo" -> summaryInfo.mkString("[", ",", "]")
).map { case (n, v) => s"$n = $v" }.mkString(", ")
@@ -162,7 +186,7 @@ case class FeatureDistribution
}

override def equals(that: Any): Boolean = that match {
case FeatureDistribution(`name`, `key`, `count`, `nulls`, d, s, `type`) =>
case FeatureDistribution(`name`, `key`, `count`, `nulls`, `avgTextLen`, d, s, `type`) =>
distribution.deep == d.deep && summaryInfo.deep == s.deep
case _ => false
}
@@ -224,12 +248,16 @@ object FeatureDistribution {
val (nullCount, (summaryInfo, distribution)) =
value.map(seq => 0L -> histValues(seq, summary, bins, textBinsFormula))
.getOrElse(1L -> (Array(summary.min, summary.max, summary.sum, summary.count) -> new Array[Double](bins)))

val avgTextLen = value match {
// average character length across the text values (toDouble avoids integer division)
case Some(Left(v)) => if (v.nonEmpty) v.map(_.length).sum.toDouble / v.size else 0.0
case _ => 0.0
}
FeatureDistribution(
name = name,
key = key,
count = 1L,
nulls = nullCount,
avgTextLen = avgTextLen,
summaryInfo = summaryInfo,
distribution = distribution,
`type` = `type`
RawFeatureFilter.scala
@@ -97,6 +97,8 @@ class RawFeatureFilter[T]
val maxFillRatioDiff: Double,
val maxJSDivergence: Double,
val maxCorrelation: Double,
val pvalCutoff: Double,
val minTextLen: Double,
val correlationType: CorrelationType = CorrelationType.Pearson,
val jsDivergenceProtectedFeatures: Set[String] = Set.empty,
val protectedFeatures: Set[String] = Set.empty,
@@ -320,6 +322,8 @@ class RawFeatureFilter[T]
message = s"Features excluded because training fill rate did not meet min required ($minFill)"
)

val uniformFtDistribution: Seq[Boolean] = trainingDistribs.map(_.chiSqUnifTest(pvalCutoff))
val avgTextLenTest: Seq[Boolean] = trainingDistribs.map(_.avgTextLen < minTextLen)
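// Hypothetical combination of the two new checks (a sketch; the actual exclusion wiring
// continues below): a hashed text feature is a drop candidate only when it both looks
// uniform and is short on average.
val textFeatureDropCandidates: Seq[Boolean] =
uniformFtDistribution.zip(avgTextLenTest).map { case (isUniform, isShort) => isUniform && isShort }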
val trainingNullLabelLeakers: Seq[Boolean] = rawFeatureFilterMetrics.map(_.trainingNullLabelAbsoluteCorr).map {
case Some(corr) => corr > maxCorrelation
case None => false
OpWorkflowModelReaderWriterTest.scala
@@ -87,8 +87,8 @@ class OpWorkflowModelReaderWriterTest
aggregateParams = null
)

val distributions = Array(FeatureDistribution("a", None, 1L, 1L, Array(1.0), Array(1.0)),
FeatureDistribution("b", Option("b"), 2L, 2L, Array(2.0), Array(2.0)))
val distributions = Array(FeatureDistribution("a", None, 1L, 1L, 0, Array(1.0), Array(1.0)),
FeatureDistribution("b", Option("b"), 2L, 2L, 0, Array(2.0), Array(2.0)))

val rawFeatureFilterResults = RawFeatureFilterResults(rawFeatureDistributions = distributions)

FeatureDistributionTest.scala
@@ -134,42 +134,42 @@ class FeatureDistributionTest extends FlatSpec with PassengerSparkFixtureTest wi
}

it should "correctly compare fill rates" in {
val fd1 = FeatureDistribution("A", None, 10, 1, Array.empty, Array.empty)
val fd2 = FeatureDistribution("A", None, 20, 20, Array.empty, Array.empty)
val fd1 = FeatureDistribution("A", None, 10, 1, 0, Array.empty, Array.empty)
val fd2 = FeatureDistribution("A", None, 20, 20, 0, Array.empty, Array.empty)
fd1.relativeFillRate(fd2) shouldBe 0.9
}

it should "correctly compare relative fill rates" in {
val fd1 = FeatureDistribution("A", None, 10, 1, Array.empty, Array.empty)
val fd2 = FeatureDistribution("A", None, 20, 19, Array.empty, Array.empty)
val fd1 = FeatureDistribution("A", None, 10, 1, 0, Array.empty, Array.empty)
val fd2 = FeatureDistribution("A", None, 20, 19, 0, Array.empty, Array.empty)
trainSummaries(0).relativeFillRatio(scoreSummaries(0)) shouldBe 4.5
trainSummaries(2).relativeFillRatio(scoreSummaries(2)) shouldBe 1.0
fd1.relativeFillRatio(fd2) shouldBe 18.0
}

it should "correctly compute the DS divergence" in {
val fd1 = FeatureDistribution("A", None, 10, 1, Array(1, 4, 0, 0, 6), Array.empty)
val fd2 = FeatureDistribution("A", None, 20, 20, Array(2, 8, 0, 0, 12), Array.empty)
val fd1 = FeatureDistribution("A", None, 10, 1, 0, Array(1, 4, 0, 0, 6), Array.empty)
val fd2 = FeatureDistribution("A", None, 20, 20, 0, Array(2, 8, 0, 0, 12), Array.empty)
fd1.jsDivergence(fd2) should be < eps

val fd3 = FeatureDistribution("A", None, 10, 1, Array(0, 0, 1000, 1000, 0), Array.empty)
val fd3 = FeatureDistribution("A", None, 10, 1, 0, Array(0, 0, 1000, 1000, 0), Array.empty)
fd3.jsDivergence(fd3) should be < eps
val fd4 = FeatureDistribution("A", None, 20, 20, Array(200, 800, 0, 0, 1200), Array.empty)
val fd4 = FeatureDistribution("A", None, 20, 20, 0, Array(200, 800, 0, 0, 1200), Array.empty)
(fd3.jsDivergence(fd4) - 1.0) should be < eps
}

it should "reduce correctly" in {
val fd1 = FeatureDistribution("A", None, 10, 1, Array(1, 4, 0, 0, 6), Array.empty)
val fd2 = FeatureDistribution("A", None, 20, 20, Array(2, 8, 0, 0, 12), Array.empty)
val res = FeatureDistribution("A", None, 30, 21, Array(3.0, 12.0, 0.0, 0.0, 18.0), Array.empty)
val fd1 = FeatureDistribution("A", None, 10, 1, 0, Array(1, 4, 0, 0, 6), Array.empty)
val fd2 = FeatureDistribution("A", None, 20, 20, 0, Array(2, 8, 0, 0, 12), Array.empty)
val res = FeatureDistribution("A", None, 30, 21, 0, Array(3.0, 12.0, 0.0, 0.0, 18.0), Array.empty)

fd1.reduce(fd2) shouldBe res
FeatureDistribution.semigroup.plus(fd1, fd2) shouldBe res
}

it should "have equals" in {
val fd1 = FeatureDistribution("A", None, 10, 1, Array(1, 4, 0, 0, 6), Array.empty)
val fd2 = FeatureDistribution("A", None, 20, 20, Array(2, 8, 0, 0, 12), Array.empty)
val fd1 = FeatureDistribution("A", None, 10, 1, 0, Array(1, 4, 0, 0, 6), Array.empty)
val fd2 = FeatureDistribution("A", None, 20, 20, 0, Array(2, 8, 0, 0, 12), Array.empty)
fd1 shouldBe fd1
fd1.equals("blarg") shouldBe false
fd1 shouldBe fd1.copy(summaryInfo = Array.empty)
@@ -178,23 +178,23 @@ class FeatureDistributionTest extends FlatSpec with PassengerSparkFixtureTest wi
}

it should "have hashCode" in {
val fd1 = FeatureDistribution("A", None, 10, 1, Array(1, 4, 0, 0, 6), Array.empty)
val fd2 = FeatureDistribution("A", None, 20, 20, Array(2, 8, 0, 0, 12), Array.empty)
val fd1 = FeatureDistribution("A", None, 10, 1, 0, Array(1, 4, 0, 0, 6), Array.empty)
val fd2 = FeatureDistribution("A", None, 20, 20, 0, Array(2, 8, 0, 0, 12), Array.empty)
fd1.hashCode() shouldBe fd1.hashCode()
fd1.hashCode() shouldBe fd1.copy(summaryInfo = fd1.summaryInfo).hashCode()
fd1.hashCode() should not be fd1.copy(summaryInfo = Array.empty).hashCode()
fd1.hashCode() should not be fd2.hashCode()
}

it should "have toString" in {
FeatureDistribution("A", None, 10, 1, Array(1, 4, 0, 0, 6), Array.empty).toString() shouldBe
FeatureDistribution("A", None, 10, 1, 0, Array(1, 4, 0, 0, 6), Array.empty).toString() shouldBe
"FeatureDistribution(type = Training, name = A, key = None, count = 10, nulls = 1, " +
"distribution = [1.0,4.0,0.0,0.0,6.0], summaryInfo = [])"
"avgTextLen = 0.0, distribution = [1.0,4.0,0.0,0.0,6.0], summaryInfo = [])"
}

it should "marshall to/from json" in {
val fd1 = FeatureDistribution("A", None, 10, 1, Array(1, 4, 0, 0, 6), Array.empty)
val fd2 = FeatureDistribution("A", None, 20, 20, Array(2, 8, 0, 0, 12), Array.empty)
val fd1 = FeatureDistribution("A", None, 10, 1, 0, Array(1, 4, 0, 0, 6), Array.empty)
val fd2 = FeatureDistribution("A", None, 20, 20, 0, Array(2, 8, 0, 0, 12), Array.empty)
val json = FeatureDistribution.toJson(Array(fd1, fd2))
FeatureDistribution.fromJson(json) match {
case Success(r) => r shouldBe Seq(fd1, fd2)
@@ -203,11 +203,12 @@ class FeatureDistributionTest extends FlatSpec with PassengerSparkFixtureTest wi
}

it should "marshall to/from json with default vector args" in {
val fd1 = FeatureDistribution("A", None, 10, 1, Array(1, 4, 0, 0, 6), Array.empty, FeatureDistributionType.Scoring)
val fd2 = FeatureDistribution("A", Some("X"), 20, 20, Array(2, 8, 0, 0, 12), Array.empty)
val fd1 = FeatureDistribution("A", None, 10, 1, 0.0, Array(1, 4, 0, 0, 6),
Array.empty, FeatureDistributionType.Scoring)
val fd2 = FeatureDistribution("A", Some("X"), 20, 20, 0.0, Array(2, 8, 0, 0, 12), Array.empty)
val json =
"""[{"name":"A","count":10,"nulls":1,"distribution":[1.0,4.0,0.0,0.0,6.0],"type":"Scoring"},
|{"name":"A","key":"X","count":20,"nulls":20,"distribution":[2.0,8.0,0.0,0.0,12.0]}]
"""[{"name":"A","count":10,"nulls":1,"avgTextLen":0.0,"distribution":[1.0,4.0,0.0,0.0,6.0],"type":"Scoring"},
|{"name":"A","key":"X","count":20,"nulls":20,"avgTextLen":0.0,"distribution":[2.0,8.0,0.0,0.0,12.0]}]
|""".stripMargin

FeatureDistribution.fromJson(json) match {
@@ -217,7 +218,7 @@ class FeatureDistributionTest extends FlatSpec with PassengerSparkFixtureTest wi
}

it should "error on mismatching feature name, key or type" in {
val fd1 = FeatureDistribution("A", None, 10, 1, Array(1, 4, 0, 0, 6), Array.empty)
val fd1 = FeatureDistribution("A", None, 10, 1, 0, Array(1, 4, 0, 0, 6), Array.empty)

intercept[IllegalArgumentException](fd1.reduce(fd1.copy(name = "boo"))) should have message
"requirement failed: Name must match to compare or combine feature distributions: A != boo"
FiltersTestData.scala
@@ -37,20 +37,20 @@ trait FiltersTestData {
protected val eps = 1E-2

protected val trainSummaries = Seq(
FeatureDistribution("A", None, 10, 1, Array(1, 4, 0, 0, 6), Array.empty),
FeatureDistribution("B", None, 20, 20, Array(2, 8, 0, 0, 12), Array.empty),
FeatureDistribution("C", Some("1"), 10, 1, Array(1, 4, 0, 0, 6), Array.empty),
FeatureDistribution("C", Some("2"), 20, 19, Array(2, 8, 0, 0, 12), Array.empty),
FeatureDistribution("D", Some("1"), 10, 9, Array(1, 4, 0, 0, 6), Array.empty),
FeatureDistribution("D", Some("2"), 20, 19, Array(2, 8, 0, 0, 12), Array.empty)
FeatureDistribution("A", None, 10, 1, 0, Array(1, 4, 0, 0, 6), Array.empty),
FeatureDistribution("B", None, 20, 20, 0, Array(2, 8, 0, 0, 12), Array.empty),
FeatureDistribution("C", Some("1"), 10, 1, 0, Array(1, 4, 0, 0, 6), Array.empty),
FeatureDistribution("C", Some("2"), 20, 19, 0, Array(2, 8, 0, 0, 12), Array.empty),
FeatureDistribution("D", Some("1"), 10, 9, 0, Array(1, 4, 0, 0, 6), Array.empty),
FeatureDistribution("D", Some("2"), 20, 19, 0, Array(2, 8, 0, 0, 12), Array.empty)
)

protected val scoreSummaries = Seq(
FeatureDistribution("A", None, 10, 8, Array(1, 4, 0, 0, 6), Array.empty, FeatureDistributionType.Scoring),
FeatureDistribution("B", None, 20, 20, Array(2, 8, 0, 0, 12), Array.empty, FeatureDistributionType.Scoring),
FeatureDistribution("C", Some("1"), 10, 1, Array(0, 0, 10, 10, 0), Array.empty, FeatureDistributionType.Scoring),
FeatureDistribution("C", Some("2"), 20, 19, Array(2, 8, 0, 0, 12), Array.empty, FeatureDistributionType.Scoring),
FeatureDistribution("D", Some("1"), 0, 0, Array(0, 0, 0, 0, 0), Array.empty, FeatureDistributionType.Scoring),
FeatureDistribution("D", Some("2"), 0, 0, Array(0, 0, 0, 0, 0), Array.empty, FeatureDistributionType.Scoring)
FeatureDistribution("A", None, 10, 8, 0, Array(1, 4, 0, 0, 6), Array.empty, FeatureDistributionType.Scoring),
FeatureDistribution("B", None, 20, 20, 0, Array(2, 8, 0, 0, 12), Array.empty, FeatureDistributionType.Scoring),
FeatureDistribution("C", Some("1"), 10, 1, 0, Array(0, 0, 10, 10, 0), Array.empty, FeatureDistributionType.Scoring),
FeatureDistribution("C", Some("2"), 20, 19, 0, Array(2, 8, 0, 0, 12), Array.empty, FeatureDistributionType.Scoring),
FeatureDistribution("D", Some("1"), 0, 0, 0, Array(0, 0, 0, 0, 0), Array.empty, FeatureDistributionType.Scoring),
FeatureDistribution("D", Some("2"), 0, 0, 0, Array(0, 0, 0, 0, 0), Array.empty, FeatureDistributionType.Scoring)
)
}