Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-7578] [ml] [doc] User guide for spark.ml Normalizer, IDF, StandardScaler #6127

Closed
wants to merge 4 commits into from

Conversation

jkbradley
Copy link
Member

Added user guide sections with code examples.
Also added small Java unit tests to test Java example in guide.

CC: @mengxr

@AmplabJenkins
Copy link

Merged build triggered.

@AmplabJenkins
Copy link

Merged build started.

@SparkQA
Copy link

SparkQA commented May 13, 2015

Test build #32635 has started for PR 6127 at commit 8a2b5cd.

@SparkQA
Copy link

SparkQA commented May 13, 2015

Test build #32635 has finished for PR 6127 at commit 8a2b5cd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • public class JavaNormalizerSuite
    • public class JavaStandardScalerSuite

@AmplabJenkins
Copy link

Merged build finished. Test PASSed.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32635/
Test PASSed.

@AmplabJenkins
Copy link

Build triggered.

@AmplabJenkins
Copy link

Build started.

@SparkQA
Copy link

SparkQA commented May 19, 2015

Test build #33085 has started for PR 6127 at commit 45245e8.

@SparkQA
Copy link

SparkQA commented May 19, 2015

Test build #33085 has finished for PR 6127 at commit 45245e8.

  • This patch passes all tests.
  • This patch does not merge cleanly.
  • This patch adds the following public classes (experimental):
    • public class JavaNormalizerSuite
    • public class JavaStandardScalerSuite

@AmplabJenkins
Copy link

Build finished. Test PASSed.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/33085/
Test PASSed.

@jkbradley
Copy link
Member Author

@mengxr Just rebased off of master, so this should be ready to review (with fewer PRs for the ml-features.md file now).

@AmplabJenkins
Copy link

Merged build triggered.

@AmplabJenkins
Copy link

Merged build started.

@SparkQA
Copy link

SparkQA commented May 21, 2015

Test build #33274 has started for PR 6127 at commit f16bcec.

@SparkQA
Copy link

SparkQA commented May 21, 2015

Test build #33274 has finished for PR 6127 at commit f16bcec.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Merged build finished. Test FAILed.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/33274/
Test FAILed.

@SparkQA
Copy link

SparkQA commented May 21, 2015

Test build #848 has started for PR 6127 at commit f16bcec.

@SparkQA
Copy link

SparkQA commented May 22, 2015

Test build #848 has finished for PR 6127 at commit f16bcec.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

val idf = new IDF().setInputCol("rawFeatures").setOutputCol("features")
val idfModel = idf.fit(featurizedData)
val rescaledData = idfModel.transform(featurizedData)
rescaledData.select("features", "label").take(3).foreach(println)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: take(3) -> collect()

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about that, but was worried users would copy this into their own code and try to print out a huge dataset. What do you think?

IDF idf = new IDF().setInputCol("rawFeatures").setOutputCol("features");
IDFModel idfModel = idf.fit(featurizedData);
DataFrame rescaledData = idfModel.transform(featurizedData);
for (Row r : rescaledData.select("features", "label").take(3)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

take(3) -> collect()?

@jkbradley
Copy link
Member Author

@mengxr Updated. Thanks!

@AmplabJenkins
Copy link

Merged build triggered.

@AmplabJenkins
Copy link

Merged build started.

@SparkQA
Copy link

SparkQA commented May 22, 2015

Test build #33310 has started for PR 6127 at commit cd47f4b.

@SparkQA
Copy link

SparkQA commented May 22, 2015

Test build #33310 has finished for PR 6127 at commit cd47f4b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • public class JavaNormalizerSuite
    • public class JavaStandardScalerSuite

@AmplabJenkins
Copy link

Merged build finished. Test PASSed.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/33310/
Test PASSed.

asfgit pushed a commit that referenced this pull request May 22, 2015
…dardScaler

Added user guide sections with code examples.
Also added small Java unit tests to test Java example in guide.

CC: mengxr

Author: Joseph K. Bradley <joseph@databricks.com>

Closes #6127 from jkbradley/feature-guide-2 and squashes the following commits:

cd47f4b [Joseph K. Bradley] Updated based on code review
f16bcec [Joseph K. Bradley] Fixed merge issues and update Python examples print calls for Python 3
0a862f9 [Joseph K. Bradley] Added Normalizer, StandardScaler to ml-features doc, plus small Java unit tests
a21c2d6 [Joseph K. Bradley] Updated ml-features.md with IDF

(cherry picked from commit 2728c3d)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
@mengxr
Copy link
Contributor

mengxr commented May 22, 2015

LGTM. Merged into master and branch-1.4. Thanks!

jkbradley added a commit that referenced this pull request May 22, 2015
…dardScaler

Added user guide sections with code examples.
Also added small Java unit tests to test Java example in guide.

CC: mengxr

Author: Joseph K. Bradley <joseph@databricks.com>

Closes #6127 from jkbradley/feature-guide-2 and squashes the following commits:

cd47f4b [Joseph K. Bradley] Updated based on code review
f16bcec [Joseph K. Bradley] Fixed merge issues and update Python examples print calls for Python 3
0a862f9 [Joseph K. Bradley] Added Normalizer, StandardScaler to ml-features doc, plus small Java unit tests
a21c2d6 [Joseph K. Bradley] Updated ml-features.md with IDF
@jkbradley jkbradley closed this May 22, 2015
jeanlyn pushed a commit to jeanlyn/spark that referenced this pull request May 28, 2015
…dardScaler

Added user guide sections with code examples.
Also added small Java unit tests to test Java example in guide.

CC: mengxr

Author: Joseph K. Bradley <joseph@databricks.com>

Closes apache#6127 from jkbradley/feature-guide-2 and squashes the following commits:

cd47f4b [Joseph K. Bradley] Updated based on code review
f16bcec [Joseph K. Bradley] Fixed merge issues and update Python examples print calls for Python 3
0a862f9 [Joseph K. Bradley] Added Normalizer, StandardScaler to ml-features doc, plus small Java unit tests
a21c2d6 [Joseph K. Bradley] Updated ml-features.md with IDF
jeanlyn pushed a commit to jeanlyn/spark that referenced this pull request Jun 12, 2015
…dardScaler

Added user guide sections with code examples.
Also added small Java unit tests to test Java example in guide.

CC: mengxr

Author: Joseph K. Bradley <joseph@databricks.com>

Closes apache#6127 from jkbradley/feature-guide-2 and squashes the following commits:

cd47f4b [Joseph K. Bradley] Updated based on code review
f16bcec [Joseph K. Bradley] Fixed merge issues and update Python examples print calls for Python 3
0a862f9 [Joseph K. Bradley] Added Normalizer, StandardScaler to ml-features doc, plus small Java unit tests
a21c2d6 [Joseph K. Bradley] Updated ml-features.md with IDF
nemccarthy pushed a commit to nemccarthy/spark that referenced this pull request Jun 19, 2015
…dardScaler

Added user guide sections with code examples.
Also added small Java unit tests to test Java example in guide.

CC: mengxr

Author: Joseph K. Bradley <joseph@databricks.com>

Closes apache#6127 from jkbradley/feature-guide-2 and squashes the following commits:

cd47f4b [Joseph K. Bradley] Updated based on code review
f16bcec [Joseph K. Bradley] Fixed merge issues and update Python examples print calls for Python 3
0a862f9 [Joseph K. Bradley] Added Normalizer, StandardScaler to ml-features doc, plus small Java unit tests
a21c2d6 [Joseph K. Bradley] Updated ml-features.md with IDF
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants