
[SPARK-7721][INFRA] Run and generate test coverage report from Python via Jenkins #23117

Closed
wants to merge 34 commits into apache:master from HyukjinKwon:SPARK-7721

Conversation

@HyukjinKwon (Member) commented Nov 22, 2018

What changes were proposed in this pull request?

Background

As for the current status, the test script that generates coverage information was already merged into Spark in #20204.

So, we can generate the coverage report and site by, for example:

```
run-tests-with-coverage --python-executables=python3 --modules=pyspark-sql
```

like the `run-tests` script in `./python`.
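As background on how the measurement can reach the many Python processes the tests launch: coverage.py supports a process-startup hook driven by the `COVERAGE_PROCESS_START` environment variable (the same variable that appears in the flaky-test skip further down this thread). A minimal, hedged sketch of that mechanism — not necessarily how `run-tests-with-coverage` wires it up:

```python
# sitecustomize.py, placed on the sys.path of each interpreter under test.
# When COVERAGE_PROCESS_START points at a coverage config file, coverage.py
# starts measuring in every newly launched Python process.
import coverage

coverage.process_startup()
```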

Proposed change

The next step is to host this coverage report on github.io, updated automatically
by Jenkins (see https://spark-test.github.io/pyspark-coverage-site/).

This uses my testing account for Spark, @spark-test, which was shared with Felix and Shivaram a long time ago for testing purposes, including AppVeyor.

To cut this short, this PR targets running the coverage in spark-master-test-sbt-hadoop-2.7 (https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.7/).

In that specific job, it will clone the coverage site repository and replace its contents with the up-to-date PySpark test coverage from the latest commit, for instance as below:

```bash
# Clone the PySpark coverage site.
git clone https://github.com/spark-test/pyspark-coverage-site.git

# Remove existing HTMLs.
rm -fr pyspark-coverage-site/*

# Copy the newly generated coverage HTMLs.
cp -r .../python/test_coverage/htmlcov/* pyspark-coverage-site/

# Move into the site checkout before rewriting its branch.
cd pyspark-coverage-site

# Check out a temporary branch.
git symbolic-ref HEAD refs/heads/latest_branch

# Add all the files.
git add -A

# Commit the current HTMLs.
git commit -am "Coverage report at latest commit in Apache Spark"

# Delete the old branch.
git branch -D gh-pages

# Rename the temporary branch to gh-pages.
git branch -m gh-pages

# Finally, force-push to our repository.
git push -f origin gh-pages
```

So, one single up-to-date coverage report can be shown on the github.io page. The commands above were manually tested.
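As a hedged illustration only: the force-push above has to authenticate as @spark-test inside Jenkins, presumably via the hidden `SPARK_TEST_KEY` variable from the TODOs below. The script shape and names here are illustrative, not the actual job configuration:

```python
# Sketch of the authenticated force-push step (illustrative, not the real job).
import os
import subprocess
import sys

key = os.environ.get("SPARK_TEST_KEY")
if not key:
    # Fail loudly rather than attempting to push without credentials.
    sys.exit("[error] 'SPARK_TEST_KEY' environment variable was not set. "
             "Unable to post PySpark coverage results.")

# Embed the credential in the push URL; Jenkins masks it in console output.
url = "https://spark-test:%s@github.com/spark-test/pyspark-coverage-site.git" % key
subprocess.check_call(["git", "push", "-f", url, "gh-pages"])
```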

TODOs

  • Write a draft - @HyukjinKwon
  • pip install coverage to all python implementations (pypy, python2, python3) in Jenkins workers - @shaneknapp
  • Set hidden SPARK_TEST_KEY for @spark-test's password in Jenkins via Jenkins's password feature.
    This should be set in both the PR builder and spark-master-test-sbt-hadoop-2.7 so that other PRs can later test and fix the bugs - @shaneknapp
  • Set an environment variable that indicates spark-master-test-sbt-hadoop-2.7 so that that specific build can report and update the coverage site - @shaneknapp
  • Make the PR builder's tests pass - @HyukjinKwon
  • Fix the flaky test related to coverage - @HyukjinKwon
    • 6 consecutive passes out of 7 runs

This PR will be co-authored by me and @shaneknapp.

How was this patch tested?

It will be tested via Jenkins.

@HyukjinKwon (Member Author) commented Nov 22, 2018

@shaneknapp
@shaneknapp (Contributor)

i'll try and take a look at this over the next couple of days, but it's a holiday weekend and i may not be able to get to this until monday.

@HyukjinKwon (Member Author)

It's not urgent :) so it's okay. Actually I'm on vacation for a week as well. Thanks for taking a look, @shaneknapp!!

@HyukjinKwon (Member Author)

Hey @shaneknapp, have you found some time to take a look at this?

@shaneknapp (Contributor)

not yet, but i will carve out some time today and wednesday to look closer.

@squito (Contributor) left a comment

it will be great to have a test coverage report!

does this add time to running the tests?

@HyukjinKwon (Member Author)

> does this add time to running the tests?

Given my local tests, the time diff looked like a slight increase. I want to see how it works in Jenkins.

@HyukjinKwon (Member Author)

Hey @shaneknapp, mind taking a look when you're available?

@shaneknapp (Contributor) commented Dec 20, 2018 via email

@HyukjinKwon (Member Author)

gentle ping @shaneknapp :D.

@HyukjinKwon (Member Author)

gentle ping .. @shaneknapp

@shaneknapp (Contributor) commented Jan 5, 2019 via email

@HyukjinKwon (Member Author)

@shaneknapp, I updated the PR description to make the action items clear. Please let me know if there's any question when you start working on it. Thank you so much!

@shaneknapp (Contributor)

will coverage version 4.5.2 be sufficient (same across pypy/py2.7/py3.4)?

@shaneknapp (Contributor)

alright, coverage==4.5.2 is installed on all workers, across all python/pypy envs.
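(A trivial, hedged way to sanity-check each env, if useful:)

```python
# Run under each of pypy, python2.7, and python3.4 on every worker.
import coverage

assert coverage.__version__ == "4.5.2", coverage.__version__
print("coverage %s OK" % coverage.__version__)
```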

@shaneknapp (Contributor)

i'm currently looking into how to get the python coverage tests to deploy solely to the spark-master-test-sbt-hadoop-2.7 build via the jenkins job configs... setting this up w/the PRB will be much easier.

@HyukjinKwon (Member Author)

Yea coverage versions look good! Thanks.

Ah, @shaneknapp, I think I can handle running the python coverage tests within dev/run-tests.py if there's an environment variable set in the spark-master-test-sbt-hadoop-2.7 build (for instance, let's say SPARK_MASTER_SBT_HADOOP_2_7).

I can check that environment variable (SPARK_MASTER_SBT_HADOOP_2_7) within run-tests.py and run the python coverage tests accordingly, as sketched below.
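A minimal sketch of that check (assuming the job exports SPARK_MASTER_SBT_HADOOP_2_7=1; the surrounding run-tests.py plumbing is elided):

```python
import os

# Only the spark-master-test-sbt-hadoop-2.7 job would export this variable,
# so every other build keeps running the plain test script.
if os.environ.get("SPARK_MASTER_SBT_HADOOP_2_7") == "1":
    python_test_script = "./python/run-tests-with-coverage"
else:
    python_test_script = "./python/run-tests"
```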

@shaneknapp (Contributor)

just a quick FYI, i'll get back to my part of this early next week. i'm out of the office at our lab's retreat.

@HyukjinKwon (Member Author)

Thanks, @shaneknapp. (I'm just checking the installation of coverage FWIW)

@HyukjinKwon (Member Author)

I filed SPARK-26646 for the flaky test.


import os
import platform
import unittest

from pyspark import SparkConf, SparkContext, RDD
from pyspark.streaming import StreamingContext
from pyspark.testing.streamingutils import PySparkStreamingTestCase


# Applied to the DStream test classes in this module.
@unittest.skipIf(
    "pypy" in platform.python_implementation().lower() and "COVERAGE_PROCESS_START" in os.environ,
    "PyPy implementation causes to hang DStream tests forever when Coverage report is used.")
@HyukjinKwon (Member Author) commented on the diff

Hm, I am not sure why, but those tests hang forever when coverage is used.

@HyukjinKwon (Member Author)

FWIW, @shaneknapp, I added an empty commit authored by you at 82732eaded312b0cae6ec4876d0b5791dd4faa54, so this PR will be committed with you as co-author (for instance, like 51bee7a).

@apache deleted 4 comments from SparkQA Jan 21, 2019

@HyukjinKwon (Member Author)

retest this please

@HyukjinKwon (Member Author)

IIRC, @rxin, @JoshRosen, and @shaneknapp agreed on this approach in general a long time ago when we discussed it over email. Let me get this in within a few days if there are no notable comments, and start to monitor spark-master-test-sbt-hadoop-2.7.

@SparkQA commented Jan 30, 2019

Test build #101884 has finished for PR 23117 at commit a1c0601.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Jan 31, 2019

Test build #101927 has finished for PR 23117 at commit 426ef11.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon (Member Author)

I'm going to merge this since it's not going to affect Spark itself or the PR builders. I'm going to monitor the SBT master job and see if it works.

Merged to master.

@asfgit closed this in cdd694c Feb 1, 2019
@HyukjinKwon (Member Author) commented Feb 1, 2019

Argh, it looks like it causes an error when it pushes the HTMLs into https://spark-test.github.io/pyspark-coverage-site/, apparently due to an old git version on spark-master-test-sbt-hadoop-2.7:

  error: The requested URL returned error: 403 Forbidden while accessing https://spark-test:****@github.com/spark-test/pyspark-coverage-site.git/info/refs

https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.7/5465/console

  Please upgrade your git client.
  GitHub.com no longer supports git over dumb-http: https://github.com/blog/809-git-dumb-http-transport-to-be-turned-off-in-90-days
  https://github.com/spark-test/pyspark-coverage-site.git/info/refs

I am going to work with @shaneknapp in a private channel to speed up the fix.

@shaneknapp (Contributor)

actually, no, it shouldn't be the git version. that build is using git 2.7.2, and according to the blog post we only need 1.6.6.

@HyukjinKwon (Member Author)

Thanks, @shaneknapp. At least I checked that the build is now passing: https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.7/5468/

jackylee-ch pushed a commit to jackylee-ch/spark that referenced this pull request Feb 18, 2019
[SPARK-7721][INFRA] Run and generate test coverage report from Python via Jenkins

(The commit message mirrors the PR description above.)
Closes apache#23117 from HyukjinKwon/SPARK-7721.

Lead-authored-by: Hyukjin Kwon <gurwls223@apache.org>
Co-authored-by: hyukjinkwon <gurwls223@apache.org>
Co-authored-by: shane knapp <incomplete@gmail.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
@HyukjinKwon deleted the SPARK-7721 branch March 3, 2020 01:20
@dongjoon-hyun (Member) commented Nov 30, 2020

Hi, @HyukjinKwon and @shaneknapp.
It seems that AMPLab Jenkins lost SPARK_TEST_KEY during the recent transition and has been failing for the last 5 days:

  Generating HTML files for PySpark coverage under /home/jenkins/workspace/spark-master-test-sbt-hadoop-2.7-hive-2.3/python/test_coverage/htmlcov
  /home/jenkins/workspace/spark-master-test-sbt-hadoop-2.7-hive-2.3
  [error] 'SPARK_TEST_KEY' environment variable was not set. Unable to post PySpark coverage results.

@dongjoon-hyun (Member)

In addition to that, it breaks branch-3.0 too, because its Jenkins configuration has SPARK_MASTER_SBT_HADOOP_2_7=1.

@HyukjinKwon (Member Author)

Thank you, @dongjoon-hyun. Let me set the key accordingly.

@dongjoon-hyun (Member)

Thank you~

@HyukjinKwon (Member Author) commented Nov 30, 2020

@shaneknapp,

  • I removed SPARK_MASTER_SBT_HADOOP_2_7 in
    • spark-branch-3.0-test-sbt-hadoop-2.7-hive-2.3
    • spark-branch-3.0-test-sbt-hadoop-2.7-hive-1.2
    • spark-branch-3.0-test-sbt-hadoop-3.2-hive-2.3
  • and added SPARK_TEST_KEY as a password in spark-master-test-sbt-hadoop-2.7-hive-2.3:
    [Screenshot: SPARK_TEST_KEY configured as a hidden password in the Jenkins job, Nov 30, 2020]

@dongjoon-hyun (Member)

@HyukjinKwon (Member Author)

Sure! I will check the other jobs too.

@dongjoon-hyun (Member)

Thank you so much!
