[SPARK-7721][INFRA] Run and generate test coverage report from Python via Jenkins #23117
Conversation
i'll try and take a look at this over the next couple of days, but it's a holiday weekend and i may not be able to get to this until monday.
It's not urgent :) so it's okay. Actually I'm on vacation for a week as well. Thanks for taking a look @shaneknapp!!
Hey @shaneknapp, have you found some time to take a look at this?
not yet, but i will carve out some time today and wednesday to look closer.
it will be great to have a test coverage report!
does this add time to running the tests?
Given my local tests, the time difference looked like a slight increase. I want to see how it works in Jenkins.
Hey, @shaneknapp, mind if I ask you to take a look when you're available?
Ping me again in two or three days. Currently traveling.
gentle ping @shaneknapp :D
gentle ping .. @shaneknapp
sorry! i'll get to this on monday, promise. :)
@shaneknapp, I updated the PR description to make the action items clear. Please let me know if there's any question about it when you start to work on it. Thank you so much!
will coverage version 4.5.2 be sufficient (same across pypy/py2.7/py3.4)?
alright, coverage==4.5.2 is installed on all workers, across all python/pypy envs. |
now i'm looking into how to get the python coverage tests to deploy solely to the …
Yea, coverage versions look good! Thanks. Ah, @shaneknapp, I think I can handle how to run the python coverage tests within … I can check the environment variable (…).
just a quick FYI, i'll get back to my part of this early next week. i'm out of the office at our lab's retreat.
Thanks, @shaneknapp. (I'm just checking the installation of coverage, FWIW.)
I filed SPARK-26646 for the flaky test.
```python
import os
import platform
import unittest

from pyspark import SparkConf, SparkContext, RDD
from pyspark.streaming import StreamingContext
from pyspark.testing.streamingutils import PySparkStreamingTestCase


@unittest.skipIf(
    "pypy" in platform.python_implementation().lower() and "COVERAGE_PROCESS_START" in os.environ,
    "PyPy implementation causes DStream tests to hang forever when the coverage report is used.")
```
Hm, I am not sure why, but those tests hang forever when coverage is used.
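For context, subprocess coverage in coverage.py is enabled by the `COVERAGE_PROCESS_START` environment variable, which is why the skip condition above checks for it. A minimal sketch of that condition as a standalone, testable helper (the function name is mine, not from the PR):

```python
import os
import platform

def should_skip_dstream_tests(impl=None, environ=None):
    """Sketch of the gate used in the PR: skip DStream tests only when running
    under PyPy with subprocess coverage enabled. COVERAGE_PROCESS_START is the
    variable coverage.py reads to start measurement in subprocesses."""
    impl = impl if impl is not None else platform.python_implementation()
    environ = environ if environ is not None else os.environ
    return "pypy" in impl.lower() and "COVERAGE_PROCESS_START" in environ

print(should_skip_dstream_tests("PyPy", {"COVERAGE_PROCESS_START": ".coveragerc"}))     # True
print(should_skip_dstream_tests("CPython", {"COVERAGE_PROCESS_START": ".coveragerc"}))  # False
print(should_skip_dstream_tests("PyPy", {}))                                            # False
```

Extracting the condition like this makes it easy to see that CPython runs, and PyPy runs without coverage, are unaffected by the skip.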
FWIW, @shaneknapp, I added an empty commit authored by you at 82732eaded312b0cae6ec4876d0b5791dd4faa54, so this PR will be committed with you as a co-author (like 51bee7a, for instance).
Authored-by: shane knapp <incomplete@gmail.com> Signed-off-by: shane knapp <incomplete@gmail.com>
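The empty-commit trick mentioned above relies on `git commit --allow-empty --author=...`, which records a different author than the committer without touching any files. A sketch of it in a throwaway repository (the repository and messages are illustrative, not from the PR):

```python
import subprocess
import tempfile

def run(repo, *args):
    """Run a git command in the given repository and return its stdout."""
    return subprocess.run(["git", "-C", repo] + list(args),
                          check=True, capture_output=True, text=True).stdout

repo = tempfile.mkdtemp()
run(repo, "init")
# The committer identity comes from the repo config.
run(repo, "config", "user.name", "Hyukjin Kwon")
run(repo, "config", "user.email", "gurwls223@apache.org")
# The empty commit carries shaneknapp's authorship without changing any files.
run(repo, "commit", "--allow-empty",
    "--author=shane knapp <incomplete@gmail.com>",
    "-m", "Empty commit to record a co-author")
print(run(repo, "log", "--format=%an <%ae>", "-1").strip())
# prints: shane knapp <incomplete@gmail.com>
```

When the PR is squash-merged, GitHub then credits both identities, which is why the merged commit above carries the `Co-authored-by` trailer.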
retest this please
IIRC, @rxin, @JoshRosen, and @shaneknapp agreed on my approach a long time ago when we discussed it over email. Let me get this in in a few days if there are no notable comments, and start to monitor spark-master-test-sbt-hadoop-2.7.
Test build #101884 has finished for PR 23117 at commit …
Test build #101927 has finished for PR 23117 at commit …
I'm going to merge this since it doesn't affect Spark itself or the PR builders. I'm going to monitor the SBT master job and see if it works. Merged to master.
Argh, looks like it causes an error when it pushes the HTMLs into https://spark-test.github.io/pyspark-coverage-site/ due to a low git version:
https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.7/5465/console
https://github.com/spark-test/pyspark-coverage-site.git/info/refs
I am going to interact with @shaneknapp in a private channel to speed up the fix.
actually, no, it shouldn't be the git version. that build is using 2.7.2, and according to the blog post we need 1.6.6.
Thanks, @shaneknapp. I at least checked that the build is now passing: https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.7/5468/
[SPARK-7721][INFRA] Run and generate test coverage report from Python via Jenkins

## What changes were proposed in this pull request?

### Background

Currently, the test script that generates coverage information is already merged into Spark (apache#20204), so we can generate the coverage report and site by, for example:

```
run-tests-with-coverage --python-executables=python3 --modules=pyspark-sql
```

like the `run-tests` script in `./python`.

### Proposed change

The next step is to host this coverage report via `github.io` automatically by Jenkins (see https://spark-test.github.io/pyspark-coverage-site/). This uses my testing account for Spark, spark-test, which was shared with Felix and Shivaram a long time ago for testing purposes including AppVeyor.

To cut this short, this PR targets running the coverage in [spark-master-test-sbt-hadoop-2.7](https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.7/). That specific job will clone the page and rebase the up-to-date PySpark test coverage from the latest commit. For instance:

```bash
# Clone PySpark coverage site.
git clone https://github.com/spark-test/pyspark-coverage-site.git

# Remove existing HTMLs.
rm -fr pyspark-coverage-site/*

# Copy generated coverage HTMLs.
cp -r .../python/test_coverage/htmlcov/* pyspark-coverage-site/

# Check out to a temporary branch.
git symbolic-ref HEAD refs/heads/latest_branch

# Add all the files.
git add -A

# Commit current HTMLs.
git commit -am "Coverage report at latest commit in Apache Spark"

# Delete the old branch.
git branch -D gh-pages

# Rename the temporary branch to gh-pages.
git branch -m gh-pages

# Finally, force update our repository.
git push -f origin gh-pages
```

So a single, up-to-date coverage report can be shown on the `github.io` page. The commands above were manually tested.

### TODOs

- [x] Write a draft - HyukjinKwon
- [x] `pip install coverage` for all Python implementations (pypy, python2, python3) on the Jenkins workers - shaneknapp
- [x] Set a hidden `SPARK_TEST_KEY` for spark-test's password in Jenkins via Jenkins's credentials feature. This should be set in both the PR builder and `spark-master-test-sbt-hadoop-2.7` so that other PRs can later test and fix the bugs - shaneknapp
- [x] Set an environment variable that indicates `spark-master-test-sbt-hadoop-2.7` so that that specific build can report and update the coverage site - shaneknapp
- [x] Make the PR builder's tests pass - HyukjinKwon
- [x] Fix the flaky test related to coverage - HyukjinKwon - 6 consecutive passes out of 7 runs

This PR will be co-authored by me and shaneknapp.

## How was this patch tested?

It will be tested via Jenkins.

Closes apache#23117 from HyukjinKwon/SPARK-7721.

Lead-authored-by: Hyukjin Kwon <gurwls223@apache.org>
Co-authored-by: hyukjinkwon <gurwls223@apache.org>
Co-authored-by: shane knapp <incomplete@gmail.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
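Since only one Jenkins job should push to the coverage site, the TODOs above amount to a gate on environment variables. A sketch of that gate follows; `SPARK_TEST_KEY` is named in the PR, but the job-indicator variable name here is a hypothetical stand-in for whatever Jenkins actually sets:

```python
import os

def should_publish_coverage(environ=None):
    """Decide whether this build should push the coverage HTMLs to gh-pages.

    SPARK_TEST_KEY comes from the PR's TODO list; the job-indicator variable
    below is an assumed name, not the one Jenkins really uses.
    """
    environ = environ if environ is not None else os.environ
    return (
        environ.get("IS_SPARK_MASTER_TEST_SBT_HADOOP_2_7") == "1"  # assumed name
        and "SPARK_TEST_KEY" in environ  # push credential must be present
    )

# Only the designated job that also has the credential publishes.
print(should_publish_coverage({"IS_SPARK_MASTER_TEST_SBT_HADOOP_2_7": "1",
                               "SPARK_TEST_KEY": "secret"}))  # True
print(should_publish_coverage({"SPARK_TEST_KEY": "secret"}))  # False
```

Keeping the check in one function means PR builders (which have the key but not the job indicator) can exercise the coverage run without ever force-pushing the site.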
Hi, @HyukjinKwon and @shaneknapp. In addition to that, it breaks …
Thank you, @dongjoon-hyun. Let me set the key accordingly.
Thank you~
Could you remove …?
Sure! I will check the other jobs too.
Thank you so much!