
Implement automated pull request testing for Trilinos #1155

Closed
jwillenbring opened this issue Mar 20, 2017 · 23 comments

@jwillenbring
Member

@trilinos/framework

@allevin is planning to develop a capability that will allow for testing of Trilinos pull requests prior to having the changes applied to the develop branch. This story will track that work. A rough outline for the work is as follows:

  1. Adapt some existing scripts that test pull requests for another project so that they are more general and serve the Trilinos case.

  2. Deploy on a limited, opt-in basis.

  3. Deploy on a full basis for all Trilinos develop branch modifications.

@jwillenbring added the Framework tasks (used internally by Framework team) label Mar 20, 2017
@bartlettroscoe
Member

Has any work been done on this yet? What is the ETA for when we might get this implemented for Trilinos?

@bartlettroscoe
Member

@jwillenbring, you indicated offline that @allevin has already been working on this for some time. If that is the case, why has this issue not been put "In Progress" and had its status logged? Why is it still listed as "Ready" on the Trilinos Framework Kanban board?

@bartlettroscoe
Member

@jwillenbring and I had a meeting with Kitware staff (Brad K., Bill H., and Ben B.) to discuss "Ghostflow" (short for "git hosted workflow"), a new automation system they have put together for testing and merging GitLab merge requests to CMake using a topic-branch workflow, as outlined in the slides they presented:

In a nutshell, what Ghostflow does is create a new throwaway integration test branch after every change to the main development branch (e.g. 'master') or to any of the marked (i.e., with the 'stage' command) PR branches. (Note: this throwaway integration test branch, called 'stage', is really more like the 'pu' (proposed upstream) branch described in gitworkflows(7) and less like the 'next' branch described in that workflow.) This throwaway integration test branch 'stage' is then run through a more detailed set of automated builds on a number of different platforms each day (or more frequently if one would like).
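To make the idea concrete, here is a minimal sketch (not Kitware's actual Ghostflow code) of how such a throwaway 'stage' branch might be rebuilt; the `staged_branches` list is hypothetical and stands in for whatever Ghostflow derives from the marked PR branches:

```python
# Minimal sketch only; Ghostflow's real implementation is Kitware's and is
# not shown here. Rebuild a throwaway 'stage' branch from the tip of the
# main development branch plus every PR branch marked for staging.
import subprocess

def git(*args):
    subprocess.run(["git", *args], check=True)

def rebuild_stage(staged_branches, base="origin/develop"):
    git("fetch", "origin")
    # Throw the old integration branch away and start fresh from the base.
    git("checkout", "-B", "stage", base)
    for branch in staged_branches:
        # A merge conflict here means the PR branch cannot be staged as-is.
        git("merge", "--no-ff", f"origin/{branch}")
    git("push", "--force", "origin", "stage")

# Hypothetical usage:
# rebuild_stage(["pr-branch-1", "pr-branch-2"])
```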

The advantage of this approach over simpler independent branch testing is that it scales better with the number of branches. For example, the CUDA builds for Trilinos for ATDM take upwards of 10 hours or more to run (and that is without package-by-package rebuild testing, so this is O(7x) or more than the basic GCC build). For those platforms, we can't test each PR branch independently. This approach would allow us to test all of the branches.

The PR testing workflow that we would construct given Ghostflow might be something like:

  • PR branches to be marked to be tested by the basic CI build would be tagged with "Do: ci_build". This would trigger an automated CI build to be fired off right away, and results would be posted to CDash, with links to the CDash build provided as comments in the PR issue (see the sketch after this list).

  • PR branches to be included in more in-depth testing would be marked with "Do: stage". This would automatically trigger a CI build if not already done and, if the CI build passes 100%, the branch would be merged into the 'stage' branch. Otherwise, no merge of the branch would be attempted.

  • The 'stage' branch would be rebuilt from scratch whenever the 'develop' branch changed or one of the staged PR branches was updated (the latter only after the CI build passed).

  • A snapshot of 'stage' would be created at the same time each day and then a set of nightly builds on various platforms would be fired off and reported to CDash (and to the PR issues as comments).

  • A PR branch would only be allowed to be (manually) merged to the 'develop' branch when all of the nightly builds on the 'stage' branch were 100% clean (or if a hot-fix branch could be created with confidence to clean up the remaining issues and then merged).

  • An additional set of builds and tests would be performed on the 'develop' branch each day as a trigger before updating the 'master' branch.
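As a rough illustration of the label-driven triggers in the list above (the "Do: ci_build" and "Do: stage" commands), a hypothetical dispatch might look like the following; none of this is existing Trilinos or Ghostflow code:

```python
# Hypothetical dispatch for the "Do: ..." commands in the workflow above;
# purely illustrative, not existing Trilinos or Ghostflow code.
def actions_for(command, ci_passed):
    if command == "Do: ci_build":
        # Fire off the CI build right away and report back to the PR.
        return ["run_ci_build", "post_cdash_links_to_pr"]
    if command == "Do: stage":
        if ci_passed is None:
            return ["run_ci_build"]          # CI not run yet
        if ci_passed:
            return ["merge_into_stage"]      # only 100% clean CI builds
        return []                            # do not attempt the merge
    return []
```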

This is just an idea of what this workflow would look like. The tweaks and knobs would be:

  • What builds are selected for the CI build before a branch is even attempted to be added to 'stage'? For example, this could be a Linux build and a Mac OSX build.

  • What builds are selected for the nightly testing of the PR branches on the 'stage' branch (that is, the trigger for merging with 'develop')? For example, this would include an additional CUDA build and perhaps one or two other critical builds for ATDM customers that are known to break very frequently.

  • What nightly builds are selected for the promotion from 'develop' to 'master'? For example, this might include extra installation testing, backward compatibility testing, testing against special APPs like Nalu or Albany, etc.

However, the Ghostflow workflow we were presented is not yet ready to implement a process like the one described above. In particular, the following would need to be added or implemented by Kitware:

  • Add support for GitHub (currently only GitLab is implemented)

  • Add support for using distributed Jenkins farms to fire off various builds and submit to CDash.

But Kitware could not start working on this until Ben B. got back from vacation on July 5th, and they estimated it could take a few months to implement. So the earliest we could potentially get this implemented and used would be September 2017 or later.

Therefore, the decision was to wait and see how the refactoring work that @allevin is leading is progressing and then decide whether to look into Ghostflow with Kitware after July 5th. I will add a reminder to bring up this topic at a future Kitware contract meeting after July 5th. (But the work that @allevin is doing with Jenkins could likely be directly leveraged in an updated Ghostflow implementation for Trilinos.)

@jwillenbring
Member Author

@allevin told me this morning that he has a rough version of this ready for me to try next week. He will write up instructions. There is still a gap in directing people to the results in CDash and other smaller gaps to fill. The initial implementation will not support testing multiple pull requests simultaneously.

@bartlettroscoe
Member

There is still a gap in directing people to the results in CDash and other smaller gaps to fill.

As part of TriBITSPub/TriBITS#154, the TRIBITS_CTEST_DRIVER() function will create a text file that contains the URL to the results on CDash. The Python code can then read that file and print that URL in the GitHub comment. It will not take long to do that, so let me know when you want to use it. (That is also needed for nice Travis CI output pointing to CDash.)
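For example, the Python side could be as simple as something like this (the file name, repository, and token handling are placeholders, not the actual autotester code):

```python
# Sketch only: read the CDash URL written by TRIBITS_CTEST_DRIVER() and post
# it back to the pull request as a GitHub comment. The file name and token
# handling are assumptions for illustration.
import requests

def post_cdash_link(url_file, repo, pr_number, token):
    with open(url_file) as f:
        cdash_url = f.read().strip()
    api = f"https://api.github.com/repos/{repo}/issues/{pr_number}/comments"
    resp = requests.post(
        api,
        json={"body": f"CDash results for this build: {cdash_url}"},
        headers={"Authorization": f"token {token}"},
    )
    resp.raise_for_status()

# Hypothetical usage:
# post_cdash_link("cdash_url.txt", "trilinos/Trilinos", 1155, my_token)
```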

@bartlettroscoe
Member

Any recent update on this? I will be gone on vacation for the next 2 weeks and I fear a breakdown of the CI process using the checkin-test-sems.sh script (see #1304). The above comment from 24 days ago said that it should have been ready to test about 15 days ago or so.

@bartlettroscoe
Member

As described in TriBITSPub/TriBITS#214 (comment), you should have everything you need from CMake/CTest and TriBITS to implement this. Improvements can be made from there (like embedding the exact CDash URL, but you can always give the generic URL and put the SHA1 in the build name so it is easy to find). Let's set up a meeting between @allevin and myself for when I get back from vacation the week of 8/28 to finish this.
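For the interim "generic URL plus SHA1 in the build name" approach, a small sketch (the naming pattern here is an assumption, not the actual Trilinos convention):

```python
# Sketch of embedding the PR number and commit SHA1 in the CDash build name,
# so a build is easy to find from the generic project dashboard URL even
# without an exact link. The naming pattern is an assumption.
def pr_build_name(pr_number, sha1, compiler="gcc-4.8.4"):
    return f"PR-{pr_number}-{compiler}-{sha1[:8]}"

# e.g. pr_build_name(1992, "a1b2c3d4e5f60718") -> "PR-1992-gcc-4.8.4-a1b2c3d4"
```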

@jwillenbring
Member Author

I have been testing Aaron's work along with some Jenkins jobs I put together to do very basic testing of Trilinos. This has been largely successful in the narrow scope of testing without reporting to CDash (which is the scope @maherou encouraged me to pursue given other difficulties). An example of a pull request that underwent successful testing with this tool (but is lacking final inspection) can be found here:

jwillenbring#11

The current path forward is as follows:

  1. @allevin has a few more clean up tasks to complete based on feedback provided by the framework team. He may have completed them last week. I will touch base with him tomorrow.

  2. I need to update the process to run as "trilinos", both on the GitHub and Jenkins sides. For the testing, it is running as "me".

  3. We have decided to start with a "soft" rollout of this capability. In effect, initially, the results will be informative, the tool will not merge automatically, and people will have the option to merge manually, or even just push directly to develop. We will gather data on tests passing and failing and on how long tests take, and resolve issues with the tool.

  4. After the initial rollout we will start to clamp things down and not allow people to push directly to develop, and later let the tool automerge instead of giving people the option to push the button.

@aprokop
Contributor

aprokop commented Sep 5, 2017

later let the tool automerge instead of giving people the option to push the button.

@jwillenbring My personal opinion is that you should not do automatic merging. This deprives other people of a chance to look at your code and notice problems. I understand that it will slow down the process, but the improved quality should compensate for that.

@bmpersc
Contributor

bmpersc commented Sep 5, 2017

@aprokop you mention that you're concerned people won't have time to review pull requests properly. What would you consider to be an acceptable number of reviewers to look over a pull request? Do you have any thoughts on how to prevent a pull request from being trapped in limbo waiting for the necessary reviews?

@aprokop
Contributor

aprokop commented Sep 5, 2017

What would you consider to be an acceptable number of reviewers to look over a pull request?
The more the better, but minimum = 1.

Do you have any thoughts on how to prevent a pull request from being trapped in limbo waiting for the necessary reviews?

This is more of a social issue, and it depends on the PR type (new feature/bug fix/comment fix). I would say for high priority things the author is allowed to merge once CI tests pass. For other, smaller-sized patches, a couple of days would be my guess for a deadline. For a brand new feature, it depends.
From a workflow perspective, you could have a bot running and tagging PRs based on time since opening.
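A rough sketch of such a bot against the GitHub REST API (the label name and age threshold are invented for illustration, this is not an existing Trilinos tool):

```python
# Rough sketch of a bot that tags open pull requests by age; the label name
# and threshold are invented for illustration.
from datetime import datetime, timezone
import requests

def label_stale_prs(repo, token, max_age_days=7, label="needs-review-reminder"):
    headers = {"Authorization": f"token {token}"}
    prs = requests.get(f"https://api.github.com/repos/{repo}/pulls",
                       params={"state": "open"}, headers=headers).json()
    now = datetime.now(timezone.utc)
    for pr in prs:
        opened = datetime.strptime(pr["created_at"],
                                   "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
        if (now - opened).days >= max_age_days:
            requests.post(
                f"https://api.github.com/repos/{repo}/issues/{pr['number']}/labels",
                json={"labels": [label]}, headers=headers)
```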

@bartlettroscoe
Member

Do you have any thoughts on how to prevent a pull request from being trapped in limbo waiting for the necessary reviews?

What Kitware implemented with their "ghostflow" system with GitHub was to have people add commands in the comments that fire off builds, do the final merges, etc. See:

That is the system to study, I think.

@jwillenbring
Member Author

@jwillenbring My personal opinion is that you should not do automatic merging. This deprives other people of a chance to look at your code and notice problems. I understand that it will slow down the process, but the improved quality should compensate for that.

I should have said more here. The auto merge will not occur until a review is performed on the pull request (the review settings are configurable, but I will describe the current plan).

If someone is outside of the Trilinos organization, and the branch containing the changes is not on the trilinos/Trilinos repo, then that review must occur before the testing will even kick off. This is to allow people to make sure things aren't happening like insertion of malicious code, or changing code in a way that works on one machine, but disables all the testing so that it doesn't fail on another.

If someone is part of the Trilinos organization, or the branch containing the changes is on the trilinos/Trilinos repo, then the review can occur at any time (before, during, or after testing), but it must occur before the auto merge will take place.

As currently configured, one person needs to approve the changes, and there need to be 0 changes requested. So, if someone goes in and looks at it and says "good", but before the testing finishes someone else says "try this, request changes" then it won't merge, but if the second reviewer doesn't get there before the testing stops, it will merge. So, if someone agrees with the changes, but wants someone else to look, they should just comment, not approve.
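A sketch of that gate against the GitHub reviews API (the helper is illustrative; it mirrors the described branch-protection behavior but is not the actual configuration):

```python
# Illustrative check of the merge gate described above: at least one approval
# and no outstanding "changes requested" at the time testing finishes.
import requests

def pr_may_automerge(repo, pr_number, token):
    url = f"https://api.github.com/repos/{repo}/pulls/{pr_number}/reviews"
    reviews = requests.get(url, headers={"Authorization": f"token {token}"}).json()
    latest = {}
    for review in reviews:           # reviews come back in chronological order
        latest[review["user"]["login"]] = review["state"]
    states = set(latest.values())
    return "APPROVED" in states and "CHANGES_REQUESTED" not in states
```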

@aprokop
Contributor

aprokop commented Sep 5, 2017

@jwillenbring Thanks for the explanation, it sounds a lot better than what I had in mind before.

before the testing finishes someone else says "try this, request changes" then it won't merge

Is this keyword based? Or Github review "Request Changes"?

@jwillenbring
Member Author

Is this keyword based? Or Github review "Request Changes"?

Github review "Request Changes"

@bartlettroscoe
Member

Archiving an email update on this so people can find it.


From: Trilinos-developers [mailto:trilinos-developers-bounces@trilinos.org] On Behalf Of Willenbring, James M
Sent: Thursday, November 16, 2017 5:29 PM
To: trilinos-developers@trilinos.org
Subject: [EXTERNAL] [Trilinos-developers] soft launch of pull request testing tool tomorrow
Importance: High

All,

CRITICAL SUMMARY

  1. Planning soft launch of pull request tool for tomorrow
  2. Reply to Jim if you are interested in training on the pull request workflow (not required during soft launch)

Today the pull request testing tool was run successfully for the first time on the main Trilinos repository. Following the successful run of the 4.9.3 and 4.8.4 tests, I merged the change to develop. The PR was in support of the SIERRA/Trilinos integration process. You can see the details here:

#1992

As discussed at TUG, there are a lot of improvements we still want to make to the pull request testing. However, I feel it is in good enough shape to execute the soft launch we have been planning. Here are the critical points to remember:

  1. The soft launch will not impact anyone’s ability to push directly to develop, but we encourage you to get familiar with the pull-request workflow if you are not already. I have included useful links below.
  2. The soft launch will initially only test PRs numbered 1991 and higher. We can adjust this to help clean up existing pull requests if appropriate in the future.
  3. The soft launch will not limit the ability to merge a PR. Even if tests have not run, or fail, it will not impact the ability to merge the PR at this stage.
  4. If the repository is not clean, the tests will not be nearly as useful because the known failures will need to be identified among the pull request testing failures.
  5. If you don’t want your pull request to be tested by the tool, you can use [WIP] to begin the title of your pull request and the tool will ignore it.
  6. Pull requests from GitHub users not in the Trilinos organization will require a review of the pull request before it enters testing. The PR must be “approved” by at least one person, and changes cannot be requested by anyone else or the PR will not be tested.

Here are some useful links if you are not familiar with the pull request workflow:

https://help.github.com/articles/about-pull-requests/

https://help.github.com/articles/creating-a-pull-request-from-a-fork/

https://help.github.com/articles/about-forks/

https://help.github.com/articles/syncing-a-fork/

Keep in mind as you read this information that we use “develop” as the branch we apply changes directly to for Trilinos, not “master”.

We are currently trying to gauge the need for training. Many developers are very comfortable with the pull request workflow and use it regularly. If you would like to receive some training, please send me (jmwille@sandia.gov) an email and let me know what concepts need clarification. We can then try to figure out if we should provide specific links, group training, 1 on 1 training, etc.

I plan to begin the soft launch tomorrow. If there are questions or concerns, let me know.

Thanks

Jim

@jwillenbring
Member Author

The pull request testing soft launch is live. Pull requests are being tested, but passing tests is not required. The next major objective is to make the output of the tests more easily available to developers.

@bartlettroscoe
Member

@jwillenbring,

Updated documentation on the usage of TriBITS CTest -S driver scripts is now posted at:

Please have a look and please note any typos you might see.

I will provide the skeleton for a script to drive the Jenkins PR branch testing later today. It should be very little code. Then we can discuss what needs to be added to TribitsCTestDriverCore.cmake in order to better support Trilinos PR testing. I don't think it is a lot of work to provide as much as one would want for this use case.

@jwillenbring
Member Author

I sent the following email to the trilinos-developers list today:

All,

I want to update you on the Trilinos Framework Team’s efforts to deploy Pull Request testing for Trilinos. Skip the “Details” (literally) if you are not interested.

Results for a Pull Request build using GCC 4.8.4 can now be found under the “Pull Request” group on this dashboard: https://testing-vm.sandia.gov/cdash/index.php?project=Trilinos.

Details: This dashboard is using a version of CDash that is under testing. However, we cannot use the testing.sandia.gov CDash dashboard because the submissions made by the pull request testing require a newer version of CDash than is available on that server.

You can now designate a Pull Request as a “work in progress”, using the “AT: WIP” label in GitHub, instead of putting [WIP] in the title of the pull request.

Details: The label works the same as the title keyword. “AT” in the name of the label stands for “Autotester”. This prefix will be used for all future labels related to the autotester, which runs the Pull Request testing.

You can now request a “retest” of your pull request using the “AT: RETEST” label in GitHub.

Details: Reasons why you may want to request another set of tests against your pull request include:

a. The pull request testing previously failed for a non-code related issue. For example, a communication failure with GitHub.

b. The pull request testing previously failed due to a defect not directly related to your pull request and you want to test to see if the defect has been fixed with a PR that has been applied since your PR was last tested.

c. The pull request testing previously passed, but other PRs have been applied since. The framework team is still contemplating whether we should deem tests “stale” after some amount of time, or whether we should require tests to be run immediately prior to applying a PR, but for now, if there is any concern that a PR might not pass the testing with the current state of the develop branch, it is easy to request a new set of tests.
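For illustration, the handling of the “AT: WIP” and “AT: RETEST” labels described above might reduce to something like the following (the autotester's real implementation is not shown here):

```python
# Illustrative only; the autotester's real implementation is not shown here.
# "AT: WIP" (or a "[WIP]" title) skips testing; "AT: RETEST" re-queues a PR.
def autotester_action(pr_title, pr_labels, already_tested):
    labels = set(pr_labels)
    if "AT: WIP" in labels or pr_title.startswith("[WIP]"):
        return "skip"
    if "AT: RETEST" in labels:
        return "retest"   # presumably the label would be cleared after re-queuing
    return "skip" if already_tested else "test"
```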

The largest current impediment to deploying pull request testing more fully is that the new testing driver for pull request testing that submits to CDash does not reliably return pass/fail to GitHub in all situations.

Details: Until we can count on the new driver to reliably provide Pass/fail information for the corner cases we have discovered, we need to keep running the old-style jobs that do not report to the dashboard.

We are working on improved documentation for the Pull Request testing capability.

We remain committed to a phased deployment approach.

Details: Future planned improvements include deeming test results to be “stale” after a period of time (and requiring a new set of tests to run prior to merging), enforcing that all status checks pass prior to merge, a modified “automerge” concept that will give developers more control over requesting that things be merged when possible, and strengthened requirements for GitHub reviews that will also allow exceptions to be made when appropriate. Additionally, we are looking seriously at adding a CUDA build and removing the GCC 4.9.3 build (as the errors caught with this build that are not caught by the 4.8.4 build are few).

Jim
On behalf of the Trilinos Framework Team

@mhoemmen
Contributor

mhoemmen commented Mar 8, 2018

Thanks Jim! I would very much like to see a CUDA build, even if it only runs a small subset of tests.

bartlettroscoe added a commit that referenced this issue Mar 20, 2018
With topic branch workflows more common, it is better to use --no-rebase by
default to be safe.

Build/Test Cases Summary
Enabled Packages: Teuchos
Disabled Packages: PyTrilinos,Claps,TriKota
0) MPI_RELEASE_DEBUG_SHARED_PT => passed: passed=137,notpassed=0 (2.71 min)
@bartlettroscoe
Member

@jwillenbring, would we not consider this done? The auto PR testing system has been mandatory for several months now.

@bartlettroscoe
Member

Can we close this? We don't need lingering framework issues contributing to the bloat of 1000+ open Trilinos issues.

@jwillenbring
Member Author

This has been done for some time.
