Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Framework: Allow commits to non-code directories w/o autotesting #2594

Closed
csiefer2 opened this issue Apr 19, 2018 · 29 comments
Closed

Framework: Allow commits to non-code directories w/o autotesting #2594

csiefer2 opened this issue Apr 19, 2018 · 29 comments
Labels
Framework tasks Framework tasks (used internally by Framework team)

Comments

@csiefer2
Copy link
Member

I'm thinking specifically of cmake/ctest/drivers, which contains stuff for driving nightly tests and not code that the autotester can actually execute.

@csiefer2 csiefer2 added the Framework tasks Framework tasks (used internally by Framework team) label Apr 19, 2018
@jhux2
Copy link
Member

jhux2 commented Apr 19, 2018

@trilinos/framework

@bartlettroscoe
Copy link
Member

This should be done at the same time as moving to the TriBITS code that automatically knows about mapping of directories to packages.

@csiefer2
Copy link
Member Author

@bartlettroscoe And the timeframe for that is?

@bartlettroscoe
Copy link
Member

And the timeframe for that is?

This is a question for the @trilinos/framework team.

@csiefer2
Copy link
Member Author

@trilinos/framework This is starting to get really annoying. We've been looking at 8+ hour delays on the autotester so far this week. Can you bump up the priority of the TriBITS upgrades (or throw more resources at the autotester)?

@jmgate
Copy link
Contributor

jmgate commented Apr 24, 2018

Roughly how many PRs are we seeing per day?

@csiefer2
Copy link
Member Author

Yesterday we had 7 new ones and the autotester is still working through the backlog from yesterday...

@csiefer2
Copy link
Member Author

... and 3 old ones that left WIP

@jmgate
Copy link
Contributor

jmgate commented Apr 24, 2018

Is "we" all of Trilinos, or Tpetra, MueLu?

@csiefer2
Copy link
Member Author

Trilinos as a whole. It would be simple to scrape the PR page to do a graph of PR's merged per day. That might be a good thing for the framework team to track (if you do need more resources, that'll be how you justify it).

@bartlettroscoe
Copy link
Member

As a data-point, a simple re-enable of some Anasazi tests for some ATDM Trilinos builds in PR #2621 which was created at 3:59 PM MT did not have the testing completed until 11:07 PM MT. That is not quite 8 hours but it is close to it.

I worry about the added time that an Intel 17 build will add to the auto PR tester. The Intel 17.0.1 build and tests takes almost 4 hours to complete as mentioned at #2463 (comment) and that is on 16 cores of a very fast machine.

@csiefer2
Copy link
Member Author

I'm still waiting for tests of something I submitted yesterday at COB to start. Eight hours was being conservative :P

@bartlettroscoe
Copy link
Member

I think a better statistic is the number of PR builds started in the last 24 hours shown at:

That shows 24 PR builds (one GCC 4.8.4 and one GCC 4.9.3) so that means 12 iterations of the auto PR tester started.

The Intel 17.0.1 build will be on Jenkins machines and will therefore complete with these existing builds.

The CUDA build will run on 'white' and will not compete with these builds so that should not be a problem.

@bartlettroscoe
Copy link
Member

I'm still waiting for tests of something I submitted yesterday at COB to start. Eight hours was being conservative :P

@csiefer2 is it not starting because of some reason other than a backup of the auto PR tester? See:

@csiefer2
Copy link
Member Author

@bartlettroscoe Nope. Just backlog.

@allevin
Copy link

allevin commented Apr 24, 2018

@csiefer2 I investigated why your job did not start. It seems a number of Jenkins jobs that run the autotester are disabled. The @trilinos/framework team is investigating.

@csiefer2
Copy link
Member Author

Jenkins messed stuff up? How am I not surprised...

@csiefer2
Copy link
Member Author

@trilinos/framework Do you guys have stats the mean/std deviation of the number of testable PR's per day?

How many sigmas before we can expect the autotester to be overwhelmed (if Jenkins is working)?

@prwolfe
Copy link
Contributor

prwolfe commented Apr 24, 2018

@csiefer2 Honestly, we have yet to hit the expected limit. Jim and I did some work profiling recently and it looks like we have some additional capacity (which is why I am working on the intel Pr build.)

As for what would be the limit, it depends on the rate and timing, so I can't really give you an answer other than that we will be monitoring this.

@prwolfe
Copy link
Contributor

prwolfe commented Apr 24, 2018

While it seems easy, determining if a given arbitrary file should trigger a rebuild is not trivial. In fact I found a few shell scripts designed to modify the checkout and in one case the code. I will talk with Aaron and Jim about the specific directory mentioned, but there is no policy or technical control to keep the scripts there from changing in nature.

I would favor eliminating some code areas/directories by policy, but not attempting to understand the implications of the given commit as that can be a significant undertaking. The directories chosen should have some marker (README or other note) indicating to the development team that changes at this level will not be PR tested.

At a glance I see a few things that are candidates to not trigger testing

sampleScripts
docs

Stuff I'm less sure about

commonTools
cmake/* - several of these do impact stuff, but you claim at least one does not.
demos - I'm pretty sure we don't build this even if we should.

@csiefer2
Copy link
Member Author

cmake/ctest/drivers only impact testing...

@csiefer2
Copy link
Member Author

@prwolfe Could we allow individual packages to specify their own "non-code" directories? This has to be agglomerated somewhere, of course...

@prwolfe
Copy link
Contributor

prwolfe commented Apr 24, 2018

Not sure - let me do some digging.

@jwillenbring
Copy link
Member

@csiefer2 's build sat queued too long due to a mistake on my part. I will avoid similar mistakes in the future.

As @prwolfe noted, our current testing resources are sufficient for processing the pull requests we need to process. We took steps to decreases the chances of pull requests having to wait behind other pull requests, and will make an even more aggressive change by the end of tomorrow. This will create a small risk of over running our resources, but that risk is quite small and a suitable message will be added as a comment to the PR if we would run out of resources.

We will see how much this improves the delays we are seeing and then prioritize improvements in logic to decide how much of Trilinos needs to build accordingly.

@jwillenbring
Copy link
Member

This should be done at the same time as moving to the TriBITS code that automatically knows about mapping of directories to packages.

@bartlettroscoe how would the TriBITS capability work for builds like the CUDA one where we want to consider enabling only about 1/2 of Trilinos packages?

@bartlettroscoe
Copy link
Member

@bartlettroscoe how would the TriBITS capability work for builds like the CUDA one where we want to consider enabling only about 1/2 of Trilinos packages?

We would just put in forced disables of non ATDM packages. That is basically adding a new cache var
called something like ATDM_FORCE_DISABLE_DISABLED_PACKAGES and a one-line change right at:

What that would do is if ROL was changed in the PR, for example, and you passed in -DTrilinos_ENABLE_ROL=ON, then it would turn off ROL inside of the cmake configure of Trilinos and you would get zero tests defined. But the auto tester can already handle zero tests? It think it is that easy.

Let's talk at our regular meeting time on Monday about getting the current ATDM CUDA build on 'white' up as a PR build with just the 25 package that are currently being tested. That has been kept clean and it caught a defect in SEACAS that all of the SIERRA testing that was done was missed. If this CUDA build a (with tests) would have been in auto PR testing, then that SEACAS update would have never gotten into the 'develop' branch and we would have saved people bunch of time due to this update (see #2650).

@jwillenbring
Copy link
Member

This has been working for some time. I am closing this ticket.

@jhux2
Copy link
Member

jhux2 commented Nov 2, 2018

Could we allow individual packages to specify their own "non-code" directories? This has to be agglomerated somewhere, of course...

@trilinos/framework Was this considered? Just curious.

@bartlettroscoe
Copy link
Member

Could we allow individual packages to specify their own "non-code" directories? This has to be agglomerated somewhere, of course...

@trilinos/framework Was this considered? Just curious.

@jhux2, I discussed this with @william76 a few months ago as part of #3133. Basically we could extend TriBITS to support this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Framework tasks Framework tasks (used internally by Framework team)
Projects
None yet
Development

No branches or pull requests

7 participants