Convert ERP test to SystemTestsCompareTwo infrastructure? #448

billsacks · 2016-08-24T16:28:21Z

We want to convert the various system tests that compare two runs to use the SystemTestsCompareTwo infrastructure, where possible. I am opening separate issues for each such system test, so that we can check them off as we do this conversion.

The ERP test would be less straightforward to convert to this infrastructure, since currently it makes use of setting CONTINUE_RUN which doesn't work if we're using a clone-based test. On the other hand, the criticality and complexity of the ERP test argue for translating it to this new infrastructure, which will make it more maintainable and less error-prone.

@mvertens suggests changing its behavior so that, rather than doing a CONTINUE_RUN, it instead uses a branch run for the second case. For this to work, we'd need to introduce a new hook into the SystemTestsCompareTwo infrastructure: a _pre_run_two hook that is executed between the case1 run and the case2 run. For the ERP test, this would need to copy (or link) the relevant restart files from the case1 run directory to the case2 run directory.

mfdeakin-sandia · 2016-11-16T01:12:36Z

What are the relevant restart files, or how can I find out their names? ERP.run_phase copies some XML files to the LockedFiles directory and what looks like the CESM/ACME executable, but doesn't copy anything which looks like a restart file. It does call case_setup again, but that doesn't seem related either

jedwards4b · 2016-11-16T01:18:22Z

I suggest that we run the st_archive on case1 and retrieve them from the archive directory for case 2. The st_archiver already knows what files are required.

mfdeakin-sandia · 2016-11-16T01:23:16Z

Alright, I'll look into how I can do that. I assume it's not going to be as simple as the hacky run_cmd_no_fail("case.st_archive")?

jedwards4b · 2016-11-16T01:27:28Z

not as hacky as that just need to set st_archive argument in run_indv(self, suffix="base", st_archive=False) to True. You would do this for the first run and then retreve all of the files from the archive restart directory into case2 run directory.

mfdeakin-sandia · 2016-11-16T01:32:45Z

Hmm. I was hoping to be able to do this all in the _pre_run_hook method @billsacks and @mvertens mentioned. I guess adding more default constructor parameters for SystemTestsCompareTwo will work for this, though.

jedwards4b · 2016-11-16T01:37:42Z

system_tests_compare_two.py isn't calling run_indv with all the available arguments, i think that you just need to add that st_archive argument there to make it available to tests using that method.

billsacks · 2016-11-16T02:50:41Z

I'm a bit hesitant about using the short-term archiver here: The ERP test already does so much that it can be challenging to figure out what went wrong when it fails. Adding the short-term archiver into the mix adds an additional potential failure mode. Or multiple new failure modes if this is done via an additional job... I can't tell if that's what you're suggesting here, but I would at least vote against having this be done via multiple batch jobs if possible.

On the other hand, I see @jedwards4b 's point that that might be the easiest way to be sure you're picking up all the necessary restart files.

The one other idea that comes to mind is refactoring the short-term archiver to extract a clean function that gives you the list of all restart files. It looks like this function would contain some of what's currently in case_st_archive.py: _archive_restarts. Then this function could be called by the short-term archiver. In addition, it could be called directly by the ERP test to get the list of restart files that should be copied (or linked) into the second case.

All that said: I guess I'm okay with sticking with the idea of using the short-term archiver in the ERP test, as long as it doesn't require submitting a separate job.

jedwards4b · 2016-11-16T13:48:39Z

I would override the run_phase method from system_tests_compare_two with one that calls run_indv with the st_archive flag set for the first run and then copies (or links) the files from the archive restart directory into the case2 run directory before starting run2. This is the only local method that this class should need, all the rest can be inherited.

billsacks · 2016-11-16T14:45:28Z

I wouldn't override run_phase: the design is that specific tests should fill in hooks provided by this method, but should not override this method: If you override this method, you'll end up needing to copy & paste into erp.py, which will be a maintenance problem, and could lead to unintentional divergence.

But I see @jedwards4b 's overall point. I would just instead do this by:

Adding new arguments to the SystemTestsCompareTwo constructor saying whether st_archive should be done for case1 and case2. i.e., this would look like:

  def __init__(self,
               case,
               separate_builds,
               run_one_st_archive = False,
               run_two_st_archive = False,
               run_two_suffix = 'test',
               run_one_description = '',
               run_two_description = ''):

and then in the run_phase method:

      self.run_indv(suffix = self._run_one_suffix, st_archive = run_one_st_archive)

and similarly for case2.

Add a _pre_run2 hook. By default (i.e., in SystemTestsCompareTwo) this would just be implemented as 'pass' (similar to _common_setup). The erp test could override this to do the necessary work.

mfdeakin-sandia · 2016-11-17T17:27:59Z

I've set it up to run the short term archiver (the same way Bill prefered and I alluded to) for case one, but I'm unfortunately not even getting to that point in run_indv. Instead, the runs are failing in _component_compare_copy, as it doesn't seem to be finding any history files. What part of the run can I look at to see why they aren't being created?

jgfouca · 2016-11-17T17:49:47Z

hist_utils has routines that find history files, look at what's in the directory and find the history files then see why the filenames are not being recognized.

mfdeakin-sandia · 2016-11-17T18:14:20Z

I've been looking at hist_utils, and it's looking for files with the regex suffix ".h?.nc". There are no such files in the run directory. The only netcdf files created are "run/ERP_Ld3.ne16_ne16.A.sandia-srn-sems_gnu.20161117_110448_8lees5.cpl.r.0001-01-04-00000.nc" and "run/ERP_Ld3.ne16_ne16.A.sandia-srn-sems_gnu.20161117_110448_8lees5.dice.r.0001-01-04-00000.nc"

jgfouca · 2016-11-17T18:19:26Z

OK, then the test is misconfigured and it's not running long enough to make a hist file.

jgfouca · 2016-11-17T18:21:06Z

You'll need to look at settings like HIST_OPTION, HIST_N, STOP_OPTION, STOP_N

mfdeakin-sandia · 2016-11-17T18:21:46Z

That does seem to be the case. I thought I had specified all of the configuration options, but it looks like some were hidden for some reason.

jgfouca · 2016-11-17T18:23:30Z

Look at the config_tests.xml file.

mfdeakin-sandia · 2016-11-17T18:27:08Z

I was :), but the only difference between it and master is that I added <.. SMP_BUILD >0</...>. I also matched (or attempted to) the settings done in the run and build phases of the old ERP test. So why else would this change?

jgfouca · 2016-11-17T18:27:50Z

Not sure, it's possible the test has been broken all along. you may want to confirm that

billsacks · 2016-11-17T18:37:43Z

The ERP test is missing this in config_tests.xml:

    <HIST_OPTION>$STOP_OPTION</HIST_OPTION>
    <HIST_N>$STOP_N</HIST_N>

Take a look at some other tests (NCK, PEM, etc.) for an example.

billsacks · 2016-11-17T18:39:15Z

It looks like these things had been set in erp.py:

                self._case.set_value("REST_N", rest_n)
                self._case.set_value("REST_OPTION", stop_option)
                self._case.set_value("HIST_N", stop_n)
                self._case.set_value("HIST_OPTION", stop_option)
                self._case.set_value("CONTINUE_RUN", False)

REST_OPTION & REST_N are specific to the erp test, so should be kept in the python. HIST_N, HIST_OPTION and CONTINUE_RUN should be set in config_tests.xml, following what we're moving to for other tests.

mfdeakin-sandia · 2016-11-17T18:40:17Z

Ah, I did miss them. That explains my confusion.

mfdeakin-sandia · 2016-11-17T21:06:01Z

Alright, I have the short term archive files from the archive directory. Do I need to copy all of the netcdf files in the tree, or just the ones in the hist directories? And where in the second runs case directory do I put them?

jedwards4b · 2016-11-17T21:19:04Z

Under the rest directory there will be a directory with a date format 0001-01-03-00000 or something like that - you want all the files from that directory to go into the second case rundir.

mfdeakin-sandia · 2016-11-17T21:56:36Z

I think I've gotten everything working. Just to verify, these are the files copied and where I'm copying them to:

/home/mdeakin/acme/scratch/archive/ERP_Ld3.ne16_ne16.A.sandia-srn-sems_gnu.20161117_145035_vb78lz/rest/0001-01-04-00000/rpointer.rof
/home/mdeakin/acme/scratch/ERP_Ld3.ne16_ne16.A.sandia-srn-sems_gnu.20161117_145035_vb78lz.test/run/rpointer.rof

/home/mdeakin/acme/scratch/archive/ERP_Ld3.ne16_ne16.A.sandia-srn-sems_gnu.20161117_145035_vb78lz/rest/0001-01-04-00000/rpointer.ice
/home/mdeakin/acme/scratch/ERP_Ld3.ne16_ne16.A.sandia-srn-sems_gnu.20161117_145035_vb78lz.test/run/rpointer.ice

/home/mdeakin/acme/scratch/archive/ERP_Ld3.ne16_ne16.A.sandia-srn-sems_gnu.20161117_145035_vb78lz/rest/0001-01-04-00000/ERP_Ld3.ne16_ne16.A.sandia-srn-sems_gnu.20161117_145035_vb78lz.dice.r.0001-01-04-00000.nc
/home/mdeakin/acme/scratch/ERP_Ld3.ne16_ne16.A.sandia-srn-sems_gnu.20161117_145035_vb78lz.test/run/ERP_Ld3.ne16_ne16.A.sandia-srn-sems_gnu.20161117_145035_vb78lz.dice.r.0001-01-04-00000.nc

/home/mdeakin/acme/scratch/archive/ERP_Ld3.ne16_ne16.A.sandia-srn-sems_gnu.20161117_145035_vb78lz/rest/0001-01-04-00000/rpointer.ocn
/home/mdeakin/acme/scratch/ERP_Ld3.ne16_ne16.A.sandia-srn-sems_gnu.20161117_145035_vb78lz.test/run/rpointer.ocn

/home/mdeakin/acme/scratch/archive/ERP_Ld3.ne16_ne16.A.sandia-srn-sems_gnu.20161117_145035_vb78lz/rest/0001-01-04-00000/ERP_Ld3.ne16_ne16.A.sandia-srn-sems_gnu.20161117_145035_vb78lz.cpl.r.0001-01-04-00000.nc
/home/mdeakin/acme/scratch/ERP_Ld3.ne16_ne16.A.sandia-srn-sems_gnu.20161117_145035_vb78lz.test/run/ERP_Ld3.ne16_ne16.A.sandia-srn-sems_gnu.20161117_145035_vb78lz.cpl.r.0001-01-04-00000.nc

/home/mdeakin/acme/scratch/archive/ERP_Ld3.ne16_ne16.A.sandia-srn-sems_gnu.20161117_145035_vb78lz/rest/0001-01-04-00000/ERP_Ld3.ne16_ne16.A.sandia-srn-sems_gnu.20161117_145035_vb78lz.docn.rs1.0001-01-04-00000.bin
/home/mdeakin/acme/scratch/ERP_Ld3.ne16_ne16.A.sandia-srn-sems_gnu.20161117_145035_vb78lz.test/run/ERP_Ld3.ne16_ne16.A.sandia-srn-sems_gnu.20161117_145035_vb78lz.docn.rs1.0001-01-04-00000.bin

/home/mdeakin/acme/scratch/archive/ERP_Ld3.ne16_ne16.A.sandia-srn-sems_gnu.20161117_145035_vb78lz/rest/0001-01-04-00000/rpointer.atm
/home/mdeakin/acme/scratch/ERP_Ld3.ne16_ne16.A.sandia-srn-sems_gnu.20161117_145035_vb78lz.test/run/rpointer.atm

/home/mdeakin/acme/scratch/archive/ERP_Ld3.ne16_ne16.A.sandia-srn-sems_gnu.20161117_145035_vb78lz/rest/0001-01-04-00000/ERP_Ld3.ne16_ne16.A.sandia-srn-sems_gnu.20161117_145035_vb78lz.datm.rs1.0001-01-04-00000.bin
/home/mdeakin/acme/scratch/ERP_Ld3.ne16_ne16.A.sandia-srn-sems_gnu.20161117_145035_vb78lz.test/run/ERP_Ld3.ne16_ne16.A.sandia-srn-sems_gnu.20161117_145035_vb78lz.datm.rs1.0001-01-04-00000.bin

/home/mdeakin/acme/scratch/archive/ERP_Ld3.ne16_ne16.A.sandia-srn-sems_gnu.20161117_145035_vb78lz/rest/0001-01-04-00000/rpointer.drv
/home/mdeakin/acme/scratch/ERP_Ld3.ne16_ne16.A.sandia-srn-sems_gnu.20161117_145035_vb78lz.test/run/rpointer.drv

/home/mdeakin/acme/scratch/archive/ERP_Ld3.ne16_ne16.A.sandia-srn-sems_gnu.20161117_145035_vb78lz/rest/0001-01-04-00000/ERP_Ld3.ne16_ne16.A.sandia-srn-sems_gnu.20161117_145035_vb78lz.dice.rs1.0001-01-04-00000.bin
/home/mdeakin/acme/scratch/ERP_Ld3.ne16_ne16.A.sandia-srn-sems_gnu.20161117_145035_vb78lz.test/run/ERP_Ld3.ne16_ne16.A.sandia-srn-sems_gnu.20161117_145035_vb78lz.dice.rs1.0001-01-04-00000.bin

/home/mdeakin/acme/scratch/archive/ERP_Ld3.ne16_ne16.A.sandia-srn-sems_gnu.20161117_145035_vb78lz/rest/0001-01-04-00000/ERP_Ld3.ne16_ne16.A.sandia-srn-sems_gnu.20161117_145035_vb78lz.drof.rs1.0001-01-04-00000.bin
/home/mdeakin/acme/scratch/ERP_Ld3.ne16_ne16.A.sandia-srn-sems_gnu.20161117_145035_vb78lz.test/run/ERP_Ld3.ne16_ne16.A.sandia-srn-sems_gnu.20161117_145035_vb78lz.drof.rs1.0001-01-04-00000.bin

jedwards4b · 2016-11-17T22:01:39Z

Keep in mind that the exact set of files depend on the compset and the exact name of the rest directory depends on the time-date stamp. Having said that you should only expect a single subdirectory under rest and if you copy all of the files from that subdirectory to the run directory of the second case you should have everything you need.

jedwards4b · 2016-12-08T09:11:28Z

PR #824 determined that this was an ill-concieved issue and we should not refactor this test.

jgfouca assigned mfdeakin-sandia Oct 6, 2016

rljacob added the ready label Oct 20, 2016

mfdeakin-sandia mentioned this issue Nov 17, 2016

mfdeakin-sandia/systemtestscomparetwo/erp #824

Closed

ghost added in progress and removed ready labels Nov 17, 2016

jedwards4b closed this as completed Dec 8, 2016

ghost removed the in progress label Dec 8, 2016

This was referenced Jun 6, 2017

Need better tests of system tests #1640

Closed

Would this be a way we could convert ERS and ERP to SystemTestsCompareTwo? #1647

Closed

billsacks mentioned this issue Aug 21, 2017

Rewrite ERP and ERS to use SystemTestsCompareTwo #1830

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Convert ERP test to SystemTestsCompareTwo infrastructure? #448

Convert ERP test to SystemTestsCompareTwo infrastructure? #448

billsacks commented Aug 24, 2016

mfdeakin-sandia commented Nov 16, 2016 •

edited

Loading

jedwards4b commented Nov 16, 2016

mfdeakin-sandia commented Nov 16, 2016

jedwards4b commented Nov 16, 2016

mfdeakin-sandia commented Nov 16, 2016

jedwards4b commented Nov 16, 2016

billsacks commented Nov 16, 2016

jedwards4b commented Nov 16, 2016

billsacks commented Nov 16, 2016

mfdeakin-sandia commented Nov 17, 2016

jgfouca commented Nov 17, 2016

mfdeakin-sandia commented Nov 17, 2016

jgfouca commented Nov 17, 2016

jgfouca commented Nov 17, 2016

mfdeakin-sandia commented Nov 17, 2016

jgfouca commented Nov 17, 2016

mfdeakin-sandia commented Nov 17, 2016 •

edited

Loading

jgfouca commented Nov 17, 2016

billsacks commented Nov 17, 2016

billsacks commented Nov 17, 2016

mfdeakin-sandia commented Nov 17, 2016

mfdeakin-sandia commented Nov 17, 2016

jedwards4b commented Nov 17, 2016

mfdeakin-sandia commented Nov 17, 2016

jedwards4b commented Nov 17, 2016 •

edited

Loading

jedwards4b commented Dec 8, 2016

Convert ERP test to SystemTestsCompareTwo infrastructure? #448

Convert ERP test to SystemTestsCompareTwo infrastructure? #448

Comments

billsacks commented Aug 24, 2016

mfdeakin-sandia commented Nov 16, 2016 • edited Loading

jedwards4b commented Nov 16, 2016

mfdeakin-sandia commented Nov 16, 2016

jedwards4b commented Nov 16, 2016

mfdeakin-sandia commented Nov 16, 2016

jedwards4b commented Nov 16, 2016

billsacks commented Nov 16, 2016

jedwards4b commented Nov 16, 2016

billsacks commented Nov 16, 2016

mfdeakin-sandia commented Nov 17, 2016

jgfouca commented Nov 17, 2016

mfdeakin-sandia commented Nov 17, 2016

jgfouca commented Nov 17, 2016

jgfouca commented Nov 17, 2016

mfdeakin-sandia commented Nov 17, 2016

jgfouca commented Nov 17, 2016

mfdeakin-sandia commented Nov 17, 2016 • edited Loading

jgfouca commented Nov 17, 2016

billsacks commented Nov 17, 2016

billsacks commented Nov 17, 2016

mfdeakin-sandia commented Nov 17, 2016

mfdeakin-sandia commented Nov 17, 2016

jedwards4b commented Nov 17, 2016

mfdeakin-sandia commented Nov 17, 2016

jedwards4b commented Nov 17, 2016 • edited Loading

jedwards4b commented Dec 8, 2016

mfdeakin-sandia commented Nov 16, 2016 •

edited

Loading

mfdeakin-sandia commented Nov 17, 2016 •

edited

Loading

jedwards4b commented Nov 17, 2016 •

edited

Loading