
MVK test failing #5145

Open
rljacob opened this issue Aug 18, 2022 · 14 comments
Labels: non-BFB (PR makes roundoff changes to answers), PotentialBug, Testing (anything related to unit/system tests)

Comments

@rljacob
Member

rljacob commented Aug 18, 2022

The MVK non-bfb test has been failing for a while on Chrysalis.

https://my.cdash.org/test/60574720

Seems to be timing out?

rljacob added the PotentialBug and Testing labels Aug 18, 2022
@rljacob
Member Author

rljacob commented Aug 18, 2022

Possibly the same problem as #5122, so it should have the same solution, #5125.

@mkstratos
Contributor

mkstratos commented Aug 18, 2022

@rljacob In my testing, it's almost certainly the same issue. I've been able to successfully run the MVK test with the #5125 #5123 fix.

@rljacob
Member Author

rljacob commented Mar 29, 2023

Update: at some point MVK started passing sometimes on Chrysalis. I don't think the #5125 (edit) fix was ever applied, so I'm not sure why. But it still fails, I think because it's asking for too many nodes and not running.

[Screenshot, 2023-03-29 12:08:27 PM]

sarats added the non-BFB label Mar 29, 2023
@jonbob
Contributor

jonbob commented Mar 29, 2023

@rljacob -- we did merge #5123 last December, so it's not that

@sarats
Member

sarats commented Mar 29, 2023

Looking at latest https://my.cdash.org/test/76691852
Is this relevant?

Model elm no file specified for finidat

Perhaps not. Scrolling back, I see a test failure after a completed run (the one above didn't run), which had a similar warning(?) posted.
https://my.cdash.org/test/76461137

@jonbob
Contributor

jonbob commented Mar 29, 2023

But it did take some effort to get the results blessed, because #5123 changed output file names and blessing it required some special commands from @jgfouca:
SES-2269 09/Feb/23
So that's when it started to pass, at least sometimes

@jonbob
Contributor

jonbob commented Mar 29, 2023

And at least some of the FAILs look like:

2023-03-01 05:37:33: BASELINE FAIL for test 'JNextAtm_nbfb20230301_003435'.
    Test status: fail; Variables analyzed: 121; Rejecting: 24; Critical value: 13; Ensembles: statistically different
    EVV results can be viewed at:
        https://web.lcrc.anl.gov/public/e3sm/e3smtest/evv/MVK_PS.ne4_oQU240.F2010.chrysalis_intel.C.JNextAtm_nbfb20230301_003435/index.html

@rljacob
Member Author

rljacob commented Mar 29, 2023

The current run of that test is still in the queue:
306069 compute test.MVK e3smtest PD 0:00 90 (Priority)

The number of nodes needs to be reduced. Chrysalis is too busy to provide 90 nodes every night.

@rljacob
Member Author

rljacob commented Mar 29, 2023

@jonbob I meant #5125 has not been applied to MVK. I edited the comment.

@sarats
Member

sarats commented Mar 29, 2023

I think test failures are okay; they mean that something on next changed. I guess the lack of compute nodes is the reason for "not running," then.

@salilmahajan

Adding @mkstratos to this thread. These tests were tuned for v1, and we are currently working to check their sensitivity with v2, to ensure that when they fail (and it's not a timeout), the failure is more likely due to a change in climate statistics.

@mkstratos
Contributor

I think there may also be failures due to memleak, like this one which passes statistically: https://my.cdash.org/test/76012860

@sarats
Member

sarats commented Mar 29, 2023

I thought a memleak need not result in a test failure.

@rljacob
Member Author

rljacob commented Mar 29, 2023

Right, memleak is not a fail. https://my.cdash.org/test/76012860 should be reported as PASS, but I think the string "OLD FAIL" on the last line is being interpreted as a test failure. That's a bug.
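The misreport described above is what happens when a reporter marks a test failed if "FAIL" appears anywhere in its output. Below is a hypothetical sketch that contrasts that substring check with one that reads only the status token of each result line and treats memleak as non-fatal. The `STATUS TESTNAME PHASE` line format (loosely modeled on CIME TestStatus files), the phase names, and the sample lines are assumptions for illustration. They are not copied from the CDash log, and this is not the actual reporting code.

```python
# Hypothetical: phases whose FAIL should not fail the overall test.
NON_FATAL_PHASES = {"MEMLEAK"}
KNOWN_STATUSES = {"PASS", "FAIL", "PEND"}


def naive_status(lines):
    # Buggy approach: any occurrence of "FAIL", even inside a note, fails the test.
    return "FAIL" if any("FAIL" in line for line in lines) else "PASS"


def overall_status(lines):
    for line in lines:
        tokens = line.split()
        # Only lines of the assumed form "STATUS TESTNAME PHASE" count as results;
        # free-form notes (e.g. one ending in "OLD FAIL") are ignored.
        if len(tokens) < 3 or tokens[0] not in KNOWN_STATUSES:
            continue
        status, phase = tokens[0], tokens[2]
        if status == "FAIL" and phase not in NON_FATAL_PHASES:
            return "FAIL"
    return "PASS"


# Hypothetical sample output for a run that passes statistically but reports a memleak.
sample = [
    "PASS MVK_PS.ne4_oQU240.F2010.chrysalis_intel RUN",
    "PASS MVK_PS.ne4_oQU240.F2010.chrysalis_intel BASELINE",
    "FAIL MVK_PS.ne4_oQU240.F2010.chrysalis_intel MEMLEAK",
    "memleak comparison note: OLD FAIL",
]
print(naive_status(sample), overall_status(sample))
```

The token-based check still reports a genuine failure, such as a FAIL on the RUN phase, while ignoring memleak results and free-form text.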

5 participants