-
Notifications
You must be signed in to change notification settings - Fork 382
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
modified vertical range for KEvertex calc to avoid NaNs #4945
modified vertical range for KEvertex calc to avoid NaNs #4945
Conversation
@alicebarthel, I will try to run some tests today or tomorrow. Thanks for submitting this! |
@alicebarthel -- just a note for next time that the branch naming convention is to use all lower-case letters... |
TestingI made a test merge of this branch with For E3SM testing, I ran
I also ran the compass ocean |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good fix to have. In general, I think we don't want to be computing with invalid values.
@alicebarthel, thanks for putting this in. @jonbob, do you think my testing is sufficient or should we wait for @mark-petersen to return and do a review? |
@xylar - I think your testing is complete and there's no need to wait for @mark-petersen to review. I'll run E3SM tests before I merge and between the two we should have it covered. Thanks again for your help |
Sounds great! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @alicebarthel for making this PR and @xylar for testing. During my original debugging I ran numerous stand-alone nightly suites with comparisons, with rk4 and split explicit, with config_include_KE_vertex = .true.
. Specifically, I just ran the nightly suite with split explicit and this is bfb with master using gnu optimized stand-alone on badger.
My tests are failing for this PR. SMS_D_Ld3.T62_oQU120.CMPASO-IAF.chrysalis_intel complains about the following:
It may be necessary for the lower do-loop extent to be max(1,minLevelEdgeBot(iEdge))? |
It seems most likely that we need a check for valid |
It seems like this code block shows how to do this properly: E3SM/components/mpas-ocean/src/shared/mpas_ocn_vel_hmix_del4.F Lines 204 to 217 in 973a8f7
We need to use minLevelVertexBot and maxLevelVertexTop rather than minLevelEdgeBot and maxLevelEdgeBot .
@alicebarthel, would you like to make the corresponding changes? Then, we could talk through how to rerun the test that @jonbob ran that failed for him. |
Thanks @xylar for pointing out a relevant code snippet. I'll look into it and make the changes later today. |
I was able to reproduce @jonbob's error with the old code. |
@alicebarthel, that's great! I'll check with @jonbob in a bit but I think that will do for now. |
@alicebarthel - the tests no longer result in error messages, but they do indicate non-BFB behavior for both cases I ran so far:
The TestStatus for both shows:
|
Ok, I've thought about his more and now understand your comments above. @cbegeman I like your solution and I think it makes physical sense, but split_explicit gives non-zero values for KEvertex on vertices connecting land and ocean, so I think we would get in trouble changing them to zero (hence Jon's non BFB). |
status: waiting for a bug to be fixed. |
status: @mark-petersen is working to fix this. |
status: still waiting on @mark-petersen |
@alicebarthel, based on the comments here, it is better to set up RK4 the same way as split explicit, i.e. to make sure that normalVelocity is always zero on the edges. Then this error does not occur in the first place. I was able to reproduce this failure and then solve it with these changes: You can grab commit 3757046 from my fork, branch fix-rk4-by-zeroing-normalvelocity. @alicebarthel could you test this in your set-up? Thanks. |
@mark-petersen, this is a nice solution! @alicebarthel and I chatted about this branch today and came up with an alternative approach to the same block of code. We were thinking of using |
Yes, either would work fine. @alicebarthel you can push either of those fixes to this branch, or ask for help if needed. An advantage of using |
@mark-petersen As I mentioned this morning, I was also working on this. The solution @xylar mentioned is implemented on this new fix branch . |
@alicebarthel, this is great! I tested your new fix branch in stand-alone with the nightly suite on both intel debug and gnu debug with default flags, and then gnu debug with forced I then compared gnu optimized nightly suite between your fix branch and the branch point c5f8b37, and it compares bfb except that on RK4 tests, edges where both cell neighbors are land now have You can force push your branch to this PR, and I will approve. I'm also happy to post more details of my testing if you like. |
5a56c86
to
4911116
Compare
@mark-petersen Thank you so much for testing the BFB for split_explicit. I force pushed as requested. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re-approving based on @alicebarthel and @mark-petersen's testing and a visual inspecting.
@alicebarthel, a rebase may be needed to resolve a conflict. Do you want to give that a try? Or leave it for @jonbob to sort out during merge? (@jonbob, what's your preference?) |
@xylar and @alicebarthel - I'm happy to sort out that conflict when I merge this PR, if that's easier |
0b31efe
to
a9a39ad
Compare
I rebased it to the current master, manually resolved the conflict, and pushed it. @jonbob let me know if you need me to do anything else, and thanks for doing the merge! |
Do we need @philipwjones to re-review? Or is this good to go? |
@jonbob, I think we can go ahead. RK4 isn't really a target for the performance team so I think @philipwjones will be happy as long as we don't break anything (and we're not likely to do that in RK4 itself). |
I think @philipwjones's only comment was to say that the loop bounds changes were fine, but these have been removed in any case (see #4945 (comment)). |
@xylar - I did resolve that conversation hoping it would add Phil's review approval. Anyway, I'll test this and get it merged as soon as possible |
I don't think that's how GitHub works. I think commenting on the code automatically makes you a reviewer whether you intended that. So it's kind of important for people to either request changes or approve when they comment. Otherwise, it's hard to know if they still need to approve. But requesting changes sometimes feels rude because there's a big red flag. So I can see why people don't do it... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rebase was successful. Checked again by visual inspection and test compile. Still passes nightly suite with gnu debug.
…4945) Modified vertical range for KEvertex calc to avoid NaNs Modifies the vertical range over which KEvertex is calculated. The previous version had NaNs in normalVelocity and activeTracers which led to an immediate State Validation Fail when RK4 and config_include_KE_vertex were used together. Fixes #4933 [BFB]
test merge shows BFB results inside E3SM for:
merged to next |
merged to master |
Following @mark-petersen's suggestion, I modified the vertical range over which KEvertex is calculated.
The previous version had NaNs in normalVelocity and activeTracers which led to an immediate State Validation Fail when RK4 and config_include_KE_vertex were used together.
I successfully ran a 1-day test in E3SM (C-case).
@xylar @mark-petersen let me know if you'd like me to run other tests.
Fixes #4933
[BFB]