fix float invalidation error of eca_bgc #3303

Closed
wants to merge 1 commit into from

jinyuntang
Contributor

@jinyuntang jinyuntang commented Nov 7, 2019

Properly initializing col_plant_pdemand_vr fixes the
NaN-triggered floating-invalid error that occurs when running
ECACNP with DEBUG=TRUE.

[non-BFB]
Fixes #3296

bishtgautam added a commit that referenced this pull request Nov 8, 2019
Properly initializing col_plant_pdemand_vr fixes the
NaN-triggered floating-invalid error that occurs when running
ECACNP with DEBUG=TRUE.

[BFB]
Fixes #3296
@dqwu
Contributor

dqwu commented Nov 10, 2019

@bishtgautam
This PR has caused two tests to fail on melvin:

ERS.f19_g16.I1850GSWCNPECACNTBC.melvin_gnu.clm-eca_f19_g16_I1850GSWCNPECACNTBC
ERS.f19_g16.I20TRGSWCNPECACNTBC.melvin_gnu.clm-eca_f19_g16_I20TRGSWCNPECACNTBC

See https://my.cdash.org/viewTest.php?onlydelta&buildid=1726125

Looks like a baseline mismatch:

  comparing model 'cpl'
    /ascldap/users/jenkins/acme/scratch/ERS.f19_g16.I1850GSWCNPECACNTBC.melvin_gnu.clm-eca_f19_g16_I1850GSWCNPECACNTBC.C.JNextDeveloper20191108_221231/run/ERS.f19_g16.I1850GSWCNPECACNTBC.melvin_gnu.clm-eca_f19_g16_I1850GSWCNPECACNTBC.C.JNextDeveloper20191108_221231.cpl.hi.0001-01-12-00000.nc.base did NOT match /ascldap/users/jenkins/acme/scratch/ERS.f19_g16.I1850GSWCNPECACNTBC.melvin_gnu.clm-eca_f19_g16_I1850GSWCNPECACNTBC.C.JNextDeveloper20191108_221231/run/ERS.f19_g16.I1850GSWCNPECACNTBC.melvin_gnu.clm-eca_f19_g16_I1850GSWCNPECACNTBC.C.JNextDeveloper20191108_221231.cpl.hi.0001-01-12-00000.nc.rest
    cat /ascldap/users/jenkins/acme/scratch/ERS.f19_g16.I1850GSWCNPECACNTBC.melvin_gnu.clm-eca_f19_g16_I1850GSWCNPECACNTBC.C.JNextDeveloper20191108_221231/run/ERS.f19_g16.I1850GSWCNPECACNTBC.melvin_gnu.clm-eca_f19_g16_I1850GSWCNPECACNTBC.C.JNextDeveloper20191108_221231.cpl.hi.0001-01-12-00000.nc.base.cprnc.out
FAIL
...
  comparing model 'cpl'
    /ascldap/users/jenkins/acme/scratch/ERS.f19_g16.I1850GSWCNPECACNTBC.melvin_gnu.clm-eca_f19_g16_I1850GSWCNPECACNTBC.C.JNextDeveloper20191108_221231/run/ERS.f19_g16.I1850GSWCNPECACNTBC.melvin_gnu.clm-eca_f19_g16_I1850GSWCNPECACNTBC.C.JNextDeveloper20191108_221231.cpl.hi.0001-01-12-00000.nc did NOT match /sems-data-store/ACME/baselines/gnu/master/ERS.f19_g16.I1850GSWCNPECACNTBC.melvin_gnu.clm-eca_f19_g16_I1850GSWCNPECACNTBC/cpl.hi.0001-01-12-00000.nc
    cat /ascldap/users/jenkins/acme/scratch/ERS.f19_g16.I1850GSWCNPECACNTBC.melvin_gnu.clm-eca_f19_g16_I1850GSWCNPECACNTBC.C.JNextDeveloper20191108_221231/run/ERS.f19_g16.I1850GSWCNPECACNTBC.melvin_gnu.clm-eca_f19_g16_I1850GSWCNPECACNTBC.C.JNextDeveloper20191108_221231.cpl.hi.0001-01-12-00000.nc.cprnc.out
FAIL

@jinyuntang
Contributor Author

@dqwu and @bishtgautam, I suspect that, depending on the platform, the baselines may need to be regenerated. In the default code (i.e., master), the array col_plant_pdemand_vr is not initialized properly, so its value at the first time step is undefined and varies with the compiler and platform.
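
To illustrate the failure mode described above, here is a minimal Python sketch (not the E3SM Fortran code). It assumes that a debug build fills uninitialized reals with NaN and traps on invalid operations, which is a common debug configuration but is not confirmed in this thread. The function `accumulate_demand`, the constant `NLEVDECOMP`, and the uptake values are hypothetical and exist only for illustration.

```python
import math

NLEVDECOMP = 5  # hypothetical number of vertical soil layers


def accumulate_demand(pdemand_vr, layer_uptake):
    """Add per-layer uptake to a demand profile.

    Raises FloatingPointError when a NaN appears, mimicking a debug
    build that traps on invalid floating-point operations (assumption).
    """
    result = []
    for current, uptake in zip(pdemand_vr, layer_uptake):
        total = current + uptake
        if math.isnan(total):
            raise FloatingPointError("NaN encountered in demand profile")
        result.append(total)
    return result


uptake = [0.1, 0.2, 0.3, 0.2, 0.1]

# Stand-in for an array that was never assigned before first use.
uninitialized = [float("nan")] * NLEVDECOMP
# Stand-in for the fix: the array is zeroed before accumulation.
initialized = [0.0] * NLEVDECOMP

fixed_result = accumulate_demand(initialized, uptake)

try:
    accumulate_demand(uninitialized, uptake)
    trapped = False
except FloatingPointError:
    trapped = True
```

With the zero-initialized array the accumulation completes, while the NaN-filled array raises on the first layer. In a non-debug build the same uninitialized memory would hold arbitrary values instead of NaN, which is consistent with answers differing across compilers and platforms.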

@bishtgautam
Contributor

Ok, I'm going to change this PR to be non-BFB.

@bishtgautam added the "non-BFB" label (PR makes roundoff changes to answers) and removed the "BFB" label (PR leaves answers BFB) on Nov 10, 2019
bishtgautam added a commit that referenced this pull request Nov 12, 2019
@bishtgautam
Contributor

@jinyuntang This PR broke the following ECA ERS tests:

  • ERS.f19_g16.I1850GSWCNPECACNTBC.melvin_gnu.clm-eca_f19_g16_I1850GSWCNPECACNTBC
  • ERS.f19_g16.I20TRGSWCNPECACNTBC.melvin_gnu.clm-eca_f19_g16_I20TRGSWCNPECACNTBC

See https://my.cdash.org/viewTest.php?onlyfailed&buildid=1727482. I have reverted this PR.

@jinyuntang
Contributor Author

@bishtgautam and @rljacob, I saw the following line in the reports for both tests:
Error: Computation time increase > 10 pct from baseline
How do we usually handle this? I plan to leave it alone for now.
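
For reference, a check of the kind quoted in that error line can be sketched as below. This is only an illustration of a percentage-increase threshold; the function name `exceeds_time_tolerance`, its parameters, and the sample timings are hypothetical, and the test framework's actual implementation may differ.

```python
def exceeds_time_tolerance(baseline_seconds, current_seconds, tolerance_pct=10.0):
    """Return (flagged, increase_pct) for a run time compared to its baseline.

    flagged is True when the run is slower than the baseline by more
    than tolerance_pct percent.
    """
    if baseline_seconds <= 0:
        raise ValueError("baseline_seconds must be positive")
    increase_pct = 100.0 * (current_seconds - baseline_seconds) / baseline_seconds
    return increase_pct > tolerance_pct, increase_pct


# Hypothetical timings: a 100 s baseline against a slower and a similar run.
slow_flagged, slow_pct = exceeds_time_tolerance(100.0, 115.0)
ok_flagged, ok_pct = exceeds_time_tolerance(100.0, 105.0)
```

A single flagged run can reflect machine noise as well as a real slowdown, so comparing individual timers is a reasonable next step before drawing conclusions.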

@rljacob
Member

rljacob commented Nov 13, 2019

It's worth looking at the timers more closely.

@rljacob
Member

rljacob commented Jan 27, 2020

@jinyuntang did you look at the timers?

@jinyuntang
Contributor Author

@rljacob, I did not get a chance to do this. But I have asked @qzhu-lbl to bundle it with his v2.ECA branch, because it also affects that branch.

@rljacob
Member

rljacob commented Jan 27, 2020

So this PR should be closed because the work will be done in a different, future PR?

@jinyuntang
Contributor Author

@rljacob, I think so.

@jinyuntang
Contributor Author

@rljacob, but please wait till @qzhu-lbl makes this change in the v2.ECA code.

@rljacob
Member

rljacob commented Jan 30, 2020

Closing this won't affect the other branch. We can leave the branch intact.

@rljacob rljacob closed this Jan 30, 2020
@jinyuntang jinyuntang deleted the jinyuntang/lnd/fix-NaN branch October 26, 2020 16:40
Labels: bug fix PR, Land, non-BFB (PR makes roundoff changes to answers)
Development

Successfully merging this pull request may close these issues.

floating invalid error when running eca-bgc with debug=true
4 participants