Problem with running Hydro and fire together #654

Closed
JunyanDing opened this issue May 17, 2020 · 7 comments · Fixed by #665
Comments

@JunyanDing
Contributor

I am testing Hydro with fire, but got the same weird outcome: the model stopped producing output after 18 months yet still appears to be running, and there are no error messages in the log files.
When I ran Hydro and fire separately, both runs completed successfully.
I have attached the run script, parameter file, log files, and output.
I am using the master branch; the CLM hash is fe16302 and the FATES hash is 8328198.

testfire.zip

@rgknox
Contributor

rgknox commented May 19, 2020

Thanks for bringing this up; this is indeed a weird error. Can you confirm that with the same run, but with fire turned off, you do not get the problem?

@JunyanDing
Contributor Author

JunyanDing commented May 19, 2020 via email

@xuchongang
Contributor

@rgknox, @JunyanDing, I want to confirm that I saw the exact same problem on my side. The model hangs on a specific day when fire happens. The model runs fine with fire only (no hydro), or with hydro only (no fire). I am looking into it more to see if I can find the bug with TotalView. Will keep you updated.

@rgknox
Contributor

rgknox commented Jun 5, 2020

I'm wondering if this has to do with the burning of live crown leaves and the sudden drop in volume that occurs. It's also possible that we aren't accounting for the water in the lost tissues correctly and that it's triggering a mass balance check error.

@JunyanDing the logs show no errors. Could you try reproducing this run with DEBUG=TRUE? Perhaps that will generate meaningful log messages.
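To make the kind of check being referred to concrete, here is a minimal, self-contained Fortran sketch of a water mass balance assertion. It is purely illustrative: the variable names, numbers, and tolerance are assumptions for the example, not FATES's actual implementation. The idea is that if water held in burned tissue leaves plant storage but is never credited to an outgoing flux, the residual becomes nonzero and a check like this aborts the run.

```fortran
! Illustrative sketch only, not FATES code: a water mass balance check
! of the kind mentioned above.
program water_balance_sketch
  implicit none
  integer, parameter :: r8 = selected_real_kind(12)
  real(r8), parameter :: tol = 1.0e-9_r8
  real(r8) :: store_old, store_new, flux_in, flux_out, residual

  store_old = 10.0_r8   ! plant water before the fire step (assumed value)
  flux_in   = 1.0_r8    ! water gained this step
  flux_out  = 0.5_r8    ! water credited to outgoing fluxes
  store_new = 10.2_r8   ! storage after burning removed tissue (and its water)

  ! If burned-tissue water is removed from storage but never added to an
  ! outgoing flux, the residual is nonzero and the check fails.
  residual = store_new - (store_old + flux_in - flux_out)
  if (abs(residual) > tol) then
     print *, 'water mass balance error, residual = ', residual
     stop 1
  end if
  print *, 'balance ok'
end program water_balance_sketch
```

Note that a failed balance check would normally abort with an error message rather than hang silently, which is part of what makes the reported behavior puzzling.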

@jkshuman
Contributor

jkshuman commented Jun 5, 2020

@rgknox a heads-up that we are talking about two different versions here, though I think the problem may be consistent across the branches. The version that @JunyanDing is using is master running with Hydro, and this branch does not use Hydro details within the SPITFIRE routine. The version that @xuchongang is using does feed information from Hydro into SPITFIRE for a new surface fuel pool. It appears that both have zero active fire when run with Hydro and fire active.

@JunyanDing
Contributor Author

JunyanDing commented Jun 5, 2020

I have tried DEBUG=TRUE, but the Hydro diagnostics do not work with DEBUG on due to an earlier issue.
Below is the error message in the CESM log when DEBUG is set to TRUE. (The corresponding line of code in the current master branch is https://github.com/NGEET/fates/blob/master/main/FatesHistoryInterfaceMod.F90#L1968)

forrtl: error (65): floating invalid
Image PC Routine Line Source
cesm.exe 0000000003A2A10D Unknown Unknown Unknown
cesm.exe 0000000003A27FA7 Unknown Unknown Unknown
cesm.exe 00000000039CA8E4 Unknown Unknown Unknown
cesm.exe 00000000039CA6F6 Unknown Unknown Unknown
cesm.exe 000000000394D866 Unknown Unknown Unknown
cesm.exe 0000000003958637 Unknown Unknown Unknown
Unknown 00002B4938555630 Unknown Unknown Unknown
cesm.exe 00000000039C932B Unknown Unknown Unknown
cesm.exe 00000000013FA22B fateshistoryinter 1970 FatesHistoryInterfaceMod.F90
cesm.exe 000000000092406B clmfatesinterface 1322 clmfates_interfaceMod.F90
cesm.exe 0000000000892355 clm_initializemod 667 clm_initializeMod.F90
cesm.exe 0000000000827392 lnd_comp_mct_mp_l 233 lnd_comp_mct.F90
cesm.exe 000000000045D463 component_mod_mp_ 267 component_mod.F90
cesm.exe 0000000000424249 cime_comp_mod_mp_ 1237 cime_comp_mod.F90
cesm.exe 0000000000453850 MAIN__ 114 cime_driver.F90
cesm.exe 0000000000411EDE Unknown Unknown Unknown
libc.so.6 00002B4938784545 Unknown Unknown Unknown
cesm.exe 0000000000411DE9 Unknown Unknown Unknown

mpirun noticed that process rank 0 with PID 104913 on node n0313.lr3 exited on signal 6 (Aborted).
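For context on the traceback above: "forrtl: error (65): floating invalid" means the debug build trapped an invalid floating-point operation (typically 0/0 or use of an uninitialized value) at FatesHistoryInterfaceMod.F90 around line 1970, reached from clm_initializeMod while the history diagnostics were being filled. Below is a minimal, self-contained Fortran sketch of that failure mode and the usual guard; the variable names are illustrative assumptions, not the actual FATES code.

```fortran
! Illustrative sketch only, not FATES code: how a 0/0 in a diagnostic
! produces a "floating invalid" trap under a debug build, and the guard
! that avoids it.
program fp_guard_sketch
  implicit none
  integer, parameter :: r8 = selected_real_kind(12)
  real(r8), parameter :: nearzero = 1.0e-30_r8
  real(r8) :: total_mass, total_area, mean_density

  ! At initialization there may be no active area yet, so an unguarded
  ! total_mass/total_area is 0/0, which is an invalid operation.
  total_mass = 0.0_r8
  total_area = 0.0_r8

  if (total_area > nearzero) then
     mean_density = total_mass / total_area
  else
     mean_density = 0.0_r8   ! fall back to a defined value when there is no area
  end if

  print *, 'mean_density = ', mean_density
end program fp_guard_sketch
```

Under a non-debug build the same operation would just produce a NaN and propagate silently, which may be why the original (non-debug) runs showed no error messages in the logs.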

xuchongang mentioned this issue Jun 18, 2020
glemieux linked a pull request Jul 2, 2020 that will close this issue
@glemieux
Contributor

glemieux commented Jul 2, 2020

Closing per #665

glemieux closed this as completed Jul 2, 2020