Tasks running out of memory when they shouldn't #1664
Comments
@guillaumevernieres I recall a similar issue with the ocean analysis in UFS-RNR. I checked the configuration for C96 and this is what I have. I haven't tested this on Orion, but this was true for Hera. This will, however, not work on the cloud (at least AWS), since SLURM does not support that directive with their distro/build.
Thanks for checking, @HenryWinterbottom-NOAA. I don't know what the optimum configuration would look like, but I don't think "exclusive" is it, not for tasks for which we have a rough idea of the memory footprint. I'll try to come up with memory estimates for these 3 tasks.
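To make the distinction concrete, here is a minimal sketch of requesting an explicit memory estimate instead of the whole node; the job name, counts, and memory value are hypothetical placeholders, not settings taken from the workflow.

```bash
#!/bin/bash
# Hypothetical sketch: job name, node/task counts, and memory value are illustrative only.
#SBATCH --job-name=gdasanalcalc
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=40
#SBATCH --mem=96G            # explicit estimate of the task's memory footprint
# Alternative discussed above: take the whole node and all of its memory.
# Note the caveat above that some cloud (AWS) SLURM builds do not honor this directive.
##SBATCH --exclusive
srun ./executable            # placeholder for the actual task command
```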
I noticed this recently too; I think we need to add
Haha ...
@guillaumevernieres I think
I need some C384/0.25° ICs on Hera so I can test. Can someone point me to them?
Or maybe I can just use P8?
I think @guillaumevernieres should have a C384/0.25 set up for cycling. For forecast-only you could use P8 ICs, but avoid 2013010100 (as there seems to be an instability there).
These are all DA jobs (although gdasfcst seems to have already been fixed), so I'll need the cycling set. Thanks for confirming, @JessicaMeixner-NOAA. Also @guillaumevernieres, how many threads should the ocnanal job use?
@WalterKolczynski-NOAA I don't really need help with the marine DA, but we can discuss the merits of my changes in my next PR. I have made changes following @NeilBarton-NOAA and @JessicaMeixner-NOAA's suggestions for the forecast step, but I don't see similar changes in develop. When you say that the forecast was fixed, when did that happen? The analcalc job is failing for everybody, I think; the quick fix is to increase the requested memory, but I don't know if there is an underlying bug that needs to be addressed.
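As a rough illustration of that stopgap, a minimal sketch of raising the memory request in a config.resources-style shell fragment is below; the variable names and values are placeholders assumed for illustration, not the actual entries in develop.

```bash
# Hypothetical sketch only: variable names and values are illustrative placeholders,
# not the real config.resources settings.
step=${step:-analcalc}              # normally set by the workflow before this fragment runs
case "${step}" in
  "analcalc")
    export wtime_analcalc="00:30:00"
    export npe_analcalc=127
    export memory_analcalc="48GB"   # bumped upward as a stopgap until the real footprint is known
    ;;
esac
```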
@guillaumevernieres For the forecast, #1763 updated the number of threads for C384 and @CoryMartin-NOAA reported it now runs to completion. I was trying to fix the others, but I can't test solutions unless I have ICs. If you are already working on it, please assign yourself to this issue and I will move on to something else.
@WalterKolczynski-NOAA, the PR you point to was for fixing the Give me a few minutes to dig out warm coupled ICs so you can test the s2s forecast in cycling mode.
It fixed the fcst as well AFAIK, by updating
OK. Warm s2s ICs:
Thanks, will check things out.
Description
I'm running a lower-res prototype GFSv17 with the gdas cycle only and the model at C384/0.25. The following tasks have been failing due to lack of memory:

- `gdasfcst` (see issue #1572, "Some forecast resolutions fail on lower-mem HPC when using ESMF threading"). It's working with 40 nodes; it used to be fine with 19.
- `gdasanalcalc`
- `ocnanalrun`: I run that one regularly on 30 nodes outside of the g-w without issues. It fails with 40 nodes when submitted with the g-w.
It looks like all these tasks have `threads-per-core` limited to 1, which isn't right. Omitting this option for the `ocnanalrun` task allows me to complete the analysis on 30 nodes. I wonder if that is also the issue for the other 2 tasks. I'll test tomorrow and update this issue. It could also just be bad luck, since none of these tasks use the `--exclusive` option.

Pinging @aerorahul and @WalterKolczynski-NOAA.