All tracers go to NaN using a non-regular grid #5

Closed · jagoosw opened this issue Jul 26, 2022 · 25 comments

jagoosw (Collaborator) commented Jul 26, 2022

I've just tried running the subpolar example with the grid defined like:

stretching = 5
refinement = 2.5
h(k) = (k - 1) / Nz
ζ₀(k) = 1 + (h(k) - 1) / refinement
Σ(k) = (1 - exp(-stretching * h(k))) / (1 - exp(-stretching))
z_faces(k) = Lz * (ζ₀(k) * Σ(k) - 1)

grd = RectilinearGrid(size = (Nx, Ny, Nz),
                      x = (0, Lx),
                      y = (0, Ly),
                      z = z_faces)

and (having battled with it for ages thinking it was something else) I realised this is causing all of the tracers to go to NaN after the first iteration. I've tried making a minimal example of this, but it runs fine with a tracer and simple forcing/boundary conditions, with the PAR updating, with buoyancy, and with the turbulence closure.

So I can't work out what it is yet, but it seems to only occur when the full model is used.

(Note: if you set the grid as above in the subpolar example, the issue presents itself as the root finder failing in the air/sea flux boundary condition, since that is the first place the newly NaNed tracers get used. I thought the issue was the boundary condition itself for a while before I realised it was happening before it got there.)
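
For reference, a small self-contained sketch of what this stretching produces, using the Nz = 50 and Lz = 600 quoted later in the thread (purely illustrative, not part of the example script):

```julia
# Illustrative only: inspect the cell heights produced by the z_faces above,
# with Nz = 50 and Lz = 600 as mentioned further down the thread.
Nz, Lz = 50, 600
stretching = 5
refinement = 2.5

h(k) = (k - 1) / Nz
ζ₀(k) = 1 + (h(k) - 1) / refinement
Σ(k) = (1 - exp(-stretching * h(k))) / (1 - exp(-stretching))
z_faces(k) = Lz * (ζ₀(k) * Σ(k) - 1)

Δz = [z_faces(k + 1) - z_faces(k) for k in 1:Nz]
println("Δz ranges from ", minimum(Δz), " m near the surface to ", maximum(Δz), " m at the bottom")
```

The surface cells come out several times thinner than the bottom ones, which matters for the time-step discussion later in the thread.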

syou83syou83 (Collaborator) commented:

It works fine for me in the original version. What Nz and Lz are you using? And what Δt in simulation = Simulation(model, Δt=200, stop_time=duration)?

jagoosw (Collaborator, Author) commented Jul 27, 2022

Interesting, I had used Nz = 50 and Lz = 600. I'll have more of a look later.

jagoosw (Collaborator, Author) commented Jul 31, 2022

I think the issue was the kelp rather than the rest of the model, so this should be solved.

jagoosw closed this as completed Jul 31, 2022
jagoosw (Collaborator, Author) commented Aug 8, 2022

Hi @syou83syou83,

Have you managed to make this work in the current subpolar.jl example? I've just tried again and everything went NaN again.

Thanks

jagoosw reopened this Aug 8, 2022
syou83syou83 (Collaborator) commented:

Hi @jagoosw,
Which type of PAR are you using: PAR as a function, or PAR as a field? I think I ran it with PAR as a function and it works fine.

While trying to reproduce your subpolar_newpar.pdf with PAR as a field, I met a different issue. Basically, I tried to increase Kappa in the mixed layer from 0.01 to 0.1, which gave the error
nested task error: DomainError with -0.05363415099443389. I think I have to reduce the time step further, to smaller than 100, in this case.
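
A DomainError like this is usually what Julia throws when a real-domain function such as sqrt or log receives a negative argument, for example a tracer that has gone slightly negative after too large a step. A minimal illustration (not the model code; the sqrt is only an example culprit):

```julia
# Feeding the quoted value to a real-domain function reproduces the error type.
x = -0.05363415099443389
# sqrt(x)          # throws: DomainError with -0.05363415099443389
sqrt(complex(x))   # returns a complex result instead of throwing
```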

But going back to your question: with PAR as a Field and small Kappa, it works fine.

Do you want to meet and discuss it tomorrow?

jagoosw (Collaborator, Author) commented Aug 8, 2022

Hi @syou83syou83,

I'm using the PAR field. I'm not sure whether this is where the issue is coming from, although it does seem likely since that's the main difference between this version and your original.

Could you send the whole (or more) of the error message? If it's coming from the PAR callback, that's interesting, because I'm fairly sure nothing in that function can go negative except the input chlorophyll concentration, which shouldn't throw an error (since we never take its log or anything like that).

I'm not currently in Cambridge but could video call if you wanted to discuss it?

Jago

syou83syou83 (Collaborator) commented:

Hi @jagoosw,
I am running a case overnight and will provide more of the error message tomorrow. It might come from KernelAbstractions or the wait function; I am not sure. I will let you know tomorrow. Thanks.

jagoosw (Collaborator, Author) commented Aug 9, 2022

Just came across this, which may be helpful for debugging:

simulation.callbacks[:nan_checker].func.erroring = true

Edit: the boundary-condition convergence error still seems to occur before this callback catches anything.
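
For context, a sketch of where that flag sits in a typical script (Δt and stop_time are the values quoted earlier in the thread; :nan_checker is the callback the line above refers to, and `model` and `duration` are assumed to be defined as in the subpolar example):

```julia
# Sketch: make the NaN checker error out before running, so the simulation
# aborts with a clear message once it detects a NaN in a field.
simulation = Simulation(model, Δt = 200, stop_time = duration)
simulation.callbacks[:nan_checker].func.erroring = true
run!(simulation)
```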

jagoosw (Collaborator, Author) commented Aug 9, 2022

To add to what you told me in your email, I also get NaNs with the regular grid if I make the time step very large, so I guess stability is the problem here. (Thank you for your help working this out; it's not something I'd thought of at all!)

I'll experiment with a non-regular grid and see if there is a good balance between a viable time step and coarseness to optimise the runtime, and see how the speed scales with each.

johnryantaylor (Collaborator) commented Aug 9, 2022 via email

jagoosw (Collaborator, Author) commented Aug 9, 2022

Thank you, I'll look into this in the morning!

johnryantaylor (Collaborator) commented:

A quick correction: the diffusive time step constraint will be dt = c * dz^2 / kappa (no square root).
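
To make that concrete, a back-of-envelope sketch (the Δz, κ, and safety factor c are illustrative values, not taken from the model):

```julia
# Diffusive time-step limit Δt = c * Δz^2 / κ for the thinnest cell.
Δz = 5.0    # smallest cell height in metres (near the surface on the stretched grid)
κ  = 1e-2   # vertical diffusivity, m² s⁻¹ (the "Kappa = 0.01" mentioned above)
c  = 0.2    # safety factor

Δt_max = c * Δz^2 / κ   # 500 s with these numbers; 50 s if κ = 0.1
```

With κ = 0.1 this limit drops by a factor of ten, which would be consistent with needing a time step below 100 in that case.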

jagoosw (Collaborator, Author) commented Aug 10, 2022

Hi @syou83syou83,

I can't seem to replicate what you emailed me yesterday (i.e. using refinement = 5, stretching = 2 with any of Nz = 150, 50, 33); I still get NaN values (leading to the convergence failure in the air/sea flux). Could you send the exact setup that ran?

Thanks,
Jago

syou83syou83 (Collaborator) commented:

Hi @jagoosw,
I just pushed subpolar.jl with the stretched grid. It should work. Please let me know how it works on your end.

jagoosw (Collaborator, Author) commented Aug 10, 2022

Thank you, I've checked now and tried to run it exactly as you have it, but I'm still getting the error. Did you change the root finding to use your loop method when you got it to work, or is your version still using find_zero?

Thanks,
Jago

syou83syou83 (Collaborator) commented:

Did you get a similar error, such as "nested task error: Roots.ConvergenceFailed("Algorithm failed to converge")"? Can you try the loop method and see how it goes? I used both ways but only ran them for a few days.
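
For reference, a generic sketch of the kind of loop that could replace Roots.find_zero, assuming the residual f(H) changes sign over a known bracket; the function, bracket, and tolerance below are illustrative and not the repository's carbonate-chemistry code:

```julia
# Illustrative bisection loop as a stand-in for Roots.find_zero.
function bisect_root(f, a, b; tol = 1e-12, maxiter = 200)
    fa = f(a)
    for _ in 1:maxiter
        m = (a + b) / 2
        fm = f(m)
        (abs(fm) < tol || (b - a) / 2 < tol) && return m
        if sign(fm) == sign(fa)
            a, fa = m, fm   # root lies in the upper half of the bracket
        else
            b = m           # root lies in the lower half
        end
    end
    return (a + b) / 2
end

# Toy usage: the root of f(H) = H - 1e-8 is H = 1e-8 (i.e. pH 8).
H = bisect_root(H -> H - 1e-8, 1e-12, 1e-6)
```

Unlike a Newton-type solver this cannot jump outside the bracket, so it never lands on a negative H (provided the bracket is positive), at the cost of slower convergence.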

jagoosw (Collaborator, Author) commented Aug 10, 2022

It's happening almost straight away for me (in less than 100 iterations); I will try the other way now.

I'm also struggling to see why this function would cause the error, unless the H function sometimes has multiple roots and it's giving negative ones, but then I don't see why this would only present itself with the irregularly spaced grid.

jagoosw (Collaborator, Author) commented Aug 10, 2022

Okay, I've run that now, and although it doesn't throw that error, all the tracers are still NaN. Did you check the tracer values when you ran it with the old method and a non-regular grid? I've had a play around with the pCO2 function I have and it gives the same values (as you would expect), so I don't think it's finding H that is the issue; that is just where the error is first detected. I also use a different solubility factor, but both my method and the original are order unity, so I don't think that is changing much.

I've tried it with the sinking turned off and get the same error, then tried making κ zero, which also didn't work, and then both off, which also didn't work. I then tried a regular grid with much smaller spacing (150 points over 200 m, so about 1.3 m) and it runs for more time steps, but NaNs again in fewer than 100, so it does seem to be related to the grid spacing rather than the irregular spacing.

Finally, while I've been playing with the advection scheme I've changed it to use UpwindBiasedFifthOrder by default (now an optional arg in the setup function), since it's meant to be a lot more efficient on CPU.
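
(For illustration only, a hypothetical sketch of what exposing the scheme as a keyword could look like; setup_model, its other arguments, and the choice of NonhydrostaticModel are assumptions, not the repository's actual API:)

```julia
# Hypothetical sketch: advection scheme as an optional keyword, defaulting to
# the fifth-order upwind-biased scheme mentioned above.
using Oceananigans

function setup_model(grid; advection = UpwindBiasedFifthOrder(), kwargs...)
    # NonhydrostaticModel is assumed here purely for the sketch.
    return NonhydrostaticModel(; grid = grid, advection = advection, kwargs...)
end
```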

syou83syou83 (Collaborator) commented:

Can you send me your subpolar.jl code through email? I will take a look at it.

jagoosw (Collaborator, Author) commented Aug 10, 2022

Will do. I'm using the boundaries branch though, so it probably won't just run in the main branch.

jagoosw (Collaborator, Author) commented Aug 10, 2022

Another update: I've run it with the air/sea flux turned off, which fixed the issue, so the problem is something to do with that. I also tried functional PAR and a more accurate advection scheme, but they didn't work.

None of turning off the BCs, changing to a more accurate advection scheme, or using the PAR function worked; with the BCs off it ran only because there was nothing left to catch the NaN error, but the tracers were still NaN.

jagoosw (Collaborator, Author) commented Aug 11, 2022

Think I've cracked it! The LOBSTER.getPAR function, which is just a proxy for Oceananigans.Fields.interpolate, fails when the location you put in falls exactly on a grid point of an irregularly spaced grid. I confirmed this by temporarily fixing our code to round the z position passed to getPAR, which then runs.

I'll push this fix and then open a pull request on Oceananigans to fix the issue properly!
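
A hedged sketch of the temporary workaround (not the exact code pushed to the repository; getPAR here is a stand-in for LOBSTER.getPAR, and interpolate is assumed to take (field, x, y, z) as in the Oceananigans version used at the time):

```julia
# Rounding z moves it fractionally off the exact stored coordinate of the
# stretched grid, so the interpolation no longer hits a grid point exactly.
using Oceananigans.Fields: interpolate

getPAR(PAR_field, x, y, z) = interpolate(PAR_field, x, y, round(z, digits = 6))
```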

jagoosw (Collaborator, Author) commented Aug 11, 2022

Edit: they've already fixed it in the bleeding-edge version in their GitHub repo but not in the version on the package manager, so I will leave the temporary fix in and set myself a reminder to remove it once they update. I'll explore the CFL condition for the model now that I can make that the limiting factor.
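
One way to explore that, sketched with illustrative values (the cfl, max_change, and max_Δt numbers are not tuned for this model, and `model` and `duration` are assumed to come from the example script):

```julia
# Sketch: let an adaptive wizard grow Δt towards the advective CFL limit now
# that the interpolation bug no longer produces NaNs first.
using Oceananigans

simulation = Simulation(model, Δt = 50, stop_time = duration)
wizard = TimeStepWizard(cfl = 0.2, max_change = 1.1, max_Δt = 600)
simulation.callbacks[:wizard] = Callback(wizard, IterationInterval(10))
run!(simulation)
```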

johnryantaylor (Collaborator) commented Aug 11, 2022 via email

jagoosw (Collaborator, Author) commented Aug 15, 2022

CFL numbers discussed here

jagoosw closed this as completed Aug 15, 2022
ciadht added a commit that referenced this issue Sep 4, 2024