Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dss operations within coupler pass an unused t argument. #396

Closed
akshaysridhar opened this issue Aug 17, 2023 · 12 comments · Fixed by #959
Closed

dss operations within coupler pass an unused t argument. #396

akshaysridhar opened this issue Aug 17, 2023 · 12 comments · Fixed by #959

Comments

@akshaysridhar
Copy link
Member

Following the CTS interface, the weighted_dss_slab! function definition requires a third t (time) argument which is unused within the ClimaCore.Spaces.weighted_dss! function. We may want to remove this from our general design pattern.

@LenkaNovak
Copy link
Collaborator

LenkaNovak commented Aug 18, 2023

@akshaysridhar @sriharshakandala @kmdeck

As @gdecker1 found earlier, Float32 runs now break in ClimaLSM due to t being specified as an FT type here. For some reason t is being returned as Float64 even if all arguments to the ODEProblem are passed as FT32. @dennisYatunin maybe you'll be familiar with the internals of this, do you think this could be an issue in ClimaTimeSteppers?

Screen Shot 2023-08-17 at 6 59 30 PM

For some reason our FT32 Buildkite was set with FT64, which is why we didn't catch this earlier! Error is reproducible if the driver on this branch is run interactively.

(Note, FT32 didn't cause problems in these interactive Coupler runs before the ClimaLSM.dss! PR. )

Unless we're doing something silly here, are you all happy to remove the type specification (like we did in the coupler #387) in ClimaLSM, before this issue gets resolved in ClimaTimeSteppers? As it stands, we can't test AMIP for FT32.

@akshaysridhar
Copy link
Member Author

Adding notes from Slack here: (Callbacks in the timestepping methods also require a (u,p,t) arg list)

@dennisYatunin
Copy link
Member

dennisYatunin commented Aug 18, 2023

You should probably just drop the unnecessary FT type restriction. The value of t is passed here from the integrator for the sake of debugging (e.g., printing @info "Value at $t: ..."); its type does not matter because it should not be used for anything else. For now, t always needs to be stored as a Float64 because Float32 does not have enough bits to accurately track time without roundoff error.

@kmdeck
Copy link
Member

kmdeck commented Aug 21, 2023

hi all - catching up here.
is this correct: ClimaLSM needs a PR which drops the type restriction on t in dss! but which keeps it as an argument.

We may need to do this in multiple functions because we had assumed that t would be the same type as that underlying the state vectors. In the past, I thought this was a desirable feature, that we wanted the type restriction to ensure the simulation took place entirely at float32 or at float64. we can first make the above change, and see if we run into further issues?

@LenkaNovak
Copy link
Collaborator

Hi @kmdeck , thanks for following up on this! 🚀 That's right! Looks like removing the type specification would be the quick (and probably only) fix. Thanks @dennisYatunin for the explanation. I agree that it would be better to have consistent types, but I also get the float precision issue. Do you guys have tests that run with Float32?

@kmdeck
Copy link
Member

kmdeck commented Aug 22, 2023

Hi @kmdeck , thanks for following up on this! 🚀 That's right! Looks like removing the type specification would be the quick (and probably only) fix. Thanks @dennisYatunin for the explanation. I agree that it would be better to have consistent types, but I also get the float precision issue. Do you guys have tests that run with Float32?

We have unit tests that run with Float32 but all of our integrations (in experiments, etc) use Float64, I think. Ill make sure we can run with Float32 before merging anything!

@kmdeck
Copy link
Member

kmdeck commented Aug 24, 2023

Hi @kmdeck , thanks for following up on this! 🚀 That's right! Looks like removing the type specification would be the quick (and probably only) fix. Thanks @dennisYatunin for the explanation. I agree that it would be better to have consistent types, but I also get the float precision issue. Do you guys have tests that run with Float32?

We have unit tests that run with Float32 but all of our integrations (in experiments, etc) use Float64, I think. Ill make sure we can run with Float32 before merging anything!

@LenkaNovak it turns out that we have implicit requirements that t be the same float type as the state in a lot of places. so it will be a bigger change for us to get this to run. For example, whenever we prescribe time varying BC (reanalysis data, etc), the BC ends up being the same type as t, or a combination of quantities that are of type (t) and of the state FT).

what is the priority on this?

@LenkaNovak
Copy link
Collaborator

Hmm... I thought you guys don't use the t variable in the locally defined dss! function like here. All we should need is for the t::FT to be changed to just t on that line. Or is that not always the case? 🤔

@kmdeck
Copy link
Member

kmdeck commented Aug 29, 2023

Hmm... I thought you guys don't use the t variable in the locally defined dss! function like here. All we should need is for the t::FT to be changed to just t on that line. Or is that not always the case? 🤔

The issue is actually that we assume t is the same type as the state in many places (not just here - also in time varying boundary conditions, using renalysis data, etc). We didnt adequately test running with Float32; I think since switching to ClimaTimesteppers we lost this capability/can only run with Float64.

@LenkaNovak
Copy link
Collaborator

I see, thank you for looking into this, @kmdeck . I think Atmos has similar problems, but it was on their radar to re-enable single precision soon. As for the question on priorities, I'm not totally sure. @cmbengue @tapios How much should we prioritize Float32 runs right now?

@tapios
Copy link

tapios commented Aug 29, 2023

I see, thank you for looking into this, @kmdeck . I think Atmos has similar problems, but it was on their radar to re-enable single precision soon. As for the question on priorities, I'm not totally sure. @cmbengue @tapios How much should we prioritize Float32 runs right now?

It depends on what right now means. It's not crucial, I think, for the 4-6 weeks or so. But beyond that, we'd like to be able to do coupled runs (on GPUs) in Float32.

@dennisYatunin
Copy link
Member

dennisYatunin commented Aug 29, 2023

It's also worth noting that we will probably continue to avoid using Float32 to represent t; it just leads to too much roundoff error for long simulations. The most viable options now are Float64 and maybe also DateTime (though using the latter would require changing a lot of code where we assume that t is a number). Perhaps the fastest solution for ClimaLSM would be to wrap t in FT(t) whenever it is passed from the integrator to a tendency function.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants