-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix buildkite run so that unit tests run on GPU #739
Conversation
bf8038c
to
309d6f3
Compare
309d6f3
to
74e15a2
Compare
@charleskawczynski do you have any ideas about the bug up above? I can reproduce the error, and I tried added more parametric types to the
Im trying to use @code_warntype in the REPL but am not clearly seeing any "Any" pop up:
|
I pushed a commit! I think the issue was that we were reaching into global scope for the default argument |
@charleskawczynski it looks like this is still failing - do you have any other ideas? |
I am wondering if this will work after Teja's PR refactoring the rooting distribution? |
I'm not sure if this is the problem, but |
Also, for consistency, I suggest changing |
fd31f3b
to
81b5536
Compare
19c16b0
to
23b3da9
Compare
As @imreddyTeja discovered, changing I'm not sure what would cause this - one idea is that the fields of |
23b3da9
to
53c2de0
Compare
The GPU runtests pass when run on clima, but fail on central with error code 255, which this thread suggests is an out-of-memory error |
That's a bit odd, the single element tuple is basically an immutable version of a
I'm not sure. It's a bit difficult to identify the issue from the PR, could/should we make a reproducer? Looking One thought I had was in regards to grabbing the type information from an optional argument, this seems like it could be prone to inference issues. I might break that definition up.
Ok, can we try increasing the memory request on CI? |
I can work on making a reproducer. Which optional argument and definition are you referring to here?
I'll try increasing it |
@@ -175,14 +176,14 @@ function zenith_angle( | |||
t, | |||
start_date; | |||
cd_field = sfc_cds, | |||
insol_params::Insolation.Parameters.InsolationParameters{FT} = earth_param_set.insol_params, | |||
) where {FT} | |||
insol_params::Insolation.Parameters.InsolationParameters{_FT} = earth_param_set.insol_params, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one @juliasloan25
c416e2a
to
f2833ae
Compare
@charleskawczynski I tried playing around with the reproducer a bit, including pulling out the |
3752573
to
b945cdd
Compare
114848a
to
2b38375
Compare
6f8a39b
to
29204d4
Compare
29204d4
to
29c1e9c
Compare
This is required to enable GPU compatibility. Generally the pattern `Array(parent(...))` is discouraged, but using it to check analytic values in tests is okay.
This is required as a workaround to the type inference failure detailed in CliMA/ClimaCore.jl#2065. The failure appears when a function is broadcast over one field and two structs; the types are too complex and inference fails. This workaround introduces a wrapper struct to contain the two struct inputs, so that the function is broadcast over one field and only one struct, and the types are easier to infer.
29c1e9c
to
43801de
Compare
Purpose
Some of our buildkite runs were not running on GPU. Update buildkite so that they are.
closes #860
Content
This is required to enable GPU compatibility. Generally the pattern
Array(parent(...))
is discouraged, but using it to check analyticvalues in tests is okay.
This is required as a workaround to the type inference failure detailed
in inference failure when broadcasting over multiple structs and a Field ClimaCore.jl#2065. The failure appears
when a function is broadcast over one field and two structs; the types
are too complex and inference fails. This workaround introduces a wrapper
struct to contain the two struct inputs, so that the function is broadcast
over one field and only one struct, and the types are easier to infer.
Initial bug reported
experiments/integrated/performance/profile_allocations.jl
. Note that the same code runs on the Clima cluster on GPU. This is the error message:/central/home/kdeck/ClimaLand.jl/src/standalone/Soil/soil_heat_parameterizations.jl:65:
Called by:
Note that
typeof(hydrology_cm) <: AbstractSoilHydrologyClosure
and we have setBase.broadcastable(x::AbstractSoilHydrologyClosure) = tuple(x)
Content