-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement multidimensional initial guess and bounds for curvefit
#7821
Implement multidimensional initial guess and bounds for curvefit
#7821
Conversation
Hm the doctest failed because the result is off in the last decimal place. I can't reproduce it, even though I have the same versions of numpy |
I just noticed that the docs for |
This looks pretty good to me on first glance! I would vote to do |
curvefit
curvefit
@@ -8696,13 +8735,21 @@ def _wrapper(Y, *coords_, **kwargs): | |||
else: | |||
name = f"{str(name)}_" | |||
|
|||
input_core_dims = [reduce_dims_ for _ in range(n_coords + 1)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I factored this out of the function call because Black was suggesting something like
apply_ufunc(
...,
input_core_dims=[reduce_dims_ for _ in range(n_coords + 1)]
+ [
[] for _ in range(3 * n_params)
], # core_dims for p0 and bounds
...
)
which I found quite ugly.
(lb + 1) * int(lb_finite & ~ub_finite), | ||
(ub - 1) * int(~lb_finite & ub_finite), | ||
] | ||
p0 = where( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An alternative I considered here (see cd685ba) is
p0 = np.nansum(
[
0.5 * (lb + ub) * (lb_finite & ub_finite),
(lb + 1) * (lb_finite & ~ub_finite),
(ub - 1) * (~lb_finite & ub_finite),
],
axis=0,
)
which is closer to the original, but it gave warnings about the multiplication (the original version also issues warnings if you remove the int()
cast). Not sure which is clearer.
I implemented the broadcasting for bounds also. I hope it's not too ugly. Do you think the signature for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree that the nested where is a bit ugly and hard to wrap my head around, but it works and I don't immediately have a better suggestion. Tests look good. I think it's safe to narrow the arg typing as well.
295574e
to
6e4aef7
Compare
I updated the type hints now (and also did a rebase just in case). |
48a5b7e
to
48cea82
Compare
48cea82
to
ad3147a
Compare
|
||
def _wrapper(Y, *coords_, **kwargs): | ||
def _wrapper(Y, *args, **kwargs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def _wrapper(Y, *args, **kwargs): | |
def _wrapper(Y, coords: list[?], p0: list[?], lb: list[?], ub: list[?], **kwargs): |
The argument handling below is hard to read and looks prone to error, is a more explicit version possible?
Might have to remove the *
in apply_ufunc as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, but I'm not sure it's possible. The signature is apply_ufunc(func, *args, ...)
, which only broadcasts each element in *args
, so it doesn't work if the element itself is a list, like in the case of the initial guess. If a
, b
and c
are float | DataArray
, then we can only broadcast them like f(a, b, c)
, and not like f([a, b, c])
.
Let me know if you think the code could be made easier to follow somehow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I found a way to do it: if we convert the outer list to an additional dimension for coords_
and for each of the params & bounds, and add that additional dimension to input_core_dims
, then we can pass array-valued arguments to _wrapper
with apply_ufunc
. I'm not sure if it's more readable though.
You can see the implementation here (see here for the diff compared to this version, and here for compared to main (note: you have to click to the Files tab, I wasn't able to link straight to the line in the diff)).
It still needs some work:
- I now use
Dataset(...).to_array(dim_name)
to do the broadcasting + adding a new dimension, but is this a good way to do it? Another way I could think of was something likeconcat([coord.expand_dims(new_dim=[i]) for i, coord in enumerate(coords_)], "new_dim")
. Both options are not very easy to understand, and probably have quite a bit of overhead. - The dimension names
"param"
and"coord"
are now hard-coded, so it will not work if the input array has these dimensions. Do you have a better suggestion? The output already has the hard-coded dimension "param", so maybe it's not so bad.
If you think it's worth pursuing this approach, I can add that commit to this branch and we can discuss further.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I prefer the current version in this PR.
I apologize for the noise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No worries. I think the alternative version could be OK, if there was a more explicit / clearer way of saying "for each element in this list, broadcast them to be the same shape, and then concatenate them along a new temporary axis", other than Dataset(...).to_array()
. But yeah neither option is clearly better.
Co-authored-by: Illviljan <[email protected]>
Co-authored-by: Illviljan <[email protected]>
35ebb2a
to
d081ee6
Compare
Thanks @mgunyho ! |
@@ -4399,7 +4399,7 @@ def exp_decay(t, n0, tau=1): | |||
da = da.chunk({"x": 1}) | |||
|
|||
fit = da.curvefit( | |||
coords=[da.t], func=exp_decay, p0={"n0": 4}, bounds={"tau": [2, 6]} | |||
coords=[da.t], func=exp_decay, p0={"n0": 4}, bounds={"tau": (2, 6)} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did the list have to be changed to a tuple? If so, that's backwards incompatible and might break someone's code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made this change just to pass mypy after I updated the type hints, the code will work with a list as well. This is consistent with scipy, which expects a tuple of arrays (or a Bounds
object).
…ydata#7821) * Add test for multidimensional initial guess to curvefit * Pass initial guess with *args * Update curvefit docstrings * Add examples to curvefit * Add test for error on invalid p0 coords * Raise exception on invalid coordinates in initial guess * Add link from polyfit to curvefit * Update doc so it matches CI * Formatting * Add basic test for multidimensional bounds * Add tests for curvefit_helpers with array-valued bounds * First attempt at fixing initialize_curvefit_params, issues warnings * Alternative implementation of bounds initialization using xr.where(), avoids warnings * Pass also bounds as *args to _wrapper * Raise exception on unexpected dimensions in bounds * Update docstring of bounds * Update bounds docstring in dataarray also * Update type hints for curvefit p0 and bounds * Change list to tuple to pass mypy * Update whats-new * Use tuples in error message Co-authored-by: Illviljan <[email protected]> * Add type hints to test Co-authored-by: Illviljan <[email protected]> --------- Co-authored-by: Illviljan <[email protected]>
curvefit
#7768whats-new.rst
With this PR, it's possible to pass an initial guess to
curvefit
that is a DataArray, which will be broadcast to the data dimensions. This way, the initial guess can vary with the data coordinates.I also added examples of using
curvefit
to the documentation, both a basic example and one with the multidimensional guess.I have a couple of questions:
p0: dict[str, float | DataArray] | None
, instead ofdict[str, Any]
(and same for bounds)? scipy only optimizes over scalars, so I think it would be safe to assume that the values should either be those, or arrays that can be broadcast.