Implement multidimensional initial guess and bounds for `curvefit` #7821

mgunyho · 2023-05-06T13:09:49Z

Closes Supplying multidimensional initial guess to curvefit #7768
Tests added
User visible changes (including notable bug fixes) are documented in whats-new.rst

With this PR, it's possible to pass an initial guess to curvefit that is a DataArray, which will be broadcast to the data dimensions. This way, the initial guess can vary with the data coordinates.
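As a rough illustration of the feature described above, here is a minimal sketch of passing a `DataArray` as the initial guess. It assumes an xarray release that includes this PR and an installed scipy; the model function `exp_decay`, the synthetic data, and the guess values are made up for the example and are not taken from the PR's docs.

```python
import numpy as np
import xarray as xr


def exp_decay(t, n0, tau):
    """Model function: exponential decay with amplitude n0 and time constant tau."""
    return n0 * np.exp(-t / tau)


# Synthetic, noise-free data (illustrative only): one decay curve per value of "x".
t = np.linspace(0, 5, 50)
x = [0, 1, 2]
true_tau = np.array([0.5, 1.0, 2.0])
da = xr.DataArray(
    exp_decay(t[np.newaxis, :], 3.0, true_tau[:, np.newaxis]),
    dims=("x", "t"),
    coords={"x": x, "t": t},
)

# The initial guess for tau varies along "x"; the guess for n0 stays a scalar.
tau_guess = xr.DataArray([0.4, 1.2, 1.8], dims="x", coords={"x": x})
fit = da.curvefit(
    coords=[da.t], func=exp_decay, p0={"n0": 2.5, "tau": tau_guess}
)

# One fitted tau per "x" coordinate.
tau_fit = fit["curvefit_coefficients"].sel(param="tau")
print(tau_fit.values)
```

The guess only has the `x` dimension, so it is broadcast against the data and each curve starts its optimization from its own value of `tau`.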

I also added examples of using curvefit to the documentation, both a basic example and one with the multidimensional guess.

I have a couple of questions:

Should we change the signature to p0: dict[str, float | DataArray] | None, instead of dict[str, Any] (and same for bounds)? scipy only optimizes over scalars, so I think it would be safe to assume that the values should either be those, or arrays that can be broadcast.
The usage example of curvefit is only in the docstring for DataArray, so now the docs differ between DA and dataset. But the example uses a DataArray only, so this should be ok, right?

mgunyho · 2023-05-06T13:25:41Z

Hm, the doctest failed because the result is off in the last decimal place. I can't reproduce it, even though my environment has the same versions of numpy (1.23.5) and scipy (1.10.1) as the CI reports. Anyway, I changed it in 3001eaf.

mgunyho · 2023-05-06T14:56:05Z

I just noticed that the docs for curvefit have some formatting issues: I think they use single backticks instead of double backticks for code formatting. Should I fix those in this PR as well?

slevang · 2023-05-06T15:12:57Z

This looks pretty good to me on first glance! I would vote to do p0 and bounds in one PR. It could surprise a user if one of these can take an array and the other only scalars.

mgunyho · 2023-05-07T09:15:37Z

xarray/core/dataset.py

@@ -8696,13 +8735,21 @@ def _wrapper(Y, *coords_, **kwargs):
            else:
                name = f"{str(name)}_"

+            input_core_dims = [reduce_dims_ for _ in range(n_coords + 1)]


I factored this out of the function call because Black was suggesting something like

    apply_ufunc(
        ...,
        input_core_dims=[reduce_dims_ for _ in range(n_coords + 1)]
        + [
            [] for _ in range(3 * n_params)
        ],  # core_dims for p0 and bounds
        ...
    )

which I found quite ugly.
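For readers following along, building the list outside the call might look roughly like the sketch below. Only the first assignment appears in the diff above; the `extend` call and the concrete values of `reduce_dims_`, `n_coords` and `n_params` are assumptions made for illustration.

```python
# Illustrative values (assumed): one coordinate, two fit parameters, fitting along "t".
reduce_dims_ = ["t"]
n_coords = 1
n_params = 2

# Core dims for the data array and for each coordinate array.
input_core_dims = [reduce_dims_ for _ in range(n_coords + 1)]

# p0, lower bounds and upper bounds each contribute one entry per parameter,
# and none of them has core dimensions.
input_core_dims.extend([[] for _ in range(3 * n_params)])

print(input_core_dims)
```

The resulting list can then be passed as `input_core_dims=input_core_dims`, which keeps the `apply_ufunc` call itself short.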

mgunyho · 2023-05-07T09:20:11Z

xarray/core/dataset.py

-                (lb + 1) * int(lb_finite & ~ub_finite),
-                (ub - 1) * int(~lb_finite & ub_finite),
-            ]
+        p0 = where(


An alternative I considered here (see cd685ba) is

    p0 = np.nansum(
        [
            0.5 * (lb + ub) * (lb_finite & ub_finite),
            (lb + 1) * (lb_finite & ~ub_finite),
            (ub - 1) * (~lb_finite & ub_finite),
        ],
        axis=0,
    )

which is closer to the original, but it gave warnings about the multiplication (the original version also issues warnings if you remove the int() cast). Not sure which is clearer.
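To make the comparison concrete, here is a plain-NumPy sketch of the nested-`where` idea for array-valued bounds. The function name `guess_from_bounds` is hypothetical and this is not the code from the PR; it only mirrors the four cases discussed above (both bounds finite, only one finite, neither finite).

```python
import numpy as np


def guess_from_bounds(lb, ub):
    """Pick a feasible starting value for each pair of (lower, upper) bounds."""
    lb = np.asarray(lb, dtype=float)
    ub = np.asarray(ub, dtype=float)
    lb_finite = np.isfinite(lb)
    ub_finite = np.isfinite(ub)
    # np.where evaluates every branch, so -inf + inf would warn for the
    # fully unbounded case; that branch is never selected there, so the
    # warning is suppressed locally.
    with np.errstate(invalid="ignore"):
        return np.where(
            lb_finite,
            np.where(ub_finite, 0.5 * (lb + ub), lb + 1),  # midpoint, or just above lb
            np.where(ub_finite, ub - 1, 0.0),  # just below ub, or 0 if unbounded
        )


lower = [0.0, 1.0, -np.inf, -np.inf]
upper = [2.0, np.inf, 5.0, np.inf]
print(guess_from_bounds(lower, upper))  # midpoint, lb + 1, ub - 1, 0
```

Selecting values with `where` avoids multiplying infinities by zero, which is where the warnings in the `nansum` variant come from.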

mgunyho · 2023-05-07T09:23:16Z

I implemented the broadcasting for bounds also. I hope it's not too ugly. Do you think the signature for p0 and bounds should be updated to explicitly allow only (tuples of) floats or DataArrays?

slevang · 2023-05-15

Agree that the nested where is a bit ugly and hard to wrap my head around, but it works and I don't immediately have a better suggestion. Tests look good. I think it's safe to narrow the arg typing as well.

mgunyho · 2023-05-16T07:35:19Z

I updated the type hints now (and also did a rebase just in case).

Illviljan · 2023-05-26T04:54:20Z

xarray/core/dataset.py


-        def _wrapper(Y, *coords_, **kwargs):
+        def _wrapper(Y, *args, **kwargs):


Suggested change

-        def _wrapper(Y, *args, **kwargs):
+        def _wrapper(Y, coords: list[?], p0: list[?], lb: list[?], ub: list[?], **kwargs):

The argument handling below is hard to read and looks error-prone. Is a more explicit version possible?
You might have to remove the * in apply_ufunc as well.

mgunyho · 2023-05-27

I agree, but I'm not sure it's possible. The signature is apply_ufunc(func, *args, ...), which only broadcasts each element in *args, so it doesn't work if the element itself is a list, like in the case of the initial guess. If a, b and c are float | DataArray, then we can only broadcast them like f(a, b, c), and not like f([a, b, c]).

Let me know if you think the code could be made easier to follow somehow.
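One way to keep the flat `*args` calling convention while making the unpacking easier to follow is a small helper that slices the arguments back into groups. This is only a sketch: the helper name `split_wrapper_args` is hypothetical, and the ordering (coords, then p0, then lower bounds, then upper bounds) is an assumption based on the signature suggested above.

```python
def split_wrapper_args(args, n_coords, n_params):
    """Split flat positional arguments into (coords, p0, lb, ub) groups."""
    expected = n_coords + 3 * n_params
    if len(args) != expected:
        raise ValueError(f"expected {expected} arguments, got {len(args)}")
    coords = args[:n_coords]
    p0 = args[n_coords : n_coords + n_params]
    lb = args[n_coords + n_params : n_coords + 2 * n_params]
    ub = args[n_coords + 2 * n_params :]
    return coords, p0, lb, ub


# One coordinate and two parameters: 1 + 3 * 2 = 7 positional arguments.
flat = ("t", 1.0, 2.0, 0.0, 0.5, 10.0, 5.0)
coords, p0, lb, ub = split_wrapper_args(flat, n_coords=1, n_params=2)
print(coords, p0, lb, ub)
```

Each element is still passed (and broadcast) individually, so nothing changes on the `apply_ufunc` side; only the bookkeeping inside the wrapper is named.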

mgunyho · 2023-05-27

Okay, I found a way to do it: if we convert the outer list to an additional dimension for coords_ and for each of the params & bounds, and add that additional dimension to input_core_dims, then we can pass array-valued arguments to _wrapper with apply_ufunc. I'm not sure if it's more readable though.

You can see the implementation here (see here for the diff compared to this version, and here for compared to main (note: you have to click to the Files tab, I wasn't able to link straight to the line in the diff)).

It still needs some work:

I now use Dataset(...).to_array(dim_name) to do the broadcasting + adding a new dimension, but is this a good way to do it? Another way I could think of was something like concat([coord.expand_dims(new_dim=[i]) for i, coord in enumerate(coords_)], "new_dim"). Both options are not very easy to understand, and probably have quite a bit of overhead.

The dimension names "param" and "coord" are now hard-coded, so it will not work if the input array has these dimensions. Do you have a better suggestion? The output already has the hard-coded dimension "param", so maybe it's not so bad.

If you think it's worth pursuing this approach, I can add that commit to this branch and we can discuss further.

Illviljan · 2023-05-27

I think I prefer the current version in this PR.

I apologize for the noise.

mgunyho · 2023-05-27

No worries. I think the alternative version could be OK, if there was a more explicit / clearer way of saying "for each element in this list, broadcast them to be the same shape, and then concatenate them along a new temporary axis", other than Dataset(...).to_array(). But yeah neither option is clearly better.
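For comparison, the "broadcast everything to a common shape, then concatenate along a new temporary axis" operation is fairly direct in plain NumPy. The sketch below is only an unlabeled analogue with a hypothetical helper name; it broadcasts by position rather than by dimension name, so it does not replace what `Dataset(...).to_array()` does for `DataArray` inputs.

```python
import numpy as np


def stack_broadcast(values):
    """Broadcast scalars/arrays to a common shape and stack them on a new leading axis."""
    arrays = np.broadcast_arrays(*[np.asarray(v, dtype=float) for v in values])
    return np.stack(arrays, axis=0)


a = 1.0  # scalar
b = np.array([10.0, 20.0, 30.0])  # shape (3,)
c = np.array([[0.1], [0.2]])  # shape (2, 1)

stacked = stack_broadcast([a, b, c])
print(stacked.shape)  # (3, 2, 3): new axis first, then the broadcast shape (2, 3)
```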

xarray/core/dataset.py

Co-authored-by: Illviljan <[email protected]>

Illviljan · 2023-05-31T12:43:30Z

Thanks @mgunyho !

dcherian · 2023-06-01T15:45:38Z

xarray/tests/test_dataarray.py

@@ -4399,7 +4399,7 @@ def exp_decay(t, n0, tau=1):
            da = da.chunk({"x": 1})

        fit = da.curvefit(
-            coords=[da.t], func=exp_decay, p0={"n0": 4}, bounds={"tau": [2, 6]}
+            coords=[da.t], func=exp_decay, p0={"n0": 4}, bounds={"tau": (2, 6)}


Did the list have to be changed to a tuple? If so, that's backwards incompatible and might break someone's code.

mgunyho · 2023-06-01

I made this change just to pass mypy after I updated the type hints; the code will work with a list as well. This is consistent with scipy, which expects a tuple of arrays (or a Bounds object).

…ydata#7821)

* Add test for multidimensional initial guess to curvefit
* Pass initial guess with *args
* Update curvefit docstrings
* Add examples to curvefit
* Add test for error on invalid p0 coords
* Raise exception on invalid coordinates in initial guess
* Add link from polyfit to curvefit
* Update doc so it matches CI
* Formatting
* Add basic test for multidimensional bounds
* Add tests for curvefit_helpers with array-valued bounds
* First attempt at fixing initialize_curvefit_params, issues warnings
* Alternative implementation of bounds initialization using xr.where(), avoids warnings
* Pass also bounds as *args to _wrapper
* Raise exception on unexpected dimensions in bounds
* Update docstring of bounds
* Update bounds docstring in dataarray also
* Update type hints for curvefit p0 and bounds
* Change list to tuple to pass mypy
* Update whats-new
* Use tuples in error message
  Co-authored-by: Illviljan <[email protected]>
* Add type hints to test
  Co-authored-by: Illviljan <[email protected]>

---------

Co-authored-by: Illviljan <[email protected]>

mgunyho mentioned this pull request May 6, 2023

Supplying multidimensional initial guess to curvefit #7768

Closed

mgunyho changed the title ~~Implement multidimensional initial guess for curvefit~~ Implement multidimensional initial guess and bounds for curvefit May 7, 2023

mgunyho commented May 7, 2023

slevang reviewed May 15, 2023

mgunyho force-pushed the curvefit-multidimensional-guess branch 2 times, most recently from 295574e to 6e4aef7 Compare May 16, 2023 07:34

mgunyho force-pushed the curvefit-multidimensional-guess branch from 48a5b7e to 48cea82 Compare May 16, 2023 08:23

mgunyho added 17 commits May 20, 2023 10:51

Add test for multidimensional initial guess to curvefit

c384d57

Pass initial guess with *args

2d0ea8f

Update curvefit docstrings

04fc4dc

Add examples to curvefit

094759a

Add test for error on invalid p0 coords

e02b520

Raise exception on invalid coordinates in initial guess

9bea76c

Add link from polyfit to curvefit

959ccfd

Update doc so it matches CI

a0e6659

Formatting

e88d1f2

Add basic test for multidimensional bounds

fe30d91

Add tests for curvefit_helpers with array-valued bounds

78bf56f

First attempt at fixing initialize_curvefit_params, issues warnings

045eee7

Alternative implementation of bounds initialization using xr.where(), avoids warnings

6b315ef

Pass also bounds as *args to _wrapper

a468ab0

Raise exception on unexpected dimensions in bounds

d28241e

Update docstring of bounds

bc66406

Update bounds docstring in dataarray also

c863844

mgunyho added 3 commits May 20, 2023 10:55

Update type hints for curvefit p0 and bounds

d056da6

Change list to tuple to pass mypy

8181f2a

Update whats-new

ad3147a

mgunyho force-pushed the curvefit-multidimensional-guess branch from 48cea82 to ad3147a Compare May 20, 2023 08:02

Illviljan reviewed May 26, 2023

mgunyho and others added 2 commits May 27, 2023 09:57

Use tuples in error message

b1e7835

Co-authored-by: Illviljan <[email protected]>

Add type hints to test

d081ee6

Co-authored-by: Illviljan <[email protected]>

mgunyho force-pushed the curvefit-multidimensional-guess branch from 35ebb2a to d081ee6 Compare May 27, 2023 08:01

Illviljan approved these changes May 27, 2023

Illviljan added the plan to merge Final call for comments label May 27, 2023

Illviljan merged commit 9909f90 into pydata:main May 31, 2023

mgunyho deleted the curvefit-multidimensional-guess branch May 31, 2023 13:31

dcherian reviewed Jun 1, 2023

This was referenced Jun 4, 2023

Add errors option to curvefit #7891

Merged

Fix flaky doctest for curvefit #7893

Merged

mgunyho mentioned this pull request Aug 19, 2023

Consistently report all dimensions in error messages if invalid dimensions are given #8079

Merged
