Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement multidimensional initial guess and bounds for curvefit #7821

Merged
merged 22 commits into from
May 31, 2023

Conversation

mgunyho
Copy link
Contributor

@mgunyho mgunyho commented May 6, 2023

With this PR, it's possible to pass an initial guess to curvefit that is a DataArray, which will be broadcast to the data dimensions. This way, the initial guess can vary with the data coordinates.

I also added examples of using curvefit to the documentation, both a basic example and one with the multidimensional guess.

I have a couple of questions:

  • Should we change the signature to p0: dict[str, float | DataArray] | None, instead of dict[str, Any] (and same for bounds)? scipy only optimizes over scalars, so I think it would be safe to assume that the values should either be those, or arrays that can be broadcast.
  • The usage example of curvefit is only in the docstring for DataArray, so now the docs differ between DA and dataset. But the example uses a DataArray only, so this should be ok, right?

@mgunyho
Copy link
Contributor Author

mgunyho commented May 6, 2023

Hm the doctest failed because the result is off in the last decimal place. I can't reproduce it, even though I have the same versions of numpy 1.23.5 and scipy 1.10.1 in my env as what the CI says. Anyway, changed it in 3001eaf.

@mgunyho
Copy link
Contributor Author

mgunyho commented May 6, 2023

I just noticed that the docs for curvefit have some formatting issues, I think it's using single backticks instead of double backticks for code formatting. Should I add those to this PR as well?

@slevang
Copy link
Contributor

slevang commented May 6, 2023

This looks pretty good to me on first glance! I would vote to do p0 and bounds in one PR. It could surprise a user if one of these can take an array and the other only scalars.

@mgunyho mgunyho changed the title Implement multidimensional initial guess for curvefit Implement multidimensional initial guess and bounds for curvefit May 7, 2023
@@ -8696,13 +8735,21 @@ def _wrapper(Y, *coords_, **kwargs):
else:
name = f"{str(name)}_"

input_core_dims = [reduce_dims_ for _ in range(n_coords + 1)]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I factored this out of the function call because Black was suggesting something like

apply_ufunc(
    ...,
    input_core_dims=[reduce_dims_ for _ in range(n_coords + 1)]
    + [
        [] for _ in range(3 * n_params)
    ],  # core_dims for p0 and bounds
    ...
)

which I found quite ugly.

(lb + 1) * int(lb_finite & ~ub_finite),
(ub - 1) * int(~lb_finite & ub_finite),
]
p0 = where(
Copy link
Contributor Author

@mgunyho mgunyho May 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An alternative I considered here (see cd685ba) is

 p0 = np.nansum(
	 [
		 0.5 * (lb + ub) * (lb_finite & ub_finite),
		 (lb + 1) * (lb_finite & ~ub_finite),
		 (ub - 1) * (~lb_finite & ub_finite),
	 ],
	 axis=0,
 )

which is closer to the original, but it gave warnings about the multiplication (the original version also issues warnings if you remove the int() cast). Not sure which is clearer.

@mgunyho
Copy link
Contributor Author

mgunyho commented May 7, 2023

I implemented the broadcasting for bounds also. I hope it's not too ugly. Do you think the signature for p0 and bounds should be updated to explicitly allow only (tuples of) floats or DataArrays?

Copy link
Contributor

@slevang slevang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree that the nested where is a bit ugly and hard to wrap my head around, but it works and I don't immediately have a better suggestion. Tests look good. I think it's safe to narrow the arg typing as well.

@mgunyho mgunyho force-pushed the curvefit-multidimensional-guess branch 2 times, most recently from 295574e to 6e4aef7 Compare May 16, 2023 07:34
@mgunyho
Copy link
Contributor Author

mgunyho commented May 16, 2023

I updated the type hints now (and also did a rebase just in case).

@mgunyho mgunyho force-pushed the curvefit-multidimensional-guess branch from 48a5b7e to 48cea82 Compare May 16, 2023 08:23
@mgunyho mgunyho force-pushed the curvefit-multidimensional-guess branch from 48cea82 to ad3147a Compare May 20, 2023 08:02
xarray/tests/test_dataarray.py Outdated Show resolved Hide resolved
xarray/tests/test_dataarray.py Outdated Show resolved Hide resolved

def _wrapper(Y, *coords_, **kwargs):
def _wrapper(Y, *args, **kwargs):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def _wrapper(Y, *args, **kwargs):
def _wrapper(Y, coords: list[?], p0: list[?], lb: list[?], ub: list[?], **kwargs):

The argument handling below is hard to read and looks prone to error, is a more explicit version possible?
Might have to remove the * in apply_ufunc as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, but I'm not sure it's possible. The signature is apply_ufunc(func, *args, ...), which only broadcasts each element in *args, so it doesn't work if the element itself is a list, like in the case of the initial guess. If a, b and c are float | DataArray, then we can only broadcast them like f(a, b, c), and not like f([a, b, c]).

Let me know if you think the code could be made easier to follow somehow.

Copy link
Contributor Author

@mgunyho mgunyho May 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I found a way to do it: if we convert the outer list to an additional dimension for coords_ and for each of the params & bounds, and add that additional dimension to input_core_dims, then we can pass array-valued arguments to _wrapper with apply_ufunc. I'm not sure if it's more readable though.

You can see the implementation here (see here for the diff compared to this version, and here for compared to main (note: you have to click to the Files tab, I wasn't able to link straight to the line in the diff)).

It still needs some work:

  • I now use Dataset(...).to_array(dim_name) to do the broadcasting + adding a new dimension, but is this a good way to do it? Another way I could think of was something like concat([coord.expand_dims(new_dim=[i]) for i, coord in enumerate(coords_)], "new_dim"). Both options are not very easy to understand, and probably have quite a bit of overhead.
  • The dimension names "param" and "coord" are now hard-coded, so it will not work if the input array has these dimensions. Do you have a better suggestion? The output already has the hard-coded dimension "param", so maybe it's not so bad.

If you think it's worth pursuing this approach, I can add that commit to this branch and we can discuss further.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I prefer the current version in this PR.

I apologize for the noise.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No worries. I think the alternative version could be OK, if there was a more explicit / clearer way of saying "for each element in this list, broadcast them to be the same shape, and then concatenate them along a new temporary axis", other than Dataset(...).to_array(). But yeah neither option is clearly better.

xarray/core/dataset.py Outdated Show resolved Hide resolved
xarray/core/dataset.py Outdated Show resolved Hide resolved
mgunyho and others added 2 commits May 27, 2023 09:57
@mgunyho mgunyho force-pushed the curvefit-multidimensional-guess branch from 35ebb2a to d081ee6 Compare May 27, 2023 08:01
@Illviljan Illviljan added the plan to merge Final call for comments label May 27, 2023
@Illviljan Illviljan merged commit 9909f90 into pydata:main May 31, 2023
@Illviljan
Copy link
Contributor

Thanks @mgunyho !

@mgunyho mgunyho deleted the curvefit-multidimensional-guess branch May 31, 2023 13:31
@@ -4399,7 +4399,7 @@ def exp_decay(t, n0, tau=1):
da = da.chunk({"x": 1})

fit = da.curvefit(
coords=[da.t], func=exp_decay, p0={"n0": 4}, bounds={"tau": [2, 6]}
coords=[da.t], func=exp_decay, p0={"n0": 4}, bounds={"tau": (2, 6)}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did the list have to be changed to a tuple? If so, that's backwards incompatible and might break someone's code.

Copy link
Contributor Author

@mgunyho mgunyho Jun 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made this change just to pass mypy after I updated the type hints, the code will work with a list as well. This is consistent with scipy, which expects a tuple of arrays (or a Bounds object).

dstansby pushed a commit to dstansby/xarray that referenced this pull request Jun 28, 2023
…ydata#7821)

* Add test for multidimensional initial guess to curvefit

* Pass initial guess with *args

* Update curvefit docstrings

* Add examples to curvefit

* Add test for error on invalid p0 coords

* Raise exception on invalid coordinates in initial guess

* Add link from polyfit to curvefit

* Update doc so it matches CI

* Formatting

* Add basic test for multidimensional bounds

* Add tests for curvefit_helpers with array-valued bounds

* First attempt at fixing initialize_curvefit_params, issues warnings

* Alternative implementation of bounds initialization using xr.where(), avoids warnings

* Pass also bounds as *args to _wrapper

* Raise exception on unexpected dimensions in bounds

* Update docstring of bounds

* Update bounds docstring in dataarray also

* Update type hints for curvefit p0 and bounds

* Change list to tuple to pass mypy

* Update whats-new

* Use tuples in error message

Co-authored-by: Illviljan <[email protected]>

* Add type hints to test

Co-authored-by: Illviljan <[email protected]>

---------

Co-authored-by: Illviljan <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
plan to merge Final call for comments
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Supplying multidimensional initial guess to curvefit
4 participants