Add `DataIntegralProblem` #491

IlianPihlajamaa · 2023-09-11T15:19:31Z

codecov · 2023-09-11T21:02:30Z

Codecov Report

Merging #491 (1e960d1) into master (03cdaed) will decrease coverage by 23.98%.
Report is 7 commits behind head on master.
The diff coverage is 100.00%.

@@             Coverage Diff             @@
##           master     #491       +/-   ##
===========================================
- Coverage   57.27%   33.30%   -23.98%     
===========================================
  Files          50       50               
  Lines        3703     3681       -22     
===========================================
- Hits         2121     1226      -895     
- Misses       1582     2455      +873

Files Changed	Coverage Δ
src/SciMLBase.jl	`68.42% <ø> (-3.01%)`	⬇️
src/problems/basic_problems.jl	`87.71% <100.00%> (-0.75%)`	⬇️

... and 32 files with indirect coverage changes


IlianPihlajamaa · 2023-09-13T07:49:18Z

Also added 2 constructors:

function IntegralProblem(f, x::AbstractVector{<:Number}, args...; kwargs...)
    IntegralProblem{isinplace(f, 3)}(f, x, args...; kwargs...)
end
function IntegralProblem(y::AbstractArray, x::AbstractVector{<:Number}, args...; dim::Int=1, kwargs...)
    IntegralProblem{false}(y, x, args...; dim=Val(dim), kwargs...)
end

The first is for integrating a callable f over some set of sample points x, and the second is for integrating an Array y over some dimension dim. I chose to put x second in both of these (contrary to what is usual in many packages/languages) because this keeps it consistent with the IntegralProblem(f, lb, ub) ordering. Let me know if you agree.

IlianPihlajamaa · 2023-09-14T12:53:45Z

Oops, there was a small mistake. `test SciMLBase` now passes locally, sorry!

IlianPihlajamaa · 2023-09-15T09:34:18Z

There are still method ambiguities, but I don't know how to get rid of them...
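
For readers unfamiliar with the issue: the thread does not say which methods were ambiguous, so the following is only a generic sketch of how such ambiguities arise and one common way to resolve them. The modules `AmbiguityDemo` and `AmbiguityFixed`, the type `DemoProblem`, and the function `build` are hypothetical names and are not part of SciMLBase; `detect_ambiguities` is from Julia's `Test` standard library.

```julia
using Test  # standard library; provides detect_ambiguities

module AmbiguityDemo

# Hypothetical stand-in for a problem type (not the SciMLBase definition).
struct DemoProblem
    f
    x
end

# Neither method is more specific than the other:
# a call with (Vector, Vector{Float64}) matches both signatures.
build(f, x::AbstractVector{<:Number}) = DemoProblem(f, x)
build(y::AbstractArray, x) = DemoProblem(y, x)

end # module

# Collect the pairs of methods that are ambiguous with each other.
ambiguous = detect_ambiguities(AmbiguityDemo)

module AmbiguityFixed

struct DemoProblem
    f
    x
end

build(f, x::AbstractVector{<:Number}) = DemoProblem(f, x)
build(y::AbstractArray, x) = DemoProblem(y, x)
# A method covering the intersection of the two signatures removes the ambiguity.
build(y::AbstractArray, x::AbstractVector{<:Number}) = DemoProblem(y, x)

end # module

still_ambiguous = detect_ambiguities(AmbiguityFixed)
```

Whether adding an intersection method or restructuring the constructors is the better fix depends on the actual signatures involved.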

lxvm · 2023-09-16T14:58:19Z

If I may comment on this PR, I think adding a separate DataIntegralProblem would be preferable to modifying the existing IntegralProblem. I see two reasons for this:

1. These are distinct mathematical problems. An IntegralProblem is continuous: we are given a function f and a domain [lb, ub], and we may evaluate f wherever we please for any quadrature. A DataIntegralProblem is discrete: the values of x and f.(x) are the inputs. The difference is that in the first case, choosing the integration points and weights together carefully can produce high-order quadratures that are much more accurate than those available for unstructured grids (e.g. the trapezoidal rule). This is due to mathematical connections between integration and interpolation.
2. The algorithms currently used to solve IntegralProblems only accept functions with domains, not data, so none of the existing implementations for IntegralProblems could be reused for DataIntegralProblems. The current design of this PR, which extends IntegralProblem with more options, could lead to more confusion, errors, and a complicated autodiff implementation.
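
The accuracy gap described in the first point can be illustrated with a small self-contained sketch. It uses only Base Julia; `trapezoid` and `gauss3` are hypothetical helper names, not functions from SciMLBase or Integrals.jl. Both rules use three integrand values on [-1, 1], but only the Gauss-Legendre rule gets to choose where they are taken.

```julia
# Trapezoidal rule for already-sampled data (x need not be uniform).
function trapezoid(x::AbstractVector, y::AbstractVector)
    length(x) == length(y) || throw(DimensionMismatch("x and y must have equal length"))
    s = zero(float(promote_type(eltype(x), eltype(y))))
    for i in 1:length(x)-1
        s += (x[i+1] - x[i]) * (y[i] + y[i+1]) / 2
    end
    return s
end

# Three-point Gauss-Legendre rule on [-1, 1]: nodes and weights are chosen together.
function gauss3(f)
    nodes = (-sqrt(3 / 5), 0.0, sqrt(3 / 5))
    weights = (5 / 9, 8 / 9, 5 / 9)
    return sum(w * f(t) for (t, w) in zip(nodes, weights))
end

exact = exp(1) - exp(-1)      # integral of exp over [-1, 1]

x = [-1.0, 0.0, 1.0]          # fixed sample points, as in the sampled problem
err_sampled = abs(trapezoid(x, exp.(x)) - exact)
err_gauss = abs(gauss3(exp) - exact)
```

For this smooth integrand, `err_gauss` comes out several orders of magnitude smaller than `err_sampled`, which is the kind of advantage that is unavailable when only samples are given.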

I hope this is useful feedback and I am happy to see a pr like this because I think this feature would be very useful.

IlianPihlajamaa · 2023-09-16T16:57:11Z

Feedback is always welcome of course.
Personally, I have no strong preference on whether it should be separated in the API. Perhaps others could weigh in? (@ChrisRackauckas) From an implementation point of view, separate structs are definitely easier.

ChrisRackauckas · 2023-09-16T18:47:22Z

Separate it for the reasons @lxvm mentioned. When I last reviewed this PR it was separated to have a separate IntegralDataProblem for those reasons. What happened and why was it changed?

Indeed, overlapping them doesn't make sense because their dispatches need to be separate, along with most of their data, so it's not clear what would be gained by keeping them together. The AD passes need to be very different too, so I do not understand why it would be part of IntegralProblem. But again, I thought they were already separate in 07e5698?

ChrisRackauckas

Should split to a separate IntegralDataProblem like originally discussed.

IlianPihlajamaa · 2023-09-16T19:00:38Z

The reason I merged them into the same struct was because, for the case

solve(IntegralProblem(f::Function, x::AbstractVector), Trapezoidal())

There needed to be an x field in IntegralProblem anyway, so I thought it wasn't necessary to split the two. I hadn't considered the AD angle.

In hindsight:

- In the case of integrating a function, specifying the precise grid (as opposed to a given number of points) is probably never useful, so it isn't necessary to add this extra field.
- For integrating data, it is probably better anyway to have a separate API, as argued above.

So I will revert the changes (apart from the added tests) when I have time, reintroducing the DataIntegralProblem.

ChrisRackauckas · 2023-09-16T19:02:54Z

Thanks for the explanation! I'll wait for the revert to re-review.

IlianPihlajamaa · 2023-09-16T20:02:12Z

Do you prefer IntegralDataProblem or DataIntegralProblem? The latter sounds slightly better to me, but I'm not a native English speaker.

Anyway, it should be ready for review now.

lxvm · 2023-09-16T20:12:11Z

Perhaps SampledIntegralProblem? Both https://docs.scipy.org/doc/scipy/tutorial/integrate.html#integrating-using-samples and https://github.com/dextorious/NumericalIntegration.jl use the word 'sampling', so it might be more familiar to newcomers, and 'sampled' implies that the integrand has already been evaluated.

IlianPihlajamaa · 2023-09-16T20:40:40Z

SampledIntegralProblem

I like it.

ChrisRackauckas · 2023-09-16T20:44:14Z

👍 on SampledIntegralProblem

ChrisRackauckas · 2023-09-17T05:41:04Z

src/problems/basic_problems.jl

+  It is assumed that the values of `y` along dimension `dim` 
+  correspond to the integrand evaluated at sampling points `x`
+- x: Sampling points, must be a subtype of `AbstractVector`.   
+- dim: Dimension along which to integrate.


Is there a reason to leave it up to the user? I think that just adds more work in the implementation without a tangible benefit. The dimension should always be the last one, as otherwise you'll have a performance hit from column-major ordering, so we should just stick to that.

IlianPihlajamaa · 2023-09-17

I disagree with both (1) that the last dim is always faster and (2) that, even if it were, we should force all users to use it.

Here is a simple example:

function trapz(y, dim)
    solve(SampledIntegralProblem(y, axes(y, dim); dim = dim), TrapezoidalRule())
end

y = rand(5, 100000)

julia> @btime trapz(trapz($y, 2), 1)
  1.165 ms (8 allocations: 400 bytes)
u: 200010.59503616963

julia> @btime trapz(trapz($y, 1), 1)
  395.000 μs (9 allocations: 781.59 KiB)
u: 200010.59503616992

There are, of course, better ways to do a two-dimensional trapezoidal rule.

Doing numeric integration over some array isn't always the performance bottleneck of the algorithm. Typically, generating the data in the first place (by some experiment perhaps) takes more time. It would be annoying to force users to stick to a certain data layout to be able to do integration over it if it is just a small part of their workflow. This is also why other packages implement this feature, see SciPy or Trapz.jl for example.

Having said that, you are of course right that the last dimension is typically faster and should be the default, so I'll change that.
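
To make the role of `dim` concrete, here is a minimal Base-only sketch of a trapezoidal rule along a chosen dimension. `trapz_along` is a hypothetical helper written for illustration; it is not the implementation in SciMLBase or Integrals.jl. Its default, `dim = ndims(y)`, mirrors the default this PR settles on.

```julia
# Trapezoidal rule over dimension `dim` of `y`, with sample points `x`.
function trapz_along(y::AbstractArray, x::AbstractVector; dim::Int = ndims(y))
    size(y, dim) == length(x) ||
        throw(DimensionMismatch("length(x) must equal size(y, dim)"))
    # Shape of the result: the shape of `y` with dimension `dim` dropped.
    out_shape = ntuple(i -> size(y, i < dim ? i : i + 1), ndims(y) - 1)
    out = zeros(float(eltype(y)), out_shape)
    for i in 1:length(x)-1
        h = x[i+1] - x[i]
        # selectdim returns a view of the slice at index i along `dim`.
        out .+= h .* (selectdim(y, dim, i) .+ selectdim(y, dim, i + 1)) ./ 2
    end
    return out
end

# y[i, j] = i + j is linear in both indices, so the trapezoidal rule is exact here.
y = [i + j for i in 1:3, j in 1:4]

over_last = trapz_along(y, 1:4)            # integrates along dimension 2
over_first = trapz_along(y, 1:3; dim = 1)  # integrates along dimension 1

# Integrating the remaining dimension gives the same total in either order.
total_a = trapz_along(over_last, 1:3)[]
total_b = trapz_along(over_first, 1:4)[]
```

Which order is faster depends on the array shape and memory layout, as the benchmark above suggests, so this sketch makes no performance claim.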

ChrisRackauckas · 2023-09-17T13:53:29Z

I'm a bit wary of the dim part, but let's see how this goes.

IlianPihlajamaa added 3 commits September 11, 2023 17:18

add IntegralDataProblem

07e5698

fix mistake

181847d

another small mistake

2fcfbb5

ChrisRackauckas approved these changes Sep 11, 2023

IlianPihlajamaa added 2 commits September 13, 2023 09:44

removed IntegralDataProblem, added x and dim field to `IntegralProblem`

dfd917a

Stop exporting IntegralDataProblem

e60e921

IlianPihlajamaa changed the title ~~add IntegralDataProblem~~ add x and dim field to IntegralProblem Sep 13, 2023

IlianPihlajamaa added 2 commits September 14, 2023 14:18

fix typo in IntegralProblem

a92e7f7

removed one unnecessary inner constructor

af4cfb8

IlianPihlajamaa added 4 commits September 15, 2023 09:43

remove method ambiguity

caedc7c

add problem building tests

7cee282

ran the format_document.jl thing in VScode

6b21a0d

Update basic_problems.jl

09561b6

lxvm mentioned this pull request Sep 16, 2023

add Trapezoidal rule SciML/Integrals.jl#173

Merged

4 tasks

ChrisRackauckas requested changes Sep 16, 2023

IlianPihlajamaa added 2 commits September 16, 2023 21:52

revert to DataIntegralProblem

cdc3663

add additional assert

d3619db

IlianPihlajamaa changed the title ~~add x and dim field to IntegralProblem~~ Add DataIntegralProblem Sep 16, 2023

rename DataIntegralProblem to SampledIntegralProblem

124d3be

IlianPihlajamaa requested a review from ChrisRackauckas September 16, 2023 20:45

ChrisRackauckas reviewed Sep 17, 2023

change default dimension to ndims(y)

1e960d1

ChrisRackauckas merged commit 690f6f0 into SciML:master Sep 17, 2023
65 of 71 checks passed
