diff --git a/docs/refs.rst b/docs/refs.rst index e390a4a86..bf13e6d9a 100644 --- a/docs/refs.rst +++ b/docs/refs.rst @@ -6,7 +6,7 @@ References Please cite the following two papers if you use this software: [FIN] -A parallel non-uniform fast Fourier transform library based on an ``exponential of semicircle'' kernel. +A parallel non-uniform fast Fourier transform library based on an "exponential of semicircle" kernel. A. H. Barnett, J. F. Magland, and L. af Klinteberg. SIAM J. Sci. Comput. 41(5), C479-C504 (2019). `arxiv version `_ diff --git a/docs/trouble.rst b/docs/trouble.rst index 61daa5859..0b11ed764 100644 --- a/docs/trouble.rst +++ b/docs/trouble.rst @@ -40,7 +40,7 @@ If FINUFFT is slow (eg, less than $10^6$ nonuniform points per second), here is - Try printing debug output to see step-by-step progress by FINUFFT. Do this by setting ``opts.debug`` to 1 or 2 then looking at the timing information. -- Try reducing the number of threads, either those available via OpenMP, or via ``opts.nthreads``, perhaps down to 1 thread, to make sure you are not having collisions between threads, or slowdown due to thread overheads. Hyperthreading (more threads than physical cores) rarely helps much. Thread collisions are possible if large problems are run with a large number of (say more than 64) threads. Another ase causing slowness is very many repetitions of small problems; see ``test/manysmallprobs`` which exceeds $10^7$ points/sec with one thread via the guru interface, but can get ridiculously slower with many threads; see https://github.com/flatironinstitute/finufft/issues/86 +- Try reducing the number of threads, either those available via OpenMP, or via ``opts.nthreads``, perhaps down to 1 thread, to make sure you are not having collisions between threads, or slowdown due to thread overheads. Hyperthreading (more threads than physical cores) rarely helps much. Thread collisions are possible if large problems are run with a large number of (say more than 64) threads. Another case causing slowness is very many repetitions of small problems; see ``test/manysmallprobs`` which exceeds $10^7$ points/sec with one thread via the guru interface, but can get ridiculously slower with many threads; see https://github.com/flatironinstitute/finufft/issues/86 - Try setting a crude tolerance, eg ``tol=1e-3``. How many digits do you actually need? This has a big effect in higher dimensions, since the number of flops scales like $(\log 1/\epsilon)^d$, but not quite as big an effect as this scaling would suggest, because in higher dimensions the flops/RAM ratio is higher. diff --git a/finufft-manual.pdf b/finufft-manual.pdf index 80dffd4f6..60d362b59 100644 Binary files a/finufft-manual.pdf and b/finufft-manual.pdf differ diff --git a/matlab/finufft_plan.m b/matlab/finufft_plan.m index 9ad4b14d7..1bc45f1c3 100644 --- a/matlab/finufft_plan.m +++ b/matlab/finufft_plan.m @@ -1,4 +1,152 @@ - +% FINUFFT_PLAN is a class which wraps the guru interface to FINUFFT. +% +% Full documentation is given in ../finufft-manual.pdf and online at +% http://finufft.readthedocs.io +% Also see examples in the matlab/examples and matlab/test directories. +% +% PROPERTIES +% mwptr - opaque pointer to a C++ finufft_plan object (see MWrap manual), +% whose properties cannot be accessed directly +% floatprec - either 'double' or 'single', tracks what precision of C++ +% library is being called +% type, dim, n_modes, n_trans, nj, nk - other plan parameters +% Note: the user should never alter these plan properties directly! Rather, +% the below methods should be used to create, use, and destroy plans. +% +% METHODS +% finufft_plan - create guru plan object for one/many general nonuniform FFTs. +% setpts - process nonuniform points for general FINUFFT transform(s). +% execute - execute single or many-vector FINUFFT transforms in a plan. +% +% General notes: +% * use delete(plan) to remove a plan after use. +% * See ERRHANDLER, VALID_*, and this code for warning/error IDs. +% +% +% +% =========== Detailed description of guru methods ========================== +% +% 1) FINUFFT_PLAN create guru plan object for one/many general nonuniform FFTs. +% +% plan = finufft_plan(type, n_modes_or_dim, isign, ntrans, eps) +% plan = finufft_plan(type, n_modes_or_dim, isign, ntrans, eps, opts) +% +% Creates a finufft_plan MATLAB object in the guru interface to FINUFFT, of +% type 1, 2 or 3, and with given numbers of Fourier modes (unless type 3). +% +% Inputs: +% type transform type: 1, 2, or 3 +% n_modes_or_dim if type is 1 or 2, the number of Fourier modes in each +% dimension: [ms] in 1D, [ms mt] in 2D, or [ms mt mu] in 3D. +% Its length sets the dimension, which must be 1, 2 or 3. +% If type is 3, in contrast, its *value* fixes the dimension +% isign if >=0, uses + sign in exponential, otherwise - sign. +% eps relative precision requested (generally between 1e-15 and 1e-1) +% opts optional struct with optional fields controlling the following: +% opts.debug: 0 (silent, default), 1 (timing breakdown), 2 (debug info). +% opts.spread_debug: spreader: 0 (no text, default), 1 (some), or 2 (lots) +% opts.spread_sort: 0 (don't sort NU pts), 1 (do), 2 (auto, default) +% opts.spread_kerevalmeth: 0: exp(sqrt()), 1: Horner ppval (faster) +% opts.spread_kerpad: (iff kerevalmeth=0) 0: don't pad to mult of 4, 1: do +% opts.fftw: FFTW plan mode, 64=FFTW_ESTIMATE (default), 0=FFTW_MEASURE, etc +% opts.upsampfac: sigma. 2.0 (default), or 1.25 (low RAM, smaller FFT) +% opts.spread_thread: for ntrans>1 only. 0:auto, 1:seq multi, 2:par, etc +% opts.maxbatchsize: for ntrans>1 only. max blocking size, or 0 for auto. +% opts.nthreads: number of threads, or 0: use all available (default) +% opts.floatprec: library precision to use, 'double' (default) or 'single'. +% for type 1 and 2 only, the following opts fields are also relevant: +% opts.modeord: 0 (CMCL increasing mode ordering, default), 1 (FFT ordering) +% opts.chkbnds: 0 (don't check NU points valid), 1 (do, default) +% Outputs: +% plan finufft_plan object (opaque pointer) +% +% Notes: +% * For type 1 and 2, this does the FFTW planning and kernel-FT precomputation. +% * For type 3, this does very little, since the FFT sizes are not yet known. +% * Be default all threads are planned; control how many with opts.nthreads. +% * The vectorized (many vector) plan, ie ntrans>1, can be much faster +% than repeated calls with the same nonuniform points. Note that here the I/O +% data ordering is stacked rather than interleaved. See ../docs/matlab.rst +% * For more details about the opts fields, see ../docs/opts.rst +% +% +% 2) SETPTS process nonuniform points for general FINUFFT transform(s). +% +% plan.setpts(xj) +% plan.setpts(xj, yj) +% plan.setpts(xj, yj, zj) +% plan.setpts(xj, [], [], s) +% plan.setpts(xj, yj, [], s, t) +% plan.setpts(xj, yj, zj, s, t, u) +% +% When plan is a finufft_plan MATLAB object, brings in nonuniform point +% coordinates (xj,yj,zj), and additionally in the type 3 case, nonuniform +% frequency target points (s,t,u). Empty arrays may be passed in the case of +% unused dimensions. For all types, sorting is done to internally store a +% reindexing of points, and for type 3 the spreading and FFTs are planned. +% The nonuniform points may be used for multiple transforms. +% +% Inputs: +% xj vector of x-coords of all nonuniform points +% yj empty (if dim<2), or vector of y-coords of all nonuniform points +% zj empty (if dim<3), or vector of z-coords of all nonuniform points +% s vector of x-coords of all nonuniform frequency targets +% t empty (if dim<2), or vector of y-coords of all frequency targets +% u empty (if dim<3), or vector of z-coords of all frequency targets +% Input/Outputs: +% plan finufft_plan object +% +% Notes: +% * For type 1 and 2, the values in xj (and if nonempty, yj and zj) must +% lie in the interval [-3pi,3pi). For type 1 they are "sources", but for +% type 2, "targets". In contrast, for type 3 there are no restrictions other +% than the resulting size of the internal fine grids. +% * s (and t and u) are only relevant for type 3, and may be omitted otherwise +% * The matlab vectors xj,... and s,... should not be changed before calling +% future execute calls, because the plan stores only pointers to the +% arrays (they are not duplicated internally). +% * The precision (double/single) of all inputs must match that chosen at the +% plan stage using opts.floatprec, otherwise an error is raised. +% +% +% 3) EXECUTE execute single or many-vector FINUFFT transforms in a plan. +% +% result = plan.execute(data_in); +% +% For plan a previously created finufft_plan object also containing all +% needed nonuniform point coordinates, do a single (or if ntrans>1 in the +% plan stage, multiple) NUFFT transform(s), with the strengths or Fourier +% coefficient inputs vector(s) from data_in. The result of the transform(s) +% is returned as a (possibly multidimensional) array. +% +% Inputs: +% plan finufft_plan object +% data_in strengths (types 1 or 3) or Fourier coefficients (type 2) +% vector, matrix, or array of appropriate size. For type 1 and 3, +% this is either a length-M vector (where M is the length of xj), +% or an (M,ntrans) matrix when ntrans>1. For type 2, in 1D this is +% length-ms, in 2D size (ms,mt), or in 3D size (ms,mt,mu), or +% each of these with an extra last dimension ntrans if ntrans>1. +% Outputs: +% result vector of output strengths at targets (types 2 or 3), or array +% of Fourier coefficients (type 1), or, if ntrans>1, a stack of +% such vectors or arrays, of appropriate size. +% Specifically, if ntrans=1, for type 1, in 1D +% this is a length-ms column vector, in 2D a matrix of size +% (ms,mt), or in 3D an array of size (ms,mt,mu); for types 2 and 3 +% it is a column vector of length M (the length of xj in type 2), +% or nk (the length of s in type 3). If ntrans>1 its is a stack +% of such objects, ie, it has an extra last dimension ntrans. +% +% Notes: +% * The precision (double/single) of all inputs must match that chosen at the +% plan stage using opts.floatprec, otherwise an error is raised. +% +% +% 4) To deallocate (delete) a nonuniform FFT plan, use delete(plan) +% +% This deallocates all stored FFTW plans, nonuniform point sorting arrays, +% kernel Fourier transforms arrays, etc. classdef finufft_plan < handle properties