Merge branch 'flatironinstitute:master' into cpython

flatironinstitute · Apr 17, 2024 · 55889d4 · 55889d4
2 parents 9d9d3bc + 9744d14
commit 55889d4
Show file tree

Hide file tree

Showing 14 changed files with 119 additions and 66 deletions.
diff --git a/.github/workflows/python_wheel.yml b/.github/workflows/python_wheel.yml
@@ -131,7 +131,7 @@ jobs:
         /Library/Frameworks/Python.framework/Versions/3.12/bin/python3 -m pip wheel python/finufft -w wheelhouse
 
         PYTHON_BIN=/Library/Frameworks/Python.framework/Versions/3.12/bin/
-        $PYTHON_BIN/python3 -m pip install delocate
+        $PYTHON_BIN/python3 -m pip install delocate==0.10.7
         ls wheelhouse/finufft*.whl | xargs -n1 $PYTHON_BIN/delocate-wheel -w fixed_wheel/
         /Library/Frameworks/Python.framework/Versions/3.6/bin/python3 -m pip install --pre finufft -f fixed_wheel/
         /Library/Frameworks/Python.framework/Versions/3.6/bin/python3 python/finufft/test/run_accuracy_tests.py

diff --git a/CHANGELOG b/CHANGELOG
@@ -1,6 +1,10 @@
 List of features / changes made / release notes, in reverse chronological order.
 If not stated, FINUFFT is assumed (cuFINUFFT <=1.3 is listed separately).
 
+* new docs troubleshooting accuracy limitations due to condition number of the
+  NUFFT problem.
+* new sanity check on nj and nk (<0 or too big); new err code, tester, doc.
+* MAX_NF increased from 1e11 to 1e12, since machines grow.
 * improved GPU python docs: migration guide; usage from cupy, numba, torch,
   pycuda. PyPI pkg still at 2.2.0beta.
 

diff --git a/CMakeLists.txt b/CMakeLists.txt
@@ -45,6 +45,11 @@ if(FINUFFT_USE_CPU)
         add_definitions(-D_CRT_SECURE_NO_WARNINGS)
     endif()
 
+    # make apple with gnu use old linker, new linker breaks, see issue #360
+    if((APPLE) AND (CMAKE_CXX_COMPILER_ID STREQUAL "GNU"))
+        add_link_options("-ld64")
+    endif()
+
     set(CPM_DOWNLOAD_VERSION 0.38.0)
     include(cmake/setupCPM.cmake)
 

diff --git a/README.md b/README.md
@@ -17,7 +17,7 @@ see `docs/ackn.rst` for full list of contributors.
 
 This is a lightweight CPU library to compute the three standard types of nonuniform FFT to a specified precision, in one, two, or three dimensions. It is written in C++ with interfaces to C, Fortran, MATLAB/octave, Python, and (in a separate [repository](https://github.com/ludvigak/FINUFFT.jl)) Julia. It now also integrates the GPU CUDA library cuFINUFFT (which currently does all but type 3).
 
-Please see the [online documentation](http://finufft.readthedocs.io/en/latest/index.html) which can also be downloaded as a [PDF manual](https://finufft.readthedocs.io/_/downloads/en/latest/pdf/).
+Please see the [online documentation](http://finufft.readthedocs.io/en/latest/index.html) which can also be downloaded as a [PDF manual](https://finufft.readthedocs.io/_/downloads/en/latest/pdf/), and a [project overview](https://users.flatironinstitute.org/~ahb/notes/finufft-project-summary-2023.pdf).
 You will also want to see CPU example codes in the directories `examples`, `test`, `fortran`, `matlab/test`, `matlab/examples`, `python/finufft/test`, etc, and GPU examples in `examples/cuda`, `test/cuda`, etc.
 
 If you cannot build via cMake, try the makefile. Python users try `pip install finufft`. See the docs for details. See our GitHub Issues for tips.

diff --git a/docs/error.rst b/docs/error.rst
@@ -5,12 +5,12 @@ Error (status) codes
 
 In all FINUFFT interfaces, the returned value ``ier`` is a status indicator.
 It is ``0`` if successful, otherwise the error code
-has the following meanings (see ``include/finufft_errors.h``):
+has the following meanings (see codes in ``include/finufft_errors.h``):
 
 ::
 
   1  requested tolerance epsilon too small to achieve (warning only)
-  2  attemped to allocate internal array larger than MAX_NF (defined in defs.h)
+  2  stopped due to needing internal array size >MAX_NF (defined in defs.h)
   3  spreader: fine grid too small compared to spread (kernel) width
   4  spreader: if chkbnds=1, a nonuniform point coordinate is out of input range [-3pi,3pi]^d
   5  spreader: array allocation error
@@ -19,21 +19,22 @@ has the following meanings (see ``include/finufft_errors.h``):
   8  upsampfac not a value with known Horner poly eval rule (currently 2.0 or 1.25 only)
   9  ntrans not valid in "many" (vectorized) or guru interface (should be >= 1)
   10 transform type invalid
-  11 general allocation failure
+  11 general internal allocation failure
   12 dimension invalid
   13 spread_thread option invalid
   14 invalid mode array (more than ~2^31 modes, dimension with 0 modes, etc)
-  15 cuda failure (failure to call any cuda function/kernel)
+  15 CUDA failure (failure to call any cuda function/kernel, malloc/memset, etc))
   16 attempt to destroy an uninitialized plan
   17 invalid spread/interp method for dim (attempt to blockgather in 1D, e.g.)
   18 size of bins for subprob/blockgather invalid
   19 GPU shmem too small for subprob/blockgather parameters
+  20 invalid number of nonuniform points: nj or nk negative, or too big (see defs.h)
 
 When ``ier=1`` (warning only) the transform(s) is/are still completed, at the smallest epsilon achievable, so, with that caveat, the answer should still be usable.
 
 For any other nonzero values of ``ier`` the transform may not have been performed and the output should not be trusted. However, we hope that the value of ``ier`` will help to narrow down the problem.
 
-FINUFFT sometimes also sends error text to ``stderr`` if it detects faulty input parameters.
+FINUFFT sometimes also sends error text to ``stderr`` if it detects faulty input parameters. Please check your terminal output.
 
 If you are getting error codes, please reread the documentation
 for your language, then see our :ref:`troubleshooting advice <trouble>`.
@@ -45,11 +46,15 @@ Large internal arrays
 In case your input parameters demand the allocation of very large arrays, an
 internal check is done to see if their size exceeds a rather generous internal
 limit, set in ``defs.h`` as ``MAX_NF``. The current value in the source code is
-``1e11``, which corresponds to about 1TB for double precision.
+``1e12``, which corresponds to about 10TB for double precision.
 Allocations beyond this cause a graceful exit with error code ``2`` as above.
 Such a large allocation can be due to enormous ``N`` (in types 1 or 2), or ``M``,
 but also large values of the space-bandwidth product (loosely, range of :math:`\mathbf{x}_j` points times range of :math:`\mathbf{k}_j` points) for type 3 transforms; see Remark 5 in :ref:`reference FIN <refs>`.
 Note that mallocs smaller than this, but which still exceed available RAM, may cause segfaults as usual. For simplicity of code, we do not do error checking on every malloc or STL vector creation in the code, and neither is this recommended in modern style guides.
 If you have a large-RAM machine and want to exceed the above hard-coded limit, you will need
 to edit ``defs.h`` and recompile.
 
+Similar sanity checks are done on the numbers of nonuniform points, and it is
+(barely) conceivable that you could want to
+increase ``MAX_NU_PTS`` beyond its current value
+of ``1e14`` in ``defs.h``, and recompile.
diff --git a/docs/opts.rst b/docs/opts.rst
@@ -115,7 +115,7 @@ Diagnostic options
 
   * ``spread_debug=1`` : prints some timing information
 
-  * ``spread_debug=1`` : prints lots. This can print thousands of lines since it includes one line per *subproblem*.
+  * ``spread_debug=2`` : prints lots. This can print thousands of lines since it includes one line per *subproblem*.
 
 
 **showwarn**: Whether to print warnings (these go to stderr).
@@ -157,7 +157,7 @@ There is thus little reason for the nonexpert to mess with this option.
 
 * ``spread_kerpad=0`` : do not pad
 
-* ``spread_kerpad=0`` : pad to next multiple of four
+* ``spread_kerpad=1`` : pad to next multiple of four
 
 
 **upsampfac**: This is the internal real factor by which the FFT (fine grid)

diff --git a/docs/overview.src b/docs/overview.src
@@ -24,7 +24,7 @@ It has been developed since 2017 at the `Center for Computational Mathematics
 <https://www.simonsfoundation.org/flatiron/center-for-computational-mathematics/>`_ at the `Flatiron Institute <https://www.simonsfoundation.org/flatiron>`_,
 by :ref:`Alex Barnett and others<ackn>`,
 and is released under an
-`Apache v2 license <https://github.com/flatironinstitute/finufft/blob/master/LICENSE>`_.
+`Apache v2 license <https://github.com/flatironinstitute/finufft/blob/master/LICENSE>`_. Here is a `project overview <https://users.flatironinstitute.org/~ahb/notes/finufft-project-summary-2023.pdf>`_.
 
 What does FINUFFT do?
 -----------------------

diff --git a/docs/trouble.rst b/docs/trouble.rst
@@ -11,28 +11,26 @@ similar GitHub `Issues <https://github.com/flatironinstitute/finufft/issues?q=is
 If that fails, post an Issue. The lead developer may also be contacted at [email protected]
 
 
-Mathematical "issues" and advice
-********************************
-
-- When requested tolerance is around $10^{-14}$ or less in double-precision,
-  or $10^{-6}$ or less in single-precision, it
-  will most likely be impossible for FINUFFT (or any other NUFFT library)
-  to achieve this, due to inevitable round-off error.
-  Here, "error" is to be understood relative to the norm of the returned vector
-  of values.
-  This is especially true when there is a large number of modes in
-  any single dimension ($N_1$, $N_2$ or $N_3$), since this empirically
-  scales the round-off error (fortunately, round-off does not appear to scale
-  with the total $N$ or $M$).
-  Such round-off error is analysed and measured in Section 4.2 of our `SISC paper <https://arxiv.org/abs/1808.06736>`_.
-
-- If you request a tolerance that FINUFFT knows it cannot achieve, it will return ``ier=1`` after performing transforms as accurately as it can. However, the status ``ier=0`` does not imply that the requested accuracy *was* achieved, merely that parameters were chosen to give this estimated accuracy, if possible. As our SISC paper shows, for typical situations, relative $\ell_2$ errors match the requested tolerances over a wide range.
-  Users should always check *convergence* (by, for instance, varying ``tol`` and measuring any changes in their results); this is generally true in scientific computing.
-
-- On the above topic: while the tolerance is usually a good upper bound for actual accuracy, strangely, in single-precision requesting tolerance
-  of $10^{-7}$ or $10^{-6}$ can give slightly *worse* accuracy than $10^{-5}$ or $10^{-4}$. We are looking into this. It is usually best to request at least 2--3 digits above the respective machine precision, for either single or double precision. Ie, do not set tolerance below $10^{-5}$ in single or $10^{-13}$ in double.
-
-- The type 1 and type 2 transforms are adjoints but **not inverses of each other** (unlike in the plain FFT case, where, up to a constant factor $N$, the adjoint is the inverse). Therefore, if you are not getting the expected answers, please check that you have not made this assumption. In the :ref:`tutorials <tut>` we will add examples showing how to invert the NUFFT; also see `NFFT3 inverse transforms <https://www-user.tu-chemnitz.de/~potts/nfft/infft.php>`_.
+Mathematical "issues" and error size
+************************************
+
+- The type 1 and type 2 transforms are adjoints but **not inverses of each other** (unlike in the plain FFT case, where, up to a constant factor $N$, the adjoint is the inverse). Thus you cannot "undo" one transform by applying the other! (Unless the nonuniform points are precisely regularly spaced, and then you should instead be using the plain FFT). Therefore, if you are not getting the expected answers, please check that you have not made this assumption. In the :ref:`tutorials <tut>` we have an example showing how to invert the NUFFT; also see `NFFT3 inverse transforms <https://www-user.tu-chemnitz.de/~potts/nfft/infft.php>`_.
+
+- If you request a tolerance ``tol`` that FINUFFT knows it cannot achieve, it will return the warning code ``ier=1`` after performing transforms as accurately as it can. Conversely, the status ``ier=0`` does not imply that the requested accuracy *was* achieved, merely that parameters were chosen to attempt this estimated accuracy. As our SISC paper shows, for typical situations, relative $\ell_2$ errors match the requested tolerances over a wide range, barring the caveats below. Users should always check *convergence* (by, for instance, varying ``tol`` and measuring any changes in their results); this is of course generally true in scientific computing.
+
+- When requested tolerance ``tol`` is around $10^{-14}$ or less in double-precision, or $10^{-6}$ or less in single-precision, it will most likely be impossible for FINUFFT (or any other NUFFT library) to achieve this, due to inevitable round-off error. The next point goes into this in more detail.
+
+- The **condition number of the problem** may be the factor limiting the accuracy in your application. In this case, *no NUFFT algorithm nor code could, even in principle, compute it more accurately* given the working precision of the input data! This applies particularly to 1D transforms with large $N$ (number of modes), say $N\ge 10^5$, and especially in single precision. Recall that machine error $\epsilon_{mach}$ is around ``6e-8`` in single and ``1e-16`` in double precision. The rule of thumb here is that one cannot demand that NUFFT relative output errors be smaller than $N_{max} \epsilon_{mach}$, where $N_{max}$ is the largest of the mode sizes $N_1,\dots,N_d$ in the $d$-dimensional case. This applies in the $\ell_2$ or maximum norms. No such dependence on $M$ occurs. In type 3 transforms, the $N_{max}$ should be replaced by the space-bandwidth product (maximum $x$ spread times maximum $k$ spread) in any dimension.
+
+  Let us explain why, recalling the definition :eq:`1d1`. The simple reason is that $(d/dx_j) e^{ikx_j} = ik e^{ikx_j}$, so that the Jacobian derivative of the outputs of a type 1 or type 2 NUFFT, with respect to variation of the input locations $x_j$, grows like the mode index $k$. The magnitude of the latter can be as large as $N/2$, i.e., $O(N)$, or $O(N_{max})$ in the multi-dimensional case. Since the inputs $x_j$ inevitably have a rounding error of $\epsilon_{mach}$, this gets amplified by the above factor to give a lower bound on the error of even the best (most stable) algorithm for the NUFFT.
+
+  In contrast the DFT (e.g., as computed by the FFT) has no growth in condition number vs $N$ (transform size), because it is (up to scaling) an isometry. The crucial difference for the NUFFT is the presence of the new input type (nonuniform point locations), to which it can be highly sensitive.
+
+  For background on condition number of a problem, please read Ch. 12-15 of *Numerical Linear Algebra* by Trefethen and Bau (SIAM, 1997).
+
+  NUFFT error is analysed and measured in Section 4.2 of our `SISC paper <https://arxiv.org/abs/1808.06736>`_, although in that work we omitted attributing the round-off error to the condition number of the problem.
+
+- Finally, re error: while the tolerance ``tol`` is usually a good upper bound for observed accuracy (barring the $\epsilon_{mach}N_{max}$ error above), strangely, in single-precision requesting tolerance of $10^{-7}$ or $10^{-6}$ can give slightly *worse* accuracy than $10^{-5}$ or $10^{-4}$. We are looking into why. It is usually best to request at least 2--3 digits less accurate than the respective machine precision, for either single or double precision. Ie, do not set ``tol`` below $10^{-5}$ in single or $10^{-13}$ in double.
 
 
 Speed issues and advice
@@ -120,9 +118,11 @@ If cuFINUFFT is slow (eg, less than $10^8$ nonuniform points per second), here i
 Crash (segfault) issues and advice
 ****************************************
 
-- The most common problem is passing in pointers to the wrong size of object, eg, single vs double precision, or int32 vs int64. The library includes both precisions, so make sure you are calling the correct one (commands begin ``finufft`` for double, ``finufftf`` for single).
+- Are you using ``int64`` (``integer*8``) types for sizes ``M``, ``N``, etc? (If you have warnings switched off, you may not notice this until execution.)
+
+- Are you passing in pointers to the wrong size of object, eg, single vs double precision? The library includes both precisions, so make sure you are calling the correct one (commands begin ``finufft`` for double, ``finufftf`` for single).
 
-- If you use C++/C/Fortran and tried to change options, did you forget to call ``finufft_default_opts`` first?
+- If you use C++/C/Fortran and changed the options struct values, did you forget to call ``finufft_default_opts`` first?
 
 - Maybe you have switched off nonuniform point bounds checking (``opts.chkbnds=0``) for a little extra speed? Try switching it on again to catch illegal coordinates.
 

diff --git a/docs/users.rst b/docs/users.rst
@@ -94,17 +94,21 @@ For the latest see: Google Scholar `FINUFFT citations <https://scholar.google.co
 
 #. Sriramkrishnan Muralikrishnan at the Jülich Supercomputing Centre is running cufinufft on 6144 A100 GPUs (the NERSC-9 supercomputer), for a particle-in-Fourier method for plasma simulations. https://pasc23.pasc-conference.org/presentation/?id=msa167&sess=sess154
 
+#. Related to that, FINUFFT is being used for a better-converging Fourier approach to the Immersed Boundary method of Peskin and his group at NYU. Zhe Chen and Charles Peskin, https://arxiv.org/abs/2302.08694
+
 #. Pei R, Askham T, Greengard L, Jiang S (2023). "A fast method for imposing periodic boundary conditions on arbitrarily-shaped lattices in two dimensions." J. Comput. Phys. 474, 111792. https://doi.org/10.1016/j.jcp.2022.111792 Uses FINUFFT for plane wave sums.
 
 #. Dylan Green, JR Jamora, and Anne Gelb (2023). "Leveraging joint sparsity in 3D synthetic aperture radar imaging," Appl. Math. Modern Chall. 1, 61-86. https://doi.org/10.3934/ammc.2023005 Uses 3D transforms between $N=201^3$ modes (voxels) and $M=313300$ data points. As they state, "...the computational cost of each method heavily depends on the NUFFT algorithm used."
 
 
-Papers or codes using our new ES window (spreading) function but not the whole FINUFFT package:
+Papers or codes using our new ES window (kernel spreading) function, but not the whole FINUFFT package:
 
 1. Davood Shamshirgar and Anna-Karin Tornberg, "Fast Ewald summation for electrostatic potentials with arbitrary periodicity", exploit our "Barnett-Magland" (BM), aka exp-sqrt (ES) window function. https://arxiv.org/abs/1712.04732
 
 #. Martin Reinecke: codes for radio astronomy reconstruction including https://gitlab.mpcdf.mpg.de/mtr/ducc
 
+#. Aref Hashemi et al, "Computing hydrodynamic interactions in confined doubly-periodic geometries in linear time," J. Chem. Phys. 158(15): 154101 (2023).
+DOI:10.1063/5.0141371.  https://arxiv.org/abs/2210.01837
 
 
 Papers influenced by other aspects of FINUFFT:

diff --git a/include/finufft/defs.h b/include/finufft/defs.h
@@ -55,10 +55,13 @@
 #define MAX_NQUAD 100
 
 // Internal (nf1 etc) array allocation size that immediately raises error.
-// (Note: next235 takes 1s for this size, so it is also to prevent hang here.)
-// Increase this if you need >1TB RAM... (used only in common.cpp)
-#define MAX_NF    (BIGINT)1e11
+// (Note: next235 takes 1s for 1e11, so it is also to prevent hang here.)
+// Increase this if you need >10TB (!) RAM...
+#define MAX_NF    (BIGINT)1e12
 
+// Maximum allowed number M of NU points; useful to catch incorrectly cast int32
+// values for M = nj (also nk in type 3)...
+#define MAX_NU_PTS    (BIGINT)1e14
 
 
 // -------------- Math consts (not in math.h) and useful math macros ----------

diff --git a/include/finufft_errors.h b/include/finufft_errors.h
@@ -2,9 +2,8 @@
 #define FINUFFT_ERRORS_H
 
 // ---------- Global error/warning output codes for the library ---------------
-// NB: if change these numbers, also must regen test/results/dumbinputs.refout
+// All documentation is at ../docs/errors.rst (not here):
 #define FINUFFT_WARN_EPS_TOO_SMALL         1
-// this means that a fine grid array dim exceeded MAX_NF; no malloc tried...
 #define FINUFFT_ERR_MAXNALLOC              2
 #define FINUFFT_ERR_SPREAD_BOX_SMALL       3
 #define FINUFFT_ERR_SPREAD_PTS_OUT_RANGE   4
@@ -14,16 +13,14 @@
 #define FINUFFT_ERR_HORNER_WRONG_BETA      8
 #define FINUFFT_ERR_NTRANS_NOTVALID        9
 #define FINUFFT_ERR_TYPE_NOTVALID          10
-// some generic internal allocation failure...
 #define FINUFFT_ERR_ALLOC                  11
 #define FINUFFT_ERR_DIM_NOTVALID           12
 #define FINUFFT_ERR_SPREAD_THREAD_NOTVALID 13
 #define FINUFFT_ERR_NDATA_NOTVALID         14
-// cuda malloc/memset/kernel failure/etc
 #define FINUFFT_ERR_CUDA_FAILURE           15
 #define FINUFFT_ERR_PLAN_NOTVALID          16
 #define FINUFFT_ERR_METHOD_NOTVALID        17
 #define FINUFFT_ERR_BINSIZE_NOTVALID       18
 #define FINUFFT_ERR_INSUFFICIENT_SHMEM     19
-
+#define FINUFFT_ERR_NUM_NU_PTS_INVALID     20
 #endif