0.30.0:
- sht:
- new function `pseudo_analysis_general` for iterative least-squares analysis
of spherical maps on arbitrary grids
0.29.0:
- general:
- rework multi-threading infrastructure to allow integration with external
thread pool implementations.
- make more extensive use of uninitialized arrays. This helps performance
and scaling, especially for small SHTs.
- fft:
- add the functions `r2r_separable_fht` and `r2r_genuine_fht`, which perform
Hartley transforms using the commonly adopted convention, i.e.
FHT(x) = FFT(x).real - FFT(x).imag
instead of the unusual convention of `r2r_separable_hartley` and
`r2r_genuine_hartley`, which use
FHT(x) = FFT(x).real + FFT(x).imag
`r2r_separable_hartley` and `r2r_genuine_hartley` should not be used in
new code, but they are kept for backwards compatibility.
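The two Hartley conventions can be compared with a short NumPy sketch (NumPy only, not ducc0; the names `fht_common`, `fht_legacy` and `fht_direct` are hypothetical). It assumes the standard definition H_k = sum_n x_n cas(2*pi*n*k/N) with cas(t) = cos(t) + sin(t), which equals FFT(x).real - FFT(x).imag under NumPy's forward sign convention.

```python
import numpy as np

def fht_common(x):
    """Hartley transform in the common convention: Re(FFT) - Im(FFT)."""
    f = np.fft.fft(x)
    return f.real - f.imag

def fht_legacy(x):
    """Convention of the older *_hartley functions as described above: Re(FFT) + Im(FFT)."""
    f = np.fft.fft(x)
    return f.real + f.imag

def fht_direct(x):
    """O(N^2) reference: H_k = sum_n x_n * cas(2*pi*n*k/N)."""
    n = len(x)
    t = 2 * np.pi * np.outer(np.arange(n), np.arange(n)) / n
    return (np.cos(t) + np.sin(t)) @ x

x = np.random.default_rng(0).standard_normal(16)
h = fht_common(x)
```

Since Re(FFT) + Im(FFT) = sum_n x_n cas(-2*pi*n*k/N), the legacy convention is the common one evaluated at frequency -k (mod N), so old results can be converted by re-indexing.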
- sht:
- new functions `synthesis_general` and `adjoint_synthesis_general`
for SHTs on maps without any constraint on pixel locations.
This allows, for example, using maps whose pixel positions have been
distorted by lensing, or QuadCube maps.
- functions `synthesis`, `adjoint_synthesis`, `synthesis_deriv1`,
and `pseudo_analysis`: introduce an optional integer parameter `mmax`
and allow it to be different from `lmax` (`mmax==lmax` was implicitly
assumed so far)
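With `mmax` allowed to differ from `lmax`, the number of a_lm coefficients shrinks accordingly. The sketch below assumes the common triangular, m-major storage layout (as used by healpy); whether a given ducc0 function uses exactly this layout should be checked against its documentation. `n_alm` and `alm_index` are hypothetical helper names.

```python
def n_alm(lmax: int, mmax: int) -> int:
    """Number of a_lm with 0 <= m <= mmax and m <= l <= lmax."""
    if not 0 <= mmax <= lmax:
        raise ValueError("need 0 <= mmax <= lmax")
    # sum over m of (lmax - m + 1) terms
    return (mmax + 1) * (lmax + 1) - mmax * (mmax + 1) // 2

def alm_index(l: int, m: int, lmax: int) -> int:
    """Position of (l, m) in an m-major triangular layout: all m=0 entries, then m=1, ..."""
    return m * (2 * lmax + 1 - m) // 2 + l
```

For `lmax=10, mmax=2` this gives 11 + 10 + 9 = 30 coefficients instead of the 66 needed for `mmax=lmax`.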
- nufft:
- rewrite the core part of the 2D Type 1 NUFFT, which was miscompiled on some
platforms. Unfortunately, the new implementation is slightly slower on
x86-64.
- allow very small uniform grid dimensions (down to 1)
- wgridder:
- allow very small uniform grid dimensions (down to 2)
- misc:
- new function `empty_noncritical` for building empty numpy arrays without
critical strides
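Critical strides are strides that are multiples of a large power of two; walking along such an axis maps many elements onto the same cache sets. The sketch below pads the allocation so that no outer-axis stride is a multiple of an assumed 4096 bytes and returns a view with the requested shape. The threshold and padding strategy are assumptions, not ducc0's actual heuristics, and `empty_noncritical_sketch` is a hypothetical name.

```python
import numpy as np

def empty_noncritical_sketch(shape, dtype=np.float64, critical_bytes=4096):
    """Uninitialized array of `shape` whose outer-axis strides avoid multiples of critical_bytes."""
    if 0 in shape:
        return np.empty(shape, dtype=dtype)  # nothing to pad for empty arrays
    itemsize = np.dtype(dtype).itemsize
    padded = list(shape)
    # Work from the innermost axis outwards; padded[i:] determines the stride of axis i-1.
    for i in range(len(shape) - 1, 0, -1):
        while (int(np.prod(padded[i:])) * itemsize) % critical_bytes == 0:
            padded[i] += 1  # one extra element (or row) breaks the power-of-two stride
    buffer = np.empty(padded, dtype=dtype)
    # Slicing keeps the padded strides but exposes only the requested shape.
    return buffer[tuple(slice(0, n) for n in shape)]

a = empty_noncritical_sketch((512, 512))
```

For a 512 x 512 float64 array the row stride becomes 513 * 8 = 4104 bytes instead of 4096, at the cost of one unused element per row.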
0.28.0:
- general:
- allow control over multithreading via environment variables
DUCC0_NUM_THREADS, DUCC0_PIN_DISTANCE, and DUCC0_PIN_OFFSET.
This is still experimental and may change in the future.
- fft:
- introduce a flag `allow_overwriting_input` to `c2r`, which can speed up
execution by avoiding temporary arrays
- nufft/wgridder:
- changed kernel database to hold optimized kernels depending on
dimensionality and floating point accuracy. This allows for slightly
better tuning and improves maximum attainable accuracy in 2D and 3D.
- julia:
- start of a Julia wrapper, currently focused on FFT, NUFFT and SHT support
- math:
- computations involving Peano-Hilbert indices are now much faster
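For readers unfamiliar with Peano-Hilbert indices: they number the cells of a 2^k x 2^k grid along a Hilbert curve, so that consecutive indices always belong to neighboring cells. The sketch below is the classic textbook loop (one quadrant level per iteration), not ducc0's optimized bit-manipulation code; the function names are hypothetical.

```python
def _rot(n, x, y, rx, ry):
    """Rotate/flip a quadrant so that the sub-curve has the canonical orientation."""
    if ry == 0:
        if rx == 1:
            x = n - 1 - x
            y = n - 1 - y
        x, y = y, x
    return x, y

def xy2d(n, x, y):
    """Hilbert-curve index of cell (x, y) on an n x n grid (n a power of two)."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        x, y = _rot(n, x, y, rx, ry)
        s //= 2
    return d

def d2xy(n, d):
    """Inverse of xy2d: cell coordinates of Hilbert index d."""
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        x, y = _rot(s, x, y, rx, ry)
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y
```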
- misc:
- new function `roll_resize_roll`, which allows efficient combined
rolling/padding/truncation of arrays.
0.27.0:
- general:
- modernize CI and build machinery
- some code was moved from header files into .cc files to avoid duplicate
symbols in some situations.
- nufft:
- added a "plan" class, which allows efficient repeated execution of a
transform with fixed grid geometry and nonuniform point positions.
(For transforms that are only executed once, the traditional interface
should be preferred.)
- added new parameters for data periodicity (formerly hardwired to 2pi) and
data ordering on the regular grids (formerly starting with the most negative
frequency, now also allows "standard" FFT ordering starting with the zero
mode).
- added a benchmark demo script for easier comparison to FINUFFT and NFFT.jl.
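The two mode orderings and the periodicity parameter amount to simple index and coordinate maps. The NumPy sketch below assumes N modes running from -(N//2) upwards in the "most negative first" ordering; it does not call ducc0, and the helper names are hypothetical.

```python
import numpy as np

def centered_to_fft_order(f):
    """Reorder modes from [-(N//2), ..., N - N//2 - 1] to FFT order [0, 1, ..., -1]."""
    return np.fft.ifftshift(f)

def fft_to_centered_order(f):
    """Inverse of centered_to_fft_order."""
    return np.fft.fftshift(f)

def to_radians(x, period):
    """Map coordinates with an arbitrary period onto [0, 2*pi)."""
    return np.mod(x, period) * (2 * np.pi / period)

k = np.arange(-4, 4)
fft_ordered = centered_to_fft_order(k)
```

`ifftshift` rather than `fftshift` is the correct direction here; the two only coincide for even N.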
0.26.0:
- general:
- clarify that the preferred installation method is compilation from source
- wgridder:
- add "self-tuning" versions of `vis2dirty` and `dirty2vis` which attempt to
save time by
- splitting visibilities into a small-w and large-w part and processing
them separately, and/or
- subdividing the field of view into facets.
This can be advantageous when FFT cost would dominate using the naive
approach and the number of w-planes is large (roughly 50 and higher).
- sht:
- the methods synthesis, adjoint_synthesis, and synthesis_deriv1 now accept
an optional leading dimension in the alm/map arrays, to allow "batched"
transforms. If the batch size is large enough, parallelization will not
be done within a single transform, but rather over different transforms,
which can be beneficial, especially if the transforms are small.
- a new method "pseudo_analysis" was added, which performs iterative,
approximate map analysis using the LSMR algorithm.
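Conceptually, pseudo-analysis asks for the coefficients whose synthesis best matches a given map in the least-squares sense. The sketch below is a 1D Fourier analogue on irregularly placed points, not an SHT, and it swaps LSMR for a dense direct solve with `np.linalg.lstsq`; an iterative, matrix-free solver such as `scipy.sparse.linalg.lsmr` becomes preferable once the synthesis operator is too large to store. All function names are hypothetical.

```python
import numpy as np

def synthesis_matrix(x, kmax):
    """Columns exp(i*k*x) for k = -kmax..kmax, evaluated at arbitrary points x."""
    k = np.arange(-kmax, kmax + 1)
    return np.exp(1j * np.outer(x, k))

def pseudo_analysis_1d(values, x, kmax):
    """Least-squares coefficients whose synthesis at x best matches `values`."""
    coeffs, *_ = np.linalg.lstsq(synthesis_matrix(x, kmax), values, rcond=None)
    return coeffs

rng = np.random.default_rng(42)
kmax = 8
x = np.sort(rng.uniform(0.0, 2 * np.pi, 64))   # irregular "pixel" positions
true = rng.standard_normal(2 * kmax + 1) + 1j * rng.standard_normal(2 * kmax + 1)
values = synthesis_matrix(x, kmax) @ true       # band-limited "map"
estimate = pseudo_analysis_1d(values, x, kmax)
```

With more sample points than unknowns and a band-limited input, the least-squares solution recovers the coefficients up to rounding; for noisy or under-sampled maps it is only an approximation, which is why the analysis is called "pseudo".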
0.25.0:
- general:
- try to fix the package on 32bit platforms
- nufft:
- significant performance and accuracy improvements
- wgridder:
- recalculated kernels, improved error model
- small performance tweaks
0.24.0:
- general:
- work around a compilation problem with gcc 7
- nufft:
- beginnings of a non-uniform FFT module
Conventions closely follow the FINUFFT library.
The interface is not finalized yet.
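A type 1 (nonuniform-to-uniform) transform evaluates f_k = sum_j c_j exp(isign * i * k * x_j) for a range of integer modes k. The direct O(M*N) sum below is only a reference sketch, assuming modes ordered from the most negative frequency upwards and leaving the sign as a parameter; fast NUFFT codes instead spread the points onto an oversampled grid with a kernel and apply an FFT. `nufft1_direct` is a hypothetical name.

```python
import numpy as np

def nufft1_direct(x, c, n_modes, isign=-1):
    """Direct type-1 sum for modes k = -(N//2) .. N - N//2 - 1 (most negative first)."""
    k = np.arange(-(n_modes // 2), n_modes - n_modes // 2)
    return np.exp(isign * 1j * np.outer(k, x)) @ c

rng = np.random.default_rng(1)
pts = rng.uniform(-np.pi, np.pi, 50)
strengths = rng.standard_normal(50) + 1j * rng.standard_normal(50)
modes = nufft1_direct(pts, strengths, 16)
```

On equispaced points x_j = 2*pi*j/N with `isign=-1`, this reduces to `np.fft.fftshift(np.fft.fft(c))`, which is a convenient sanity check for a fast implementation.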
- wgridder:
- improved pre-sorting of visibilities
0.23.0:
- general:
- improved template code for multi-array operations (internal detail)
- fft:
- fix a bug in multi-D Hartley transform which was introduced in ducc0 0.21.
This bug was triggered in cases with two or three transformed axes and at
least one untransformed axis.
- use clear dual-license headers in all files required for the FFT component
- healpix:
- input arrays to all functions can now be float32/int32 as well
- wgridder:
- performance tweaks to FFT and kernel evaluation parts; performance gain
on the order of 10%.
0.22.0:
- general:
- many internal cleanups and consistency improvements
- preparations for release as an Alpine Linux package
- fft:
- re-introduce plan caching. This is possible since plans for large 1D
transforms no longer scale with the length of the transform, but only with its
square root, limiting the memory overhead.
- code tweaks to improve copying steps for multi-D transforms (basically a
workaround for mis-optimizations by gcc)
0.21.0:
- general:
- support for more platforms (e.g. Raspberry Pi)
- rewrite of the classes for multidimensional array views, which allows
many simplifications, multithreading etc.
- fft:
- low-level tweaks which accelerate internal function calls; this especially
helps multi-D transforms with short axis lengths
- genuine Hartley transforms over 2 and 3 axes no longer require big temporary
arrays
- healpix:
- multithreading support for most functions
0.20.0:
- general:
- minimum required Python version is now 3.7
- tests: retire Ubuntu 18, improve tests with icpx
- fix compilation failure on non-x86 platforms
- fft:
- allow individual, compile-time, selection of SIMD types to be used
- sht/healpix:
- prepare better support for Healpix pixelization
- misc:
- convenience function for building numpy arrays without critical strides
0.19.0:
- general:
- binary wheels can now be built and uploaded to PyPI; the installation
instructions have been updated accordingly. Please provide feedback in case
of problems!
- fft:
- C++ sources for FFT calculation now have their own subdirectory.
- new function `r2r_fftw`, which supports FFTW's halfcomplex storage scheme.
- new function `convolve_axis`, which performs efficient convolution of arrays
with arbitrary 1D kernels, optionally followed by zero-padding/truncation.
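FFTW's halfcomplex format stores the transform of n real inputs in n real numbers: r_0, r_1, ..., r_{n/2}, followed by the imaginary parts i_{(n+1)/2-1}, ..., i_1 in reverse order (integer division throughout). The NumPy sketch below converts between that layout and `np.fft.rfft` output; it is meant to clarify the storage scheme, not to reproduce `r2r_fftw`, and the helper names are hypothetical.

```python
import numpy as np

def rfft_to_halfcomplex(x):
    """Forward real FFT of x, packed in FFTW's halfcomplex order."""
    f = np.fft.rfft(x)
    nimag = (len(x) - 1) // 2           # number of stored imaginary parts
    return np.concatenate([f.real, f.imag[1:nimag + 1][::-1]])

def halfcomplex_to_real(h):
    """Unpack a halfcomplex array and apply the normalized inverse real FFT."""
    n = len(h)
    nreal = n // 2 + 1
    nimag = (n - 1) // 2
    f = h[:nreal].astype(complex)
    f[1:nimag + 1] += 1j * h[nreal:][::-1]
    return np.fft.irfft(f, n)

x = np.random.default_rng(3).standard_normal(8)
h = rfft_to_halfcomplex(x)
```

For even n the Nyquist term r_{n/2} has no stored imaginary part (it is always zero), which is why exactly n numbers suffice for both even and odd lengths.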
0.18.0:
- sht:
- implement adjoint_analysis
CAUTION: this is still really experimental!
- wgridder:
- improve cost model, assuming that the FFT component will not scale perfectly
0.17.0:
- general:
- more information available on PyPI
- fft:
- performance tweaks for 1D FFTs
- reduced memory overhead for 1D FFTs
- multithreading support for 1D FFTs (this is only advantageous for very long
transforms at the moment)
- sht:
- interface for fully general SHTs is now accessible from Python; this is not
completely finalized, however.
- improved a_lm rotation performance
0.16.0:
- general:
- the GIL is now released in many more functions
- fft:
- very long 1D transforms now have a lower memory overhead and should be
faster
- sht:
- a_lm rotation is now much more accurate, but slightly slower
- the improved spherical harmonic analysis capabilities are now documented
- misc:
- two new convenience functions vdot() and l2error() were added
0.15.0:
- general:
- the code is now compiled with the "-fvisibility=hidden" flag, which reduces
the size of the resulting binary.
- demo codes were adjusted to use the new SHT interface.
- fft:
- added some functions to reduce the amount of unnecessary memory
allocations and data copying.
- sht:
- it is no longer necessary to pre-allocate an array for the output of the
`sht.experimental.*2d*` functions. If not provided, the functions will
create the array automatically now, which requires passing of new `ntheta`
and `nphi` parameters in some cases.
- the `sht.experimental.*2d*` functions now take an optional `mmax` parameter
which can be used to limit the maximum azimuthal moment of a transform.
If not supplied, the code assumes that `mmax` is equal to `lmax`.
- added some unit tests for the new SHT interface.
- reduced memory overhead for some of the `sht.experimental.*2d*` functions.
- misc:
- added functionality (originally from Planck Level-S) to simulate time
streams of detector noise.
0.14.0:
- general:
- ducc0.__version__ is now also defined under Windows
- sht:
- further performance improvements
- added functions for manipulation of "2D maps", i.e. maps consisting of
(ntheta*nphi) pixels with equidistant pixels in phi, and rings distributed
along theta according to one of the CC, DH, F1, F2, GL, MW, MWflip schemes.
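The ring layouts named above differ mainly in where the rings sit in colatitude. The sketch below uses common textbook definitions of the CC, F1, F2, DH and GL rules (the MW variants are omitted); these formulas are assumptions rather than excerpts from ducc0, so ring ordering and pole handling should be checked against its documentation. `ring_colatitudes` is a hypothetical name.

```python
import numpy as np

def ring_colatitudes(ntheta, scheme):
    """Ring colatitudes in radians, north to south, for a few common 2D-map schemes."""
    i = np.arange(ntheta)
    if scheme == "CC":   # Clenshaw-Curtis: equidistant, both poles included
        return np.pi * i / (ntheta - 1)
    if scheme == "F1":   # Fejer's first rule: equidistant, no poles, offset by half a step
        return np.pi * (i + 0.5) / ntheta
    if scheme == "F2":   # Fejer's second rule: equidistant, poles excluded
        return np.pi * (i + 1) / (ntheta + 1)
    if scheme == "DH":   # Driscoll-Healy: north pole included, south pole excluded
        return np.pi * i / ntheta
    if scheme == "GL":   # Gauss-Legendre: arccos of the Legendre nodes
        nodes, _ = np.polynomial.legendre.leggauss(ntheta)
        return np.arccos(nodes[::-1])
    raise ValueError(f"unsupported scheme: {scheme}")

theta_gl = ring_colatitudes(9, "GL")
```

All schemes except DH are symmetric about the equator, which lets an SHT process northern and southern rings in pairs.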
- totalconvolve:
- bug fix in the adjoint convolution: results were inadvertently conjugated
0.13.0:
- general:
- more comprehensive references in README.md
- sht:
- bug fixes
- tweaks to the experimental interface for extracting moments up to lmax
from maps with only lmax+1 or lmax+2 equidistant rings.
0.12.0:
- general:
- update installation instructions in README.md
- sht:
- expose functionality for computing gradient maps from spherical harmonic
coefficients
0.11.0:
- general:
- beginning of Doxygen documentation for the C++ part
- fixes to the #include statements in header files; now every header can be
included in isolation.
- some CI streamlining
0.10.0:
- general:
- HTML documentation generation using Sphinx
Up-to-date documentation for the ducc0 branch is available at
https://mtr.pages.mpcdf.de/ducc/.
- more and improved docstrings
- SIMD datatypes are now much more compatible with the upcoming C++ SIMD types.
The code can be compiled with the types from <experimental/simd> if
available, with very small manual changes.
- reshuffling and renaming of files
- fft:
- 1D transforms have been rewritten using a much more flexible class hierarchy
which allows more optimizations. For example, 1D FFTs can now be partially
multi-threaded and the Bluestein algorithm can be used as a single pass
instead of just replacing a whole transform.
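The Bluestein (chirp-z) algorithm mentioned above rewrites a DFT of arbitrary length N as a convolution, which can then be evaluated with FFTs of a convenient length M >= 2N - 1. The NumPy sketch below shows the textbook single-pass form; it is not ducc0's implementation, and `bluestein_fft` is a hypothetical name.

```python
import numpy as np

def bluestein_fft(x):
    """Forward DFT of arbitrary length via Bluestein's chirp-z convolution."""
    x = np.asarray(x, dtype=complex)
    n = len(x)
    k = np.arange(n)
    # chirp_k = exp(-i*pi*k^2/n); reducing k^2 mod 2n keeps the phase accurate for large k.
    chirp = np.exp(-1j * np.pi * ((k * k) % (2 * n)) / n)
    # Any length m >= 2n-1 avoids wrap-around; a power of two is the usual choice.
    m = 1 << (2 * n - 2).bit_length()
    a = np.zeros(m, dtype=complex)
    a[:n] = x * chirp
    b = np.zeros(m, dtype=complex)
    b[:n] = np.conj(chirp)
    b[m - n + 1:] = np.conj(chirp[1:])[::-1]  # negative lags; the chirp is even in k
    conv = np.fft.ifft(np.fft.fft(a) * np.fft.fft(b))
    return chirp * conv[:n]

spectrum = bluestein_fft(np.arange(13.0))
```

The identity behind it is n*k = (n^2 + k^2 - (k - n)^2) / 2, which turns the DFT kernel into a product of two chirps and a chirp that depends only on k - n.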
- sht:
- design of a new SHT interface. Parts of this interface are made visible
from Python, in the "sht.experimental" submodule. The "sharpjob_d"-based
interface will be kept for compatibility purposes until ducc1 is released.
- experimental support for spherical harmonic analysis that only requires
lmax+1 or lmax+2 equidistant rings for exact analysis up to lmax.
- misc.rotate_alm was moved to the sht submodule.
- totalconvolver:
- interface change to synchronize it better with the upcoming SHT interface.
Basically, if an array has a "number of components" axis, this is now
always in first place.
Strictly speaking this is an interface-breaking change, but to the best of
my knowledge the interface in question has not been used in other projects
yet.
0.9.0:
- general:
- improved and faster computation of Gauss-Legendre nodes and weights
using Ignace Bogaert's implementation (https://doi.org/10.1137/140954969,
https://sourceforge.net/projects/fastgausslegendrequadrature/)
- Intel OneAPI compilers are now supported
- new accepted value "none-debug" for DUCC0_OPTIMIZATION
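Gauss-Legendre nodes and weights are what make exact quadrature in theta possible: an n-point rule integrates every polynomial of degree up to 2n - 1 exactly on [-1, 1]. The check below uses NumPy's own `np.polynomial.legendre.leggauss`, which is a different implementation from Bogaert's method; `gauss_legendre_max_error` is a hypothetical helper.

```python
import numpy as np

def gauss_legendre_max_error(n, max_degree):
    """Largest error of the n-point rule on the monomials x**0 .. x**max_degree over [-1, 1]."""
    nodes, weights = np.polynomial.legendre.leggauss(n)
    errors = []
    for deg in range(max_degree + 1):
        exact = (1 - (-1) ** (deg + 1)) / (deg + 1)  # 2/(deg+1) for even deg, 0 for odd
        errors.append(abs(np.sum(weights * nodes ** deg) - exact))
    return max(errors)

exact_up_to_9 = gauss_legendre_max_error(5, 9)   # rounding level only
fails_at_10 = gauss_legendre_max_error(5, 10)    # degree 2n is no longer exact
```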
- wgridder:
- fixed a bug which could cause memory accesses beyond the end of an array
- fft:
- slightly improved buffer re-use
- misc:
- substantially faster a_lm rotation code based on Mikael Slevinsky's
FastTransforms package (https://github.com/MikaelSlevinsky/FastTransforms)
0.8.0:
- general:
- compiles and runs on MacOS 11
- choice of various optimization and debugging levels by setting
the DUCC0_OPTIMIZATION variable before compilation.
Valid choices are
"none":
no optimization or debugging, fast compilation
"portable":
Optimizations which are portable to all CPUs of a given family
"portable-debug":
same as above, with debugging information
"native":
Optimizations which are specific to the host CPU, non-portable library
"native-debug":
same as above, with debugging information
Default is "native".
- wgridder:
- more careful treatment of u,v,w-coordinates and phase angles, leading to
better achievable accuracies for single-precision runs
- performance improvements by making the computed interval in "n-1" symmetric
around 0. This reduces the number of required w planes significantly.
Speedups are bigger for large FOVs and when FFT is dominating.
- allow working with dirty images that are shifted with respect to the phase
center. This can be used for faceting and incorporating DDEs.
- new optional flag "double_precision_accumulation" for gridding routines,
which causes accumulation onto the uv grid to be done in double precision,
regardless of input and output precision. This can be helpful to avoid
accumulation errors in special circumstances.
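Why double-precision accumulation can help is visible in a small NumPy experiment: adding many float32 values into a float32 running sum loses precision once the sum is much larger than each term, much as many visibilities accumulate onto the same uv cell. `np.cumsum` is used because it accumulates sequentially, whereas `np.sum` uses pairwise summation and would hide most of the effect. This is an illustration only; `accumulation_error` is a hypothetical name, and actual gridding errors depend on the data.

```python
import numpy as np

def accumulation_error(n=1_000_000, value=0.1):
    """Absolute errors of sequential float32 and float64 running sums of n float32 values."""
    x = np.full(n, value, dtype=np.float32)
    exact = n * float(x[0])                             # x[0] is the float32-rounded value
    sum32 = float(np.cumsum(x, dtype=np.float32)[-1])   # accumulate in single precision
    sum64 = float(np.cumsum(x, dtype=np.float64)[-1])   # same inputs, double accumulator
    return abs(sum32 - exact), abs(sum64 - exact)

err32, err64 = accumulation_error()
```

The inputs are identical in both cases; only the precision of the accumulator differs, which mirrors what the flag changes.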
- pointingprovider:
- improved performance via vectorized trigonometric functions
0.7.0:
- general:
- compilation with MSVC on Windows is now possible
- wgridder:
- performance (especially scaling) improvements
- oversampling factors up to 2.5 supported
- new, more flexible interface in submodule `wgridder.experimental`
(subject to further changes!)
- totalconvolver:
- now performs non-equidistant FFT interpolation also in psi direction,
making it much faster for large kmax.
- new low-level interface which allows flexible re-distribution of work
over MPI tasks (responsibility of the caller)
0.6.0:
- general:
- multi-threading improvements contributed by Peter Bell
- wgridder:
- new, smaller internal data structure
0.5.0:
- wgridder:
- internally used grid size is now chosen automatically, and the parameters
"nu" and "nv" are ignored; they will be removed in ducc1.
0.3.0:
- general:
- The package should now be installable from PyPI via pip even on MacOS.
However, MacOS >= 10.14 is required.
- wgridder:
- very substantial performance and scaling improvements
0.2.0:
- wgridder:
- kernels are now evaluated via polynomial approximation, allowing much
more freedom in the choice of kernel function
- switch to 2-parameter ES kernels for better accuracy
- unnecessary FFT calculations are skipped
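The switch to polynomial kernel evaluation can be illustrated by approximating an ES-type kernel with one low-degree polynomial per grid cell of its support. The two-parameter form exp(beta*((1 - x^2)^e - 1)) used here is an assumption about what "2-parameter ES kernel" means (e = 1/2 gives the classic exponential-of-semicircle kernel); beta, e, the support width and the polynomial degree are illustrative values, not ducc0's tuned ones, and all function names are hypothetical.

```python
import numpy as np
from numpy.polynomial import Chebyshev

def es_kernel(x, beta, e=0.5):
    """ES-type kernel on [-1, 1]: exp(beta * ((1 - x^2)^e - 1))."""
    x = np.asarray(x, dtype=float)
    return np.exp(beta * ((1.0 - x) * (1.0 + x)) ** e - beta)

def piecewise_chebyshev(beta, e, support, degree):
    """One Chebyshev interpolant per grid cell of the kernel support."""
    edges = np.linspace(-1.0, 1.0, support + 1)
    return [Chebyshev.interpolate(lambda t: es_kernel(t, beta, e), degree, domain=[a, b])
            for a, b in zip(edges[:-1], edges[1:])]

def evaluate_piecewise(pieces, x):
    """Evaluate the piecewise approximation by picking the polynomial of each point's cell."""
    support = len(pieces)
    idx = np.clip(((x + 1.0) * support / 2).astype(int), 0, support - 1)
    out = np.empty_like(x)
    for j, p in enumerate(pieces):
        mask = idx == j
        out[mask] = p(x[mask])
    return out

pieces = piecewise_chebyshev(beta=18.0, e=0.5, support=8, degree=10)
x = np.linspace(-1.0, 1.0, 2001)
approx = evaluate_piecewise(pieces, x)
```

Evaluating a short polynomial is cheaper than calling exp and pow for every kernel sample, and any kernel shape can be tabulated this way, which is what decouples the kernel choice from evaluation speed.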
- totalconvolve:
- improved accuracy by making use of the new wgridder kernels
- *INTERFACE CHANGE* removed method "epsilon_guess()"
- pointingprovider:
- new, experimental module for computing detector pointings from a time stream
of satellite pointings. To be used by litebird_sim initially.