pbasex #264
Thanks for finding this @stggh! It would be nice to have a pBasex implementation in PyAbel. With any new algorithm, we would want to re-make all of the figures from the paper and see how the algorithm compares with the others. In this case, it looks like the resolution is much worse, but perhaps this is simply the basis set. Please go ahead and contact Elio (if he doesn't already see this thread) and see if he is interested. |
Hi Dan and Stephen. In terms of the example above, I see two immediate issues:
Here's what I got with your sample image, using 256 radial basis functions (512 total, up to l=2). |
Thanks @e-champenois! Your calculation looks much better. I will attempt to implement |
After #263, please! :-) ... so that I can finish my stuff...
|
Fair enough! |
Any update on the implementation of pBasex in PyAbel? Cheers! |
Sorry to have blocked progress on this for so long. Yes, if you have the time, please start the changes. At the moment I am tied up with documentation, in preparation for a laser safety verification of my lab. |
I was reading the pBasex article very carefully, but found that it is not clear about some details, so if anybody can clarify them, it would be great. I'm asking mostly from the perspective of my forthcoming “rBasex” (feel free to try it), but the answers should also be helpful when/if somebody works on implementing pBasex itself.
|
Uh oh. The last time that you read an article very carefully, it was our article, and it was a lot of extra work for the authors! :)
I don't have it. I'd suggest that we contact the authors of the pBasex article. I think that their current e-mails are listed here: https://www.synchrotron-soleil.fr/en/beamlines/desirs
No idea. Using 8 GB of memory seems fairly crazy... I also don't know the answers to your other technical questions, but it seems like you are asking important questions. Do I understand correctly that your rBasex method is similar to the pBasex method, and you're trying to figure out exactly how related the two methods are? Seems like it would be worth getting in touch with the authors of the original paper and figuring out some of the details. EDIT: the first author of the pBasex paper, Gustavo Garcia, has also done some work on photoelectron distributions from chiral molecules in circularly polarized light: https://www.cfel.de/e4929/e20895/e5101/e5102/2013-12-19GustavoGarcia_eng.pdf |
I can't answer about specifics of the original paper/code, but some quick comments:

You are right that you can get into memory issues with large basis sets, though you have an extra factor of 4 in your estimate, since there is 4-fold quadrant symmetry in the even-l-only case (or at least 2-fold symmetry in the general case, i.e. chiral molecules or w+2w ionization, which could be another nice source of data to test your asymmetric codes: see Zipp et al., Optica 2014). I ran into these issues trying to fit some above-threshold ionization (ATI) data with very high l_max. I think the strength of pBASEX is using the sparsity of the Legendre representation, so perhaps it doesn't make sense to use it anymore (or is not even possible, memory-wise) when this sparsity is lost, which is the case for ATI data.

For practical purposes, N >> M, so the S and V matrices take up very little storage compared to U, which has the same size as G. With U, S, and V calculated, G doesn't need to be saved, so the SVD doesn't affect the storage (I guess it might affect the peak memory use, unless the SVD can be done in place).
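To make the storage argument concrete, here is a minimal NumPy sketch (the sizes are arbitrary examples and this is not the pBASEX code itself): for a tall basis matrix G (N pixels × M functions, N >> M), the thin SVD factors U, s, V replace G at essentially the same cost, since s and V are tiny compared with U.

```python
import numpy as np

N, M = 256 * 256, 128 * 2            # e.g. 256x256 pixels, 128 radial x 2 angular functions
G = np.random.rand(N, M).astype(np.float32)

# full_matrices=False gives the thin SVD: U is N x M, not N x N
U, s, Vt = np.linalg.svd(G, full_matrices=False)

for name, a in (("G", G), ("U", U), ("s", s), ("Vt", Vt)):
    print(f"{name}: {a.shape}, {a.nbytes / 2**20:.2f} MiB")
# U has the same footprint as G; s and Vt are negligible in comparison.

# With U, s, Vt stored, G is no longer needed: a least-squares fit of a
# (flattened) image is just the SVD pseudoinverse applied to it.
img = np.random.rand(N).astype(np.float32)
coeffs = Vt.T @ ((U.T @ img) / s)
```
|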
Mostly for myself. :-) But in the end everybody's extra work was rewarded, I hope.
Yes, since they are closely related, I need to describe what the differences are, but I do not want to make false statements.
Well, when I asked Hanna Reisler (my PhD adviser) whether we have pBasex, she told me that they did try to get it from the authors, but encountered unreasonable difficulties. Maybe the situation has changed since then, but I still feel somewhat hesitant to contact them. :-) However, if nobody else can help, it's worth trying.
I also think so. In the article they talk only about 256×256 (after resampling) images and 128×3 basis functions, which with single-precision floats would give “just” 100 MB (256 × 256 pixels × 128 × 3 functions × 4 bytes per float ≈ 100 MB), easily doable even in 2004, although still heavy in terms of the disk space. The problem is that this does not scale well to larger images even today... |
@e-champenois, thanks for the reply!
In the original article they do not say anything about the symmetry folding, but even if their implementation actually does it, this is at most a factor of 4, as you said. However, the memory requirements are still in the gigabytes range. And they scale cubically with the image radius.
The scaling with lmax is just linear. I guess it would be much easier to run out of memory by trying to get a high-resolution transform of a 2048×2048 image even with lmax = 2. So it's not really a sparsity issue.
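A back-of-the-envelope sketch of that cubic scaling (the sampling choices below, NR = NTH = R, NK = R/2 and single precision, are assumptions for illustration, not values from the original code):

```python
def basis_bytes(R, NL=2, bytes_per_float=4):
    # basis matrix size = (NR * NTH) pixels x (NK * NL) functions
    NR, NTH, NK = R, R, R // 2
    return (NR * NTH) * (NK * NL) * bytes_per_float

for R in (256, 512, 1024, 2048):
    print(f"R = {R:4d}: {basis_bytes(R) / 2**30:6.2f} GiB")
# 0.06, 0.50, 4.00, 32.00 GiB: memory grows as R**3,
# so each doubling of the radius costs a factor of 8.
```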
But overall you do need at least one N × M matrix on the disk and in the RAM, correct? By the way, did you write your implementation just based on the general ideas from the pBasex article, or did you have something from the authors? Did you compare with the original pBasex? |
Oh, indeed! I'm really happy with the way that manuscript turned out.
Shall I send them an e-mail?
Well, perhaps that's just the way that it works out. Perhaps the assumption is that large basis sets won't need to be used to describe most images... |
I actually wrote to Gustavo on Monday and got the source code from him. It is supposed to be compiled by the users and works only on Unix-like systems (he said that other people have created versions for Windows). I compiled it in Linux without any issues and ran some tests. Yesterday I wrote back to Gustavo to thank him and ask some remaining questions, so I'll report my findings here after he replies. But unfortunately, my suspicion about the disk/memory requirements turned out to be correct, and the slowness of the basis generation actually exceeded my expectations... |
Now Gustavo wrote that currently he actually uses another implementation, coded in Igor Pro, “with a theta sampling which depends on the radius, which is more efficient and logical”. But since I do not have it yet, here is my summary of the original C implementation. First, the answers to my questions above:
Some timing/size tests. Here NR = radial bins; NTH = angular bins; NK = radial basis functions = NR / 2 (recommended); NL = angular functions = 2 (default) in all cases. The basis generation has two time-consuming steps: the Abel integration of the basis functions (“basis”) and the singular-value decomposition of the resulting matrix (“SVD”). The decomposition results are saved to separate files for U, V and S; their sizes are added in the table.
So the basis size and generation time scale linearly with NR × NTH × NK (3rd power of the image radius), and the SVD time scales as NR × NTH × NK² (4th power!). I did not try to create a narrow basis set with NK = NR = 512, which is required to analyze 1-megapixel images with 1-pixel resolution, but it apparently would take >9 h and >2 GB. The transform timings for a 1 Mpx image are (I have slightly modified the code to print these metrics; the last “Done in ...” is printed by the original code):
for the default settings (256 × 256 resampled image, 128 functions) and
for the 512 × 512 resampled image and 256 functions. Not slow, but not very fast either.

Here I should make a note about the performance of this original implementation. First, its code is not optimized at all, except that it does the left–right symmetrization of the images (but never the top–bottom symmetrization, even when odd orders are not used) and works with only one half. Otherwise everything is done quite inefficiently (for example, each Legendre polynomial is evaluated anew at each integration point through a recurrence relation, with memory allocation/deallocation...). Second, for all nontrivial calculations it uses the GNU Scientific Library (including its general-purpose integration function to compute the basis projections). The GSL is probably not the most efficient and runs in a single thread (although all the heavy computations could be efficiently parallelized). I guess that optimizing the computations and using more efficient linear-algebra libraries could improve the timings by a factor of 10–100. The memory usage can be reduced at most by a factor of 2 (using the top–bottom symmetry). But the time and memory scalings with the image size would still be worse than for all other methods, meaning that even with the best optimizations the pBasex method cannot be practically used for high-resolution transforms.

Finally, some additional remarks about this original pBasex implementation and the pBasex article itself.
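(As an aside on the Legendre-evaluation point above: the polynomials can instead be precomputed on the whole angular grid once, vectorized. A sketch, not the original code; the grid size and lmax are arbitrary here.)

```python
import numpy as np
from numpy.polynomial.legendre import legvander

theta = np.linspace(0, np.pi, 256)     # assumed angular grid
lmax = 2
# P[i, l] = P_l(cos(theta[i])), shape (256, lmax + 1), computed in one call
P = legvander(np.cos(theta), lmax)

# Every basis projection can now reuse the rows of P, instead of running the
# recurrence (with allocations) once per integration point and polynomial.
```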
|
It depends on the expectations for what “most” means. :-) I have shown earlier an example of 0.3% kinetic-energy resolution (FWHM) in my setup (which was not the first to achieve this). Since E ∝ r², this means a radial resolution of ~0.15% and hence at least ~700 distinct radial values (1/0.0015 ≈ 700), so in practice >1000 basis functions are needed to represent this adequately. But on the other hand, in many experiments the resolution is indeed much worse and cannot be improved by reasonably better instruments. In these cases a low-resolution method will be enough. |
@MikhailRyazanov: That's great that you were able to make contact with Gustavo, and awesome that he was kind enough to provide the original source code. Your analysis of the pBASEX code seems quite insightful! It seems that, generally speaking, the scaling of the computation time with the size of the image is not especially favorable. How does your rBasex method provide similar results with better scaling? Your final two points (about the cartesian-to-polar conversion and setting negative values to zero) are perfect examples of how confusion can arise when trying to compare Abel-transform methods implemented in different software! So, have you reached a conclusion regarding if/how we should incorporate pBasex? Or is more study needed? |
It employs the separability into radial and angular parts (which does not work with Legendre polynomials, but works with cosine powers). So instead of solving the huge problem that connects every pixel to every basis function (where these huge matrices appear), it is split into two subproblems: 1) get the radial distributions for each angular part (a standard procedure, linear in the number of pixels, i.e. quadratic in the radius); 2) transform each of these radial distributions (they are one-dimensional, so it is a matrix-by-vector multiplication, also quadratic in the radius). The 1st subproblem is as overdetermined as the “huge problem”, but has an explicit solution, without the need for the SVD and extra memory. And the 2nd is not overdetermined, so it's just rmax equations for rmax variables (separately for each angular part). Plus, the basis projections are calculated analytically, which is much faster, and not for each pixel, but just for each radius.
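A toy sketch of this two-step structure (not PyAbel's actual rbasex code; all names and shapes here are illustrative assumptions):

```python
import numpy as np

def two_step_transform(polar_img, angular_basis, radial_transforms):
    """
    polar_img         : (n_r, n_theta) image sampled on a polar grid
    angular_basis     : (n_theta, n_l) angular functions sampled on the grid
    radial_transforms : (n_l, n_r, n_r) 1-D radial transform matrices
    """
    # Step 1: least-squares fit of the angular shape at each radius; an
    # explicit solution, with no huge (pixels x functions) matrix or SVD.
    profiles, *_ = np.linalg.lstsq(angular_basis, polar_img.T, rcond=None)
    # profiles has shape (n_l, n_r): one radial profile per angular order.

    # Step 2: one matrix-by-vector product per order (quadratic in the radius).
    return np.stack([radial_transforms[l] @ profiles[l]
                     for l in range(profiles.shape[0])])
```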
As for pBasex itself, it can be implemented directly as described in the article (like the original code does). The only tricky part would be to calculate the basis projections efficiently, if we don't want this to take many hours. Or, as Gustavo wrote, use a different resampling method (I haven't received his new code yet, but my guess is that it's similar to the resampling in the “polar onion peeling” method). And, in my opinion, the approach used by @e-champenois (without resampling) is more reasonable. The only problem is that it is intended to transform images of some particular size, so different image sizes require different basis sets to be generated (which is very slow). The original pBasex resampling just takes its samples from any image, and the basis-set size only determines the relative radial resolution, but not the input/output image size.
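A minimal cartesian-to-polar resampling sketch of the kind pBasex uses (grid sizes, centering and interpolation order are assumptions for illustration, not taken from the original code):

```python
import numpy as np
from scipy.ndimage import map_coordinates

def to_polar(img, NR=256, NTH=256):
    y0, x0 = (np.array(img.shape) - 1) / 2            # assume a centered image
    rmax = min(y0, x0)
    r = np.linspace(0, rmax, NR)
    th = np.linspace(0, 2 * np.pi, NTH, endpoint=False)
    R, TH = np.meshgrid(r, th, indexing="ij")
    # Sample the cartesian image along rays: any image size works, and NR
    # (not the image size) sets the radial resolution of the polar grid.
    coords = np.array([y0 + R * np.cos(TH), x0 + R * np.sin(TH)])
    return map_coordinates(img, coords, order=1)      # (NR, NTH) polar samples
```
|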
Not certain whether this should be opened as a new issue, but it is sort of in the vein of this thread. I have been attempting to use the rbasex implementation to invert some above-threshold ionization (ATI) data that requires a very high order. I have found that when the order exceeds 20 (or 21 for odd orders) there is a sharp change in the noise. The figure below shows an example of this problem. You can reproduce the same result with the rbasex "beam block" example. The cutoff is always at order 20 (or 21 for odd). As the order increases, the problem only gets worse. If this is just a limit of rBasex, then it should probably be mentioned in the documentation that there is a maximum usable order.
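A hedged sketch of how one might probe this order cutoff (the synthetic ring below stands in for the "beam block" example; rbasex_transform and its order keyword are PyAbel's, but treat the exact call as an assumption):

```python
import numpy as np
from abel.rbasex import rbasex_transform

# Synthetic test image: an isotropic Gaussian ring plus a little noise.
n = 513
y, x = np.indices((n, n)) - n // 2
r = np.hypot(x, y)
rng = np.random.default_rng(0)
img = np.exp(-((r - 150) / 10) ** 2) + 0.02 * rng.standard_normal((n, n))

# Compare reconstructions below and above the reported order-20 cutoff.
for order in (2, 20, 22, 24):
    recon, distr = rbasex_transform(img, order=order)
    print(order, np.nanstd(recon))   # crude measure of reconstruction noise
```
|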
Can you try other “polar” methods (lin-BASEX, pBasex, POP) to see how they behave with very high orders? And, before going into computational details, is ATI reasonably described in terms of Legendre polynomials? I have no experience with it, but from the few examples that I've read, it seems to be non-perturbative and involve strong interference between many channels. |
@talbert48 Thanks for bringing this to our attention, and for posting the nice data. I think that we should move this to a separate issue. I opened #286 for the discussion. |
I came across a nice python implementation of the pbasex algorithm: CPBASEX. Modifying the provided CPBASEX/examples/sample_pbasex.py provides this O2-ANU1024.txt Abel-transform comparison with PyAbel (basex): test-pbasex.py

There may be a case to work with Elio Champenois to incorporate pbasex into PyAbel.

The O2-ANU1024.txt image comparison indicates less resolution, peak intensity, and baseline issues. In particular, the intensity required a ×R scaling to match PyAbel, which may even be PyAbel's fault; see "abel.tools.vmi.angular_integration() Jacobian should be R^2 not R" (#253).

Thoughts?