Histograms from timed dataframe #116
Conversation
force-pushed 7a2ba24 to c9b8a17
This does not produce the right kind of normalization histogram yet, I think...
This implements both variants now. Both routes produce almost identical results, but histograms from timed dataframes are substantially faster. Also, a normalization routine for the compute function is added. One issue remains: if histogram axes are jittered, the actual jitter values differ between the computation of the data and of the normalization histogram, leading to small errors in the histogram on the order of a percent or so. A workaround would be to include the histogram calculation in bin_partition. This would also avoid reading the data twice, which I think is the major time bottleneck at the moment.
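One way around the jitter mismatch between the two passes would be to derive the jitter from a fixed seed, so that the data pass and the normalization pass draw bit-identical values. This is only a minimal sketch of the idea; `apply_jitter` and its signature are hypothetical, not the actual sed API:

```python
import numpy as np

def apply_jitter(values: np.ndarray, amplitude: float, seed: int = 42) -> np.ndarray:
    """Add uniform jitter in [-amplitude, amplitude) drawn from a fixed seed,
    so that repeated passes over the same data see identical noise."""
    rng = np.random.default_rng(seed)
    return values + rng.uniform(-amplitude, amplitude, size=len(values))

axis = np.arange(10, dtype=float)
# Both passes produce bit-identical jittered values:
assert np.array_equal(apply_jitter(axis, 0.5), apply_jitter(axis, 0.5))
```

With a per-pass seed like this, the data histogram and the normalization histogram would see the same jittered axis values, removing the percent-level discrepancy.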
@zainsohail04 This one is pending on your implementation into the Flash loader (and some review)...
force-pushed d3076e6 to 0d64a32
Rebased, and added dummy implementation for flash loader. Tests are also still pending.
Pull Request Test Coverage Report for Build 6760852807 (Coveralls)
force-pushed 0d64a32 to 1fc17c6
force-pushed 1fc17c6 to b8f996b
@zain-sohail For me the flash timed_dataframes don't work. They consume endless amounts of memory and never finish computation.
Just tried and it seems to work fine. Where are you facing a problem?
Or does it? Are there so many empty macrobunches?
If you look at the tables you show, the pulse IDs in the electron dataframe are very sparse (7, 38, 69...), so it seems the count rate was really low, and I therefore expect many empty pulses.
force-pushed b8f996b to 2fc1e0a
Rebased to current main.
I realized that after posting my previous post, but I don't understand the data format completely yet. What exactly are the train, pulse, and electron IDs? Are these macrobunches, microbunches, and electrons per microbunch? And what is the time base for the timed_dataframe? I was expecting macrobunches, but then I would expect one row per trainID, no? Now it seems to be one row per microbunch. This is certainly typically more than electrons. And why is it going up to 1000, if there are typically 400 or 500 microbunches in a macrobunch?
How you described it is all correct.
All you want here is to capture changes of the scanned variable, which is typically only read out once per macrobunch anyway, no? So no, there should be no difference. It would only make a (marginal) difference for parameters that are varied within the macrobunch. I am not sure, though, how this would look if you consider, e.g., BAM correction etc.
Right! But there might still be some pulse-resolved normalizations one would want, like the FEL intensity, for example. I'm afraid we need to keep the per-pulse dataframe for those.
It wouldn't make sense for there to be a timing difference in the compute of the dataframe; I would imagine it only makes a difference in creating buffer files more quickly. Regarding your question about why the dataframe is bigger, I think Steinn managed to answer it.
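A pulse-resolved normalization like that could amount to a histogram over the scanned variable weighted by the per-pulse FEL intensity. A minimal numpy sketch with made-up values (the variable names and data are hypothetical, not the actual per-pulse dataframe columns):

```python
import numpy as np

# Hypothetical per-pulse data: scanned delay and FEL intensity for each pulse
delay = np.array([0.1, 0.12, 0.5, 0.55, 0.9])
fel_intensity = np.array([1.0, 0.8, 1.2, 1.1, 0.9])
bins = np.linspace(0.0, 1.0, 3)  # two delay bins

# Normalization histogram: summed FEL intensity per delay bin
norm, _ = np.histogram(delay, bins=bins, weights=fel_intensity)
# norm == [1.8, 3.2]
```

The binned data would then be divided by this weighted histogram instead of a plain per-pulse count, which is why the per-pulse dataframe would still be needed for such cases.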
Looks good! I tested it together with the hextof changes from #169 and it seems to work correctly. The only exceptions are the comments I made. Once those are resolved, you can merge as far as I am concerned.
I accidentally uploaded the branch where I tested this. I'll leave it till this is merged. its
…code to binning.py
fix performance issue in mpes loader for timed dataframes
return normalization histograms as xarrays
…essor function
add accessor functions for binned and normalized histograms and normalization histograms
force-pushed 05c2eb2 to 0d621d9
force-pushed 0d621d9 to 9c3faf0
I fixed this, and modified the tests to test the timed dataframes.
I also updated your test branch.
I tested this and it seems to work fine. LGTM!
Closes #101
I am still working on improving the tests for this and updating the new processor functions; then I will merge.
This adds support for a per-time-unit dataframe and histogram calculation based on this dataframe.
Histogram calculation in the example notebook takes ~55 s on our machine. This can potentially still be improved, e.g. by using our fast binning method (which currently fails because of xarray generation and concatenation issues).
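The normalization idea behind the per-time-unit dataframe can be sketched in plain numpy (made-up values; the real implementation operates on the dask dataframes): bin the electron events and the timed rows on the same axis, then divide to obtain counts per time unit:

```python
import numpy as np

# Hypothetical axis positions: one entry per time unit vs. one per electron
timed_positions = np.array([0.1, 0.1, 0.3, 0.7, 0.7, 0.7])
electron_positions = np.array([0.1, 0.3, 0.3, 0.7])
bins = np.linspace(0.0, 1.0, 3)  # two bins on the shared axis

counts, _ = np.histogram(electron_positions, bins=bins)
time_per_bin, _ = np.histogram(timed_positions, bins=bins)
# Counts per time unit in each bin; guard against bins with zero dwell time
rate = np.divide(counts.astype(float), time_per_bin,
                 out=np.zeros(len(counts)), where=time_per_bin > 0)
# rate == [1.0, 1/3]
```

Since both histograms share the same bin edges, the division yields a properly normalized rate even for uneven dwell times per bin.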