questions related to fast interferometry #49
This is excellent feedback, thanks Casey! I think several of the issues you mention relate to the fundamental design of Bifrost vs. other approaches to solving similar problems. While the intention is to be as modular and flexible as possible (and we hope to support many varied use-cases), Bifrost primarily targets applications involving continuously streaming data; i.e., where there is a dominant 'time' axis. This manifests in its use of ring buffers (which, importantly, are not designed to behave like queues) to pass data between blocks and its ability to have each block run asynchronously. This design does not preclude it from being used for more batch-oriented processing, but it may mean that it provides less advantage over other approaches for such applications.
Bifrost provides a means of implementing pipeline parallelism, but not batch parallelism. As it currently stands, parallelism within a block is the responsibility of the block (typically one would use OpenMP or the GPU). It might be possible to provide some form of automatic batch parallelism, but it has not been on my radar. The model I've had in mind so far is to use multiple pipelines for parallelising over batches (e.g., running on multiple compute nodes as you suggest).
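To make the multiple-pipelines model concrete, here is a minimal sketch of batch parallelism handled outside Bifrost, using plain Python multiprocessing; run_partition_pipeline is a hypothetical placeholder for whatever function builds and runs one pipeline on one data partition (this is not a Bifrost API):

import multiprocessing as mp

def run_partition_pipeline(partition_id):
    # Hypothetical stand-in: construct the blocks/rings for this partition and run them
    print("processing partition", partition_id)

if __name__ == "__main__":
    partitions = range(8)                  # e.g., one partition per compute node or per scan
    with mp.Pool(processes=4) as pool:
        pool.map(run_partition_pipeline, partitions)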
This problem should be addressed imminently; there will be full support for N-D data, which should make things easier. However, blocks will still operate on 'gulps' of time.
If you need to compute statistics over large spans of time (and cannot do it using single-pass/online algorithms), then you will need to have blocks process large gulps at a time. Theoretically you could gulp in the entire dataset at once if it will fit in memory, but this is not the ideal way to use the library.
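As an example of the single-pass/online approach mentioned above, here is a minimal numpy sketch (not a Bifrost API) that keeps running per-channel statistics across gulps using the standard parallel-variance update, so no block ever needs the whole observation in memory; the gulps() generator is a stand-in data source:

import numpy as np

def gulps(ngulp=100, gulp_ntime=10, nchan=100, npol=2, seed=0):
    # Stand-in data source yielding [gulp_ntime, nchan, npol] arrays
    rng = np.random.default_rng(seed)
    for _ in range(ngulp):
        yield rng.normal(size=(gulp_ntime, nchan, npol))

count = 0
mean = np.zeros((100, 2))
m2 = np.zeros((100, 2))                    # running sum of squared deviations per channel/pol
for gulp in gulps():
    n_b = gulp.shape[0]
    mean_b = gulp.mean(axis=0)
    m2_b = ((gulp - mean_b) ** 2).sum(axis=0)
    delta = mean_b - mean
    total = count + n_b
    m2 += m2_b + delta ** 2 * (count * n_b / total)
    mean += delta * (n_b / total)
    count = total

channel_variance = m2 / count              # e.g., flag channels with outlying variance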
Bifrost always steps through data along a single axis (usually time, e.g., streamed from some data source), where each 'frame' contains all the other axes. E.g., you might read in 10 time slices, each containing 100 channels and 2 polarisations, and process that [10,100,2] array before moving on to the next 10 time slices. The idea is for the time axis to be processed in small gulps so that they fit in memory. A quick question about your processing pipeline: does 'iteration' correspond to the time axis? And for dedispersion, is that just rolling the time axis, without summing over channels?
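A minimal numpy sketch of that gulp model (using a stand-in array rather than a real Bifrost ring):

import numpy as np

ntime, nchan, npol = 1000, 100, 2
data = np.zeros((ntime, nchan, npol), dtype=np.float32)   # stand-in for a streamed source

gulp_ntime = 10
for start in range(0, ntime, gulp_ntime):
    gulp = data[start:start + gulp_ntime]                 # shape [10, 100, 2]
    reduced = gulp.mean(axis=0)                           # e.g., some per-channel operation
    # ... then move on to the next 10 time slices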
Thanks, Ben. Yes, we do use time as our basic iteration axis. That is also a natural way to parallelize the bulk of the processing (e.g., read 8 integrations and image one integration per core). Your suggestion of reading in a small number of integrations would be fine to feed most of the algorithms in my pipeline. The killer is the dedispersion, though. We have dispersion delays as large as hundreds of integrations, which is why we typically read in frames larger than that. I wonder if there's a way to do this by having a dedispersion block write out to multiple rings? Perhaps one ring per DM trial, each of which accumulates data until it has the entire DM sweep. It sounds like my algorithms are generally more oriented around batches. To bridge that to the stream concept, I'll need a good strategy for accumulating data. If there is a good general way to do that, it may extend your streaming concepts to more use cases.
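One way to bridge that gap without a ring per DM trial is a block-internal history buffer that retains enough past integrations to cover the largest dispersion delay. This is only an illustrative numpy sketch (the 300-integration maximum delay and the incoming_gulps() source are assumptions, not Bifrost API):

import numpy as np

nchan, npol = 100, 2
gulp_ntime = 10
max_delay = 300                            # assumed maximum DM sweep, in integrations
history = np.zeros((0, nchan, npol), dtype=np.float32)

def incoming_gulps(n=50, seed=0):
    # Stand-in data source yielding [gulp_ntime, nchan, npol] arrays
    rng = np.random.default_rng(seed)
    for _ in range(n):
        yield rng.normal(size=(gulp_ntime, nchan, npol)).astype(np.float32)

for gulp in incoming_gulps():
    history = np.concatenate([history, gulp], axis=0)
    if history.shape[0] >= max_delay + gulp_ntime:
        window = history[-(max_delay + gulp_ntime):]
        # ... dedisperse `window` here (roll each channel by its delay, etc.)
        history = history[-max_delay:]     # drop samples older than the longest delay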
Hi Casey, thanks for trying Bifrost out. Would you like to join in on our next telecon? (They are usually on Fridays at 1pm PT.) I think hearing more about your experience so far would be useful for prioritizing the different development ideas we have for Bifrost.
Regarding data accumulation, would you need to put the DM trials in rings? Which blocks come immediately after the dedisp block in your pipeline? You can accumulate data over multiple gulps of a ring within a block, if that is what you are asking. I think if an algorithm requires some sort of internal accumulation, it is generally best to keep that as one whole block (coarse-grained pipelines are fine in BF):

class ...Block...:
    # block definition (for current /master branch API)
    def main(self):
        ...
        for inspan in self.read('in_1'):
            ... # some calculation
            accumulator.append(data)
            ... # do something with accumulator

If you are reusing gulp data between reads, there are plans to add a method that lets you provide stride specifications to ring reads, so you can re-read data on a ring; this will make sliding-window reads (e.g., for FIR filters) more convenient. There are also plans for another data type to store models that are intermittently updated by a block (e.g., gains), but that data structure has not been fleshed out yet. Also, apologies that we don't have much documentation at the moment. Please post usage issues or send emails with questions as they come up.
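Until that stride/re-read method exists, one way to approximate a sliding-window read is to carry the tail of each gulp forward into the next one inside the block. This is only an illustrative numpy sketch (the FIR length ntap, the window taps, and the gulps() stand-in source are assumptions, not Bifrost API):

import numpy as np

ntap = 16
gulp_ntime, nchan = 128, 100
taps = np.hanning(ntap)
carry = np.zeros((ntap - 1, nchan), dtype=np.float32)    # tail of the previous gulp

def gulps(n=10, seed=0):
    # Stand-in data source yielding [gulp_ntime, nchan] arrays
    rng = np.random.default_rng(seed)
    for _ in range(n):
        yield rng.normal(size=(gulp_ntime, nchan)).astype(np.float32)

for gulp in gulps():
    padded = np.concatenate([carry, gulp], axis=0)       # [ntap-1 + gulp_ntime, nchan]
    out = np.stack([np.convolve(padded[:, c], taps, mode="valid")
                    for c in range(nchan)], axis=1)      # filtered gulp, [gulp_ntime, nchan]
    carry = gulp[-(ntap - 1):]                           # reuse this tail on the next read
    # ... write `out` downstream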
I put your telecon time on my calendar, so I'll keep it in mind.
BTW, prior to hearing about bifrost, my plan was to use dask distributed. That is designed more generally and tends to serialize everything and pass it around. Serialization is my biggest concern there, which is why bifrost is attractive.
With help from Danny, I'm playing with bifrost in the context of my work with fast-sampled interferometric data. I like what you have so far and I think it could work for me, but there are a few big issues I need to understand first. I would appreciate any tips to help me move forward.
For context, my data analysis had previously been based on Python and multiprocessing (e.g., http://github.com/caseyjlaw/rtpipe). I need to redesign to use GPUs. I also need to scale up to a 32-node cluster, although that is easy given how my problem parallelizes.
The basic pipeline is:
I'd appreciate input on the following concepts:
Block computing load
In my experiments, I see that bifrost tends to assume that each block gets a core. My tests quickly reveal bottlenecks in my bifrost pipeline. What is the best way to deal with blocks of different computing demand? A related point is how best to optimize for CPU usage (just profile? or is there a way of getting the equivalent of a resource pool as in multiprocessing?).
Parallelism
Traditionally, I've read data into shared memory (on say an 8-core worker node) and used multiprocessing as best I could to get 100% CPU usage. Multiprocessing makes this simple: open, start, close the pool. What is the best way to do this with bifrost? I could imagine having the Pipeline class include this as a feature via the threading library.
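For reference, this is the pool pattern being described, in minimal form (illustrative only; process_integration is a stand-in for imaging or searching one integration):

import multiprocessing as mp
import numpy as np

def process_integration(i):
    # Stand-in for imaging/searching one integration
    return i, float(np.sin(i))

if __name__ == "__main__":
    n_integrations = 64
    with mp.Pool(processes=8) as pool:     # open the pool, fan work out, close on exit
        results = pool.map(process_integration, range(n_integrations))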
High-D data
I typically work with a 4d numpy array. How should I use concepts like "gulp" and so on? They seem tuned to a 1d stream.
Iteration
My past use case pushed me to read lots of data (1 to 10 GB) into memory and craft each stage of the pipeline to parallelize efficiently. This lets me measure statistical properties of large segments of data (e.g., to find a bad channel). However, bifrost encourages the algorithm to treat each iteration through a data stream as independent. What is the recommended way of measuring statistical properties over large data volumes with bifrost?
I've also noticed that in some stages of my pipeline I'd like to iterate through the data structure in one way (e.g., by channel), while later I may want to iterate in a different way (e.g., by integration). Is that allowed with bifrost?
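For illustration, changing the iteration axis of a 4-D array is straightforward in numpy itself; the shape below is an assumption, not taken from the actual pipeline:

import numpy as np

ntime, nbl, nchan, npol = 64, 351, 100, 2                 # assumed [time, baseline, channel, pol]
data = np.zeros((ntime, nbl, nchan, npol), dtype=np.complex64)

for integration in data:                                  # iterate by integration: [nbl, nchan, npol]
    pass

for channel in np.moveaxis(data, 2, 0):                   # iterate by channel: [ntime, nbl, npol]
    pass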
Thanks for your time. I am excited by the potential here and am willing to put more time into making it work for my use case.