Blimpy roadmap discussion #38

telegraphic · 2018-06-15T05:56:52Z

Hi all, I wanted to start a thread on future direction of blimpy. I have been refactoring the code in the last week, but this does not add any functionality or change the user-level API. Given that this is a heavily used piece of code now, we should discuss together how we continue development. We should make sure that bugs like the ones @christiangil found don’t arise again!

Here are some open questions about the future path:

1. Do we want stability, features, or both?

We could decide to ‘freeze’ blimpy, and make other functionality different packages. Or, we could prefer features, but adding features may break stuff. Having both requires more diligence, code reviews and overall collaboration.

2. Should we stop spending time improving blimpy?

Apart from bugfixes, is blimpy ‘good enough’, and should we put our effort into different goals?

3. Design philosophy

As packages grow, they inevitably get messier. We should decide on what design patterns we want to adopt, and approaches to clean development. I personally don’t like giant files with thousands of lines, and like to split things up by functionality. But this isn’t necessarily the way to go.

For example, we could write our code like this:

class ThingDoer(object):
    def load_data(filename):
        [code to load data here]
    def dedisperse(self, dm):
        [code to dedisperse here]
    def find_et(self)
        [code to find ET here]

d = ThingDoer()
d.dedisperse(dm=521)
d.find_et()

I quite like being able to do things like d.dedisperse()! But we could equally have three files:

/thing_doer_pkg/
  |- thing_doer.py
  |- dedisperse.py
  |- find_et.py

and split the code into three files:

# thing_doer.py
class ThingDoer(object):
    def load_data(filename):
        [code to load data here]

# dedisperse.py
def dedisperse(thing_doer_object, dm):
    [code to dedisperse here]

# find_et.py
def find_et(thing_doer_object):
    [code to dedisperse here]

I think the ‘separation of concerns’ approach to software engineering would favor the second, but there is a lot to like about the first. I note with numpy you can do:

a = np.array([1,2,3])
m0 = a.mean()
m1 = np.mean(a)
# m0 and m1 are identical

I probably favor the numpy approach, assuming that a.mean() and np.mean actually just calls the same code.

4. Do we still need Filterbank()?

blimpy started off as just one file, a filterbank reader. The Waterfall class improves file handling, and has all the same features (it currently subclasses Filterbank). Do we need both still, or do we move toward Waterfall only?

5. Tests and coverage

I would like to keep Travis CI for basic unit tests and add code coverage tests with coveralls.io. But we need to write good unit tests! And I don’t think Travis is good for testing really large files, so maybe we need to set something up like Jenkins. Or is this wasted development effort?

6. Should we redesign to support other formats?

Currently we haven’t really abstracted away the concept of a filterbank: we still rely on headers like foff and fch1. What if we wanted to support FITS files, .ar files, etc? Do we have a separate class for each, or make Waterfall handle this?

7. Py2 and Py3 support

This is painful, but I think for now we should try and support both. There are new features like data classes that would completely break Py2 support but might be nice longer term. Is this (or other Py3 features) compelling enough to dump Py2?

8. API and speed vs sigpyproc

Gerry and I did some speed tests and found that sigpyproc was about 2x faster loading filterbank data than blimpy. But this has a very different API, loads integration by integration (it can’t slice), and can’t read HDF5. It was also harder to install – although I’ve tried to fix this somewhat. What is there to learn from sigpyproc, and do we want to merge some functionality? For example, the dedisperser code? Also, do we want to use a similar design where we offload some parts of the code to C, and wrap it in Python?

The text was updated successfully, but these errors were encountered:

gpshead · 2018-11-28T04:52:14Z

Regarding (7) on Py3: Python 3 isn't even a question. You must support it or your library becomes irrelevant to the world. Python 2 is EOL in a year.

Supporting both at once from a single codebase is the normal transition path for most projects so that users of those packages can transition their code between versions as well.

jeenriquez · 2018-11-28T05:12:20Z

Thanks Gregory. Indeed, moving to Python3 is not a question anymore.
Fortunately we have updated the code enough that at the moment the code is passing on Travis for Py3.

It doesn't mean we are out of the woods. I still need to make myself Py3 compatible. :)

gijzelaerr · 2018-11-28T12:02:15Z

I would work towards a really nice release, label that 'long term support' and move on from there. Only solve bugs in the stable one, then you can go more wild in master.
in my opinion programming as 'functional' as possible makes code easier to maintain and test. You can still have a OO style API to the end user though, which would just be a wrapper.
If your unittests are slow because of big files they are not unit tests but integration tests. Still a good plan to have both. I would propose keep on running unittests on travis but set up a nice fat jenkins server which can run the (nighty) containerized integration tests on some nice big files https://github.com/UCBerkeleySETI/bl_sw_meta/issues/31
Debian is dropping py2 support for astro packages, Ubuntu 20.04 will probably have close to no py2 support. If you don't intend to use the code base in the same memory space with difficult to port py2 only project I would slowly move into the py3 only direction, just like matplotlib and astropy. People who need py2 can use the LTS version mentioned in point 1 and their needs are probably not going to change much since they already live in history.

siemion · 2020-10-29T16:40:31Z

I'm closing this issue for now, as it seems development has moved well past many of these issues.

telegraphic added the enhancement label Jun 15, 2018

jeenriquez assigned jeenriquez and telegraphic Sep 7, 2018

telegraphic mentioned this issue Oct 11, 2018

Universal read_header function #61

Open

gijzelaerr self-assigned this Nov 28, 2018

telegraphic assigned telegraphic and unassigned telegraphic Nov 29, 2018

siemion closed this as completed Oct 29, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Blimpy roadmap discussion #38

Blimpy roadmap discussion #38

telegraphic commented Jun 15, 2018

gpshead commented Nov 28, 2018

jeenriquez commented Nov 28, 2018

gijzelaerr commented Nov 28, 2018 •

edited

Loading

siemion commented Oct 29, 2020

Blimpy roadmap discussion #38

Blimpy roadmap discussion #38

Comments

telegraphic commented Jun 15, 2018

1. Do we want stability, features, or both?

2. Should we stop spending time improving blimpy?

3. Design philosophy

4. Do we still need Filterbank()?

5. Tests and coverage

6. Should we redesign to support other formats?

7. Py2 and Py3 support

8. API and speed vs sigpyproc

gpshead commented Nov 28, 2018

jeenriquez commented Nov 28, 2018

gijzelaerr commented Nov 28, 2018 • edited Loading

siemion commented Oct 29, 2020

gijzelaerr commented Nov 28, 2018 •

edited

Loading