Refactor with recent changes #36

telegraphic · 2018-06-03T14:39:13Z

There is quite a lot in here!

The main change is the creation of a generic Reader class, which the H5 and FIL readers now subclass. This follows the same lines as the earlier change that made Waterfall a subclass of Filterbank, and again is driven by the aim of avoiding code duplication and using OOP design patterns.
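A minimal sketch of the base-class pattern described above, assuming each reader only has to supply its own `read_header()` and that it returns a plain dict (as the readers do after this PR). The `n_chans` helper and the stubbed header values are hypothetical, for illustration only, and this is not the actual file_wrapper code:

```python
class Reader(object):
    """Hypothetical generic base class: logic shared by every file reader lives here."""

    def __init__(self, filename):
        self.filename = filename
        # Every subclass returns a plain dict, so shared code can rely on that
        self.header = self.read_header()

    def read_header(self):
        """Return the parsed header as a dict; each file format must implement this."""
        raise NotImplementedError("subclasses must implement read_header()")

    def n_chans(self):
        """Hypothetical shared helper, written once and inherited by both readers."""
        return int(self.header['nchans'])


class FilReader(Reader):
    def read_header(self):
        # Stub: a real FIL reader would parse the SIGPROC header from self.filename
        return {'nchans': 64, 'source_name': 'Voyager1'}


class H5Reader(Reader):
    def read_header(self):
        # Stub: a real H5 reader would read the HDF5 header attributes
        return {'nchans': 64, 'source_name': 'Voyager1'}


for reader in (FilReader('voyager.fil'), H5Reader('voyager.h5')):
    print(type(reader).__name__, reader.n_chans())
```

Anything added to `Reader` is immediately available to both formats, which is the duplication this refactor is trying to remove.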

I also added a new test, which loads the Voyager data with Waterfall from both the fil and h5 formats and compares the results.

The rest of the changes are docstrings/typo/readability changes.

This function was in both the Filterbank and Waterfall classes, but was slightly different. The Waterfall version was marginally better, albeit pretty much identical. It has been moved to Filterbank only; the Waterfall class still has access to it through inheritance.

At the top, set plt.rcParams['axes.formatter.useoffset'] = False instead of calling ax.get_xaxis().get_major_formatter().set_useOffset(False). This means fewer lines of code, and should be more resilient to matplotlib versions that don't support this API.
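A small sketch contrasting the two approaches, assuming a headless Agg backend. The frequency values are arbitrary, chosen only as a narrow band at a high centre frequency, where matplotlib would otherwise tend to show an axis offset:

```python
import matplotlib
matplotlib.use('Agg')  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Old, per-axes approach (has to be repeated after every plot call):
#     ax.get_xaxis().get_major_formatter().set_useOffset(False)
# New, global approach: set once, before any axes are created
plt.rcParams['axes.formatter.useoffset'] = False

# Arbitrary narrow band (MHz) at a high centre frequency
freqs = [8419.29 + 0.0003 * i for i in range(100)]
fig, ax = plt.subplots()
ax.plot(freqs, [1.0] * len(freqs))

# The default ScalarFormatter picks up the rcParams value when the axes are created
formatter = ax.get_xaxis().get_major_formatter()
print(formatter.get_useOffset())  # False
plt.close(fig)
```

Note that the rcParams change is global for the whole session, which is the intent here; if a narrower scope is ever wanted, `plt.rc_context({'axes.formatter.useoffset': False})` can be used as a context manager instead.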

telegraphic · 2018-06-06T04:29:16Z

Ok, I'm now able to do plot zooms with all combos of fil/hdf5 and Filterbank/Waterfall. So I think Issues #31 #32 #33 #34 are fixed in this commit.

Also the Py3 unit tests are passing! Woohoo!

telegraphic · 2018-06-06T04:48:15Z

Here is the output of code coverage:

Name                                            Stmts   Miss  Cover
-------------------------------------------------------------------
blimpy-1.2.0-py2.7.egg/blimpy/__init__.py          17      4    76%
blimpy-1.2.0-py2.7.egg/blimpy/dice.py             100    100     0%
blimpy-1.2.0-py2.7.egg/blimpy/fil2h5.py            36     21    42%
blimpy-1.2.0-py2.7.egg/blimpy/fil2hdf.py           59     59     0%
blimpy-1.2.0-py2.7.egg/blimpy/file_wrapper.py     389    150    61%
blimpy-1.2.0-py2.7.egg/blimpy/filterbank.py       571    252    56%
blimpy-1.2.0-py2.7.egg/blimpy/gup2hdf.py           61     50    18%
blimpy-1.2.0-py2.7.egg/blimpy/guppi.py            249    214    14%
blimpy-1.2.0-py2.7.egg/blimpy/h52fil.py            36     21    42%
blimpy-1.2.0-py2.7.egg/blimpy/match_fils.py        74     61    18%
blimpy-1.2.0-py2.7.egg/blimpy/sigproc.py          188    112    40%
blimpy-1.2.0-py2.7.egg/blimpy/utils.py             55      0   100%
blimpy-1.2.0-py2.7.egg/blimpy/waterfall.py        329    249    24%
-------------------------------------------------------------------
TOTAL                                            2164   1293    40%

This tells us what fraction of the code gets executed by the unit tests.

We don't have any tests for dice yet, and fil2hdf is deprecated (shall we delete it?). There are also no real tests of GUPPI raw data, so I think we can boost that number pretty easily.
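To put rough numbers on that, the Cover column is (Stmts - Miss) / Stmts, so the total can be recomputed from the table above. The two what-if scenarios (fully testing dice.py, and deleting fil2hdf.py) are hypothetical illustrations, not measured results:

```python
# (Stmts, Miss) per module, copied from the coverage table above
coverage = {
    '__init__.py': (17, 4),
    'dice.py': (100, 100),
    'fil2h5.py': (36, 21),
    'fil2hdf.py': (59, 59),
    'file_wrapper.py': (389, 150),
    'filterbank.py': (571, 252),
    'gup2hdf.py': (61, 50),
    'guppi.py': (249, 214),
    'h52fil.py': (36, 21),
    'match_fils.py': (74, 61),
    'sigproc.py': (188, 112),
    'utils.py': (55, 0),
    'waterfall.py': (329, 249),
}


def percent_covered(table):
    """Total cover = (sum of Stmts - sum of Miss) / sum of Stmts, as a percentage."""
    stmts = sum(s for s, _ in table.values())
    miss = sum(m for _, m in table.values())
    return 100.0 * (stmts - miss) / stmts


# Hypothetical scenario 1: every statement in dice.py gets exercised by tests
with_dice = dict(coverage)
with_dice['dice.py'] = (100, 0)

# Hypothetical scenario 2: additionally delete the deprecated fil2hdf.py
with_dice_no_fil2hdf = {k: v for k, v in with_dice.items() if k != 'fil2hdf.py'}

print(round(percent_covered(coverage)))              # 40, matching the TOTAL row
print(round(percent_covered(with_dice)))             # 45
print(round(percent_covered(with_dice_no_fil2hdf)))  # 46
```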

jeenriquez

I have been removing the dependence of waterfall.py on filterbank.py. Please move this function back.

jeenriquez

Thanks Danny, this will do for now. In the medium term we can think about how to either make it more general, or just add one telescope at a time.

jeenriquez

I can't remember why I pulled these out from sigproc. Did you double-check they were the same code?

telegraphic · 2018-06-06T23:19:01Z

Hey @jeenriquez, I think the pull request grew too large to see your specific comments?

Re: waterfall and filterbank dependence, how would you feel long-term about fully deprecating Filterbank in favor of Waterfall?

Re: sigproc calls in waterfall, was this to remove dependencies maybe? The earliest versions of Waterfall were pretty much stand-alone, right?

jeenriquez · 2018-06-06T23:23:39Z

Hey @telegraphic, yeah, I was just looking into this; I thought I was commenting on each individual commit. I'll write a summary and then post a single comment.

jeenriquez · 2018-06-11T00:21:15Z

Hi Danny,

I had a look at all your changes. Quite a lot! In general I think it all looks very good.
I'm just wondering about the level of testing. For 1.1.9 we made tests for several file types.
I like that you are adding more testing for Parkes data, which was clearly needed. Have you also done some testing on GBT data (besides the Voyager data)? This is not necessary for merging, but I would like to know so that one of us (you, me, or Matt) looks into it.

I would like to merge soon, since I think this version is more stable than the current halfway state of the refactor, but I'm not sure what happens to all these comments once it is merged. Could you reply to my comments below?

Consolidated blank_dc function

I have been removing the dependence of waterfall.py on filterbank.py. Please move this function (blank_dc) back.

Removing duplicated code from sigproc.py:

I can't remember why I pulled these out from sigproc. Maybe I did it to be able to edit them without breaking the original. Did you double-check they were the same code?

Follow-up on removal of gen_from_header.

Good question, I don’t think we are using this anymore. For now, unless it is “in the way”, we could keep it and just keep an eye on it.

Updating unit tests

This made me think that it would be good to have a monthly meeting to discuss the status of blimpy, as well as the new developments.

Updated to build h5py better

Could you explain this one to me? How does it work now? This may need to be updated in the README.

More bytestring Py3 shenanigans

Wondering about the use of six here. If I understand correctly, that is the preferred option for Python 2.5: https://docs.python.org/3/howto/pyporting.html
Not important now, but just thinking ahead to fully supporting Python 3.

Py3 throws a hissy fit with None comparisons, making these more robust

You missed one that still needs changing:

    if not init

Updates for #31 #32 #33 #34 plotting issues

The plotting issues are #30 to #33, so this commit seems to be showing up in the comments of the wrong issues. Not sure if there is a way to fix it.

calc_n_coarse_channels not working at Parkes, changing logic to fix

Are you planning to make it work also for the other Parkes resolutions?

telegraphic · 2018-06-11T02:05:30Z

Hey Emilio, thanks for going through it all!

Tests on big files

We definitely need some tests with full-size data products from GBT + PKS. Shall we set up a dedicated test platform somewhere? Could be at GBT or PKS, or even in the cloud -- at Berkeley might make the most sense though?
Do you and Matt already have scripts at GBT that we could add to the repository? I don't think we can/should add huge files to the repository, so the Travis CI tests will have to run only on small files. I think we can set up this Jenkins thing to run tests automatically, but for starters I think we should copy some good test files to somewhere well-documented and accessible:

Old GBT data
New GBT data
Parkes single-pixel data
Parkes MB data

Data in HDF5 and FIL for all resolutions.

Consolidated blank_dc function

We can add this back for now. Maybe this is part of a bigger discussion about Filterbank vs Waterfall -- should we totally deprecate Filterbank? I am pretty pro-consolidation in general and Waterfall has more functionality. Here's the current state:

Waterfall inherits the functions from Filterbank:

read_hdf5 *               Not needed in waterfall (confusing to have)
read_filterbank *         Not needed in waterfall (confusing to have)
write_to_filterbank *     Confuses with Waterfall.write_to_fil
_setup_freqs
_setup_time_axis
_calc_extent
compute_lst **            Move to Waterfall?
compute_lsrk **           Move to Waterfall?
blank_dc **
generate_freqs
plot_spectrum 
plot_spectrum_min_max
plot_waterfall 
plot_time_series
plot_kurtosis
plot_all
calibrate_band_pass_N1 **

Methods marked ** are, I think, functions that could be moved to Waterfall to keep Filterbank lightweight. Methods marked * don't belong in Waterfall.

Waterfall overrides:

__init__
info
grab_data
write_to_hdf5

New methods that Waterfall adds:

read_data
populate_freqs
populate_timestamps
write_to_fil
calc_n_coarse_chan (via file_wrapper)

My thought is that we move everything apart from barebones functionality from Filterbank to Waterfall. We would remove read_hdf5, read_fil, write_to_hdf5 and write_to_filterbank from Filterbank, so writing is only supported by Waterfall. The read_hdf5 and read_fil functions were only added so you can do this:

    a = Filterbank()
    a.read_fil('voyager.fil')

instead of

    a = Filterbank('voyager.fil')

But it seems like this isn't worth supporting anymore, so I would get rid of the read functions.
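If the read functions were kept for a transition period rather than removed outright, one option (an assumption, not something this PR does) is to have them emit a DeprecationWarning in addition to a docstring note. A sketch using a cut-down, hypothetical stand-in for Filterbank, not blimpy's real class:

```python
import warnings


class Filterbank(object):
    """Hypothetical, cut-down stand-in for illustration; not blimpy's class."""

    def __init__(self, filename=None):
        self.filename = None
        if filename is not None:
            self._load(filename)

    def _load(self, filename):
        # A real implementation would read the header and data here
        self.filename = filename

    def read_fil(self, filename):
        """Deprecated: pass the filename to the constructor instead."""
        warnings.warn("read_fil() is deprecated; use Filterbank(filename)",
                      DeprecationWarning, stacklevel=2)
        self._load(filename)


# Preferred style: load in the constructor
a = Filterbank('voyager.fil')

# Old style still works during the transition, but warns
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    b = Filterbank()
    b.read_fil('voyager.fil')
print(b.filename, caught[0].category.__name__)
```

`stacklevel=2` makes the warning point at the caller's line rather than at `read_fil` itself, which is what users need in order to update their scripts.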

Removing duplicated code from sigproc.py:

I did double-check the code was the same.

Follow-up on removal of gen_from_header

It is a little useful to be able to take a header and data and make a Filterbank object, so we can keep the functionality for now but earmark it for future tidy-up or removal.

Updating unit tests

Agreed - we should schedule one every 2 weeks, and cancel it if we have no changes to discuss (keep meetings to a minimum!). Or we could have a 'software stack' meeting?

Updated to build h5py better

This one is only for the Travis CI testing. Basically, Ubuntu puts the HDF5 libraries in an unusual place, and bitshuffle / h5py don't always find them. The include path in CFLAGS just points the build at the location of the HDF5 libraries.

More bytestring Py3 shenanigans

Agreed, I don't think we should support Py 2.5! Maybe not even Py 2.6? We might be able to get rid of six in the future, but at the moment it's being used to catch something that Py3 doesn't support, and to reroute that code so Py3 is happy.
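For reference, one way the bytes-vs-str header handling could look without six, assuming only Python 2.7 and 3 need supporting (in 2.7, `bytes` is an alias for `str`, so the `isinstance` check still works). The helper names are hypothetical:

```python
def to_str(value, encoding='ascii'):
    """Decode bytes to text; return anything else unchanged."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def normalise_header(header):
    """Return a copy of a header dict with bytes keys and values decoded to text."""
    return {to_str(k): to_str(v) for k, v in header.items()}


# Header keys such as b'nchans' and b'telescope_id' are bytes under Py3
raw = {b'nchans': 64, b'source_name': b'Voyager1'}
print(normalise_header(raw))
```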

Py3 throws a hissy fit with None comparisons, making these more robust

Good catch, changed. I really hate Py 3 for these, I wonder if there is a neater way to do it that I'm not aware of?
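The usual neat fix is to test `is None` explicitly rather than relying on ordering or truthiness. A sketch with a hypothetical helper: `if not f_start:` would also treat a legitimate 0.0 as "unset", which is why the `if not init` case is worth changing too:

```python
def clip_freq_range(f_start=None, f_stop=None, f_min=0.0, f_max=1500.0):
    """Hypothetical helper: clamp an optional requested range to the file's band.

    `is None` distinguishes "not given" from a real value of 0.0,
    which `if not f_start:` would wrongly treat as missing.
    """
    lo = f_min if f_start is None else max(f_start, f_min)
    hi = f_max if f_stop is None else min(f_stop, f_max)
    return lo, hi


try:
    None < 1.0  # Python 2 quietly evaluates this to True
except TypeError:
    # Python 3 refuses to order None against a number
    print("Python 3: None cannot be ordered against a float")
```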

Updates for #31 #32 #33 NOT #34 plotting issues

Oops, sorry, yeah I think this is pretty hard to change :(.

calc_n_coarse_channels not working at Parkes, changing logic to fix

This code probably needs some robust testing. Really would be nice to have N_COARSE_CHANS in the metadata headers :/.

Here's the code as it currently stands. I think it will fail with Parkes, but will raise an error making it clear that it has failed.

    def calc_n_coarse_chan(self):
        """ This makes an attempt to calculate the number of coarse channels in a given file.

            Note:
                This is unlikely to work on non-Breakthrough Listen data, as a-priori knowledge of
                the digitizer system is required.

        """
        nchans = int(self.header[b'nchans'])

        # Do we have a file with enough channels that it has coarse channelization?
        if nchans >= 2**20:
            # Does the common FFT length of 2^20 divide through without a remainder?
            # This should work for most GBT and all Parkes hires data
            if nchans % 2**20 == 0:
                n_coarse_chan = nchans // 2**20
                return n_coarse_chan
            # Early GBT data has non-2^N FFT length, check if it is GBT data
            elif self.header[b'telescope_id'] == 6:
                coarse_chan_bw = 2.9296875
                bandwidth = abs(self.f_stop - self.f_start)
                n_coarse_chan = int(bandwidth / coarse_chan_bw)
                return n_coarse_chan
            else:
                raise RuntimeError("Couldn't figure out n_coarse_chan")
        else:
            raise RuntimeError("This function currently only works for hires BL Parkes or GBT data.")
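Since this needs robust testing, one option is to pull the decision logic into a plain function that takes header values as arguments, so each branch can be unit-tested without data files. This is a hypothetical refactor: the signature and parameter names are assumptions, and the branches mirror the method above (exact 2^20 divisibility first, then the early-GBT bandwidth fallback for telescope_id 6):

```python
def calc_n_coarse_chan(nchans, telescope_id=None, bandwidth=None,
                       fft_len=2**20, gbt_coarse_chan_bw=2.9296875):
    """Hypothetical file-free version of the calc_n_coarse_chan logic.

    nchans       -- total number of fine channels (header 'nchans')
    telescope_id -- SIGPROC telescope ID; 6 is treated as GBT, as in the method above
    bandwidth    -- |f_stop - f_start| in MHz, only used by the early-GBT branch
    """
    nchans = int(nchans)

    # Fewer fine channels than one FFT: no coarse channelization to infer
    if nchans < fft_len:
        raise RuntimeError("This function currently only works for hires BL Parkes or GBT data.")

    # Common case: the FFT length divides the channel count exactly
    if nchans % fft_len == 0:
        return nchans // fft_len

    # Early GBT data used a non-2^N FFT length, so fall back to the coarse channel width
    if telescope_id == 6 and bandwidth is not None:
        return int(abs(bandwidth) / gbt_coarse_chan_bw)

    raise RuntimeError("Couldn't figure out n_coarse_chan")
```

Each branch can then get its own test case, including the Parkes configurations that are currently expected to raise.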

telegraphic added 29 commits June 3, 2018 20:27

Removed unnecessary _flat_file_dimension

2bdf80c

This functionality can be done simply with np.array.size, or np.prod(np.array.shape)
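A quick check that the two replacements agree, assuming the (n_ints, n_ifs, n_chans) data layout; the shape itself is arbitrary:

```python
import numpy as np

# Assumed filterbank layout: (n_ints, n_ifs, n_chans)
data = np.zeros((16, 1, 1024), dtype='float32')

n_elements = data.size                      # already a Python int
n_elements_alt = int(np.prod(data.shape))   # np.prod returns a NumPy scalar, so cast

# Edge case: for a 0-d array np.prod(()) is 1.0 (a float), while .size is 1
print(n_elements, n_elements_alt)
```

`.size` is the simpler of the two, since it is already an int and needs no cast.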

Modified calc_n_coarse_channels to support GBT and Parkes

c51574b

EE: Please check this does what you think it should.

Updated calc_n_blobs for readability

6259aba

PEP8: Classes FIL_reader and H5_reader -> FilReader and H5Reader

7fe834c

(PEP8 uses CamelCase for class names.)

Removing duplicated code from sigproc.py: header_keyword_types

c414291

Removing duplicated code from sigproc.py: fil_double_to_angle

2ff2de9

Removing duplicated code from sigproc.py: read_next_header_keyword

1ee6cbc

Removing duplicated code from sigproc.py: read_header

19e04dc

Removing duplicated code from sigproc.py: len_header

2f23a4e

Removed _ from read_header in H5 and Fil Readers

79d628b

Also made read_header return the parsed dictionary, for consistency with FilReader.read_header(). Now both H5 and Fil readers return a dictionary, which is useful.

Shuffled order of methods so _methods first, then populate, etc

11bdc7a

Renamed _get_n_ints_in_file to _setup_n_ints_in_file for consistency

2e63830

This matches other methods such as _setup_freqs and _setup_chans.

Added a docstring and propagate return_idxs in read_header

d83f7a2

Delete unused struct import

2a3721e

Remove unused astropy units import

5ec6a05

Typo and sanity check on read_header

55909fd

Adding basic test of Waterfall() reader

e5c0684

Modify plot_time_series so unit test passes.

6f5281d

plot_time_series was failing if orientation=None

Minor tidy-up for PEP8

0f75684

We are not strictly following PEP8 but fixing some simple things

Removing unused imports and adding fail-over for non-package usage

eac357b

Removed gen_from_header function in filterbank.py

848e19c

This helper function was never used and was confusing. No functionality has been lost.

Removed unsupported keyword arguments in read_hdf5

789cafe

_find_blob_start tidy: doesn't actually use blob_dim in fil reader

602ee31

Although blob_dim is a keyword argument, it isn't actually used.

Added new tests in test_compare_voyager

fd82448

Follow-up on removal of gen_from_header.

a38b45d

Not sure we still need / use this functionality (generation of a filterbank file from a header and data blob), perhaps it could be removed in the future.

Added deprecation notes in docstrings.

9445558

Bumped version number

b964099

telegraphic requested a review from jeenriquez June 3, 2018 14:39

telegraphic added 8 commits June 6, 2018 00:18

Py3 Print parenthesis.

ed252c4

Added unit tests for utils

09e244b

Unit tests now passing for Waterfall

269f060

Added unpack unit tests

f341299

Fixed up unpack code and added unittests

1ef38cf

Updated waterfall.py grab_data to fix UCBerkeleySETI#31 UCBerkeleySETI#32 UCBerkeleySETI#33 UCBerkeleySETI#34

9a969f1

utils.py coverage now at 100%

49285f2

More Py3 parentheses

f006d5c

telegraphic added 5 commits June 6, 2018 20:20

First attempt to add code coverage

5b81eb3

First attempt to add code coverage

7a687e7

First attempt to add code coverage

acd7003

Typo fix pin -> pip

4a1caec

Adding coveralls.io badge

fcec275

jeenriquez suggested changes Jun 6, 2018

jeenriquez approved these changes Jun 6, 2018

jeenriquez reviewed Jun 6, 2018

telegraphic added 3 commits June 8, 2018 21:42

calc_n_coarse_channels not working at Parkes, changing logic to fix

ee284c8

Removing calc_n_coarse_chan from filterbank

16591fc

No need for this in filterbank -- use waterfall if you need this functionality.

Removing unused arg from blob_start call

07dfc19

None comparison edit in response to jeenriquez code review

a2d668a

jeenriquez merged commit 9b047ef into UCBerkeleySETI:master Jun 11, 2018

FX196 pushed a commit to FX196/blimpy that referenced this pull request Jul 6, 2019

Merge pull request UCBerkeleySETI#36 from telegraphic/master

19596c2

Refactor with recent changes
