
Refactor with recent changes #36

Merged: 66 commits, merged Jun 11, 2018

telegraphic (Contributor)

There is quite a lot in here!

The main change is the creation of a generic Reader class, which the H5 and FIL readers subclass. This is along the same lines as the earlier subclassing of Filterbank by Waterfall, and again is driven by the desire to avoid code duplication and to use OOP design patterns.
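The Reader pattern described above can be sketched roughly as follows. This is a toy illustration, not blimpy's file_wrapper code: the header values are hard-coded stand-ins, and the shared `n_bytes_per_sample` helper is a hypothetical example of logic that would otherwise be duplicated in each reader.

```python
from abc import ABC, abstractmethod


class Reader(ABC):
    """Hypothetical base class holding logic shared by all file readers."""

    def __init__(self, filename):
        self.filename = filename
        # Every subclass must return the parsed header as a dictionary.
        self.header = self.read_header()

    @abstractmethod
    def read_header(self):
        """Return the parsed header as a dictionary."""

    @abstractmethod
    def read_data(self):
        """Return the data block."""

    def n_bytes_per_sample(self):
        # Shared helper: written once here instead of once per file format.
        return self.header["nbits"] // 8


class FilReader(Reader):
    def read_header(self):
        # Toy stand-in: a real reader would parse the SIGPROC header from disk.
        return {"nbits": 32, "nchans": 4}

    def read_data(self):
        return [[0.0] * self.header["nchans"]]


class H5Reader(Reader):
    def read_header(self):
        # Toy stand-in: a real reader would read attributes from the HDF5 file.
        return {"nbits": 32, "nchans": 4}

    def read_data(self):
        return [[0.0] * self.header["nchans"]]


fil, h5 = FilReader("example.fil"), H5Reader("example.h5")
print(fil.header == h5.header, fil.n_bytes_per_sample())
```

Because `Reader` is abstract, instantiating it directly raises `TypeError`; only the format-specific subclasses can be created.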

I also added a new test, which loads the Voyager data with Waterfall from both the fil and h5 formats and compares the results.
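A minimal sketch of the kind of check such a test performs, assuming both loaded objects expose a `header` dict (with bytes keys, as in `self.header[b'nchans']` further down) and a `data` array. The helper name `assert_same_observation`, the choice of header keys and the placeholder values are hypothetical; the actual test in this PR isn't reproduced here.

```python
from types import SimpleNamespace

import numpy as np


def assert_same_observation(a, b, keys=(b"nchans", b"fch1", b"foff", b"tsamp")):
    """Check that two loaded observations carry the same header values and data."""
    for key in keys:
        assert a.header[key] == b.header[key], "header mismatch for %r" % key
    assert a.data.shape == b.data.shape, "data shapes differ"
    assert np.allclose(a.data, b.data), "data values differ"


# Stand-ins for Waterfall('voyager.fil') and Waterfall('voyager.h5')
hdr = {b"nchans": 4, b"fch1": 1000.0, b"foff": -1.0, b"tsamp": 1.0}
from_fil = SimpleNamespace(header=dict(hdr), data=np.ones((16, 1, 4)))
from_h5 = SimpleNamespace(header=dict(hdr), data=np.ones((16, 1, 4)))
assert_same_observation(from_fil, from_h5)  # passes silently
```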

The rest of the changes are docstrings/typo/readability changes.

This function was in both the Filterbank and Waterfall classes, but was
slightly different. The Waterfall version was marginally better,
albeit nearly identical. It has been moved to Filterbank only;
the Waterfall class still has access to it through inheritance.
This functionality can be achieved simply with an array's .size attribute, or np.prod(array.shape).
EE: Please check this does what you think it should.
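The commit message doesn't name the helper being replaced, but the equivalence it relies on is easy to check. A quick sketch with an arbitrary (time, IF, channel) shape:

```python
import numpy as np

data = np.zeros((16, 1, 1024), dtype=np.float32)  # (time, IF, channel), arbitrary sizes

# Total number of elements, two equivalent ways
n_elements = data.size
n_elements_alt = int(np.prod(data.shape))  # np.prod returns a NumPy integer; cast for plain int

print(n_elements, n_elements_alt)  # 16384 16384
```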
Also made read_header return the parsed dictionary, for consistency
with FilReader.read_header().

Now both H5 and Fil readers return a dictionary, which is useful.
Other methods such as _setup_freqs and _setup_chans
Set this at the top of the module:

    plt.rcParams['axes.formatter.useoffset'] = False

instead of calling

    ax.get_xaxis().get_major_formatter().set_useOffset(False)

This means fewer lines of code, and should also be more resilient to
matplotlib versions that don't support this API.
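A minimal sketch of the rcParams approach, assuming a recent matplotlib and the headless Agg backend. The frequency values are arbitrary; they just span a tiny range at a large value, which is exactly when matplotlib would otherwise show an offset such as "+1.42e3" on the axis.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend, so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Set once: every axes created afterwards picks this default up.
plt.rcParams["axes.formatter.useoffset"] = False

freqs = np.linspace(1420.0, 1420.001, 100)  # MHz, tiny span at a large value
fig, ax = plt.subplots()
ax.plot(freqs, np.arange(freqs.size))

formatter = ax.xaxis.get_major_formatter()
print(formatter.get_useOffset())  # False: tick labels show full values, no offset
plt.close(fig)
```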
plot_time_series was failing if orientation=None.
We are not strictly following PEP8, but are fixing some simple things.
This helper function was never used and was confusing. No
functionality has been lost.
Although blob_dim is a keyword argument, it isn't actually used.
Not sure we still need / use this functionality (generation of a
filterbank file from a header and data blob), perhaps it could be
removed in the future.
telegraphic requested a review from jeenriquez on June 3, 2018, 14:39

telegraphic (Contributor Author)

Ok, I'm now able to do plot zooms with all combos of fil/hdf5 and Filterbank/Waterfall. So I think Issues #31 #32 #33 #34 are fixed in this commit.

Also the Py3 unit tests are passing! Woohoo!

[Screenshot: 2018-06-06, 2:26:49 pm]

telegraphic (Contributor Author)

Here is the output of code coverage:

Name                                            Stmts   Miss  Cover
-------------------------------------------------------------------
blimpy-1.2.0-py2.7.egg/blimpy/__init__.py          17      4    76%
blimpy-1.2.0-py2.7.egg/blimpy/dice.py             100    100     0%
blimpy-1.2.0-py2.7.egg/blimpy/fil2h5.py            36     21    42%
blimpy-1.2.0-py2.7.egg/blimpy/fil2hdf.py           59     59     0%
blimpy-1.2.0-py2.7.egg/blimpy/file_wrapper.py     389    150    61%
blimpy-1.2.0-py2.7.egg/blimpy/filterbank.py       571    252    56%
blimpy-1.2.0-py2.7.egg/blimpy/gup2hdf.py           61     50    18%
blimpy-1.2.0-py2.7.egg/blimpy/guppi.py            249    214    14%
blimpy-1.2.0-py2.7.egg/blimpy/h52fil.py            36     21    42%
blimpy-1.2.0-py2.7.egg/blimpy/match_fils.py        74     61    18%
blimpy-1.2.0-py2.7.egg/blimpy/sigproc.py          188    112    40%
blimpy-1.2.0-py2.7.egg/blimpy/utils.py             55      0   100%
blimpy-1.2.0-py2.7.egg/blimpy/waterfall.py        329    249    24%
-------------------------------------------------------------------
TOTAL                                            2164   1293    40%

This tells us what fraction of the code is executed by the unit tests.

We don't have any tests for dice yet, and fil2hdf is deprecated (shall we delete it?). There are also no real tests of GUPPI raw data. So I think we can boost that number pretty easily.
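The Cover column is (Stmts - Miss) / Stmts. A quick sketch that recomputes the total from the rows above and estimates the effect of the two easy wins mentioned here. The what-if figure is illustrative only: it assumes fil2hdf.py is deleted and dice.py reaches full coverage, neither of which is part of this PR.

```python
# (module, statements, missed), copied from the coverage report above
rows = [
    ("__init__.py", 17, 4),
    ("dice.py", 100, 100),
    ("fil2h5.py", 36, 21),
    ("fil2hdf.py", 59, 59),
    ("file_wrapper.py", 389, 150),
    ("filterbank.py", 571, 252),
    ("gup2hdf.py", 61, 50),
    ("guppi.py", 249, 214),
    ("h52fil.py", 36, 21),
    ("match_fils.py", 74, 61),
    ("sigproc.py", 188, 112),
    ("utils.py", 55, 0),
    ("waterfall.py", 329, 249),
]


def percent_covered(stmts, miss):
    """Cover = executed statements / all statements, as a percentage."""
    return 100.0 * (stmts - miss) / stmts if stmts else 100.0


total_stmts = sum(s for _, s, _ in rows)
total_miss = sum(m for _, _, m in rows)
print(f"TOTAL {total_stmts} {total_miss} {percent_covered(total_stmts, total_miss):.0f}%")
# TOTAL 2164 1293 40%

# What-if: delete the deprecated fil2hdf.py and fully test dice.py
what_if = [(n, s, 0 if n == "dice.py" else m) for n, s, m in rows if n != "fil2hdf.py"]
ws = sum(s for _, s, _ in what_if)
wm = sum(m for _, _, m in what_if)
print(f"what-if {percent_covered(ws, wm):.0f}%")  # what-if 46%
```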

jeenriquez (Contributor) left a comment:

I have been removing the dependence of waterfall.py on filterbank.py. Please move this function back.

jeenriquez (Contributor) left a comment:

Thanks Danny, this will do for now. In the medium term we can think about either making it more general, or just adding one telescope at a time.

jeenriquez (Contributor) left a comment:

I can't remember why I pulled these out from sigproc. Did you double-check they were the same code?

telegraphic (Contributor Author)

Hey @jeenriquez, I think the pull request grew too large for me to see your specific comments?

Re: waterfall and filterbank dependence, how would you feel long-term about fully deprecating Filterbank in favor of Waterfall?

Re: sigproc calls in waterfall, was this to remove dependencies maybe? The earliest versions of Waterfall were pretty much stand-alone right?

jeenriquez (Contributor) commented Jun 6, 2018

Hey @telegraphic, yeah, I was just looking into this; I thought I was commenting on each individual commit. I'll make a summary and then post a single comment.

jeenriquez (Contributor)

Hi Danny,

I had a look at all your changes. Quite a lot! In general I think it all looks very good.
I'm just wondering about the level of testing. For 1.1.9 we made tests for several file types.
I like that you are adding more testing for Parkes data, which was clearly needed. Have you also done some testing on GBT data (besides the Voyager data)? This is not necessary for merging, but I would like to know so that one of us (you, me, or Matt) looks into it.

I would like to merge soon, since I think this version is more stable than the current halfway state of the refactor. But I'm not sure what happens then with all these comments. Could you reply to my comments below?

Consolidated blank_dc function

I have been removing the dependence of waterfall.py on filterbank.py. Please move this function (blank_dc) back.

Removing duplicated code from sigproc.py:

I can't remember why I pulled these out from sigproc. Maybe I did it to be able to edit them without breaking the originals. Did you double-check they were the same code?

Follow-up on removal of gen_from_header.

Good question, I don't think we are using this anymore. For now, unless it is "in the way", we could keep it and just keep an eye on it.

Updating unit tests

This made me think that it would be good to have a monthly meeting to discuss the status of blimpy, as well as the new developments.

Updated to build h5py better

Could you explain this one to me? How does it work now? This may need to be updated in the README.

More bytestring Py3 shenanigans

Wondering about the use of six here. If I understand correctly, that is the preferred option for Python 2.5: https://docs.python.org/3/howto/pyporting.html
Not important now, but just thinking for the future to fully support python 3.

Py3 throws a hissy fit with None comparisons, making these more robust

You missed changing one instance:

    if not init

Updates for #31 #32 #33 #34 plotting issues

The plotting issues are #30 to #33; the wrong numbers seem to be affecting the comments on the issues. Not sure if there is a way to fix it.

calc_n_coarse_channels not working at Parkes, changing logic to fix

Are you planning to make it work for the other Parkes resolutions as well?

telegraphic (Contributor Author)

Hey Emilio, thanks for going through it all!

Tests on big files

We definitely need some tests with full-size data products from GBT + PKS. Shall we set up a dedicated test platform somewhere? It could be at GBT or PKS, or even in the cloud, though Berkeley might make the most sense?
Do you and Matt already have scripts at GBT that we could add to the repository? I don't think we can or should add huge files to the repository, so the Travis CI tests will have to run on small files only. I think we can set up this Jenkins thing to run tests automatically, but for starters I think we should copy some good test files to somewhere well-documented and accessible:

  • Old GBT data
  • New GBT data
  • Parkes single-pixel data
  • Parkes MB data

Data in HDF5 and FIL for all resolutions.
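One way to let the same test suite run on Travis (small files only) and on a machine holding the full-size products is to skip large-file tests unless a data directory is configured. This is a sketch; the `BLIMPY_LARGE_TEST_DATA` environment variable and the file name are hypothetical, not an existing blimpy convention.

```python
import os
import unittest

# Hypothetical: point this at a shared directory of full-size GBT/Parkes files.
DATA_DIR = os.environ.get("BLIMPY_LARGE_TEST_DATA")


def large_file(name):
    """Return the full path to a large test file, or None if it isn't available."""
    if DATA_DIR is None:
        return None
    path = os.path.join(DATA_DIR, name)
    return path if os.path.exists(path) else None


@unittest.skipIf(DATA_DIR is None, "large test data not configured (e.g. on Travis CI)")
class TestLargeFiles(unittest.TestCase):
    def test_new_gbt_h5_is_present(self):
        path = large_file("new_gbt_example.h5")  # hypothetical file name
        if path is None:
            self.skipTest("file not present")
        # A real test would load `path` with Waterfall and check header/data here.
        self.assertGreater(os.path.getsize(path), 0)
```

On CI the whole class is skipped; on the shared test machine the same tests run against the real files.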

Consolidated blank_dc function

We can add this back for now. Maybe this is part of a bigger discussion about Filterbank vs Waterfall -- should we totally deprecate Filterbank? I am pretty pro-consolidation in general and Waterfall has more functionality. Here's the current state:

Waterfall inherits the functions from Filterbank:

read_hdf5 *               Not needed in waterfall (confusing to have)
read_filterbank *         Not needed in waterfall (confusing to have)
write_to_filterbank *     Confuses with Waterfall.write_to_fil
_setup_freqs
_setup_time_axis
_calc_extent
compute_lst **            Move to Waterfall?
compute_lsrk **           Move to Waterfall?
blank_dc **
generate_freqs
plot_spectrum 
plot_spectrum_min_max
plot_waterfall 
plot_time_series
plot_kurtosis
plot_all
calibrate_band_pass_N1 **

Functions marked ** are, I think, ones that could be moved to Waterfall to keep Filterbank lightweight. Functions marked * don't belong in Waterfall.

Waterfall overrides:

__init__
info
grab_data
write_to_hdf5

New methods that Waterfall adds:

read_data
populate_freqs
populate_timestamps
write_to_fil
calc_n_coarse_chan (via file_wrapper)
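The inherited / overridden / added breakdown above can be generated automatically, which would help keep it current as methods move between the two classes. A sketch using `inspect`, with toy stand-in classes (a few method names borrowed from the lists above; the bodies are empty):

```python
import inspect


def classify_methods(child, parent):
    """Split methods into (inherited, overridden, added) relative to `parent`."""
    parent_names = {name for name, _ in inspect.getmembers(parent, inspect.isfunction)}
    own_names = {name for name, obj in vars(child).items() if inspect.isfunction(obj)}
    inherited = sorted(parent_names - own_names)
    overridden = sorted(parent_names & own_names)
    added = sorted(own_names - parent_names)
    return inherited, overridden, added


# Toy stand-ins; only the method names matter here.
class Filterbank:
    def __init__(self, filename=None):
        self.filename = filename

    def info(self):
        pass

    def blank_dc(self):
        pass

    def plot_spectrum(self):
        pass


class Waterfall(Filterbank):
    def __init__(self, filename=None):
        super().__init__(filename)

    def info(self):
        pass

    def write_to_fil(self):
        pass


inherited, overridden, added = classify_methods(Waterfall, Filterbank)
print("inherited:", inherited)    # ['blank_dc', 'plot_spectrum']
print("overridden:", overridden)  # ['__init__', 'info']
print("added:", added)            # ['write_to_fil']
```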

My thought is that we move everything apart from barebones functionality from Filterbank to Waterfall. We would remove read_hdf5, read_fil, write_to_hdf5 and write_to_filterbank from Filterbank, so writing is only supported by Waterfall. The read_hdf5 and read_fil functions were only added so you can do this:

    a = Filterbank()
    a.read_fil('voyager.fil')

instead of

    a = Filterbank('voyager.fil')

But it seems like this isn't worth supporting anymore, so I would get rid of the read functions.

Removing duplicated code from sigproc.py:

I did double-check the code was the same.

Follow-up on removal of gen_from_header

It is a little useful to be able to take a header and data and make a Filterbank object, so we can keep the functionality for now but earmark it for future tidy-up or removal.
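If the functionality stays, one tidier shape for it is an explicit alternative constructor. This is a hypothetical sketch: `from_header_and_data` is an invented name, the `Filterbank` class here is a toy stand-in, and the real gen_from_header signature isn't shown in this thread. The `b'nchans'` key follows the bytes-keyed headers used in `calc_n_coarse_chan` below.

```python
import numpy as np


class Filterbank:
    """Toy stand-in with only the pieces needed for the example."""

    def __init__(self, filename=None):
        self.filename = filename
        self.header = {}
        self.data = None

    @classmethod
    def from_header_and_data(cls, header, data):
        """Hypothetical alternative constructor in place of gen_from_header."""
        obj = cls()
        obj.header = dict(header)  # copy so the caller's dict isn't shared
        obj.data = np.asarray(data)
        # Keep header and data consistent: the last axis is the channel axis.
        if obj.data.shape[-1] != obj.header[b"nchans"]:
            raise ValueError("header nchans does not match data shape")
        return obj


fb = Filterbank.from_header_and_data({b"nchans": 8}, np.zeros((4, 1, 8)))
print(fb.data.shape)  # (4, 1, 8)
```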

Updating unit tests

Agreed: we should schedule one every 2 weeks, and cancel it if we have no changes to discuss (keep meetings to a minimum!). Or we could have a 'software stack' meeting?

Updated to build h5py better

This one is only for the Travis CI testing. Basically, Ubuntu puts the HDF5 libraries in an unusual place and bitshuffle / h5py don't always find them. The CFLAGS include just points to the location of the HDF5 libraries so they are found when building.

More bytestring Py3 shenanigans

Agreed, I don't think we should support Py 2.5! Maybe not even Py 2.6? We might be able to get rid of six in the future, but at the moment it's being used to catch something that Py3 doesn't support, and to reroute that code so Py3 is happy.
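The specific construct six is rerouting isn't shown here. As an illustration of the kind of check involved, here is a small helper that normalises header values read as bytes without depending on six; on Python 2 `bytes` is just `str`, so it passes values through unchanged. The name `to_str` is hypothetical.

```python
def to_str(value, encoding="ascii"):
    """Return `value` as text on both Python 2 and 3.

    Header fields read from disk arrive as bytes on Python 3 and must be
    decoded before comparing with str literals; on Python 2, bytes is str.
    """
    if isinstance(value, bytes) and not isinstance(value, str):
        return value.decode(encoding)
    return value  # already text, or a non-string value such as an int


raw_header = {b"source_name": b"Voyager1", b"nchans": 4}
header = {to_str(k): to_str(v) for k, v in raw_header.items()}
print(header)  # {'source_name': 'Voyager1', 'nchans': 4}
```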

Py3 throws a hissy fit with None comparisons, making these more robust

Good catch, changed. I really hate Py 3 for these, I wonder if there is a neater way to do it that I'm not aware of?
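The diffs for these fixes aren't shown, so the following only illustrates the usual Python 3 idioms: test `is None` explicitly rather than comparing with `<`, using `in`, or relying on truthiness (which, like `if not init`, would also treat 0 as missing). Both function names are hypothetical.

```python
def pick_orientation(orientation=None):
    """Return 'v' or 'h', treating None as the default horizontal layout."""
    # `"v" in None` raises TypeError, so rule out None before the membership test.
    if orientation is not None and "v" in orientation:
        return "v"
    return "h"


def resolve_start(f_start=None, default=0.0):
    """Use `default` only when no value was given."""
    # On Python 3, `None < 1.0` raises TypeError (Python 2 silently returned True),
    # and `if not f_start:` would wrongly treat 0.0 as missing, so test `is None`.
    return default if f_start is None else f_start


print(pick_orientation(None), pick_orientation("v"))  # h v
print(resolve_start(None, 5.0), resolve_start(0.0, 5.0))  # 5.0 0.0
```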

Updates for #31 #32 #33 NOT # 34 plotting issues

Oops, sorry, yeah I think this is pretty hard to change :(.

calc_n_coarse_channels not working at Parkes, changing logic to fix

This code probably needs some robust testing. Really would be nice to have N_COARSE_CHANS in the metadata headers :/.

Here's the code as it currently stands. I think it will fail with Parkes, but will raise an error making it clear that it has failed.

    def calc_n_coarse_chan(self):
        """ This makes an attempt to calculate the number of coarse channels in a given file.

            Note:
                This is unlikely to work on non-Breakthrough Listen data, as a-priori knowledge of
                the digitizer system is required.

        """
        nchans = int(self.header[b'nchans'])

        # Do we have a file with enough channels that it has coarse channelization?
        if nchans >= 2**20:
            # Does the common FFT length of 2^20 divide through without a remainder?
            # This should work for most GBT and all Parkes hires data
            if nchans % 2**20 == 0:
                n_coarse_chan = nchans // 2**20
                return n_coarse_chan
            # Early GBT data has non-2^N FFT length, check if it is GBT data
            elif self.header[b'telescope_id'] == 6:
                coarse_chan_bw = 2.9296875
                bandwidth = abs(self.f_stop - self.f_start)
                n_coarse_chan = int(bandwidth / coarse_chan_bw)
                return n_coarse_chan
            else:
                raise RuntimeError("Couldn't figure out n_coarse_chan")
        else:
            raise RuntimeError("This function currently only works for hires BL Parkes or GBT data.")
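Pulling the logic out of the class makes it easy to unit-test against synthetic header values, without needing full-size GBT or Parkes files. A sketch mirroring the method above; the standalone function and constant names are hypothetical, the numbers are the ones the method uses, and frequencies are assumed to be in MHz.

```python
FFT_LEN = 2**20                 # common fine-channel FFT length assumed by the method
GBT_TELESCOPE_ID = 6            # telescope_id the method treats as GBT
GBT_COARSE_CHAN_BW = 2.9296875  # MHz, early GBT coarse channel width


def calc_n_coarse_chan(nchans, telescope_id, f_start, f_stop):
    """Standalone version of the method above, testable without a file."""
    nchans = int(nchans)
    if nchans < FFT_LEN:
        raise RuntimeError("This function currently only works for hires BL Parkes or GBT data.")
    if nchans % FFT_LEN == 0:
        return nchans // FFT_LEN
    if telescope_id == GBT_TELESCOPE_ID:
        bandwidth = abs(f_stop - f_start)
        return int(bandwidth / GBT_COARSE_CHAN_BW)
    raise RuntimeError("Couldn't figure out n_coarse_chan")


print(calc_n_coarse_chan(64 * 2**20, 4, 0.0, 0.0))        # 64
print(calc_n_coarse_chan(2**20 + 1, 6, 1000.0, 1187.5))   # 64
```

As with the method, any non-GBT file whose channel count isn't a multiple of 2^20 still raises, so unsupported Parkes resolutions fail loudly rather than silently.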

jeenriquez merged commit 9b047ef into UCBerkeleySETI:master on Jun 11, 2018
FX196 pushed a commit to FX196/blimpy that referenced this pull request Jul 6, 2019