Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reference implementation of archive format from issue #15 #57

Merged
merged 8 commits into from
Aug 15, 2017
Merged

Reference implementation of archive format from issue #15 #57

merged 8 commits into from
Aug 15, 2017

Conversation

djanderson
Copy link
Contributor

This (rather large, sorry!) PR adds a reference implementation for the single file archive format suggested in issue #15.

Since I've added a not-insignificant amount of code, I first beefed up the unit testing capability of the reference implementation. Specifically, I've taken advantage of some features of pytest, since the existing tests looked to be pytest style. I also add a tox.ini config file so that running tox in the repo root runs pytest against all supported python versions in a virtualenv. tox -e coverage gives us an idea of how well the reference implementation is tested (currently 80%).

Regarding the actual archive capability, if a SigMFFile holds valid metadata and data_file is set, then calling the new archive method will create a .sigmf file in accordance with the spec. The archive method takes name and fileobj parameters that attempt to be reasonably consistent with python's tarfile module.

https://docs.python.org/3.5/library/tarfile.html#tarfile-objects

Lastly, to go in the other direction, I've added the function sigmffile.fromarchive. Given a valid sigmf archive path and a directory, extract the sigmf archive to that directory and return a SigMFFile instance with the archive's metadata loaded and data_file pointing to the extracted data.

Please review!

Try:
  `pytest` in the repo root
  `python setup.py test` in the repo root runs pytest
  `tox` in the repo root runs pytest in a virtualenv against all
        intalled interpreters
  `tox -e coverage` in the repo root produces a test coverage report
Copy link
Contributor

@bhilburn bhilburn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a large PR, but I read through a most of it. Each time I thought I found an issue, I discovered it had actually been handled, hah. Bravo.

I only have one question, possibly applicable in two places, and it's fairly minor.

Thanks so much for the excellent PR, @djanderson, and for keeping the actual implementation up-to-speed with the spec.

sigmf/archive.py Outdated
else:
has_ext = self.name.endswith(SIGMF_ARCHIVE_EXT)
name = self.name if has_ext else self.name + SIGMF_ARCHIVE_EXT
sigmf_fd = open(name, "wb")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as above re: catching permission errors on file open.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could definitely throw an error, so it might be nice to catch it. If this were a command-line utility, I would definitely catch it and print out the error and then exit, but since it's designed to be imported by another program, the polite thing to do is let exceptions raise (rise?). We could catch it and wrap it in one of the SigMFError exception classes I created, but the few cases where I used those I didn't use them to catch and wrap existing exceptions, I just didn't want to raise something generic like RuntimeError. My personal feeling is that just letting the original error raise is a bit cleaner than wrapping it and re-raising. Especially in python3 where you will get the traceback chain with the original error anyways!

>>> t = tempfile.mkstemp()
>>> t
(3, '/tmp/tmpl4smirjf')
>>> f = open(t[1], mode="r")
>>> f.writable()
False
>>> f.write("stuff")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
io.UnsupportedOperation: not writable
>>> try:
...   f.write("stuff")
... except Exception as err:
...   raise RuntimeError(err)  # pretending this is SigMFFileError or something
... 
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
io.UnsupportedOperation: not writable

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 4, in <module>
RuntimeError: not writable
>>> 

I dunno, considering we will need to raise anyways, what do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you have a good point, and I would prefer that we don't create new exception classes for the sole purpose of renaming them to SigMF exceptions.

My one concern with the above, though, is that if this code is being used more indirectly by a user, it may not be clear to them which file was not writable, and the Python3 traceback isn't immediately useful for identifying which filepath was the problem.

So, how about passing on the built-in exception, but printing a bit more information about which file the user needs to fix?

sigmf_data_filename = archive_name + SIGMF_DATASET_EXT
sigmf_data_path = os.path.join(tmpdir, sigmf_data_filename)

with open(sigmf_md_path, "w") as mdfile:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we wrap this with try/catch in case we don't have write permissions?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, sigmf_md_path is being created under tmpdir, which is guaranteed to be "readable, writable, and searchable only by the creating user ID." (mkdtemp), so I think we're 100% safe here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, good call.

@bhilburn bhilburn self-assigned this Jun 21, 2017
If a SigMFFile holds valid metadata and a data_file path, then calling
the `archive` method tars metadata and data according to the spec, and
returns the path to the archive.

This commit also adds a function: `sigmffile.fromarchive`. If passed a
valid sigmf archive and a path, extract the archive to that path and
return a SigMFFile object holding the archive's metadata and pointing
to the extracted `data_file`.
…useful error

Also refactor for readability (chop up a few larger functions) and
correctness (don't use "fd" when we're working with file-like objects, not file descriptors)
@djanderson
Copy link
Contributor Author

@bhilburn, I addressed the issue by catching the unhelpful error from open and reraising a SigMFFileError with a more helpful description of what went wrong. I did that instead of printing the information, since in my use-case of SigMF, anything that's not logged or raised as an exception (and subsequently logged) is not going to be seen.

Since I did some refactoring in the same commit, I'll highlight the main changes:

In archive.py, if a fileobj is passed, test that it's open and writable. The fileobj.write(byte()) feels a bit hacky, but I couldn't find a cleaner way to do this across all the myriad file-like object types in both py2 and py3 🤷‍♂️

    def _get_output_fileobj(self):
        try:
            fileobj = self._get_open_fileobj()
        except:
            if self.fileobj:
                e = "fileobj {!r} is not byte-writable".format(self.fileobj)
            else:
                e = "can't open {!r} for writing".format(self.name)

            raise error.SigMFFileError(e)

        return fileobj

    def _get_open_fileobj(self):
        if self.fileobj:
            fileobj = self.fileobj
            fileobj.write(bytes())  # force exception if not byte-writable
        else:
            fileobj = open(self.name, "wb")

        return fileobj

In test_archive.py, I've added the falling two unit tests, which ensure that an unwritable name or fileobj input raise the appropriate exception:

def test_unwritable_fileobj_throws_fileerror(test_sigmffile):
    with tempfile.NamedTemporaryFile(mode="rb") as t:
        with pytest.raises(error.SigMFFileError):
            test_sigmffile.archive(fileobj=t)


def test_unwritable_name_throws_fileerror(test_sigmffile):
    unwritable_file = "/root/unwritable.sigmf"  # assumes root is unwritable
    with pytest.raises(error.SigMFFileError):
        test_sigmffile.archive(name=unwritable_file)

@djanderson
Copy link
Contributor Author

@bhilburn, was there anything left you wanted me to clean up here? We've been using this code locally and it's working well.

@bhilburn
Copy link
Contributor

bhilburn commented Aug 15, 2017

@djanderson - Nope! Sorry, was just a matter of me getting the time to sit down and focus. In short, it was my fault =)

This PR is excellent, by the way, and represents the first major contribution to the public upstream. Great work.

@bhilburn bhilburn merged commit d103635 into sigmf:master Aug 15, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants