Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update google storage #601

Merged
merged 13 commits into from
Sep 20, 2018
Merged

Update google storage #601

merged 13 commits into from
Sep 20, 2018

Conversation

wtgee
Copy link
Member

@wtgee wtgee commented Sep 17, 2018

Part of #599, depends on #600.

  • Storage relies on gcloud authentication (Update google instructions #600) removal of previous env variable
  • Methods added:
    • upload renamed to upload_file
    • get_file_blob / get_file_blobs - returns blob data
    • download_file - get the actual file
    • lookup_fits_header stream just the fits header from the storage
  • Utility functions:
    • upload_observation_to_bucket - uses gsutil rather than the underlying gcloud because that allows for easy parallelization.

ToDo: Figure out how to mock tests. Current tests are designed for a live system and actually communicate with the network (requires flag to test).

@wtgee wtgee requested a review from jamessynge September 17, 2018 00:39
@wtgee wtgee mentioned this pull request Sep 17, 2018
13 tasks
@wtgee
Copy link
Member Author

wtgee commented Sep 17, 2018

Mocking suggestions from @jamessynge

Does this require "real" cloud storage, or do you have support for a mock somewhere?
FWIW:

Copy link
Contributor

@jamessynge jamessynge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Partial review, more later


@pytest.fixture(scope="module")
def auth_key():
local_config = load_config('pocs_local', ignore_local=True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why pocs_local AND ignore_local? If intended to test load_config, then I think the test belongs elsewhere.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We now rely on authentication being done for the system as a whole. As such, I am removing the ability to to pass an auth_key to the PanStorage object directly. That also then makes some of this testing infrastructure disappear.

)


@pytest.fixture(scope="module")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason to not make this a function scope fixture and have it take the existing config fixture as a param?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above.

def test_unit_id(storage):
assert storage.unit_id is not None
# TODO(wtgee)Verify the unit id better after #384 is done.
assert re.match(r'PAN\d\d\d', storage.unit_id), storage.logger.error(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is being tested here? Shouldn't we just ensure for testing that we provide a config that has the correct contents?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, test removed as not appropriate.



@pytest.fixture(scope="function")
def storage(auth_key):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't help but think that there should be a way that we can have an auth_key that is not in github for all the world to see, but that can be used by each tester for accessing the test bucket without having to use the same credentials that give access to the primary bucket. Of course, maybe you've already addressed this.

If not, travis-ci does support secret values being passed into tests.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is a generic testing auth_key. I saw reference to that somewhere, will try to find.

@@ -0,0 +1,111 @@
import pytest
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add some comments please?

will not want people to delete things. However it's good to test and we
want to remove the above files
"""
remote_path = '{}/pong.txt'.format(storage.unit_id)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd very much like to avoid depending on tests running serially and in a specific order, unless there is a fixture that is used to ensure that happens. I sometimes run using the xdist or random plugins in order to try find mistakes where our tests are non-hermetic.

"""
Note: The util wrappers don't provide a way to delete because we generally
will not want people to delete things. However it's good to test and we
want to remove the above files
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both files aren't removed.

remote_path (str|`google.cloud.storage.blob.Blob`): Blob or path to remote blob.
If just the blob name is given then file will be downloaded.
save_dir (str, optional): Path for saved file, defaults to current directory.
force (bool, optional): Force a download even if file exists locally, default False.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps call this overwrite, which we use elsewhere?

Returns:
str: Path to local (uncompressed) FITS file
"""
# Get blob object if just a path
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Totally a nit: Please end sentences with a period.

See https://fits.gsfc.nasa.gov/fits_primer.html for overview of FITS format.

Args:
remote_blob (google.cloud.storage.blob.Blob): Blob object
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason to not allow a string as done in the prior method?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call, thanks.

# Get a header card
start_byte = 2880 * (i - 1)
end_byte = (2880 * i) - 1
b_string = remote_blob.download_as_string(start=start_byte, end=end_byte)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless you know that the FITS header is usually just one block, I'd be inclined to simply download the first N blocks of size 2880 bytes (e.g. 50 blocks), then operate on each of those blocks (or really treat it as a longer sequence of 80 byte lines. This will very likely be faster than downloading two 2880 blocks.

Or test to see if I'm right.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We know that the first block is the compression header information (for .fz packed files) so we are skipping that block and starting at block 2. After that we don't know how many blocks it will be.

I'll add some comments so it's clear what is trying to be done then we can figure out if that is what actually should be done. :)

pocs/utils/google/storage.py Show resolved Hide resolved
@wtgee wtgee merged commit d111904 into panoptes:develop Sep 20, 2018
@wtgee wtgee deleted the update-google-storage branch September 20, 2018 06:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants