Added supplement versions table #85

Merged
merged 6 commits into from
Feb 16, 2021

Conversation

javierggt
Contributor

@javierggt javierggt commented Feb 11, 2021

Description

This PR adds two tables to the supplement:

  • versions
  • last_updated

They have the same structure. Each is a one-row table with string dtype. It is easy to remove the last_updated table if you prefer not to have it.

I thought of the following options:

  • Saving the versions as metadata on each table. Jean wanted a global version as well, there should be a method to retrieve versions, and we already have utils.get_supplement_table.
  • Saving in a single table, with one row for each existing table, one column for version and another for last_updated. This would be fine, but user code would have to jump through some hoops to get the version.
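
To illustrate the trade-off between the layouts, here is a minimal sketch that uses plain Python dicts and lists as stand-ins for the supplement tables. It is not the code in this PR: the helper `version_from_single_table` and the variable names are hypothetical, and the version and timestamp strings are copied from the functional test output below.

```python
# Stand-ins for the supplement tables; the real tables live in an HDF5 file.
VERSION = '4.10.3.dev89+g8c21663'
UPDATED = '2021-02-11 19:11:48.412'

# Layout in this PR: two one-row tables, one column per supplement table.
versions = {'obs': [VERSION], 'mags': [VERSION], 'bad': [VERSION]}
last_updated = {'obs': [UPDATED], 'mags': [UPDATED], 'bad': [UPDATED]}

# Alternative layout: a single table with one row per supplement table.
single_table = [
    {'table': name, 'version': VERSION, 'last_updated': UPDATED}
    for name in ('obs', 'mags', 'bad')
]


def version_from_single_table(rows, name):
    """Look up a version in the one-row-per-table layout (hypothetical helper)."""
    for row in rows:
        if row['table'] == name:
            return row['version']
    raise KeyError(name)


# Two-table layout: direct column access.
print(versions['obs'][0])
# Single-table layout: the caller has to select the matching row first.
print(version_from_single_table(single_table, 'obs'))
```

Both lookups return the same string; the difference is only in how much work the calling code has to do.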

Testing

  • Passes unit tests on macOS, Linux, Windows (at least one required)
  • Functional testing. You can run this:
    (ska3-flight-2021.2rc4) javierg agasc $ export AGASC_DIR=`pwd`/test
    (ska3-flight-2021.2rc4) javierg agasc $ ipython
    Python 3.8.3 (default, Jul  2 2020, 11:26:31) 
    Type 'copyright', 'credits' or 'license' for more information
    IPython 7.16.1 -- An enhanced Interactive Python. Type '?' for help.
    histogram not found
    
    In [1]: import logging 
       ...: logging.basicConfig(level='DEBUG') 
       ...: import agasc 
       ...: from agasc.supplement.utils import save_version, get_supplement_table 
       ...: filename = 'test/agasc_supplement.h5' 
       ...: save_version(filename, obs=agasc.__version__, mags=agasc.__version__, bad=agasc.__version__) 
       ...:                                                                                                                                                      
    DEBUG:agasc.supplement:Creating agasc supplement table "versions"
    DEBUG:agasc.supplement:Creating agasc supplement table "last_updated"
    DEBUG:agasc.supplement:Adding "obs" to agasc supplement "versions" table
    DEBUG:agasc.supplement:Adding "obs" to agasc supplement "last_updated" table
    DEBUG:agasc.supplement:Adding "mags" to agasc supplement "versions" table
    DEBUG:agasc.supplement:Adding "mags" to agasc supplement "last_updated" table
    DEBUG:agasc.supplement:Adding "bad" to agasc supplement "versions" table
    DEBUG:agasc.supplement:Adding "bad" to agasc supplement "last_updated" table
    DEBUG:agasc.supplement:Adding "supplement" to agasc supplement "versions" table
    DEBUG:agasc.supplement:Adding "supplement" to agasc supplement "last_updated" table
    In [2]: get_supplement_table('versions')                                                                                                                     
    Out[2]: 
    <Table length=1>
             obs                   mags                  bad                supplement     
           bytes32               bytes32               bytes32               bytes32       
    --------------------- --------------------- --------------------- ---------------------
    4.10.3.dev89+g8c21663 4.10.3.dev89+g8c21663 4.10.3.dev89+g8c21663 4.10.3.dev89+g8c21663
    
    In [3]: get_supplement_table('last_updated')                                                                                                                 
    Out[3]: 
    <Table length=1>
              obs                     mags                    bad                  supplement      
            bytes32                 bytes32                 bytes32                 bytes32        
    ----------------------- ----------------------- ----------------------- -----------------------
    2021-02-11 19:11:48.412 2021-02-11 19:11:48.412 2021-02-11 19:11:48.412 2021-02-11 19:11:48.412
    

Fixes #74

Member

@taldcroft taldcroft left a comment

One actual bug (I think), an API suggestion and ideas for simplifying / reducing the code (less is more!). Mostly looks good though.

versions = _load_or_create(filename, 'versions')
last_updated = _load_or_create(filename, 'last_updated')

time = CxoTime().iso
Member

CxoTime.now() is preferred for being more explicit. The CxoTime() equivalent is there only for DateTime compatibility.

Also, I'd suggest setting the time precision to 0 so it just shows to the nearest second (don't need millisec in the date).

time = CxoTime.now()
time.precision = 0
# Then later just use last_updated[key] = time.iso
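
For reference, the effect on the stored string can be sketched with the standard library alone. This is only an illustration of the two string shapes, not CxoTime itself: it uses `datetime` as a stand-in, with the timestamp taken from the functional test output above, and `timespec` truncates, which may differ from how CxoTime handles the dropped digits.

```python
from datetime import datetime

# Timestamp from the functional test output in the PR description.
stamp = datetime(2021, 2, 11, 19, 11, 48, 412000)

# Millisecond resolution, as currently written to "last_updated".
with_millis = stamp.isoformat(sep=' ', timespec='milliseconds')
# Whole seconds only, which is the shape the suggestion aims for.
to_second = stamp.isoformat(sep=' ', timespec='seconds')

print(with_millis)  # 2021-02-11 19:11:48.412
print(to_second)    # 2021-02-11 19:11:48
```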

Member

Need to check, but I think you can reduce much of the code in the function to basically:

for key, value in kwargs.items():
    # Log inside the loop, where `key` is defined
    logger.debug(f'Updating {key} in "versions" and "last_updated" tables')
    versions[key] = [value]
    last_updated[key] = [time]

You don't need the 'unknown', nor need to pre-declare as columns with a dtype. The key point is that astropy will add in the column or else replace it as the case may be. And I think that the HDF5 writing step will just overwrite whatever is there.

Contributor Author

Changed to use CxoTime.now and set precision.

You don't need ... to pre-declare as columns with a dtype.

Well, I have gotten upset by astropy changing the types when saving/loading FITS files (which might be caused by FITS, but still), so I decided to make all table types explicit. Anyway, I changed it according to the suggestion.

versions = get_supplement_table('versions')

:param filename: pathlib.Path
:param kwargs: dict
Member

Since this function is always being called (except in testing) with save_version(filename, <table_name>=agasc.__version__), maybe just simplify to save_version(filename, table_name)?

Contributor Author

Yes, I thought of that, and I also thought of having a default argument. I can remove the argument because it makes it simpler, and one can always add it back if a custom version string is needed.

Are you sure you will not want a custom version string? I changed it according to the suggestion, but one can still undo that commit.
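
For the record, the default-argument variant mentioned above could look roughly like the sketch below. It is not the merged implementation: `store` is a plain dict standing in for the supplement file, `DEFAULT_VERSION` stands in for agasc.__version__, and the `version` parameter is the hypothetical optional argument.

```python
from datetime import datetime, timezone

# Stand-in for agasc.__version__ (value copied from the test output above).
DEFAULT_VERSION = '4.10.3.dev89+g8c21663'


def save_version(store, table_name, version=None):
    """Record a version and update time for one table (hypothetical sketch).

    `store` is a dict standing in for the supplement file.
    """
    if version is None:
        version = DEFAULT_VERSION
    # Timestamp to the nearest second, as suggested earlier in the review.
    now = datetime.now(timezone.utc).strftime('%Y-%m-%d %H:%M:%S')
    # Assignment adds the column if it is missing and replaces it otherwise.
    store.setdefault('versions', {})[table_name] = [version]
    store.setdefault('last_updated', {})[table_name] = [now]
    return store


store = {}
save_version(store, 'obs')                   # uses the default version
save_version(store, 'mags', version='test')  # custom version string
print(store['versions'])
```

The common call stays as short as save_version(filename, table_name), and the custom string remains available without undoing the simplification.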

@javierggt
Contributor Author

I just added changes according to all these comments. I hope this is it, because I rebased the branch in #86 in my local copy, fixed conflicts, and started doing some refactoring.

@jeanconn
Contributor

Sure. I had been thinking one table would be fine, but two is also fine.

if not bad_star_ids:
    logger.info('Nothing to update')
    return

if not Path(suppl_file).exists():
if not suppl_file.exists():
if not create:
Contributor

If we have a create flag, the logger message might not need to be a "warning". But that's a nit.

Contributor Author

That's right, I didn't think of that. I just left the message that was there (which I had made inconsistent because it said "creating new file" and then raised an exception).

Member

@taldcroft taldcroft left a comment

Looks good, thanks!!

@taldcroft
Member

@javierggt - I'm leaving it to you to merge just to be sure.

@javierggt
Contributor Author

Hey @taldcroft,

I had changed the call to Table.write as you suggested, but now I get this error which results from passing a pathlib.Path instead of a string:

In [1]: from pathlib import Path 
   ...: from astropy.table import Table 
   ...: t = Table([{'a':1, 'b':2}]) 
   ...: filename = Path('whatever.h5') 
   ...: t.write(filename, format='hdf5', path='versions', append=True, overwrite=True) 
   ...:                                                                                                                                                      
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-1-1045fcc97cd7> in <module>
      3 t = Table([{'a':1, 'b':2}])
      4 filename = Path('whatever.h5')
----> 5 t.write(filename, format='hdf5', path='versions', append=True, overwrite=True)

~/SAO/miniconda3-shiny/envs/ska3-flight-2021.2rc4/lib/python3.8/site-packages/astropy/table/connect.py in __call__(self, serialize_method, *args, **kwargs)
    125         instance = self._instance
    126         with serialize_method_as(instance, serialize_method):
--> 127             registry.write(instance, *args, **kwargs)

~/SAO/miniconda3-shiny/envs/ska3-flight-2021.2rc4/lib/python3.8/site-packages/astropy/io/registry.py in write(data, format, *args, **kwargs)
    561 
    562     writer = get_writer(format, data.__class__)
--> 563     writer(data, *args, **kwargs)
    564 
    565 

~/SAO/miniconda3-shiny/envs/ska3-flight-2021.2rc4/lib/python3.8/site-packages/astropy/io/misc/hdf5.py in write_table_hdf5(table, output, path, compression, append, overwrite, serialize_meta, **create_dataset_kwargs)
    314     else:
    315 
--> 316         raise TypeError('output should be a string or an h5py File or '
    317                         'Group object')
    318 

TypeError: output should be a string or an h5py File or Group object

When you told me, I tried it, it worked, and I thought "ah, well, maybe I got it wrong"... but no.

@taldcroft
Member

It looks like a bug in astropy, but if you remove the format='hdf5' then it works. Go figure. Or you can put the str() back in, sorry for the noise.
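
The str() workaround can be sketched independently of astropy. In the example below, `write_strict` is a hypothetical stand-in for a writer that only accepts string paths (it is not astropy's code), and `write_any_path` coerces path-like input with `os.fspath` before delegating to it.

```python
import os
from pathlib import Path


def write_strict(output, data):
    """Stand-in for a writer that only accepts string paths (not astropy's code)."""
    if not isinstance(output, str):
        raise TypeError('output should be a string')
    return f'would write {len(data)} item(s) to {output}'


def write_any_path(output, data):
    """Coerce path-like objects to str before calling the strict writer."""
    return write_strict(os.fspath(output), data)


filename = Path('whatever.h5')
try:
    write_strict(filename, [1])
except TypeError as err:
    # Same failure mode as in the traceback above.
    print(f'TypeError: {err}')

print(write_any_path(filename, [1]))
```

`os.fspath` returns a str unchanged and converts a pathlib.Path, so the calling code can keep passing either type.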

@taldcroft
Member

I checked and this is still a problem in astropy master. @javierggt - can you file a bug report?

@javierggt javierggt merged commit 2411e20 into master Feb 16, 2021
@taldcroft taldcroft deleted the suppl-version branch February 16, 2021 21:58
@javierggt javierggt mentioned this pull request Apr 6, 2021
Successfully merging this pull request may close these issues.

Add a version to the supplement