Support HEALpix-indexed AGASC HDF5 files and more #155
@jskrist @jeanconn - I changed the file selection logic and added a bunch of docs and examples. I'm worried this is too complicated or overdesigned given that it somewhat requires pseudo-code to explain. That's always a bad sign. But I couldn't come up with something simple that handles our key use cases:
Note that this version requires that MATLAB set the |
I agree that changing the default to be the proseco agasc makes sense. This would be Changing the default would then lead to the question of whether to stop making the miniagasc. Here are the columns that get dropped. I suspect there are no applications that really need those columns which could not use the full AGASC.
On the MATLAB side, thanks @jeanconn for pointing out this UI which I didn't know about. So one question I have is how the list of available file names is generated and where the default directory is defined. If MATLAB ends up calling Python to get the actual file path (via And it is a good point that if MATLAB wants to maintain a GUI menu with all available AGASC files then that will require a code change for each release and therefore there is no point in adding a "latest release" option. That said, the current available items only include the latest release, not 1.6, so that implies that MATLAB users don't really need previous releases. In that case changing the menu to include only the different flavors proseco, mini (maybe) and full at the latest version would be the last time that code got touched. So this is for @jskrist . |
Officially for @jskrist , but tiny answer back that I think the list of files for that is probably the list from the agasc setup script in I don't know if they can call Python at that point to 'resolve' the name, but maybe that's not needed anyway. |
@jeanconn is right that MATLAB may not have access to Python at some points during startup. We don't currently use Python to get the list of agasc files available. I'm not sure about all the constraints on this package and the required features, but from the MATLAB perspective, if we can set the path to be checked via one environment variable and specify the selected file in that directory via a separate environment variable, I think we are good. The hiccup I ran into was regarding the special handling of files specifying the |
@jskrist - the idea there was that the "special handling" of Conversely, an But that behavior doesn't need to apply to the environment variables. So here is a possible new start for the
|
@javierggt @jeanconn - this is now ready for code review. I updated the top description and made a few more changes. There are still some to-do's, but the main code is (in theory) all there in good shape. |
Guessing these tests in test_agasc_2 use the default agasc and expect it to have MAG_CATID as a field. No biggie.
|
Not sure, but I did note above that tests are not expected to be passing at the moment. |
That's fine - last comment wasn't specific about which to-dos remain to-dos and I generally start code review by running the tests. |
We talked in our meeting about handling
|
I take back my comment -- the docs are clear that if the user supplies AGASC_HDF5_FILE, the agasc will be read from agasc dir / AGASC_HDF5_FILE, and no support is advertised for the "*" syntax with the AGASC_HDF5_FILE variable. And we probably don't have a testing use case. So let's stick with this: AGASC_HDF5_FILE is a string that ends in '.h5' and references that one file. |
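For illustration, the rule agreed on here can be sketched as follows. This is a minimal sketch, not the package's implementation: the function name `resolve_agasc_file` is hypothetical, and the use of an `AGASC_DIR` variable to override the default `$SKA/data/agasc` directory is an assumption, since the thread only refers to "agasc dir".

```python
import os
from pathlib import Path


def resolve_agasc_file(env=None):
    """Return the AGASC HDF5 path implied by environment variables.

    Hypothetical helper for illustration only.
    """
    env = os.environ if env is None else env

    # Assumption: AGASC_DIR (if set) overrides the default $SKA/data/agasc.
    if "AGASC_DIR" in env:
        agasc_dir = Path(env["AGASC_DIR"])
    else:
        agasc_dir = Path(env["SKA"]) / "data" / "agasc"

    name = env.get("AGASC_HDF5_FILE")
    if name is None:
        # Default-file selection is outside the scope of this sketch.
        raise KeyError("AGASC_HDF5_FILE is not set")

    # Per the discussion: a plain file name ending in '.h5', no '*' syntax.
    if "*" in name or not name.endswith(".h5"):
        raise ValueError(f"AGASC_HDF5_FILE must name one '.h5' file, got {name!r}")

    return agasc_dir / name
```

Passing a plain dict as `env` makes the behavior easy to check without touching the real environment, for example `resolve_agasc_file({"AGASC_DIR": "/tmp/agasc", "AGASC_HDF5_FILE": "proseco_agasc_1p8.h5"})`.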
ca4613b reduces memory from the supplement processing by around 20-30 Mb. |
:param pm_filter: Use PM-corrected positions in filtering
:param fix_color1: set COLOR1=COLOR2 * 0.85 for stars with V-I color
:param use_supplement: Use estimated mag from AGASC supplement where available
    (default=value of AGASC_SUPPLEMENT_ENABLED env var, or True if not defined)
:param cache: Cache the AGASC data in memory (default=False)
Overall, these agasc changes are very well tested and documented, but I'm not sure if there is a cache kwarg test or use case documented. Given the impact that seems fine -- I think the plan is this would only be used by the advanced user to increase speed at the expense of memory. Though I'm not sure about the magnitude of benefit or cost.
The performance gains are now documented in the description with a new profiling script in the dev directory. One use case is to replace get_agasc_cone_fast in kady (of course that requires the matlab_pm_bug so who knows). In retrospect it probably wasn't worth the effort but it is done now, let's be positive! 😄
Description
This PR makes a broad set of updates to the agasc package:
- [...] create_derived_agasc_h5.py to make it more general and make the options independent and atomic.
- [...] get_agasc_file.
- [...] create_derived_agasc_h5.py and agasc.py into smaller well-documented sub-functions.
- [...] healpix.py module as the place for most HEALpix related functionality.
- [...] cache keyword to get_agasc_cone for performance-critical applications requiring repeated cone searches. This will read the AGASC file into memory and use that for subsequent cone searches.

To do:
- [...] get_star() and get_stars() and get_agasc_cone()
Closes #152
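The idea behind the cache keyword (read the catalog once, then reuse the in-memory copy for later cone searches) can be sketched with the standard library alone. This is a toy illustration under stated assumptions, not the code in this PR: `load_catalog`, `cone_search`, the three-star catalog, and the load counter are all hypothetical stand-ins for reading the HDF5 file.

```python
import math
from functools import lru_cache

LOADS = {"n": 0}  # counts simulated file reads


@lru_cache(maxsize=1)
def load_catalog(path):
    """Stand-in for reading an AGASC file; returns (ra, dec, id) rows in degrees."""
    LOADS["n"] += 1
    return ((10.0, 20.0, 1), (10.5, 20.2, 2), (200.0, -30.0, 3))


def sphere_dist(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(min(1.0, a))))


def cone_search(path, ra, dec, radius, cache=False):
    """Return ids of rows within `radius` degrees of (ra, dec)."""
    if cache:
        rows = load_catalog(path)  # reuses the in-memory copy after first call
    else:
        rows = load_catalog.__wrapped__(path)  # bypass the cache: re-read each time
    return [r[2] for r in rows if sphere_dist(ra, dec, r[0], r[1]) <= radius]
```

With `cache=True`, repeated calls on the same path trigger a single simulated read, at the cost of keeping the whole catalog in memory.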
Interface impacts
- [...] miniagasc.h5 to the latest version of proseco_agasc (e.g. proseco_agasc_1p8.h5) in the default AGASC directory.
- [...] miniagasc.h5 pointing to the latest miniagasc_1pN.h5 is no longer considered an official part of the AGASC data files.
- [...] $SKA/data/agasc:
  - agasc_healpix_1p7.h5 (HEALpix-ordered version of agasc1p7.h5)
  - proseco_agasc_1p8[rcN].h5
Testing
Unit tests
Independent check of unit tests by Jean
Functional tests
Memory performance (integration with sparkles/proseco)
From within the dev directory in a ska3-dev environment.
Dev (peak use 123 Mb): memray-flamegraph-profile-memory-dev
Flight (peak use 283 Mb): memray-flamegraph-profile-memory-flight.html
JC (for the record prior to further memory optimization)
For brief initial memory profiling, I ran
and
with this PR with memray memory profiler. In aggregate, the 1p7 version showed peak resident memory use of 280Mb. The 1p8 healpix version showed peak resident memory use of 154Mb.
Speed performance
From within the agasc git repository:
Flight agasc package with Dec-ordered AGASC 1.7 file

This PR agasc with Dec-ordered AGASC 1.7 file
The expectation is to see similar performance.

This PR agasc with HEALpix-ordered AGASC 1.7 file
This should be faster.
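As background on why row ordering affects cone-search speed: when rows are sorted by Dec, a binary search can narrow the candidates to a contiguous band of rows before any exact separation test is applied. The sketch below shows only that general technique with made-up Dec values. The helper name `dec_band` is hypothetical and this is not taken from the package. A HEALpix-ordered file would presumably narrow the candidates in RA as well, which is consistent with the expectation above.

```python
import bisect


def dec_band(decs_sorted, dec, radius):
    """Return (i0, i1) so that rows i0:i1 have Dec within `radius` of `dec`.

    `decs_sorted` must be in ascending order (a Dec-ordered catalog).
    """
    i0 = bisect.bisect_left(decs_sorted, dec - radius)
    i1 = bisect.bisect_right(decs_sorted, dec + radius)
    return i0, i1


# Made-up Dec values (degrees), already sorted.
decs = [-60.0, -10.0, 19.5, 20.0, 20.4, 45.0, 80.0]
i0, i1 = dec_band(decs, 20.0, 0.5)
candidates = decs[i0:i1]  # rows that still need an exact separation check
```

The band spans all RA values, so every candidate in it still needs an exact angular-separation check afterwards.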
Caching performance improvement
Using cache=True for 100 cone searches improves the speed by a factor of two. This uses the dev/profile_cache.py script.
kady (slow NFS)
Mac laptop (fast SSD)