Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sdss id zway #425

Merged
merged 64 commits into from
Nov 14, 2024
Merged

Sdss id zway #425

merged 64 commits into from
Nov 14, 2024

Conversation

zachway1996
Copy link
Contributor

Added functionality for creating the catalogidx_to_catalogidy match between crossmatch versions.

Added method to append new sdss_ids to the tables sdss_id_flat and sdss_id_stacked

@zachway1996 zachway1996 requested a review from albireox as a code owner April 18, 2024 13:54
@albireox
Copy link
Member

HI @zachway1996. Sorry for not commenting on this earlier.

What is the status of this PR? It seems to be mostly in good shape, but if I wanted to test the procedure under append_to_sdss_id.py to add catalogids to the sdss_id tables, is that ready to go?

I have a few comments, some of which maybe are not really relevant or you're already working on them:

  • Right now, if I understand the code, you need to provide a list of new catalogids to add to sdss_id. I think that's mostly fine but I wonder if we should have a mode in which the code automatically looks for missing catalogids from targetdb.target and catalogdb.target_non_carton and add them. Maybe that's what append_to_tables without a catalogid_list does? There is not documentation for the parameters of that class which would be useful.
  • To keep in line with the rest of the package, could you make the classes CamelCase (e.g., append_to_tables -> AppendToTables)? A name as append_to_tables is a bit confusing because it seems that it's a function that does something, while it's a class on which you need to call some methods, but that's not really a bit deal.
  • Could we consolidate all the sdss_id functions and classes under a single sdss_id.py file? Or if you prefer to split things up, feel free to create a sdss_id/ submodule. I also would rename files that seem imperative but are not actual scripts, like create_catalogidx_to_catalogidy.py, but again, not a big deal.
  • The package uses flake8 and isort linting. I think you have most of it, but I would install it and check what's missing; it seems to be mostly spaces around operators.
  • Is there documentation for this procedure? I think Pramod and I will like a tour once you think it's ready, but some written documentation may be useful.
  • Something that doesn't worry me too much, but I think is worth thinking about is that the code is very hardcoded for the three versions we have right now ... That's probably fine as we most likely won't need to sdss_id-match another version, but it also means that if we ever need to do it a very deep knowledge of the code is needed ...

@zachway1996
Copy link
Contributor Author

zachway1996 commented Apr 29, 2024

Howdy @albireox,

What is the status of this PR? It seems to be mostly in good shape, but if I wanted to test the procedure under append_to_sdss_id.py to add catalogids to the sdss_id tables, is that ready to go?

Yes, the code is functional. But given some of your suggestions below, I should probably change some things.

  • Right now, if I understand the code, you need to provide a list of new catalogids to add to sdss_id. I think that's mostly fine but I wonder if we should have a mode in which the code automatically looks for missing catalogids from targetdb.target and catalogdb.target_non_carton and add them. Maybe that's what append_to_tables without a catalogid_list does? There is not documentation for the parameters of that class which would be useful.

This should work for both cases, providing a list of catalogids and linking to a table to search for new sdss_ids. However, most of my testing has been on the former case. I'll test the latter and write some documentation on how to use that.

  • To keep in line with the rest of the package, could you make the classes CamelCase (e.g., append_to_tables -> AppendToTables)? A name as append_to_tables is a bit confusing because it seems that it's a function that does something, while it's a class on which you need to call some methods, but that's not really a bit deal.

That's an easy fix.

  • Could we consolidate all the sdss_id functions and classes under a single sdss_id.py file? Or if you prefer to split things up, feel free to create a sdss_id/ submodule. I also would rename files that seem imperative but are not actual scripts, like create_catalogidx_to_catalogidy.py, but again, not a big deal.

I think I would prefer to keep the two separate because they serve different purposes. create_catalogidx_to_catalogidy searches for catalogid pairs whereas all of the sdss_id logic is in append_to_sdss_id. I'll put them in a submodule.

  • The package uses flake8 and isort linting. I think you have most of it, but I would install it and check what's missing; it seems to be mostly spaces around operators.

Realized this after I submitted the pull request, I've downloaded a linter now.

  • Is there documentation for this procedure? I think Pramod and I will like a tour once you think it's ready, but some written documentation may be useful.

There is documentation for how the schema is set up here, but I would agree that it needs to be fully fleshed out.

One worry I have is that assigning the sdss_id is not definite anymore in the way I have written things. This is primarily because I assigned sdss_id-s based on a SERIAL sequence as new matches were appended to the list. But, since there have been changes to the v31 crossmatch, I'm not sure that sdss_id-s would match up. I can email you and Pramod about this.

  • Something that doesn't worry me too much, but I think is worth thinking about is that the code is very hardcoded for the three versions we have right now ... That's probably fine as we most likely won't need to sdss_id-match another version, but it also means that if we ever need to do it a very deep knowledge of the code is needed ...

This is something I've thought about but haven't found a way around. A lot of the sdss_id logic is based on assuming that v21, v25, and v31 already (mostly) exist. Adding a new crossmatch would work within the crossmatch but many new rows could be created (e.g. a stellar source in v31 is resolved into two sources in v??).

I could write some text about how one would go about adding a v??, if that would help. Also could make comments in the code along with the text.

@albireox albireox merged commit a6569d6 into main Nov 14, 2024
4 of 5 checks passed
@albireox albireox deleted the sdss_id_zway branch November 14, 2024 19:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants