Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

afscgap: Ocean health tools for NOAA AFSC GAP species presence data #93

Closed
22 of 33 tasks
sampottinger opened this issue Mar 25, 2023 · 74 comments
Closed
22 of 33 tasks

Comments

@sampottinger
Copy link

sampottinger commented Mar 25, 2023

Submitting Author: Sam Pottinger (@sampottinger)
All current maintainers: @sampottinger, @gizarp
Package Name: afscgap
One-Line Description of Package: Community contributed Python-based tools for working with public bottom trawl surveys data from the NOAA Alaska Fisheries Science Center Groundfish Assessment Program (NOAA AFSC GAP).
Repository Link: https://github.com/SchmidtDSE/afscgap
Version submitted: 0.0.7
Editor: @ocefpaf
Reviewer 1: @7yl4r
Reviewer 2: @ayushanand18
Archive: DOI
JOSS DOI: DOI
Version accepted: 0.0.7
Date accepted (month/day/year): 05-30-2023


Code of Conduct & Commitment to Maintain Package

Description

Python-based tool set for interacting with bottom trawl surveys from the Ground Fish Assessment Program (GAP). This provides information about where certain species were seen and when under what conditions, information useful for research in ocean health.

It offers:

  • Pythonic access to the official NOAA AFSC GAP API service.
  • Memory-efficient tools for inference of the "negative" observations not provided by the API service but required for some common types of analysis.
  • Visualization tools for quickly exploring and creating comparisons within the dataset that provide an "on-ramp" to deeper analysis. The visual analytics components aim to serve both expert programmers and audiences with limited programming experience.

Note that GAP is an excellent dataset produced by the Resource Assessment and Conservation Engineering (RACE) Division of the Alaska Fisheries Science Center (AFSC) as part of the National Oceanic and Atmospheric Administration's Fisheries organization (NOAA Fisheries). See also the RACEBASE NOAA InPort entry.

Additional information at https://pyafscgap.org/.

Scope

  • Please indicate which category or categories.
    Check out our package scope page to learn more about our scope. (If you are unsure of which category you fit, we suggest you make a pre-submission inquiry):

    • Data retrieval
    • Data extraction
    • Data processing/munging
    • Data deposition
    • Data validation and testing
    • Data visualization
    • Workflow automation
    • Citation management and bibliometrics
    • Scientific software wrappers
    • Database interoperability

related to data viz category: see presubmission
Domain Specific & Community Partnerships

  • Geospatial
  • Education
  • Pangeo

Community Partnerships

If your package is associated with an
existing community please check below:

  • For all submissions, explain how the and why the package falls under the categories you indicated above. In your explanation, please address the following points (briefly, 1-2 sentences for each):

This library supports retrieval of data from the official NOAA AFSC GAP REST API service but, as that service on its own is often not sufficient due to certain data limitations, it also offers zero catch inference as required for a number of common types of investigations (hence data processing). Finally, given the needs of the community and the vast breadth of the dataset, it offers a community application to explore the data which, in turn, can generate Python code to help users get started with continued analysis within their own scripts.

  • Who is the target audience and what are scientific applications of this package?

This project largely benefits scientific researchers in the ocean health space as this dataset is useful for fisheries management, biodiversity research, and marine science more generally. An example of what this analysis may look like is provided in our example notebook hosted on mybinder.org.

  • Are there other Python packages that accomplish the same thing? If so, how does yours differ?

We are not aware of other Python packages working with the AFSC GAP dataset.

  • If you made a pre-submission enquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted:

Please see our presubmission.

Technical checks

For details about the pyOpenSci packaging requirements, see our packaging guide. Confirm each of the following by checking the box. This package:

  • does not violate the Terms of Service of any service it interacts with.

See NOAA InPort entry. We are primarily using Distribution 6 in the API form. As described, these data are available with the following access constraints: "There are no legal restrictions on access to the data. They reside in public domain and can be freely distributed."

See BSD License within repo.

  • contains a README with instructions for installing the development version.

See project README.md.

  • includes documentation with examples for all functions.

See project website and usage section of project website's documentation microsite.

  • contains a tutorial with examples of its essential functions and uses.

See tutorial notebook on mybinder.

  • has a test suite.

See main tests for package and supplemental tests for website (backend tests and frontend tests).

  • has continuous integration setup, such as GitHub Actions CircleCI, and/or others.

See CI and CD (library, documentation).

Publication Options

JOSS Checks

  • The package has an obvious research application according to JOSS's definition in their submission requirements. Be aware that completing the pyOpenSci review process does not guarantee acceptance to JOSS. Be sure to read their submission requirements (linked above) if you are interested in submitting to JOSS.

See example notebook and app intro.

  • The package is not a "minor utility" as defined by JOSS's submission requirements: "Minor ‘utility’ packages, including ‘thin’ API clients, are not acceptable." pyOpenSci welcomes these packages under "Data Retrieval", but JOSS has slightly different criteria.

In addition to query compilation / emulation, we are hopeful that the negative record inference and complementing visualization tool are enough to escape the thin API client status. Coming in at over 10k lines of code, we would seem to fit some of the "hard" criteria that the journal puts forward. Though this library happily provides Pythonic access to an API service, we believe that this contribution as a whole goes beyond a thin client by providing one of the only public mechanisms for conducting investigation requiring negative catch data and provides unique tools for comparative analysis within with the dataset.

  • The package contains a paper.md matching JOSS's requirements with a high-level description in the package root or in inst/.

See https://github.com/SchmidtDSE/afscgap/blob/main/inst/paper.md (or PDF preview at https://github.com/SchmidtDSE/afscgap/blob/main/inst/paper.pdf).

  • The package is deposited in a long-term repository with the DOI:

We have submitted to Code Ocean. Please confirm if this will suffice. See 10.24433/CO.4905407.v1 / https://codeocean.com/capsule/4905407/tree/v1

Note: Do not submit your package separately to JOSS

Are you OK with Reviewers Submitting Issues and/or pull requests to your Repo Directly?

This option will allow reviewers to open smaller issues that can then be linked to PR's rather than submitting a more dense text based review. It will also allow you to demonstrate addressing the issue via PR links.

  • Yes I am OK with reviewers submitting requested changes as issues to my repo. Reviewers will then link to the issues in their submitted review.

We would be delighted to have your pull requests! 🎉

Confirm each of the following by checking the box.

  • I have read the author guide.
  • I expect to maintain this package for at least 2 years and can help find a replacement for the maintainer (team) if needed.

Please fill out our survey

  • Last but not least please fill out our pre-review survey](https://forms.gle/F9mou7S3jhe8DMJ16). This helps us track
    submission and improve our peer review process. We will also ask our reviewers
    and editors to fill this out.

P.S. Have feedback/comments about our review process? Leave a comment here

Editor and Review Templates

The [editor template can be found here][Editor Template].

The [review template can be found here][Review Template].

@sampottinger
Copy link
Author

This is a continuation of #91

@sampottinger
Copy link
Author

Sorry updated issue body to fix a small formatting issue and to clarify that we have two maintainers.

@NickleDave
Copy link
Contributor

No worries, edit as much as you need -- we will end up updating the metadata in the YAML header as we go anyway.

I will do initial checks and start finding an editor for this review early next week.

@sampottinger
Copy link
Author

Fantastic thank you! Have a great weekend :)

@NickleDave
Copy link
Contributor

You too! 🙂

@NickleDave
Copy link
Contributor

Hi again @sampottinger I'm adding the initial editor checks below.

Good news! afscgap passes with flying colors. I made a couple of comments in the checklist but I don't see anything that needs to be addressed--looks like we're ready for review.

I will begin looking for an editor this week and update you here as soon as we have found one.

Editor in Chief checks

Hi there! Thank you for submitting your package for pyOpenSci
review. Below are the basic checks that your package needs to pass
to begin our review. If some of these are missing, we will ask you
to work on them before the review process begins.

Please check our Python packaging guide for more information on the elements
below.

  • Installation The package can be installed from a community repository such as PyPI (preferred), and/or a community channel on conda (e.g. conda-forge, bioconda).
    • might be worth adding a conda-forge package?
    • The package imports properly into a standard Python environment import package-name.
  • Fit The package meets criteria for fit and overlap.
    • yes, data retrieval (from GAP dataset through API), data processing, data visualization
  • Documentation The package has sufficient online documentation to allow us to evaluate package function and scope without installing the package. This includes:
    • User-facing documentation that overviews how to install and start using the package.
    • Short tutorials that help a user understand how to use the package and what it can do for them.
    • API documentation (documentation for your code's functions, classes, methods and attributes): this includes clearly written docstrings with variables defined using a standard docstring format. We suggest using the Numpy docstring format.
  • Core GitHub repository Files
    • README The package has a README.md file with clear explanation of what the package does, instructions on how to install it, and a link to development instructions.
    • Contributing File The package has a CONTRIBUTING.md file that details how to install and contribute to the package.
    • Code of Conduct The package has a Code of Conduct file.
      • adapted from p5.js COC
    • License The package has an OSI approved license.
      NOTE: We prefer that you have development instructions in your documentation too.
      • BSD 3-Clause
  • Issue Submission Documentation All of the information is filled out in the YAML header of the issue (located at the top of the issue template).
  • Automated tests Package has a testing suite and is tested via GitHub actions or another Continuous Integration service.
  • Repository The repository link resolves correctly.
  • Package overlap The package doesn't entirely overlap with the functionality of other packages that have already been submitted to pyOpenSci.
  • Archive (JOSS only, may be post-review): The repository DOI resolves correctly.
  • Version (JOSS only, may be post-review): Does the release version given match the GitHub release (v1.0.0)?

  • Initial onboarding survey was filled out
    We appreciate each maintainer of the package filling out this survey individually. 🙌
    Thank you authors in advance for setting aside five to ten minutes to do this. It truly helps our organization. 🙌


Editor comments

@sampottinger
Copy link
Author

Incredible @NickleDave thank you so much!

@NickleDave
Copy link
Contributor

Hi @sampottinger and @gizarp, very happy to say that @ocefpaf has generously volunteered to act as editor for this review. He will introduce himself and take the lead from here.

@sampottinger
Copy link
Author

Thanks @NickleDave and great to meet you @ocefpaf!

@ocefpaf
Copy link
Member

ocefpaf commented Apr 10, 2023

Hi folks! Sorry for the delay in getting here. (Holiday+longweekend in South America this past few days.)

@NickleDave I'm catching up with the reading today. Please let me know if there are any urgent button/forms I need press/fill.

@ocefpaf
Copy link
Member

ocefpaf commented Apr 10, 2023

@sampottinger I have a few really minor comments after a first pass. These are not blockers and you don't need to address them for this application! They are really just comments:

  1. I see you have a module named utils. That can be problematic as we tend to put everything in there. It also make future refactors or even maintaining the code a bit hard. Most of the functions there seems to be "converters," right? Maybe renamed it to converters.py (horrible name, I know) and avoid the "utils.py" trap.
  2. The query method argument list is daunting! I'm not sure if there is a way around it. I know you are limited by the data structure upstream. Maybe it is a matter of creating more examples and docs? Or other methods with a reduced search scope that calls the main one? Not sure if that even makes sense.
  3. The package name suffers from the usual NOAA acronym problem. I understood it but I am "in the know" b/c I'm a NOS/IOOS/SECOORA contractor (see ;-p) . Maybe a friendlier name can help folks outside of NOAA-Fisheries find this package. However, if there is a user base, even a small one, please disregard this comment. Renaming a package after folks adopted it is worse than a bad name.

PS: I showed your package to a few colleagues at IOOS and they believe that a notebook with an afscgap example would make a nice addition to our Code Lab https://ioos.github.io/ioos_code_lab/content/intro.html. Let me know if you are interested in that.

@sampottinger
Copy link
Author

Hey @ocefpaf! Thanks so much for all of these. Let me pull together a PR real fast and some comments!

@sampottinger
Copy link
Author

sampottinger commented Apr 10, 2023

Thanks again @ocefpaf for your thoughtful comments! Please see below.

I see you have a module named utils

This is a good suggestion. I made a PR at SchmidtDSE/afscgap#49. Let me know if this refactor looks good to you!

The package name suffers from the usual NOAA acronym problem ... if there is a user base, even a small one, please disregard this comment

Yeah the dataset's whole name is NOAA AFSC GAP RACEBACSE FOSS 😅 . Thanks for being understanding on this one. I think we will politely decline this suggestion if that's OK. We have some community continuity concerns like you mentioned and some community sensitivity concerns we want to respect... we have to be a little careful around not appearing to have activity in other datasets (NOAA or otherwise) that may be at different moments in their open science journey than AFSC GAP or not (yet? 🤞) open in the same way that the AFSC GAP API service is. Given our current context, this specificity is probably desirable for at least the medium-term future of the project.

a nice addition to our Code Lab

Amazing! We'd be honored 😄. We'd be more than happy to co-develop something with you or another community member in that space. Just for the conversation, I'll highlight that we do have a mybinder example available (source in repo). Totally understand if it might not be quite the right vibe though.

The query method argument list is daunting!

I think that design choice is a strong opinion loosely held. I'm not sure if it is helpful but I'll document some thoughts we had internally but 100% open to additional ideas!

We went this route to fulfill a few goals: we wanted to reflect the structure of the API service (any combination of filters in the keyword set is valid in the NOAA API service so there isn't a clear functional decomposition) while also providing documented discoverability. There were some other ideas explored:

  • kwargs: Leaving the keyword arguments as variable is a bit easier for implementation but, though documentation for the AFSC GAP dataset exists, it's not necessarily convenient to reference / use from a Python context. So, we wanted to have docstring description for all of the available options to benefit IDEs, API doc generation, etc. We also wanted the possible list of filters to be enumerated for the user in those contexts as well.
  • pass objects: There's the option of a more ORM-like approach where a list of filters are passed like query(filters=[CommonNameFilter(eq='Pacific cod')] but we didn't like forcing a lot of our type system into client code as it would require quite a large number of types to maintain our goals around docstring coverage of the available filter parameters and discoverability.
  • builder chaining: Probably my personal favorite, we could turn this into an object with syntax like query().commonName(eq='Pacific cod').year(min=2021).get(). While I personally like this syntax, there's been conversation internally and with community partners for using this library in an educational context as well. As this syntax may be less common in earlier Python programming journeys, there was a feeling that the keyword arguments approach was more inclusive / approachable for someone in, say, an introductory Python course or a student not in a computer science curriculum (we got invited for one coming up soon!).

Just wanted to list this out but absolutely open to other thoughts! This just felt like the right balance around the broader goals with regards to documentation / discoverability, keeping the type system encapsulated (see below), and maintaining accessibility.

library

@ocefpaf
Copy link
Member

ocefpaf commented Apr 13, 2023

This is a good suggestion. I made a PR at SchmidtDSE/afscgap#49. Let me know if this refactor looks good to you!

Better than I thought. Hopefully that will make your life easier in the future.

I think we will politely decline this suggestion if that's OK.

No arguments here! It is better to keep the name now that the package is out.

Totally understand if it might not be quite the right vibe though.

It is a good fit. I'll ping you outside of pyOpenSci later to talk more about it.

Regarding the query functionality. I do like the idea of the builder chaining and it is not too uncommon. Pandas, pyjanitor and many other mainstream libraries use it. It is quite common in the R world too. Also, many similar packages, like argopy, have something like that.

With that said, again, anything I write here is not a blocker for acceptance. The current implementation is good and solid. Just suggestions for the future.

PS: I disagree with the learning curve there. Maybe it is my bais but I teach a lot of basic Python and I make sure to expose the students to a flat vs nested model for operations.

@ocefpaf
Copy link
Member

ocefpaf commented Apr 13, 2023

👋 Hi Tylar Murray and Ayush Anand! Thank you for volunteering to review
for pyOpenSci!

The following resources will help you complete your review:

  1. Here is the reviewers guide. This guide contains all of the steps and information needed to complete your review.
  2. Here is the review template that you will need to fill out and submit
    here as a comment, once your review is complete.

Please get in touch with any questions or concerns! Your review is due: May 4th, 2023


Reviewers: @7yl4r and @ayushanand18
Due date: 2023-05-04

PS: You can comment, open issue, and/or PR directly in the afscgap for this review. Just make sure to cross-reference them here.

@sampottinger
Copy link
Author

Thanks @ocefpaf!

Better than I thought.

Awesome! Merged in.

I disagree with the learning curve there.

That's helpful to have your perspective on that. Let me float this around a little bit and see how we feel about migration. I'll try to make a call on that in the next day or so to keep things moving.

With that said, again, anything I write here is not a blocker for acceptance. The current implementation is good and solid. Just suggestions for the future.

Thanks!

wave Hi Tylar Murray and Ayush Anand!

Lovely to meet you both! Thank you for your time. :D

@ayushanand18
Copy link

Thank you so much @ocefpaf and @sampottinger!
Great work building this package. The current implementation looks robust and clean.

But I do have some observations to make.

  1. There is this function afscgap.query().get_bottom_temperature_c(), and I do think that it might be hard to guess the function name for someone who hasn't seen this dataset before. Mind renaming to something like get_bottom_temperature(unit="celsius"|'c')? This observation is similar to other functions like cpue_*(), duration_hr, etc. I think could have been derived from the direct field names in the dataset, and those who know about the dataset can easily figure out what the name would be, without looking at the docs, but it would be difficult for people from outside (like me) to figure out without going back-and-forth to the docs.

  2. Also about the filters and getters, we can utilise the getitem() method in the class with something like this,

def __getitem__(self, key):
    return self.__getattribute__(key) 

This might be more intuitive. Or to make it much easier can we can just go by the name get(key), and specify the key we want. Ref

I would love to hear your opinion, and these are surely non-blockers for this review!

@sampottinger
Copy link
Author

sampottinger commented Apr 17, 2023

Hello @ayushanand18 and @ocefpaf!

I do like the idea of the builder chaining and it is not too uncommon

Thanks again for the conversation @ocefpaf! I've started implementing the builder pattern at SchmidtDSE/afscgap#50.

Mind renaming to something like get_bottom_temperature(unit="celsius"|'c')

Thanks for the suggestion @ayushanand18! I've started that at SchmidtDSE/afscgap#51

we can utilise the getitem() method in the class

Thanks for this suggestion but I may pass on this if that's ok @ayushanand18.

In light of the stuff we did earlier in process, our reasoning is that the library carries an explicit goal to provide docstrings and type annotations for these fields. I think, given the complexity of the data model itself as well as the difficulty in using some of the upstream documentation from NOAA directly in the Python ecosystem, I want to leave those methods there for discoverability and encourage use of those typed interfaces unless the user "opts out" first.

To that end, there is a to_dict method on afscgap.model.Record for those that prefer that key-based approach or want to read into something like pandas.DataFrame but we hope requiring to_dict first helps inform the user that the object returned from a query does offer those other interfaces, information which may be missed if they simply try treating the result as a dictionary. In other words, I'd rather have the user first explicitly decline use of the default typed / strong interface given the benefits it offers for static checking and affordance.

@sampottinger
Copy link
Author

Hello all! @ocefpaf, the builder interface was implemented in SchmidtDSE/afscgap#52.

@sampottinger
Copy link
Author

sampottinger commented Apr 18, 2023

Mind renaming to something like get_bottom_temperature(unit="celsius"|'c')

Thanks @ayushanand18! I have implemented in SchmidtDSE/afscgap#55

@sampottinger
Copy link
Author

sampottinger commented Apr 18, 2023

I am in the process of confirming SchmidtDSE/afscgap#54 which includes the following requested changes before they are resolved on main:

SchmidtDSE/afscgap#54 will also soon include updates to documentation.

@lwasser lwasser moved this to pyos-accepted in peer-review-status Jul 11, 2023
@lwasser lwasser moved this from pyos-accepted to Joss accepted in peer-review-status Sep 13, 2023
@SimonMolinsky SimonMolinsky mentioned this issue Sep 29, 2023
Closed
31 tasks
@cmarmo cmarmo mentioned this issue Oct 15, 2023
30 tasks
@NickleDave NickleDave mentioned this issue Nov 15, 2023
30 tasks
@cmarmo cmarmo mentioned this issue Jan 18, 2024
29 tasks
@tomalrussell tomalrussell mentioned this issue Feb 2, 2024
28 tasks
@Batalex Batalex mentioned this issue Feb 7, 2024
30 tasks
@banesullivan banesullivan mentioned this issue Mar 18, 2024
30 tasks
@ocefpaf ocefpaf mentioned this issue Apr 2, 2024
30 tasks
@snacktavish snacktavish mentioned this issue Apr 4, 2024
33 tasks
@dhomeier dhomeier mentioned this issue Apr 19, 2024
32 tasks
@hamogu hamogu mentioned this issue Jun 3, 2024
32 tasks
@lwasser lwasser moved this from under-joss-review to joss-accepted in peer-review-status Jun 4, 2024
@sneakers-the-rat sneakers-the-rat mentioned this issue Jun 28, 2024
30 tasks
@Batalex Batalex mentioned this issue Aug 10, 2024
30 tasks
@cmarmo cmarmo mentioned this issue Sep 16, 2024
32 tasks
@hamogu hamogu mentioned this issue Oct 1, 2024
31 tasks
@banesullivan banesullivan mentioned this issue Nov 23, 2024
32 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: joss-accepted
Development

No branches or pull requests

6 participants