[GSoC Project Proposal]: NOAA trawl survey database #74
Comments
I'm interested in working on this. Are there any preferred db systems? I'm thinking of using Postgres, since the dataset is obviously huge and the workload is going to be pretty query-heavy.
Thank you @w-nityammm for your interest. To the best of my knowledge we do not have a preferred db system, but we can ask around to see if something is preferred within NOAA and get back to you. @dgbolser might have more information when he returns to the office this week.
Yes, thanks for your interest @w-nityammm! Postgres seems like a good option and I don't see any obvious incompatibilities with other databases we maintain. I am checking with the database manager here at HQ to see if there's a preferred system.
@w-nityammm Oracle has been our default, but I confirmed that there won't be issues if we go with Postgres. Our folks see the advantages and support going in that direction if it is best for the project.
Alright, sounds good. Will start looking into it :) Thank you for the response, @dgbolser!
Interesting thought @7yl4r -- I'd defer to someone who knows more about OBIS, but my initial reaction is that we might be constrained by size. OBIS claims to have 136,000,000 records. I don't know exactly how many records we'll be dealing with -- but at least an order of magnitude more than the 3.3 million aggregate records we already have (more if we include more than 55 species). I can do some initial summaries of samples / species to get an idea. In an ideal world, we'd be able to serve up data for any species, including some of the corals, sponges, and other invertebrates. There are a lot of 0s for rarer species, and we wouldn't need to store those -- but it would be useful to include data on individual samples.
Project Description
The primary objective of this project is to create an accessible international database of transboundary marine survey data across the Northeast Pacific Ocean. Initial work for this project cleaned and joined haul-level data from several surveys operating along the west coast of North America, spanning two countries, into a single data frame (3.3 million observations of 55 species). Extending this work into a database, rather than a data frame, would (1) allow for joining more data, such as life-history information like the age of fish in the haul, and (2) allow for a larger number of species to be included (because of file sizes, we currently provide data for only 55 of more than 1000 species). Most of the data are publicly available in independent regional databases, but no international database exists and the independent databases are not standardized, which significantly inhibits the use of the data for research and in assessments of the status of marine resources. This international database would help strengthen our understanding of climate-driven shifts in groundfish distribution in the North Pacific Ocean through data sharing between Fisheries and Oceans Canada (DFO) and NOAA Fisheries (NMFS). It also has the potential to improve the assessment and management of those species, serving as a proof-of-concept and foundation for a proposed North America-wide effort to join survey data.
Expected Outcomes
We expect, at a minimum, that the data already compiled into the joined data frame would be enhanced by being moved to a queryable database. Second, more data that require the relational structure of a database, e.g., age- and length-composition data, could be added to the database as time allows. Including such data will allow the database to be a one-stop shop for survey data, which would drastically reduce the time needed to compile the data for use in both research and management-related tasks.
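To make the "relational structure" point concrete, here is a minimal sketch of how haul-level records and per-fish age and length data might sit in separate, linked tables. Everything in it is an assumption for illustration: the table names (haul, catch, specimen), the column names, and the demo rows are hypothetical and made up, not the project's actual schema or data. It uses Python's built-in sqlite3 module only so it runs without a server; the same SQL would need little change for Postgres.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative assumptions,
# not the project's real design.
SCHEMA = """
CREATE TABLE haul (
    haul_id   INTEGER PRIMARY KEY,
    survey    TEXT NOT NULL,       -- which regional survey made the haul
    year      INTEGER NOT NULL,
    latitude  REAL,
    longitude REAL
);
CREATE TABLE catch (
    haul_id   INTEGER NOT NULL REFERENCES haul(haul_id),
    species   TEXT NOT NULL,
    weight_kg REAL NOT NULL,
    PRIMARY KEY (haul_id, species)  -- one aggregate row per species per haul
);
CREATE TABLE specimen (
    specimen_id INTEGER PRIMARY KEY,
    haul_id     INTEGER NOT NULL REFERENCES haul(haul_id),
    species     TEXT NOT NULL,
    length_cm   REAL,
    age_years   INTEGER             -- many fish are measured but not aged
);
"""


def build_demo_db():
    """Create an in-memory database holding a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO haul VALUES (?, ?, ?, ?, ?)", [
        (1, "survey_a", 2019, 48.1, -125.2),
        (2, "survey_b", 2019, 50.3, -128.0),
    ])
    conn.executemany("INSERT INTO catch VALUES (?, ?, ?)", [
        (1, "sablefish", 12.5),
        (2, "sablefish", 7.0),
    ])
    conn.executemany("INSERT INTO specimen VALUES (?, ?, ?, ?, ?)", [
        (1, 1, "sablefish", 52.0, 4),
        (2, 1, "sablefish", 58.0, 6),
        (3, 2, "sablefish", 61.0, 7),
    ])
    conn.commit()
    return conn


def mean_length_by_haul(conn, species):
    """Join haul, catch, and specimen rows for one species.

    Returns (haul_id, survey, weight_kg, mean length, specimens measured)
    for each haul that caught the species.
    """
    query = """
        SELECT h.haul_id, h.survey, c.weight_kg,
               AVG(s.length_cm), COUNT(s.specimen_id)
        FROM haul AS h
        JOIN catch AS c ON c.haul_id = h.haul_id
        LEFT JOIN specimen AS s
               ON s.haul_id = c.haul_id AND s.species = c.species
        WHERE c.species = ?
        GROUP BY h.haul_id, h.survey, c.weight_kg
        ORDER BY h.haul_id
    """
    return conn.execute(query, (species,)).fetchall()


if __name__ == "__main__":
    for row in mean_length_by_haul(build_demo_db(), "sablefish"):
        print(row)
```

The design choice this illustrates is that a flat data frame has to either repeat the haul metadata on every specimen row or drop the specimen detail, whereas separate tables keep one row per haul and attach any number of age and length records to it through haul_id.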
Skills Required
SQL, R
Additional Background/Issues
Mentor(s)
Eric Ward (@ericward-noaa), Sean Anderson (@seananderson), Kelli Johnson (@kellijohnson-NOAA), Derek Bolser (@dgbolser)
Expected Project Size
175 hours
Project Difficulty
Intermediate