Basic Info

Database: KNB

Language: R

Contents:

notebooks:
- knb-notebook1.md: downloading the API and understanding its usage
- knb-notebook2.md: exploring the database using its API
- knb-notebook3.md: finding the most popular headers in the KNB database and downloading the datasets that have these headers
- knb-process.md: understanding PISCO datasets and analyzing their attributes
- knb-location.md: extracting location and date information from the PISCO, species, and sea star wasting syndrome datasets and merging them
- knb-sckat.md: merging the species count and size data with the PISCO datasets to look for a relationship between species count, species size, and temperature
data:
1. generated:
- knb-attrs.csv:
  Extracted from the KNB website; around 700,000 rows. Each row describes an individual dataset that has header information in its metadata file.
- knb-pop-attrs.csv:
  Common headers of the datasets, ordered by their frequencies.
- pisco-locations-dates.csv:
  PISCO datasets grouped by location and date.
- PISCOwSeason.csv:
  The PISCO location and date data with an added season column.
- [ca_sea_star_vs_pisco.csv]
- [pop_ds.csv]
2. downloaded:
- seastarkat_size_count_totals_download.csv: Species count and size data (sea stars and katharina only) requested from MARINe.
- sswd_sea_star_observations_2019_0411.csv: Sea star wasting syndrome data requested from MARINe.
- phototranraw_download.csv: requested from MARINe.
- downloaded PISCO csv files: too large to host online; stored on a hard drive

Exploring KNB

This repo has the data, code, and reports for my exploratory analysis of KNB, a website that aggregates ecology-related datasets.

Accessing the data in KNB

To access the data in KNB programmatically, I downloaded their API (notebook1). Then, following Ciera's suggestion and with help from the KNB staff, I found the most popular headers in their database (notebook3). After exploring the headers, I decided to work on the datasets from PISCO. I first need to combine all of these datasets, which total around 200 GB.
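The header-frequency step can be sketched in base R. This is a minimal illustration rather than the notebook's actual code: the column names `dataset_id` and `attribute` and the toy rows are assumptions made for the example, since the real layout of knb-attrs.csv is not described here.

```r
# Hypothetical stand-in for knb-attrs.csv: one row per (dataset, header) pair.
# The column names `dataset_id` and `attribute` are assumptions for illustration.
attrs <- data.frame(
  dataset_id = c("d1", "d1", "d2", "d2", "d3"),
  attribute  = c("date", "temp_c", "date", "site", "date"),
  stringsAsFactors = FALSE
)

# Count how often each header occurs and sort from most to least frequent.
header_counts <- sort(table(attrs$attribute), decreasing = TRUE)

# Convert to a plain data frame so it can be saved, e.g. with write.csv().
pop_attrs <- data.frame(
  attribute = names(header_counts),
  n         = as.integer(header_counts),
  stringsAsFactors = FALSE
)

# Keep only the datasets that contain the most common header.
top_header   <- pop_attrs$attribute[1]
top_datasets <- unique(attrs$dataset_id[attrs$attribute == top_header])
```

On the real table the same pattern should apply once `attrs` is read with `read.csv()`, although around 700,000 rows may make it worth checking memory use first.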

Understanding PISCO datasets

In notebook3, I downloaded the XML metadata file and the data frame for one PISCO dataset. From the metadata file I learned the general information about that dataset, including its purpose, location, organization, and attribute definitions. I then made a few plots of the dataset's attributes to see how each one varies over time.

Merging species and PISCO data

In knb-location.md and knb-sckat.md, I aim to merge the species and PISCO data by location and time in order to look for a relationship between species and ocean temperature.
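A merge of this kind can be sketched in base R as below. Everything here is a toy assumption: the column names (`location`, `date`, `temp_c`, `count`), the values, and the month-based season rule are hypothetical and are not taken from the PISCO or MARINe files.

```r
# Toy stand-ins for the two sources; all column names and values are hypothetical.
pisco <- data.frame(
  location = c("A", "A", "B"),
  date     = as.Date(c("2018-01-15", "2018-07-15", "2018-07-15")),
  temp_c   = c(11.2, 15.8, 16.4)
)
species <- data.frame(
  location = c("A", "B", "B"),
  date     = as.Date(c("2018-07-15", "2018-07-15", "2018-10-01")),
  count    = c(12L, 7L, 9L)
)

# Map a date to a season by calendar month (Northern Hemisphere,
# meteorological seasons). This season definition is an assumption.
season_of <- function(d) {
  m <- as.integer(format(d, "%m"))
  ifelse(m %in% c(12, 1, 2), "winter",
  ifelse(m %in% 3:5, "spring",
  ifelse(m %in% 6:8, "summer", "fall")))
}
pisco$season <- season_of(pisco$date)

# Inner join: keep only the location/date pairs present in both tables.
merged <- merge(species, pisco, by = c("location", "date"))
```

An inner join drops observations without a match on both keys. If the real sampling dates do not line up exactly, joining on a coarser key such as location plus season may be a reasonable alternative.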

cabinetofcuriosity/knb_explore
