Merge pull request #283 from brandynlucca/input_file_docs
Amendments to input files documentation
brandynlucca authored Oct 18, 2024
2 parents 7fdd86a + 14121a6 commit 0c5feb3
Showing 1 changed file with 80 additions and 110 deletions.
190 changes: 80 additions & 110 deletions docs/input_files.md
@@ -5,164 +5,134 @@ Input files used in an Echopop run, grouped by data type. The tables below descr

Biological data are always separated into US vs Canada files. All other data files combine US and Canadian data.

To minimize duplication in the data file description tables below, additional definitions and information for some variables found in multiple files are provided here, especially for column names:

- `haul_num`: Haul number. Identifies the haul the collected data come from. A haul is usually described as a collection of trawls for a certain section of the survey.
- `transect_num`: Transect number.
- `species_id`: Species identification code (ID). Identifies what species is associated with the collected data. Pacific hake is 22500.
- `N/P`: Empty value Not Permitted.
- `nmi`: Nautical miles.
- `Old name`: Column name used previously in the Matlab EchoPro program
:::{admonition} Dataset structures
:class: note
*See the [configuration dataset file organization](implementation/preprocessing_data) page for more details.*
:::

```{contents}
:local:
:depth: 3
```

:::{admonition} `EchoPro` column names
:class: note
There may be some inconsistencies in the columns used by files in `EchoPro` for previous years. These column names are based on the files associated with the 2019 survey.
:::

## Biological (trawl) data

**Current base directory** used with the sample files: `Biological`

Data files from the US and Canada are found in subdirectories `US` and `CAN`, respectively.
## Biological data

### Length

**Current sample file (US data)** relative to base directory: `US/2019_biodata_length.xls`, sheet `biodata_length`

Column name | Old name | Data type | Units | Empty value | Description
--- | --- | --- | --- | --- | ---
haul_num | Haul | integer | | N/P | Haul number
species_id | Species_Code | integer | | N/P | Species identification code (ID)
sex | Sex | integer | | N/P | Sex of the animal. 1=Male, 2=Female, 3=Unknown/Not determined ("unsexed")
length | Length | float | cm | | Length of the animal
length_count | Frequency | float | | empty (blank) | Number of animals in the haul, of a particular species, and of a certain sex and length. For example, we have 5 Hake from haul 1 that are males with length 20cm
`Echopop` column | `EchoPro` column | Data type | Units | Description
--- | --- | --- | ----- | ---
haul_num | Haul | integer<br>float | | Haul number. <br> Rows with missing values are removed
species_id | Species_Code | integer<br>float<br>string | | Species identification code (ID)<br>Rows with missing values are removed
sex | Sex | integer | | Sex of the animal <br> Male: `1`/`"m"`/`"male"`, Female: `2`/`"f"`/`"female"`, Unsexed: `3`/`"u"`/`"unsexed"` <br>Missing values are replaced with `"unsexed"`
length | Length | float | cm <br> (0.0, ∞) | Animal fork length <br> Missing values are replaced with `NaN`
length_count | Frequency | integer | count <br> [0, ∞) | Number of animals with the corresponding binned fork length <br> Missing values are replaced with `0`
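The missing-value rules in the table above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not Echopop's actual implementation: the function names `is_missing` and `clean_length_rows` are hypothetical, rows are assumed to be dictionaries keyed by the `Echopop` column names, and mapping unrecognized sex codes to `"unsexed"` is an assumption that the table does not state.

```python
import math

# Accepted sex codes from the table above, mapped to one canonical label.
SEX_CODES = {
    1: "male", "m": "male", "male": "male",
    2: "female", "f": "female", "female": "female",
    3: "unsexed", "u": "unsexed", "unsexed": "unsexed",
}


def is_missing(value):
    """Treat None, blank strings, and NaN as missing (hypothetical helper)."""
    if value is None:
        return True
    if isinstance(value, str):
        return value.strip() == ""
    return isinstance(value, float) and math.isnan(value)


def clean_length_rows(rows):
    """Apply the documented missing-value rules to a list of row dicts."""
    cleaned = []
    for row in rows:
        # Rows without a haul number or species ID are removed.
        if is_missing(row.get("haul_num")) or is_missing(row.get("species_id")):
            continue
        sex = row.get("sex")
        if isinstance(sex, str):
            sex = sex.strip().lower()
        length = row.get("length")
        count = row.get("length_count")
        cleaned.append({
            "haul_num": int(row["haul_num"]),
            "species_id": row["species_id"],
            # Missing sex becomes "unsexed"; unknown codes do too (assumption).
            "sex": "unsexed" if is_missing(sex) else SEX_CODES.get(sex, "unsexed"),
            # Missing lengths become NaN; missing counts become 0.
            "length": math.nan if is_missing(length) else float(length),
            "length_count": 0 if is_missing(count) else int(count),
        })
    return cleaned
```

Passing a row with an empty `haul_num` drops it entirely, while a row with an empty `length` is kept with `length` set to `NaN`.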

### Specimen

**Current sample file (US data)** relative to base directory: `US/2019_biodata_specimen_AGES.xls`, sheet `biodata_specimen`

Column name | Old name | Data type | Units | Empty value | Description
--- | --- | --- | --- | --- | ---
haul_num | Haul | integer | | N/P | Haul number
species_id | Species_Code | integer | | N/P | Species identification code (ID)
sex | Sex | integer | | N/P | Sex of the animal. 1=Male, 2=Female, 3=Unknown/Not determined ("unsexed")
length | Length | float | cm | | Length of the animal
weight | Weight | float | kg | empty (blank) | Weight of the animal
age | Age | float | years | empty (blank) | Age of the animal
`Echopop` column | `EchoPro` column | Data type | Units &nbsp; | Description
--- | --- | --- | --- | ---
haul_num | Haul | integer | | Haul number. <br> Rows with missing values are removed
species_id | Species_Code | integer | | Species identification code (ID) <br> Rows with missing values are removed
sex | Sex | integer | | Sex of the animal <br> Male: `1`/`"m"`/`"male"`, Female: `2`/`"f"`/`"female"`, Unsexed: `3`/`"u"`/`"unsexed"` <br> Missing values are replaced with `"unsexed"`
length | Length | float | cm <br> (0.0, ∞) | Animal fork length <br> Missing values are replaced with `NaN`
weight | Weight | float | kg <br> (0.0, ∞) | Specimen weight <br> Missing values are replaced with `NaN`
age | Age | float <br> integer | years <br> [0.0, ∞) | Age of the animal <br> Missing values are replaced with `NaN`

### Catch

**Current sample file (US data)** relative to base directory: `US/2019_biodata_catch.xls`, sheet `biodata_catch`

Column name | Old name | Data type | Units | Empty value | Description
--- | --- | --- | --- | --- | ---
haul_num | Haul | integer | | N/P | Haul number
species_id | Species_Code | integer | | N/P | Species identification code (ID)
haul_weight | Weight_In_Haul | float | kg | N/P | Haul weight

### Haul vs transect

File containing the mapping between hauls and transects. This is a new file that replaces the sole information that was being used from the gear file. Note that rows with empty `transect_num` must be omitted.

**Current sample file (US data)** relative to base directory: `US/haul_to_transect_mapping_2019.xls`, single sheet
Column name | Old name | Data type | Units | Empty value | Description
--- | --- | --- | --- | --- | ---
haul_num | Haul | integer | | N/P | Haul number
transect_num | Transect | integer | | N/P | Transect number

`Echopop` column | `EchoPro` column | Data type | Units | Description
--- | --- | --- | --- | ---
haul_num | Haul | integer | | Haul number <br> Rows with missing values are removed
species_id | Species_Code | integer | | Species identification code (ID) <br> Rows with missing values are removed
haul_weight | Weight_In_Haul | float | kg <br> [0.0, ∞) | Haul weight <br> Rows with missing values are removed

## Stratification

**Current base directory** used with the sample files: `Stratification`

Strata may be based on age-length (`KS`, Kolmogorov-Smirnov test) or regional (`INPFC`, International North Pacific Fisheries Commission) stratifications. Each file contains two tabs, one for each strata type.

### Strata

File that relates the stratification to the haul.

**Current sample file (US data)** relative to base directory: `US_CAN strata 2019_final.xlsx`, sheets `Base KS` and `INPC`

Column name | Old name | Data type | Units | Empty value | Description
--- | --- | --- | --- | --- | ---
stratum_num | Cluster name / INPFC | integer | | N/P | Stratum number for KS or INPC strata (`Base KS` or `INPC` tab, respectively). For `Base KS`, 0 = Low sample size. The Old names listed are for the `Base KS` and `INPFC` tabs, respectively
haul_num | Haul | integer | | N/P | Haul number
fraction_hake | wt | float | 0-1 | N/P | Fraction of the haul weight that is hake
`Echopop` column | `EchoPro` column | Data type | Units | Description
--- | --- | --- | --- | ---
stratum_num | Strata index | integer | | Index/grouping representing the stratum identifier based on <br> either length (KS) or latitude (INPFC) <br> Rows with missing values are removed
haul_num | Haul | integer | | Haul number <br> Rows with missing values are removed
fraction_hake | wt | float | proportion <br> [0.0, 1.0] | Fraction of the haul weight that is hake <br> Missing values are replaced with `0.0`

### Geo-strata
### Geostrata

File that defines the geographic definition of strata.

**Current sample file (US data)** relative to base directory: `Stratification_geographic_Lat_2019_final.xlsx`, sheets `stratification1` and `INPC`

Column name | Old name | Data type | Units | Empty value | Description
--- | --- | --- | --- | --- | ---
stratum_num | Strata index | integer | | N/P | Stratum number for KS or INPC strata (`stratification1` or `INPC` tab, respectively)
northlimit_latitude | Latitude (upper limit) | float | decimal degrees | N/P | Northern limit of stratum
`Echopop` column | `EchoPro` column | Data type | Units | Description
--- | --- | --- | --- | ---
stratum_num | Strata index | integer | | Index/grouping representing the stratum identifier based on <br> either length (KS) or latitude (INPFC) <br> Rows with missing values are removed
northlimit_latitude | Latitude (upper limit) | float | decimal degrees <br> [-90.0, 90.0] | Northern limit of stratum <br> Rows with missing values are removed
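One way to read the geostrata table is as a lookup from latitude to stratum, where each stratum covers latitudes up to and including its `northlimit_latitude`. The sketch below illustrates that reading only; the function name `assign_stratum`, the inclusive upper limit, and the example limits are assumptions, not taken from Echopop or from any survey file.

```python
from bisect import bisect_left


def assign_stratum(latitude, geostrata):
    """Return the stratum whose northern limit is the first at or above `latitude`.

    `geostrata` is a list of (stratum_num, northlimit_latitude) pairs.
    Returns None when the latitude lies north of every limit.
    """
    # Sort by northern limit so a binary search can locate the enclosing stratum.
    ordered = sorted(geostrata, key=lambda pair: pair[1])
    limits = [limit for _, limit in ordered]
    index = bisect_left(limits, latitude)
    if index == len(ordered):
        return None
    return ordered[index][0]


# Hypothetical limits, for illustration only.
example_geostrata = [(1, 36.0), (2, 40.5), (3, 43.0), (4, 55.0)]
print(assign_stratum(38.2, example_geostrata))
```

Sorting inside the function keeps the lookup correct even when the rows in the file are not ordered from south to north.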


## NASC

**Current base directory** used with the sample files: `Exports`

### No Age 1

NASC (Nautical Area Scattering Coefficient) values that do not include age1 values. Values are defined along transects at cells with an approximately 0.5 nmi spacing.

**Current sample file (US data)** relative to base directory: `US_CAN_detailsa_2019_table2y+_ALL_final - updated.xlsx`, single sheet

Column name | Old name | Data type | Units | Empty value | Description
--- | --- | --- | --- | --- | ---
transect_num | Transect | integer | | N/P | Transect number
vessel_log_start | VL start | float | nmi | N/P | Vessel log cumulative distance at start of transect cell
vessel_log_end | VL end | float | nmi | N/P | Vessel log cumulative distance at end of transect cell
latitude | Latitude | float | decimal degrees | N/P | Transect cell center latitude
longitude | Longitude | float | decimal degrees | N/P | Transect cell center longitude
stratum_num | Stratum | integer | | N/P | Base KS stratum number
transect_spacing | Spacing | float | nmi | N/P | Distance (spacing) between transects
NASC | NASC | float | m<sup>2</sup> nmi<sup>-2</sup> | N/P | Nautical Area Scattering Coefficient
haul_num | Assigned haul | integer | | N/P | Haul number. A value of 0 is used for transect cells where a haul was not present or used.
### No age-1 fish

The following columns are currently not used in core computations. They are used in reports and in some plots (plots not implemented yet). The column names are the original names and have not been "sanitized".
NASC (Nautical Area Scattering Coefficient) values that do not include age-1 values. Values are defined along transects at cells with an approximately 0.5 nmi spacing.

Column name | Old name | Data type | Units | Empty value | Description
--- | --- | --- | --- | --- | ---
Region ID | Region ID | int | | |
Bottom depth | Bottom depth | float | meters? | |
Layer mean depth | Layer mean depth | float | meters? | |
Layer height | Layer height | float | meters? | |
`Echopop` column | `EchoPro` column | Data type | Units | Description
--- | --- | --- | --- | ---
transect_num | Transect | integer <br> float | | Transect number <br> Rows with missing values are removed
vessel_log_start | VL start | float | nmi <br> [0.0, ∞) | Vessel log cumulative distance at start of transect interval <br> Missing values are replaced with `NaN`
vessel_log_end | VL end | float | nmi <br> [0.0, ∞) | Vessel log cumulative distance at end of transect interval <br> Missing values are replaced with `NaN`
latitude | Latitude | float | decimal degrees <br> [-90.0, 90.0] | Transect interval center latitude <br> Missing values are replaced with `NaN`
longitude | Longitude | float | decimal degrees <br> [-180.0, 180.0] | Transect interval center longitude <br> Missing values are replaced with `NaN`
transect_spacing | Spacing | float | nmi <br> [0.0, ∞) | Distance (spacing) between transects <br> Missing values are replaced with the user-defined <br> value for `max_transect_spacing` in the <br>`initialization_config.yml` configuration file
NASC | NASC | float | m<sup>2</sup> nmi<sup>-2</sup> <br> [0.0, ∞) | Nautical area scattering coefficient ($\textit{NASC}$) <br> Missing values are replaced with `0.0`
haul_num | Haul | integer <br> float | | Assigned haul number <br> A value of `0` is used for transect intervals where no haul <br> was present or used <br> Rows with missing values are removed
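The NASC table uses a different fill rule for nearly every column, which the following sketch makes explicit. It is an illustration under stated assumptions rather than Echopop's own code: `clean_nasc_rows` is a hypothetical function, rows are assumed to be dictionaries keyed by the `Echopop` column names, and `max_transect_spacing` is passed in directly instead of being read from `initialization_config.yml`.

```python
import math

# Columns whose missing values are replaced with NaN, per the table above.
NAN_COLUMNS = ("vessel_log_start", "vessel_log_end", "latitude", "longitude")


def _missing(value):
    """Treat None, empty strings, and NaN as missing (hypothetical helper)."""
    return (
        value is None
        or value == ""
        or (isinstance(value, float) and math.isnan(value))
    )


def clean_nasc_rows(rows, max_transect_spacing):
    """Apply the documented missing-value rules to a list of NASC row dicts."""
    cleaned = []
    for row in rows:
        # Rows without a transect number or haul number are removed.
        # A haul number of 0 is a valid value, not a missing one.
        if _missing(row.get("transect_num")) or _missing(row.get("haul_num")):
            continue
        out = dict(row)
        for name in NAN_COLUMNS:
            if _missing(row.get(name)):
                out[name] = math.nan
        # Missing spacing falls back to the user-defined maximum.
        if _missing(row.get("transect_spacing")):
            out["transect_spacing"] = max_transect_spacing
        # Missing NASC is treated as zero backscatter.
        if _missing(row.get("NASC")):
            out["NASC"] = 0.0
        cleaned.append(out)
    return cleaned
```

Note that the check for a missing `haul_num` deliberately avoids a truthiness test, because `0` marks intervals where no haul was present or used and must be kept.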

### All ages

NASC values that include all ages. The file structure is the same as for the "No Age 1" file.

**Current sample file (US data)** relative to base directory: `US_CAN_detailsa_2019_table1y+_ALL_final - updated.xlsx`, single sheet
### All aged fish

NASC values that include all ages. The file structure is the same as for the files excluding age-1 fish.

## Kriging

**Current base directory** used with the sample files: `Kriging_files`

### Mesh

The "Mesh" file contains the centroids of the Kriging grid cells. Grid size is 2.5 nmi by 2.5 nmi.
The "Mesh" file contains the centroids of the Kriging grid cells. The default grid size is 2.5 nmi by 2.5 nmi.

**Current sample file (US data)** relative to base directory: `Kriging_grid_files/krig_grid2_5nm_cut_centroids_2013.xlsx`, sheet `krigedgrid2_5nm_forChu`

Column name | Old name | Data type | Units | Empty value | Description
--- | --- | --- | --- | --- | ---
centroid_latitude | Latitude of centroid | float | decimal degrees | N/P | Cell centroid latitude
centroid_longitude | Longitude of centroid | float | decimal degrees | N/P | Cell centroid longitude
fraction_cell_in_polygon | Cell portion | float | 0-1 | N/P | Fraction of mesh cell that is within the interpolation polygon that delineates the mesh
`Echopop` column | `EchoPro` column | Data type | Units | Description
--- | --- | --- | --- | ---
centroid_latitude | Latitude of centroid | float | decimal degrees <br> [-90.0, 90.0] | Mesh cell centroid latitude <br> Rows with missing values are removed
centroid_longitude | Longitude of centroid | float | decimal degrees <br> [-180.0, 180.0] | Mesh cell centroid longitude <br> Rows with missing values are removed
fraction_cell_in_polygon | Cell portion | float | proportion <br> [0.0, 1.0] | Fraction of mesh cell that is within the interpolation polygon that delineates the mesh <br> Missing values are replaced with `0.0`

### Smoothed shelf-break contour

Smoothed isobath contour used to transform the mesh points. A set of point locations delineating the 200 meter bathymetric contour that represents the shelf break.

**Current sample file (US data)** relative to base directory: `Kriging_grid_files/transformation_isobath_coordinates.xlsx`, sheet `Smoothing_EasyKrig`

Column name | Old name | Data type | Units | Empty value | Description
--- | --- | --- | --- | --- | ---
latitude | Latitude | float | decimal degrees | N/P | Point latitude
longitude | Longitude | float | decimal degrees | N/P | Point longitude
`Echopop` column | `EchoPro` column | Data type | Units | Description
--- | --- | --- | --- | ---
latitude | Latitude | float | decimal degrees <br> [-90.0, 90.0] | Isobath latitude <br> Rows with missing values are removed
longitude | Longitude | float | decimal degrees <br> [-180.0, 180.0] | Isobath longitude <br> Rows with missing values are removed

### Kriging and variogram parameters

This file comprises two columns: 1) the parameter names and 2) their associated values.

`Echopop` parameter | `EchoPro` parameter | Data type | Valid range | Description
--- | --- | --- | --- | ---
correlation_range | vario.lscl | float | [0.0, ∞) | The relative length scale, or range at which the correlation between points becomes approximately constant
sill | vario.sill | float | [0.0, ∞) | The total variance at which the change in autocorrelation reaches (or nears) 0.0
nugget | vario.nugt | float | [0.0, ∞) | The $y$-intercept of the variogram representing the short-scale (i.e. smaller than the lag resolution) variance
decay_power | vario.pwr | float | [0.0, ∞) | The exponent used for variogram models with exponentiated spatial decay terms
hole_effect_range | vario.hole | float | [0.0, ∞) | Length scale or range of the hole effect
lag_resolution | vario.res | float | (0.0, ∞) | The (scaled) distance between lags
anisotropy | vario.ratio | float | (0.0, ∞) | The directional aspect ratio of anisotropy
search_radius | krig.srad | float | (0.0, ∞) | The adaptive search radius used for kriging
kmin | kmin | integer | (0, ∞) | The minimum number of nearest kriging points
kmax | kmax | integer | (kmin, ∞) | The maximum number of nearest kriging points
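The valid ranges in the table above translate directly into a range check. The sketch below is one possible way to express them, assuming the parameters have already been read into a dictionary keyed by the `Echopop` parameter names; `BOUNDS` and `validate_parameters` are hypothetical names, and this is not Echopop's own validation code.

```python
import math

# Lower bound and whether it is inclusive, per the "Valid range" column.
BOUNDS = {
    "correlation_range": (0.0, True),
    "sill": (0.0, True),
    "nugget": (0.0, True),
    "decay_power": (0.0, True),
    "hole_effect_range": (0.0, True),
    "lag_resolution": (0.0, False),
    "anisotropy": (0.0, False),
    "search_radius": (0.0, False),
    "kmin": (0, False),
}


def validate_parameters(params):
    """Return a list of error messages; an empty list means every check passed."""
    errors = []
    for name, (lower, inclusive) in BOUNDS.items():
        value = params.get(name)
        if value is None:
            errors.append(f"{name}: missing")
        elif math.isinf(value) or not (value >= lower if inclusive else value > lower):
            # Infinite and NaN values fall outside every half-open range.
            bracket = "[" if inclusive else "("
            errors.append(f"{name}: {value} is outside {bracket}{lower}, inf)")
    # kmax has a lower bound that depends on kmin: the range is (kmin, inf).
    kmin, kmax = params.get("kmin"), params.get("kmax")
    if kmax is None:
        errors.append("kmax: missing")
    elif kmin is not None and not kmax > kmin:
        errors.append(f"kmax: {kmax} must be greater than kmin ({kmin})")
    return errors
```

Collecting every message in a list, rather than raising on the first failure, lets a user correct all out-of-range parameters in a single pass.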
