Merging current refactored EchoPro code for software renaming #199

brandynlucca · 2024-03-06T18:19:26Z

No description provided.

Updated `core.py` to contain the expected nested dictionary data structures of the imported data attributes. The initialization of `Survey` includes importing the configuration files (`initialization_config.yml` and `survey_year_2019_config.yml`) and all associated data. This includes a utility function `populate_tree` that maps out the current data attribute dictionary paths (e.g. where all of the biological data is stored within the `Survey` class object). This is called by the user via `Survey.summary`. Minor adjustments were made to the configuration files. Some support scripts have been added within `data_structure_utils.py` that aid in pushing/pulling files from nested dictionaries.
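As a rough illustration of the idea behind mapping data attribute paths and pushing/pulling values in nested dictionaries, here is a minimal sketch. The helper names (`iter_paths`, `push_nested`, `pull_nested`) and the example layout are hypothetical and are not the functions in `data_structure_utils.py`.

```python
from typing import Any, Iterator


def iter_paths(tree: dict, prefix: tuple = ()) -> Iterator[tuple]:
    """Yield the key path to every leaf (non-dict or empty-dict value)."""
    for key, value in tree.items():
        path = prefix + (key,)
        if isinstance(value, dict) and value:
            # Descend into non-empty sub-dictionaries
            yield from iter_paths(value, path)
        else:
            yield path


def push_nested(tree: dict, path: tuple, value: Any) -> None:
    """Store `value` at `path`, creating intermediate dicts as needed."""
    node = tree
    for key in path[:-1]:
        node = node.setdefault(key, {})
    node[path[-1]] = value


def pull_nested(tree: dict, path: tuple) -> Any:
    """Return the value stored at `path`."""
    node = tree
    for key in path:
        node = node[key]
    return node


# Hypothetical layout; the real attribute names may differ
survey = {}
push_nested(survey, ("biology", "length_df"), "length data")
push_nested(survey, ("biology", "specimen_df"), "specimen data")
push_nested(survey, ("acoustics", "nasc", "nasc_df"), "nasc data")

paths = ["/".join(p) for p in iter_paths(survey)]
print(paths)
```

Listing the paths this way is only useful for inspection or debugging; nothing else needs to depend on it.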

- Reduced the number of recursive functions
  - Maintained the `populate_tree()` function solely for debugging purposes. Its functionality is completely independent of the rest of the module in its current state.
    - This concerns: `PARASITE_TREE`, `pushed_nested_dict`, `add_trees`, `populate_tree`
- Factored out the column validation
  - Now located at `EchoPro.utils.data_file_validation`
    - This concerns: `validate_data_columns`
- Factored out data type validation and import
    - This concerns: `read_validated_data`
    - This can probably be further consolidated with a little more hard-coding given that it started from a recursive-heavy state.

* `core.py`
  - Amended `LAYER_NAME_MAP` API to include a template with expected data attribute dictionary paths
* `survey.py`
  - Reframed the `load_survey_day` loop to iterate by expected data attribute rather than encountered file name
  - Updated arguments for `read_validated_data()`
  - Adjustments made for filepath handling with `load_configuration()`
* `data_file_validation`
  - Reworked argument inputs into `validate_data_columns()` to match the updated arguments for `read_validated_data()`

…oader_changes Update core.py and begin refactoring data loader

* Built up the `strata_mean_sigma_bs` and `impute_missing_sigma_bs` functions
  - `self.strata_mean_sigma_bs`
    - This adds the `sigma_bs` dictionary to `self.acoustics`
    - Within this dictionary, three sets of values are stored:
      1. `length_binned`: `sigma_bs` values from `specimen_df` and `length_df`
      2. `haul_mean`: mean `sigma_bs` across each region-specific haul ID
      3. `strata_mean`: mean `sigma_bs` across each stratum layer
  - `self.impute_missing_sigma_bs`
    - This imputes the mean `sigma_bs` from the closest strata values in cases where strata are missing
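The imputation step can be sketched roughly as follows. This is not the method in the PR: the function name `impute_from_nearest_strata`, the column names, and the rule used here (average the nearest present stratum below and above, or use the single available neighbor) are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def impute_from_nearest_strata(strata_mean: pd.DataFrame, all_strata) -> pd.DataFrame:
    """Fill missing strata with the mean of the nearest present strata (assumed rule)."""
    present = strata_mean.set_index("stratum_num")["sigma_bs_mean"]
    # Reindexing adds the missing strata with NaN placeholders
    full = present.reindex(sorted(all_strata))
    known = present.index.to_numpy()
    for stratum in full.index[full.isna().to_numpy()]:
        below = known[known < stratum]
        above = known[known > stratum]
        neighbors = []
        if below.size:
            neighbors.append(present.loc[below.max()])
        if above.size:
            neighbors.append(present.loc[above.min()])
        full.loc[stratum] = np.mean(neighbors)
    return full.rename_axis("stratum_num").reset_index()


# Hypothetical stratum means; strata 3 and 5 have no data
strata_mean = pd.DataFrame(
    {"stratum_num": [1, 2, 4], "sigma_bs_mean": [2.0e-5, 4.0e-5, 8.0e-5]}
)
result = impute_from_nearest_strata(strata_mean, range(1, 6))
print(result)
```

Here stratum 3 takes the average of strata 2 and 4, while stratum 5 only has a neighbor below it and so copies stratum 4.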

The length and age bin parameters were originally moved within the config attribute via `biometric_distributions`. This has now been moved to `load_configuration`.

Amended typos, grammar, etc., in the function doc strings and in-line comments

Amended functions like `discretize_variable` to be simpler and directly describe the actual outputs (or intended outputs) since the functions themselves serve very specific tasks. For instance, `quantize_variable` in fact describes what the function broadly does, but the actual functionality is very narrow in scope. This also aligns with the argument names.

There was an issue with how `sigma_bs_impute` was being constructed when concatenating the original `strata_mean` dataframe with a newly generated dataframe containing the missing strata with `np.nan` as place holders for the missing `mean_sigma_bs` values.

Reformatted the "noun_modifier" formats for variable/column naming in generated dataframes for consistency with the rest of the module.

…a-WIP-refactor-compute_transect_results Brandynlucca-WIP-refactor-compute_transect_results

There was an issue where I had tested the code using an already defined global version of a certain variable and function that allowed the code to run. However, when running in a clean instance the code does not work (as expected). This commit has amended that issue, specifically for `strata_age_binned_weight_proportions`.

…a-WIP-age_weighting Refactor apportioning of weights/counts to age, sex, and intersecting age-sex bins

Incorporated the EPSG datum into initialization_config that is used for defining the projection and other spatial features for georeferenced NASC measurements.

* add test_data folder, pytest skip all existing tests
* add skeleton test_data_loader
* rename test to test_load_configuration
* add test_data/temp to .gitignore
* fix potential problem with test_data/temp not existing
* use pytest.tmp_path for temp re-written config_init
* note: test_data/input_files does not exist yet

* Create pr.yaml for running tests on PR
* update requirements to see how pip does
* remove nb_conda_kernels from requirements
* add scipy

* move all test_*.py out from subfolders
* rename old test modules with _OLD

A new function (`stretch`) has been added to `operations.py` to reduce the amount of cluttered and repetitive code contained within the `nasc_to_biomass_conversion` function. I expect this function to be re-used elsewhere, as well. The `stretch` function leverages the built-in `pandas.wide_to_long` 'gather/melt' method that ultimately re-indexes the data by consolidating the separate data columns (e.g `rho_a` for `male`, `female`, `unsexed`, and `total`) into a single index (e.g. `sex`) and data (e.g. `rho_a`) column. This can help provide a more intuitive way of filtering out specific groups/contrasts in downstream functions and methods.
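To make the reshaping concrete, here is a minimal sketch of `pandas.wide_to_long` applied to sex-specific `rho_a` columns. The column names, the `transect_num` index, and the values are made up for illustration; the actual `stretch` signature is not shown in this PR description.

```python
import pandas as pd

# Hypothetical wide-format data: one rho_a column per sex group
wide = pd.DataFrame(
    {
        "transect_num": [1, 2],
        "rho_a_male": [1.0, 2.0],
        "rho_a_female": [3.0, 4.0],
        "rho_a_unsexed": [0.5, 0.0],
        "rho_a_total": [4.5, 6.0],
    }
)

# Gather the four rho_a_* columns into a `sex` index and a single `rho_a` column
long_df = pd.wide_to_long(
    wide,
    stubnames="rho_a",
    i="transect_num",
    j="sex",
    sep="_",
    suffix=r"\w+",  # suffixes are words (male, female, ...), not the default digits
).reset_index()

# Filtering a specific group becomes a simple row selection
female_only = long_df[long_df["sex"] == "female"]
print(female_only)
```

Once the data are in long form, contrasts such as male versus female can be selected by value rather than by column name.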

The previous commit/push missed the doc string defined for the `stretch` function.

An additional utility function `group_merge` has been added to reduce the amount of repetition in cases where multiple dataframes are being merged in the same step/pipeline/chain. This doesn't change the previous output/result of the code, but it is expected to be used for later calculations/steps that will enable more consistent formatting and ensuring that the grouped merges are being performed in the same way every time. This is particularly important so the 'how' and 'on' arguments are appropriately applied and are less vulnerable to errant typos.
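A chained merge of this kind can be sketched with `functools.reduce`. The function below is a hypothetical stand-in, not the `group_merge` in `operations.py`; its name, signature, and the example dataframes are assumptions.

```python
from functools import reduce

import pandas as pd


def group_merge_sketch(dataframes, on, how="outer"):
    """Merge a sequence of dataframes left to right with identical `on`/`how`."""
    return reduce(
        lambda left, right: left.merge(right, on=on, how=how),
        dataframes,
    )


# Hypothetical per-stratum tables
biomass = pd.DataFrame({"stratum_num": [1, 2], "biomass": [10.0, 20.0]})
abundance = pd.DataFrame({"stratum_num": [1, 2], "abundance": [100, 200]})
area = pd.DataFrame({"stratum_num": [2, 3], "area": [5.0, 7.0]})

merged = group_merge_sketch([biomass, abundance, area], on="stratum_num")
print(merged)
```

Because `on` and `how` are passed once, every pairwise merge in the chain uses the same settings, which is the consistency benefit described above.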

The `load_configuration` function was previously included as a static method within `Survey`; however, this isn't necessary since `load_configuration` never uses `self` as an argument. Consequently, it has been moved to `EchoPro.utils.data_file_validation`.

Various changes were made to enable the INPFC strata from the `INPFC` sheet to be validated (alongside `stratification1`), read, and incorporated into the `Survey` object. This replaces the previous hard-coded `pandas.DataFrame` that was generated in the `stratified_summary` method. In `survey_year_2019_config.yml`, this is represented by `sheetname: [ INPFC , stratification1 ]` associated with the `geo_strata` configuration setting. So now the data validating and reading functions can handle multiple `.xlsx` sheetnames from the same file.

Amended the test in response to Issue OSOceanAcoustics#177, which changes the location of `load_configuration` within `EchoPro`. When run locally, the test passes. This commit also pushes changes to the included test-related files that worked from this branch.

…_biomass_plus_jolly_hampton

The line `from functools import reduce` was missing from `operations.py` to enable the `reduce(...)` function used within `group_merge(...)`.

Co-authored-by: Wu-Jung Lee

Renamed `calculate_bounds` to `calculate_start_end_coordinates` to reflect that the function is not drawing a true geospatial boundary box/rectangle around the transect coordinates.
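The distinction behind the rename can be illustrated with a small sketch. Both helpers and the coordinates below are hypothetical and do not reproduce `calculate_start_end_coordinates`; they only contrast start/end points with a true bounding box.

```python
def start_end(points):
    """First and last (lon, lat) fix of a transect, in recording order."""
    return points[0], points[-1]


def bounding_box(points):
    """Lower-left and upper-right corners enclosing every fix."""
    lons = [lon for lon, _ in points]
    lats = [lat for _, lat in points]
    return (min(lons), min(lats)), (max(lons), max(lats))


# Hypothetical transect that bends west before ending
transect = [(-124.5, 40.0), (-125.2, 40.1), (-124.9, 40.05)]

endpoints = start_end(transect)
box = bounding_box(transect)
print(endpoints)
print(box)
```

The two results differ whenever a transect is not a straight segment, which is why a name implying a bounding box would be misleading.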

Renamed the dataframe column with strata numbers within `self.biology[ 'weight' ][ 'weight_strata_df' ]` from `stratum` to `stratum_num`.

Code within the `nasc_to_biomass_conversion(...)` function was refactored to create `index_sex_weight_proportions(...)` and `index_transect_age_sex_proportions(...)`. These functions will yield the following variables: `sex_indexed_weight_proportions` and `nasc_fraction_total_df`.

Added preliminary doc strings to `index_sex_weight_proportions` and `index_transect_age_sex_proportions`. Small edits were also made to the corresponding `nasc_to_biomass_conversion(...)` code and imported modules in `biology.py`.

Missing modules located in `EchoPro.computation.biology` were appropriately added into `survey.py`.

Amended the doc string associated with `calculate_start_end_coordinates`

…ps://github.com/uw-echospace/EchoPro into brandynlucca-nasc_to_biomass_plus_jolly_hampton

…a-nasc_to_biomass_plus_jolly_hampton Add population-level calculations and stratified statistics

Added the semivariogram functions defined in the original Matlab code (cases 1 through 13). These have not been fully fleshed out and have not yet been fully documented. WIP.
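For readers unfamiliar with semivariogram models, here is a minimal sketch of one common textbook form, the exponential model. It is not one of the 13 Matlab cases referenced above, and the parameter names (`nugget`, `sill`, `length_scale`) are assumptions chosen for illustration.

```python
import math


def exponential_semivariogram(
    lag: float, nugget: float, sill: float, length_scale: float
) -> float:
    """Semivariance at distance `lag` for a generic exponential model."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    partial_sill = sill - nugget
    # Rises from the nugget toward the sill as the lag increases
    return nugget + partial_sill * (1.0 - math.exp(-lag / length_scale))


lags = [0.0, 0.5, 1.0, 2.0, 10.0]
gamma = [
    exponential_semivariogram(h, nugget=0.1, sill=1.0, length_scale=1.0)
    for h in lags
]
print(gamma)
```

The semivariance starts at the nugget for a zero lag and approaches the sill at large lags without exceeding it.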

- Added folder for doc images
- Added placeholder markdown files for documentation discussing theoretical information and mathematical equations
- Added example image to `core_data_structure` which was renamed
- `glossary.md` was added to contain a list of symbols and variable names that can be found throughout the workflow of the software both programmatically and in the mathematical equations

leewujung · 2024-03-06T18:24:44Z

Thanks @brandynlucca -- awesome work. Super excited to get this merged!

brandynlucca and others added 30 commits January 5, 2024 13:14

Merge pull request OSOceanAcoustics#161 from uw-echospace/core_data_l…

699c6c8

…oader_changes Update core.py and begin refactoring data loader

Design workflow for computing transect results

81dd42b

Modifications for data preparation/processing

828adb2

Moved length/age bin config formatting

ac9e9d4

The length and age bin parameters were originally moved within the config attribute via `biometric_distributions`. This has now been moved to `load_configuration`.

Edits to documentation/comments

fd3cd0b

Amended typos, grammar, etc., in the function doc strings and in-line comments

Amend variable/column names

40731f9

Reformatted the "noun_modifier" formats for variable/column naming in generated dataframes for consistency with the rest of the module.

Merge pull request OSOceanAcoustics#164 from uw-echospace/brandynlucc…

faff01f

…a-WIP-refactor-compute_transect_results Brandynlucca-WIP-refactor-compute_transect_results

Added sex- and age-stratified proportions

ac43d9c

Merge pull request OSOceanAcoustics#168 from uw-echospace/brandynlucc…

556489e

…a-WIP-age_weighting Refactor apportioning of weights/counts to age, sex, and intersecting age-sex bins

Added geospatial transformation to config

a207806

Incorporated the EPSG datum into initialization_config that is used for defining the projection and other spatial features for georeferenced NASC measurements.

Merge branch 'main-upstream' into brandynlucca-WIP-refactoring

4c38591

Add tests on PR (OSOceanAcoustics#174)

9883e92

* Create pr.yaml for running tests on PR
* update requirements to see how pip does
* remove nb_conda_kernels from requirements
* add scipy

Reorganize old tests (OSOceanAcoustics#175)

f79b6d5

* move all test_*.py out from subfolders
* rename old test modules with _OLD

Population metrics and stratified stats

6ad39ce

Added docstring to stretch function

1a9e729

The previous commit/push missed the doc string defined for the `stretch` function.

Amend test_data_loader

67348b5

Amended the test in response to Issue OSOceanAcoustics#177, which changes the location of `load_configuration` within `EchoPro`. When run locally, the test passes. This commit also pushes changes to the included test-related files that worked from this branch.

Merge branch 'brandynlucca-WIP-refactoring' into brandynlucca-nasc_to…

bd3cccc

…_biomass_plus_jolly_hampton

Missing import for group_merge

14b94f1

The line `from functools import reduce` was missing from `operations.py` to enable the `reduce(...)` function used within `group_merge(...)`.

brandynlucca and others added 26 commits February 9, 2024 09:49

Update EchoPro/tests/test_data_loader.py

8e0080c

Co-authored-by: Wu-Jung Lee

Rename calculate_bounds to match operation

22bf7e1

Renamed `calculate_bounds` to `calculate_start_end_coordinates` to reflect that the function is not drawing a true geospatial boundary box/rectangle around the transect coordinates.

Renamed stratum colname w/ stratum_num

9296e87

Renamed the dataframe column with strata numbers within `self.biology[ 'weight' ][ 'weight_strata_df' ]` from `stratum` to `stratum_num`.

Small edits to new biology.py funcs

38c153e

Added preliminary doc strings to `index_sex_weight_proportions` and `index_transect_age_sex_proportions`. Small edits were also made to the corresponding `nasc_to_biomass_conversion(...)` code and imported modules in `biology.py`.

Missing function imports in survey.py

17da6f6

Missing modules located in `EchoPro.computation.biology` were appropriately added into `survey.py`.

Added missing import (numpy)

a0aedeb

Updated calculate_start_end_coordinates docs str

895e2ae

Amended the doc string associated with `calculate_start_end_coordinates`

Amendments to stratified_transect_statistics

dae83c0

Merge branch 'brandynlucca-nasc_to_biomass_plus_jolly_hampton' of htt…

1d11497

…ps://github.com/uw-echospace/EchoPro into brandynlucca-nasc_to_biomass_plus_jolly_hampton

Merge pull request OSOceanAcoustics#176 from uw-echospace/brandynlucc…

c442d17

…a-nasc_to_biomass_plus_jolly_hampton Add population-level calculations and stratified statistics

Added variogram functions

c604db9

Added the semivariogram functions defined in the original Matlab code (cases 1 through 13). These have not been fully fleshed out and have not yet been fully documented. WIP.

Addressed poor performance issues w/ georef funcs

d3b499c

Adjustments to georeferencing and spatial methods

0d5256b

Updated variogram models and function location

270db3b

Added sliding search window functions/gridding

e6f2168

Amendments to kriging workflow

2157de8

Update to kriging functionality

4e5c9a1

Add .readthedocs.yml

88b09a6

Tweaks to proportion calculations and organization

d4474b6

Update age weight proportion weights for NASC

ad74b77

General changes to variable names for kriging

6f1a6b9

Additional changes to survey for kriging

08157a2

Added length_weight_age calculation for proportion

9315793

Merge branch 'main-upstream' into test-branch

e649958

leewujung merged commit f18321c into OSOceanAcoustics:main Mar 6, 2024
2 checks passed
