Merging current refactored EchoPro code for software renaming #199
Merged
Updated `core.py` to contain the expected nested dictionary data structures of the imported data attributes. The initialization of `Survey` includes importing the configuration files (`initialization_config.yml` and `survey_year_2019_config.yml`) and all associated data. This includes a utility function `populate_tree` that maps out the current data attribute dictionary paths (e.g. where all of the biological data is stored within the `Survey` class object). This is called by the user via `Survey.summary`. Minor adjustments were made to the configuration files. Some support scripts have been added within `data_structure_utils.py` that aid in pushing/pulling files from nested dictionaries.
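As a rough illustration of the push/pull idea for nested dictionaries, the sketch below stores and retrieves values by key path and lists the leaf paths, similar in spirit to a summary of where data lives. The helper names (`push_nested`, `pull_nested`, `list_paths`) and the example keys are hypothetical and are not taken from `data_structure_utils.py`.

```python
from functools import reduce


def push_nested(tree, path, value):
    """Store `value` at the nested key `path`, creating intermediate dicts."""
    node = tree
    for key in path[:-1]:
        node = node.setdefault(key, {})
    node[path[-1]] = value
    return tree


def pull_nested(tree, path):
    """Fetch the value stored at the nested key `path`."""
    return reduce(lambda node, key: node[key], path, tree)


def list_paths(tree, prefix=()):
    """Return every leaf path in the tree as a tuple of keys."""
    paths = []
    for key, value in tree.items():
        if isinstance(value, dict) and value:
            paths.extend(list_paths(value, prefix + (key,)))
        else:
            paths.append(prefix + (key,))
    return paths


# Hypothetical layout; the real attribute paths may differ.
survey = {}
push_nested(survey, ("biology", "length_df"), "length data placeholder")
push_nested(survey, ("biology", "specimen_df"), "specimen data placeholder")
push_nested(survey, ("acoustics", "nasc", "nasc_df"), "NASC data placeholder")

for path in list_paths(survey):
    print(" -> ".join(path))
```

An iterative `push_nested` avoids recursion for the common case, while the path listing is the one place where recursion stays natural.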
- Reduced the number of recursive functions
-- Maintained the `populate_tree()` function solely for debugging purposes. Its functionality is completely independent of the rest of the module in its current state.
--- This concerns: `PARASITE_TREE`, `pushed_nested_dict`, `add_trees`, `populate_tree`
- Factored out the column validation
-- Now located at `EchoPro.utils.data_file_validation`
--- This concerns: `validate_data_columns`
- Factored out data type validation and import
--- This concerns: `read_validated_data`
--- This can probably be further consolidated with a little more hard-coding given that it started from a recursive-heavy state.
* `core.py`
--- Amended `LAYER_NAME_MAP` API to include a template with expected data attribute dictionary paths
* `survey.py`
--- Reframed the `load_survey_day` loop to iterate by expected data attribute rather than encountered file name
--- Updated arguments for `read_validated_data()`
--- Adjustments made for filepath handling with `load_configuration()`
* `data_file_validation`
--- Reworked argument inputs into `validate_data_columns()` to match the updated arguments for `read_validated_data()`
…oader_changes Update core.py and begin refactoring data loader
* Built up the `strata_mean_sigma_bs` and `impute_missing_sigma_bs` functions
- `self.strata_mean_sigma_bs`
--- This adds the `sigma_bs` dictionary to `self.acoustics`
--- Within this dictionary, three sets of values are stored:
1) `length_binned`: `sigma_bs` values from `specimen_df` and `length_df`
2) `haul_mean`: mean `sigma_bs` across each region-specific haul ID
3) `strata_mean`: mean `sigma_bs` across each stratum layer
- `self.impute_missing_sigma_bs`
--- This imputes the mean `sigma_bs` from the closest strata values in cases where strata are missing
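The imputation step could look roughly like the sketch below. It assumes that "closest strata" means the nearest present stratum below and above the missing one, averaged when both exist; that interpretation, the function name `impute_missing_strata`, and the column name `sigma_bs_mean` are assumptions made for illustration rather than the module's actual implementation.

```python
import numpy as np
import pandas as pd


def impute_missing_strata(strata_mean, all_strata):
    """Fill sigma_bs for strata absent from `strata_mean` using nearest neighbors.

    Assumption: the nearest present stratum below and the nearest present
    stratum above are averaged; if only one side exists, it is used alone.
    """
    present = strata_mean.set_index("stratum_num")["sigma_bs_mean"].sort_index()
    full_index = pd.Index(sorted(all_strata), name="stratum_num")
    filled = present.reindex(full_index)  # missing strata become NaN
    for stratum in filled[filled.isna()].index:
        below = present[present.index < stratum]
        above = present[present.index > stratum]
        neighbors = []
        if not below.empty:
            neighbors.append(below.iloc[-1])
        if not above.empty:
            neighbors.append(above.iloc[0])
        filled.loc[stratum] = np.mean(neighbors)
    return filled.reset_index()


# Strata 3 and 6 are missing from the observed means (values are made up).
strata_mean = pd.DataFrame(
    {"stratum_num": [1, 2, 4, 5], "sigma_bs_mean": [2.0e-5, 4.0e-5, 8.0e-5, 6.0e-5]}
)
imputed = impute_missing_strata(strata_mean, all_strata=range(1, 7))
print(imputed)
```

Reindexing against the full list of strata keeps the observed values untouched and makes the gaps explicit as `NaN` before they are filled.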
The length and age bin parameters were originally moved within the config attribute via `biometric_distributions`. This has now been moved to `load_configuration`.
Amended typos, grammar, etc., in the function doc strings and in-line comments
Amended function names like `discretize_variable` to be simpler and to directly describe the actual outputs (or intended outputs), since the functions themselves serve very specific tasks. For instance, `quantize_variable` does describe what the function broadly does, but the actual functionality is very narrow in scope. This also aligns with the argument names.
There was an issue with how `sigma_bs_impute` was being constructed when concatenating the original `strata_mean` dataframe with a newly generated dataframe containing the missing strata with `np.nan` as placeholders for the missing `mean_sigma_bs` values.
Reformatted the "noun_modifier" formats for variable/column naming in generated dataframes for consistency with the rest of the module.
…a-WIP-refactor-compute_transect_results Brandynlucca-WIP-refactor-compute_transect_results
There was an issue where I had tested the code using an already defined global version of a certain variable and function that allowed the code to run. However, when running in a clean instance the code does not work (as expected). This commit has amended that issue, specifically for `strata_age_binned_weight_proportions`.
…a-WIP-age_weighting Refactor apportioning of weights/counts to age, sex, and intersecting age-sex bins
Incorporated the EPSG datum into `initialization_config`; the datum is used for defining the projection and other spatial features for georeferenced NASC measurements.
* add test_data folder, pytest skip all existing tests
* add skeleton test_data_loader
* rename test to test_load_configuration
* add test_data/temp to .gitignore
* fix potential problem with test_data/temp not existing
* use pytest.tmp_path for temp re-written config_init
* note: test_data/input_files does not exist yet
* Create pr.yaml for running tests on PR
* update requirements to see how pip does
* remove nb_conda_kernels from requirements
* add scipy
* move all test_*.py out from subfolders
* rename old test modules with _OLD
A new function (`stretch`) has been added to `operations.py` to reduce the amount of cluttered and repetitive code contained within the `nasc_to_biomass_conversion` function. I expect this function to be re-used elsewhere as well. The `stretch` function leverages the `pandas.wide_to_long` 'gather/melt' function, which re-indexes the data by consolidating the separate data columns (e.g. `rho_a` for `male`, `female`, `unsexed`, and `total`) into a single index column (e.g. `sex`) and a single data column (e.g. `rho_a`). This provides a more intuitive way of filtering out specific groups/contrasts in downstream functions and methods.
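For readers unfamiliar with `pandas.wide_to_long`, here is a minimal sketch of the kind of reshaping described. The column layout (`rho_a_male`, `rho_a_female`, and so on), the `transect_num` identifier, and the values are assumptions for illustration; the actual `stretch` function may take different arguments.

```python
import pandas as pd

# Hypothetical wide layout: one rho_a column per sex category.
wide = pd.DataFrame(
    {
        "transect_num": [1, 2],
        "rho_a_male": [10.0, 20.0],
        "rho_a_female": [12.0, 18.0],
        "rho_a_unsexed": [1.0, 2.0],
        "rho_a_total": [23.0, 40.0],
    }
)

# Collapse the four rho_a_* columns into a `sex` index column and one `rho_a` column.
long_df = pd.wide_to_long(
    wide,
    stubnames="rho_a",
    i="transect_num",
    j="sex",
    sep="_",
    suffix=r"[a-z]+",  # the default suffix only matches digits
).reset_index()

# Filtering a single group is now a plain comparison on the `sex` column.
female_only = long_df[long_df["sex"] == "female"]
print(female_only)
```

The non-default `suffix` matters here because the sex labels are alphabetic rather than numeric.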
The previous commit/push missed the doc string defined for the `stretch` function.
An additional utility function `group_merge` has been added to reduce the amount of repetition in cases where multiple dataframes are being merged in the same step/pipeline/chain. This doesn't change the previous output/result of the code, but it is expected to be used for later calculations/steps that will enable more consistent formatting and ensuring that the grouped merges are being performed in the same way every time. This is particularly important so the 'how' and 'on' arguments are appropriately applied and are less vulnerable to errant typos.
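A chained merge of this kind can be sketched with `functools.reduce` as shown below. The signature used here (a list of dataframes plus shared `on` and `how` arguments) is an assumption; the real `group_merge` in `operations.py` may differ.

```python
from functools import reduce

import pandas as pd


def group_merge(dataframes, on, how="outer"):
    """Merge a sequence of dataframes left to right with identical `on`/`how`."""
    return reduce(
        lambda left, right: left.merge(right, on=on, how=how),
        dataframes,
    )


# Made-up example tables sharing a `stratum_num` key.
weights = pd.DataFrame({"stratum_num": [1, 2], "weight": [5.0, 7.0]})
numbers = pd.DataFrame({"stratum_num": [1, 2], "number": [50, 70]})
areas = pd.DataFrame({"stratum_num": [2, 3], "area": [100.0, 150.0]})

merged = group_merge([weights, numbers, areas], on="stratum_num", how="outer")
print(merged)
```

Passing `on` and `how` once, rather than repeating them in every `merge` call, is what guards against the errant typos mentioned above.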
The `load_configuration` function was previously included as a static method within `Survey`; however, this isn't necessary since `load_configuration` never uses `self` as an argument. Consequently, it has been moved to `EchoPro.utils.data_file_validation`.
Various changes were made to enable the INPFC strata from the `INPFC` sheet to be validated (alongside `stratification1`), read, and incorporated into the `Survey` object. This replaces the previous hard-coded `pandas.DataFrame` that was generated in the `stratified_summary` method. In `survey_year_2019_config.yml`, this is represented by `sheetname: [ INPFC , stratification1 ]` associated with the `geo_strata` configuration setting. The data validation and reading functions can now handle multiple `.xlsx` sheetnames from the same file.
This follows Issue OSOceanAcoustics#177, which changes the location of `load_configuration` within `EchoPro`. When run locally, the test passes. This commit also pushes changes to the included test-related files that worked from this branch.
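One way to support both a single sheet name and a list of sheet names is to normalize the setting to a list before reading. The sketch below assumes a configuration entry with `filename` and `sheetname` keys; the helper names and the file names are hypothetical, and no file is actually opened.

```python
def normalize_sheetnames(sheetname):
    """Return sheet names as a list whether the config gives one name or several."""
    if isinstance(sheetname, str):
        return [sheetname]
    return list(sheetname)


def plan_reads(config):
    """List the (filename, sheet) pairs that a reader would need to load."""
    return [
        (config["filename"], sheet)
        for sheet in normalize_sheetnames(config["sheetname"])
    ]


# Hypothetical configuration entries; only the sheet names come from the text above.
geo_strata = {
    "filename": "hypothetical_strata.xlsx",
    "sheetname": ["INPFC", "stratification1"],
}
single_sheet = {"filename": "hypothetical_haul.xlsx", "sheetname": "Sheet1"}

print(plan_reads(geo_strata))
print(plan_reads(single_sheet))
```

Normalizing early means the validation and reading code only ever has to loop over a list.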
…_biomass_plus_jolly_hampton
The line `from functools import reduce` was missing from `operations.py`; it is needed for the `reduce(...)` call used within `group_merge(...)`.
Co-authored-by: Wu-Jung Lee <[email protected]>
Renamed `calculate_bounds` to `calculate_start_end_coordinates` to reflect that the function is not drawing a true geospatial boundary box/rectangle around the transect coordinates.
Renamed the dataframe column with strata numbers within `self.biology[ 'weight' ][ 'weight_strata_df' ]` from `stratum` to `stratum_num`.
Code within the `nasc_to_biomass_conversion(...)` function was refactored to create `index_sex_weight_proportions(...)` and `index_transect_age_sex_proportions(...)`. These functions yield the following variables: `sex_indexed_weight_proportions` and `nasc_fraction_total_df`.
Added preliminary doc strings to `index_sex_weight_proportions` and `index_transect_age_sex_proportions`. Small edits were also made to the corresponding `nasc_to_biomass_conversion(...)` code and imported modules in `biology.py`.
Missing modules located in `EchoPro.computation.biology` were appropriately added into `survey.py`.
Amended the doc string associated with `calculate_start_end_coordinates`
…ps://github.com/uw-echospace/EchoPro into brandynlucca-nasc_to_biomass_plus_jolly_hampton
…a-nasc_to_biomass_plus_jolly_hampton Add population-level calculations and stratified statistics
Added the semivariogram functions defined in the original Matlab code (cases 1 through 13). These have not been fully fleshed out and have not yet been fully documented. WIP.
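For orientation, a textbook exponential semivariogram model is sketched below. It is a generic example only: the source does not say which models the 13 Matlab cases cover, so the function name, parameterization, and values here are assumptions and may not correspond to any of them.

```python
import math


def exponential_semivariogram(lag, sill, correlation_range, nugget=0.0):
    """Textbook exponential model: rises from `nugget` at lag 0 toward `sill`.

    gamma(h) = nugget + (sill - nugget) * (1 - exp(-h / correlation_range))
    """
    partial_sill = sill - nugget
    return nugget + partial_sill * (1.0 - math.exp(-lag / correlation_range))


# Evaluate the model at a few lag distances (arbitrary units and values).
for lag in (0.0, 1.0, 5.0, 50.0):
    gamma = exponential_semivariogram(lag, sill=1.0, correlation_range=5.0, nugget=0.1)
    print(f"lag={lag:5.1f}  gamma={gamma:.4f}")
```

The model approaches the sill asymptotically, so the value at a large lag is close to, but never exactly, the sill.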
- Added folder for doc images
- Added placeholder markdown files for documentation discussing theoretical information and mathematical equations
- Added example image to `core_data_structure` which was renamed
- `glossary.md` was added to contain a list of symbols and variable names that can be found throughout the workflow of the software both programmatically and in the mathematical equations
Thanks @brandynlucca -- awesome work. Super excited to get this merged!
No description provided.