Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dependency version cleanups etc. for release of version 2.0.1 #560

Merged
merged 4 commits into from
Sep 19, 2024

Conversation

monocongo
Copy link
Owner

@monocongo monocongo commented Sep 19, 2024

Edits for version updates, case-sensitivity of numpy NaN -> nan where used, and new synospsis docstring for main processing script

Summary by Sourcery

Update the codebase for version 2.0.1 by enhancing type annotations and docstrings, refactoring the main processing logic, and updating test cases to use np.nan instead of np.NaN. Add a comprehensive synopsis docstring to the main processing script.

Enhancements:

  • Add type annotations and docstrings to functions in the main processing script to improve code clarity and maintainability.
  • Refactor the main processing logic into a new function process_climate_indices to separate concerns and improve readability.

Documentation:

  • Add a detailed synopsis docstring to the main processing script, explaining the steps involved in processing climate indices.

Tests:

  • Update test cases to use np.nan instead of np.NaN for consistency with numpy's preferred usage.

Copy link

sourcery-ai bot commented Sep 19, 2024

Reviewer's Guide by Sourcery

This pull request focuses on dependency version updates, case-sensitivity corrections for numpy NaN to nan, and the addition of a new synopsis docstring for the main processing script. The changes are primarily maintenance and documentation improvements, with minor code adjustments for consistency and clarity.

File-Level Changes

Change Details Files
Updated numpy NaN to nan for case-sensitivity
  • Changed np.NaN to np.nan throughout the codebase
  • Updated related test cases to use np.nan instead of np.NaN
tests/test_compute.py
src/climate_indices/compute.py
tests/test_indices.py
tests/test_eto.py
tests/test_utils.py
Added type hints and improved function signatures
  • Added return type hint for _drop_data_into_shared_arrays_grid function
  • Added type hints for _drop_data_into_shared_arrays_divisions function parameters
  • Added docstring for _drop_data_into_shared_arrays_divisions function
src/climate_indices/__main__.py
Refactored main function and added new process_climate_indices function
  • Extracted core logic from main function into process_climate_indices function
  • Updated main function to call process_climate_indices
  • Added type hints and docstring for process_climate_indices function
src/climate_indices/__main__.py
Added comprehensive synopsis docstring for main processing script
  • Included detailed explanation of the program's workflow
  • Described the quasi 'map-reduce' model used for parallel computation
  • Outlined key steps: initialization, multiprocessing setup, data preparation, parallel computation, result aggregation, and output writing
src/climate_indices/__main__.py

Tips
  • Trigger a new Sourcery review by commenting @sourcery-ai review on the pull request.
  • Continue your discussion with Sourcery by replying directly to review comments.
  • You can change your review settings at any time by accessing your dashboard:
    • Enable or disable the Sourcery-generated pull request summary or reviewer's guide;
    • Change the review language;
  • You can always contact us if you have any questions or feedback.

Copy link

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @monocongo - I've reviewed your changes and they look great!

Here's what I looked at during the review
  • 🟡 General issues: 1 issue found
  • 🟢 Security: all looks good
  • 🟡 Testing: 3 issues found
  • 🟡 Complexity: 1 issue found
  • 🟢 Documentation: all looks good

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment to tell me if it was helpful.

Comment on lines +1611 to +1618
# Extract arguments
# index = args['index']
# periodicity = args['periodicity']
# scales = args['scales']
# calibration_start_year = args['calibration_start_year']
# calibration_end_year = args['calibration_end_year']
# netcdf_precip = args['netcdf_precip']
# var_name_precip = args['var_name_precip']
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: Consider removing or updating commented-out code

Commented-out code can become outdated and confusing. If this is meant as a guide for future implementation, consider adding a TODO comment explaining its purpose. Otherwise, it might be better to remove it.

Suggested change
# Extract arguments
# index = args['index']
# periodicity = args['periodicity']
# scales = args['scales']
# calibration_start_year = args['calibration_start_year']
# calibration_end_year = args['calibration_end_year']
# netcdf_precip = args['netcdf_precip']
# var_name_precip = args['var_name_precip']
# TODO: Implement argument extraction
# Expected arguments: index, periodicity, scales, calibration_start_year,
# calibration_end_year, netcdf_precip, var_name_precip

@@ -43,7 +43,7 @@ def test_transform_fitted_gamma(
"""

# confirm that an input array of all NaNs results in the same array returned
all_nans = np.full(precips_mm_monthly.shape, np.NaN)
all_nans = np.full(precips_mm_monthly.shape, np.nan)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (testing): Consistent use of np.nan instead of np.NaN

This change improves consistency with NumPy's preferred lowercase 'nan'. Consider updating all occurrences throughout the test suite for uniformity.

Suggested change
all_nans = np.full(precips_mm_monthly.shape, np.nan)
"""
# confirm that an input array of all NaNs results in the same array returned
all_nans = np.full(precips_mm_monthly.shape, np.nan)

@@ -45,7 +45,7 @@ def test_pet(
np.testing.assert_raises(ValueError, indices.pet, temps_celsius, None, data_year_start_monthly)

# confirm that a missing/None latitude value raises an error
np.testing.assert_raises(ValueError, indices.pet, temps_celsius, np.NaN, data_year_start_monthly)
np.testing.assert_raises(ValueError, indices.pet, temps_celsius, np.nan, data_year_start_monthly)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (testing): Updated np.NaN to np.nan in test case

This change aligns with the NumPy convention. Ensure all similar occurrences are updated for consistency across the test suite.

):
    # confirm that an input temperature array of only NaNs
    # results in the same all NaNs array being returned
    all_nan_temps = np.full(temps_celsius.shape, np.nan)
    computed_pet = indices.pet(all_nan_temps, latitude_degrees, data_year_start_monthly)
    np.testing.assert_equal(
        computed_pet,
    np.testing.assert_raises(ValueError, indices.pet, temps_celsius, None, data_year_start_monthly)

    # confirm that a missing/None latitude value raises an error
    np.testing.assert_raises(ValueError, indices.pet, temps_celsius, np.nan, data_year_start_monthly)



# ------------------------------------------------------------------------------
def test_sunset_hour_angle():
# make sure that an invalid latitude value raises an error
pytest.raises(ValueError, eto._sunset_hour_angle, np.deg2rad(-100.0), np.deg2rad(0.0))
pytest.raises(ValueError, eto._sunset_hour_angle, np.NaN, np.deg2rad(0.0))
pytest.raises(ValueError, eto._sunset_hour_angle, np.nan, np.deg2rad(0.0))
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (testing): Consistent use of np.nan in eto tests

This change improves consistency. Consider adding a test case to ensure the function handles np.nan inputs correctly, if not already present.

    pytest.raises(TypeError, eto.eto_thornthwaite, temps_celsius, None, data_year_start_monthly)

    # make sure that an invalid latitude value (NaN) raises an error
    pytest.raises(ValueError, eto.eto_thornthwaite, temps_celsius, np.nan, data_year_start_monthly)


def test_sunset_hour_angle():
    # make sure that an invalid latitude value raises an error
    pytest.raises(ValueError, eto._sunset_hour_angle, np.deg2rad(-100.0), np.deg2rad(0.0))
    pytest.raises(ValueError, eto._sunset_hour_angle, np.nan, np.deg2rad(0.0))
    pytest.raises(ValueError, eto._sunset_hour_angle, np.deg2rad(0.0), np.nan)

raise


def process_climate_indices(
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (complexity): Consider refactoring the main function to improve code structure and reduce complexity.

The introduction of process_climate_indices() increases complexity without clear benefits. To improve code structure and readability, consider fully refactoring the processing logic into this function:

  1. Move all climate indices processing into process_climate_indices().
  2. Remove global state dependencies.
  3. Return the processing results instead of relying on side effects.

Here's a simplified example of how to restructure the code:

def main():
    start_datetime = datetime.now()
    _logger.info("Start time:    %s", start_datetime)

    try:
        arguments = parse_arguments()
        results = process_climate_indices(arguments)
        write_results(results, arguments)
    except Exception:
        _logger.exception("Failed to complete", exc_info=True)
        raise
    finally:
        end_datetime = datetime.now()
        _logger.info("End time:      %s", end_datetime)
        _logger.info("Elapsed time:  %s", end_datetime - start_datetime)

def process_climate_indices(args: argparse.Namespace) -> Dict[str, Any]:
    input_type = _validate_args(args)
    _setup_multiprocessing(args)

    data = _load_data(args)
    indices = _compute_indices(data, args)

    return indices

def _setup_multiprocessing(args: argparse.Namespace) -> None:
    global _NUMBER_OF_WORKER_PROCESSES
    # ... (existing multiprocessing setup logic)

def _load_data(args: argparse.Namespace) -> Dict[str, np.ndarray]:
    # ... (existing data loading logic)

def _compute_indices(data: Dict[str, np.ndarray], args: argparse.Namespace) -> Dict[str, np.ndarray]:
    # ... (existing index computation logic)

def write_results(results: Dict[str, np.ndarray], args: argparse.Namespace) -> None:
    # ... (existing result writing logic)

This structure separates concerns more clearly, reduces global state usage, and makes the flow of data and control more explicit. It's easier to understand, test, and maintain.

Copy link

@monocongo monocongo merged commit 51174e3 into master Sep 19, 2024
10 checks passed
@monocongo monocongo deleted the issue_535_dependency_updates branch September 19, 2024 18:15
@mvdebolskiy
Copy link

What's the reason for changing scipy>=1.10 to scipy==1.10.1?

@monocongo
Copy link
Owner Author

I have no recollection. What's your reason for asking? If there's a good reason it needs to be changed to another version number then please enlighten me. Thanks in advance!

@mvdebolskiy
Copy link

@monocongo I am IT-support for a course, we have a setup with a quite complicated conda environment where scipy>=1.13 is pinned, a student asked me to add this package, so I installed it with -no-deps and tested with the example repo (there is another problem with that (see #567). So I was just wondering why in this PR it was fixed to 1.10.1 instead of >=1.10.1.

@monocongo
Copy link
Owner Author

That makes perfect sense. There are pros and cons to pinning specific version numbers, so maybe at the time I was opting to err on the side of caution? There have been instances in the past where the code around the L-moments and Pearson III distribution went wonky due to breaking changes in scipy, so that might have been what led me to that version pin. I'll do some testing with other versions, and feel free to do the same on your end if you have a list of versions that you think are better, as I'm happy to consider other options.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants