-
Notifications
You must be signed in to change notification settings - Fork 46
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Convert loads module to xarray #279
Convert loads module to xarray #279
Conversation
I'm currently working through a few issues with getting the tests to pass. There is a minor bug in the standard deviation calculation in The original code uses pandas input. In MHKiT-Python/mhkit/loads/general.py Lines 59 to 70 in 4eeb598
However, one issue with this set-up--If a signal within a Pandas DataFrame is sliced and only has one value (i.e. for i=22 with the test data), the result is a Pandas Series, not another Pandas DataFrame. The std of a pandas Series returns a float, whereas the std of a pandas DataFrame returns a DataFrame of length 1. This results in data that should have std=0 to error in line 68 and results in std=nan on Line 70 instead. The original data being tested against was created with this error and led to the test failing. I updated the test data to reflect the correct standard deviation (0) for those bins with only one data point.
Instead of calculating the standard deviation manually, I suggest we use |
@ssolson This PR is ready for review. |
Thanks Adam, I will not be able to complete my review of this till Wednesday this week due to travel. I will tackle as soon as I can. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@akeeste this looks good overall I have a couple of comments and questions
mhkit/loads/extreme.py
Outdated
freq = freq_hz * (2*np.pi) | ||
# change from Hz to rad/s | ||
wave_spectrum = wave_spectrum.iloc[:, 0].values / (2*np.pi) | ||
key_name = list(wave_spectrum.keys())[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is assuming the first key is the key we need? Could this be an issue or does the way we pass the xarray to this function make this always correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated this function to take in Pandas series, dataframe or xarray dataarray or dataset. however if a DataFrame or Dataset is input, they are simplified to a series/dataArray, and so can only contain one variable. This should eliminate the issue of calling out the correct variable name with xarray
mhkit/loads/extreme.py
Outdated
coord_names = list(mler.coords) | ||
freq = mler.coords[coord_names[0]].values * 2*np.pi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again is the 0 index coord_name an issue for confusion or unexpected results?
Thanks @ssolson I'll go through the review and make updates early next week. |
TODO from discussion with @ssolson today:
also
|
@ssolson I addressed your review comments and added the option for a user to specify the time dimension for >1D variables. But I don't know a robust way to verify that the user's specified dimension is for example a timeseries, so I think we should leave that for another update. The user is also limited to inputting a single variable for some functions, like |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@akeeste I think this looks great. I have two minor changes and then I think this is ready to go. First I think "dimension" should be time "time_dimension" to increase the clarity without the need to read the docstring (I also made a small suggestion that we mention the kwarg is not applied to pandas inputs bc while obvious to us I think it may not be to a new user.). Secondly was simply to make a small change to a type check.
mhkit/loads/extreme.py
Outdated
dimension: string (optional) | ||
Name of the xarray dimension corresponding to time. | ||
If not supplied, time is assumed to be the first dimension. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is always time then I think "time_dimension" is a more descriptive name and increases the clarity for the user to the kwargs intended use without reading the docstring.
I think the docstring should explicitly state that this is only used in xarray (it is certainty implied). So I am thinking along the lines of _~"time_dimesion has no impact on pandas type inputs"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. That makes sense and is a good idea. Thanks for clarifying; I misunderstand in our last discussion. I'll rename the "dimension" input here and in the other relevant functions
Thanks @ssolson. I'll make those changes then merge this PR |
@akeeste the CDIP Threads Server seems to be down right now. The test will not pass until it is back up. |
Threads is back up. |
# MHKiT v0.8.0 We're excited to announce the release of MHKiT v0.8.0, which brings a host of new features, enhancements, and bug fixes across various modules, ensuring compatibility with Python 3.10 and 3.11, and introducing full xarray support for more flexible data handling. Significant updates in the Wave and DOLfYN modules improve functionality and extend capabilities. ## Python 3.10 & 3.11 Support MHKiT now supports python 3.10 and 3.11. Support for 3.12 will follow in the next minor update. - #240 ## Wave Module ### Enhancements: **Automatic Threshold Calculation for Peaks-Over-Threshold**: We've introduced a new feature that automatically calculates the "best" threshold for identifying significant wave events. This method, originally developed by Neary, V. S., et al. in their 2020 study, has now been translated from Matlab to Python, enhancing our existing peaks-over-threshold functionality. **Wave Heights Analysis**: A new function, `wave_heights`, has been added to extract the heights of individual waves from a time series. This function uses zero up-crossing analysis to accurately measure wave heights, improving upon our previous methods which only provided the maximum value between up-crossings. **Enhanced Zero Crossing Analysis**: Building on the above, the zero crossing code previously embedded in `global_peaks` has been isolated into a helper function. This modular approach not only refines the codebase but also supports new functionalities such as calculating wave heights, zero crossing periods, and identifying crests. ### Bug Fixes: **Contour Sampling Error in Wave Contours**: A bug identified in `mhkit.wave.contours.samples_contour` has been resolved. The issue occurred when period samples defined using the maximum period resulted in values outside the interpolation range of the contour data. This was corrected by ensuring that all sampling points are within the interpolation range and adjusting the contour data selection process accordingly. - #268 - #252 - #278 ## Xarray Support MHKiT functions now fully support the use of xarray for passing and returning data. - #279 - #282 - #285 - #302 - #310 ## DOLfYN Thanks to the many user contributions and users of MHKiT the DOLFYN module include a significant number of enhancements and bug fixes. ### Enhancements: **Altimeter Support**: Enhanced the Nortek Signature Reader to add capability for reading ADCP dual profile configurations. **Data Handling Improvements**: Introduced logic to skip messy header data that can accumulate during measurements collected via Nortek software on PCs and Macs. **Instrument Noise Subtraction**: Added a function to subtract instrument noise from turbulence intensity estimation using RMS calculations, providing results that differ by approximately 1% from the existing standard deviation-based "I" property. **Improved File Handling**: Updates for RDI files to handle changing "number of cells" and variable "cell sizes," which are now bin-averaged into the largest cell size. ### Bug Fixes: **Power Spectra Calculation**: Fixed a bug where a given noise value was not being subtracted from the power spectra, and noise was inadvertently added as an input to dissipation rate calculations. **Improved Header Handling**: Allowed RDI reader to skip junk headers effectively. **Nortek Reader C Types Update**: Adjusted C types in the Nortek reader to handle below-zero water temperatures and to allow pitch and roll values to go negative. - #280 - #289 - #290 - #292 - #293 - #294 - #299 ## River & Tidal: D3D Added limits to `variable_interpolation` and added 3 array input capability to `create_points` - #271 ## Developer Experience ### Black formatting Black formatting is now enforced on all MHKiT files. This ensures consistent formatting across the MHKiT package. - #281 ### Linting & Type Hints MHKiT is in the process of enforcing pylint and adding type hints to all functions. Currently this has been achieved and is enforced in the Loads and Power modules. - #288 - #296 ### CI/CD This release introduces significant reduction in testing time for development. This is achieved by reducing the number of tests for pulls against the develop branch and only running hindcast test when changes are made to it. A bug in the hindcast CI was fixed which only ran on changes to the hindcast tests instead of the hindcast module. Additionally the wave and wind hindcast needed to be separated in 2 jobs due to the excessive time taken to run a wind cache. This created a number of follow on PRs around solidifying the logic of these job. A special case for Python 3.8, pip, and Mac OS was added to use homebrew to install NetCDF and HDF5 to get tests to pass. - #241 - #270 - #306 - #311 - #317 - #318 - #319 - #320 - #324 ### Clean Up MHKiT fixed an implementation error where functions used assert instead of built in errors for type and value checking. Additionally these PRs removed unused files, fixed typos, and created an argument which allows users to run CDIP API calls silently. - #276 - #272 - #273
Loads/general
Loads/extreme