
v0.8.0 bug fixes and doc string fixes #327

Merged
merged 21 commits into develop on Jun 18, 2024

Conversation

@akeeste (Contributor) commented May 14, 2024

This PR contains several bug fixes and doc string fixes for v0.8.0. I think it's critical that we make these fixes before our workshop next Thursday.

I tested all of our Python examples on a clean install of MHKiT with Python 3.11. Several examples have errors:

  • ADCP_Delft3D_TRTS_example.ipynb - Scipy interpolation error
  • cdip_example.ipynb - Error indexing by YYYY-MM. Passes in 3.9 but not 3.11
  • directional_waves.ipynb - Error reading cached data. Passes in 3.9 but not 3.11
  • metocean_example.ipynb - Pandas date indexing error. Passes in 3.9 but not 3.11
  • PacWave_resource_characterization_example.ipynb - Folium map error
  • qc_example.ipynb - mhkit.utils.index_to_datetime is removed/renamed
  • tidal_example.ipynb - Error slicing by time. Passes in 3.9 but not 3.11
  • wave_example.ipynb - Warning about pd.read_csv inferring a date format. Error defining the "mean" statistic
  • WPTO_hindcast_example.ipynb - Error getting the variable significant_wave_height_0

Incorporates several minor documentation fixes:

  • updates links in NOAA functions
  • fixes formatting in the mooring example
  • mitigates warnings due to doc string issues, spacing, etc.; remaining items will be addressed in future linting

@akeeste akeeste added the documentation label (Improvements or additions to documentation) May 14, 2024
@akeeste akeeste changed the base branch from master to develop May 14, 2024 21:27
@akeeste (Contributor, Author) commented May 15, 2024

@ssolson is it possible we can change our automated PyPI release to be triggered by GitHub releases instead of commits to master? The current automation prevents us from making minor documentation updates after a release. We typically don't get to this in time, and the documentation is built on master, which we only see after a release.

@akeeste (Contributor, Author) commented May 15, 2024

> @ssolson is it possible we can change our automated pypi release to be triggered by Github releases instead of commits to master? The current automation prevents us from making minor documentation updates after a release. We typically don't get to this in time and the documentation is built on master which we only see after a release.

Fortunately I found a workaround that allows us to get our documentation updated quickly before OREC, so we don't need these changes pulled in immediately. But long term we should still look at how our development workflows on each repo tie into doc updates.

@akeeste akeeste changed the title from "Minors fixes for doc strings items" to "v0.8.0 bug fixes and doc string fixes" May 17, 2024
@akeeste akeeste added the bug label (Something isn't working) May 17, 2024
@akeeste (Contributor, Author) commented May 17, 2024

@ssolson I identified issues within several Python examples and am working on getting these fixed ASAP. I reworked this PR to focus on bugs, not just minor documentation items. I'd like to make this a bugfix release as soon as possible.

The notebook vs script runs may have inadvertently been an environment issue (3.9 vs 3.11). Checking on that now.

@akeeste (Contributor, Author) commented May 17, 2024

Update on WPTO_hindcast_example.ipynb:
In #220 there was a slight change to the variable naming: variable_{i} --> variable_{gid}. GitHub was not comparing files well in that PR, and we probably missed that change. Unless there's a good reason to keep gid appended to the name, we should revert to appending i. The old tests reflect that. It's unclear why the tests passed in that PR but later broke because of the renaming.

@akeeste (Contributor, Author) commented May 17, 2024

Update on qc_example.ipynb:
In #276, index_to_datetime was accidentally removed from mhkit.utils.time_utils. It has been re-added, fixing the qc example.

@ssolson (Contributor) commented May 20, 2024

> @ssolson is it possible we can change our automated pypi release to be triggered by Github releases instead of commits to master? The current automation prevents us from making minor documentation updates after a release. We typically don't get to this in time and the documentation is built on master which we only see after a release.

The pypi.yml workflow currently works exactly the way you request here:

  • on push we create a release to test.pypi.org
  • on release we create a release on pypi.org

@ssolson ssolson mentioned this pull request May 20, 2024
@ssolson (Contributor) commented May 20, 2024

directional_waves.ipynb works fine for me in Python 3.11. I have cleared the cache and re-run multiple times.

@ssolson (Contributor) commented May 20, 2024

> PacWave_resource_characterization_example.ipynb - Folium map error

This is not an MHKiT error

You just need to pip install folium

folium is only used to add context to the example and is not a requirement of MHKiT

@ssolson (Contributor) commented May 20, 2024

> PacWave_resource_characterization_example.ipynb - Folium map error
>
> This is not an MHKiT error
>
> You just need to pip install folium
>
> folium is only used to add context to the example and is not a requirement of MHKiT

Apologies, never mind.

I see the issue is the basemap was showing up gray. I have now fixed this in akeeste#4

@ssolson (Contributor) commented May 21, 2024

I take responsibility for not catching this, but I did not realize that the wave module's xarray conversion made the functions so slow that they are now unusable.

E.g., in the PacWave resource characterization, this block now takes more than 30 minutes... unless you convert Tp back to pandas, which takes only 9 minutes.

@akeeste I know the wave module is still in a bit of transition, but do you expect the slowdowns to be significantly reduced when the transition is complete? I am questioning our approach of always converting pandas to xarray.

# Initialize empty lists to store the results from each year
Hm0_list = []
Te_list = []
J_list = []
Tp_list = []
Tz_list = []

# Iterate over each year and save the results in the initialized lists
for year in ndbc_data:
    data_raw = ndbc_data[year]
    year_data = data_raw[data_raw != 999.0].dropna()
    Hm0_list.append(resource.significant_wave_height(year_data.T))
    Te_list.append(resource.energy_period(year_data.T))
    J_list.append(resource.energy_flux(year_data.T, h=399.0))

    # Tp_list.append(resource.peak_period(year_data.T))
    fp = year_data.T.idxmax(axis=0)
    Tp = 1/fp
    Tp = pd.DataFrame(Tp, index=year_data.T.columns, columns=["Tp"])
    Tp_list.append(Tp)

    Tz_list.append(resource.average_zero_crossing_period(year_data.T))

# Concatenate list of Series into a single DataFrame
Te = pd.concat(Te_list, axis=0)
Tp = pd.concat(Tp_list, axis=0)
Hm0 = pd.concat(Hm0_list, axis=0)
J = pd.concat(J_list, axis=0)
Tz = pd.concat(Tz_list, axis=0)
data = pd.concat([Hm0, Te, Tp, J, Tz], axis=1)

# Calculate wave steepness
data["Sm"] = data.Hm0 / (9.81 / (2 * np.pi) * data.Tz**2)

# Drop any NaNs created from the calculation of Hm0 or Te
data.dropna(inplace=True)
# Sort the DateTime index
data.sort_index(inplace=True)

@akeeste (Contributor, Author) commented May 21, 2024

> @ssolson is it possible we can change our automated pypi release to be triggered by Github releases instead of commits to master? The current automation prevents us from making minor documentation updates after a release. We typically don't get to this in time and the documentation is built on master which we only see after a release.
>
> The pypi.yml release currently works exactly the way you request it to work here.
>
> • on push we create a release to test.pypi.org
> • on release we create a release on pypi.org

@ssolson Thanks for clarifying. I missed that when skimming the pypi.yml workflow.

* fix cdip

* D3D TRTS eg fix

* fix pylint error

* pylint

* fix too many branches
@akeeste (Contributor, Author) commented May 21, 2024

@ssolson thanks for addressing the cdip and D3D examples in akeeste#4.

I was also able to get the directional_waves example to run once the cache was cleared. Maybe there's an issue using pickle to read data into Python 3.11 when it was cached in 3.9? That's the situation I had locally. Not something we need to address, but something to be aware of.

On the note of computational speed--
In my experience the delay is not coming from performing calculations on xarray data vs on pandas data, but rather from the actual conversion between pandas<-->xarray. Some methods are faster than others. I noticed that using xr.Dataset(data) is significantly faster than ds = data.to_xarray(). From digging into it briefly, the to_xarray method calls a large number of checks on the data being converted from pandas, which becomes expensive when there are lots of variables and timesteps--as is the case when we pull wave spectra over long periods of time.

I implemented a few changes to our type handling to address this, but perhaps there's still an instance that uses to_xarray() on pandas data. Have you noticed a specific part of the PacWave example that is slowing down?
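The conversion-cost gap described above can be sketched with a synthetic frame (hypothetical sizes, not MHKiT data; absolute timings vary by machine):

```python
import time

import numpy as np
import pandas as pd
import xarray as xr

# Synthetic wide frame mimicking spectral data: many timesteps x many frequency columns.
df = pd.DataFrame(
    np.random.rand(5000, 50),
    index=pd.date_range("2020-01-01", periods=5000, freq="h"),
    columns=[f"f{i}" for i in range(50)],
)
df.index.name = "time"

# Path 1: pass the DataFrame straight to the Dataset constructor.
t0 = time.perf_counter()
ds_ctor = xr.Dataset(df)
t_ctor = time.perf_counter() - t0

# Path 2: pandas' to_xarray, which runs extra per-column validation.
t0 = time.perf_counter()
ds_to_xr = df.to_xarray()
t_to_xr = time.perf_counter() - t0

print(f"xr.Dataset(df): {t_ctor:.4f} s  |  df.to_xarray(): {t_to_xr:.4f} s")

# Both routes produce equivalent variables over the 'time' dimension.
assert set(ds_ctor.data_vars) == set(ds_to_xr.data_vars)
```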

@ssolson (Contributor) commented May 21, 2024

> @ssolson thanks for addressing the cdip and D3D examples in akeeste#4.
>
> I was also able to get the directional_waves example to run when the cache was cleared. Maybe there's an issue using pickle to read data into Python 3.11 when it was cached in 3.9? That's the situation I had locally. Not something we need to address, but something to be aware of.
>
> On the note of computational speed: In my experience the delay is not coming from performing calculations on xarray data vs. pandas data, but rather from the actual conversion between pandas and xarray. Some methods are faster than others. I noticed that using xr.Dataset(data) is significantly faster than ds = data.to_xarray(). From digging into it briefly, the to_xarray method runs a large number of checks on the pandas data being converted, which becomes expensive when there are lots of variables and timesteps, as is the case when we pull wave spectra over long periods of time.
>
> I implemented a few changes to our type handling to address this, but perhaps there's still an instance that uses to_xarray() on pandas data. Have you noticed a specific part of the PacWave example that is slowing down?

Apologies, my previous message was not clear.

Yes, there are issues with the computation (not just the conversion) on the converted xarrays, specifically the calculation of Tp.

This morning I created the following script and found Tp to be the primary bottleneck. I commented out the built-in MHKiT Tp and used the MHKiT v0.7.0 pandas version inline, which was significantly faster (~60%).

Time results using built-in Tp

  • Average time for Hm0: 3.2555 seconds
  • Average time for Te: 4.0896 seconds
  • Average time for J: 3.3126 seconds
  • Average time for Tp: 17.2465 seconds
  • Average time for Tz: 4.1233 seconds
  • Average time to calculate metrics in a year: 32.0306 seconds
  • Total time for all metrics: 160.1376 seconds

Time results using previous pandas Tp

Look at the results for the Tp calculation vs the others:

  • Average time for Hm0: 3.2567 seconds
  • Average time for Te: 4.0718 seconds
  • Average time for J: 3.2766 seconds
  • Average time for Tp: 0.0022 seconds
  • Average time for Tz: 4.1557 seconds
  • Average time to calculate metrics in a year: 14.7664 seconds
  • Total time for all metrics: 73.8152 seconds
import mhkit
from mhkit.wave import resource, performance, graphics, contours
from mhkit.wave.io import ndbc
import matplotlib.pyplot as plt
import numpy as np
import time
import xarray as xr
import pandas as pd


# Get buoy metadata
buoy_number = "46050"
buoy_metadata = ndbc.get_buoy_metadata(buoy_number)

# Spectral wave density for buoy 46050
parameter = "swden"

# Request list of available files
ndbc_available_data = ndbc.available_data(parameter, buoy_number)

# Pass file names to NDBC and request the data
filenames = ndbc_available_data["filename"]
ndbc_requested_data = ndbc.request_data(parameter, filenames)

ndbc_data = {}
# Create a Datetime Index and remove NOAA date columns for each year
for year in ndbc_requested_data:
    year_data = ndbc_requested_data[year]
    ndbc_data[year] = ndbc.to_datetime_index(parameter, year_data)


# Initialize empty lists to store the results from each year
Hm0_list = []
Te_list = []
J_list = []
Tp_list = []
Tz_list = []

# Dictionaries to store the time taken by each function
timing_data = {
    "Hm0": [],
    "Te": [],
    "J": [],
    "Tp": [],
    "Tz": []
}

# List to store the total time for each year
yearly_times = []

for year in list(ndbc_data.keys())[:5]:
    year_start_time = time.time()  # Start time for the year

    data_raw = ndbc_data[year]
    year_data = data_raw[data_raw != 999.0].dropna()
    
    start_time = time.time()
    Hm0_list.append(resource.significant_wave_height(year_data.T))
    timing_data["Hm0"].append(time.time() - start_time)
    
    start_time = time.time()
    Te_list.append(resource.energy_period(year_data.T))
    timing_data["Te"].append(time.time() - start_time)
    
    start_time = time.time()
    J_list.append(resource.energy_flux(year_data.T, h=399.0))
    timing_data["J"].append(time.time() - start_time)

    start_time = time.time()
    # Tp_list.append(resource.peak_period(year_data.T))
    fp = year_data.T.idxmax(axis=0)
    Tp = 1/fp
    Tp = pd.DataFrame(Tp, index=year_data.T.columns, columns=["Tp"])
    timing_data["Tp"].append(time.time() - start_time)
    
    start_time = time.time()
    Tz_list.append(resource.average_zero_crossing_period(year_data.T))
    timing_data["Tz"].append(time.time() - start_time)
   
    yearly_times.append(time.time() - year_start_time) 

# Print the timing data for each function
total_time = 0
for func_name, times in timing_data.items():
    avg_time = np.mean(times)
    sum_time = np.sum(times)
    print(f"Average time for {func_name}: {avg_time:.4f} seconds")
    # print(f"Total time for {func_name}: {sum_time:.4f} seconds")
    total_time += sum_time



# Calculate and print the average time to calculate metrics in a year
avg_yearly_time = np.mean(yearly_times)
print(f"Average time to calculate metrics in a year: {avg_yearly_time:.4f} seconds")

# Print the total time for all metrics
print(f"Total time for all metrics: {total_time:.4f} seconds")

@akeeste (Contributor, Author) commented May 21, 2024

@ssolson Okay understood. Seems odd as the methodology is essentially the same now, but applied on an xarray dataset (and with type handling). I wonder if the delay is mostly in the type conversion or xarray's idxmax function.

How does the expense compare if the spectrum is already an xarray Dataset (type handling eliminated) and we calculate Tp with:

   fp = S.idxmax(dim=frequency_dimension)  # Hz
   Tp = 1 / fp

@ssolson (Contributor) commented May 21, 2024

> @ssolson Okay understood. Seems odd as the methodology is essentially the same now, but applied on an xarray dataset (and with type handling). I wonder if the delay is mostly in the type conversion or xarray's idxmax function.
>
> How does the expense compare if the spectrum is already an xarray Dataset (type handling eliminated) and we calculate Tp with:
>
>    fp = S.idxmax(dim=frequency_dimension)  # Hz
>    Tp = 1 / fp

That is what I found: finding the argmax is a very expensive operation on the xarray objects.

I spent some time trying to fix this but I realized I was getting outside the scope of this PR pretty quickly.

So I wanted to circle back with you and see whether time gains are to be expected from a full xarray implementation, or whether we should reconsider our approach.

@akeeste (Contributor, Author) commented May 22, 2024

I don't expect a significant improvement in computational expense in the future just by using xarray. Let's discuss approaches at our next monthly meeting.

@akeeste akeeste marked this pull request as ready for review May 22, 2024 14:52
@ssolson (Contributor) commented May 22, 2024

@akeeste can you merge the fixes in akeeste#5?

* fix cdip

* D3D TRTS eg fix

* fix pylint error

* pylint

* fix too many branches

* fix map and Tp

* folium fix & Tp using pandas

* up to python 3.10

* py version as string

* fix variable check for new xarray

* undo changes
Comment on lines 322 to 324
LM = convert_to_dataset(LM).to_array()
JM = convert_to_dataset(JM).to_array()
frequency = convert_to_dataset(frequency).to_array()
Contributor:

This change is causing the tests to fail. I'm looking into it but if you are around @akeeste I know you are more familiar with these changes.

Contributor:

@akeeste can you tell me the reasoning behind this change? Is it simply a performance improvement to go Dataset-to-array instead of directly to an array?

Contributor:

The easiest solution is to skip the conversion if it is already a DataArray. Otherwise we are going to need to pass the name around to fix this.

Do you have a preference?

Contributor Author:

@ssolson I fixed these few lines. To be honest, I can't recall at the moment why I made this change, but I reverted it and the power_performance_workflow tests are passing now.

@jmcvey3 (Contributor) commented Jun 4, 2024

If I can jump in, part of this problem might be because y'all are using Datasets for single variables rather than DataArrays. The line fp = S.idxmax(dim=frequency_dimension) will run a lot faster on a DataArray using fp = S["Frequency"].where(S == S.max("Frequency"), drop=True).

If you want to keep the Datasets, that line gets more complicated but is still a touch faster: fp = S["0"]["Frequency"].where(S["0"] == S["0"].max("Frequency"), drop=True)
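As a rough illustration of the DataArray path under discussion, a minimal sketch with synthetic spectra (hypothetical peak shapes and dimension names; not MHKiT code):

```python
import numpy as np
import xarray as xr

# Synthetic spectra: one Gaussian peak per timestep at a known frequency.
freq = np.linspace(0.05, 0.5, 50)
times = np.arange(200)
rng = np.random.default_rng(0)
peak_idx = rng.integers(5, 45, size=times.size)
spec = np.exp(-((freq[:, None] - freq[peak_idx][None, :]) ** 2) / 1e-3)

S = xr.DataArray(
    spec,
    coords={"Frequency": freq, "time": times},
    dims=("Frequency", "time"),
)

# On a DataArray, idxmax returns the peak frequency for each timestep directly,
# so the peak period follows in one step.
fp = S.idxmax(dim="Frequency")
Tp = 1.0 / fp

# The recovered peak frequencies match the known ones.
assert np.allclose(fp.values, freq[peak_idx])
```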

@ssolson (Contributor) commented Jun 11, 2024

> If I can jump in, part of this problem might be because y'all are using Datasets for single variables rather than DataArrays. The line fp = S.idxmax(dim=frequency_dimension) will run a lot faster on a DataArray using fp = S["Frequency"].where(S == S.max("Frequency"), drop=True).
>
> If you want to keep the Datasets, that line gets more complicated but is still a touch faster: fp = S["0"]["Frequency"].where(S["0"] == S["0"].max("Frequency"), drop=True)

@jmcvey3 thank you for jumping in. I have not tried this yet, but it looks promising.

From our meeting yesterday, @akeeste is going to address the outstanding issues to pass tests and get these bug fixes through.

Because the proposed solution changes the general approach, we are going to address it in a follow-on PR focused on the larger xarray strategy.

@akeeste (Contributor, Author) commented Jun 13, 2024

There are still a couple of tests failing. I'm not sure why the mooring test is failing, since I only updated the docstring (for better formatting in the built docs). I'll continue looking into this and narrow in on the issue in the metocean example.

@akeeste (Contributor, Author) commented Jun 13, 2024

A part of matplotlib that is called in the mooring animation, unrelated to this PR, is deprecated in matplotlib 3.9+, causing mooring_test to fail here. I'm guessing that if we re-ran tests on dev, it would fail there too. @hivanov-nrel any insight on how we can fix this?

@hivanov-nrel (Contributor) commented
Hi Adam, it looks like the issue is on lines 89 and 129. The inputs need to be sequences, so the fix is to put brackets around them, like so:

line.set_data([x_data], [y_data])

Do you mind implementing this on your PR or should I do this fix separately?

@akeeste (Contributor, Author) commented Jun 13, 2024

> Hi Adam, it looks like the issue is on line 89 and line 129. The inputs needs to be a sequence so the fix for this is putting brackets around them like so:
>
> line.set_data([x_data], [y_data])
>
> Do you mind implementing this on your PR or should I do this fix separately?

Thanks for taking a quick look @hivanov-nrel. That fixed mooring_test, I'll add it to this PR.
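For reference, a minimal sketch of the set_data fix (headless Agg backend; the scalar x_data/y_data values stand in for a single animation frame and are hypothetical):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, no display needed
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
(line,) = ax.plot([], [])

x_data, y_data = 1.0, 2.0  # scalars produced each animation frame

# Newer Matplotlib requires sequences here, so wrap the scalars in lists:
line.set_data([x_data], [y_data])

xs, ys = line.get_data()
print(list(xs), list(ys))  # [1.0] [2.0]
```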

@ssolson (Contributor) left a comment

Adam, this looks good to me. If this is ready to merge, it would help me because #330 depends on these changes and I am currently being forced to run hindcast tests on that PR.

@ssolson ssolson merged commit 37254f7 into MHKiT-Software:develop Jun 18, 2024
20 checks passed
@ssolson ssolson mentioned this pull request Jun 28, 2024
ssolson added a commit that referenced this pull request Jun 30, 2024
# MHKiT v0.8.1
MHKiT v0.8.1 includes bug fixes in the example notebooks and pins dependency requirements to versions prior to NumPy 2.0.0.

Fixes MHKiT v0.8.0 installation issues (#334) by pinning the dependencies.
- #335

Fixes bugs in MHKiT example notebooks
- #327