Speed up wave.resource module #352

Open
wants to merge 38 commits into base: develop

Conversation

akeeste
Contributor

@akeeste akeeste commented Sep 17, 2024

@ssolson This is a follow-up to my other wave PRs and resolves #331. Handling the various edge cases robustly in pure numpy is difficult, so I want to first resolve #331 by using DataArrays throughout the wave resource functions instead of Datasets.

Similar to Ryan's testing mentioned in #331, I found that using DataArrays/pandas gives a 1000x speedup vs. Datasets for very large input data (a rough illustration of the comparison follows the list below). This should restore MHKiT's speed to its previous state. Using a pure numpy base would give an additional 5-10x speedup over DataArrays, but I think the current work with DataArrays will:

  • be sufficient for our users
  • be easier to develop with
  • be easier to handle edge cases
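For context, a rough timing sketch of the Dataset vs. DataArray comparison (illustrative sizes, not taken from this PR or from #331):

    # Hypothetical timing sketch: converting a wide pandas DataFrame to an
    # xarray Dataset (one variable per column) vs. a single 2-D DataArray.
    import time
    import numpy as np
    import pandas as pd
    import xarray as xr

    # e.g. ~100 frequency bins by several thousand spectra
    df = pd.DataFrame(
        np.random.rand(100, 5000), columns=[f"s{i}" for i in range(5000)]
    )

    t0 = time.perf_counter()
    ds = df.to_xarray()    # Dataset with one variable per column (slow when wide)
    t1 = time.perf_counter()
    da = xr.DataArray(df)  # single 2-D DataArray (fast)
    t2 = time.perf_counter()

    print(f"to Dataset:   {t1 - t0:.3f} s")
    print(f"to DataArray: {t2 - t1:.3f} s")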

@akeeste akeeste marked this pull request as ready for review October 2, 2024 18:34
@akeeste akeeste marked this pull request as draft October 2, 2024 18:44
@akeeste akeeste marked this pull request as ready for review October 10, 2024 17:24
@akeeste
Contributor Author

akeeste commented Oct 10, 2024

@ssolson This PR is ready for review. Tests should pass now

With some modifications to the type handling functions, and an appropriate frequency_dimension input where required, the wave.resource functions should handle pandas Series, pandas DataFrames, and xarray DataArrays regardless of input shape, dimension names, dimension order, etc. I largely moved away from converting data to xarray Datasets because they are slow and more difficult to work with.
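As a usage illustration, here is a small sketch (not code from the PR; it assumes the frequency_dimension keyword behaves as described above) showing the same toy spectrum passed as a Series, a DataFrame, and a DataArray with a custom dimension name:

    # Hypothetical usage sketch of the flexible type handling described above.
    import numpy as np
    import pandas as pd
    import xarray as xr
    from mhkit import wave

    f = np.linspace(0.01, 1.0, 100)                   # frequency [Hz]
    S_vals = 1e-3 * np.exp(-((f - 0.1) ** 2) / 1e-3)  # toy spectral density [m^2/Hz]

    S_series = pd.Series(S_vals, index=f)
    S_frame = pd.DataFrame({"S": S_vals}, index=f)
    S_da = xr.DataArray(S_vals, dims="freq_custom", coords={"freq_custom": f})

    Te1 = wave.resource.energy_period(S_series)
    Te2 = wave.resource.energy_period(S_frame)
    Te3 = wave.resource.energy_period(S_da, frequency_dimension="freq_custom")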

@akeeste
Contributor Author

akeeste commented Oct 10, 2024

mhkit.loads.graphics is unchanged, so I'm not sure why the pylint loads test is now failing on the number of positional arguments there. This branch is up to date with develop

@ssolson
Contributor

ssolson commented Oct 14, 2024

mhkit.loads.graphics is unchanged, so I'm not sure why the pylint loads test is now failing on the number of positional arguments there. This branch is up to date with develop

Pylint added new warnings around the way it wants users to handle positional arguments. I'll address it.
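For reference, the kind of change this check typically asks for is making trailing arguments keyword-only (the function and argument names below are hypothetical, not the actual mhkit.loads.graphics signature):

    # Hypothetical sketch: a bare * forces the remaining arguments to be
    # passed by keyword, which addresses pylint's positional-argument warning.
    def plot_example(x, y_mean, y_max, y_min, *, y_stdev=None, ax=None, title=None):
        ...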

@ssolson
Contributor

ssolson commented Oct 14, 2024

mhkit.loads.graphics is unchanged, so I'm not sure why the pylint loads test is now failing on the number of positional arguments there. This branch is up to date with develop

Pylint added new warnings around the way it wants users to handle positional arguments. I'll address it.

Addressed in #357

Let's merge #357 then make sure the tests pass here.

@akeeste
Contributor Author

akeeste commented Oct 14, 2024

Thanks @ssolson. I'll merge that in here and fix a couple minor items with some examples

@akeeste
Contributor Author

akeeste commented Oct 14, 2024

@akeeste TODO:

  • fix a couple last issues with example notebooks
  • reduce the time required for the example notebooks to bring execution speed back to the previous benchmark

@akeeste
Contributor Author

akeeste commented Oct 14, 2024

@ssolson this PR is now ready for review and all tests are passing. I tightened up the timing on the environmental contours, the three extreme response, and the PacWave examples.

A straightforward test case for the difference in computational expense is to run a wave resource function (e.g. energy_period) on a year of NDBC spectral data, or to repeat Ryan's script in #331.
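For example, a rough benchmark sketch with synthetic data standing in for a year of NDBC spectra (the NDBC download step is omitted and the sizes are illustrative; this is not code from the PR):

    # Hypothetical benchmark: time energy_period on roughly a year of hourly spectra.
    import time
    import numpy as np
    import pandas as pd
    from mhkit import wave

    f = np.linspace(0.03, 0.5, 47)                     # frequency bins [Hz]
    times = pd.date_range("2023-01-01", periods=8760, freq="h")
    S = pd.DataFrame(np.random.rand(len(f), len(times)), index=f, columns=times)

    t0 = time.perf_counter()
    Te = wave.resource.energy_period(S)
    print(f"energy_period on {S.shape[1]} spectra: {time.perf_counter() - t0:.2f} s")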

@ssolson ssolson added the wave module and Clean Up labels Oct 16, 2024
Contributor

@ssolson ssolson left a comment


@akeeste overall this addresses the issue. Thanks for putting this together. I have just a couple questions and a few minor clean up items.

@@ -442,11 +440,9 @@ def test_mler_export_time_series(self):
mler["WaveSpectrum"] = self.mler["Norm_Spec"].values
mler["Phase"] = self.mler["phase"].values
k = resource.wave_number(wave_freq, 70)
k = k.fillna(0)
np.nan_to_num(k, 0)
Contributor


This returns k, so it would need to be:

k = np.nan_to_num(k, 0)

However, k has no NaNs so I don't think this is needed.

Contributor Author


The zero frequency in wave_freq results in a NaN value. The call to np.nan_to_num() updates the input k in my testing, which is useful when k is not a numpy array because the input data is both updated and retains its type. If it's clearer for this particular test, I can update it to redefine k.

Contributor

@ssolson ssolson Oct 18, 2024


This is not a limiting factor to getting the PR through, but are you saying np.nan_to_num modifies k in place when you run it?

The docs say the function returns the modified array and that was my experience when I paused the code here:
https://numpy.org/doc/2.0/reference/generated/numpy.nan_to_num.html
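One thing worth verifying here: in NumPy's signature the second positional argument of nan_to_num is copy, not the fill value, so np.nan_to_num(k, 0) requests an in-place conversion (and also returns the array), while the keyword form returns a new array. A small sketch of the difference:

    import numpy as np

    k = np.array([np.nan, 1.0, 2.0])
    np.nan_to_num(k, 0)     # second positional arg is copy -> copy=False, modifies k
    print(k)                # [0. 1. 2.]

    k2 = np.array([np.nan, 1.0, 2.0])
    out = np.nan_to_num(k2, nan=0.0)  # default copy=True: k2 unchanged
    print(k2, out)          # [nan  1.  2.] [0. 1. 2.]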

@@ -95,7 +95,8 @@ def test_kfromw(self):

expected = self.valdata1[i]["k"]
k = wave.resource.wave_number(f, h, rho)
calculated = k.loc[:, "k"].values
# calculated = k.loc[:, "k"].values
Contributor


delete

# Rename to "variable" to match how multiple Dataset variables get converted into a DataArray dimension
data = xr.DataArray(data)
if data.dims[1] == "dim_1":
# Slight chance their is already a name for the columns
Contributor


their => there

Comment on lines 309 to 313
LM: pandas DataFrame, xarray DatArray, or xarray Dataset
Capture length
JM: pandas DataFrame or xarray Dataset
JM: pandas DataFrame, xarray DatArray, or xarray Dataset
Wave energy flux
frequency: pandas DataFrame or xarray Dataset
frequency: pandas DataFrame, xarray DatArray, or xarray Dataset
Contributor


xarray "DataArray", not "DatArray"

@@ -87,26 +86,24 @@ def elevation_spectrum(
+ "temporal spacing for eta."
)

S = xr.Dataset()
for var in eta.data_vars:
Contributor


This for loop allowed users to process multiple wave heights in a DataFrame, etc. Removing it means the user can only process one eta at a time.

Does this update require us to remove this functionality? Would it make sense to parse each variable in a Dataset into a DataArray? Or to have the user create any needed loop outside of the function for simplicity?

E.g. the following test will fail. Can we make this work?

    def test_elevation_spectrum_multiple_variables(self):
        time = np.linspace(0, 100, 1000)
        eta1 = np.sin(2 * np.pi * 0.1 * time)  
        eta2 = np.sin(2 * np.pi * 0.2 * time)  
        eta3 = np.sin(2 * np.pi * 0.3 * time)  
        
        eta_dataset = xr.Dataset({
            'eta1': (['time'], eta1),
            'eta2': (['time'], eta2),
            'eta3': (['time'], eta3)
        }, coords={'time': time})
        
        sample_rate = 10  
        nnft = 256 
        
        spectra = wave.resource.elevation_spectrum(
            eta_dataset, sample_rate, nnft
        )

If so, let's finish out this test and add it to the test suite.

Contributor Author


Users can still input Datasets, but right now all variables must have the same dimensions because the Dataset is converted to a DataArray up front. The function then returns a DataArray.

I'll look at reinstating a Dataset/DataFrame loop so that those types are returned.

Contributor Author


I added this loop again. Our previous slowdown from converting large pandas DataFrames to xarray Datasets could now occur in these two functions, but I don't think that is a typical use case. For example, in the case described in #331, it's unlikely a user would have thousands of different wave elevation time series and convert them all to wave spectra (likewise for thousands of distinct spectra being converted to elevation time series). If that case does come up, the slowdown should not be due to our implementation but to the large quantity of data involved.
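For illustration, a rough sketch (not the actual diff) of the per-variable loop pattern being described, where a DataArray code path runs once per Dataset variable and the results are reassembled into a Dataset; the helper name is hypothetical:

    import xarray as xr

    def apply_per_variable(func, data, *args, **kwargs):
        # Hypothetical helper: run a DataArray-based function on each variable
        # of a Dataset and rebuild a Dataset from the results.
        if isinstance(data, xr.Dataset):
            return xr.Dataset(
                {name: func(da, *args, **kwargs) for name, da in data.data_vars.items()}
            )
        return func(data, *args, **kwargs)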


omega = xr.DataArray(
data=2 * np.pi * f, dims=frequency_dimension, coords={frequency_dimension: f}
)

eta = xr.Dataset()
for var in S.data_vars:
Contributor


Same as above: this removed the ability to iterate over multiple columns, but we still accept Datasets and multi-column pandas DataFrames.

@@ -1153,7 +1164,7 @@ def wave_number(f, h, rho=1025, g=9.80665, to_pandas=True):
"""
if isinstance(f, (int, float)):
f = np.asarray([f])
f = convert_to_dataarray(f)
# f = convert_to_dataarray(f)
Contributor


delete

@akeeste
Contributor Author

akeeste commented Oct 17, 2024

@ssolson I addressed all your comments and re-enabled Dataset inputs to wave.resource.surface_elevation and wave.resource.elevation_spectrum.
