Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Acoustics Module #349

Open
wants to merge 73 commits into
base: develop
Choose a base branch
from
Open

Conversation

jmcvey3
Copy link
Contributor

@jmcvey3 jmcvey3 commented Aug 29, 2024

Functions and example notebook for reading passive acoustics data from hydrophones.

jmcvey3 and others added 30 commits May 30, 2024 18:30
# MHKiT v0.8.1
MHKiT v0.8.1, includes bug fixes in the example notebooks and fixes the dependencies to requirements updates prior to Numpy 2.0.0.

Fixes MHKIT v0.8.0 installation issues (MHKiT-Software#334) by fixing the dependencies.
- MHKiT-Software#335

Fixes bugs in MHKiT example notebooks
- MHKiT-Software#327
…-Software#326)

* tke updates

* Fix shear velocity functions

* More detail for tke shear production

* Don't rotate heading beyond 360 degrees

* Fix some typos in notebook

* Rename deprecated function
…#340)

* Test: Determine method using input frequency index

* Feat: Use sum of sines if ifft is not computable

This change allows `surface_elevation` to return a result if the user
inputs a spectrum with a frequency index that does not have a zero
frequency.

If the non zero frequency index condition is found when the method is
`ifft` we warn the user and change the method to `sum_of_sines`

* Fix: Use previously found frequency index

S.index may not exist for some input datasets, but f[0] does and we
should use the value of f[0] here.

* Test: Warn when using ifft with a non zero frequency

* Lint
This PR adds a Github action to test the example notebooks as part of or CD pipeline. Additionally a timeout is added to which notebooks will fail if they exceed the given time.
This PR addresses MHKiT-Software#315 by:
- Changes the data called by the Wind Toolkit tests so that they run faster
- Updates the .csv files that the tests compare against
- Updates a few descriptions in the metocean example, fixes a sorting issue, reduces the data downloaded there
- Tests in hindcast match the notebooks and use the same cache.
Bug fix by removing matplotlib version check needed for < 3.8.0
@jmcvey3
Copy link
Contributor Author

jmcvey3 commented Sep 17, 2024

What would you like to do about the "R" error codes? The number of variables and lines used per function are arbitrary at best.

I am going to try to address the too many locals by using a dictionary.

I am still thinking about the too many arguments. What do you think about keeping the current way the function works but adding the ability for the user to pass a dictionary e.g. {'quantile': 0.25} or {'min': None} ect.?

I have not looked into the too many statements yet.

A dictionary could work in some cases, and so could updating a variable name. I think it becomes a case of "when am I making the code harder to read or more complicated than it needs to be for the sake of an arbitrary test?". Solving "too many variables" will probably break "too many statements" and/or vice versa.

A better coded way of solving "too many arguments" is to use *args and **kwargs, but these concepts may be difficult for new users to understand.

@ssolson
Copy link
Contributor

ssolson commented Sep 17, 2024

What would you like to do about the "R" error codes? The number of variables and lines used per function are arbitrary at best.

I am going to try to address the too many locals by using a dictionary.
I am still thinking about the too many arguments. What do you think about keeping the current way the function works but adding the ability for the user to pass a dictionary e.g. {'quantile': 0.25} or {'min': None} ect.?
I have not looked into the too many statements yet.

A dictionary could work in some cases, and so could updating a variable name. I think it becomes a case of "when am I making the code harder to read or more complicated than it needs to be for the sake of an arbitrary test?". Solving "too many variables" will probably break "too many statements" and/or vice versa.

A better coded way of solving "too many arguments" is to use *args and **kwargs, but these concepts may be difficult for new users to understand.

I resolved the issues without ignores in a new PR on your branch. I need to write tests still for some of the functions I broke out.

Dictionaries make the code harder to read imo but should also force you to group like variables.

In my experience trying to follow pylint is annoying, and takes longer but ultimately makes the code more modular and easier to work with long term. We're fully the realm of preferences with the "Refactor" codes so there is no right or wrong. My goal is to not use inline ignores unless there is no other way. Then with global ignores I think it is hard to justify that there should not be some limit on too many locals and too many arguments. So at that point it just boils down to what the limit should be and again there is no right answer.

Copy link
Contributor

@ssolson ssolson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jmcvey3 here are a couple of preliminary thoughts I had while trying to fix the pylint issues. I plan to give another pass at this after we get jmcvey3#6 merged. I attempted to fix some but not all of these in my PR against your branch.

f.read(4) # wave_key
list_key = f.read(4)

# Skip metadata if it exits (don't know how to parse)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this "don't know how to parse" should be removed here and added to the docstring... I don't know what LIST metadata is but something like:

    Read .wav file from a hydrophone. Returns voltage timeseries if sensitivity not
    provided, returns pressure timeseries if it is provided. If present "LIST" metadata is not parsed.

Then update this comment to just say # Skip metadata if it exits

peak_voltage=None,
sensitivity=None,
gain=0,
start_time="2024-06-06T00:00:00",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the significance of this start time? Should it be None or epoch time zero?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The date is random; it shows the correct time format to input cause people don't always look at docstrings

Comment on lines 171 to 173
def band_average(
spsdl, octave=3, fmin=10, fmax=100000, method="median", method_arg=None
):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this function poorly named?

Its called band average but defaults to median... but it can also process min , max, quantile ect.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, yes it is then. I'll have to think about that one

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be called test_base or should base be called analysis?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remote: warning: File examples/data/acoustics/RBW_6661_20240601_053114.wav is 87.89 MB; this is larger than GitHub's recommended maximum file size of 50.00 MB

Any way we could get this a little smaller and maintain a good example?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remote: warning: File examples/data/acoustics/6247.230204150508.wav is 54.93 MB; this is larger than GitHub's recommended maximum file size of 50.00 MB

Any way we could get this a little smaller and maintain a good example?

list_key = f.read(4)

# Skip metadata if it exits (don't know how to parse)
if "LIST" in list_key.decode():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do the files we are testing have have "LIST" in them?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LIST is an optional header keyword in a .wav file, if I remember right. I've only used Ocean Sonics and Sountrap hydrophones, and of those two just the former has a LIST keyword.


# Read data using scipy cause that's easier (will auto drop as int16 or int32)
fs, raw = wavfile.read(filename)
length = raw.shape[0] // fs # length of recording in seconds
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should check that the number of samples is larger than the sampling frequency otherwise this integer division will return 0 and throw unclear errors because the user's sample time is less than 1s.

Copy link
Contributor Author

@jmcvey3 jmcvey3 Oct 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, it shouldn't matter the total time segment is less than 1 second


# Use dolfyn PSD
binner = VelBinner(n_bin=win, fs=fs, n_fft=win)
# Always 50% overlap if numbers reshape perfectly
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about this comment. I had to look this up but here is my understanding:

  • Overlapping refers to the technique where consecutive windows share some portion of their data.

So if I create a 10s dataset with 100 samples with no overlap this should return a len of 10.

If I use the same dataset with a 50% overlap I would expect this to return 19 values, however, only 10 samples are being return indicating to me there is no overlap.

Am I misunderstanding the use of overlapping here or missing something?

        # Create some sample pressure data
        time = np.arange(0, 10, 0.1)
        data = np.sin(time)
        pressure = xr.DataArray(data, coords=[time], dims=["time"], attrs={"units": "Pa", "fs": 100})
        
        fs = len(time)
        window = 0.1  # seconds
        win_samples = int(window * fs)
        
        # Calculate expected number of segments
        overlap = 0.5
        step = int(win_samples * (1 - overlap))
        expected_segments = (len(pressure) - win_samples) // step + 1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you'd have 19 segments, from which you're calculating periodograms for each. Segments and windows are different concepts.
I'm not an expert in fast fourier transforms, but here is the original document for Welch's method for power spectra, which is the mathematical explanation for that dolfyn function: https://ieeexplore.ieee.org/abstract/document/1161901

Copy link
Contributor

@ssolson ssolson Oct 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay if it should be 19 then I don't think the windowing in the function is working correctly.

Can you review this test?

def test_sound_pressure_spectral_density(self):
    """
    Test sound pressure spectral density calculation.
    """
    # Create some sample pressure data
    time = np.arange(0, 10, 0.1)
    data = np.sin(time)
    pressure = xr.DataArray(data, coords=[time], dims=["time"], attrs={"units": "Pa", "fs": 100})

    # Adjust window size to get multiple segments
    fs = 100
    window = 0.1  # seconds
    win_samples = int(window * fs)

    # Run the spectral density function
    spsd = acoustics.sound_pressure_spectral_density(pressure, fs=fs, window=window)

    # Assert that output is an xarray DataArray with expected dimensions
    self.assertIsInstance(spsd, xr.DataArray)
    self.assertIn("freq", spsd.dims)
    self.assertIn("time", spsd.dims)
    self.assertEqual(spsd.attrs["units"], "Pa^2/Hz")
    self.assertEqual(spsd.attrs["window"], f"{window}s")

    # Calculate expected number of segments
    overlap = 0.5
    step = int(win_samples * (1 - overlap))
    expected_segments = (len(pressure) - win_samples) // step + 1

    self.assertEqual(spsd.shape[0], expected_segments)

======================================================================
FAIL: test_sound_pressure_spectral_density (main.TestAnalysis.test_sound_pressure_spectral_density)
Test sound pressure spectral density calculation.

Traceback (most recent call last):
File "C:\Users\sterl\Codes\MHKiT-Python\mhkit\tests\acoustics\test_analysis.py", line 196, in > test_sound_pressure_spectral_density
self.assertEqual(spsd.shape[0], expected_segments)
AssertionError: 10 != 19

Copy link
Contributor Author

@jmcvey3 jmcvey3 Oct 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, segments in this case are not "windows". It also looks like I mispoke it seems. It's been a long time since I've dug into it, but "windowing" is the term used to describe smoothing functions (or filters) of a specified length that are applied atop of the signal. The 50% part comes in based on the amount of data the window, i.e. smoothing function, covers.

I'm digging into textbooks at this point to try and explain it to you.

For these functions the word "window" is equivalent to "bin size" or "ensemble", or the number of points to discretize.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I replaced all the instances of "window" with "bin" and updated the docstring to try and avoid confusion

@jmcvey3
Copy link
Contributor Author

jmcvey3 commented Sep 19, 2024

@ssolson Git wouldn't let me push to your branch, so I merged your PR and then committed my changes. A couple "dry" changes, and some other edits that are relatively minor but better for readability.

@ssolson
Copy link
Contributor

ssolson commented Sep 26, 2024

@jmcvey3 could you take a look at my comments above?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
acoustics Acoustics Module enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants