-
Notifications
You must be signed in to change notification settings - Fork 45
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Acoustics Module #349
base: develop
Are you sure you want to change the base?
Acoustics Module #349
Conversation
# MHKiT v0.8.1 MHKiT v0.8.1, includes bug fixes in the example notebooks and fixes the dependencies to requirements updates prior to Numpy 2.0.0. Fixes MHKIT v0.8.0 installation issues (MHKiT-Software#334) by fixing the dependencies. - MHKiT-Software#335 Fixes bugs in MHKiT example notebooks - MHKiT-Software#327
…-Software#326) * tke updates * Fix shear velocity functions * More detail for tke shear production * Don't rotate heading beyond 360 degrees * Fix some typos in notebook * Rename deprecated function
…#340) * Test: Determine method using input frequency index * Feat: Use sum of sines if ifft is not computable This change allows `surface_elevation` to return a result if the user inputs a spectrum with a frequency index that does not have a zero frequency. If the non zero frequency index condition is found when the method is `ifft` we warn the user and change the method to `sum_of_sines` * Fix: Use previously found frequency index S.index may not exist for some input datasets, but f[0] does and we should use the value of f[0] here. * Test: Warn when using ifft with a non zero frequency * Lint
This PR adds a Github action to test the example notebooks as part of or CD pipeline. Additionally a timeout is added to which notebooks will fail if they exceed the given time.
This PR addresses MHKiT-Software#315 by: - Changes the data called by the Wind Toolkit tests so that they run faster - Updates the .csv files that the tests compare against - Updates a few descriptions in the metocean example, fixes a sorting issue, reduces the data downloaded there - Tests in hindcast match the notebooks and use the same cache.
Bug fix by removing matplotlib version check needed for < 3.8.0
A dictionary could work in some cases, and so could updating a variable name. I think it becomes a case of "when am I making the code harder to read or more complicated than it needs to be for the sake of an arbitrary test?". Solving "too many variables" will probably break "too many statements" and/or vice versa. A better coded way of solving "too many arguments" is to use |
I resolved the issues without ignores in a new PR on your branch. I need to write tests still for some of the functions I broke out. Dictionaries make the code harder to read imo but should also force you to group like variables. In my experience trying to follow pylint is annoying, and takes longer but ultimately makes the code more modular and easier to work with long term. We're fully the realm of preferences with the "Refactor" codes so there is no right or wrong. My goal is to not use inline ignores unless there is no other way. Then with global ignores I think it is hard to justify that there should not be some limit on too many locals and too many arguments. So at that point it just boils down to what the limit should be and again there is no right answer. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mhkit/acoustics/io.py
Outdated
f.read(4) # wave_key | ||
list_key = f.read(4) | ||
|
||
# Skip metadata if it exits (don't know how to parse) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this "don't know how to parse" should be removed here and added to the docstring... I don't know what LIST metadata is but something like:
Read .wav file from a hydrophone. Returns voltage timeseries if sensitivity not
provided, returns pressure timeseries if it is provided. If present "LIST" metadata is not parsed.
Then update this comment to just say # Skip metadata if it exits
mhkit/acoustics/io.py
Outdated
peak_voltage=None, | ||
sensitivity=None, | ||
gain=0, | ||
start_time="2024-06-06T00:00:00", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the significance of this start time? Should it be None or epoch time zero?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The date is random; it shows the correct time format to input cause people don't always look at docstrings
mhkit/acoustics/base.py
Outdated
def band_average( | ||
spsdl, octave=3, fmin=10, fmax=100000, method="median", method_arg=None | ||
): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this function poorly named?
Its called band average but defaults to median... but it can also process min , max, quantile ect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes, yes it is then. I'll have to think about that one
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be called test_base or should base be called analysis?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remote: warning: File examples/data/acoustics/RBW_6661_20240601_053114.wav is 87.89 MB; this is larger than GitHub's recommended maximum file size of 50.00 MB
Any way we could get this a little smaller and maintain a good example?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remote: warning: File examples/data/acoustics/6247.230204150508.wav is 54.93 MB; this is larger than GitHub's recommended maximum file size of 50.00 MB
Any way we could get this a little smaller and maintain a good example?
mhkit/acoustics/io.py
Outdated
list_key = f.read(4) | ||
|
||
# Skip metadata if it exits (don't know how to parse) | ||
if "LIST" in list_key.decode(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do the files we are testing have have "LIST" in them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LIST is an optional header keyword in a .wav file, if I remember right. I've only used Ocean Sonics and Sountrap hydrophones, and of those two just the former has a LIST keyword.
mhkit/acoustics/io.py
Outdated
|
||
# Read data using scipy cause that's easier (will auto drop as int16 or int32) | ||
fs, raw = wavfile.read(filename) | ||
length = raw.shape[0] // fs # length of recording in seconds |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should check that the number of samples is larger than the sampling frequency otherwise this integer division will return 0 and throw unclear errors because the user's sample time is less than 1s.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, it shouldn't matter the total time segment is less than 1 second
mhkit/acoustics/base.py
Outdated
|
||
# Use dolfyn PSD | ||
binner = VelBinner(n_bin=win, fs=fs, n_fft=win) | ||
# Always 50% overlap if numbers reshape perfectly |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure about this comment. I had to look this up but here is my understanding:
- Overlapping refers to the technique where consecutive windows share some portion of their data.
So if I create a 10s dataset with 100 samples with no overlap this should return a len of 10.
If I use the same dataset with a 50% overlap I would expect this to return 19 values, however, only 10 samples are being return indicating to me there is no overlap.
Am I misunderstanding the use of overlapping here or missing something?
# Create some sample pressure data
time = np.arange(0, 10, 0.1)
data = np.sin(time)
pressure = xr.DataArray(data, coords=[time], dims=["time"], attrs={"units": "Pa", "fs": 100})
fs = len(time)
window = 0.1 # seconds
win_samples = int(window * fs)
# Calculate expected number of segments
overlap = 0.5
step = int(win_samples * (1 - overlap))
expected_segments = (len(pressure) - win_samples) // step + 1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you'd have 19 segments, from which you're calculating periodograms for each. Segments and windows are different concepts.
I'm not an expert in fast fourier transforms, but here is the original document for Welch's method for power spectra, which is the mathematical explanation for that dolfyn function: https://ieeexplore.ieee.org/abstract/document/1161901
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay if it should be 19 then I don't think the windowing in the function is working correctly.
Can you review this test?
def test_sound_pressure_spectral_density(self):
"""
Test sound pressure spectral density calculation.
"""
# Create some sample pressure data
time = np.arange(0, 10, 0.1)
data = np.sin(time)
pressure = xr.DataArray(data, coords=[time], dims=["time"], attrs={"units": "Pa", "fs": 100})
# Adjust window size to get multiple segments
fs = 100
window = 0.1 # seconds
win_samples = int(window * fs)
# Run the spectral density function
spsd = acoustics.sound_pressure_spectral_density(pressure, fs=fs, window=window)
# Assert that output is an xarray DataArray with expected dimensions
self.assertIsInstance(spsd, xr.DataArray)
self.assertIn("freq", spsd.dims)
self.assertIn("time", spsd.dims)
self.assertEqual(spsd.attrs["units"], "Pa^2/Hz")
self.assertEqual(spsd.attrs["window"], f"{window}s")
# Calculate expected number of segments
overlap = 0.5
step = int(win_samples * (1 - overlap))
expected_segments = (len(pressure) - win_samples) // step + 1
self.assertEqual(spsd.shape[0], expected_segments)
======================================================================
FAIL: test_sound_pressure_spectral_density (main.TestAnalysis.test_sound_pressure_spectral_density)
Test sound pressure spectral density calculation.Traceback (most recent call last):
File "C:\Users\sterl\Codes\MHKiT-Python\mhkit\tests\acoustics\test_analysis.py", line 196, in > test_sound_pressure_spectral_density
self.assertEqual(spsd.shape[0], expected_segments)
AssertionError: 10 != 19
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, segments in this case are not "windows". It also looks like I mispoke it seems. It's been a long time since I've dug into it, but "windowing" is the term used to describe smoothing functions (or filters) of a specified length that are applied atop of the signal. The 50% part comes in based on the amount of data the window, i.e. smoothing function, covers.
I'm digging into textbooks at this point to try and explain it to you.
For these functions the word "window" is equivalent to "bin size" or "ensemble", or the number of points to discretize.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I replaced all the instances of "window" with "bin" and updated the docstring to try and avoid confusion
Acoustics module
@ssolson Git wouldn't let me push to your branch, so I merged your PR and then committed my changes. A couple "dry" changes, and some other edits that are relatively minor but better for readability. |
…ython into acoustics_module
@jmcvey3 could you take a look at my comments above? |
…ython into acoustics_module
Functions and example notebook for reading passive acoustics data from hydrophones.