Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proof of concept of "buffer_description_api" and xarray reference API bridge #1513

Open
wants to merge 28 commits into
base: master
Choose a base branch
from

Conversation

samuelgarcia
Copy link
Contributor

Given the idea of @bendichter and this gist

Here a very very early stage of a very very proof of concept of the "buffer_description_api" for analogsignal chunk.

The idea/goal is:

  • have a simple buffer description for analogsignal for binary or hdf5 cases (maybe more)
  • this description can be exported to json
  • this description can be easily transformed to the xarray/zarr reference external API
  • make so file cloud ready for streaming chunk of traces

See also kerchunk

rfs = dict()
rfs["version"] = 1
rfs["refs"] = dict()
rfs["refs"][".zgroup"] = json.dumps(dict(zarr_format=2))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need the dumps command anymore

neo/rawio/xarray_utils.py Outdated Show resolved Hide resolved
@bendichter
Copy link
Contributor

Looks good!

@apdavison apdavison added this to the future milestone Jul 26, 2024
…_buffer_description_api=True

This should also solve the memmap and memmory leak problem.
doc/source/rawio.rst Outdated Show resolved Hide resolved
Copy link
Contributor

@zm711 zm711 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty good to me.

doc/source/rawio.rst Outdated Show resolved Hide resolved
doc/source/rawio.rst Outdated Show resolved Hide resolved
doc/source/rawio.rst Outdated Show resolved Hide resolved
doc/source/rawio.rst Outdated Show resolved Hide resolved
doc/source/rawio.rst Outdated Show resolved Hide resolved
doc/source/rawio.rst Outdated Show resolved Hide resolved
doc/source/rawio.rst Outdated Show resolved Hide resolved
doc/source/rawio.rst Outdated Show resolved Hide resolved
doc/source/rawio.rst Outdated Show resolved Hide resolved
doc/source/rawio.rst Outdated Show resolved Hide resolved
Co-authored-by: Zach McKenzie <92116279+zm711@users.noreply.github.com>
Copy link
Contributor

@h-mayorquin h-mayorquin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a first reading. I think that this PR and the future documentation would benefit greatly if we have the schema of the buffer description somewhere. It does not have to be a formal schema (although that would be great) it could be just a description on the documentation or a python data class with types. Something that I can reference to see what should I expect to fill when I am doing a buffer like this.

For reading analog signals **neo.rawio** has 2 important concepts:

1. The **signal_stream** : it is a group of channels that can be read together using :func:`get_analog_signal_chunk()`.
This group of channels is guaranteed to have the same sampling rate, and the same duration per segment.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that we have logical channels, this should be same units as well, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the moment not yet. The API do not enforce and ensure this.
See this https://github.com/NeuralEnsemble/python-neo/blob/master/neo/rawio/baserawio.py#L118
Ideally units should be add but some IO are mixing maybe the units in the same stream.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add it as an ideal? Or would you rather first do the changes and then change it here?

I can close this for example:

#1133

doc/source/rawio.rst Outdated Show resolved Hide resolved
doc/source/rawio.rst Outdated Show resolved Hide resolved
doc/source/rawio.rst Show resolved Hide resolved
@@ -115,7 +116,7 @@ def _parse_header(self):
head_offset = info["sections"]["DataSection"]["uBlockIndex"] * BLOCKSIZE
totalsize = info["sections"]["DataSection"]["llNumEntries"]

self._raw_data = np.memmap(self.filename, dtype=sig_dtype, mode="r", shape=(totalsize,), offset=head_offset)
# self._raw_data = np.memmap(self.filename, dtype=sig_dtype, mode="r", shape=(totalsize,), offset=head_offset)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be commented?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You lean removed ?
Done

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, yeah, we can get stuff from the diff.

@@ -149,6 +151,9 @@ class BaseRawIO:

rawmode = None # one key from possible_raw_modes

# If true then
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is not clear, if what is True?

@@ -1284,6 +1298,7 @@ def _get_signal_size(self, block_index: int, seg_index: int, stream_index: int):

All channels indexed must have the same size and t_start.
"""
# must NOT be implemented if has_buffer_description_api=True
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will be ignored?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No this is done in the BaseRawWithBufferApiIO sub class


def get_memmap_chunk_from_opened_file(fid, num_channels, start, stop, dtype, file_offset=0):
"""
Utility fonction to get a chunk as a memmap array directly from an opened file.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo fonction

Copy link
Contributor

@zm711 zm711 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

couple more comments from me too.

doc/source/rawio.rst Show resolved Hide resolved
self._buffer_descriptions[0][seg_index][buffer_id] = {
"type" : "raw",
"file_path" : str(self.filename),
"dtype" : str(sig_dtype),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this for jsonification? rather than store a np.dtype()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a data layout is split with a structure theorically we could handle it in zarr giving the list of chunks.
This could be done a the futur but this will not be in the actual PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

str(sig_dtype) is for jsonification yes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are some without this safety step. So if we really want to be safe we should do this for all uses of dtype and Path no?

neo/rawio/baserawio.py Outdated Show resolved Hide resolved
for seg_index in self._hdf5_analogsignal_buffers[block_index].keys():
for buffer_id, h5_file in self._hdf5_analogsignal_buffers[block_index][seg_index].items():
h5_file.close()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no del here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oups yes.
tahnks

self._buffer_descriptions[0][0][buffer_id] = {
"type" : "raw",
"file_path" : binary_filename,
"dtype" : sig_dtype,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here you didn't stringify like above. Why?

@samuelgarcia
Copy link
Contributor Author

Merci beaucoup @zm711 and @h-mayorquin for the review

samuelgarcia and others added 3 commits October 21, 2024 14:09
Co-authored-by: Heberto Mayorquin <h.mayorquin@gmail.com>
Co-authored-by: Zach McKenzie <92116279+zm711@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants