Add new spatial foundational types #219

Merged
merged 24 commits into from
Sep 26, 2024

Collaborator

@jp-dark jp-dark commented Sep 13, 2024

New classes:

  • SpatialDataFrame: Abstract base class for spatial data frames (point clouds and geometry/polygon data frames).
    • PointCloud: A spatial data frame for point data.
    • GeometryDataFrame: A spatial data frame for polygon data.
  • MultiscaleImage: An image class that can contain multiple image levels.
    • ImageProperties: A protocol with the required properties for image levels.
  • SpatialRead: New dataclass for returning read data with coordinate space information.
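
To illustrate what "a protocol with the required properties" can look like in Python, here is a minimal sketch built on `typing.Protocol`. Everything in it is an assumption made for illustration: the names `LevelProperties`, `InMemoryLevel`, `pixel_count`, and the `width`/`height` properties are hypothetical and are not taken from this PR.

```python
from dataclasses import dataclass
from typing import Protocol


class LevelProperties(Protocol):
    """Hypothetical stand-in for an image-level protocol (names are assumptions)."""

    @property
    def width(self) -> int: ...

    @property
    def height(self) -> int: ...


@dataclass(frozen=True)
class InMemoryLevel:
    """Satisfies LevelProperties structurally, without inheriting from it."""

    width: int
    height: int


def pixel_count(level: LevelProperties) -> int:
    # Accepts any object that exposes the required properties.
    return level.width * level.height


level = InMemoryLevel(width=4, height=3)
total = pixel_count(level)
```

The point of a protocol here is that an implementation can describe its image levels with whatever class it likes, as long as the required properties are present.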

Note: This PR is ready for review, but will be merged after PR #218

Member

@bkmartinjr bkmartinjr left a comment

Is there a draft set of changes to the abstract spec detailing the new foundational types (e.g., PointCloud)?

(similar question for the composed types introduced or modified in #220 and #221)

In particular, the SpatialDataFrame introduces some significant new concepts, such as an object that is indexed by something other than a joinid. We need to document these concepts and nail down terminology.

Member

@bkmartinjr bkmartinjr left a comment

No concerns with code, but would very much like to see a companion PR that includes a draft revision of the design (aka abstract) spec.

Base automatically changed from dark/coordinate-space to main September 18, 2024 20:50
- Add new MultiscaleImage
- Add new SpatialDataFrame abstract base class
- Add PointCloud subclass of the SpatialDataFrame
- Add GeometryDataFrame subclass of the SpatialDataFrame
- Add shapely as a dependency
* Update docstrings
* Add ValueError error messages in `SpatialRead`
* Rename method `read_level`->`read_region`
Collaborator Author

jp-dark commented Sep 18, 2024

> No concerns with code, but would very much like to see a companion PR that includes a draft revision of the design (aka abstract) spec.

I added a new (WIP) PR #223 here. It should be moving out of draft form soon.

@jp-dark jp-dark marked this pull request as ready for review September 18, 2024 21:19
The individual implementations may find a shared base class useful, but
it doesn't need to be included in the abstract specification.
@ivirshup ivirshup left a comment

Should SpatialDataFrame and PointCloud have a coordinate_space property? MultiscaleImage has this.

To me, it's more important for the "coordinate" types since they can have units other than pixels.

Member

@aaronwolen aaronwolen left a comment

Looks great! I have a few specific comments, questions, and suggestions.

Also wondering if we could/should hold off on adding the GeometryDataFrame class? Unless I'm mistaken, I don't believe it's needed for the spot-based assays we're targeting for the initial release? If we do leave it in, I think the docstrings for PointCloud and GeometryDataFrame should make it clearer how they differ from SOMADataFrame and from each other.

uri: str,
*,
schema: pa.Schema,
index_column_names: Sequence[str] = (options.SOMA_JOINID, "x", "y"),
Member

In the context of point clouds, do you think it makes sense to always index by the defined axes (and joinid) to facilitate efficient spatial queries? User-defined indices make sense for SOMADataFrames, which are more general-purpose, but for PointClouds, it seems like indexing by the axes is the most common use case and what distinguishes a PointCloud from a DataFrame.

Collaborator Author

The spatial axes need to be index columns (see the docstring), but this is still an argument, so there is flexibility in:

  1. The order of the index columns.
  2. The option to include or exclude soma_joinid (or other columns) as index columns.
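
The constraint described in this reply can be sketched as a small validation step. This is only an illustration under stated assumptions: `check_index_columns` is a hypothetical helper, the default axis names `("x", "y")` mirror the default shown in the diff above, and the error text is invented.

```python
from typing import Sequence, Tuple

SOMA_JOINID = "soma_joinid"  # column name mentioned in the discussion above


def check_index_columns(
    index_column_names: Sequence[str],
    axis_names: Sequence[str] = ("x", "y"),
) -> Tuple[str, ...]:
    """Return the index columns as a tuple, or raise if a spatial axis is missing.

    Hypothetical helper: the name and the error text are assumptions.
    """
    missing = [axis for axis in axis_names if axis not in index_column_names]
    if missing:
        raise ValueError(f"spatial axes must be index columns; missing: {missing}")
    return tuple(index_column_names)


# Any order is accepted, with or without soma_joinid.
with_joinid = check_index_columns((SOMA_JOINID, "x", "y"))
axes_only = check_index_columns(("y", "x"))
```

Under this reading, the argument stays user-configurable, but a call that omits a spatial axis fails early with a `ValueError`.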

@property
@abc.abstractmethod
def coordinate_space(self) -> Optional[coordinates.CoordinateSpace]:
Member

Should create() have an argument to define the coordinate space at create time?

Collaborator Author

It's generated from the axis names. We could switch the parameter to coordinate space instead. How would you feel about leaving that as a question for alpha users?
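
The two options in this exchange can be compared with a short sketch. The `Axis` and `CoordinateSpace` classes below are local stand-ins written for this example, not the somacore classes, and `space_from_axis_names` is a hypothetical helper; the unit string is likewise only an example value.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Axis:
    """Local stand-in for an axis description (not the somacore class)."""

    name: str
    unit: Optional[str] = None


@dataclass(frozen=True)
class CoordinateSpace:
    """Local stand-in for a coordinate space: an ordered tuple of axes."""

    axes: Tuple[Axis, ...]

    @property
    def axis_names(self) -> Tuple[str, ...]:
        return tuple(axis.name for axis in self.axes)


def space_from_axis_names(axis_names: Sequence[str]) -> CoordinateSpace:
    # Option 1: derive the space from the axis names alone (no units yet).
    if len(set(axis_names)) != len(axis_names):
        raise ValueError("axis names must be unique")
    return CoordinateSpace(tuple(Axis(name) for name in axis_names))


default_space = space_from_axis_names(("x", "y"))

# Option 2: the caller supplies a full coordinate space, which can carry units.
explicit_space = CoordinateSpace((Axis("x", "micrometer"), Axis("y", "micrometer")))
```

Deriving the space from names keeps `create()` simple, while accepting a coordinate space directly would let units be set at creation time, which is the trade-off left open for alpha users.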

raise NotImplementedError()


class GeometryDataFrame(base.SOMAObject, metaclass=abc.ABCMeta):
Member

I think we need to align the naming of the PointCloud and GeometryDataFrame classes. Both are essentially specialized DataFrames, so the <specialization>DataFrame naming scheme makes sense, although PointCloudDataFrame is a bit of a mouthful. I don't have strong feelings about which approach we use, but I do think we should be consistent unless there's a good reason not to.

Comment on lines +280 to +283
index_column_names: Sequence[str] = (
options.SOMA_JOINID,
options.SOMA_GEOMETRY,
),
Member

Here too I'm not sure if we need to allow users to define the index columns on these specialized dataframes.

Collaborator Author

Same as above. The user needs to include the geometry as an index column. An error is thrown if they do not. But there is flexibility in the other columns they may want to include.

* Expand PointCloud doc string

* Expand GeometryDataFrame docstring

* Double back ticks
Member

@aaronwolen aaronwolen left a comment

🚀

@jp-dark jp-dark merged commit 3d9e34f into main Sep 26, 2024
6 checks passed
@jp-dark jp-dark deleted the dark/spatial-datatypes branch September 26, 2024 16:28