Spatialdata-Docs-Local Skill
Comprehensive assistance for development with the spatialdata framework, generated from the official spatialdata documentation.
When to Use This Skill
This skill should be triggered when:
- Working with the spatialdata framework (SpatialData objects, SpatialElements, AnnData tables)
- Asking about spatialdata features or APIs
- Implementing solutions on top of spatialdata
- Debugging spatialdata code
- Learning spatialdata best practices
Quick Reference
Common Patterns
Pattern 1: Working with annotations in SpatialData

Introduction

The spatialdata framework supports both the representation of SpatialElements (images, labels, points, shapes) and of annotations for these elements. As we will explore in this notebook, some types of SpatialElements can contain some annotations within themselves, but the general approach we take is to represent SpatialElements and annotations in separate objects, which allows for more granular control and composability. For storing annotations we rely on the AnnData data structure, and we also refer to this object with the term table. In this notebook we will show the following:
- Store annotations in a SpatialElement: labels, shapes (circles, polygons/multipolygons), points, images
- Store annotations with AnnData tables: table metadata (annotation targets), retrieving annotations with get_values(), where to store the annotation
- Operations on tables: find which elements a table is annotating, change the annotation target of a table, construct a table annotating 0 or more SpatialElements, performing SQL-like joins, a note on joins, a use case for tables

Store annotations in a SpatialElement

We will consider an example dataset, blobs, which contains a copy for each type of SpatialElement we support.
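For quick reference, here is the loading step as a small, self-contained snippet (a sketch mirroring the call shown inline below; the printed names are those of the blobs example):

```python
# Load the example "blobs" dataset and inspect which elements and tables it contains.
from spatialdata.datasets import blobs

sdata = blobs()
print(sdata)               # textual overview of images, labels, points, shapes and tables
print(list(sdata.images))  # element names, e.g. ['blobs_image', 'blobs_multiscale_image']
print(list(sdata.tables))  # table names, e.g. ['table']
```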
from spatialdata.datasets import blobs

sdata = blobs()
sdata

SpatialData object with:
├── Images
│     ├── 'blobs_image': SpatialImage[cyx] (3, 512, 512)
│     └── 'blobs_multiscale_image': MultiscaleSpatialImage[cyx] (3, 512, 512), (3, 256, 256), (3, 128, 128)
├── Labels
│     ├── 'blobs_labels': SpatialImage[yx] (512, 512)
│     └── 'blobs_multiscale_labels': MultiscaleSpatialImage[yx] (512, 512), (256, 256), (128, 128)
├── Points
│     └── 'blobs_points': DataFrame with shape: (, 4) (2D points)
├── Shapes
│     ├── 'blobs_circles': GeoDataFrame shape: (5, 2) (2D shapes)
│     ├── 'blobs_multipolygons': GeoDataFrame shape: (2, 1) (2D shapes)
│     └── 'blobs_polygons': GeoDataFrame shape: (5, 1) (2D shapes)
└── Tables
      └── 'table': AnnData (26, 3)
with coordinate systems:
▸ 'global', with elements: blobs_image (Images), blobs_multiscale_image (Images), blobs_labels (Labels), blobs_multiscale_labels (Labels), blobs_points (Points), blobs_circles (Shapes), blobs_multipolygons (Shapes), blobs_polygons (Shapes)

Labels

Labels objects cannot contain annotations within themselves; instead, a table is needed for that. Let's see how to compute their indices. 2D and 3D labels have respectively dimensions ('y', 'x') and ('z', 'y', 'x'). Given a labels object, the instances are defined by all the pixels whose value is equal to a particular index. In order to identify the indices one can therefore compute the unique values in the labels object (for multiscale labels the largest resolution should be considered). This is conveniently done by the private function _get_unique_label_values_as_index(); in one of the next releases we will include this function as a special case of the get_values() function, discussed later in the notebook. Here we show how to use this function to identify the indices of the instances encoded by the labels object, and also how to manually compute this information.

import numpy as np
from spatialdata import get_element_instances

print(get_element_instances(sdata["blobs_multiscale_labels"]))
print(np.unique(sdata["blobs_multiscale_labels"]["scale0"]["image"].data.compute()))

Index([ 0,  1,  2,  3,  4,  5,  6,  8,  9, 10, 11, 12, 13, 15, 16, 17, 18, 19, 20, 22, 23, 24, 25, 26, 27, 29, 30], dtype='int16')
[ 0  1  2  3  4  5  6  8  9 10 11 12 13 15 16 17 18 19 20 22 23 24 25 26 27 29 30]

Shapes (circles, polygons/multipolygons)

Shapes are GeoDataFrame objects (a subclass of pandas DataFrame), where the geometric information is specified in the geometry column. Since they are dataframes, in particular they have an index, which can be accessed with

sdata["blobs_multipolygons"].index

RangeIndex(start=0, stop=2, step=1)

Circles require a column named radius, and the rows in the column geometry need to be of type shapely.Point. Polygons/multipolygons require the rows in the column geometry to be of type shapely.Polygon/shapely.MultiPolygon. These are the only requirements, and the extra columns can be used for annotation.

sdata["blobs_circles"]["my_value"] = [0, 0, 0, 0, 1]
sdata["blobs_circles"]

   geometry                  radius  my_value
0  POINT (291.062 197.065)   51      0
1  POINT (259.026 371.319)   51      0
2  POINT (194.973 204.414)   51      0
3  POINT (149.926 188.623)   51      0
4  POINT (369.422 258.900)   51      1

import spatialdata_plot  # noqa: F401

sdata.pl.render_shapes("blobs_circles", color="my_value").pl.show()
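To illustrate these requirements, here is a minimal sketch of building a circles element from scratch. It assumes ShapesModel.parse() accepts a GeoDataFrame whose geometry column holds shapely.Point objects and which has a radius column, as described above; my_label is a hypothetical extra annotation column.

```python
# Sketch: construct a circles Shapes element that satisfies the stated requirements.
import geopandas as gpd
from shapely.geometry import Point
from spatialdata.models import ShapesModel

gdf = gpd.GeoDataFrame(
    {
        "geometry": [Point(10.0, 20.0), Point(30.0, 40.0)],  # circle centers
        "radius": [5.0, 7.5],                                # required for circles
        "my_label": ["roi_a", "roi_b"],                      # extra columns store annotations
    }
)
circles = ShapesModel.parse(gdf)  # validates the geometry type and the radius column
```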
Points

Points are represented as "lazy" (using Dask) pandas dataframes. Similarly to the elements above, they have an index. The spatial information is specified by the x and y columns (for 2D points) or by the x, y and z columns (for 3D points). The rest of the columns can be used to store annotations. The blobs dataset already stores some annotations in those columns.

sdata["blobs_points"].compute()

       x    y   genes  instance_id
0     46  395  gene_b            9
1    334  224  gene_b            7
2    221  438  gene_b            3
3     44  356  gene_a            9
4    103   49  gene_b            4
..   ...  ...     ...          ...
195  381   92  gene_a            8
196  188  306  gene_b            5
197  368  447  gene_a            7
198   23  101  gene_a            6
199  144  159  gene_a            6
200 rows × 4 columns

sdata.pl.render_points(color="genes").pl.show()

Images

2D and 3D images have respectively dimensions ('c', 'y', 'x') and ('c', 'z', 'y', 'x'). Unlike the SpatialElements above, images do not contain information on instances and in particular they do not have an index. Nevertheless, the coordinates of the dimension c of images can be specified, thus giving a name to each channel. Channels can be accessed using the utility function get_channels(), which operates on both single-scale and multi-scale images.

from spatialdata.models import get_channels

get_channels(sdata["blobs_multiscale_image"])

[0, 1, 2]

The channels can also be accessed directly from the xarray.DataArray object. For instance, for a single-scale image we have

sdata["blobs_image"].c.values

array([0, 1, 2])

To set the channels we recommend using the parsers Image2DModel.parse() and Image3DModel.parse(). For instance

from spatialdata.models import Image2DModel

sdata["blobs_image"] = Image2DModel.parse(sdata["blobs_image"], c_coords=["r", "g", "b"])
get_channels(sdata["blobs_image"])

['r', 'g', 'b']

Store annotations with AnnData tables

Tables can annotate any SpatialElement that has an index, so they can annotate labels, shapes and points. Tables can also be specified without defining the annotation target; images do not have an index so they cannot be annotated by a table, but one could use a table without an annotation target to effectively annotate the channels of an image. Tables cannot annotate tables. An AnnData table offers a more powerful way to store annotations than the methods described before. The AnnData object can store sparse and dense matrices, multiple layers (i.e. variants) of these matrices, and can store annotations for the rows and columns of these matrices as tensors, dataframes and matrices of pairwise relationships. We refer to the anndata documentation (https://anndata.readthedocs.io/en/latest/) for a full description, and here we limit ourselves to the following graphical schematic (not reproduced here).
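As a reminder of the AnnData layout referenced above, here is a small self-contained sketch (matrix values, column names and layers are made up for illustration):

```python
# Sketch: an AnnData table with a main matrix X, per-row (obs) and per-column (var) annotations.
import numpy as np
import pandas as pd
from anndata import AnnData

adata = AnnData(
    X=np.random.default_rng(0).random((4, 2)),                   # dense matrix (sparse also works)
    obs=pd.DataFrame({"cell_type": ["a", "a", "b", "b"]}),       # row annotations
    var=pd.DataFrame(index=["channel_0_sum", "channel_1_sum"]),  # column annotations
    layers={"scaled": np.zeros((4, 2))},                         # alternative versions of X
)
print(adata)
```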
Table metadata (annotation targets)

Each table contains some metadata that defines the annotation target (if any) of the table. This metadata consists of:
- the region, region_key and instance_key triplet, stored in table.uns;
- two columns of the DataFrame stored in table.obs.

A table that doesn't annotate any element doesn't contain any of the above metadata. To define the annotation targets of a table, the metadata is specified as follows:
- region is either a string or a list of strings and describes the list of targets, i.e. the SpatialElements that the table annotates;
- region_key is a string: the name of a column of the dataframe table.obs which describes, for each row, which SpatialElement that row refers to;
- instance_key is a string: the name of a column of the dataframe table.obs which describes, for each row, the index of the particular annotated instance (inside the SpatialElement annotated by that row).

A few more details on the two dataframe columns. The column named after region_key must be of string or categorical type; it should nevertheless be categorical, and the framework will warn the user to convert it to categorical before performing computationally expensive operations. The column named after instance_key must be of numerical or string type. Finally, you may notice that the information contained in region is redundant (it can be computed as the list of unique values of the column named after region_key) and it may even get out of sync (for instance after subsetting the rows). For these reasons, in a future release we will simplify the metadata by dropping region and deriving it automatically instead; currently we maintain it for legacy reasons. Warning! Please note that we haven't mentioned the index of the table. In fact, we do not use the index of the table! The above information is summarized in the following graphical slide (not reproduced here).
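The sketch below shows one way to inspect and tidy this metadata. It assumes the triplet is stored under the uns key 'spatialdata_attrs' (the key visible in the table listings later in this pattern), with entries named region, region_key and instance_key:

```python
# Sketch: inspect the annotation-target metadata of a table and make the region column categorical.
import pandas as pd

attrs = sdata["table"].uns["spatialdata_attrs"]  # assumed: {'region': ..., 'region_key': ..., 'instance_key': ...}
print(attrs)

region_key = attrs["region_key"]
obs = sdata["table"].obs
if not isinstance(obs[region_key].dtype, pd.CategoricalDtype):
    # the framework recommends a categorical column for performance
    obs[region_key] = obs[region_key].astype("category")
```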
Retrieving annotations with get_values()

We provide a function get_values() that can be used to retrieve one or more columns of annotations for an element, independently of whether the annotation is stored in a table or in the element itself. Here is an example.

sdata["blobs_points"].compute()

       x    y   genes  instance_id
0     46  395  gene_b            9
1    334  224  gene_b            7
2    221  438  gene_b            3
3     44  356  gene_a            9
4    103   49  gene_b            4
..   ...  ...     ...          ...
195  381   92  gene_a            8
196  188  306  gene_b            5
197  368  447  gene_a            7
198   23  101  gene_a            6
199  144  159  gene_a            6
200 rows × 4 columns

from spatialdata import get_values

get_values(value_key="genes", element=sdata["blobs_points"])

      genes
0    gene_b
1    gene_b
2    gene_b
3    gene_a
4    gene_b
..      ...
195  gene_a
196  gene_b
197  gene_a
198  gene_a
199  gene_a
200 rows × 1 columns

sdata["table"].var_names

Index(['channel_0_sum', 'channel_1_sum', 'channel_2_sum'], dtype='object')

get_values(value_key="channel_0_sum", sdata=sdata, element_name="blobs_labels", table_name="table")

             channel_0_sum
instance_id
1              1309.369255
2              1535.995388
3               855.965478
4               614.497990
5               212.404587
6               482.633650
8               518.004680
9               258.186892
10              159.661750
11                3.438841
12               80.604196
13              155.678618
15              230.130425
16              150.663043
17              466.641687
18                2.271171
19                1.270513
20              118.423858
22               85.710077
23              285.418051
24               24.794733
25              220.373520
26                1.278647
27              113.662394
29              100.544497
30               59.460201

Where to store the annotation

Here are some recommendations on when to store the annotations in the SpatialElement and when in the AnnData table:
- we suggest using the AnnData table when the annotations are complex, require the use of dense or sparse matrices, or may need to be reused between elements;
- for small annotations, like storing the name of or some comments on a few regions of interest, annotations stored in the element are sometimes more handy;
- when a point element is large (upwards of 100 million rows), it may be worth, from a performance perspective, storing the annotation information directly in the point element instead of using an external AnnData table.

Operations on tables

Find which elements a table is annotating

Let's now see an example. Blobs contains one table. Let's see what it is annotating by looking at the region, region_key and instance_key metadata, which can be accessed using the utility function get_table_keys().

from spatialdata.models import get_table_keys

region, region_key, instance_key = get_table_keys(sdata["table"])
print(region, region_key, instance_key)

blobs_labels region instance_id

We can see that this information is present in sdata["table"].obs. Here region contains the names of the SpatialElements being annotated, region_key corresponds to the column in .obs stating, for each row, which SpatialElement is annotated by that row, and instance_key specifies the column stating, for each row, which index of the SpatialElement the row is annotating.

sdata["table"].obs.head()

   instance_id        region
1            1  blobs_labels
2            2  blobs_labels
3            3  blobs_labels
4            4  blobs_labels
5            5  blobs_labels

If we want to retrieve any of the three, there are three helper functions that allow for this, namely get_annotated_regions, get_region_key_column and get_instance_key_column. Each of these takes only the AnnData table as an argument. Below we show an example:

from spatialdata import SpatialData as sd

regions = sd.get_annotated_regions(sdata["table"])
print(regions)

blobs_labels

region_column = sd.get_region_key_column(sdata["table"])
print(region_column.head())

1    blobs_labels
2    blobs_labels
3    blobs_labels
4    blobs_labels
5    blobs_labels
Name: region, dtype: category
Categories (1, object): ['blobs_labels']

Change the annotation target of table

We have a helper function, set_table_annotates_spatialelement, to change the metadata regarding the annotation target of a table in a SpatialData object. This function takes as arguments the table_name, region and, optionally, the region_key and instance_key. The latter two don't necessarily have to be specified if the table is already annotating a SpatialElement; the current values will be reused if not specified. Any of the arguments specified must be present at their respective location in the SpatialData object or table. Here we will create a new circles element, with a circle for each instance of the labels element, and we will make the table annotate this new element. Let's plot the labels with their annotations, set the table to annotate the new element and then plot that as well.

sdata.table.var_names

Index(['channel_0_sum', 'channel_1_sum', 'channel_2_sum'], dtype='object')

sdata.pl.render_labels("blobs_labels", color="channel_0_sum", fill_alpha=1.0).pl.show()

from spatialdata import to_circles

sdata["labels_as_circles"] = to_circles(sdata["blobs_labels"])
sdata["table"].obs["region"] = "labels_as_circles"
# no need to reassign the instance_key column 'instance_id' since the instances are the same as for labels
sdata.set_table_annotates_spatialelement("table", region="labels_as_circles")
sdata.pl.render_shapes("labels_as_circles", color="channel_0_sum").pl.show()

Construct a table annotating 0 SpatialElements

Tables in SpatialData are AnnData tables. Creating a table that does not annotate a SpatialElement is as simple as constructing an AnnData table. In this case the table should not contain the region, region_key and instance_key metadata. Here is an example of a table storing a dummy codebook:

from anndata import AnnData
from spatialdata.models import TableModel

codebook_table = AnnData(obs={"Gene": ["Gene1", "Gene2", "Gene3"], "barcode": ["03210", "23013", "30121"]})
# We don't specify arguments related to metadata as we don't annotate a SpatialElement.
sdata_table = TableModel.parse(codebook_table)
sdata["codebook"] = sdata_table
sdata["codebook"].obs

    Gene barcode
0  Gene1   03210
1  Gene2   23013
2  Gene3   30121

Construct a table annotating 1 or more SpatialElements

Now let us create a table that annotates multiple SpatialElements. Note that the order of the indices does not have to match the order of the indices in the SpatialElement; to showcase this we perform a permutation of the indices. Also, the dtype of the index column of the SpatialElement must match the dtype of the instance_key column in the table; if this is not the case, an error will be raised when trying to add the table to the SpatialData object. Lastly, not every index of the SpatialElement has to be present in the instance_key column of the table, and vice versa. We will later show functions to deal with these cases.
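Before adding such a table, it can help to align the dtypes explicitly. Below is a small sketch; here table refers to the AnnData constructed in the example that follows, and blobs_polygons is one of its annotation targets:

```python
# Sketch: cast the instance_key column to the dtype of the annotated element's index,
# so that adding the table to the SpatialData object does not fail on a dtype mismatch.
element_index_dtype = sdata["blobs_polygons"].index.dtype
if table.obs["instance_id"].dtype != element_index_dtype:
    table.obs["instance_id"] = table.obs["instance_id"].astype(element_index_dtype)
```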
polygon_index = list(sdata.shapes["blobs_polygons"].index)
# We have to do a compute here as points are lazily loaded using dask.
point_index = list(sdata["blobs_points"].index.compute())
region_column = ["blobs_polygons"] * len(polygon_index) + ["blobs_points"] * len(point_index)
instance_id_column = polygon_index + point_index

import numpy as np

RNG = np.random.default_rng()
table = AnnData(
    X=np.zeros((len(region_column), 1)),
    obs={"region": region_column, "instance_id": instance_id_column},
)
table = table[RNG.permutation(table.obs.index), :].copy()
# Now we have to specify all 3 annotation metadata fields.
sdata_table = TableModel.parse(
    table, region=["blobs_polygons", "blobs_points"], region_key="region", instance_key="instance_id"
)
# When adding the table now, it is validated for presence of annotation targets in the sdata object.
sdata["annotations"] = sdata_table

Performing SQL-like joins

In order to retrieve matching and non-matching parts of a SpatialElement and the annotating tables, we can perform SQL-like join operations on the table. For this we have the function join_spatialelement_table. It takes as arguments:
- the SpatialData object;
- spatial_element_names, as either a str or a list of str;
- the table name, as a str;
- the parameter how, which indicates what kind of SQL-like operation to perform (left, left_exclusive, inner, right or right_exclusive);
- the match_rows argument, which you can use to match either to the indices of the SpatialElement (left) or to the instance_key column of the table (right); the default here is no, i.e. no matching.

It is also possible to avoid passing the SpatialData object and directly pass a list of SpatialElements and a list of their names; please see the documentation for all the details. Let us now showcase the function by first removing some indices from blobs_polygons and then performing a join.

from spatialdata import join_spatialelement_table

# This leaves the element with 3 shapes
sdata["blobs_polygons"] = sdata["blobs_polygons"][:3]
# We can now do a join with one spatial element
element_dict, table = join_spatialelement_table(
    sdata=sdata, spatial_element_names="blobs_polygons", table_name="annotations", how="left"
)
print(element_dict["blobs_polygons"])
table.obs

                                            geometry
0  POLYGON ((340.197 258.214, 316.177 197.065, 29...
1  POLYGON ((284.141 420.454, 267.249 371.319, 25...
2  POLYGON ((203.195 229.528, 285.506 204.414, 19...

           region  instance_id
1  blobs_polygons            1
2  blobs_polygons            2
0  blobs_polygons            0

Above we see that the table only contains those annotations corresponding to shapes that are still in blobs_polygons. The element_dict only contains SpatialElements used in the join. Let us now repeat the join, but with the table rows matching the indices of blobs_polygons.

element_dict, table = join_spatialelement_table(
    sdata=sdata, spatial_element_names="blobs_polygons", table_name="annotations", how="left", match_rows="left"
)
print(element_dict["blobs_polygons"])
table.obs

                                            geometry
0  POLYGON ((340.197 258.214, 316.177 197.065, 29...
1  POLYGON ((284.141 420.454, 267.249 371.319, 25...
2  POLYGON ((203.195 229.528, 285.506 204.414, 19...

           region  instance_id
0  blobs_polygons            0
1  blobs_polygons            1
2  blobs_polygons            2
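The other how modes follow the same call pattern; for instance, here is a sketch of an inner join (keeping only the instances present in both the element and the table), using the same names as above:

```python
# Sketch: an inner join keeps only instances present in both the element and the table.
element_dict, inner_table = join_spatialelement_table(
    sdata=sdata,
    spatial_element_names="blobs_polygons",
    table_name="annotations",
    how="inner",
)
print(inner_table.obs)
```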
Let us now add the filtered annotations back to the SpatialData object. Since the new table is a subset of the original one, the region value may now have gone out of sync. We provide a convenience function that recomputes the correct value for region. When, in a future release, we remove the use of region, this extra step will not be needed.

table = sd.update_annotated_regions_metadata(table)
sdata["filtered_annotations_blobs_polygons"] = table

Lastly, we can also do the join operation on multiple SpatialElements at the same time.

element_dict, table = join_spatialelement_table(
    sdata=sdata,
    spatial_element_names=["blobs_polygons", "blobs_points"],
    table_name="annotations",
    how="left",
    match_rows="left",
)
sdata["multi_filtered_table"] = table
table.obs

             region  instance_id
0    blobs_polygons            0
1    blobs_polygons            1
2    blobs_polygons            2
5      blobs_points            0
6      blobs_points            1
..              ...          ...
200    blobs_points          195
201    blobs_points          196
202    blobs_points          197
203    blobs_points          198
204    blobs_points          199
203 rows × 2 columns

from pprint import pprint

# This tells us which tables we have in the SpatialData object
pprint(sdata.tables)

{'annotations': AnnData object with n_obs × n_vars = 205 × 1
     obs: 'region', 'instance_id'
     uns: 'spatialdata_attrs',
 'codebook': AnnData object with n_obs × n_vars = 3 × 0
     obs: 'Gene', 'barcode',
 'filtered_annotations_blobs_polygons': AnnData object with n_obs × n_vars = 3 × 1
     obs: 'region', 'instance_id'
     uns: 'spatialdata_attrs',
 'multi_filtered_table': AnnData object with n_obs × n_vars = 203 × 1
     obs: 'region', 'instance_id'
     uns: 'spatialdata_attrs',
 'table': AnnData object with n_obs × n_vars = 26 × 3
     obs: 'instance_id', 'region'
     uns: 'spatialdata_attrs'}

So far we have shown that you can do join operations when the elements and the table are both in a SpatialData object. However, you can also perform this operation on SpatialElements / tables outside SpatialData objects:

element_dict, table = join_spatialelement_table(
    spatial_element_names=["blobs_polygons", "blobs_points"],
    spatial_elements=[sdata["blobs_polygons"], sdata["blobs_points"]],
    table=sdata["annotations"],
    how="left",
    match_rows="left",
)
sdata["multi_filtered_table"] = table
table.obs

             region  instance_id
0    blobs_polygons            0
1    blobs_polygons            1
2    blobs_polygons            2
5      blobs_points            0
6      blobs_points            1
..              ...          ...
200    blobs_points          195
201    blobs_points          196
202    blobs_points          197
203    blobs_points          198
204    blobs_points          199
203 rows × 2 columns

A note on joins

For joins on Shapes and Points elements, any type of join and any kind of matching are supported. For Labels elements, however, only the left join is supported, and only no and left are supported for the argument match_rows. This is because for Labels the SQL-like join behaviour would otherwise be complex to implement, as masking out particular labels can be an expensive operation; we also did not foresee a use case for this. In case you do have a use case, please get in touch with us by opening an issue on GitHub or via our Discourse forum.

A use case for table

We will now end the presentation on annotations with a high-level discussion of a use case that could be addressed with our framework. Suppose you have a microscopy image showing cells and you derive two cell segmentation masks from it, using two different algorithms.
The segmentation masks may be similar but not identical: some cells may greatly overlap, but others may be present in one segmentation and not in the other. Also, you may be interested in storing some annotations for the cells of the two segmentation masks, such as gene expression and image features. You may want to compare the two segmentation masks based on spatial overlap and match/subset the tables to the common instances. This use case can be addressed using SOPA, a tool relying on the spatialdata infrastructure. In particular, the use case described above requires the following steps, available in spatialdata and/or in SOPA:
- store the microscopy image and the two segmentation masks (labels objects)
- store the gene expression and image feature information for each of the two segmentation masks (implemented in SOPA)
- convert the segmentation masks to shapes elements (GeoDataFrame objects of polygons/multipolygons)
- change the annotation target of the tables from the labels to the newly created shapes elements (implemented in SOPA)
- match the polygonal segmentation masks based on spatial overlap using the geopandas APIs (a sketch is shown below)
- filter/match the tables with the new set of common cells using the spatialdata join operations
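For the spatial-overlap step, here is a minimal sketch using the geopandas API; seg_a and seg_b are hypothetical GeoDataFrames of cell polygons obtained from the two segmentation masks:

```python
# Sketch: pair cells from two polygonal segmentations by spatial overlap with geopandas.
import geopandas as gpd

# seg_a and seg_b are hypothetical GeoDataFrames with one polygon per cell.
matches = gpd.sjoin(seg_a, seg_b, how="inner", predicate="intersects")
# Each row of 'matches' pairs a cell of seg_a with an overlapping cell of seg_b; the paired
# indices can then be used to subset the corresponding tables with the spatialdata joins.
print(matches.head())
```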
Pattern 4: .md .pdf Models Contents Image2DModel Image2DModel.parse() Image2DModel.validate() Image3DModel Image3DModel.parse() Image3DModel.validate() Labels2DModel Labels2DModel.parse() Labels2DModel.validate() Labels3DModel Labels3DModel.parse() Labels3DModel.validate() ShapesModel ShapesModel.validate() ShapesModel.validate_shapes_not_mixed_types() ShapesModel.parse() PointsModel PointsModel.validate() PointsModel.parse() TableModel TableModel.parse() TableModel.validate() Models# The elements (building-blocks) that constitute SpatialData. class spatialdata.models.Image2DModel(*args, **kwargs)# Bases: RasterSchema classmethod parse(data, dims=None, c_coords=None, transformations=None, scale_factors=None, method=None, chunks=None, **kwargs)# Validate (or parse) raster data. Parameters: data (ndarray[tuple[Any, ...], dtype[floating[Any]]] | DataArray | Array) – Data to validate (or parse). The shape of the data should be c(z)yx for 2D (3D) images and (z)yx for 2D ( 3D) labels. If you have a 2D image with shape yx, you can use numpy.expand_dims() (or an equivalent function) to add a channel dimension. dims (Sequence[str] | None (default: None)) – Dimensions of the data (e.g. [‘c’, ‘y’, ‘x’] for 2D image data). If the data is a xarray.DataArray, the dimensions can also be inferred from the data. If the dimensions are not in the order (c)(z)yx, the data will be transposed to match the order. c_coords (str | list[str] | None) – Channel names of image data. Must be equal to the length of dimension ‘c’. Only supported for Image models. transformations (dict[str, BaseTransformation] | None (default: None)) – Dictionary of transformations to apply to the data. The key is the name of the target coordinate system, the value is the transformation to apply. By default, a single Identity transformation mapping to the "global" coordinate system is applied. scale_factors (Sequence[dict[str, int] | int] | None (default: None)) – Scale factors to apply to construct a multiscale image (datatree.DataTree). If None, a xarray.DataArray is returned instead. Importantly, each scale factor is relative to the previous scale factor. For example, if the scale factors are [2, 2, 2], the returned multiscale image will have 4 scales. The original image and then the 2x, 4x and 8x downsampled images. method (Methods | None (default: None)) – Method to use for multiscale downsampling (default is 'nearest'). Please refer to multiscale_spatial_image.to_multiscale for details.n Note (advanced): the default choice ('nearest') will keep the original scale lazy and compute each downscaled version. On the other hand 'xarray_coarsen' will compute each scale lazily (this implies that each scale will be recomputed each time it is accessed unless .persist() is manually called to cache the intermediate results). Please refer direclty to the source code of to_multiscale() in the multiscale-spatial-image for precise information on how this is handled. chunks (int | tuple[int, ...] | tuple[tuple[int, ...], ...] | Mapping[Any, None | int | tuple[int, ...]] | None (default: None)) – Chunks to use for dask array. kwargs (Any) – Additional arguments for to_spatial_image(). In particular the c_coords kwargs argument (an iterable) can be used to set the channel coordinates for image data. c_coords is not available for labels data as labels do not have channels. 
Return type: DataArray | DataTree Returns: : xarray.DataArray or datatree.DataTree Notes RGB images If you have an image with 3 or 4 channels and you want to interpret it as an RGB or RGB(A) image, you can use the c_coords argument to specify the channel coordinates as ["r", "g", "b"] or ["r", "g", "b", "a"]. You can also pass the rgb argument to kwargs to automatically set the c_coords to ["r", "g", "b"]. Please refer to to_spatial_image() for more information. Note: if you set rgb=None in kwargs, 3-4 channel images will be interpreted automatically as RGB(A) images. Setting axes / dims In case of the data being a numpy or dask array, there are no named axes yet. In this case, we first try to use the dimensions specified by the user in the dims argument of .parse. These dimensions are used to potentially transpose the data to match the order (c)(z)yx. See the description of the dims argument above. If dims is not specified, the dims are set to (c)(z)yx, dependent on the number of dimensions of the data. validate(data)# Validate data. Parameters: data (Any) – Data to validate. Raises: ValueError – If data is not valid. Return type: None class spatialdata.models.Image3DModel(*args, **kwargs)# Bases: RasterSchema classmethod parse(data, dims=None, c_coords=None, transformations=None, scale_factors=None, method=None, chunks=None, **kwargs)# Validate (or parse) raster data. Parameters: data (ndarray[tuple[Any, ...], dtype[floating[Any]]] | DataArray | Array) – Data to validate (or parse). The shape of the data should be c(z)yx for 2D (3D) images and (z)yx for 2D ( 3D) labels. If you have a 2D image with shape yx, you can use numpy.expand_dims() (or an equivalent function) to add a channel dimension. dims (Sequence[str] | None (default: None)) – Dimensions of the data (e.g. [‘c’, ‘y’, ‘x’] for 2D image data). If the data is a xarray.DataArray, the dimensions can also be inferred from the data. If the dimensions are not in the order (c)(z)yx, the data will be transposed to match the order. c_coords (str | list[str] | None) – Channel names of image data. Must be equal to the length of dimension ‘c’. Only supported for Image models. transformations (dict[str, BaseTransformation] | None (default: None)) – Dictionary of transformations to apply to the data. The key is the name of the target coordinate system, the value is the transformation to apply. By default, a single Identity transformation mapping to the "global" coordinate system is applied. scale_factors (Sequence[dict[str, int] | int] | None (default: None)) – Scale factors to apply to construct a multiscale image (datatree.DataTree). If None, a xarray.DataArray is returned instead. Importantly, each scale factor is relative to the previous scale factor. For example, if the scale factors are [2, 2, 2], the returned multiscale image will have 4 scales. The original image and then the 2x, 4x and 8x downsampled images. method (Methods | None (default: None)) – Method to use for multiscale downsampling (default is 'nearest'). Please refer to multiscale_spatial_image.to_multiscale for details.n Note (advanced): the default choice ('nearest') will keep the original scale lazy and compute each downscaled version. On the other hand 'xarray_coarsen' will compute each scale lazily (this implies that each scale will be recomputed each time it is accessed unless .persist() is manually called to cache the intermediate results). 
class spatialdata.models.Labels2DModel(*args, **kwargs)
Bases: RasterSchema

classmethod parse(*args, **kwargs)
Validate (or parse) raster data. The parameters (data, dims, transformations, scale_factors, method, chunks, **kwargs), return type and notes are the same as for Image2DModel.parse() above, with two differences: the data is expected to have shape yx (2D labels), and c_coords is not available because labels do not have channels.

validate(data)
Validate data.
Parameters: data (Any) – Data to validate.
Raises: ValueError – If data is not valid.
Return type: None
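A sketch of parsing a 2D segmentation mask; the integer label values are illustrative:

```python
import numpy as np
from spatialdata.models import Labels2DModel

# Integer mask where 0 is background and 1..n are object ids (yx order, no channel axis).
mask = np.zeros((128, 128), dtype=np.uint32)
mask[10:40, 10:40] = 1
mask[60:100, 60:100] = 2

labels = Labels2DModel.parse(mask, dims=("y", "x"))
print(labels.dims, labels.dtype)
```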
class spatialdata.models.Labels3DModel(*args, **kwargs)
Bases: RasterSchema

classmethod parse(*args, **kwargs)
Validate (or parse) raster data. Same parameters, return type and notes as Labels2DModel.parse() above, except that the data is expected to have shape zyx (3D labels).

validate(data)
Validate data.
Parameters: data (Any) – Data to validate.
Raises: ValueError – If data is not valid.
Return type: None
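A sketch combining Labels3DModel.parse() with a transformation to a named coordinate system; the coordinate system name "physical" and the scale values are assumptions chosen for illustration:

```python
import numpy as np
from spatialdata.models import Labels3DModel
from spatialdata.transformations import Scale

mask3d = np.zeros((16, 128, 128), dtype=np.uint32)
mask3d[4:8, 20:60, 20:60] = 1

# Map voxel coordinates to a hypothetical "physical" coordinate system,
# e.g. an anisotropic voxel size of 2.0 x 0.325 x 0.325 units.
labels3d = Labels3DModel.parse(
    mask3d,
    dims=("z", "y", "x"),
    transformations={"physical": Scale([2.0, 0.325, 0.325], axes=("z", "y", "x"))},
)
```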
class spatialdata.models.ShapesModel
Bases: object

classmethod validate(data)
Validate data.
Parameters: data (GeoDataFrame) – geopandas.GeoDataFrame to validate.
Return type: None

classmethod validate_shapes_not_mixed_types(gdf)
Check that the Shapes element is composed either of Point or of Polygon/MultiPolygon geometries.
Parameters: gdf (GeoDataFrame) – The Shapes element.
Raises: ValueError – When the geometry column does not satisfy the type requirements.
Return type: None
Notes: this function is not called by ShapesModel.validate() because computing the unique geometry types by default could be expensive.

classmethod parse(data, **kwargs)
Parse shapes data.
Parameters:
- data (Any) – Data to parse. If numpy.ndarray, the shapes are assumed to be passed as ragged arrays (in the case of shapely.Polygon or shapely.MultiPolygon), so the additional arguments offsets and geometry must be provided. If Path or str, it is read as a GeoJSON file. If geopandas.GeoDataFrame, it is validated; the object needs a column called geometry containing a geopandas.GeoSeries of shapely objects. Valid options are combinations of shapely.Polygon or shapely.MultiPolygon, or shapely.Point; if the geometries are Point, there must be another column called radius.
- geometry – Geometry type of the shapes. The following geometries are supported: 0: Circles, 3: Polygon, 6: MultiPolygon.
- offsets – In the case of shapely.Polygon or shapely.MultiPolygon shapes, the offsets of the polygons must be provided in order to initialize the shapes from their ragged array representation. Alternatively, call the parser as ShapesModel.parse(data) with a GeoDataFrame and ignore the offsets parameter (recommended).
- radius – Size of the Circles. Must be provided if the shapes are Circles.
- index – Index of the shapes, must be of type str. If None, it is generated automatically.
- transformations – Transformations of the shapes.
- kwargs (Any) – Additional arguments for the GeoJSON reader.
Return type: GeoDataFrame
Returns: geopandas.GeoDataFrame
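A sketch of parsing circles from a GeoDataFrame (the recommended path); the coordinates and radii are placeholders:

```python
import geopandas as gpd
from shapely.geometry import Point
from spatialdata.models import ShapesModel

# Circles are Point geometries plus a mandatory "radius" column.
circles = gpd.GeoDataFrame(
    {
        "geometry": [Point(10.0, 10.0), Point(40.0, 25.0)],
        "radius": [5.0, 8.0],
    }
)
shapes = ShapesModel.parse(circles)  # validates the geometry types and the radius column
```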
class spatialdata.models.PointsModel
Bases: object

classmethod validate(data)
Validate data.
Parameters: data (DataFrame) – dask.dataframe.core.DataFrame to validate.
Return type: None

classmethod parse(data, **kwargs)
Validate (or parse) points data.
Parameters:
- data (Any) – Data to parse. If numpy.ndarray, an annotation pandas.DataFrame can be provided, as well as a feature_key column in the annotation dataframe; the array is assumed to have shape (n_points, axes), with axes being "x", "y" and optionally "z". If pandas.DataFrame, a coordinates mapping can be provided with valid axes ('x', 'y', 'z') as keys and column names in the dataframe as values; if the dataframe already has columns named 'x', 'y' and 'z', the mapping can be omitted.
- annotation – Annotation dataframe. Only if data is numpy.ndarray; in that case the index of the annotation is used as the index of the parsed points.
- coordinates – Mapping of axes names (keys) to column names (values) in data. Only if data is pandas.DataFrame. Example: {'x': 'my_x_column', 'y': 'my_y_column'}. If not provided and x, y and optionally z are column names, they are used as coordinates.
- feature_key – Optional feature key in annotation or data. Example use case: a categorical gene-id column describing the gene identity of each point.
- instance_key – Optional instance key in annotation or data. Example use case: a cell-id column describing which cell a point belongs to. This argument is likely to be deprecated: scverse/spatialdata#503.
- transformations – Transformations of the points.
- kwargs (Any) – Additional arguments for dask.dataframe.from_array().
Return type: DataFrame
Returns: dask.dataframe.core.DataFrame
Notes: the order of the columns of the dataframe returned by the parser is not guaranteed to match the order of the columns in the dataframe passed as an argument.
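A sketch of parsing points from a coordinate array with a per-point annotation dataframe; the gene names and column name are invented:

```python
import numpy as np
import pandas as pd
from spatialdata.models import PointsModel

coords = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])  # (n_points, axes) with axes x, y
annotation = pd.DataFrame({"gene": pd.Categorical(["Actb", "Gapdh", "Actb"])})

points = PointsModel.parse(coords, annotation=annotation, feature_key="gene")
print(points.columns.tolist())  # a dask dataframe with x, y and the annotation column
```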
class spatialdata.models.TableModel
Bases: object

classmethod parse(adata, region=None, region_key=None, instance_key=None, overwrite_metadata=False)
Parse the anndata.AnnData object to be compatible with the model.
Parameters:
- adata (AnnData) – The AnnData object.
- region (str | list[str] | None, default None) – Region(s) to be used.
- region_key (str | None, default None) – Key in adata.obs that specifies the region.
- instance_key (str | None, default None) – Key in adata.obs that specifies the instance.
- overwrite_metadata (bool, default False) – If True, the region, region_key and instance_key metadata will be overwritten.
Return type: AnnData
Returns: The parsed data.

validate(data)
Validate the data.
Parameters: data (AnnData) – The data to validate.
Return type: AnnData
Returns: The validated data.
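A sketch of a table annotating a hypothetical labels element named "cell_labels"; the obs column names and the region name are assumptions, not fixed by the API:

```python
import numpy as np
import pandas as pd
from anndata import AnnData
from spatialdata.models import TableModel

n_cells = 5
adata = AnnData(
    X=np.random.default_rng(0).random((n_cells, 10)),
    obs=pd.DataFrame(
        {
            "region": pd.Categorical(["cell_labels"] * n_cells),  # which element each row annotates
            "instance_id": np.arange(1, n_cells + 1),             # label value of each annotated cell
        }
    ),
)

table = TableModel.parse(
    adata,
    region="cell_labels",
    region_key="region",
    instance_key="instance_id",
)
```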
Pattern 5: coordinates – Mapping of axes names (keys) to column names (values) in data. Only applies when data is a pandas.DataFrame. Example: {'x': 'my_x_column', 'y': 'my_y_column'}. If not provided and the dataframe already has columns named x, y and optionally z, those columns are used as coordinates.
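A sketch of the coordinates mapping for dataframe input; the column names x_px and y_px are invented placeholders:

```python
import pandas as pd
from spatialdata.models import PointsModel

df = pd.DataFrame(
    {
        "x_px": [1.0, 3.0, 5.0],
        "y_px": [2.0, 4.0, 6.0],
        "gene": pd.Categorical(["Actb", "Gapdh", "Actb"]),
    }
)

# Map the framework's axis names to the dataframe's column names.
points = PointsModel.parse(
    df,
    coordinates={"x": "x_px", "y": "y_px"},
    feature_key="gene",
)
```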
Reference Files
This skill includes comprehensive documentation in references/:
- api.md - API documentation
- core_concepts.md - Core Concepts documentation
- other.md - Other documentation
- tutorials.md - Tutorials documentation
Use view to read specific reference files when detailed information is needed.
Working with This Skill
For Beginners
Start with the tutorials or core_concepts reference files for foundational concepts.
For Specific Features
Use the appropriate category reference file (api, core_concepts, tutorials, etc.) for detailed information.
For Code Examples
The quick reference section above contains common patterns extracted from the official docs.
Resources
references/
Organized documentation extracted from official sources. These files contain:
- Detailed explanations
- Code examples with language annotations
- Links to original documentation
- Table of contents for quick navigation
scripts/
Add helper scripts here for common automation tasks.
assets/
Add templates, boilerplate, or example projects here.
Notes
- This skill was automatically generated from official documentation
- Reference files preserve the structure and examples from source docs
- Code examples include language detection for better syntax highlighting
- Quick reference patterns are extracted from common usage examples in the docs
Updating
To refresh this skill with updated documentation:
- Re-run the scraper with the same configuration
- The skill will be rebuilt with the latest information