Geometries and Vector Data Cubes with xvec.GeometryIndex

Geometries and Vector Data Cubes with xvec.GeometryIndex#

See also

Learn more at the xvec documentation.

Highlights#

Xvec’s use of custom indexes is exciting because it illustrates how a new Index can help define a new data model — vector data cube: “an n-D array that has either at least one dimension indexed by a 1-D array of vector geometries”.

  1. Indexing using geometries and associated predicates is supported using .sel

  2. A new .xvec accessor exposes additional querying functionality.

  3. Exposes complex functionality from other full-featured packages (e.g. shapely) to Xarray.

Example#

First we create a data cube with geometries

Hide code cell source

import geopandas as gpd
from geodatasets import get_path
import shapely
import xarray as xr
import xvec

xr.set_options(display_expand_indexes=True, display_expand_data=False, display_expand_attrs=False)

counties = gpd.read_file(get_path("geoda.natregimes"))

cube = xr.Dataset(
    data_vars=dict(
        population=(["county", "year"], counties[["PO60", "PO70", "PO80", "PO90"]]),
        unemployment=(["county", "year"], counties[["UE60", "UE70", "UE80", "UE90"]]),
        divorce=(["county", "year"], counties[["DV60", "DV70", "DV80", "DV90"]]),
        age=(["county", "year"], counties[["MA60", "MA70", "MA80", "MA90"]]),
    ),
    coords=dict(county=counties.geometry, year=[1960, 1970, 1980, 1990]),
)
cube
Downloading file 'natregimes.zip' from 'https://geodacenter.github.io/data-and-lab//data/natregimes.zip' to '/home/docs/.cache/geodatasets'.
<xarray.Dataset> Size: 370kB
Dimensions:       (county: 3085, year: 4)
Coordinates:
  * county        (county) geometry 25kB POLYGON ((-95.34258270263672 48.5467...
  * year          (year) int64 32B 1960 1970 1980 1990
Data variables:
    population    (county, year) int32 49kB 4304 3987 3764 ... 43766 55800 65077
    unemployment  (county, year) float64 99kB 7.9 9.0 5.903 ... 7.018 5.489
    divorce       (county, year) float64 99kB 1.859 2.62 3.747 ... 4.782 7.415
    age           (county, year) float64 99kB 28.8 30.5 34.5 ... 28.97 35.33

Note how the county dimension is associated with a geopandas.GeometryArray.

Assigning#

Now we can assign an xvec.GeometryIndex to county.

cube = cube.xvec.set_geom_indexes("county")
cube
<xarray.Dataset> Size: 370kB
Dimensions:       (county: 3085, year: 4)
Coordinates:
  * county        (county) geometry 25kB POLYGON ((-95.34258270263672 48.5467...
  * year          (year) int64 32B 1960 1970 1980 1990
Data variables:
    population    (county, year) int32 49kB 4304 3987 3764 ... 43766 55800 65077
    unemployment  (county, year) float64 99kB 7.9 9.0 5.903 ... 7.018 5.489
    divorce       (county, year) float64 99kB 1.859 2.62 3.747 ... 4.782 7.415
    age           (county, year) float64 99kB 28.8 30.5 34.5 ... 28.97 35.33
Indexes:
    county   GeometryIndex (crs=None)

Indexing#

Geometries as labels#

cube.sel(county=cube.county[0])
<xarray.Dataset> Size: 152B
Dimensions:       (year: 4)
Coordinates:
  * year          (year) int64 32B 1960 1970 1980 1990
    county        object 8B POLYGON ((-95.34258270263672 48.54670333862305, -...
Data variables:
    population    (year) int32 16B 4304 3987 3764 4076
    unemployment  (year) float64 32B 7.9 9.0 5.903 3.895
    divorce       (year) float64 32B 1.859 2.62 3.747 7.389
    age           (year) float64 32B 28.8 30.5 34.5 35.5

Complex spatial predicates#

Lets index to counties that intersect the provided bounding box

box = shapely.box(-125.4, 40, -120.0, 50)

subset = cube.sel(county=box, method="intersects")
subset
<xarray.Dataset> Size: 8kB
Dimensions:       (county: 63, year: 4)
Coordinates:
  * county        (county) geometry 504B POLYGON ((-121.61367797851562 39.305...
  * year          (year) int64 32B 1960 1970 1980 1990
Data variables:
    population    (county, year) int32 1kB 82030 101969 143851 ... 106701 127780
    unemployment  (county, year) float64 2kB 11.1 9.8 10.53 ... 7.5 9.947 4.811
    divorce       (county, year) float64 2kB 3.455 3.54 5.927 ... 6.056 8.076
    age           (county, year) float64 2kB 33.0 30.5 31.3 ... 27.0 28.6 32.7
Indexes:
    county   GeometryIndex (crs=None)

Hide code cell source

f, axes = subset.population.xvec.plot(col="year", robust=True)
for ax in axes.flat:
    ax.plot(*box.boundary.xy, color='w')
    ax.plot(*box.boundary.xy, color='k', lw=0.75)
../_images/e91745fbbe4ae3581ef7ba317d009d2813f79eb4f1595b22c26ba804af8d814e.png

Notice how we did that with xarray.DataArray.sel()?!