Stack and unstack with PandasMultiIndex#
Highlights#
An xarray.indexes.PandasMultiIndex is associated with multiple coordinate variables sharing the same dimension.
Create PandasMultiIndex from PandasIndex using xarray.Dataset.stack() and convert back with xarray.Dataset.unstack().
Labels of coordinates associated with a PandasMultiIndex can be passed all at once to .sel.
Example#
Let's open a tutorial dataset.
import xarray as xr
ds_air = xr.tutorial.open_dataset("air_temperature")
ds_air
<xarray.Dataset> Size: 31MB
Dimensions: (time: 2920, lat: 25, lon: 53)
Coordinates:
* time (time) datetime64[ns] 23kB 2013-01-01 ... 2014-12-31T18:00:00
* lat (lat) float32 100B 75.0 72.5 70.0 67.5 65.0 ... 22.5 20.0 17.5 15.0
* lon (lon) float32 212B 200.0 202.5 205.0 207.5 ... 325.0 327.5 330.0
Data variables:
air (time, lat, lon) float64 31MB ...
Attributes: (5)
Stack / Unstack#
Stacking the "lat" and "lon" dimensions of the example dataset produces
stacked "lat" and "lon" coordinates that are both associated with a
PandasMultiIndex by default.
The underlying data are reshaped so that the lat and lon dimensions collapse into a new space dimension.
stacked = ds_air.stack(space=("lat", "lon"))
stacked
<xarray.Dataset> Size: 31MB
Dimensions: (time: 2920, space: 1325)
Coordinates:
* time (time) datetime64[ns] 23kB 2013-01-01 ... 2014-12-31T18:00:00
* space (space) object 11kB MultiIndex
* lat (space) float32 5kB 75.0 75.0 75.0 75.0 ... 15.0 15.0 15.0 15.0
* lon (space) float32 5kB 200.0 202.5 205.0 207.5 ... 325.0 327.5 330.0
Data variables:
air (time, space) float64 31MB 241.2 242.5 243.5 ... 296.5 296.2 295.7
Attributes: (5)
The multi-index allows retrieving the original, unstacked dataset where the
"lat" and "lon" dimension coordinates have their own PandasIndex.
unstacked = stacked.unstack("space")
unstacked
<xarray.Dataset> Size: 31MB
Dimensions: (lat: 25, lon: 53, time: 2920)
Coordinates:
* lat (lat) float32 100B 75.0 72.5 70.0 67.5 65.0 ... 22.5 20.0 17.5 15.0
* lon (lon) float32 212B 200.0 202.5 205.0 207.5 ... 325.0 327.5 330.0
* time (time) datetime64[ns] 23kB 2013-01-01 ... 2014-12-31T18:00:00
Data variables:
air (time, lat, lon) float64 31MB 241.2 242.5 243.5 ... 296.2 295.7
Attributes: (5)
Assigning#
We can also directly associate a PandasMultiIndex
with existing coordinates sharing the same dimension.
ds_air = (
    ds_air.assign_coords(season=ds_air.time.dt.season)
    .rename_vars(time="datetime")
    .drop_indexes("datetime")
)
ds_air
<xarray.Dataset> Size: 31MB
Dimensions: (time: 2920, lat: 25, lon: 53)
Coordinates:
datetime (time) datetime64[ns] 23kB 2013-01-01 ... 2014-12-31T18:00:00
season (time) <U3 35kB 'DJF' 'DJF' 'DJF' 'DJF' ... 'DJF' 'DJF' 'DJF'
* lat (lat) float32 100B 75.0 72.5 70.0 67.5 ... 22.5 20.0 17.5 15.0
* lon (lon) float32 212B 200.0 202.5 205.0 207.5 ... 325.0 327.5 330.0
Dimensions without coordinates: time
Data variables:
air (time, lat, lon) float64 31MB ...
Attributes: (5)
multi_indexed = ds_air.set_xindex(
    ["season", "datetime"], xr.indexes.PandasMultiIndex
)
multi_indexed
<xarray.Dataset> Size: 31MB
Dimensions: (time: 2920, lat: 25, lon: 53)
Coordinates:
* time (time) object 23kB MultiIndex
* season (time) <U3 35kB 'DJF' 'DJF' 'DJF' 'DJF' ... 'DJF' 'DJF' 'DJF'
* datetime (time) datetime64[ns] 23kB 2013-01-01 ... 2014-12-31T18:00:00
* lat (lat) float32 100B 75.0 72.5 70.0 67.5 ... 22.5 20.0 17.5 15.0
* lon (lon) float32 212B 200.0 202.5 205.0 207.5 ... 325.0 327.5 330.0
Data variables:
air (time, lat, lon) float64 31MB ...
Attributes: (5)
Indexing#
Unlike in the default PandasIndex example, labels for both of the
multi-index coordinates along the time dimension can be passed here to a
single xarray.Dataset.sel() call.
multi_indexed.sel(season="DJF", datetime="2013")
<xarray.Dataset> Size: 4MB
Dimensions: (time: 360, lat: 25, lon: 53)
Coordinates:
* time (time) object 3kB MultiIndex
* season (time) <U3 4kB 'DJF' 'DJF' 'DJF' 'DJF' ... 'DJF' 'DJF' 'DJF' 'DJF'
* datetime (time) datetime64[ns] 3kB 2013-01-01 ... 2013-12-31T18:00:00
* lat (lat) float32 100B 75.0 72.5 70.0 67.5 ... 22.5 20.0 17.5 15.0
* lon (lon) float32 212B 200.0 202.5 205.0 207.5 ... 325.0 327.5 330.0
Data variables:
air (time, lat, lon) float64 4MB ...
Attributes: (5)
That said, chaining .sel calls on those coordinates, each with its own index,
yields equivalent results.
single_indexed = ds_air.set_xindex("datetime").set_xindex("season")
single_indexed.sel(season="DJF").sel(datetime="2013")
<xarray.Dataset> Size: 4MB
Dimensions: (time: 360, lat: 25, lon: 53)
Coordinates:
* datetime (time) datetime64[ns] 3kB 2013-01-01 ... 2013-12-31T18:00:00
* season (time) <U3 4kB 'DJF' 'DJF' 'DJF' 'DJF' ... 'DJF' 'DJF' 'DJF' 'DJF'
* lat (lat) float32 100B 75.0 72.5 70.0 67.5 ... 22.5 20.0 17.5 15.0
* lon (lon) float32 212B 200.0 202.5 205.0 207.5 ... 325.0 327.5 330.0
Dimensions without coordinates: time
Data variables:
air (time, lat, lon) float64 4MB ...
Attributes: (5)
Assigning a pandas.MultiIndex#
It is easy to wrap an existing pandas.MultiIndex object into a new Xarray
Dataset or DataArray.
import pandas as pd
midx = pd.MultiIndex.from_product(
    [["a", "b"], [1, 2]], names=("foo", "bar")
)
midx
MultiIndex([('a', 1),
('a', 2),
('b', 1),
('b', 2)],
names=['foo', 'bar'])
This can be done via xarray.Coordinates.from_pandas_multiindex().
midx_coords = xr.Coordinates.from_pandas_multiindex(midx, dim="x")
ds = xr.Dataset(coords=midx_coords)
ds
<xarray.Dataset> Size: 96B
Dimensions: (x: 4)
Coordinates:
* x (x) object 32B MultiIndex
* foo (x) object 32B 'a' 'a' 'b' 'b'
* bar (x) int64 32B 1 2 1 2
Data variables:
*empty*
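The same coordinates can also be attached to a DataArray that carries data. The following is a minimal sketch under that assumption; the values and the names da, picked and grid are made up for illustration and do not come from the example above.

```python
import pandas as pd
import xarray as xr

midx = pd.MultiIndex.from_product(
    [["a", "b"], [1, 2]], names=("foo", "bar")
)
midx_coords = xr.Coordinates.from_pandas_multiindex(midx, dim="x")

# Hypothetical data: one value per (foo, bar) pair, in multi-index order.
da = xr.DataArray([10, 20, 30, 40], dims="x", coords=midx_coords)

# Labels for both levels are passed to .sel in one call.
picked = da.sel(foo="b", bar=1).item()

# Unstacking "x" turns the two levels into separate dimensions.
grid = da.unstack("x")
```

With this ordering, picked is the value stored for the pair ("b", 1), and grid has one dimension per level of the original pandas.MultiIndex.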