The default PandasIndex#
Highlights#
xarray.indexes.PandasIndex can wrap one-dimensional pandas.Index objects to allow indexing along 1D coordinate variables. These indexes can apply to both “dimension” coordinates and “non-dimension” coordinates.
When opening or constructing a new Dataset or DataArray, Xarray creates by default an xarray.indexes.PandasIndex for each “dimension” coordinate.
It is possible to either drop those default indexes or skip their creation.
Example#
Let’s open a tutorial dataset.
import xarray as xr
ds_air = xr.tutorial.open_dataset("air_temperature")
ds_air
<xarray.Dataset> Size: 31MB
Dimensions: (time: 2920, lat: 25, lon: 53)
Coordinates:
* time (time) datetime64[ns] 23kB 2013-01-01 ... 2014-12-31T18:00:00
* lat (lat) float32 100B 75.0 72.5 70.0 67.5 65.0 ... 22.5 20.0 17.5 15.0
* lon (lon) float32 212B 200.0 202.5 205.0 207.5 ... 325.0 327.5 330.0
Data variables:
air (time, lat, lon) float64 31MB ...
Attributes: (5)
Xarray has created a PandasIndex by default for each of the “lat”, “lon” and
“time” dimension coordinates, as we can also see below via the
xarray.Dataset.xindexes property.
ds_air.xindexes
Indexes:
lat PandasIndex
lon PandasIndex
time PandasIndex
Those indexes are used under the hood for, e.g., label-based selection.
ds_air.sel(time="2013")
<xarray.Dataset> Size: 15MB
Dimensions: (time: 1460, lat: 25, lon: 53)
Coordinates:
* time (time) datetime64[ns] 12kB 2013-01-01 ... 2013-12-31T18:00:00
* lat (lat) float32 100B 75.0 72.5 70.0 67.5 65.0 ... 22.5 20.0 17.5 15.0
* lon (lon) float32 212B 200.0 202.5 205.0 207.5 ... 325.0 327.5 330.0
Data variables:
air (time, lat, lon) float64 15MB ...
Attributes: (5)
Set indexes for non-dimension coordinates#
Xarray does not automatically create an index for non-dimension coordinates like the “season (time)” coordinate added below.
ds_air.coords["season"] = ds_air.time.dt.season
ds_air
<xarray.Dataset> Size: 31MB
Dimensions: (time: 2920, lat: 25, lon: 53)
Coordinates:
* time (time) datetime64[ns] 23kB 2013-01-01 ... 2014-12-31T18:00:00
season (time) <U3 35kB 'DJF' 'DJF' 'DJF' 'DJF' ... 'DJF' 'DJF' 'DJF' 'DJF'
* lat (lat) float32 100B 75.0 72.5 70.0 67.5 65.0 ... 22.5 20.0 17.5 15.0
* lon (lon) float32 212B 200.0 202.5 205.0 207.5 ... 325.0 327.5 330.0
Data variables:
air (time, lat, lon) float64 31MB ...
Attributes: (5)
Without an index, it is not possible to select data based on the “season” coordinate.
ds_air.sel(season="DJF")
KeyError: "no index found for coordinate 'season'"
However, it is possible to manually set a PandasIndex for that 1-dimensional
coordinate.
ds_extra = ds_air.set_xindex("season", xr.indexes.PandasIndex)
ds_extra
<xarray.Dataset> Size: 31MB
Dimensions: (time: 2920, lat: 25, lon: 53)
Coordinates:
* time (time) datetime64[ns] 23kB 2013-01-01 ... 2014-12-31T18:00:00
* season (time) <U3 35kB 'DJF' 'DJF' 'DJF' 'DJF' ... 'DJF' 'DJF' 'DJF' 'DJF'
* lat (lat) float32 100B 75.0 72.5 70.0 67.5 65.0 ... 22.5 20.0 17.5 15.0
* lon (lon) float32 212B 200.0 202.5 205.0 207.5 ... 325.0 327.5 330.0
Data variables:
air (time, lat, lon) float64 31MB ...
Attributes: (5)
This now enables label-based selection.
ds_extra.sel(season="DJF")
<xarray.Dataset> Size: 8MB
Dimensions: (time: 720, lat: 25, lon: 53)
Coordinates:
* time (time) datetime64[ns] 6kB 2013-01-01 ... 2014-12-31T18:00:00
* season (time) <U3 9kB 'DJF' 'DJF' 'DJF' 'DJF' ... 'DJF' 'DJF' 'DJF' 'DJF'
* lat (lat) float32 100B 75.0 72.5 70.0 67.5 65.0 ... 22.5 20.0 17.5 15.0
* lon (lon) float32 212B 200.0 202.5 205.0 207.5 ... 325.0 327.5 330.0
Data variables:
air (time, lat, lon) float64 8MB ...
Attributes: (5)
Providing labels to xarray.Dataset.sel() for multiple index coordinates that
share common dimensions is not yet supported (unless those coordinates also
share the same index object, as shown in the PandasMultiIndex example).
ds_extra.sel(season="DJF", time="2013")
ValueError: Xarray does not support label-based selection with more than one index over the following dimension(s):
'time': 2 indexes involved
Suggestion: use a multi-index for each of those dimension(s).
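One possible workaround, besides the multi-index suggested by the error message, is to chain two selections so that each call involves a single index. The sketch below uses a hypothetical daily dataset (`ds_demo`, with made-up values) rather than the tutorial data, and assumes that both indexes are kept on the intermediate result.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical stand-in for the tutorial dataset: daily values over two years.
time = pd.date_range("2013-01-01", "2014-12-31", freq="D")
ds_demo = xr.Dataset(
    {"air": ("time", np.arange(time.size, dtype="float64"))},
    coords={"time": time},
)
ds_demo.coords["season"] = ds_demo.time.dt.season
ds_demo = ds_demo.set_xindex("season")

# Passing both labels at once involves two indexes over "time" and fails.
try:
    ds_demo.sel(season="DJF", time="2013")
except ValueError as err:
    print("combined selection failed:", type(err).__name__)

# One index per call: first filter on "season", then on "time".
djf_2013 = ds_demo.sel(season="DJF").sel(time="2013")
print(djf_2013.sizes["time"])
```

Chaining is simple, but it creates an intermediate object, so a multi-index may still be the better fit when such combined selections are frequent.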
Drop indexes#
Indexes are not always necessary, and (re-)computing them may introduce unwanted overhead.
The line below drops the default indexes that were created when opening the example dataset. The coordinates themselves are kept, as the missing “*” markers in the output show.
ds_air.drop_indexes(["time", "lat", "lon"])
<xarray.Dataset> Size: 31MB
Dimensions: (time: 2920, lat: 25, lon: 53)
Coordinates:
time (time) datetime64[ns] 23kB 2013-01-01 ... 2014-12-31T18:00:00
season (time) <U3 35kB 'DJF' 'DJF' 'DJF' 'DJF' ... 'DJF' 'DJF' 'DJF' 'DJF'
lat (lat) float32 100B 75.0 72.5 70.0 67.5 65.0 ... 22.5 20.0 17.5 15.0
lon (lon) float32 212B 200.0 202.5 205.0 207.5 ... 325.0 327.5 330.0
Data variables:
air (time, lat, lon) float64 31MB ...
Attributes: (5)
Skip the creation of default indexes#
Let’s re-open the example dataset above, this time with no index.
ds_air_no_index = xr.tutorial.open_dataset(
"air_temperature", create_default_indexes=False
)
ds_air_no_index
<xarray.Dataset> Size: 31MB
Dimensions: (time: 2920, lat: 25, lon: 53)
Coordinates:
time (time) datetime64[ns] 23kB ...
lat (lat) float32 100B ...
lon (lon) float32 212B ...
Data variables:
air (time, lat, lon) float64 31MB ...
Attributes: (5)
As with xarray.open_dataset(), indexes are created by default for
dimension coordinates when constructing a new Dataset.
ds = xr.Dataset(coords={"x": [1, 2], "y": [3, 4, 5]})
ds
<xarray.Dataset> Size: 40B
Dimensions: (x: 2, y: 3)
Coordinates:
* x (x) int64 16B 1 2
* y (y) int64 24B 3 4 5
Data variables:
*empty*
Default indexes are also created when assigning new coordinates.
ds.assign_coords(u=[10, 20])
<xarray.Dataset> Size: 56B
Dimensions: (x: 2, y: 3, u: 2)
Coordinates:
* x (x) int64 16B 1 2
* y (y) int64 24B 3 4 5
* u (u) int64 16B 10 20
Data variables:
*empty*
To skip the creation of those default indexes, we need to explicitly create a
new xarray.Coordinates object and pass indexes={} (an empty dictionary).
coords = xr.Coordinates({"u": [10, 20]}, indexes={})
ds.assign_coords(coords)
<xarray.Dataset> Size: 56B
Dimensions: (x: 2, y: 3, u: 2)
Coordinates:
* x (x) int64 16B 1 2
* y (y) int64 24B 3 4 5
u (u) int64 16B 10 20
Data variables:
*empty*