Intervals with pandas.IntervalIndex#
See also
Learn more at the Pandas documentation.
Highlights#
Xarray’s built-in support for pandas Index classes extends to more sophisticated classes like
pandas.IntervalIndex. Xarray now generates such indexes automatically when using
xarray.DataArray.groupby_bins() or xarray.Dataset.groupby_bins(). Sadly,
pandas.IntervalIndex supports numpy datetimes but not cftime.
Important
A pandas IntervalIndex models intervals using a single variable. The Climate and Forecast Conventions, by contrast, model the intervals using two arrays: the intervals (“bounds” variable) and “central values”. See cfinterval for more.
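To make the contrast concrete, here is a minimal pandas-only sketch of how a single IntervalIndex could be unpacked into a CF-style pair of arrays. The variable names (`breaks`, `bounds`, `centers`) and the sample timestamps are illustrative assumptions, not part of the original example.

```python
import numpy as np
import pandas as pd

# Three contiguous 6-hour intervals, closed on the left (illustrative values).
breaks = pd.date_range("2013-01-01", periods=4, freq=pd.Timedelta("6h"))
intervals = pd.IntervalIndex.from_breaks(breaks, closed="left")

# CF-style representation: an (n, 2) "bounds" array plus central values.
bounds = np.stack(
    [intervals.left.to_numpy(), intervals.right.to_numpy()], axis=-1
)
centers = intervals.mid

print(bounds.shape)  # (3, 2)
print(centers[0])    # 2013-01-01 03:00:00
```

The single IntervalIndex carries the same information as the two arrays, which is why a conversion in either direction is possible.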
Example#
Assigning#
%xmode minimal
import pandas as pd
import xarray as xr
xr.set_options(display_expand_indexes=True, display_expand_attrs=False)
pd.set_option('display.max_seq_items', 10)
orig = xr.tutorial.open_dataset("air_temperature")
orig
Exception reporting mode: Minimal
<xarray.Dataset> Size: 31MB
Dimensions: (time: 2920, lat: 25, lon: 53)
Coordinates:
* time (time) datetime64[ns] 23kB 2013-01-01 ... 2014-12-31T18:00:00
* lat (lat) float32 100B 75.0 72.5 70.0 67.5 65.0 ... 22.5 20.0 17.5 15.0
* lon (lon) float32 212B 200.0 202.5 205.0 207.5 ... 325.0 327.5 330.0
Data variables:
air (time, lat, lon) float64 31MB ...
Attributes: (5)
Let’s replace the time vector with an IntervalIndex, assuming that the data represent averages over 6-hour periods centered at 00h, 06h, 12h, and 18h.
left = orig.time.data - pd.Timedelta("3h")
right = orig.time.data + pd.Timedelta("3h")
time_bounds = pd.IntervalIndex.from_arrays(left, right, closed="left")
time_bounds
IntervalIndex([[2012-12-31 21:00:00, 2013-01-01 03:00:00),
[2013-01-01 03:00:00, 2013-01-01 09:00:00),
[2013-01-01 09:00:00, 2013-01-01 15:00:00),
[2013-01-01 15:00:00, 2013-01-01 21:00:00),
[2013-01-01 21:00:00, 2013-01-02 03:00:00),
...
[2014-12-30 15:00:00, 2014-12-30 21:00:00),
[2014-12-30 21:00:00, 2014-12-31 03:00:00),
[2014-12-31 03:00:00, 2014-12-31 09:00:00),
[2014-12-31 09:00:00, 2014-12-31 15:00:00),
[2014-12-31 15:00:00, 2014-12-31 21:00:00)],
dtype='interval[datetime64[ns], left]', length=2920)
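The effect of closed="left" can be checked on a small pandas-only example. This is a sketch with two made-up timestamps standing in for orig.time.data; the names `centers`, `half`, and `shared_edge` are assumptions for illustration.

```python
import pandas as pd

# Two illustrative 6-hourly timestamps standing in for orig.time.data.
centers = pd.DatetimeIndex(["2013-01-01 00:00", "2013-01-01 06:00"])
half = pd.Timedelta("3h")
idx = pd.IntervalIndex.from_arrays(centers - half, centers + half, closed="left")

first = idx[0]  # [2012-12-31 21:00, 2013-01-01 03:00)
shared_edge = pd.Timestamp("2013-01-01 03:00")

left_edge_included = first.left in first    # the left endpoint belongs to the interval
right_edge_included = shared_edge in first  # the right endpoint does not
edge_in_next = shared_edge in idx[1]        # it belongs to the next interval instead
no_overlap = not idx.is_overlapping         # adjacent intervals only touch

print(left_edge_included, right_edge_included, edge_in_next, no_overlap)
```

With left-closed intervals, every timestamp on a shared boundary falls into exactly one interval, so lookups on the boundary are unambiguous.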
indexed = orig.copy(deep=True)
indexed["time"] = time_bounds
indexed
<xarray.Dataset> Size: 31MB
Dimensions: (time: 2920, lat: 25, lon: 53)
Coordinates:
* time (time) interval[datetime64[ns], left] 47kB [2012-12-31 21:00:00,...
* lat (lat) float32 100B 75.0 72.5 70.0 67.5 65.0 ... 22.5 20.0 17.5 15.0
* lon (lon) float32 212B 200.0 202.5 205.0 207.5 ... 325.0 327.5 330.0
Data variables:
air (time, lat, lon) float64 31MB ...
Attributes: (5)
Note that the object above still shows a PandasIndex associated with the time coordinate, but the values are now stored in an “IntervalArray” (as indicated by the dtype interval[datetime64[ns], left]).
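What the interval dtype means at the pandas level can be seen in a short sketch. It uses two hand-written intervals rather than the tutorial dataset, and the names `values`, `element`, and `position` are assumptions for illustration.

```python
import pandas as pd

# Two illustrative left-closed 6-hour intervals.
left = pd.DatetimeIndex(["2013-04-30 21:00", "2013-05-01 03:00"])
idx = pd.IntervalIndex.from_arrays(left, left + pd.Timedelta("6h"), closed="left")

values = idx.array  # the backing pandas.arrays.IntervalArray
element = idx[0]    # each element is a scalar pandas.Interval

# Looking up a timestamp returns the position of the interval that contains it.
position = idx.get_loc(pd.Timestamp("2013-05-01 02:00"))

print(type(values).__name__, type(element).__name__, position)
```

Here `position` is 0 because 02:00 lies inside the first interval, [21:00, 03:00). This containment-based lookup is what label selection on an interval-valued coordinate builds on.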
Indexing#
Let’s index out a representative value for 2013-05-01 02:00.
orig.sel(time="2013-05-01 02:00")
KeyError: "not all values found in index 'time'. Try setting the `method` keyword argument (example: method='nearest')."
Indexing the original dataset requires specifying method="nearest":
orig.sel(time="2013-05-01 02:00", method="nearest").time
<xarray.DataArray 'time' ()> Size: 8B
array('2013-05-01T00:00:00.000000000', dtype='datetime64[ns]')
Coordinates:
time datetime64[ns] 8B 2013-05-01
Attributes: (2)
With an IntervalIndex, however, that is unnecessary:
indexed.sel(time="2013-05-01 02:00").time
<xarray.DataArray 'time' ()> Size: 8B
array(Interval(2013-04-30 21:00:00, 2013-05-01 03:00:00, closed='left'),
dtype=object)
Coordinates:
time object 8B [2013-04-30 21:00:00, 2013-05-01 03:00:00)
Binned grouping#
Xarray now creates an IntervalIndex by default for binned grouping operations:
orig.groupby_bins("lat", bins=[25, 35, 45, 55]).mean()
<xarray.Dataset> Size: 4MB
Dimensions: (lat_bins: 3, time: 2920, lon: 53)
Coordinates:
* lat_bins (lat_bins) interval[int64, right] 48B (25, 35] (35, 45] (45, 55]
* time (time) datetime64[ns] 23kB 2013-01-01 ... 2014-12-31T18:00:00
* lon (lon) float32 212B 200.0 202.5 205.0 207.5 ... 325.0 327.5 330.0
Data variables:
air (lat_bins, time, lon) float64 4MB 291.2 291.5 ... 277.3 278.8
Attributes: (5)
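The right-closed bins in the output above follow the defaults of pandas.cut, which the xarray documentation describes as the discretization step behind groupby_bins. The sketch below applies pandas.cut directly to a few made-up latitude values (the array `lat` is an illustrative assumption, not the tutorial data).

```python
import numpy as np
import pandas as pd

lat = np.array([25.0, 27.5, 35.0, 37.5, 55.0, 57.5])  # illustrative latitudes
binned = pd.cut(lat, bins=[25, 35, 45, 55])

categories = binned.categories  # IntervalIndex of (25, 35], (35, 45], (45, 55]
codes = binned.codes.tolist()   # -1 marks values outside every bin

print(categories)
print(codes)  # [-1, 0, 0, 1, 2, -1]
```

Because the bins are open on the left, a latitude of exactly 25.0 falls outside the first bin, while 35.0 and 55.0 land in the bins they close. If the lowest edge should be included, pandas.cut offers the include_lowest parameter.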