Intervals with pandas.IntervalIndex


See also

Learn more at the Pandas documentation.

Highlights

  1. Xarray’s built-in support for pandas Index classes extends to more sophisticated classes like pandas.IntervalIndex.

  2. Xarray now generates such indexes automatically when using xarray.DataArray.groupby_bins() or xarray.Dataset.groupby_bins().

  3. Sadly, pandas.IntervalIndex supports NumPy datetimes (datetime64) but not cftime datetimes.

Important

A pandas IntervalIndex models intervals using a single variable. The Climate and Forecast Conventions, by contrast, model the intervals using two arrays: the intervals (“bounds” variable) and “central values”. See cfinterval for more.
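To make the difference between the two models concrete, here is a minimal sketch that converts between them using only pandas and NumPy. The `lat_bounds` array is hypothetical example data, and the choice of `closed="left"` is an assumption made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical CF-style cell bounds for three latitude cells, shape (n, 2).
lat_bounds = np.array([[10.0, 20.0], [20.0, 30.0], [30.0, 40.0]])

# Two-array model -> single IntervalIndex.
# closed="left" is a choice made for this sketch.
lat_intervals = pd.IntervalIndex.from_arrays(
    lat_bounds[:, 0], lat_bounds[:, 1], closed="left"
)

# Single IntervalIndex -> the two CF-style arrays.
central_values = lat_intervals.mid.to_numpy()
recovered_bounds = np.stack(
    [lat_intervals.left.to_numpy(), lat_intervals.right.to_numpy()], axis=-1
)

print(central_values)
print(recovered_bounds)
```

The central values are derived here as interval midpoints, which only matches a dataset whose coordinate values really sit at the center of each cell.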

Example

Assigning

%xmode minimal

import pandas as pd
import xarray as xr

xr.set_options(display_expand_indexes=True, display_expand_attrs=False)
pd.set_option('display.max_seq_items', 10)

orig = xr.tutorial.open_dataset("air_temperature")
orig
Exception reporting mode: Minimal
<xarray.Dataset> Size: 31MB
Dimensions:  (time: 2920, lat: 25, lon: 53)
Coordinates:
  * time     (time) datetime64[ns] 23kB 2013-01-01 ... 2014-12-31T18:00:00
  * lat      (lat) float32 100B 75.0 72.5 70.0 67.5 65.0 ... 22.5 20.0 17.5 15.0
  * lon      (lon) float32 212B 200.0 202.5 205.0 207.5 ... 325.0 327.5 330.0
Data variables:
    air      (time, lat, lon) float64 31MB ...
Attributes: (5)

Let’s replace the time vector with an IntervalIndex, assuming that the data represent averages over 6-hour periods centered at 00h, 06h, 12h, and 18h.

left = orig.time.data - pd.Timedelta("3h")
right = orig.time.data + pd.Timedelta("3h")
time_bounds = pd.IntervalIndex.from_arrays(left, right, closed="left")
time_bounds
IntervalIndex([[2012-12-31 21:00:00, 2013-01-01 03:00:00),
               [2013-01-01 03:00:00, 2013-01-01 09:00:00),
               [2013-01-01 09:00:00, 2013-01-01 15:00:00),
               [2013-01-01 15:00:00, 2013-01-01 21:00:00),
               [2013-01-01 21:00:00, 2013-01-02 03:00:00),
               ...
               [2014-12-30 15:00:00, 2014-12-30 21:00:00),
               [2014-12-30 21:00:00, 2014-12-31 03:00:00),
               [2014-12-31 03:00:00, 2014-12-31 09:00:00),
               [2014-12-31 09:00:00, 2014-12-31 15:00:00),
               [2014-12-31 15:00:00, 2014-12-31 21:00:00)],
              dtype='interval[datetime64[ns], left]', length=2920)
indexed = orig.copy(deep=True)
indexed["time"] = time_bounds
indexed
<xarray.Dataset> Size: 31MB
Dimensions:  (time: 2920, lat: 25, lon: 53)
Coordinates:
  * time     (time) interval[datetime64[ns], left] 47kB [2012-12-31 21:00:00,...
  * lat      (lat) float32 100B 75.0 72.5 70.0 67.5 65.0 ... 22.5 20.0 17.5 15.0
  * lon      (lon) float32 212B 200.0 202.5 205.0 207.5 ... 325.0 327.5 330.0
Data variables:
    air      (time, lat, lon) float64 31MB ...
Attributes: (5)

Note that the object above still shows a PandasIndex associated with the time coordinate, but the values are now represented in an “IntervalArray” (as indicated by interval[datetime64[ns], left]).
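The interval representation can also be inspected at the pandas level. The following sketch rebuilds a short, synthetic version of the index (four time steps instead of the tutorial dataset, so it runs offline) and checks its backing array, closedness, and midpoints.

```python
import pandas as pd

# Synthetic stand-in for the dataset's time axis: four 6-hourly centers.
centers = pd.date_range("2013-01-01", periods=4, freq=pd.Timedelta(hours=6))
half_width = pd.Timedelta("3h")

time_bounds = pd.IntervalIndex.from_arrays(
    centers - half_width, centers + half_width, closed="left"
)

# The values are stored in a pandas IntervalArray.
print(type(time_bounds.array).__name__)
# The closed side is part of the dtype.
print(time_bounds.dtype)
# The original timestamps are recoverable as the interval midpoints.
print(time_bounds.mid)
```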

Indexing

Let’s index out a representative value for 2013-05-01 02:00.

orig.sel(time="2013-05-01 02:00")
KeyError: "not all values found in index 'time'. Try setting the `method` keyword argument (example: method='nearest')."

Indexing the original dataset requires specifying method="nearest":

orig.sel(time="2013-05-01 02:00", method="nearest").time
<xarray.DataArray 'time' ()> Size: 8B
array('2013-05-01T00:00:00.000000000', dtype='datetime64[ns]')
Coordinates:
    time     datetime64[ns] 8B 2013-05-01
Attributes: (2)

With an IntervalIndex, however, that is unnecessary:

indexed.sel(time="2013-05-01 02:00").time
<xarray.DataArray 'time' ()> Size: 8B
array(Interval(2013-04-30 21:00:00, 2013-05-01 03:00:00, closed='left'),
      dtype=object)
Coordinates:
    time     object 8B [2013-04-30 21:00:00, 2013-05-01 03:00:00)
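The same lookup behavior can be sketched with pandas alone. This example uses a small synthetic index rather than the tutorial dataset, and it is only an illustration of why the interval lookup succeeds where the exact-match lookup fails, not a description of xarray's internals.

```python
import pandas as pd

# Four synthetic 6-hourly centers around the queried time.
centers = pd.date_range(
    "2013-04-30 18:00", periods=4, freq=pd.Timedelta(hours=6)
)
half_width = pd.Timedelta("3h")
intervals = pd.IntervalIndex.from_arrays(
    centers - half_width, centers + half_width, closed="left"
)

query = pd.Timestamp("2013-05-01 02:00")

# An exact-match lookup on the point-valued index fails.
try:
    centers.get_loc(query)
    exact_match = True
except KeyError:
    exact_match = False

# The interval index finds the one interval that contains the query.
position = int(intervals.get_loc(query))
print(exact_match, position, intervals[position])

# With closed="left", a timestamp on a shared edge belongs to the later interval.
edge_position = int(intervals.get_loc(pd.Timestamp("2013-05-01 03:00")))
print(edge_position)
```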

Binned grouping

Xarray now creates an IntervalIndex by default for binned grouping operations:

orig.groupby_bins("lat", bins=[25, 35, 45, 55]).mean()
<xarray.Dataset> Size: 4MB
Dimensions:   (lat_bins: 3, time: 2920, lon: 53)
Coordinates:
  * lat_bins  (lat_bins) interval[int64, right] 48B (25, 35] (35, 45] (45, 55]
  * time      (time) datetime64[ns] 23kB 2013-01-01 ... 2014-12-31T18:00:00
  * lon       (lon) float32 212B 200.0 202.5 205.0 207.5 ... 325.0 327.5 330.0
Data variables:
    air       (lat_bins, time, lon) float64 4MB 291.2 291.5 ... 277.3 278.8
Attributes: (5)
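The right-closed bins in the output above follow from how the values are discretized: the xarray documentation describes groupby_bins as applying pandas.cut to the grouping variable. The sketch below calls pandas.cut directly on a synthetic latitude vector built to match the lat coordinate shown earlier (75.0 down to 15.0 in 2.5 degree steps). It shows the binning step only, not the subsequent reduction.

```python
import numpy as np
import pandas as pd

# Synthetic latitudes matching the coordinate above: 75.0, 72.5, ..., 15.0.
lat = np.linspace(75.0, 15.0, 25)

# pandas.cut uses right-closed bins by default, hence (25, 35] and so on.
binned = pd.cut(lat, bins=[25, 35, 45, 55])

# The categories form an IntervalIndex.
print(binned.categories)

# A code of -1 marks a value that falls outside every bin.
codes = np.asarray(binned.codes)
per_bin = np.bincount(codes[codes >= 0], minlength=len(binned.categories))
n_outside = int((codes == -1).sum())
print(per_bin, n_outside)
```

Because the bins are open on the left, the latitude 25.0 is not assigned to any bin, while 35.0, 45.0, and 55.0 each fall in the bin they close.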