Integer ranges with pd.RangeIndex

Highlights#

  1. Like other pandas Index types, a pandas.RangeIndex object may be wrapped in an xarray.indexes.PandasIndex.

  2. Unlike other pandas Index types, we always want to assign a pandas.RangeIndex directly instead of setting it from an existing coordinate variable.

  3. Xarray preserves the memory-saving pandas.RangeIndex structure by wrapping it in a lazy coordinate variable instead of a fully materialized array.
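
To give a rough sense of the memory saving mentioned in the last point, the sketch below compares a pandas.RangeIndex with a regular integer index of the same length, using pandas only. The names `n`, `range_idx` and `dense_idx` are illustrative, and the exact byte count reported for the range index may vary between pandas and Python versions.

```python
import numpy as np
import pandas as pd

n = 1_000_000  # illustrative size, matching the example on this page

# A RangeIndex only stores start, stop and step.
range_idx = pd.RangeIndex(n)

# A regular integer index stores every value in an int64 array.
dense_idx = pd.Index(np.arange(n, dtype="int64"))

range_bytes = range_idx.memory_usage()
dense_bytes = dense_idx.memory_usage()

print(f"RangeIndex: {range_bytes} bytes")
print(f"Index:      {dense_bytes} bytes")
```

The regular index needs at least 8 bytes per element, whereas the footprint of the range index does not grow with its length.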

Example#

Assigning#

import pandas as pd
import xarray as xr
idx = xr.indexes.PandasIndex(pd.RangeIndex(1_000_000), dim="x")

ds = xr.Dataset(coords=xr.Coordinates.from_xindex(idx))
ds
<xarray.Dataset> Size: 8MB
Dimensions:  (x: 1000000)
Coordinates:
  * x        (x) int64 8MB 0 1 2 3 4 5 ... 999995 999996 999997 999998 999999
Data variables:
    *empty*

Lazy coordinate#

The x coordinate variable associated with the range index is lazy (i.e., its array values are not materialized in memory).

ds.x
<xarray.DataArray 'x' (x: 1000000)> Size: 8MB
[1000000 values with dtype=int64]
Coordinates:
  * x        (x) int64 8MB 0 1 2 3 4 5 ... 999995 999996 999997 999998 999999

Important

ds.x.values will materialize all values in-memory! x may behave like a “coordinate variable bomb” 💣.

Indexing#

Slicing along the x dimension preserves the range index (although with a new range) and keeps a lazy associated coordinate variable.

sliced = ds.isel(x=slice(1_000, 50_000, 100))

sliced.x
<xarray.DataArray 'x' (x: 490)> Size: 4kB
[490 values with dtype=int64]
Coordinates:
  * x        (x) int64 4kB 1000 1100 1200 1300 1400 ... 49600 49700 49800 49900
sliced.xindexes["x"]
PandasIndex(RangeIndex(start=1000, stop=50000, step=100, name='x'))
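
The new range can be reproduced with pandas alone, which may help in seeing where the start, stop and step above come from. This is a small sketch that applies the same slice directly to a pandas.RangeIndex; the names `full` and `sliced_idx` are illustrative.

```python
import pandas as pd

full = pd.RangeIndex(1_000_000)

# Slicing with a step yields another RangeIndex; no values are stored.
sliced_idx = full[1_000:50_000:100]

print(type(sliced_idx).__name__)
print(sliced_idx.start, sliced_idx.stop, sliced_idx.step)

# (50_000 - 1_000) / 100 = 490 elements, matching the size of sliced.x
print(len(sliced_idx))
```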

Indexing with arbitrary values along the same dimension converts the underlying pandas.RangeIndex into a regular pandas.Index that stores its values explicitly (this is all handled by pandas).

indexed = ds.isel(x=[10, 55, 124, 265])

indexed.x
<xarray.DataArray 'x' (x: 4)> Size: 32B
array([ 10,  55, 124, 265])
Coordinates:
  * x        (x) int64 32B 10 55 124 265
indexed.xindexes["x"]
PandasIndex(Index([10, 55, 124, 265], dtype='int64', name='x'))
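
Since this conversion is handled by pandas, it can also be observed without xarray. The sketch below indexes a pandas.RangeIndex with the same irregular positions; the names `full` and `picked` are illustrative, and the concrete class name printed may differ between pandas versions.

```python
import pandas as pd

full = pd.RangeIndex(1_000_000)

# Irregularly spaced positions cannot be described by start/stop/step,
# so pandas returns an index that stores each value explicitly.
picked = full[[10, 55, 124, 265]]

print(type(picked).__name__)
print(picked.tolist())
```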