Integer ranges with pd.RangeIndex
Highlights
1. Like other pandas Index types, a pandas.RangeIndex object may be wrapped in an xarray.indexes.PandasIndex.
2. Unlike other pandas Index types, we always want to assign a pandas.RangeIndex directly instead of setting it from an existing coordinate variable.
3. Xarray preserves the memory-saving pandas.RangeIndex structure by wrapping it in a lazy coordinate variable instead of a fully materialized array.
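As a rough analogy (an assumption of this sketch, not how xarray or pandas are implemented), Python's built-in range also describes a sequence of integers by start, stop and step alone. The stdlib-only snippet below compares the size of a range object with a packed int64 buffer holding the same one million values. The buffer is roughly what a fully materialized coordinate would occupy, which matches the 8MB size reported for x in the example that follows.

```python
import array
import sys

N = 1_000_000

# A range only stores start, stop and step, so its size does not grow with N.
lazy = range(N)

# A packed buffer of signed 64-bit integers ('q') stores every single value.
materialized = array.array("q", lazy)

lazy_bytes = sys.getsizeof(lazy)
materialized_bytes = materialized.itemsize * len(materialized)

print(f"range object: {lazy_bytes} bytes")
print(f"int64 buffer: {materialized_bytes} bytes")
```

On CPython the range object stays at a few dozen bytes whatever its length, while the buffer grows linearly with the number of values.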
Example
Assigning
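import pandas as pd
import xarray as xr

# Wrap a memory-saving pandas RangeIndex in an xarray PandasIndex along "x"
idx = xr.indexes.PandasIndex(pd.RangeIndex(1_000_000), dim="x")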
ds = xr.Dataset(coords=xr.Coordinates.from_xindex(idx))
ds
<xarray.Dataset> Size: 8MB
Dimensions: (x: 1000000)
Coordinates:
* x (x) int64 8MB 0 1 2 3 4 5 ... 999995 999996 999997 999998 999999
Data variables:
*empty*
Lazy coordinate
The x coordinate variable associated with the range index is lazy, i.e., its
array values are not materialized in memory.
ds.x
<xarray.DataArray 'x' (x: 1000000)> Size: 8MB
[1000000 values with dtype=int64]
Coordinates:
* x (x) int64 8MB 0 1 2 3 4 5 ... 999995 999996 999997 999998 999999
Important
ds.x.values will materialize all values in memory! x may behave like a "coordinate variable bomb" 💣.
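To illustrate what "lazy" means here, the sketch below defines a hypothetical LazyRangeArray class (not part of xarray) that stores only start, step and size, computes individual elements on request, and builds the full list only when materialize() is called. This loosely mirrors the difference between looking at ds.x and accessing ds.x.values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LazyRangeArray:
    """Hypothetical lazy 1-D integer array defined by start, step and size.

    Illustrative only: this is not xarray's lazy coordinate class.
    """

    start: int
    step: int
    size: int

    def __len__(self) -> int:
        return self.size

    def __getitem__(self, i: int) -> int:
        # Compute a single element on demand instead of storing all values.
        if not -self.size <= i < self.size:
            raise IndexError(i)
        if i < 0:
            i += self.size
        return self.start + i * self.step

    def materialize(self) -> list:
        # Analogue of accessing `.values`: builds every element at once.
        return [self.start + i * self.step for i in range(self.size)]


x = LazyRangeArray(start=0, step=1, size=1_000_000)
print(x[0], x[-1])  # 0 999999, computed without allocating the other values
print(len(x.materialize()))  # allocates one million Python ints
```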
Indexing
Slicing along the x dimension preserves the range index (although with a new
range) and keeps a lazy associated coordinate variable.
sliced = ds.isel(x=slice(1_000, 50_000, 100))
sliced.x
<xarray.DataArray 'x' (x: 490)> Size: 4kB
[490 values with dtype=int64]
Coordinates:
* x (x) int64 4kB 1000 1100 1200 1300 1400 ... 49600 49700 49800 49900
sliced.xindexes["x"]
PandasIndex(RangeIndex(start=1000, stop=50000, step=100, name='x'))
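Slicing a Python range behaves analogously: the result is again a range, with start, stop and step recomputed from the slice and no values copied. The stdlib-only snippet below applies the same slice as the isel call above; it is an analogy using built-in range, not pandas code.

```python
full = range(1_000_000)

# Slicing a range returns another range: only start/stop/step are recomputed.
sliced = full[1_000:50_000:100]

print(sliced)  # range(1000, 50000, 100)
print(len(sliced))  # 490
print(sliced[0], sliced[-1])  # 1000 49900
```

The resulting start, stop and step match those of the RangeIndex shown in sliced.xindexes["x"].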
Indexing with arbitrary values along the same dimension converts the underlying pandas.RangeIndex into a regular pandas.Index holding explicit values (this conversion is handled by pandas).
indexed = ds.isel(x=[10, 55, 124, 265])
indexed.x
<xarray.DataArray 'x' (x: 4)> Size: 32B
array([ 10,  55, 124, 265])
Coordinates:
* x (x) int64 32B 10 55 124 265
indexed.xindexes["x"]
PandasIndex(Index([10, 55, 124, 265], dtype='int64', name='x'))
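An intuition for this conversion, sketched with built-in range as an analogy (not a description of pandas internals): the selected values have unequal gaps, so no single start/stop/step triple can describe them and they must be stored explicitly. A Python range also rejects a list of positions outright, so the values are gathered one at a time.

```python
full = range(1_000_000)
positions = [10, 55, 124, 265]

# Built-in range supports integer and slice indexing, but not a list of positions.
try:
    full[positions]
except TypeError as err:
    print(f"TypeError: {err}")

# Gather the requested positions one at a time into an explicit list.
values = [full[p] for p in positions]

# Distinct differences between consecutive values; more than one distinct gap
# means the values are not evenly spaced and cannot be described as a range.
gaps = sorted({b - a for a, b in zip(values, values[1:])})

print(values)  # [10, 55, 124, 265]
print(gaps)  # [45, 69, 141]
```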