Floating point ranges with RangeIndex

Highlights#

  1. pandas.RangeIndex only supports integer ranges, and pandas has no equivalent for floating-point ranges. Fortunately, xarray.indexes.RangeIndex works with real numbers.

  2. Xarray’s RangeIndex is built on top of xarray.indexes.CoordinateTransformIndex (see Functional transformations with CoordinateTransformIndex) and therefore supports very large ranges represented as lazy coordinate variables.

Example#

Assigning#

import xarray as xr

Using xarray.indexes.RangeIndex.arange().

idx1 = xr.indexes.RangeIndex.arange(0.0, 1000.0, 1e-9, dim="x")

ds1 = xr.Dataset(coords=xr.Coordinates.from_xindex(idx1))
ds1
<xarray.Dataset> Size: 8TB
Dimensions:  (x: 1000000000000)
Coordinates:
  * x        (x) float64 8TB 0.0 1e-09 2e-09 3e-09 ... 1e+03 1e+03 1e+03 1e+03
Data variables:
    *empty*
Indexes:
    x        RangeIndex (start=0, stop=1e+03, step=1e-09)

Using xarray.indexes.RangeIndex.linspace().

idx2 = xr.indexes.RangeIndex.linspace(
    0.0, 1000.0, 1_000_000_000_000, dim="x"
)

ds2 = xr.Dataset(coords=xr.Coordinates.from_xindex(idx2))
ds2
<xarray.Dataset> Size: 8TB
Dimensions:  (x: 1000000000000)
Coordinates:
  * x        (x) float64 8TB 0.0 1e-09 2e-09 3e-09 ... 1e+03 1e+03 1e+03 1e+03
Data variables:
    *empty*
Indexes:
    x        RangeIndex (start=0, stop=1e+03, step=1e-09)
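The two constructors above describe the same range in different ways: one from a step, the other from a number of points. The arithmetic linking the two can be sketched in plain Python. This is only an illustration that assumes the conventions of numpy.arange (half-open interval) and numpy.linspace (endpoint included by default); whether RangeIndex.arange() and RangeIndex.linspace() follow exactly these conventions is an assumption here, and the helper names `arange_size` and `linspace_step` are hypothetical.

```python
import math


def arange_size(start: float, stop: float, step: float) -> int:
    # numpy.arange convention: values cover the half-open interval [start, stop).
    return max(math.ceil((stop - start) / step), 0)


def linspace_step(start: float, stop: float, num: int, endpoint: bool = True) -> float:
    # numpy.linspace convention: `num` points, the last one equal to `stop`
    # when endpoint=True, so there are num - 1 intervals between them.
    intervals = num - 1 if endpoint else num
    return (stop - start) / intervals


# Small values that are exact in binary floating point, to keep the arithmetic clean.
size = arange_size(0.0, 1.0, 0.25)
step_closed = linspace_step(0.0, 1.0, 5)
step_open = linspace_step(0.0, 1.0, 4, endpoint=False)
print(size, step_closed, step_open)
```

With a step that is not exactly representable in binary (such as 1e-9), the computed size can be sensitive to rounding, which is one reason to prefer a point count when the exact number of elements matters.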

Lazy coordinate#

The x coordinate variable associated with the range index is lazy (i.e., its values are not materialized in memory).

ds1.x
<xarray.DataArray 'x' (x: 1000000000000)> Size: 8TB
[1000000000000 values with dtype=float64]
Coordinates:
  * x        (x) float64 8TB 0.0 1e-09 2e-09 3e-09 ... 1e+03 1e+03 1e+03 1e+03
Indexes:
    x        RangeIndex (start=0, stop=1e+03, step=1e-09)
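The idea behind a lazy range can be illustrated with a small plain-Python sketch: only a few numbers are stored, and each value is computed when requested. The class `LazyFloatRange` below is hypothetical and is not how xarray implements RangeIndex or CoordinateTransformIndex; it only shows why a trillion-element range needs almost no memory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LazyFloatRange:
    """Hypothetical stand-in for a lazy range coordinate (not xarray's class)."""

    start: float
    step: float
    size: int

    def __len__(self) -> int:
        return self.size

    def __getitem__(self, i: int) -> float:
        # Compute a single value on demand instead of storing the whole array.
        if i < 0:
            i += self.size
        if not 0 <= i < self.size:
            raise IndexError(i)
        return self.start + i * self.step


# Same start, step and size as the x coordinate above; no array is allocated.
x = LazyFloatRange(start=0.0, step=1e-9, size=1_000_000_000_000)
print(len(x))
print(x[3])
print(x[-1])
```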

If materialized, this would be a very large array!

ds1.x.nbytes / 1024**4  # ~7.3 TiB (8 TB)!
7.275957614183426
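The "8TB" in the repr and the value of about 7.28 computed here describe the same number of bytes in two different units. A short arithmetic check in plain Python (no xarray needed) makes the relation explicit:

```python
n_values = 1_000_000_000_000  # size of the x dimension shown above
itemsize = 8                  # bytes per float64 value

nbytes = n_values * itemsize
terabytes = nbytes / 1000**4  # TB, powers of 10 (the unit used in the repr)
tebibytes = nbytes / 1024**4  # TiB, powers of 2 (the unit computed above)
print(nbytes, terabytes, tebibytes)
```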

Important

Accessing ds1.x.values will materialize all values in memory! x may behave like a “coordinate variable bomb” 💣.

Indexing#

Slicing along the x dimension preserves the range index (with a new start, stop and step) and keeps the associated coordinate variable lazy.

sliced = ds1.isel(x=slice(1_000, 50_000, 100))

sliced.x
<xarray.DataArray 'x' (x: 490)> Size: 4kB
[490 values with dtype=float64]
Coordinates:
  * x        (x) float64 4kB 1e-06 1.1e-06 1.2e-06 ... 4.98e-05 4.99e-05
Indexes:
    x        RangeIndex (start=1e-06, stop=5e-05, step=1e-07)
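The new range parameters shown in this repr can be reproduced by hand. The sketch below is plain Python with a hypothetical helper, `slice_range`, and is not xarray's implementation; it assumes the stop of the sliced range is the new start plus the new size times the new step, which is consistent with the output above.

```python
def slice_range(start: float, step: float, size: int, slc: slice):
    """Return (start, stop, step, size) of a sliced range (hypothetical helper)."""
    positions = range(*slc.indices(size))  # integer positions selected by the slice
    new_size = len(positions)
    new_step = step * positions.step
    new_start = start + positions.start * step
    new_stop = new_start + new_size * new_step  # assumed half-open stop
    return new_start, new_stop, new_step, new_size


# Same range and slice as ds1.isel(x=slice(1_000, 50_000, 100)).
new_start, new_stop, new_step, new_size = slice_range(
    0.0, 1e-9, 1_000_000_000_000, slice(1_000, 50_000, 100)
)
print(new_start, new_stop, new_step, new_size)
```

Because a slice of a range is itself a range, the result can be described by three numbers again, which is why the sliced coordinate can stay lazy.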