MultiIndex Approach

One common approach is to use a pandas MultiIndex to combine time and metadata into a single dimension.

Setup

%xmode minimal
import numpy as np
import pandas as pd
import xarray as xr

Creating the MultiIndex Dataset

We create a dataset where the time dimension has a MultiIndex with both time values and word labels.

# Create sample data
T = 1000
C = 2
times = np.linspace(0, 120, T)
data = np.random.rand(C, T)

# Define word boundaries
breaks = np.array([0, 333, 666, 1000])

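# Create word labels for each time point.
# dtype=object keeps full-length strings; a fixed-width array built from
# "red" alone would truncate "green" and "blue" to three characters.
words = np.empty(T, dtype=object)
words[breaks[0] : breaks[1]] = "red"
words[breaks[1] : breaks[2]] = "green"
words[breaks[2] :] = "blue"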

# Create MultiIndex
mdx = pd.MultiIndex.from_arrays([words, times], names=["word", "time"])

# Create xarray Dataset
ds = xr.DataArray(data, [("C", range(C)), ("T", mdx)]).to_dataset(name="data")
ds
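
One pitfall when filling a label array such as words by assignment: NumPy infers a fixed-width string dtype from the initial values, so longer labels assigned later are silently truncated. A minimal sketch of the effect, using a short illustrative array (the names fixed and flexible are not part of the dataset above):

```python
import numpy as np

# Inferred dtype is <U3, i.e. at most three characters per element
fixed = np.array(["red"] * 4)
fixed[2:] = "green"

# dtype=object stores arbitrary Python strings, so nothing is cut off
flexible = np.array(["red"] * 4, dtype=object)
flexible[2:] = "green"

print(fixed.dtype, fixed)
print(flexible.dtype, flexible)
```

Passing an explicit wide string dtype such as "U5" would also work; object dtype is just one option.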

Selection by Word

We can select by word label:

ds.sel(word="red")

Limitations

1. Time slicing is awkward

Slicing by time requires specifying the word level too, or falling back to .loc-style patterns. With float-valued times, a plain slice on the time level raises an error.

# A plain slice on the time level does not work as expected; it raises an error:
ds.sel(time=slice(0, 50))
# Instead, build the selection explicitly from a boolean mask.
# Note that this always returns a copy rather than a less expensive view.
ds.sel(T=ds.T[(ds.time >= 0.15) & (ds.time <= 55.8)])
# ds.sel(T=slice(.15, 55.8))
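
The copy-versus-view distinction can be illustrated with plain NumPy arrays of the same shape as the data above. This is only a sketch: lo, hi, start and stop are illustrative names, and it assumes the time values are sorted, as they are here.

```python
import numpy as np

T = 1000
times = np.linspace(0, 120, T)
data = np.random.rand(2, T)

lo, hi = 0.15, 55.8

# Boolean mask: NumPy advanced indexing always produces a copy
mask = (times >= lo) & (times <= hi)
masked = data[:, mask]

# Because times is sorted, the same window is one contiguous positional slice
start = np.searchsorted(times, lo, side="left")   # first index with time >= lo
stop = np.searchsorted(times, hi, side="right")   # first index with time > hi
sliced = data[:, start:stop]

print(np.shares_memory(masked, data))  # the masked result owns its own memory
print(np.shares_memory(sliced, data))  # the sliced result reuses data's buffer
```

Under the same sortedness assumption, the positional bounds could be passed to ds.isel(T=slice(start, stop)) instead of going through a mask.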

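2. Can’t isel the metadata coords

You might want to select a word by position, for example the third word. That is not easy with this index, because word is a level of the T index rather than a dimension of its own.

# Raises ValueError: word is not a dimension
ds.isel(word=2)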
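
One possible workaround is to look up the label by position first and then select by label. The sketch below works on the pandas MultiIndex alone and rebuilds it so that it is self-contained; ordered_words, third and positions are illustrative names.

```python
import numpy as np
import pandas as pd

T = 1000
times = np.linspace(0, 120, T)
breaks = np.array([0, 333, 666, 1000])
labels = np.array(["red", "green", "blue"], dtype=object)
words = np.repeat(labels, np.diff(breaks))
mdx = pd.MultiIndex.from_arrays([words, times], names=["word", "time"])

# Index.unique() keeps order of first appearance, whereas mdx.levels[0]
# is sorted alphabetically and would give the wrong "third" word.
word_level = mdx.get_level_values("word")
ordered_words = word_level.unique()
third = ordered_words[2]

# Integer positions along T that belong to the third word
positions = np.flatnonzero(word_level == third)
print(third, positions[0], positions[-1])
```

The resulting label could then be used with ds.sel(word=third), or the positions with ds.isel(T=positions).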

3. Interval boundaries are lost

The MultiIndex doesn’t preserve the actual interval boundaries: we can’t easily ask “what time range does word X span?” without computing it from the data.
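
As a rough sketch of what “computing it from the data” could look like, the first and last sampled time of each word can be recovered by grouping the time level by the word level. The index is rebuilt here so the example runs on its own; spans is an illustrative name.

```python
import numpy as np
import pandas as pd

T = 1000
times = np.linspace(0, 120, T)
breaks = np.array([0, 333, 666, 1000])
labels = np.array(["red", "green", "blue"], dtype=object)
words = np.repeat(labels, np.diff(breaks))
mdx = pd.MultiIndex.from_arrays([words, times], names=["word", "time"])

# Flatten the index into columns, then take the first and last sampled
# time per word; sort=False keeps the words in order of appearance.
frame = mdx.to_frame(index=False)
spans = frame.groupby("word", sort=False)["time"].agg(["min", "max"])
print(spans)
```

Note that this only recovers the sampled extent of each word. The last time of one word and the first time of the next are adjacent samples, so the true boundary between them remains unknown.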

4. Constrained to measurement time points

If metadata events happen at times not in your measurement grid, you lose that precision. For example, if you sample monthly but an event happened mid-month, you can’t represent that exactly.
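
The loss of precision can be made concrete with the sampling grid used above. In this sketch, event_time is a hypothetical event that does not fall on a sample; the best a per-sample label can do is snap it to a neighbouring grid point.

```python
import numpy as np

T = 1000
times = np.linspace(0, 120, T)
step = times[1] - times[0]

# Hypothetical event time that lies between two samples
event_time = 50.0

# Nearest grid point and the resulting representation error
idx = np.abs(times - event_time).argmin()
snapped = times[idx]
error = abs(snapped - event_time)
print(idx, snapped, error, step)
```

The error is bounded by half the sampling step, so it grows as the grid gets coarser.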

5. Memory duplication

Each time point stores its own word label, which is redundant: the label only changes at the word boundaries.
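
A small sketch of the scale of the duplication, rebuilding the index so it runs on its own. pandas encodes each level as integer codes plus a table of unique values, so what is repeated per sample is a code rather than the full string, but there is still one entry per time point. The names per_sample_entries and interval_entries are illustrative.

```python
import numpy as np
import pandas as pd

T = 1000
times = np.linspace(0, 120, T)
breaks = np.array([0, 333, 666, 1000])
labels = np.array(["red", "green", "blue"], dtype=object)
words = np.repeat(labels, np.diff(breaks))
mdx = pd.MultiIndex.from_arrays([words, times], names=["word", "time"])

# One integer code per sample for the word level
per_sample_entries = len(mdx.codes[0])

# An interval description needs only one label per word plus the boundaries
interval_entries = len(labels) + len(breaks)

print(per_sample_entries, interval_entries)
```

The per-sample count grows with the number of time points, while the interval description grows only with the number of words.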