Direct Coords Approach

Instead of an IntervalIndex, we can encode metadata as coordinates that share the time dimension, so that every time point carries the label of the event it falls in.

Setup

import numpy as np
import xarray as xr

Creating the Dataset

We create a dataset where word is a coordinate along the time dimension, then call set_xindex so that word gets its own index and can be used with sel.

# Create sample data
T = 1000
C = 2
times = np.linspace(0, 120, T)
data = np.random.rand(C, T)

# Define word boundaries
breaks = np.array([0, 333, 666, 1000])

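# Create word labels for each time point.
# dtype="<U5" is needed: np.array(["red"] * T) would infer "<U3" and
# silently truncate "green" to "gre" and "blue" to "blu".
words = np.empty(T, dtype="<U5")
words[breaks[0] : breaks[1]] = "red"
words[breaks[1] : breaks[2]] = "green"
words[breaks[2] : breaks[3]] = "blue"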

# Create Dataset with word as a coord on the time dimension
ds = xr.Dataset(
    {"data": (("C", "time"), data)}, coords={"time": times, "word": ("time", words)}
).set_xindex("word")
ds

What Works

Time slicing:

ds.sel(time=slice(0.15, 15.5))

Selection by word label works:

ds.sel(word="red")

Limitations

1. Annoying to construct

Metadata is often most naturally recorded as a table of events: onset, duration, word. To build the dense coordinate, each event has to be expanded by hand into a label for every time point it covers.
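A minimal sketch of that expansion, assuming events are stored as parallel onset, duration, and label lists and that each interval is half-open, [onset, onset + duration). The helper name expand_to_time_grid is hypothetical, and a toy 10-sample grid stands in for the real time coordinate:

```python
import numpy as np


def expand_to_time_grid(times, onsets, durations, labels, fill=""):
    """Expand (onset, duration, label) events into one label per time point.

    Hypothetical helper. A time point t gets a label when
    onset <= t < onset + duration (half-open interval); points covered by
    no event get `fill`. Where events overlap, the later one wins.
    """
    times = np.asarray(times, dtype=float)
    # Size the string dtype from the data so no label is silently truncated
    width = max([1, len(fill), *(len(s) for s in labels)])
    out = np.full(times.shape, fill, dtype=f"<U{width}")
    for onset, duration, label in zip(onsets, durations, labels):
        covered = (times >= onset) & (times < onset + duration)
        out[covered] = label
    return out


# Toy grid of 10 samples at 0, 1, ..., 9 s
times = np.arange(10.0)
words = expand_to_time_grid(
    times,
    onsets=[0.0, 3.0, 6.0],
    durations=[3.0, 3.0, 4.0],
    labels=["red", "green", "blue"],
)
print(words)
```

Because the intervals are half-open, a sample lying exactly at onset + duration is not covered; on a grid like np.linspace(0, 120, T), the final event needs a duration reaching past 120 s to label the last sample.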

2. No isel by word

Since word is on the time dimension, there’s no word dimension to index into:

# This doesn't work - word is not a dimension
try:
    ds.isel(word=0)
except ValueError as e:
    print(f"Error: {e}")
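One possible workaround is to number the contiguous runs of identical labels and select positions by run number. This is a sketch under our own assumptions (word_segment_ids is a hypothetical helper, not an xarray API), shown on a plain NumPy label array:

```python
import numpy as np


def word_segment_ids(labels):
    """Number each contiguous run of identical labels 0, 1, 2, ...

    Hypothetical helper. Returns the run number of every time point and
    the label of each run, in order of appearance.
    """
    labels = np.asarray(labels)
    if labels.size == 0:
        return np.zeros(0, dtype=int), labels
    # True where a new run starts; the first point always starts one
    starts = np.concatenate(([True], labels[1:] != labels[:-1]))
    segment_id = np.cumsum(starts) - 1
    return segment_id, labels[starts]


labels = np.array(["red", "red", "green", "green", "green", "red"])
seg, seg_labels = word_segment_ids(labels)
first_word_positions = np.flatnonzero(seg == 0)  # plays the role of isel(word=0)
print(seg, seg_labels, first_word_positions)
```

With seg computed from ds.word.values, ds.isel(time=np.flatnonzero(seg == 0)) stands in for the missing ds.isel(word=0). Unlike ds.sel(word="red"), which gathers every red time point, this treats a repeated word as separate occurrences.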

3. Interval info is obscured

Important questions become hard to answer:

  • What was the total duration of the 3rd word?

  • What are the exact interval boundaries?
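Both answers have to be reconstructed from the dense labels. A sketch, assuming NumPy arrays of times and labels (word_intervals is a hypothetical helper), recovers the first and last sampled time of each run:

```python
import numpy as np


def word_intervals(times, labels):
    """Recover (label, first_time, last_time) for each run of identical labels.

    Hypothetical helper. The times are the first and last *sampled* points
    of each run, so true onsets/offsets are only known to within one step.
    """
    times = np.asarray(times)
    labels = np.asarray(labels)
    if labels.size == 0:
        return []
    # Index of the first sample of every run after the first one
    change = np.flatnonzero(labels[1:] != labels[:-1]) + 1
    first = np.concatenate(([0], change))
    last = np.concatenate((change - 1, [len(labels) - 1]))
    return [(str(labels[i]), float(times[i]), float(times[j])) for i, j in zip(first, last)]


times = np.arange(10) * 0.5  # toy grid with a 0.5 s step
labels = np.array(["red"] * 3 + ["green"] * 3 + ["blue"] * 4)
intervals = word_intervals(times, labels)
label, start, end = intervals[2]  # the 3rd word
print(label, start, end, end - start)  # blue 3.0 4.5 1.5
```

The output shows the ambiguity: the 3rd word's samples span 1.5 s, but if each sample stands for a 0.5 s bin, 2.0 s is just as defensible. The dense array alone cannot say which was meant.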

4. Constrained to measurement time points

If a metadata event starts or ends between samples, its boundary has to be snapped to a nearby grid point, so the exact time is lost. For example, if you sample monthly but an event happened mid-month, you can’t represent that exactly.
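The loss is easy to quantify: an off-grid time can only be stored at the nearest sample, with an error of up to half a sampling step. A sketch on the same grid as the dataset above; the event time of 10.05 s is made up for illustration:

```python
import numpy as np

times = np.linspace(0, 120, 1000)  # same grid as the dataset; step = 120/999 s
step = times[1] - times[0]

event_onset = 10.05  # hypothetical event time that is not on the grid
nearest = np.abs(times - event_onset).argmin()
snapped = times[nearest]
error = abs(snapped - event_onset)

print(f"step = {step:.4f} s, snapped onset = {snapped:.4f} s, error = {error:.4f} s")
```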

5. Can’t drop coord as index when it becomes scalar

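When a selection reduces word to a single unique value, it does not collapse to a scalar the way a selected dimension coordinate would. It stays a full-length, indexed coordinate along time, and you can’t easily drop it from being an index: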

# Selecting a single word returns all matching time points
result = ds.sel(word="red")
print(f"Result has {len(result.time)} time points")
print(f"word coord: {result.word.values}")

# But word is still an index on the time dimension, even though every
# remaining value is "red"; it does not collapse to a scalar coordinate
# result.drop_indexes('word')  # This can cause issues
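Collapsing the coordinate by hand is possible, but it takes an explicit uniqueness check plus a drop-and-reassign step. A minimal sketch, assuming an xarray version that provides set_xindex; collapse_constant_coord is a hypothetical helper, not part of xarray:

```python
import numpy as np
import xarray as xr


def collapse_constant_coord(ds, name):
    """Hypothetical helper: replace a constant, indexed coord with a scalar coord.

    Raises ValueError if the coordinate holds more than one distinct value.
    """
    values = np.unique(ds[name].values)
    if values.size != 1:
        raise ValueError(f"{name!r} is not constant: {values}")
    # drop_vars removes the coordinate together with its index;
    # assign_coords then re-attaches the single value as a 0-d coord
    return ds.drop_vars(name).assign_coords({name: values[0]})


T = 10
ds = xr.Dataset(
    {"data": ("time", np.arange(T, dtype=float))},
    coords={
        "time": np.linspace(0, 1, T),
        "word": ("time", np.array(["red"] * 5 + ["green"] * 5)),
    },
).set_xindex("word")

result = collapse_constant_coord(ds.sel(word="red"), "word")
print(result.word)
```

After this, word is a 0-d coordinate with no index, which is what a selection on a dimension coordinate produces automatically.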

Advantages

Despite the limitations, this approach:

  • Clearly shows that word spans time

  • Uses only standard xarray features

  • Is simple to understand