Instead of an IntervalIndex, we can encode metadata as coordinates that share the time dimension.
Setup¶
import numpy as np
import xarray as xr
Creating the Dataset¶
We create a dataset where word is a coordinate on the time dimension.
# Create sample data
T = 1000
C = 2
times = np.linspace(0, 120, T)
data = np.random.rand(C, T)
# Define word boundaries
breaks = np.array([0, 333, 666, 1000])
# Create word labels for each time point
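# Use a dtype wide enough for the longest label; a "<U3" array would silently truncate "green" and "blue"
words = np.full(T, "red", dtype="<U5")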
words[breaks[0] : breaks[1]] = "red"
words[breaks[1] : breaks[2]] = "green"
words[breaks[2] :] = "blue"
# Create Dataset with word as a coord on the time dimension
ds = xr.Dataset(
    {"data": (("C", "time"), data)},
    coords={"time": times, "word": ("time", words)},
).set_xindex("word")
ds
ds.sel(time=slice(0.15, 15.5))
Selection by word label works:¶
ds.sel(word="red")
Limitations¶
1. Annoying to construct¶
Metadata usually arrives as one row per event: onset, duration, word. To build the dense array, we have to manually expand each row into a label for every time point.
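As a rough sketch of that manual expansion, assume a hypothetical event table with onset, duration, and word columns. The values below are made up for illustration and are not taken from the dataset above, and the half-open interval convention is also an assumption.

```python
import numpy as np

# Hypothetical event table: one row per word (illustrative values only)
onsets = np.array([0.0, 40.0, 80.0])
durations = np.array([40.0, 40.0, 40.0])
labels = np.array(["red", "green", "blue"])

# Measurement grid, matching the setup above
times = np.linspace(0, 120, 1000)

# Start with an empty label for every time point;
# reusing labels.dtype keeps the longest word from being truncated
dense = np.full(times.shape, "", dtype=labels.dtype)

# Fill each half-open interval [onset, onset + duration)
for onset, duration, label in zip(onsets, durations, labels):
    mask = (times >= onset) & (times < onset + duration)
    dense[mask] = label

# The final sample (t = 120) lies outside every half-open interval,
# so it keeps the empty label unless it is handled explicitly.
```

The resulting `dense` array could then be passed as the `word` coordinate. Edge cases such as the last sample, gaps between events, or overlapping events all have to be decided by hand.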
2. No isel by word¶
Since word is on the time dimension, there’s no word dimension to index into:
# This doesn't work - word is not a dimension
try:
    ds.isel(word=0)
except ValueError as e:
    print(f"Error: {e}")
3. Interval info is obscured¶
Important questions become hard to answer:
What was the total duration of the 3rd word?
What are the exact interval boundaries?
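To illustrate the extra work involved, here is a minimal sketch that recovers approximate boundaries from the dense labels by looking for change points. It rebuilds the labels with `np.repeat` so that it is self-contained; the variable names are illustrative only.

```python
import numpy as np

# Rebuild the dense labels from the example above
T = 1000
times = np.linspace(0, 120, T)
words = np.repeat(["red", "green", "blue"], [333, 333, 334])

# Indices where the label differs from the previous sample
change = np.flatnonzero(words[1:] != words[:-1]) + 1
starts = np.concatenate(([0], change))
stops = np.concatenate((change, [T]))  # exclusive end indices

# Approximate each interval by its first and last sample time
for start, stop in zip(starts, stops):
    onset, offset = times[start], times[stop - 1]
    print(f"{words[start]}: {onset:.2f} to {offset:.2f} ({offset - onset:.2f} long)")

# Duration of the 3rd word, known only to grid resolution
third_duration = times[stops[2] - 1] - times[starts[2]]
```

This only works when consecutive events carry different labels. If the same word occurs twice back to back, the two intervals merge and the boundary between them cannot be recovered from the dense coordinate.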
4. Constrained to measurement time points¶
If metadata events happen at times not in your measurement grid, you lose that precision. For example, if you sample monthly but an event happened mid-month, you can’t represent that exactly.
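The loss of precision can be made concrete with a small sketch. The onset value below is hypothetical, and snapping to the nearest sample is just one possible convention.

```python
import numpy as np

times = np.linspace(0, 120, 1000)  # measurement grid from the setup above

# Hypothetical event onset that does not coincide with a sample
true_onset = 40.05

# The dense coordinate can only change at a sample, so the onset is
# effectively moved to a grid point (here: the nearest one)
nearest = times[np.abs(times - true_onset).argmin()]
error = abs(nearest - true_onset)
step = times[1] - times[0]

print(f"grid step: {step:.4f}, stored onset: {nearest:.4f}, error: {error:.4f}")
```

With nearest-sample snapping the error is bounded by half the grid step, so a coarser measurement grid directly degrades the metadata.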
5. Can’t drop coord as index when it becomes scalar¶
When a selection reduces the coordinate to a single value, it is not easy to remove its index:
# Selecting a single word returns all matching time points
result = ds.sel(word="red")
print(f"Result has {len(result.time)} time points")
print(f"word coord: {result.word.values}")
# But word is still an index on the time dimension
# If you want to drop it, you can't easily do so when it's scalar
# result.drop_indexes('word')  # This can cause issues
Advantages¶
Despite the limitations, this approach:
Clearly shows that word spans time
Uses only standard xarray features
Is simple to understand