NDPointIndex vs NDIndex

xarray has a built-in NDPointIndex, which is

useful for dealing with n-dimensional coordinate variables representing irregular data.

This sounds like the same problem that NDIndex solves. However, NDPointIndex and NDIndex solve distinct problems, and each would be difficult to use for the problem the other solves.

NDPointIndex

NDPointIndex supports selecting points distributed in N-dimensional space by a distance metric. For example, if we have data points distributed in 2 dimensions, then each point has 2 annotation values associated with it (x and y). This gives rise to NDPointIndex’s requirement to have the same number of coordinates as dimensions.
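To make the "one coordinate per dimension" requirement concrete, here is a minimal NumPy sketch of a nearest-point query on a tiny 2D grid. The grid values and the helper `nearest_cell` are hypothetical, and the brute-force search only stands in for the idea of a distance-based lookup; it is not how NDPointIndex is implemented.

```python
import numpy as np

# Hypothetical 2x3 curvilinear grid: two 2D coordinates, one per dimension.
x = np.array([[0.0, 1.0, 2.0],
              [0.5, 1.5, 2.5]])
y = np.array([[0.0, 0.1, 0.2],
              [1.0, 1.1, 1.2]])


def nearest_cell(x, y, qx, qy):
    """Return the (row, col) of the grid cell closest to the point (qx, qy)."""
    dist2 = (x - qx) ** 2 + (y - qy) ** 2  # squared Euclidean distance per cell
    row, col = np.unravel_index(np.argmin(dist2), dist2.shape)
    return int(row), int(col)


cell = nearest_cell(x, y, qx=1.4, qy=0.9)
print(cell)
```

Both `x` and `y` are needed to locate a cell: dropping either one would leave the position of a point undefined.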

NDIndex

NDIndex works with annotations on top of regularly gridded data. For example, suppose we have a long 1D dataset recorded continuously over time (absolute time), in which a new trial begins every 5 seconds (e.g. a speech task in a neuroscience experiment). If we break the data up into individual trials and keep track of the relative time within each trial, our data cube has shape (trials, rel_time). If we want to keep track of the absolute time associated with each data point, that coordinate is a 2D coord with dimensions (trials, rel_time). Or, more usefully, we might have an event that happened at a different time in each trial (e.g. speech onset) and want to perform analysis relative to that point. For this we would need a new coord, event_locked_time, which is 2D as well.

event locking example

In this case we have a single annotation associated with each point in the data, even though the data is 2D; i.e. each point has its own event_locked_time. This would not be allowed by NDPointIndex.
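The trial layout described above can be sketched with plain NumPy broadcasting. This assumes 3 trials of 500 samples each and made-up event onsets; the variable names are illustrative and not part of any library.

```python
import numpy as np

n_trials, n_samples = 3, 500   # assumed sizes, for illustration only
trial_length = 5.0             # seconds per trial, as in the text

# 1D dimension coordinates
rel_time = np.linspace(0.0, trial_length, n_samples, endpoint=False)
trial_start = np.arange(n_trials) * trial_length

# 2D derived coordinate: absolute time of every (trial, rel_time) cell
abs_time = trial_start[:, np.newaxis] + rel_time[np.newaxis, :]

# 2D derived coordinate: time relative to a per-trial event (hypothetical onsets)
event_onset = np.array([2.0, 2.5, 1.5])
event_locked_time = rel_time[np.newaxis, :] - event_onset[:, np.newaxis]

print(abs_time.shape, event_locked_time.shape)
```

Each cell of the (trials, rel_time) grid carries one scalar per derived coordinate, which is the structure NDIndex is described as targeting.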

Differences

The fundamental difference is:

| Aspect | NDPointIndex | NDIndex |
|---|---|---|
| Coordinates | Multiple 2D coords that together define position | Single N-D coord with derived values |
| Query type | Spatial: “find point at (x, y)” | Value: “find cell where value ≈ target” |
| Use case | Curvilinear grids, scattered observations | Structured arrays with computed coordinates |

They solve different problems

NDPointIndex answers: “Which grid cell is nearest to coordinates (lat=45.2, lon=-122.5)?”

ds.sel(lat=45.2, lon=-122.5, method="nearest")  # Spatial query

NDIndex answers: “Which (trial, rel_time) cells have event_time closest to .5?”

ds.sel(event_time=.5, method="nearest")  # Value lookup in N-D array

Use NDPointIndex when your coordinates define positions in space (or similar multi-dimensional coordinate systems).

Use NDIndex when you have derived coordinates computed from dimension coordinates (like event_time = rel_time - event_onset).
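As a rough illustration of the value-lookup question, the sketch below finds, for each trial, the sample whose event_time is closest to 0.5 using plain NumPy. The onsets and sampling are assumed values, and this shows only the idea behind the query, not NDIndex's actual implementation.

```python
import numpy as np

# Assumed sampling: 500 samples over a 5 second trial.
rel_time = np.linspace(0.0, 5.0, 500, endpoint=False)
event_onset = np.array([2.0, 2.5, 1.5])  # hypothetical onset per trial

# Derived 2D coordinate: event_time = rel_time - event_onset
event_time = rel_time[np.newaxis, :] - event_onset[:, np.newaxis]

# For every trial, index of the sample whose event_time is closest to 0.5
nearest_idx = np.abs(event_time - 0.5).argmin(axis=1)
nearest_rel_time = rel_time[nearest_idx]

print(nearest_idx, nearest_rel_time)
```

The same target value lands at a different rel_time in each trial, which is why the lookup has to search the 2D coordinate rather than a 1D dimension coordinate.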

%xmode minimal
import numpy as np
import xarray as xr
from linked_indices.example_data import trial_based_dataset
Exception reporting mode: Minimal

Trying NDPointIndex with trial-based data

Now let’s see what happens when we try to use NDPointIndex with our trial-based dataset where we have a single 2D abs_time coordinate.

ds = trial_based_dataset(mode="stacked").drop_vars("trial_onset")
ds

Problem 1: NDPointIndex requires matching number of variables and dimensions

NDPointIndex expects one coordinate variable per dimension. Our abs_time is a single 2D variable, not two 1D variables that define points in 2D space.

# This fails! NDPointIndex expects 2 variables for 2 dimensions
try:
    ds.set_xindex(["abs_time"], xr.indexes.NDPointIndex)
except ValueError as e:
    print(f"ValueError: {e}")
ValueError: the number of variables 1 doesn't match the number of dimensions 2

What if we broadcast trial to 2D as well?

One might think: “NDPointIndex needs 2 variables for 2 dimensions. What if we broadcast trial to shape (trial, rel_time) to match abs_time?”

Let’s try this approach:

# Create 2D coordinates for NDPointIndex:
# 1. trial_2d - trial labels broadcast to 2D
# 2. event_locked - time relative to an event (different in each trial)

trial_labels = ds.trial.values  # ['cosine', 'square', 'sawtooth']
trial_2d = np.broadcast_to(trial_labels[:, np.newaxis], (3, 500))

# Events happen at different times in each trial
event_times = np.array([2.0, 2.5, 1.5])
event_locked = ds.rel_time.values[np.newaxis, :] - event_times[:, np.newaxis]

ds_2d = ds.assign_coords(
    trial_2d=(["trial", "rel_time"], trial_2d),
    event_locked=(["trial", "rel_time"], event_locked),
)
ds_2d
# Try to use NDPointIndex with trial_2d (strings) and event_locked
# This fails because NDPointIndex uses KDTree which requires numeric data!
try:
    ds_2d.set_xindex(["trial_2d", "event_locked"], xr.indexes.NDPointIndex)
except ValueError as e:
    print(f"ValueError: {e}")
ValueError: could not convert string to float: np.str_('cosine')

Problem 2: NDPointIndex requires numeric coordinates

Because NDPointIndex uses a KDTree for spatial queries, all coordinates must be numeric. Our trial coordinate uses meaningful string labels like 'cosine', 'square', 'sawtooth' - a common pattern in experimental data. NDPointIndex cannot handle this.

To explore the other ergonomics issues with NDPointIndex, let’s create a version with numeric trial indices:

# Create numeric trial indices to work around the string limitation
trial_idx_2d = np.broadcast_to(np.arange(3)[:, np.newaxis], (3, 500))

ds_2d_numeric = ds.assign_coords(
    trial_idx_2d=(["trial", "rel_time"], trial_idx_2d),
    event_locked=(["trial", "rel_time"], event_locked),
)

# Now NDPointIndex can work with numeric coordinates
ds_ndpoint_2d = ds_2d_numeric.set_xindex(
    ["trial_idx_2d", "event_locked"], xr.indexes.NDPointIndex
)
ds_ndpoint_2d

Ergonomics Issue 1: Must provide ALL indexed coordinates

With NDPointIndex, you must provide values for all coordinates in the index. You cannot select by just one:

# With NDPointIndex, we MUST provide both coordinates
# This fails - can't select by event_locked alone:
try:
    ds_ndpoint_2d.sel(event_locked=0.5, method="nearest")
except ValueError as e:
    print(f"ValueError: {e}")
ValueError: missing labels for coordinate(s): trial_idx_2d.

Ergonomics Issue 2: Point-wise queries, not dimensional selection

NDPointIndex supports querying multiple points using DataArrays, but these are point-wise queries - you specify exact (trial, event_locked) pairs. You cannot ask “give me all trials at event_locked=0.5”:

# Scalar query returns a single point
result_scalar = ds_ndpoint_2d.sel(trial_idx_2d=1, event_locked=0.5, method="nearest")
print("Scalar query (trial=1, event_locked=0.5):")
print(f"  Shape: {dict(result_scalar.sizes)}")

# DataArray query returns multiple points - but you must specify EACH point explicitly
trial_query = xr.DataArray([0, 1, 2], dims="query")
event_query = xr.DataArray([0.5, 0.5, 0.5], dims="query")  # Same event_locked for all
result_array = ds_ndpoint_2d.sel(
    trial_idx_2d=trial_query, event_locked=event_query, method="nearest"
)
print("\nDataArray query (3 explicit points):")
print(f"  Shape: {dict(result_array.sizes)}")
result_array
Scalar query (trial=1, event_locked=0.5):
  Shape: {}

DataArray query (3 explicit points):
  Shape: {'query': 3}

Ergonomics Issue 3: No slice support

NDPointIndex doesn’t support range queries with slice(). You can only query for single points:

# NDPointIndex doesn't support slices
try:
    ds_ndpoint_2d.sel(trial_idx_2d=slice(0, 2), event_locked=slice(-0.5, 1.0))
except Exception as e:
    print(f"{type(e).__name__}: {e}")
ValueError: NDPointIndex only supports selection with method='nearest'

Ergonomics Issue 4: Semantic mismatch - “nearest in space” vs “value equals”

NDPointIndex uses Euclidean distance to find the nearest point. This creates unexpected behavior when the coordinates have different scales or meanings:

# Semantic mismatch: we ask for trial_idx_2d=0 and event_locked=1.0
# NDPointIndex uses Euclidean distance, so it might return a different trial
# if that's "closer" in the (trial_idx_2d, event_locked) space
result = ds_ndpoint_2d.sel(trial_idx_2d=0, event_locked=1.0, method="nearest")
result
# Query for trial_idx_2d=1.5 (between trials) and event_locked=0
# NDPointIndex mixes trial and time to find the spatially nearest point
result2 = ds_ndpoint_2d.sel(trial_idx_2d=1.5, event_locked=0, method="nearest")
result2
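The coupling between trial and time can be reproduced with a brute-force Euclidean search in NumPy. This is a stand-in for the nearest-neighbour behaviour described above, not NDPointIndex's own code; the 500-sample, 5 second sampling of rel_time is an assumption, and `nearest_point` is a hypothetical helper.

```python
import numpy as np

# Assumed sampling; event times match the cells above.
rel_time = np.linspace(0.0, 5.0, 500, endpoint=False)
event_times = np.array([2.0, 2.5, 1.5])
event_locked = rel_time[np.newaxis, :] - event_times[:, np.newaxis]
trial_idx = np.broadcast_to(np.arange(3)[:, np.newaxis], event_locked.shape)


def nearest_point(q_trial, q_event):
    """Brute-force Euclidean nearest neighbour in (trial_idx, event_locked) space."""
    dist2 = (trial_idx - q_trial) ** 2 + (event_locked - q_event) ** 2
    i, j = np.unravel_index(np.argmin(dist2), dist2.shape)
    return int(i), int(j)


# Trial 2 has no sample near event_locked = -2.6 (its earliest value is -1.5),
# so the closest point in the combined space lies in trial 1 instead.
found_trial, found_sample = nearest_point(q_trial=2, q_event=-2.6)
print(found_trial, event_locked[found_trial, found_sample])
```

Because one unit of trial index and one second of time are treated as the same kind of distance, a request for one trial can silently resolve to another.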

Comparison: How NDIndex handles the same data

Let’s see how NDIndex handles this dataset. We’ll also add an event_locked coordinate that represents time relative to an event that happened at a different time in each trial; this is the real use case for NDIndex:

from linked_indices import NDIndex

# Create fresh dataset with NDIndex instead of NDPointIndex
ds_ndindex = ds.assign_coords(
    event_locked=(["trial", "rel_time"], event_locked)
).set_xindex(["abs_time", "event_locked"], NDIndex)
ds_ndindex
# NDIndex: Select by event_locked time - "give me 0.5 seconds after the event in ALL trials"
# This is the key use case: same relative time across trials with different event times
result_event = ds_ndindex.sel(event_locked=0.5, method="nearest")
result_event
# NDIndex: Slice by event_locked time - "give me -0.5 to +1.0 seconds around the event"
# Returns a bounding box containing all cells in that range across all trials
result_slice = ds_ndindex.sel(event_locked=slice(-0.5, 1.0))
result_slice
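One way to picture the "bounding box" result of a slice is the NumPy sketch below: mark the cells whose value falls in the range, then take the smallest index rectangle that contains them all. The sampling is assumed, and this is an interpretation of the behaviour described in the comment above, not a copy of NDIndex's implementation.

```python
import numpy as np

# Assumed sampling; event times match the cells above.
rel_time = np.linspace(0.0, 5.0, 500, endpoint=False)
event_times = np.array([2.0, 2.5, 1.5])
event_locked = rel_time[np.newaxis, :] - event_times[:, np.newaxis]

# Cells whose value falls inside the requested range.
in_range = (event_locked >= -0.5) & (event_locked <= 1.0)

# Smallest rectangle (in index space) that contains every matching cell.
rows = np.flatnonzero(in_range.any(axis=1))
cols = np.flatnonzero(in_range.any(axis=0))
row_box = slice(int(rows[0]), int(rows[-1]) + 1)
col_box = slice(int(cols[0]), int(cols[-1]) + 1)

box = event_locked[row_box, col_box]

# The box also holds cells outside the range, because each trial's
# matching window sits at a different rel_time offset.
fraction_in_range = in_range[row_box, col_box].mean()
print(box.shape, fraction_in_range)
```

A rectangular result keeps the (trial, rel_time) dimensional structure intact, at the cost of including some cells that would need masking if only in-range values are wanted.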

Summary: NDPointIndex vs NDIndex Ergonomics

| Aspect | NDPointIndex | NDIndex |
|---|---|---|
| Coordinate types | Numeric only (KDTree limitation) | Any type (strings, floats, etc.) |
| Select by single coord | Must provide ALL coords | Can select by any single coord |
| Query style | Point-wise (specify each point) | Dimensional (broadcast across dims) |
| Slice support | No | Yes (returns bounding box) |
| Query semantics | “Nearest point in N-D space” | “Cells where value matches” |
| Coordinate coupling | Coords are coupled (spatial distance) | Coords are independent |

The fundamental mismatch: NDPointIndex treats (trial_idx_2d, event_locked) as a 2D coordinate space and finds points by Euclidean distance. But for trial-based data, trial and time are independent dimensions - we want to select by event_locked alone without having to specify a trial.

NDIndex is designed for exactly this use case: selecting by values in N-D derived coordinates while preserving the dimensional structure of the data.

Summary

| Feature | NDPointIndex | NDIndex |
|---|---|---|
| Use case | Unstructured point clouds, curvilinear grids | Structured arrays with derived coordinates |
| Query type | Spatial: find nearest (x, y) point | Value: find cell where event_locked_time ≈ 7.5 |
| Coordinates | Multiple N-D coords (one per dimension) | Single N-D coord with computed values |
| Data structure | Points in N-D coordinate space | N-D array of scalar values |
| Slice support | No | Yes (bounding box) |