
NDIndex Performance

This notebook benchmarks NDIndex performance for various operations and dataset sizes. All benchmarks use Python’s timeit module for rigorous, reproducible measurements.

Summary

NDIndex enables label-based selection on N-D coordinates. Performance depends on whether the coordinate is sorted (row-major order):

  • Sorted coordinates: O(log n) binary search - 100-1000x faster for large arrays

  • Unsorted coordinates: O(n) linear scan - still usable but slower for large arrays
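The sorted fast path can be sketched in plain NumPy (an illustration of the mechanism, not NDIndex's actual code): a coordinate sorted in row-major order flattens to a strictly increasing 1-D array, so a single np.searchsorted call plus np.unravel_index recovers the N-D position in O(log n).

```python
import numpy as np

# A 2-D coordinate sorted in row-major (C) order: flattening it
# yields a strictly increasing 1-D array
offsets = np.arange(100)[:, None] * 10.0      # per-trial offset, shape (100, 1)
rel = np.linspace(0.0, 9.99, 1000)[None, :]   # within-trial time, shape (1, 1000)
abs_time = offsets + rel                      # shape (100, 1000)

flat = abs_time.ravel()                       # C-order view, no copy
assert np.all(np.diff(flat) > 0)              # the "sorted" property

# O(log n) binary search on the flattened view, mapped back to 2-D
target = abs_time[50, 500]
flat_idx = np.searchsorted(flat, target)
trial, sample = np.unravel_index(flat_idx, abs_time.shape)
print(trial, sample)  # -> 50 500
```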

Key Performance Distinction

This benchmark measures three things:

  • Index lookup: Time NDIndex spends finding the matching indices (O(log n) for sorted)

  • Cold sel(): First call to ds.sel() - includes xarray result creation

  • Warm sel(): Repeated calls to ds.sel() - xarray reuses cached views

Expected Selection Times by Coordinate Shape

| Coordinate Shape | Total Cells | Index Lookup | Scalar sel() | Slice sel() (cold) | Slice sel() (warm) |
| --- | --- | --- | --- | --- | --- |
| 10 × 100 | 1K | ~0.003 ms | ~0.05 ms | ~0.2 ms | ~0.05 ms |
| 100 × 1,000 | 100K | ~0.003 ms | ~0.05 ms | ~0.5 ms | ~0.05 ms |
| 100 × 10,000 | 1M | ~0.003 ms | ~0.05 ms | ~2 ms | ~0.05 ms |
| 1,000 × 10,000 | 10M | ~0.003 ms | ~0.05 ms | ~20 ms | ~0.05 ms |
| 1,000 × 100,000 | 100M | ~0.003 ms | ~0.05 ms | ~50 ms | ~0.05 ms |

Key Findings

  1. Index lookup is O(log n) for sorted coordinates - NDIndex finds matching indices in ~0.003ms regardless of array size.

  2. Scalar selection stays constant - Result is always 1 cell, so total time is ~0.05ms regardless of array size.

  3. Slice selection: cold vs warm - First call takes O(n) for xarray to create the result Dataset. Subsequent calls reuse cached views.

  4. For unsorted coords, everything is O(n) - No binary search available, so all operations scan the full array.

  5. Index creation is O(n) - Due to xarray’s set_xindex() internal processing. The sorted check is lazy (computed on first sel()).

  6. isel() is ~10-50x faster than sel() - Use integer indexing when possible.

Recommendations

  • Sorted coordinates: Scalar selection is instant. Slice selection cold time is O(n).

  • Unsorted, < 1M cells: Selection is fast enough for interactive use (~1-3 ms)

  • Unsorted, 1-10M cells: Still usable but noticeable lag (~10-30 ms)

  • Unsorted, > 10M cells: Consider pre-filtering with isel() or chunking with dask
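The pre-filtering advice can be illustrated with plain NumPy (no NDIndex involved): restricting the array positionally first, the NumPy analogue of isel(), shrinks the O(n) scan by the same factor, and the restricted result agrees with the corresponding region of the full scan.

```python
import numpy as np

rng = np.random.default_rng(0)
values = rng.normal(size=(1000, 10_000))    # 10M cells, unsorted

start, stop = -0.1, 0.1

# Full O(n) scan over all 10M cells
full_mask = (values >= start) & (values <= stop)

# Pre-filter: if you know only the first 100 trials are relevant
# (a positional, isel-style restriction), the scan touches 10x fewer cells
subset = values[:100]                       # analogous to ds.isel(trial=slice(0, 100))
subset_mask = (subset >= start) & (subset <= stop)

# The pre-filtered result matches the corresponding slice of the full result
assert np.array_equal(subset_mask, full_mask[:100])
```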

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import xarray as xr

from linked_indices import NDIndex
from linked_indices.benchmark_utils import timeit_benchmark, benchmark_selection_scaling
from linked_indices.example_data import (
    create_trial_ndindex_dataset,
    create_diagonal_dataset,
    create_radial_dataset,
    create_jittered_dataset,
)

Dataset Generators

We test with different coordinate patterns. These are imported from linked_indices.example_data:

  • Trial dataset: abs_time = offset[trial] + time[sample] - sorted, monotonically increasing

  • Diagonal dataset: derived[y, x] = y_offset[y] + x_coord[x] - sorted, gradient pattern

  • Radial dataset: radius = sqrt(x² + y²) - unsorted, non-monotonic 2D pattern

  • Jittered dataset: Trial data with per-sample timing variation

# Quick test that imports work
ds_trial = create_trial_ndindex_dataset(10, 100)
ds_diagonal = create_diagonal_dataset(100, 100)
print(f"Trial dataset: {dict(ds_trial.sizes)}")
print(f"Diagonal dataset: {dict(ds_diagonal.sizes)}")

Benchmark Helper

The timeit_benchmark helper (from linked_indices.benchmark_utils) uses Python’s timeit module with automatic loop count detection. It returns both “best” (minimum, representing true algorithm cost) and “mean” (typical real-world performance including occasional GC pauses) times.

# Example: benchmark a simple numpy operation
demo_result = timeit_benchmark(
    lambda: np.sum(np.random.randn(1000)), globals={"np": np}
)
print(
    f"Demo benchmark: {demo_result['best_ms']:.4f} ms (best), n_loops={demo_result['n_loops']}"
)

1. Index Creation Performance

How long does it take to create an NDIndex for datasets of different sizes?

sizes = [
    (10, 100),  # 1K cells
    (10, 1000),  # 10K cells
    (100, 1000),  # 100K cells
    (100, 10000),  # 1M cells
    (1000, 10000),  # 10M cells
    (1000, 100000),  # 100M cells
]

creation_results = []
print("Index Creation Performance (using timeit)")
print("=" * 70)
print(f"{'Shape':>20} | {'Cells':>12} | {'Best (ms)':>12} | {'Mean (ms)':>12}")
print("-" * 70)

for n_trials, n_times in sizes:
    n_cells = n_trials * n_times
    shape_str = f"{n_trials:,} × {n_times:,}"

    # Prepare base dataset
    trial_onsets = np.arange(n_trials) * n_times * 0.01
    rel_time = np.linspace(0, n_times * 0.01, n_times)
    abs_time = trial_onsets[:, np.newaxis] + rel_time[np.newaxis, :]
    data = np.random.randn(n_trials, n_times)

    ds_base = xr.Dataset(
        {"data": (["trial", "rel_time"], data)},
        coords={
            "trial": np.arange(n_trials),
            "rel_time": rel_time,
            "abs_time": (["trial", "rel_time"], abs_time),
        },
    )

    result = timeit_benchmark(
        lambda: ds_base.set_xindex(["abs_time"], NDIndex),
        globals={"ds_base": ds_base, "NDIndex": NDIndex},
    )
    result["n_cells"] = n_cells
    result["shape"] = shape_str
    creation_results.append(result)

    print(
        f"{shape_str:>20} | {n_cells:>12,} | {result['best_ms']:>12.3f} | {result['mean_ms']:>12.3f}"
    )
df_creation = pd.DataFrame(creation_results)

fig, ax = plt.subplots(figsize=(8, 5))
x_labels = [f"{r['n_cells']:,}" for r in creation_results]
ax.bar(x_labels, df_creation["best_ms"])
ax.set_xlabel("Number of cells")
ax.set_ylabel("Index creation time (ms)")
ax.set_title("NDIndex Creation Performance")
ax.set_ylim(0, max(df_creation["best_ms"]) * 1.5)

for i, (x, y) in enumerate(zip(x_labels, df_creation["best_ms"])):
    ax.text(i, y + 0.001, f"{y:.3f}", ha="center", fontsize=9)

plt.tight_layout()

Note: Index creation time scales with array size due to xarray’s internal processing in set_xindex(). The sorted check itself is lazy (computed on first sel()).

2. Scalar Selection Performance (Sorted Coordinates)

These benchmarks use sorted coordinates, so they benefit from O(log n) binary search.

Important distinction:

  • Index lookup: The time NDIndex spends finding the matching cell (O(log n) for sorted)

  • Full sel(): Index lookup + xarray’s result Dataset creation

For scalar selection, the result is always 1 cell, so xarray overhead is constant.

Note on method parameter:

  • Scalar exact (sel(x=val)): Requires value to exist exactly in the array

  • Scalar nearest (sel(x=val, method='nearest')): Finds closest value

print("Scalar Selection Performance: Index Lookup vs Full sel()")
print("=" * 100)
print(
    f"{'Shape':>20} | {'Cells':>10} | {'Index (ms)':>12} | {'Full sel (ms)':>12} | {'xarray overhead':>15}"
)
print("-" * 100)

scalar_results = []

for n_trials, n_times in sizes:
    n_cells = n_trials * n_times
    shape_str = f"{n_trials:,} × {n_times:,}"
    ds = create_trial_ndindex_dataset(n_trials, n_times)

    # Pick a target that exists exactly in the array
    exact_target = float(ds.abs_time.values[n_trials // 2, n_times // 2])

    # Get the index object for direct benchmarking
    index = ds.xindexes["abs_time"]

    # Benchmark index lookup only (scalar selection)
    result_index = timeit_benchmark(
        lambda: index.sel({"abs_time": exact_target}),
        globals={"index": index, "exact_target": exact_target},
    )

    # Benchmark full sel()
    result_full = timeit_benchmark(
        lambda: ds.sel(abs_time=exact_target),
        globals={"ds": ds, "exact_target": exact_target},
    )

    overhead = result_full["best_ms"] - result_index["best_ms"]

    scalar_results.append(
        {
            "n_cells": n_cells,
            "shape": shape_str,
            "index_ms": result_index["best_ms"],
            "full_ms": result_full["best_ms"],
            "overhead_ms": overhead,
        }
    )

    print(
        f"{shape_str:>20} | {n_cells:>10,} | {result_index['best_ms']:>12.4f} | {result_full['best_ms']:>12.4f} | {overhead:>14.4f}ms"
    )
df_scalar = pd.DataFrame(scalar_results)

fig, ax = plt.subplots(figsize=(10, 5))
ax.loglog(
    df_scalar["n_cells"],
    df_scalar["index_ms"],
    "o-",
    markersize=8,
    label="Index lookup only (NDIndex)",
    color="C0",
)
ax.loglog(
    df_scalar["n_cells"],
    df_scalar["full_ms"],
    "s-",
    markersize=8,
    label="Full ds.sel() (includes xarray overhead)",
    color="C1",
)

ax.set_xlabel("Number of cells")
ax.set_ylabel("Time (ms)")
ax.set_title("Scalar Selection: Index Lookup vs Full sel()")
ax.grid(True, alpha=0.3)
ax.legend()

# Add annotation
ax.annotate(
    "Index lookup is O(log n)\n(~0.003 ms, nearly flat)",
    xy=(1e7, df_scalar["index_ms"].iloc[-2]),
    xytext=(1e5, 0.001),
    fontsize=9,
    arrowprops=dict(arrowstyle="->", color="gray"),
)

plt.tight_layout()

3. Slice Selection Performance (Sorted Coordinates)

How long does sel(abs_time=slice(start, stop)) take with sorted coordinates?

Key insight for slices: The index lookup is O(log n), but xarray’s result creation has two different performance profiles:

  • First call (cold): O(n) - xarray needs to slice arrays and create views

  • Subsequent calls (warm): ~O(1) - xarray reuses cached views

The benchmark below shows both the “cold” (first call) and “warm” (repeated calls) times.

# Use the reusable benchmark function for sorted slice selection
slice_results = benchmark_selection_scaling(
    create_trial_ndindex_dataset,
    sizes=sizes,
    coord_name="abs_time",
    slice_fraction=0.5,
    force_unsorted=False,
    print_results=True,
)
df_slice = pd.DataFrame(slice_results)

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Left plot: All three metrics for slices
ax = axes[0]
ax.loglog(
    df_slice["n_cells"],
    df_slice["index_ms"],
    "o-",
    markersize=8,
    label="Index lookup (O(log n))",
    color="C0",
)
ax.loglog(
    df_slice["n_cells"],
    df_slice["cold_ms"],
    "^-",
    markersize=8,
    label="Cold sel() (first call)",
    color="C3",
)
ax.loglog(
    df_slice["n_cells"],
    df_slice["warm_ms"],
    "s-",
    markersize=8,
    label="Warm sel() (repeated calls)",
    color="C1",
)
ax.set_xlabel("Number of cells")
ax.set_ylabel("Time (ms)")
ax.set_title("Slice Selection: What Takes Time?")
ax.grid(True, alpha=0.3)
ax.legend()

# Right plot: Compare scalar vs slice cold times
ax = axes[1]
ax.loglog(
    df_scalar["n_cells"],
    df_scalar["full_ms"],
    "o-",
    markersize=8,
    label="Scalar (result: 1 cell)",
    color="C2",
)
ax.loglog(
    df_slice["n_cells"],
    df_slice["cold_ms"],
    "^-",
    markersize=8,
    label="Slice 50% cold (first call)",
    color="C3",
)
ax.loglog(
    df_slice["n_cells"],
    df_slice["warm_ms"],
    "s-",
    markersize=8,
    label="Slice 50% warm (repeated)",
    color="C1",
)
ax.set_xlabel("Number of cells")
ax.set_ylabel("Time (ms)")
ax.set_title("Scalar vs Slice Performance")
ax.grid(True, alpha=0.3)
ax.legend()

plt.tight_layout()

Key Takeaway: Index Lookup is O(log n), Cold vs Warm Matters

The tables above show that NDIndex’s index lookup is constant at ~0.003ms regardless of array size.

For slice selection, there are two performance regimes:

  • Cold (first call): O(n) scaling - xarray creates the result Dataset (~20 ms at 10M cells, ~50 ms at 100M)

  • Warm (repeated calls): ~O(1) - xarray reuses cached views (~0.05ms)

For typical interactive use (selecting once and working with the result), you’ll see the “cold” time. For batch operations that repeat the same selection, you’ll see the “warm” time.
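The cold/warm split matters whenever you benchmark selection. A minimal measurement pattern, sketched here with a plain 1-D xarray index rather than NDIndex, times the first call separately from the best of repeated calls:

```python
import time
import numpy as np
import xarray as xr

ds = xr.Dataset(
    {"data": ("x", np.random.randn(100_000))},
    coords={"x": np.linspace(0.0, 1.0, 100_000)},
)

def select():
    return ds.sel(x=slice(0.25, 0.75))

def time_once(fn):
    t0 = time.perf_counter()
    fn()
    return (time.perf_counter() - t0) * 1e3  # milliseconds

cold_ms = time_once(select)                          # first call: includes any lazy setup
warm_ms = min(time_once(select) for _ in range(20))  # best of repeated calls
print(f"cold={cold_ms:.3f} ms, warm={warm_ms:.3f} ms")
```

Reporting the minimum of the repeated calls (as timeit_benchmark does) isolates the algorithmic cost from one-off effects like cache misses and GC pauses.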

Does Slice Size Affect Performance?

Let’s test if requesting a narrow slice (1%) vs a wide slice (50%) makes any difference.

print("Slice Size Effect on Performance (1M cells: 100 × 10,000)")
print("=" * 70)

n_trials, n_times = 100, 10000  # 1M cells
ds = create_trial_ndindex_dataset(n_trials, n_times)

vmin, vmax = ds.abs_time.values.min(), ds.abs_time.values.max()
mid = (vmin + vmax) / 2
value_range = vmax - vmin

slice_widths = [
    ("0.1% (very narrow)", 0.001),
    ("1%", 0.01),
    ("10%", 0.10),
    ("25%", 0.25),
    ("50%", 0.50),
    ("90%", 0.90),
]

slice_size_results = []
for name, fraction in slice_widths:
    half_width = value_range * fraction / 2
    start = mid - half_width
    stop = mid + half_width

    result = timeit_benchmark(
        lambda: ds.sel(abs_time=slice(start, stop)),
        globals={"ds": ds, "start": start, "stop": stop},
    )
    result["slice_width"] = name
    result["fraction"] = fraction
    slice_size_results.append(result)

    print(f"{name:20s}: {result['best_ms']:>8.3f} ms")
df_slice_size = pd.DataFrame(slice_size_results)

fig, ax = plt.subplots(figsize=(8, 4))
ax.bar(df_slice_size["slice_width"], df_slice_size["best_ms"], color="C0")
ax.set_xlabel("Slice Width")
ax.set_ylabel("Selection time (ms)")
ax.set_title("Slice Size vs Performance (1M cells)")
ax.tick_params(axis="x", rotation=15)

# Add value labels
for i, (x, y) in enumerate(zip(df_slice_size["slice_width"], df_slice_size["best_ms"])):
    ax.text(i, y + 0.02, f"{y:.2f}", ha="center", fontsize=9)

ax.set_ylim(0, df_slice_size["best_ms"].max() * 1.2)
plt.tight_layout()

Conclusion: For sorted coordinates, the index lookup is O(log n), but the total sel() time scales with result size due to xarray’s Dataset construction. A 90% slice creates 9x more data than a 10% slice, so it takes proportionally longer. For very narrow slices (<1%), xarray overhead dominates and times are nearly constant.

Slice Selection: Exact vs Nearest Boundaries

When selecting slices, you can use method='nearest' to snap the slice boundaries to the nearest existing values in the coordinate. This is useful when the exact boundary values don’t exist in your data.

Let’s compare the performance of exact vs nearest boundary selection.

print("Slice Selection: Exact vs method='nearest' (Sorted Coordinates)")
print("=" * 100)

# Compare exact vs nearest across different sizes
exact_vs_nearest_sizes = [
    (10, 100),  # 1K
    (100, 1000),  # 100K
    (100, 10000),  # 1M
    (1000, 10000),  # 10M
]

exact_results = benchmark_selection_scaling(
    create_trial_ndindex_dataset,
    sizes=exact_vs_nearest_sizes,
    slice_fraction=0.5,
    method=None,  # Exact boundaries
    print_results=True,
)

print()

nearest_results = benchmark_selection_scaling(
    create_trial_ndindex_dataset,
    sizes=exact_vs_nearest_sizes,
    slice_fraction=0.5,
    method="nearest",  # Nearest boundaries
    print_results=True,
)
df_exact = pd.DataFrame(exact_results)
df_nearest = pd.DataFrame(nearest_results)

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Left plot: Index lookup comparison
ax = axes[0]
ax.loglog(
    df_exact["n_cells"],
    df_exact["index_ms"],
    "o-",
    markersize=8,
    label="Exact boundaries",
    color="C0",
)
ax.loglog(
    df_nearest["n_cells"],
    df_nearest["index_ms"],
    "s-",
    markersize=8,
    label="method='nearest'",
    color="C2",
)
ax.set_xlabel("Number of cells")
ax.set_ylabel("Index lookup time (ms)")
ax.set_title("Slice Selection: Index Lookup (Exact vs Nearest)")
ax.grid(True, alpha=0.3)
ax.legend()

# Right plot: Full sel() comparison (cold)
ax = axes[1]
ax.loglog(
    df_exact["n_cells"],
    df_exact["cold_ms"],
    "o-",
    markersize=8,
    label="Exact boundaries",
    color="C0",
)
ax.loglog(
    df_nearest["n_cells"],
    df_nearest["cold_ms"],
    "s-",
    markersize=8,
    label="method='nearest'",
    color="C2",
)
ax.set_xlabel("Number of cells")
ax.set_ylabel("Cold sel() time (ms)")
ax.set_title("Slice Selection: Full sel() (Exact vs Nearest)")
ax.grid(True, alpha=0.3)
ax.legend()

plt.suptitle("Exact vs Nearest Boundary Selection (Sorted Coordinates)", y=1.02)
plt.tight_layout()

Key insight: Both exact and method='nearest' use O(log n) binary search for sorted coordinates. The performance is nearly identical because:

  1. Exact boundaries: Uses np.searchsorted() with side='left' and side='right'

  2. Nearest boundaries: Uses np.searchsorted() + neighbor comparison

The extra neighbor comparison for nearest is O(1), so it doesn’t affect overall scaling.
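The two boundary strategies described above can be sketched with plain NumPy. The side='left'/'right' pair makes both slice endpoints inclusive, and the nearest lookup adds only a single neighbor comparison after the binary search:

```python
import numpy as np

coord = np.arange(1001) / 100.0  # sorted coordinate: 0.00, 0.01, ..., 10.00

# Exact slice boundaries: side='left' for start, side='right' for stop,
# so values equal to either boundary are included in the slice
start, stop = 2.5, 7.5
lo = np.searchsorted(coord, start, side="left")
hi = np.searchsorted(coord, stop, side="right")
print(lo, hi)  # -> 250 751

# Nearest scalar: searchsorted gives the right neighbor; one O(1)
# comparison picks whichever neighbor is closer
target = 3.14159
i = np.searchsorted(coord, target)
if i > 0 and (i == len(coord) or target - coord[i - 1] <= coord[i] - target):
    i -= 1
print(coord[i])  # -> 3.14
```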

When to use method='nearest':

  • When slice boundaries may not exist exactly in your data

  • When you want to “snap” to the nearest available values

  • When working with irregularly-sampled data

Understanding the Unsorted O(n) Path

The benchmarks above use sorted coordinates (O(log n)). Let’s look at the raw numpy operations that would be used for unsorted coordinates to understand the O(n) performance characteristics.

# Benchmark raw numpy operations (isolated from xarray overhead)
print("Raw NumPy Operation Comparison (10M element array)")
print("=" * 70)

values = np.random.randn(1000, 10000)  # 10M cells, unsorted
nearest_target = 0.0
start, stop = -0.5, 0.5

# Pick a value that exists in the array for exact match
exact_target = values[500, 5000]


# Scalar exact: the core operation (using flatnonzero for efficiency)
def scalar_exact_numpy():
    flat_matches = np.flatnonzero(values == exact_target)
    if len(flat_matches) == 0:
        raise KeyError("Not found")
    flat_idx = flat_matches[0]
    return np.unravel_index(flat_idx, values.shape)


# Scalar nearest: the core operation
def scalar_nearest_numpy():
    flat_idx = np.argmin(np.abs(values - nearest_target))
    return np.unravel_index(flat_idx, values.shape)


# Slice: the core operation (bounding-box search along each axis)
def slice_numpy():
    in_range = (values >= start) & (values <= stop)
    # Find the bounding box: min/max matching index along every axis
    slices = []
    for axis in range(values.ndim):
        axes_to_reduce = tuple(j for j in range(values.ndim) if j != axis)
        has_value = np.any(in_range, axis=axes_to_reduce)
        indices = np.where(has_value)[0]
        slices.append(slice(int(indices[0]), int(indices[-1]) + 1))
    return tuple(slices)


result_exact = timeit_benchmark(
    scalar_exact_numpy,
    globals={"values": values, "exact_target": exact_target, "np": np},
)
result_nearest = timeit_benchmark(
    scalar_nearest_numpy,
    globals={"values": values, "nearest_target": nearest_target, "np": np},
)
result_slice = timeit_benchmark(
    slice_numpy, globals={"values": values, "start": start, "stop": stop, "np": np}
)

print(f"Scalar exact (numpy only):   {result_exact['best_ms']:>8.3f} ms")
print(f"Scalar nearest (numpy only): {result_nearest['best_ms']:>8.3f} ms")
print(f"Slice (numpy only):          {result_slice['best_ms']:>8.3f} ms")
print()
print(
    f"Ratio exact/nearest: {result_exact['best_ms'] / result_nearest['best_ms']:.2f}x"
)
print(
    f"Ratio slice/nearest: {result_slice['best_ms'] / result_nearest['best_ms']:.2f}x"
)

Understanding the Unsorted Performance Order

The raw numpy benchmarks above show the O(n) performance order for unsorted coordinates, from fastest to slowest: slice < nearest < exact.

This ordering has a clear explanation:

  • Slice (fastest for unsorted): Boolean comparisons (values >= start) & (values <= stop) are highly vectorized. The subsequent np.any() and np.where() on boolean arrays are also very efficient.

  • Scalar nearest: np.argmin(np.abs(values - target)) creates intermediate arrays but returns a single scalar - no result array allocation needed.

  • Scalar exact (slowest for unsorted): np.flatnonzero(values == target) must allocate a result array of unknown size and copy matching indices into it. This dynamic allocation overhead makes it slower despite simpler per-element comparison.

Key insight: For unsorted coordinates, array allocation patterns matter more than operation complexity. For sorted coordinates, all operations use O(log n) binary search and are equally fast.
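The allocation difference behind this ordering can be seen directly (a small illustration, independent of NDIndex): flatnonzero must allocate an index array whose size is unknown until the scan finishes, while argmin reduces to a single scalar.

```python
import numpy as np

rng = np.random.default_rng(1)
values = rng.integers(0, 10, size=1_000_000)

# flatnonzero allocates an index array sized by the (unknown) match count
matches = np.flatnonzero(values == 5)
print(matches.shape)  # roughly (100000,) matching positions, allocated dynamically

# argmin reduces to one scalar: no result-array allocation at all
best = np.argmin(np.abs(values - 5))
print(values[best])  # -> 5
```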

4. isel Performance

How does NDIndex affect isel() performance? This matters because isel() needs to slice the internal coordinate arrays.

print("isel() Performance Comparison (using timeit)")
print("=" * 60)

isel_results = []

for n_trials, n_times in sizes:
    n_cells = n_trials * n_times
    ds = create_trial_ndindex_dataset(n_trials, n_times)
    ds_no_index = ds.drop_indexes("abs_time")

    result_with = timeit_benchmark(lambda: ds.isel(trial=0), globals={"ds": ds})
    result_without = timeit_benchmark(
        lambda: ds_no_index.isel(trial=0), globals={"ds_no_index": ds_no_index}
    )

    overhead = result_with["best_ms"] / result_without["best_ms"]
    isel_results.append(
        {
            "n_cells": n_cells,
            "with_index": result_with["best_ms"],
            "without_index": result_without["best_ms"],
            "overhead": overhead,
        }
    )

    print(
        f"{n_cells:>10,} cells: with={result_with['best_ms']:.3f}ms, without={result_without['best_ms']:.3f}ms, overhead={overhead:.2f}x"
    )
df_isel = pd.DataFrame(isel_results)

fig, axes = plt.subplots(1, 2, figsize=(12, 5))

ax = axes[0]
ax.loglog(df_isel["n_cells"], df_isel["with_index"], "o-", label="With NDIndex")
ax.loglog(df_isel["n_cells"], df_isel["without_index"], "s-", label="Without NDIndex")
ax.set_xlabel("Number of cells")
ax.set_ylabel("isel() time (ms)")
ax.set_title("isel() Performance")
ax.legend()
ax.grid(True, alpha=0.3)

ax = axes[1]
ax.semilogx(df_isel["n_cells"], df_isel["overhead"], "o-")
ax.axhline(1, color="gray", linestyle="--", alpha=0.5)
ax.set_xlabel("Number of cells")
ax.set_ylabel("Overhead factor")
ax.set_title("NDIndex isel() Overhead")
ax.grid(True, alpha=0.3)
ax.set_ylim(0, 2)

plt.tight_layout()

5. Complex Coordinate Patterns

Let’s test with more realistic coordinate patterns from the slicing gallery.

5a. Radial Coordinates (Non-Linear 2D)

Unlike trial-based data, radial coordinates have non-monotonic patterns.

print("Radial Coordinate Selection Performance")
print("=" * 60)

radial_sizes = [
    (100, 100),  # 10K cells
    (316, 316),  # ~100K cells
    (1000, 1000),  # 1M cells
    (3162, 3162),  # ~10M cells
]

radial_results = []

for ny, nx in radial_sizes:
    n_cells = ny * nx
    ds = create_radial_dataset(ny, nx)

    # Select an annulus (ring) - common pattern in radial data
    max_radius = ds.radius.values.max()
    start = max_radius * 0.3
    stop = max_radius * 0.5

    result = timeit_benchmark(
        lambda: ds.sel(radius=slice(start, stop)),
        globals={"ds": ds, "start": start, "stop": stop},
    )
    result["n_cells"] = n_cells
    radial_results.append(result)

    print(
        f"{n_cells:>10,} cells: {result['best_ms']:>8.3f} ms (best), {result['mean_ms']:>8.3f} ms (mean)"
    )
# Plot radial coordinate performance
df_radial = pd.DataFrame(radial_results)

fig, ax = plt.subplots(figsize=(8, 5))
ax.loglog(df_radial["n_cells"], df_radial["best_ms"], "o-", markersize=8, color="C2")
ax.set_xlabel("Number of cells")
ax.set_ylabel("Selection time (ms)")
ax.set_title("Radial Coordinate (Annulus) Selection Performance")
ax.grid(True, alpha=0.3)

plt.tight_layout()

5b. Diagonal Gradient Coordinates

The diagonal gradient coordinate increases along both rows and columns: derived[y, x] = y * 2 + x. This creates a more complex selection pattern.

print("Diagonal Gradient Coordinate Selection Performance")
print("=" * 60)

diagonal_sizes = [
    (100, 100),  # 10K cells
    (316, 316),  # ~100K cells
    (1000, 1000),  # 1M cells
    (3162, 3162),  # ~10M cells
]

diagonal_results = []

for ny, nx in diagonal_sizes:
    n_cells = ny * nx
    ds = create_diagonal_dataset(ny, nx)

    # Select a band across the diagonal
    max_derived = ds.derived.values.max()
    start = max_derived * 0.3
    stop = max_derived * 0.5

    result = timeit_benchmark(
        lambda: ds.sel(derived=slice(start, stop)),
        globals={"ds": ds, "start": start, "stop": stop},
    )
    result["n_cells"] = n_cells
    diagonal_results.append(result)

    print(
        f"{n_cells:>10,} cells: {result['best_ms']:>8.3f} ms (best), {result['mean_ms']:>8.3f} ms (mean)"
    )
# Compare all coordinate patterns
df_radial = pd.DataFrame(radial_results)
df_diagonal = pd.DataFrame(diagonal_results)

fig, ax = plt.subplots(figsize=(8, 5))
ax.loglog(
    df_radial["n_cells"],
    df_radial["best_ms"],
    "o-",
    markersize=8,
    label="Radial (annulus)",
)
ax.loglog(
    df_diagonal["n_cells"],
    df_diagonal["best_ms"],
    "s-",
    markersize=8,
    label="Diagonal gradient",
)

ax.set_xlabel("Number of cells")
ax.set_ylabel("Selection time (ms)")
ax.set_title("Complex Coordinate Pattern Performance")
ax.grid(True, alpha=0.3)
ax.legend()

plt.tight_layout()

5c. Jittered Timing

Real-world data often has timing jitter. Does this affect performance?

print("\nJittered vs Clean Timing Performance")
print("=" * 60)

n_trials, n_times = 100, 10000  # 1M cells

# create_jittered_dataset sets its own seed, so build the clean (no-jitter)
# dataset manually for a comparable layout, then attach the NDIndex
trial_onsets = np.arange(n_trials) * n_times * 0.01
rel_time = np.linspace(0, n_times * 0.01, n_times)
abs_time_clean = trial_onsets[:, np.newaxis] + rel_time[np.newaxis, :]
data = np.random.randn(n_trials, n_times)

ds_clean = xr.Dataset(
    {"data": (["trial", "rel_time"], data)},
    coords={
        "trial": np.arange(n_trials),
        "rel_time": rel_time,
        "abs_time": (["trial", "rel_time"], abs_time_clean),
    },
).set_xindex(["abs_time"], NDIndex)

ds_jittered = create_jittered_dataset(n_trials, n_times, jitter_std=0.5)

# Select middle 50%
vmin, vmax = ds_clean.abs_time.values.min(), ds_clean.abs_time.values.max()
start = vmin + (vmax - vmin) * 0.25
stop = vmin + (vmax - vmin) * 0.75

result_clean = timeit_benchmark(
    lambda: ds_clean.sel(abs_time=slice(start, stop)),
    globals={"ds_clean": ds_clean, "start": start, "stop": stop},
)

result_jittered = timeit_benchmark(
    lambda: ds_jittered.sel(abs_time=slice(start, stop)),
    globals={"ds_jittered": ds_jittered, "start": start, "stop": stop},
)

print(
    f"Clean timing:    {result_clean['best_ms']:.3f} ms (best), {result_clean['mean_ms']:.3f} ms (mean)"
)
print(
    f"Jittered timing: {result_jittered['best_ms']:.3f} ms (best), {result_jittered['mean_ms']:.3f} ms (mean)"
)
print(f"Ratio: {result_jittered['best_ms'] / result_clean['best_ms']:.2f}x")
# Plot jittered vs clean comparison
fig, ax = plt.subplots(figsize=(8, 4))

labels = ["Clean timing", "Jittered timing"]
times = [result_clean["best_ms"], result_jittered["best_ms"]]
colors = ["C0", "C1"]

bars = ax.bar(labels, times, color=colors)
ax.set_ylabel("Selection time (ms)")
ax.set_title("Jittered vs Clean Timing Performance (1M cells)")

for bar, val in zip(bars, times):
    ax.text(
        bar.get_x() + bar.get_width() / 2,
        val + 0.01,
        f"{val:.3f}ms",
        ha="center",
        fontsize=10,
    )

ax.set_ylim(0, max(times) * 1.3)
plt.tight_layout()

6. Operation Comparison

Compare different selection operations side-by-side on the same dataset.

print("Operation Comparison (100K cells dataset)")
print("=" * 60)

n_trials, n_times = 100, 1000
ds = create_trial_ndindex_dataset(n_trials, n_times)

vmin, vmax = ds.abs_time.values.min(), ds.abs_time.values.max()
mid = (vmin + vmax) / 2
# Pick an exact value from the array for exact match test
exact_val = float(ds.abs_time.values[n_trials // 2, n_times // 2])
start = vmin + (vmax - vmin) * 0.25
stop = vmin + (vmax - vmin) * 0.75

operations = [
    ("Scalar (exact)", lambda: ds.sel(abs_time=exact_val)),
    ("Scalar (nearest)", lambda: ds.sel(abs_time=mid, method="nearest")),
    ("Slice (25-75%)", lambda: ds.sel(abs_time=slice(start, stop))),
    (
        "Slice (narrow 1%)",
        lambda: ds.sel(
            abs_time=slice(mid - 0.005 * (vmax - vmin), mid + 0.005 * (vmax - vmin))
        ),
    ),
    ("isel (single trial)", lambda: ds.isel(trial=0)),
    ("isel (slice trials)", lambda: ds.isel(trial=slice(0, 50))),
]

op_results = []
for name, func in operations:
    result = timeit_benchmark(
        func,
        globals={
            "ds": ds,
            "mid": mid,
            "exact_val": exact_val,
            "start": start,
            "stop": stop,
            "vmin": vmin,
            "vmax": vmax,
        },
    )
    result["operation"] = name
    op_results.append(result)
    print(
        f"{name:25s}: {result['best_ms']:>8.3f} ms (best), {result['mean_ms']:>8.3f} ms (mean)"
    )
df_ops = pd.DataFrame(op_results)

fig, ax = plt.subplots(figsize=(10, 5))
bars = ax.barh(df_ops["operation"], df_ops["best_ms"])
ax.set_xlabel("Time (ms)")
ax.set_title("Operation Performance Comparison (100K cells)")

for bar, val in zip(bars, df_ops["best_ms"]):
    ax.text(
        val + 0.001,
        bar.get_y() + bar.get_height() / 2,
        f"{val:.3f}ms",
        va="center",
        fontsize=9,
    )

ax.set_xlim(0, df_ops["best_ms"].max() * 1.3)
plt.tight_layout()

7. Sorted vs Unsorted Coordinates

NDIndex automatically detects if coordinates are sorted in row-major order and uses O(log n) binary search for faster lookups. Let’s compare the two code paths on the same dataset by forcing the unsorted path.

# Check sorted status and show how to force unsorted path
ds = create_trial_ndindex_dataset(100, 1000)  # 100K cells

# Access the internal NDCoord to check sorted status
index = ds.xindexes["abs_time"]
coord = index._nd_coords["abs_time"]
print(f"Coordinate is detected as sorted: {coord.is_sorted}")

# We can force the unsorted path by temporarily setting _is_sorted = False
# This allows us to compare sorted vs unsorted performance on the SAME data
print("Sorted vs Unsorted Performance Comparison (same data, different code paths)")
print("=" * 95)
print(
    f"{'Path':>10} | {'Cells':>12} | {'Exact (ms)':>12} | {'Nearest (ms)':>12} | {'Slice (ms)':>12}"
)
print("-" * 95)

sorted_results = []
unsorted_results = []

test_sizes = [
    (10, 100),  # 1K
    (100, 1000),  # 100K
    (100, 10000),  # 1M
    (1000, 10000),  # 10M
]

for n_outer, n_inner in test_sizes:
    n_cells = n_outer * n_inner
    ds = create_trial_ndindex_dataset(n_outer, n_inner)

    # Pick targets
    exact_target = float(ds.abs_time.values[n_outer // 2, n_inner // 2])
    nearest_target = exact_target + 0.0001
    vmin, vmax = ds.abs_time.values.min(), ds.abs_time.values.max()
    start = vmin + (vmax - vmin) * 0.25
    stop = vmin + (vmax - vmin) * 0.75

    # Benchmark with sorted path (automatic)
    result_exact_s = timeit_benchmark(
        lambda: ds.sel(abs_time=exact_target),
        globals={"ds": ds, "exact_target": exact_target},
    )
    result_nearest_s = timeit_benchmark(
        lambda: ds.sel(abs_time=nearest_target, method="nearest"),
        globals={"ds": ds, "nearest_target": nearest_target},
    )
    result_slice_s = timeit_benchmark(
        lambda: ds.sel(abs_time=slice(start, stop)),
        globals={"ds": ds, "start": start, "stop": stop},
    )

    sorted_results.append(
        {
            "n_cells": n_cells,
            "exact_ms": result_exact_s["best_ms"],
            "nearest_ms": result_nearest_s["best_ms"],
            "slice_ms": result_slice_s["best_ms"],
        }
    )

    print(
        f"{'Sorted':>10} | {n_cells:>12,} | {result_exact_s['best_ms']:>12.4f} | {result_nearest_s['best_ms']:>12.4f} | {result_slice_s['best_ms']:>12.4f}"
    )

    # Force unsorted path by temporarily modifying the internal state
    index = ds.xindexes["abs_time"]
    coord = index._nd_coords["abs_time"]
    original_sorted = coord._is_sorted
    coord._is_sorted = False  # Force unsorted path

    try:
        result_exact_u = timeit_benchmark(
            lambda: ds.sel(abs_time=exact_target),
            globals={"ds": ds, "exact_target": exact_target},
        )
        result_nearest_u = timeit_benchmark(
            lambda: ds.sel(abs_time=nearest_target, method="nearest"),
            globals={"ds": ds, "nearest_target": nearest_target},
        )
        result_slice_u = timeit_benchmark(
            lambda: ds.sel(abs_time=slice(start, stop)),
            globals={"ds": ds, "start": start, "stop": stop},
        )
    finally:
        coord._is_sorted = original_sorted  # Restore

    unsorted_results.append(
        {
            "n_cells": n_cells,
            "exact_ms": result_exact_u["best_ms"],
            "nearest_ms": result_nearest_u["best_ms"],
            "slice_ms": result_slice_u["best_ms"],
        }
    )

    print(
        f"{'Unsorted':>10} | {n_cells:>12,} | {result_exact_u['best_ms']:>12.4f} | {result_nearest_u['best_ms']:>12.4f} | {result_slice_u['best_ms']:>12.4f}"
    )

    # Show speedup
    speedup_exact = result_exact_u["best_ms"] / result_exact_s["best_ms"]
    speedup_nearest = result_nearest_u["best_ms"] / result_nearest_s["best_ms"]
    speedup_slice = result_slice_u["best_ms"] / result_slice_s["best_ms"]
    print(
        f"{'Speedup':>10} | {'':<12} | {speedup_exact:>11.0f}x | {speedup_nearest:>11.0f}x | {speedup_slice:>11.0f}x"
    )
    print("-" * 95)
df_sorted = pd.DataFrame(sorted_results)
df_unsorted = pd.DataFrame(unsorted_results)

fig, axes = plt.subplots(1, 3, figsize=(15, 5))

# Plot each operation type
for ax, op_name, col in zip(
    axes, ["Exact", "Nearest", "Slice"], ["exact_ms", "nearest_ms", "slice_ms"]
):
    ax.loglog(
        df_sorted["n_cells"],
        df_sorted[col],
        "o-",
        markersize=8,
        label="Sorted (O(log n))",
        color="C0",
    )
    ax.loglog(
        df_unsorted["n_cells"],
        df_unsorted[col],
        "s-",
        markersize=8,
        label="Unsorted (O(n))",
        color="C1",
    )

    ax.set_xlabel("Number of cells")
    ax.set_ylabel("Selection time (ms)")
    ax.set_title(f"{op_name} Selection")
    ax.grid(True, alpha=0.3)
    ax.legend()

plt.suptitle("Sorted vs Unsorted Performance (same data, different code paths)", y=1.02)
plt.tight_layout()

Unsorted Performance: Index Lookup vs Full sel()

Let’s look more closely at unsorted performance, breaking down index lookup vs full sel() just like we did for sorted coordinates. This uses the same data but forces the unsorted code path.

# Use the reusable benchmark function for unsorted slice selection
unsorted_sizes = [
    (10, 100),  # 1K
    (10, 1000),  # 10K
    (100, 1000),  # 100K
    (100, 10000),  # 1M
    (1000, 10000),  # 10M
]

unsorted_slice_results = benchmark_selection_scaling(
    create_trial_ndindex_dataset,
    sizes=unsorted_sizes,
    coord_name="abs_time",
    slice_fraction=0.5,
    force_unsorted=True,
    print_results=True,
)
df_unsorted_slice = pd.DataFrame(unsorted_slice_results)

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Left plot: Unsorted - Index lookup vs full sel()
ax = axes[0]
ax.loglog(
    df_unsorted_slice["n_cells"],
    df_unsorted_slice["index_ms"],
    "o-",
    markersize=8,
    label="Index lookup (O(n))",
    color="C1",
)
ax.loglog(
    df_unsorted_slice["n_cells"],
    df_unsorted_slice["cold_ms"],
    "^-",
    markersize=8,
    label="Cold sel() (first call)",
    color="C3",
)
ax.loglog(
    df_unsorted_slice["n_cells"],
    df_unsorted_slice["warm_ms"],
    "s-",
    markersize=8,
    label="Warm sel() (repeated)",
    color="C4",
)
ax.set_xlabel("Number of cells")
ax.set_ylabel("Time (ms)")
ax.set_title("Unsorted Slice Selection: What Takes Time?")
ax.grid(True, alpha=0.3)
ax.legend()

# Right plot: Compare sorted vs unsorted index lookup
ax = axes[1]
# Get matching sizes between sorted and unsorted results
sorted_subset = df_slice[df_slice["n_cells"].isin(df_unsorted_slice["n_cells"])]
ax.loglog(
    sorted_subset["n_cells"],
    sorted_subset["index_ms"],
    "o-",
    markersize=8,
    label="Sorted index lookup (O(log n))",
    color="C0",
)
ax.loglog(
    df_unsorted_slice["n_cells"],
    df_unsorted_slice["index_ms"],
    "s-",
    markersize=8,
    label="Unsorted index lookup (O(n))",
    color="C1",
)
ax.set_xlabel("Number of cells")
ax.set_ylabel("Index lookup time (ms)")
ax.set_title("Sorted vs Unsorted: Index Lookup Only")
ax.grid(True, alpha=0.3)
ax.legend()

plt.suptitle(
    "Unsorted Coordinate Performance (forced unsorted path on sorted data)", y=1.02
)
plt.tight_layout()

How Sorted Detection Works

NDIndex checks if the flattened (row-major) coordinate array is monotonically increasing:

def _is_sorted(arr):
    flat = arr.ravel()  # row-major (C) order by default
    return np.all(flat[:-1] <= flat[1:])

Coordinates that are typically sorted:

  • derived = offset[outer] + values[inner] - common in trial-based or segmented data

  • total_distance = segment_start + local_position - sequential recordings

  • Any derived coordinate that increases monotonically in row-major order

Coordinates that are typically unsorted:

  • radius = sqrt(x² + y²) - radial/polar data

  • angle = atan2(y, x) - angular data

  • Coordinates with large random jitter that breaks monotonicity

  • Any coordinate where values can decrease when traversing row-major order
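As a self-contained illustration, the helper below (`is_row_major_sorted` mirrors the `_is_sorted` snippet above) shows that a derived `offset[outer] + values[inner]` coordinate passes the check while a radial coordinate fails it:

```python
import numpy as np


def is_row_major_sorted(arr):
    # Mirrors the check shown above: flatten in row-major (C) order
    # and verify the sequence never decreases.
    flat = arr.ravel()
    return bool(np.all(flat[:-1] <= flat[1:]))


# Derived coordinate: offset[outer] + values[inner] -> sorted
offsets = np.arange(5)[:, np.newaxis] * 10.0    # trial onsets: 0, 10, 20, ...
rel = np.linspace(0.0, 9.0, 10)[np.newaxis, :]  # within-trial times: 0..9
abs_time = offsets + rel
print(is_row_major_sorted(abs_time))  # True: each row ends below the next row's start

# Radial coordinate: sqrt(x^2 + y^2) -> unsorted in row-major order
y, x = np.mgrid[-2:3, -2:3]
radius = np.sqrt(x**2 + y**2)
print(is_row_major_sorted(radius))  # False: distances fall then rise within each row
```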

8. Memory Usage

NDIndex stores references to coordinate arrays, not copies. Let’s verify the memory overhead is minimal.

n_trials, n_times = 100, 10000
n_cells = n_trials * n_times

trial_onsets = np.arange(n_trials) * n_times * 0.01
rel_time = np.linspace(0, n_times * 0.01, n_times)
abs_time = trial_onsets[:, np.newaxis] + rel_time[np.newaxis, :]
data = np.random.randn(n_trials, n_times)

ds_base = xr.Dataset(
    {"data": (["trial", "rel_time"], data)},
    coords={
        "trial": np.arange(n_trials),
        "rel_time": rel_time,
        "abs_time": (["trial", "rel_time"], abs_time),
    },
)
ds_indexed = ds_base.set_xindex(["abs_time"], NDIndex)

print(f"Dataset size: {n_cells:,} cells")
print(f"abs_time array size: {abs_time.nbytes / 1024 / 1024:.2f} MB")
print(
    f"Arrays share memory: {np.shares_memory(ds_base.abs_time.values, ds_indexed.abs_time.values)}"
)

Summary

Performance Characteristics

| Operation | Index Lookup | Cold sel() | Warm sel() | Notes |
| --- | --- | --- | --- | --- |
| Index creation | - | O(n) | - | xarray's set_xindex() processing; sorted check is lazy |
| Scalar selection | O(log n) | ~O(1) | ~O(1) | Result is always 1 cell, no caching effect |
| Slice selection | O(log n) | O(n) | ~O(1) | First call creates result, subsequent calls reuse views |
| isel() | - | O(1) | O(1) | Array slicing is always fast |

Key Findings

  1. Index lookup is O(log n) for sorted coordinates - Binary search finds bounds in ~0.003ms regardless of array size. This is what NDIndex controls.

  2. Scalar selection stays constant - Since the result is always 1 cell, total time stays ~0.05ms for any array size.

  3. Slice selection: cold vs warm - First call to sel() on a slice takes O(n) because xarray creates the result Dataset. Subsequent calls reuse cached views and are ~O(1).

  4. For unsorted coords, everything is O(n) - No binary search available, so all operations scan the full array.

  5. Sorted vs unsorted speedup: 100-500x for large arrays - At 10M cells, sorted is ~100x faster for scalar and ~5x faster for slice selection.

Understanding Cold vs Warm Performance

When you call ds.sel(abs_time=slice(start, stop)):

  1. Index lookup (NDIndex): ~0.003ms - finds slice bounds via binary search

  2. Cold (first call): xarray creates result Dataset - O(n) where n is result size

  3. Warm (repeated calls): xarray reuses views - ~O(1)

For typical interactive use (one selection per Dataset), you’ll see “cold” times. For batch processing with repeated selections on the same Dataset, you’ll see “warm” times.
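The cold/warm distinction can be mimicked with a toy cache, independent of xarray or NDIndex. `SliceSelector` below is a hypothetical stand-in, not NDIndex's actual mechanism: the cold path copies the result (O(n)), the warm path returns a cached object (~O(1)).

```python
import timeit


class SliceSelector:
    """Toy stand-in for repeated slice selection: caches the result of
    each (start, stop) query, mimicking warm-call view reuse."""

    def __init__(self, data):
        self.data = data
        self._cache = {}

    def sel(self, start, stop):
        key = (start, stop)
        if key not in self._cache:              # cold: build the result (O(n) copy)
            self._cache[key] = self.data[start:stop]
        return self._cache[key]                 # warm: reuse the cached result


sel = SliceSelector(list(range(1_000_000)))

# Cold: a fresh selector each call, so the result is rebuilt every time.
cold = timeit.timeit(lambda: SliceSelector(sel.data).sel(0, 500_000), number=20)
# Warm: the same selector, so repeated calls hit the cache.
warm = timeit.timeit(lambda: sel.sel(0, 500_000), number=20)
print(f"cold: {cold * 1000 / 20:.3f} ms/call, warm: {warm * 1000 / 20:.3f} ms/call")
```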

Recommendations

  1. Check if your coordinates are sorted - Inspect the same attribute used in the benchmark above: ds.xindexes["coord"]._nd_coords["coord"]._is_sorted.

  2. For sorted coordinates: Scalar selection is instant. Slice selection cold time is O(result size).

  3. For unsorted coordinates < 1M cells: Still fast enough for interactive use (~1-3 ms).

  4. For unsorted coordinates > 10M cells: Consider:

    • Pre-filtering with isel() before sel()

    • Using narrower slices when possible

    • Chunking your data with dask

  5. Memory: NDIndex stores references to coordinate arrays, not copies, so its memory overhead is negligible.
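The pre-filtering recommendation (item 4 above) can be sketched in plain NumPy, standing in for the isel()-then-sel() pattern. The per-row bound pruning below assumes a trial-like layout where outer rows cover contiguous value ranges; it is an illustration, not an NDIndex API:

```python
import numpy as np

# Build a row-major-sorted 2-D coordinate (1M cells) as a stand-in for abs_time.
rng = np.random.default_rng(0)
abs_time = np.sort(rng.uniform(0, 1000, size=(1000, 1000)), axis=None).reshape(1000, 1000)

start, stop = 250.0, 260.0

# Baseline: full O(n) scan over every cell.
full_mask = (abs_time >= start) & (abs_time <= stop)

# Pre-filtered: first narrow to candidate rows (the isel() step),
# then scan only those rows (the sel() step).
row_min = abs_time.min(axis=1)
row_max = abs_time.max(axis=1)
candidate_rows = np.nonzero((row_max >= start) & (row_min <= stop))[0]
sub = abs_time[candidate_rows]              # analogous to ds.isel(trial=candidate_rows)
sub_mask = (sub >= start) & (sub <= stop)   # analogous to sel(abs_time=slice(start, stop))

print(candidate_rows.size, "of", abs_time.shape[0], "rows scanned")
```

Rows whose [min, max] range misses the window cannot contain matching cells, so the second scan touches only a small fraction of the array while selecting exactly the same cells.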