Graphs & Selection

Galaxy builds Mapper graphs (Stars) from embeddings, computes graph distances, clusters, and selects representatives.

Configuration

In params.yaml:

Galaxy:
  metric: stellar_curvature_distance   # Graph-to-graph distance
  selector: max_nodes                  # Representative selection strategy
  nReps: 3                             # Or use distance_threshold
  stars: [jmap]
  jmap:
    nCubes: [8, 12]
    percOverlap: [0.2, 0.4]
    minIntersection: [-1]              # -1 => weighted edges
    clusterer:
      - [HDBSCAN, {min_cluster_size: 5}]

Parameters

Galaxy

metricstr

Distance metric between graphs. Options:

  • "stellar_curvature_distance" (recommended)

selectorstr

Representative selection rule within each cluster. Options:

  • "max_nodes" (default)

  • "max_edges"

  • "min_nodes"

  • "random"

nRepsint or null

Number of representatives to select. If null, use distance_threshold in collapse().

JMAP (Mapper)

nCubeslist of int

Number of intervals per dimension in the cover. Higher -> finer resolution.

percOverlaplist of float

Overlap fraction between adjacent intervals (0-1).

minIntersectionlist of int

Minimum overlap size to form an edge. -1 uses weighted edges without a minimum.

clustererlist of [str, dict]

Clustering algorithm and parameters used within each cube.

Build and Compare

from thema.multiverse import Galaxy

galaxy = Galaxy(YAML_PATH=yaml)
galaxy.fit()  # Writes Star objects to models/

selection = galaxy.collapse(
    metric="stellar_curvature_distance",
    curvature="balanced_forman_curvature",
    selector="max_nodes"
)

# selection: {cluster_id: {"star": StarGraph, "file": Path, ...}}

Visualization (Optional)

import matplotlib.pyplot as plt

coords = galaxy.get_galaxy_coordinates()  # MDS on the distance matrix
plt.scatter(coords[:, 0], coords[:, 1], s=12, alpha=0.6)
plt.title("Galaxy of graph models")
plt.show()

Curvature Choices

  • forman_curvature: Fast default

  • balanced_forman_curvature: More sensitive structure

  • resistance_curvature: Emphasizes connectivity

  • ollivier_ricci_curvature: Most detailed, slowest