Graphs & Selection¶
Galaxy builds Mapper graphs (Stars) from embeddings, computes graph distances, clusters, and selects representatives.
Configuration¶
In params.yaml:
Galaxy:
metric: stellar_curvature_distance # Graph-to-graph distance
selector: max_nodes # Representative selection strategy
nReps: 3 # Or use distance_threshold
stars: [jmap]
jmap:
nCubes: [8, 12]
percOverlap: [0.2, 0.4]
minIntersection: [-1] # -1 => weighted edges
clusterer:
- [HDBSCAN, {min_cluster_size: 5}]
Parameters¶
Galaxy¶
- metricstr
Distance metric between graphs. Options:
"stellar_curvature_distance"(recommended)
- selectorstr
Representative selection rule within each cluster. Options:
"max_nodes"(default)"max_edges""min_nodes""random"
- nRepsint or null
Number of representatives to select. If null, use
distance_thresholdincollapse().
JMAP (Mapper)¶
- nCubeslist of int
Number of intervals per dimension in the cover. Higher -> finer resolution.
- percOverlaplist of float
Overlap fraction between adjacent intervals (0-1).
- minIntersectionlist of int
Minimum overlap size to form an edge.
-1uses weighted edges without a minimum.- clustererlist of [str, dict]
Clustering algorithm and parameters used within each cube.
Build and Compare¶
from thema.multiverse import Galaxy
galaxy = Galaxy(YAML_PATH=yaml)
galaxy.fit() # Writes Star objects to models/
selection = galaxy.collapse(
metric="stellar_curvature_distance",
curvature="balanced_forman_curvature",
selector="max_nodes"
)
# selection: {cluster_id: {"star": StarGraph, "file": Path, ...}}
Visualization (Optional)¶
import matplotlib.pyplot as plt
coords = galaxy.get_galaxy_coordinates() # MDS on the distance matrix
plt.scatter(coords[:, 0], coords[:, 1], s=12, alpha=0.6)
plt.title("Galaxy of graph models")
plt.show()
Curvature Choices¶
forman_curvature: Fast defaultbalanced_forman_curvature: More sensitive structureresistance_curvature: Emphasizes connectivityollivier_ricci_curvature: Most detailed, slowest