Galaxy
Galaxy builds many Stars (graphs), measures the pairwise distances between them, clusters them, and then selects representative graphs for each cluster.
Highlights
Fit: generate Stars over your parameter grid
Collapse: compute pairwise distances (curvature filtrations), cluster (agglomerative clustering), and select representatives
Coordinates: get a 2D layout (MDS) for quick visual sanity checks
API
- class thema.multiverse.universe.galaxy.Galaxy(params=None, data=None, cleanDir=None, projDir=None, outDir=None, metric='stellar_curvature_distance', selector='max_nodes', nReps=3, filter_fn=None, YAML_PATH=None, verbose=False)
Bases: object
A space of stars.
The largest space of data representations, a galaxy can be searched to find particular stars and systems most suitable for a particular explorer.
Galaxy generates a space of star objects from the distribution of inner and outer systems.
Members
- data: str
Path to the original raw data file.
- cleanDir: str
Path to a populated directory containing Moons.
- projDir: str
Path to a populated directory containing Comets.
- outDir: str
Path to an out directory to store star objects.
- selection: dict
Dictionary containing selected representative stars. Set by collapse function.
- YAML_PATH: str
Path to yaml configuration file.
Functions
- get_data_path() -> str
Returns the path to the raw data file.
- fit() -> None
Fits a space of Stars and saves them to outDir.
- collapse() -> dict
Clusters the star models and selects a representative for each cluster.
- get_galaxy_coordinates() -> np.ndarray
Computes a 2D coordinate system of the stars in the galaxy using Multidimensional Scaling (MDS).
- save() -> None
Saves the instance to a pickle file.
Example
>>> cleanDir = <PATH TO MOON OBJECT FILES>
>>> data = <PATH TO RAW DATA FILE>
>>> projDir = <PATH TO COMET OBJECT FILES>
>>> outDir = <PATH TO OUT DIRECTORY OF PROJECTIONS>
>>> params = {
...     "jmap": {
...         "nCubes": [2, 5, 8],
...         "percOverlap": [0.2, 0.4],
...         "minIntersection": [-1],
...         "clusterer": [["HDBSCAN", {"minDist": 0.1}]]
...     }
... }
>>> galaxy = Galaxy(params=params,
...                 data=data,
...                 cleanDir=cleanDir,
...                 projDir=projDir,
...                 outDir=outDir)
>>> galaxy.fit()
>>> # First, compute distances and cluster the stars
>>> selected_stars = galaxy.collapse()
>>> print(f"Selected {len(selected_stars)} representative stars")
>>>
>>> # Generate and visualize the galaxy coordinates with custom plotting
>>> import matplotlib.pyplot as plt
>>> import numpy as np
>>>
>>> # Manual plotting of the galaxy coordinates (NOTE: `Thema` does not have built-in visualization dependencies)
>>> coordinates = galaxy.get_galaxy_coordinates()
>>> plt.figure(figsize=(8, 6))
>>> plt.scatter(coordinates[:, 0], coordinates[:, 1], alpha=0.7)
>>> plt.title('2D Coordinate Map of Star Models')
>>> plt.xlabel('X Coordinate')
>>> plt.ylabel('Y Coordinate')
>>> plt.show()
- collapse(metric=None, nReps=None, selector=None, filter_fn=None, files: list | None = None, distance_threshold: float | None = None, **kwargs)
Collapses the space of Stars into representative Stars. Clustering is controlled by either nReps (the number of clusters) or distance_threshold (the AgglomerativeClustering distance threshold); if distance_threshold is set, nReps is ignored.
- Parameters:
metric (str, optional) – Metric function name for comparing graphs. Defaults to self.metric.
nReps (int, optional) – Number of clusters for AgglomerativeClustering. Ignored if distance_threshold is set.
selector (str, optional) – Selection function name to choose representative stars. Defaults to self.selector.
filter_fn (callable, str, or None) – Filter function to select a subset of graphs. Defaults to no filter.
files (list[str] or None) – Optional list of file paths to process. Defaults to self.outDir.
distance_threshold (float, optional) – AgglomerativeClustering distance threshold. Used if nReps is None.
**kwargs – Additional arguments passed to the metric function.
- Returns:
Mapping from cluster labels to selected stars and cluster sizes.
- Return type:
dict
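To make the two stopping rules concrete, here is a minimal pure-Python sketch of agglomerative clustering on a precomputed distance matrix, followed by a per-cluster selection step. It is an illustration under stated assumptions, not Thema's implementation: the helper names (agglomerate, select_max_nodes), the average-linkage choice, the toy distances and node counts, and the "cluster_size" key are all hypothetical, and the 'max_nodes' selector is assumed to pick the graph with the most nodes. Only the 'star' key appears in the examples on this page.

```python
from itertools import combinations


def agglomerate(dist, n_clusters=None, distance_threshold=None):
    """Average-linkage agglomerative clustering on a precomputed distance matrix.

    Hypothetical helper: merging stops when n_clusters remain or, if
    distance_threshold is given, when the closest pair of clusters is at
    least that far apart.
    """
    if distance_threshold is not None:
        n_clusters = None  # mirrors the documented rule: nReps is ignored
    elif n_clusters is None:
        raise ValueError("set n_clusters or distance_threshold")

    clusters = [[i] for i in range(len(dist))]

    def linkage(a, b):
        # Mean pairwise distance between the members of two clusters
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        if n_clusters is not None and len(clusters) <= n_clusters:
            break
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        if distance_threshold is not None and linkage(clusters[a], clusters[b]) >= distance_threshold:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # a < b, so index a is unaffected

    labels = [0] * len(dist)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


def select_max_nodes(labels, names, node_counts):
    """Assumed 'max_nodes' behaviour: keep the largest graph in each cluster."""
    selection = {}
    for label in sorted(set(labels)):
        members = [i for i, lab in enumerate(labels) if lab == label]
        best = max(members, key=lambda i: node_counts[i])
        selection[label] = {"star": names[best], "cluster_size": len(members)}
    return selection


# Toy data: four stars forming two obvious groups
dist = [
    [0.0, 0.1, 0.9, 0.8],
    [0.1, 0.0, 0.85, 0.9],
    [0.9, 0.85, 0.0, 0.2],
    [0.8, 0.9, 0.2, 0.0],
]
names = ["star_a", "star_b", "star_c", "star_d"]
node_counts = [12, 30, 7, 5]

labels = agglomerate(dist, n_clusters=2)
selection = select_max_nodes(labels, names, node_counts)
print(selection)
```

On this toy matrix, calling agglomerate with distance_threshold=0.5 gives the same grouping as n_clusters=2, because the two remaining clusters are farther apart than the threshold.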
- fit()
Configures and generates the space of Stars. Uses the function_scheduler to spawn multiple Star instances and fit them in parallel.
- Returns:
None. Star objects are saved to outDir, and a count of failed saves is printed.
- Return type:
None
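As a rough guide to how many parameter combinations a grid describes, the sketch below expands the "jmap" grid from the class example as a Cartesian product. This is an assumption about how the grid is interpreted; the expand_grid helper is hypothetical, and the actual number of Stars produced by fit() may also depend on the Moon and Comet files found in cleanDir and projDir.

```python
from itertools import product

# Parameter grid copied from the class-level example
grid = {
    "nCubes": [2, 5, 8],
    "percOverlap": [0.2, 0.4],
    "minIntersection": [-1],
    "clusterer": [["HDBSCAN", {"minDist": 0.1}]],
}


def expand_grid(grid):
    """Hypothetical helper: one dict per combination of grid values."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]


combos = expand_grid(grid)
print(len(combos))  # 3 * 2 * 1 * 1 = 6 combinations
print(combos[0])
```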
- getParams()
Returns the parameters of the Galaxy instance.
- Returns:
A dictionary containing the parameters of the Galaxy instance.
- Return type:
dict
- get_galaxy_coordinates() -> ndarray
Computes a 2D coordinate system for stars in the galaxy, allowing visualization of their relative positions. This function uses Multidimensional Scaling (MDS) to project the high-dimensional distance matrix into a 2D space, preserving the relative distances between stars as much as possible.
Note: This method requires that distances have been computed first, usually by calling the collapse() method or directly computing distances with a metric function.
- Returns:
A 2D array of shape (n_stars, 2) containing the X,Y coordinates of each star in the galaxy. Each row represents the 2D coordinates of one star.
- Return type:
np.ndarray
Examples
>>> # After fitting the galaxy and computing distances
>>> import matplotlib.pyplot as plt
>>> import numpy as np
>>> coordinates = galaxy.get_galaxy_coordinates()
>>>
>>> # Basic scatter plot
>>> plt.figure(figsize=(10, 8))
>>> plt.scatter(coordinates[:, 0], coordinates[:, 1], alpha=0.7)
>>> plt.title('Star Map of the Galaxy')
>>> plt.xlabel('X Coordinate')
>>> plt.ylabel('Y Coordinate')
>>> plt.show()
>>>
>>> # Advanced plot with cluster coloring
>>> if galaxy.selection:  # If collapse() has been called
...     plt.figure(figsize=(12, 10))
...     # Plot all stars
...     plt.scatter(coordinates[:, 0], coordinates[:, 1], c='lightgray', alpha=0.5)
...     # Highlight representative stars
...     for cluster_id, info in galaxy.selection.items():
...         # Find the index of the representative star in the keys array
...         rep_idx = np.where(galaxy.keys == info['star'])[0][0]
...         plt.scatter(coordinates[rep_idx, 0], coordinates[rep_idx, 1],
...                     s=100, c='red', edgecolor='black', label=f'Cluster {cluster_id}')
...     plt.legend()
...     plt.title('Star Map with Representative Stars')
...     plt.show()
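For intuition about what projecting a distance matrix into 2D involves, here is a minimal sketch of classical (Torgerson) MDS in NumPy. It is a stand-in rather than Thema's implementation: this page does not say which MDS variant the library uses, and iterative MDS solvers will generally return different (rotated, reflected, or slightly distorted) coordinates. The classical_mds function and the toy points are hypothetical.

```python
import numpy as np


def classical_mds(dist, n_components=2):
    """Classical MDS: embed points so Euclidean distances approximate dist."""
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    # Double-center the squared distances to obtain a Gram (inner product) matrix
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (d ** 2) @ centering
    # Keep the eigenvectors belonging to the largest eigenvalues
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:n_components]
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale


# Toy check: distances between four planar points are reproduced by the embedding
points = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0], [3.0, 4.0]])
dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

coords = classical_mds(dist)
recovered = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
print(coords.shape)
print(np.allclose(recovered, dist))
```

The coordinates are only defined up to rotation and reflection, so the axes of such a plot carry no meaning on their own; only the relative distances between stars do.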
- save(file_path)
Save the current object instance to a file using pickle serialization.
- Parameters:
file_path (str) – The path to the file where the object will be saved.
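Because save() uses pickle serialization, a saved instance can presumably be restored with the standard pickle module. The sketch below shows the round trip on a stand-in object (a SimpleNamespace with a few Galaxy-like attributes), since constructing a real Galaxy requires data on disk; the file name and attribute values are hypothetical. As with any pickle file, only load files from a trusted source.

```python
import os
import pickle
import tempfile
from types import SimpleNamespace

# Hypothetical stand-in for a fitted Galaxy instance
galaxy_like = SimpleNamespace(
    metric="stellar_curvature_distance",
    nReps=3,
    selection={0: {"star": "star_b"}},
)

file_path = os.path.join(tempfile.mkdtemp(), "galaxy.pkl")

# Write the object to disk using pickle serialization
with open(file_path, "wb") as f:
    pickle.dump(galaxy_like, f)

# Restore it later, for example in a new session
with open(file_path, "rb") as f:
    restored = pickle.load(f)

print(restored.nReps, restored.selection)
```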