Galaxy
The Galaxy class is a core component of the THEMA package, designed to manage and explore a large space of data representations. It serves as a container for star objects, each representing a unique unsupervised projection of the data. By organizing these projections into a galaxy, users can democratize model selection and quantify agreement between models, which helps ensure that the most reliable representations of the data are chosen.
A galaxy is generated from distributions of inner and outer systems, which define the parameters for the creation of star objects. This structure allows for a flexible and scalable approach to managing complex data representations.
Key Features
Star Management: The Galaxy class maintains a collection of star objects, enabling easy access and manipulation of various data projections.
Search Functionality: Users can search for specific stars within the galaxy based on defined criteria, facilitating targeted exploration of data.
Distribution-Based Generation: Stars are generated from specified inner and outer system distributions, providing a customizable and systematic approach to data representation.
Use Cases
Data Exploration: Providing a structured way to explore different unsupervised projections of data.
Visualization: Enabling visualization of complex data relationships through star objects.
Algorithm Implementation: Serving as a foundation for implementing and managing various graph generation algorithms, such as the Kepler Mapper algorithm used in the jmapStar Class.
By leveraging the Galaxy class, users can create a comprehensive and organized space of data representations, facilitating deeper insights and more effective analysis of complex datasets.
Galaxy Class
- class thema.multiverse.universe.galaxy.Galaxy(params=None, data=None, cleanDir=None, projDir=None, outDir=None, metric='stellar_kernel_distance', selector='random', nReps=3, YAML_PATH=None, verbose=False)
Bases: object
A space of stars.
The largest space of data representations, a galaxy can be searched to find particular stars and systems most suitable for a particular explorer.
Galaxy generates a space of star objects from the distribution of inner and outer systems.
Members
- data: str
Path to the original raw data file.
- cleanDir: str
Path to a populated directory containing Moons.
- projDir: str
Path to a populated directory containing Comets.
- outDir: str
Path to an out directory to store star objects.
- selection: dict
Dictionary containing selected representative stars. Set by collapse function.
- YAML_PATH: str
Path to yaml configuration file.
Functions
- get_data_path() -> str
returns path to the raw data file
- fit() -> None
fits a space of Stars and saves to outDir
- collapse() -> dict
clusters and selects representatives of star models
- show_mds() -> None
plots a 2D representation of model layout
- save(file_path) -> None
saves instance to pickle file
Example
>>> from thema.multiverse.universe.galaxy import Galaxy
>>> cleanDir = "<PATH TO MOON OBJECT FILES>"
>>> data = "<PATH TO RAW DATA FILE>"
>>> projDir = "<PATH TO COMET OBJECT FILES>"
>>> outDir = "<PATH TO OUT DIRECTORY OF PROJECTIONS>"
>>> params = {
...     "jmap": {"nCubes": [2, 5, 8],
...              "percOverlap": [0.2, 0.4],
...              "minIntersection": [-1],
...              "clusterer": [["HDBSCAN", {"minDist": 0.1}]],
...              }
... }
>>> galaxy = Galaxy(params=params,
...                 data=data,
...                 cleanDir=cleanDir,
...                 projDir=projDir,
...                 outDir=outDir)
>>> galaxy.fit()
>>> galaxy.show_mds()
>>> galaxy.collapse()
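To get a feel for how large a space the params dictionary in the example describes, the sketch below expands the "jmap" grid into individual parameter combinations. It assumes that every combination of the listed values is a candidate star configuration; this page does not state how Galaxy combines the values internally or how many projections in projDir each combination is applied to, so treat the count as a rough estimate. The helper name expand_grid is hypothetical and not part of THEMA.

```python
from itertools import product

# Same grid as the example above; each key maps to a list of candidate values.
jmap_grid = {
    "nCubes": [2, 5, 8],
    "percOverlap": [0.2, 0.4],
    "minIntersection": [-1],
    "clusterer": [["HDBSCAN", {"minDist": 0.1}]],
}


def expand_grid(grid):
    """Return one dict per combination of the listed values (hypothetical helper)."""
    keys = list(grid)
    value_lists = [grid[key] for key in keys]
    return [dict(zip(keys, values)) for values in product(*value_lists)]


combos = expand_grid(jmap_grid)
print(len(combos))  # 3 * 2 * 1 * 1 = 6 combinations
print(combos[0])    # first combination in grid order
```

Under that assumption, adding a value to any list multiplies the number of combinations, which is worth keeping in mind before calling fit() on a large grid.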
- collapse(metric=None, nReps=None, selector=None, **kwargs)
Collapses the space of Stars into a small number of representative Stars
- Parameters:
metric (str, optional) – The metric used when comparing graphs. Currently, we only support stellar_kernel_distance. (default: None)
nReps (int, optional) – The number of representative stars. (default: None)
selector (str, optional) – The selection criteria to choose representatives from a cluster. Currently, only “random” is supported. (default: None)
**kwargs (dict) – Additional arguments necessary for different metric functions.
- Returns:
A dictionary containing the path to the star and the size of the group it represents.
- Return type:
dict
Examples
>>> galaxy = Galaxy()
>>> galaxy.collapse(metric='stellar_kernel_distance', nReps=5, selector='random')
{'0': {'star': 'path/to/star1', 'cluster_size': 10},
 '1': {'star': 'path/to/star2', 'cluster_size': 8},
 '2': {'star': 'path/to/star3', 'cluster_size': 12},
 '3': {'star': 'path/to/star4', 'cluster_size': 9},
 '4': {'star': 'path/to/star5', 'cluster_size': 11}}
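Once collapse has returned, the dictionary can be inspected with ordinary Python. The following sketch ranks the representatives by the size of the group each one stands for and reports each group's share of all stars. It relies only on the 'star' and 'cluster_size' keys shown in the example output; the helper rank_representatives is hypothetical and not part of THEMA.

```python
# Return value copied from the example above.
selection = {
    "0": {"star": "path/to/star1", "cluster_size": 10},
    "1": {"star": "path/to/star2", "cluster_size": 8},
    "2": {"star": "path/to/star3", "cluster_size": 12},
    "3": {"star": "path/to/star4", "cluster_size": 9},
    "4": {"star": "path/to/star5", "cluster_size": 11},
}


def rank_representatives(selection):
    """Sort representatives by cluster size, largest first.

    Returns (star path, cluster size, share of all stars) tuples.
    """
    total = sum(entry["cluster_size"] for entry in selection.values())
    ranked = sorted(
        selection.values(), key=lambda entry: entry["cluster_size"], reverse=True
    )
    return [
        (entry["star"], entry["cluster_size"], entry["cluster_size"] / total)
        for entry in ranked
    ]


ranked = rank_representatives(selection)
for star, size, share in ranked:
    print(f"{star}: {size} stars ({share:.0%})")
```

A representative backed by a large group reflects a structure that many parameter settings agree on, so sorting by cluster_size is one simple way to decide which stars to examine first.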
- fit()
Configures and generates the space of Stars. Uses a ProcessPoolExecutor to spawn multiple star instances and fit them.
- Returns:
Saves star objects to outDir
- Return type:
None
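The general pattern behind a ProcessPoolExecutor-based fit can be sketched with the standard library alone. This is an illustration of the technique, not THEMA's implementation: fit_one, fit_all, and the configuration keys are hypothetical stand-ins, and the worker returns a label where a real worker would fit and save a star.

```python
from concurrent.futures import ProcessPoolExecutor, as_completed


def fit_one(config):
    """Stand-in for fitting a single star; returns a label instead of a model."""
    return f"star_nCubes{config['nCubes']}_overlap{config['percOverlap']}"


def fit_all(configs, max_workers=2):
    """Fit every configuration in a worker process and collect the results."""
    results = []
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(fit_one, config) for config in configs]
        for future in as_completed(futures):
            # result() re-raises any exception that occurred in the worker.
            results.append(future.result())
    return sorted(results)


# Typical call site. The guard matters on platforms that start workers by
# re-importing the main module:
#
# if __name__ == "__main__":
#     configs = [{"nCubes": n, "percOverlap": 0.2} for n in (2, 5, 8)]
#     print(fit_all(configs))
```

Because each configuration is fitted independently, the work parallelizes cleanly across processes; the cost is that the worker function and its arguments must be picklable.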
- getParams()
Returns the parameters of the Galaxy instance.
- Returns:
A dictionary containing the parameters of the Galaxy instance.
- Return type:
dict
- save(file_path)
Save the current object instance to a file using pickle serialization.
- Parameters:
file_path (str) – The path to the file where the object will be saved.
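This page does not list a matching loader. Assuming save writes a standard pickle of the instance, the file can be read back with pickle.load, as in the sketch below. A plain dictionary stands in for a Galaxy instance so that the example runs without THEMA installed, and load_pickled is a hypothetical helper.

```python
import pickle
import tempfile
from pathlib import Path


def load_pickled(file_path):
    """Read back an object written with pickle. Only open files you trust."""
    with open(file_path, "rb") as f:
        return pickle.load(f)


# A plain dict stands in for a Galaxy instance so the sketch runs without THEMA.
stand_in = {"nReps": 3, "selector": "random"}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "galaxy.pkl"
    with open(path, "wb") as f:
        # Assumed to mirror what galaxy.save(path) writes.
        pickle.dump(stand_in, f)
    restored = load_pickled(path)

print(restored == stand_in)
```

Unpickling a real Galaxy would require THEMA to be importable in the loading environment, since pickle stores a reference to the class rather than its code.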
- show_mds(randomState: int = None)
Generates an embedding based on a precomputed metric.
- Parameters:
randomState (int, default None) – seed to set MDS and ensure reproducible results
- Returns:
Shows a plot of the embedding.
- Return type:
None