Stars

A Star turns your projection into a graph that reveals the shape of your data. Think of it like a map: rows that are similar get grouped into nodes, and nodes that share rows get connected by edges. This lets you see clusters, branches, and outliers at a glance—without needing to understand the math under the hood.
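To make the map analogy concrete, here is a minimal sketch of the edge-building step, assuming the nodes have already been formed. Each node is modeled as a set of row indices, and two nodes are connected when they share rows. The names `nerve_edges` and `min_shared` are hypothetical and not part of the thema API. `min_shared` plays a role similar to jmapStar's minIntersection.

```python
from itertools import combinations


def nerve_edges(nodes, min_shared=1):
    """Return edges between nodes whose row sets share at least min_shared rows.

    nodes: dict mapping a node id to the set of row indices it contains.
    Each edge is returned as (node_a, node_b, number_of_shared_rows).
    """
    edges = []
    for a, b in combinations(sorted(nodes), 2):
        shared = len(nodes[a] & nodes[b])
        if shared >= min_shared:
            edges.append((a, b, shared))
    return edges


# Three hypothetical nodes: "n0" and "n1" overlap on rows 2 and 3; "n2" is isolated.
nodes = {"n0": {0, 1, 2, 3}, "n1": {2, 3, 4}, "n2": {7, 8}}
print(nerve_edges(nodes))  # [('n0', 'n1', 2)]
```

Rows that land in several nodes are what "glue" the graph together. An isolated node such as "n2" in this example is the kind of outlier the graph makes visible.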

All three options produce the same type of output (a starGraph). They differ in how much tuning they require and how they decide which rows belong together:

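Star Type        | Best For                                    | Key Parameters
-----------------|---------------------------------------------|-----------------------------------------------
jmapStar Class   | Fine-grained control. Tune cubes & overlap. | nCubes, percOverlap, minIntersection, clusterer
gudhiStar Class  | Auto-tuned defaults. Less manual tuning.    | N, beta, C, clusterer
pyballStar Class | Simplest option. Just set one parameter.    | EPS (ball radius)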

Which should I use?

  • New to this? Start with pyballStar Class—set EPS and go.

  • Want sensible defaults? Try gudhiStar Class—it estimates parameters for you.

  • Need precise control? Use jmapStar Class—full control over resolution and clustering.

Star API

Star Class

class thema.multiverse.universe.star.Star(data_path, clean_path, projection_path)

Bases: Core

Simple Targeted Atlas Representation

A STAR is a base class template for atlas (graph) construction algorithms. As a parent class, Star enforces structure on data management and graph generation, enabling a ‘universal’ procedure for generating these objects.

For more information on implementing a realization of Star, please see docs/development/star.md.

abstractmethod fit()

An abstract method to be implemented by children of Star. Each child must realize it with a graph construction algorithm that initializes self.starGraph as a starGraph instance.

Note: All parameters necessary for the graph construction algorithm must be passed as arguments to the star child’s constructor.

abstractmethod get_pseudoLaplacian()

An abstract method to be implemented by children of Star. Each child must realize it by returning a matrix representation of the star graph that plays a role similar to a Laplacian.
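The contract above can be sketched with Python's `abc` module. This is a hypothetical, self-contained stand-in: `StarSketch`, `SingleNodeStar`, and `n_rows` are invented for illustration, and the real Star also inherits from Core and handles data loading, which is omitted here. It shows the two documented rules: graph-construction parameters go to the child's constructor, and fit() must populate self.starGraph.

```python
from abc import ABC, abstractmethod


class StarSketch(ABC):
    """Hypothetical stand-in for thema's Star base class (not the real one)."""

    def __init__(self, data_path, clean_path, projection_path):
        self.data_path = data_path
        self.clean_path = clean_path
        self.projection_path = projection_path
        self.starGraph = None  # populated by fit()

    @abstractmethod
    def fit(self):
        """Build the graph and assign it to self.starGraph."""

    @abstractmethod
    def get_pseudoLaplacian(self):
        """Return a Laplacian-like matrix for the fitted graph."""


class SingleNodeStar(StarSketch):
    """Toy child that puts every row into a single node."""

    def __init__(self, data_path, clean_path, projection_path, n_rows):
        super().__init__(data_path, clean_path, projection_path)
        self.n_rows = n_rows  # graph-construction parameter passed to the constructor

    def fit(self):
        # A plain dict stands in for a starGraph in this sketch.
        self.starGraph = {"nodes": {0: set(range(self.n_rows))}, "edges": []}

    def get_pseudoLaplacian(self):
        # One neighborhood contains every item, so every entry is 1.
        return [[1] * self.n_rows for _ in range(self.n_rows)]


star = SingleNodeStar("data.pkl", "clean.pkl", "proj.pkl", n_rows=3)
star.fit()
print(star.starGraph["nodes"])  # {0: {0, 1, 2}}
```

Because both methods are abstract, Python refuses to instantiate `StarSketch` directly, which is how the base class enforces structure on its children.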

save(file_path, force=False)

Save the current object instance to a file using pickle serialization.

Parameters:
  • file_path (str) – The path to the file where the object will be saved.

  • force (bool, default=False) – If True, saves object even if the starGraph is uninitialized or empty.

Returns:

True if saved successfully, False otherwise.

Return type:

bool
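A minimal sketch of the documented save behavior using the standard `pickle` module. `save_star` is a hypothetical free function, not thema's method. The graph is modeled as a dict with a "nodes" key, and "uninitialized or empty" is assumed to mean a missing graph or one with no nodes, which may differ from how thema actually checks emptiness.

```python
import os
import pickle
import tempfile
import types


def save_star(obj, file_path, force=False):
    """Hypothetical sketch of Star.save: pickle obj unless its graph is missing or empty."""
    graph = getattr(obj, "starGraph", None)
    # Assumption: "empty" means no graph at all, or a graph with no nodes.
    empty = graph is None or len(graph.get("nodes", {})) == 0
    if empty and not force:
        return False
    try:
        with open(file_path, "wb") as f:
            pickle.dump(obj, f)
        return True
    except (OSError, pickle.PicklingError):
        return False


# Usage: an object whose graph was never built is refused unless force=True.
path = os.path.join(tempfile.mkdtemp(), "star.pkl")
unfit = types.SimpleNamespace(starGraph=None)
print(save_star(unfit, path))              # False
print(save_star(unfit, path, force=True))  # True
```

Returning a bool rather than raising lets a batch job skip stars whose parameters produced empty graphs without aborting the whole run.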

starGraph Class

class thema.multiverse.universe.utils.starGraph.starGraph(graph: Graph)

Bases: object

A graph wrapper to guide you through the stars!

Parameters:

graph (nx.Graph) – The graph object representing the star graph.

graph

The graph object representing the star graph.

Type:

nx.Graph

is_EdgeLess

Check if the graph is edgeless.

components

Get the connected components of the graph.

get_MST(k=0, components=None)

Calculate a customizable Minimum Spanning Tree of the weighted graph.

get_shortest_path(nodeID_1, nodeID_2)

Calculate the shortest path between two nodes in the graph using Dijkstra’s algorithm.

property components

Get the connected components of the graph, keyed by component index.

Returns:

A dictionary where the keys are component indices and the values are subgraphs representing the connected components.

Return type:

dict
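The idea behind the components property can be sketched with a breadth-first search. This stdlib sketch is an illustration only: the real property returns networkx subgraphs, whereas this hypothetical `components` function returns sets of node ids from a plain adjacency dict, and it assumes each undirected edge is listed in both directions.

```python
from collections import deque


def components(adjacency):
    """Return {component_index: set_of_nodes} for an undirected adjacency dict."""
    seen = set()
    result = {}
    for start in sorted(adjacency):
        if start in seen:
            continue
        comp = set()
        queue = deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            comp.add(node)
            for nbr in adjacency[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        result[len(result)] = comp  # indices are assigned in discovery order
    return result


adjacency = {0: [1], 1: [0, 2], 2: [1], 3: []}
print(components(adjacency))  # {0: {0, 1, 2}, 1: {3}}
```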

get_MST(k=0, components=None)

Calculate a customizable Minimum Spanning Tree of the weighted graph.

By default, a minimum spanning tree is returned for each connected component. If k is greater than the number of connected components, a minimum spanning forest of k trees is returned; if k is less than the number of connected components, the default MST is returned.

To restrict further edge removal to specific components, list them in components and supply k as a list.

Parameters:
  • k (int or list, optional) – The number of trees in the minimum spanning forest. Note that k is ignored if it is less than the number of connected components. Default is 0.

  • components (int or list, optional) – The connected components that are to be split. Default is None.

Returns:

The minimum spanning tree or forest of the weighted graph.

Return type:

nx.Graph
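The k-tree behavior can be sketched with Kruskal's algorithm that stops once the forest has k trees. This is a hypothetical stdlib illustration, not thema's implementation: it works on (u, v, weight) tuples instead of an nx.Graph and does not model the components argument. It does reproduce the documented rule that a k at or below the number of connected components yields the ordinary per-component MST.

```python
def min_spanning_forest(nodes, edges, k=0):
    """Kruskal's algorithm that stops once the forest has k trees.

    nodes: iterable of node ids; edges: list of (u, v, weight).
    If k <= number of connected components, this is the ordinary
    minimum spanning forest (one tree per component).
    """
    parent = {n: n for n in nodes}

    def find(x):
        # Path halving keeps the union-find trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    n_trees = len(parent)  # every node starts as its own tree
    kept = []
    for u, v, w in sorted(edges, key=lambda e: e[2]):
        if n_trees <= k:
            break  # the forest already has k trees
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            kept.append((u, v, w))
            n_trees -= 1
    return kept


edges = [("a", "b", 1), ("b", "c", 2), ("c", "d", 5), ("a", "c", 3)]
print(min_spanning_forest("abcd", edges))       # [('a', 'b', 1), ('b', 'c', 2), ('c', 'd', 5)]
print(min_spanning_forest("abcd", edges, k=2))  # [('a', 'b', 1), ('b', 'c', 2)]
```

Stopping early leaves the heaviest links out, so with k=2 the weakly attached node "d" is split off as its own tree.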

get_shortest_path(nodeID_1, nodeID_2)

Calculate the shortest path between two nodes in the graph using Dijkstra’s algorithm.

Parameters:
  • nodeID_1 (int or str) – The identifier of the source node.

  • nodeID_2 (int or str) – The identifier of the target node.

Returns:

A tuple containing:
  • A list of the nodes on the shortest path from nodeID_1 to nodeID_2.
  • The length of the shortest path, taking edge weights into account.
If no path exists between the nodes, returns (None, infinity).

Return type:

tuple
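A minimal stdlib sketch of Dijkstra's algorithm with the same return convention, (path, length) or (None, infinity). `shortest_path` and the `{node: {neighbor: weight}}` adjacency format are assumptions for illustration; the real method operates on the wrapped nx.Graph. Node ids are assumed to be mutually comparable, since ties in the heap fall back to comparing ids.

```python
import heapq
import math


def shortest_path(adjacency, source, target):
    """Dijkstra's algorithm on {node: {neighbor: weight}}; returns (path, length)."""
    dist = {source: 0}
    prev = {}
    heap = [(0, source)]
    visited = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue  # stale heap entry
        visited.add(node)
        if node == target:
            # Walk predecessors back to the source to rebuild the path.
            path = [node]
            while path[-1] != source:
                path.append(prev[path[-1]])
            return path[::-1], d
        for nbr, w in adjacency.get(node, {}).items():
            nd = d + w
            if nd < dist.get(nbr, math.inf):
                dist[nbr] = nd
                prev[nbr] = node
                heapq.heappush(heap, (nd, nbr))
    return None, math.inf


graph = {"a": {"b": 1, "c": 4}, "b": {"a": 1, "c": 2}, "c": {"a": 4, "b": 2}, "d": {}}
print(shortest_path(graph, "a", "c"))  # (['a', 'b', 'c'], 3)
print(shortest_path(graph, "a", "d"))  # (None, inf)
```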

property is_EdgeLess

Check if the graph is edgeless.

Returns:

True if the graph is edgeless, False otherwise.

Return type:

bool

jmapStar Class

thema.multiverse.universe.stars.jmapStar.initialize()

Returns the jmapStar class from the module. This is a general method that allows us to initialize arbitrary star objects.

Returns:

jmapStar – The jmapStar class.

Return type:

object
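A sketch of how a generic loader could use these initialize() hooks with `importlib`. `load_star_class` is hypothetical; the module path in the comment comes from this page, but thema's own loading code may differ. The demo registers a stand-in module in `sys.modules` so the example runs without thema installed.

```python
import importlib
import sys
import types


def load_star_class(module_name):
    """Import module_name and return whatever its initialize() function returns.

    Assumes (per the docs above) that each star module defines initialize()
    returning its Star subclass.
    """
    module = importlib.import_module(module_name)
    return module.initialize()


# With thema installed, one might call (not run here):
#   JmapStar = load_star_class("thema.multiverse.universe.stars.jmapStar")

# Demo with a stand-in module registered in sys.modules, so this runs without thema.
class FakeStar:
    pass


fake_module = types.ModuleType("fake_star_module")
fake_module.initialize = lambda: FakeStar
sys.modules["fake_star_module"] = fake_module
StarCls = load_star_class("fake_star_module")
print(StarCls.__name__)  # FakeStar
```

A uniform initialize() hook means the caller only needs a module name string (for example, from a config file) to obtain any star class.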

class thema.multiverse.universe.stars.jmapStar.jmapStar(data_path: str, clean_path: str, projection_path: str, nCubes: int, percOverlap: float, minIntersection: int, clusterer: list)

Bases: Star

JMAP Star Class

Our custom implementation of Kepler Mapper (kmapper), wrapped as a Star object. It lets users explore the topological structure of their data using the Mapper algorithm, a powerful tool for visualizing high-dimensional data.

Generates a graph representation of the projection using Kepler Mapper.

Members

data: pd.DataFrame

a pandas dataframe of raw data

clean: pd.DataFrame

a pandas dataframe of complete, scaled, and encoded data

projection: np.ndarray

a numpy array containing projection coordinates

nCubes: int

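kmapper parameter for covering the space: the number of hypercubes (intervals) along each projection dimension

percOverlap: float

kmapper parameter for covering the space: the overlap between adjacent hypercubes, as a fraction (e.g. 0.1 for 10%)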

minIntersection: int

number of shared items required to define an edge. Set to -1 to create a weighted graph.

clusterer: list

Clusterer passed to kmapper (e.g. HDBSCAN); the constructor signature types this argument as a list.

mapper: kmapper.mapper

A kmapper mapper object.

complex: dict

A dictionary specifying node membership

starGraph: thema.multiverse.universe.utils.starGraph class

An expanded framework for analyzing networkx graphs
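The roles of nCubes and percOverlap can be illustrated with a one-dimensional overlapping cover. This is a generic sketch under stated assumptions, not kmapper's exact cube-placement formula: `interval_cover` and `assign` are hypothetical helpers, and the widths are chosen so that n intervals, each shifted by width × (1 − overlap), exactly span the range. In Mapper, the rows in each interval would then be clustered (the clusterer's job), and each cluster becomes a node.

```python
def interval_cover(lo, hi, n_cubes, perc_overlap):
    """Split [lo, hi] into n_cubes intervals; neighbors overlap by perc_overlap of a width."""
    if n_cubes < 1 or not 0 <= perc_overlap < 1:
        raise ValueError("need n_cubes >= 1 and 0 <= perc_overlap < 1")
    # Width chosen so n intervals, each shifted by width * (1 - p), end exactly at hi.
    width = (hi - lo) / (n_cubes - (n_cubes - 1) * perc_overlap)
    step = width * (1 - perc_overlap)
    return [(lo + i * step, lo + i * step + width) for i in range(n_cubes)]


def assign(values, intervals):
    """Map each interval index to the row indices whose value falls inside it."""
    return {
        j: {i for i, x in enumerate(values) if a <= x <= b}
        for j, (a, b) in enumerate(intervals)
    }


intervals = interval_cover(0, 10, n_cubes=3, perc_overlap=0.5)
print(intervals)                          # [(0.0, 5.0), (2.5, 7.5), (5.0, 10.0)]
print(assign([1, 4, 6, 9], intervals))    # {0: {0, 1}, 1: {1, 2}, 2: {2, 3}}
```

Rows 1 and 2 land in two intervals each; that shared membership is what later produces edges between nodes, and raising percOverlap increases it.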

Functions

get_data_path() -> str

returns path to raw data

get_clean_path() -> str

returns path to Moon object containing clean data

get_projection_path()-> str

returns path to Comet object containing projection data

fit() -> None

Computes a complex and corresponding starGraph

get_unclustered_items() -> list

returns list of unclustered items from HDBSCAN

save(file_path, force=False) -> bool

Saves object as a .pkl file (inherited from Star).

fit()

Computes a kmapper complex based on the configuration parameters and constructs a resulting graph.

Returns:

Initializes complex and starGraph members

Return type:

None

Warning

Particular combinations of parameters can result in empty graphs or empty complexes.

get_pseudoLaplacian(neighborhood='node')

Calculates and returns a pseudo-Laplacian n by n matrix representing neighborhoods in the graph. Here, n is the number of items (i.e. rows in the clean data; keep in mind that some raw data rows may have been dropped during cleaning). The diagonal element A_ii is the number of neighborhoods item i appears in, and the element A_ij is the number of neighborhoods that both item i and item j belong to.

Parameters:

neighborhood (str) – Specifies the type of neighborhood. For jmapStar, neighborhood options are ‘node’ or ‘cc’ (connected component).
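The definition above amounts to A = MᵀM, where M is a 0/1 matrix with one row per neighborhood and one column per item. Here is a pure-Python sketch; `pseudo_laplacian` is a hypothetical helper that takes neighborhoods as sets of item indices (nodes for 'node', connected components for 'cc', assuming 'cc' means connected component) and returns nested lists rather than whatever array type thema returns.

```python
def pseudo_laplacian(neighborhoods, n_items):
    """A[i][j] = number of neighborhoods containing both item i and item j.

    The diagonal A[i][i] counts the neighborhoods item i appears in.
    Equivalent to M^T M for a 0/1 neighborhood-by-item membership matrix M.
    """
    A = [[0] * n_items for _ in range(n_items)]
    for members in neighborhoods:
        for i in members:
            for j in members:
                A[i][j] += 1
    return A


nodes = [{0, 1}, {1, 2}]  # two overlapping nodes over 3 items
print(pseudo_laplacian(nodes, 3))  # [[1, 1, 0], [1, 2, 1], [0, 1, 1]]
ccs = [{0, 1, 2}]  # both nodes fall in one connected component
print(pseudo_laplacian(ccs, 3))    # [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
```

Item 1 sits in both nodes, so A_11 = 2, while items 0 and 2 never share a node and A_02 = 0 at the node level. At the component level, the two items become neighbors.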

get_unclustered_items()

Returns the list of items that were not clustered in the mapper fitting.

Returns:

self._unclustered_item – A list of unclustered item ids

Return type:

list
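A minimal sketch of the idea, assuming "unclustered" means an item that belongs to no node of the fitted graph (for example, points that HDBSCAN labels as noise, -1). `unclustered_items` is a hypothetical helper taking the item count and a node-membership dict; it is not the method's actual implementation.

```python
def unclustered_items(n_items, nodes):
    """Return item ids (0..n_items-1) that belong to no node, in ascending order."""
    clustered = set().union(*nodes.values())  # every item appearing in some node
    return sorted(set(range(n_items)) - clustered)


print(unclustered_items(5, {"n0": {0, 1}, "n1": {1, 3}}))  # [2, 4]
```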

gudhiStar Class

class thema.multiverse.universe.stars.gudhiStar.gudhiStar(data_path: str, clean_path: str, projection_path: str, clusterer: list, N: int = 100, beta: float = 0.0, C: float = 10.0)

Bases: Star

GUDHI Star Class

  • inherits from Star

Generates a graph representation of the projection using gudhi.

See: https://gudhi.inria.fr/python/latest/cover_complex_sklearn_isk_ref.html

Members

data: pd.DataFrame

a pandas dataframe of raw data

clean: pd.DataFrame

a pandas dataframe of complete, scaled, and encoded data

projection: np.ndarray

a numpy array containing projection coordinates

clusterer: list

A list of length 2 containing clusterer name in pos 0, and kwargs in pos 1.

mapper: gudhi.cover_complex.MapperComplex

a mapper object

starGraph: thema.multiverse.universe.utils.starGraph class

An expanded framework for analyzing networkx graphs
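The [name, kwargs] clusterer format can be illustrated with a small name-to-class registry. This is a hypothetical sketch: how thema actually resolves the name is not documented here, and `build_clusterer` and `DummyClusterer` are invented for illustration. With scikit-learn 1.3 or later, a real registry could map "HDBSCAN" to `sklearn.cluster.HDBSCAN`, which accepts `min_cluster_size`.

```python
def build_clusterer(spec, registry):
    """Turn a [name, kwargs] spec into a clusterer instance via a name->class registry."""
    if not (isinstance(spec, (list, tuple)) and len(spec) == 2):
        raise ValueError("clusterer spec must be [name, kwargs]")
    name, kwargs = spec  # position 0: clusterer name, position 1: keyword arguments
    if name not in registry:
        raise KeyError(f"unknown clusterer: {name!r}")
    return registry[name](**kwargs)


class DummyClusterer:
    """Stand-in for a real clusterer class such as HDBSCAN."""

    def __init__(self, min_cluster_size=5):
        self.min_cluster_size = min_cluster_size


spec = ["HDBSCAN", {"min_cluster_size": 10}]
clusterer = build_clusterer(spec, {"HDBSCAN": DummyClusterer})
print(clusterer.min_cluster_size)  # 10
```

Keeping the clusterer as plain data (a name plus a kwargs dict) rather than a live object makes star configurations easy to write in config files and to pickle.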

Functions

get_data_path() -> str

returns path to raw data

get_clean_path() -> str

returns path to Moon object containing clean data

get_projection_path()-> str

returns path to Comet object containing projection data

fit() -> None

Computes a complex and corresponding starGraph

get_unclustered_items() -> list

returns list of unclustered items from HDBSCAN

save(file_path, force=False) -> bool

Saves object as a .pkl file (inherited from Star).

fit(labels=None)

Constructs a cosmic Graph using gudhi’s MapperComplex.

Returns:

Initializes starGraph member

Return type:

None

Warning

Particular combinations of parameters can result in empty graphs or empty complexes.

get_pseudoLaplacian(neighborhood='node')

Calculates and returns a pseudo-Laplacian n by n matrix representing neighborhoods in the graph. Here, n is the number of items (i.e. rows in the clean data; keep in mind that some raw data rows may have been dropped during cleaning). The diagonal element A_ii is the number of neighborhoods item i appears in, and the element A_ij is the number of neighborhoods that both item i and item j belong to.

Parameters:

neighborhood (str) – Specifies the type of neighborhood. For gudhiStar, neighborhood options are ‘node’ or ‘cc’ (connected component).

get_unclustered_items()

Returns the list of items that were not clustered in the mapper fitting.

Returns:

self._unclustered_item – A list of unclustered item ids

Return type:

list

thema.multiverse.universe.stars.gudhiStar.initialize()

Returns the gudhiStar class from the module.

pyballStar Class

thema.multiverse.universe.stars.pyballStar.initialize()

Returns the pyballStar class from the module.

class thema.multiverse.universe.stars.pyballStar.pyballStar(data_path, clean_path, projection_path, EPS=0.1)

Bases: Star

PyBall Mapper Star Class

Generates a graph representation of the projection using PyBall Mapper.

See: https://github.com/dioscuri-tda/pyBallMapper

Members

data: pd.DataFrame

a pandas dataframe of raw data

clean: pd.DataFrame

a pandas dataframe of complete, scaled, and encoded data

projection: np.ndarray

a numpy array containing projection coordinates

EPS: float

epsilon parameter for BallMapper: the radius of each ball

mapper: pyballmapper.BallMapper

a BallMapper object

starGraph: thema.multiverse.universe.utils.starGraph class

An expanded framework for analyzing networkx graphs
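To show what EPS controls, here is a sketch of the greedy Ball Mapper construction: scan the points in order, make each not-yet-covered point the center of a ball of radius EPS, then connect balls that share points. This is a hypothetical illustration of the algorithm (`ball_mapper` is invented, and it uses plain tuples of coordinates), not pyballmapper's code. `math.dist` requires Python 3.8+.

```python
import math
from itertools import combinations


def ball_mapper(points, eps):
    """Greedy Ball Mapper: uncovered points become centers of balls of radius eps.

    Returns (balls, edges): balls maps a center index to the set of point indices
    within eps of it; edges links balls that share at least one point.
    """
    centers = []
    covered = set()
    for i, p in enumerate(points):
        if i not in covered:
            centers.append(i)
            covered |= {j for j, q in enumerate(points) if math.dist(p, q) <= eps}
    balls = {
        c: {j for j, q in enumerate(points) if math.dist(points[c], q) <= eps}
        for c in centers
    }
    edges = [(a, b) for a, b in combinations(centers, 2) if balls[a] & balls[b]]
    return balls, edges


points = [(0.0, 0.0), (0.05, 0.0), (0.12, 0.0), (1.0, 1.0)]
balls, edges = ball_mapper(points, eps=0.1)
print(balls)  # {0: {0, 1}, 2: {1, 2}, 3: {3}}
print(edges)  # [(0, 2)]
```

A smaller EPS produces more, smaller balls (a finer graph); a larger EPS merges nearby points into fewer nodes.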

Functions

get_data_path() -> str

returns path to raw data

get_clean_path() -> str

returns path to Moon object containing clean data

get_projection_path()-> str

returns path to Comet object containing projection data

fit() -> None

Computes a complex and corresponding starGraph

get_unclustered_items() -> list

returns list of unclustered items

save(file_path, force=False) -> bool

Saves object as a .pkl file (inherited from Star).

fit()

Computes a BallMapper complex of the projection using the ball radius EPS and constructs the corresponding starGraph.

Note: As with every child of Star, all graph construction parameters (here, EPS) are passed to the constructor.

get_pseudoLaplacian(neighborhood='node')

Calculates and returns a pseudo-Laplacian n by n matrix representing neighborhoods in the graph. Here, n is the number of items (i.e. rows in the clean data; keep in mind that some raw data rows may have been dropped during cleaning). The diagonal element A_ii is the number of neighborhoods item i appears in, and the element A_ij is the number of neighborhoods that both item i and item j belong to.

Parameters:

neighborhood (str) – Specifies the type of neighborhood. For pyballStar, neighborhood options are ‘node’ or ‘cc’ (connected component).

get_unclustered_items()

Returns the list of items that were not clustered in the mapper fitting.

Returns:

self._unclustered_item – A list of unclustered item ids

Return type:

list