Stars¶

THEMA is a Python package designed to provide a structured approach to creating and managing a universe of different unsupervised projections of data. At the core of THEMA lies the concept of Stars, which are base class templates for atlas (graph) construction algorithms.

Simple Targeted Atlas Representation (STAR): The starGraph class serves as a base template for constructing atlas representations of data. By enforcing structure on data management and graph generation, starGraph enables a universal procedure for generating these objects. This base class provides a foundation for implementing various graph generation algorithms.
JMAP Star Class: The JMAP Star Class is a custom implementation of a Kepler Mapper (K-Mapper) algorithm into a Star object. This implementation allows users to explore the topological structure of their data using the Mapper algorithm, which is a powerful tool for visualizing high-dimensional data. The JMAP Star Class generates a graph representation of projections using Kepler Mapper, offering insights into the complex relationships within the data.

Star Functionality¶

Star Class¶

class thema.multiverse.universe.star.Star(data_path, clean_path, projection_path)[source]¶

Bases: Core

Simple Targeted Atlas Representation¶

A STAR is a base class template for atlas (graph) construction algorithms. As a parent class, Star enforces structure on data management and graph generation, enabling a ‘universal’ procedure for generating these objects.

For more information on implementing a realization of Star, please see docs/development/star.md.

abstract fit()[source]¶

An abstract method to be implemented by children of Star. It must be realized by a graph construction algorithm that initializes self.starGraph as a starGraph instance.

Note: All parameters necessary for the graph construction algorithm must be passed as arguments to the star child’s constructor.
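The contract described above can be illustrated with a minimal sketch. It does not import thema: StarSketch and ThresholdStar are hypothetical stand-ins, and a plain dict takes the place of the starGraph wrapper. The point is only the pattern, where the child receives every algorithm parameter in its constructor and fit() assigns self.starGraph.

```python
from abc import ABC, abstractmethod


class StarSketch(ABC):
    """Hypothetical stand-in for thema's Star base class (not the real one)."""

    def __init__(self, data_path, clean_path, projection_path):
        self.data_path = data_path
        self.clean_path = clean_path
        self.projection_path = projection_path
        self.starGraph = None  # to be set by fit()

    @abstractmethod
    def fit(self):
        """Children must build a graph and assign it to self.starGraph."""


class ThresholdStar(StarSketch):
    """Hypothetical child: links points whose 1-D values differ by <= radius."""

    def __init__(self, data_path, clean_path, projection_path, values, radius):
        super().__init__(data_path, clean_path, projection_path)
        # Every algorithm parameter arrives through the constructor
        self.values = values
        self.radius = radius

    def fit(self):
        edges = [
            (i, j)
            for i in range(len(self.values))
            for j in range(i + 1, len(self.values))
            if abs(self.values[i] - self.values[j]) <= self.radius
        ]
        # A plain dict stands in for the starGraph wrapper here
        self.starGraph = {"nodes": list(range(len(self.values))), "edges": edges}


star = ThresholdStar(
    "data.pkl", "clean.pkl", "proj.pkl", values=[0.0, 0.4, 2.0], radius=0.5
)
star.fit()
```

Because fit() takes no arguments, the same call works for any child class, which is what makes a uniform generation procedure possible.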

save(file_path, force=False)[source]¶

Save the current object instance to a file using pickle serialization.

Parameters:

file_path (str) – The path to the file where the object will be saved.
force (bool, default=False) – If True, saves object even with an uninitialized or empty starGraph member.

StarGraph Class¶

class thema.multiverse.universe.starGraph.starGraph(graph)[source]¶

Bases: object

A graph wrapper to guide you through the stars!

Parameters:: graph (nx.Graph) – The graph object representing the star graph.

graph¶

The graph object representing the star graph.

Type:: nx.Graph

is_EdgeLess()¶: Check if the graph is edgeless.

components()¶: Get the connected components of the graph.

get_MST(k=0, components=None)[source]¶: Calculate a customizable Minimum Spanning Tree of the weighted graph.

get_shortest_path(nodeID_1, nodeID_2)[source]¶: Calculate the shortest path between two nodes in the graph using Dijkstra’s algorithm.

property components¶

Get the connected components of the graph.

Returns:: A dictionary where the keys are component indices and the values are subgraphs representing the connected components.
Return type:: dict
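The shape of this return value, a dictionary keyed by component index, can be sketched in plain Python. This is an illustration under stated assumptions rather than the library's code: an adjacency dict stands in for the nx.Graph, the function name is hypothetical, and the values are node sets rather than subgraphs.

```python
from collections import deque


def connected_components(adjacency):
    """Return {component index: set of nodes} for an undirected graph.

    `adjacency` maps each node to the set of its neighbours; it stands in
    for the nx.Graph held by starGraph.
    """
    seen = set()
    components = {}
    for start in adjacency:
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from `start`
        queue = deque([start])
        seen.add(start)
        members = set()
        while queue:
            node = queue.popleft()
            members.add(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components[len(components)] = members
    return components


graph = {"a": {"b"}, "b": {"a"}, "c": set()}
parts = connected_components(graph)
```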

get_MST(k=0, components=None)[source]¶

Calculate a customizable Minimum Spanning Tree of the weighted graph.

By default, a minimum spanning tree is returned for each connected component. If k is greater than the number of connected components, a minimum spanning forest of k trees is returned. If k is less than the number of connected components, k is ignored and the default per-component MST is returned.

In the case that only certain components should be considered for further edge removal, then they may be specified in components and the k value should be supplied as a list.

Parameters:

k (int or list, optional) – The number of trees in the minimum spanning forest. Note that k is ignored if it is less than the number of connected components. Default is 0.
components (int or list, optional) – The connected components that are to be split. Default is None.

Returns:

The minimum spanning tree or forest of the weighted graph.

Return type:

nx.Graph
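One common way to obtain a minimum spanning forest of k trees is to build the spanning forest with Kruskal's algorithm and then remove its heaviest edges. The sketch below follows that approach in plain Python. Whether get_MST does exactly this is an assumption, the function name and the edge-list input are hypothetical, and the components argument is not modeled.

```python
def minimum_spanning_forest(nodes, edges, k=0):
    """Kruskal's algorithm, then drop the heaviest edges until k trees remain.

    `edges` is a list of (u, v, weight) tuples. If k does not exceed the
    number of connected components, the plain spanning forest is returned.
    """
    parent = {node: node for node in nodes}

    def find(node):
        # Walk up to the root, halving the path as we go
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    forest = []
    for u, v, weight in sorted(edges, key=lambda e: e[2]):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # the edge joins two trees, so keep it
            parent[root_u] = root_v
            forest.append((u, v, weight))

    # A forest with n nodes and m edges has n - m trees
    n_components = len(nodes) - len(forest)
    extra_cuts = min(len(forest), max(0, k - n_components))
    # forest is in ascending weight order, so the heaviest edges sit at the end
    return forest[: len(forest) - extra_cuts]


nodes = ["a", "b", "c", "d"]
edges = [("a", "b", 1), ("b", "c", 2), ("a", "c", 3), ("c", "d", 5)]
default_tree = minimum_spanning_forest(nodes, edges)
two_trees = minimum_spanning_forest(nodes, edges, k=2)
```

Each edge removed from a forest splits one tree into two, so cutting k minus the component count edges leaves exactly k trees.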

get_shortest_path(nodeID_1, nodeID_2)[source]¶

Calculate the shortest path between two nodes in the graph using Dijkstra’s algorithm.

Parameters:

nodeID_1 (int or str) – The identifier of the source node.
nodeID_2 (int or str) – The identifier of the target node.

Returns:

A tuple containing:

- A list of the nodes on the shortest path from nodeID_1 to nodeID_2.
- The length of the shortest path, considering edge weights.

If no path exists between the nodes, returns (None, infinity).

Return type:

tuple
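The documented return contract, a (path, length) tuple with (None, infinity) when no path exists, can be sketched with a standalone Dijkstra implementation. The function name and the nested-dict graph are hypothetical stand-ins for the method and its nx.Graph, and edge weights are assumed to be non-negative.

```python
import heapq
import math
from itertools import count


def shortest_path(adjacency, source, target):
    """Dijkstra's algorithm on {node: {neighbour: weight}} with weights >= 0.

    Returns (path, length), or (None, math.inf) when target is unreachable.
    """
    distances = {source: 0.0}
    previous = {}
    tie_breaker = count()  # avoids comparing node ids of mixed types
    heap = [(0.0, next(tie_breaker), source)]
    visited = set()
    while heap:
        dist, _, node = heapq.heappop(heap)
        if node in visited:
            continue
        visited.add(node)
        if node == target:
            break
        for neighbour, weight in adjacency.get(node, {}).items():
            candidate = dist + weight
            if candidate < distances.get(neighbour, math.inf):
                distances[neighbour] = candidate
                previous[neighbour] = node
                heapq.heappush(heap, (candidate, next(tie_breaker), neighbour))
    if target not in visited:
        return None, math.inf
    # Walk the predecessor links back from the target to rebuild the path
    path = [target]
    while path[-1] != source:
        path.append(previous[path[-1]])
    return path[::-1], distances[target]


graph = {
    "a": {"b": 1.0, "c": 4.0},
    "b": {"a": 1.0, "c": 2.0},
    "c": {"a": 4.0, "b": 2.0},
    "d": {},
}
route, length = shortest_path(graph, "a", "c")
missing = shortest_path(graph, "a", "d")
```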

property is_EdgeLess¶

Check if the graph is edgeless.

Returns:: True if the graph is edgeless, False otherwise.
Return type:: bool

jmapStar Class¶

class thema.multiverse.universe.stars.jmapStar.Nerve(minIntersection: int = -1)[source]¶

Bases: object

A class to handle generating weighted graphs from Kepler Mapper simplicial complexes.

Parameters:

weighted (bool, optional) – True if you want to generate a weighted graph. If False, please specify a minIntersection.
minIntersection (int, optional) – Minimum intersection considered when computing the nerve. An edge will be created only when the intersection between two nodes is greater than or equal to minIntersection. Leaving this parameter at its default of -1 results in a weighted graph.

compute(nodes)[source]¶

Compute the nerve of a simplicial complex.

Parameters:: nodes (dict) – A dictionary with entries {node id}:{list of ids in node}.
Returns:: edges – A 1-skeleton of the nerve (intersecting nodes).
Return type:: list

Examples

>>> nodes = {'node1': [1, 2, 3], 'node2': [2, 3, 4]}
>>> compute(nodes)
[['node1', 'node2']]

compute_unweighted_edges(nodes)[source]¶

Helper function to find edges of the overlapping clusters.

Parameters:

nodes (dict) – A dictionary with entries {node id}:{list of ids in node}.

Returns:

edges (list) – A 1-skeleton of the nerve (intersecting nodes).
simplicies (list) – Complete list of simplices.

Examples

>>> nodes = {'node1': [1, 2, 3], 'node2': [2, 3, 4]}
>>> compute_unweighted_edges(nodes)
[['node1', 'node2']]
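The thresholding rule behind the unweighted case, an edge only when two nodes share at least minIntersection items, can be sketched as follows. The function below is a hypothetical standalone illustration, not the method's source, and it returns only the edge list.

```python
from itertools import combinations


def unweighted_nerve_edges(nodes, min_intersection=1):
    """Link two nodes when they share at least `min_intersection` items."""
    edges = []
    # Iterating a dict yields its keys, so each pair is a pair of node ids
    for left, right in combinations(nodes, 2):
        shared = set(nodes[left]) & set(nodes[right])
        if len(shared) >= min_intersection:
            edges.append([left, right])
    return edges


nodes = {"node1": [1, 2, 3], "node2": [2, 3, 4], "node3": [4, 9]}
loose = unweighted_nerve_edges(nodes, min_intersection=1)
strict = unweighted_nerve_edges(nodes, min_intersection=2)
```

Raising the threshold from 1 to 2 drops the link between node2 and node3, which share only a single item.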

compute_weighted_edges(nodes)[source]¶

Helper function to find edges of the overlapping clusters.

Parameters:

nodes (dict) – A dictionary with entries {node id}:{list of ids in node}.

Returns:

edges (list) – A 1-skeleton of the nerve (intersecting nodes).
simplicies (list) – Complete list of simplices.

Examples

>>> nodes = {'node1': [1, 2, 3], 'node2': [2, 3, 4]}
>>> compute_weighted_edges(nodes)
[('node1', 'node2', 0.333)]

thema.multiverse.universe.stars.jmapStar.convert_keys_to_alphabet(dictionary)[source]¶: Simple helper function to make kmapper node labels more readable.

thema.multiverse.universe.stars.jmapStar.get_clusterer(clusterer: list)[source]¶

Converts a list configuration to an initialized clusterer.

Parameters:: clusterer (list) – A length-2 list containing the name of the clusterer in position 0 and the parameters to configure it in position 1. Example: clusterer = ["HDBSCAN", {"minDist": 0.1}]
Return type:: An initialized clustering object
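The [name, parameters] convention can be sketched with a small registry lookup. Everything here is hypothetical: ThresholdClusterer stands in for a real clustering class, and CLUSTERER_REGISTRY and build_clusterer are illustrative names, not part of thema. How get_clusterer resolves names internally is not shown in this reference.

```python
class ThresholdClusterer:
    """Hypothetical stand-in for a real clustering class such as HDBSCAN."""

    def __init__(self, threshold=1.0):
        self.threshold = threshold


# Hypothetical registry mapping configuration names to classes
CLUSTERER_REGISTRY = {"ThresholdClusterer": ThresholdClusterer}


def build_clusterer(config):
    """Turn [name, params] into an initialized clusterer."""
    if len(config) != 2:
        raise ValueError("expected [name, params]")
    name, params = config
    if name not in CLUSTERER_REGISTRY:
        raise ValueError(f"unknown clusterer: {name!r}")
    # Keyword-expand the params dict into the class constructor
    return CLUSTERER_REGISTRY[name](**params)


clusterer = build_clusterer(["ThresholdClusterer", {"threshold": 0.1}])
```

Keeping the configuration as a plain list makes it straightforward to store in a parameter file and pass to a constructor such as jmapStar's.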

thema.multiverse.universe.stars.jmapStar.initialize()[source]¶

Returns the jmapStar class from the module. This is a general method that allows us to initialize arbitrary star objects.

Returns:: jmapStar – The jMAP projectile object.
Return type:: object

class thema.multiverse.universe.stars.jmapStar.jmapStar(data_path: str, clean_path: str, projection_path: str, nCubes: int, percOverlap: float, minIntersection: int, clusterer: list)[source]¶

Bases: Star

JMAP Star Class

Our custom implementation of a Kepler Mapper (K-Mapper) into a Star object. Here we allow users to explore the topological structure of their data using the Mapper algorithm, which is a powerful tool for visualizing high-dimensional data.

Generates a graph representation of projection using Kepler Mapper.

Members¶

data: pd.DataFrame: a pandas dataframe of raw data
clean: pd.DataFrame: a pandas dataframe of complete, scaled, and encoded data
projection: np.ndarray: a numpy array containing projection coordinates
nCubes: int: kmapper parameter relating to covering of space
percOverlap: float: kmapper parameter relating to covering of space
minIntersection: int: number of shared items required to define an edge. Set to -1 to create a weighted graph.
clusterer: function: Clustering function passed to kmapper (e.g. HDBSCAN).
mapper: kmapper.mapper: A kmapper mapper object.
complex: dict: A dictionary specifying node membership
starGraph: thema.multiverse.universe.starGraph class: An expanded framework for analyzing networkx graphs

Functions¶

get_data_path() -> str: returns path to raw data
get_clean_path() -> str: returns path to Moon object containing clean data
get_projection_path() -> str: returns path to Comet object containing projection data
fit() -> None: Computes a complex and corresponding starGraph
get_unclustered_items() -> list: returns list of unclustered items from HDBSCAN
save() -> None: Saves object as a .pkl file.

fit()[source]¶

Computes a kmapper complex based on the configuration parameters and constructs a resulting graph.

Returns:: Initializes complex and starGraph members
Return type:: None

Warning

Particular combinations of parameters can result in empty graphs or empty complexes.

get_unclustered_items()[source]¶

Returns the list of items that were not clustered in the mapper fitting.

Returns:: self._unclustered_item – A list of unclustered item ids
Return type:: list