Telescope

The Telescope class provides visualization capabilities for democratically selected star instances. It offers graph, heatmap, Sankey, and shortest-path views for analyzing and understanding data relationships within the Thema framework.

Telescope Class

class thema.probe.telescope.Telescope(star_file)

Bases: object

Telescope class for viewing star objects: a suite to meet all your visualization needs.

Members

pos: n-dimensional array: node positioning data for graphs and components

Functions

makeGraph(): Visualize a graph!
makeHeatmap(): Visualize a breakdown of your connected components as a heatmap!
makeSankey(): Create a Sankey diagram from a custom score_function
makePathGraph(): Create a shortest-path graph for a single component, based on a custom definition of target nodes

Example

>>> from thema.probe.telescope import Telescope
>>> star_fp = '<PATH TO FILE>/jmap_clustererHDBSCANmin_cluster_size10_minIntersection-1_nCubes10_percOverlap0.6_id3_3.pkl'
>>> telescope_instance = Telescope(star_fp)

makeGraph(group_number: int = None, k: float = None, seed: int = None, col: str = None, aggregation_func=None, hideLegend: bool = False, node_size_multiple: int = 10)

Visualize a graph!

Parameters:

group_number (int) – connected component number to subset the visualization to. For example, show only component 1 rather than the entire graph.
k (float, default None) – value from 0 to 1 that sets the optimal distance between nodes when computing nx.spring_layout positions
seed (int, default None) – random state for nx.spring_layout; fixing it makes node layouts deterministic and graph representations reproducible
col (str) – column from the raw data to color nodes by
aggregation_func (np.<function>, defaults to calculating mean) –
method by which to aggregate values within a node for coloring, e.g. color a node by the median or by the sum of its values. Supports all numpy aggregation functions such as np.mean, np.median, np.sum, etc.
hideLegend (bool, default False) – toggle the graph/component’s legend on or off
node_size_multiple (int, default 10) – multiplier controlling node size
Note: node sizing options are a work in progress (WIP).

Example

Visualize connected component #3 with nodes colored by the sum of total pollution of the coal plants in each node (example using a dataset on coal plant impacts):

>>> import numpy as np
>>> tel = Telescope(star_filePath)
>>> tel.makeGraph(group_number=3, col="Total Pollution", aggregation_func=np.sum)
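The node coloring controlled by col and aggregation_func can be sketched as follows. This is not Telescope's implementation: it assumes each node stores the row indices of its member data points (node_members, a hypothetical name) and that aggregation_func is any callable reducing a sequence of numbers to one value, falling back to the mean as documented. np.mean, np.median, or np.sum would slot in the same way; stdlib equivalents are used here to keep the sketch dependency-free.

```python
from statistics import mean, median


def aggregate_node_values(node_members, column_values, aggregation_func=None):
    """Reduce each node's member values to a single color value.

    node_members: hypothetical mapping of node ID -> list of row indices in the raw data
    column_values: values of the coloring column (e.g. "Total Pollution"), indexed by row
    aggregation_func: callable reducing a list of numbers; defaults to the mean
    """
    if aggregation_func is None:
        aggregation_func = mean  # documented default: calculate the mean
    return {
        node: aggregation_func([column_values[i] for i in rows])
        for node, rows in node_members.items()
    }


# Toy data: three nodes over a five-row column; rows 1 and 3 belong to two nodes
members = {"n0": [0, 1], "n1": [1, 2, 3], "n2": [3, 4]}
pollution = [10.0, 20.0, 30.0, 40.0, 50.0]

print(aggregate_node_values(members, pollution))          # mean per node
print(aggregate_node_values(members, pollution, sum))     # sum per node
print(aggregate_node_values(members, pollution, median))  # median per node
```

Because the aggregation is just a callable, swapping the mean for a sum changes how nodes with many members are colored: sums grow with node size, while means and medians do not.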

makeHeatmap(nodeDescriptorCols: bool = True, ncols: int | list[str] = None, aggregation_func=None, topZscoreCols: bool = False)

Visualize a breakdown of your connected components!

Parameters:

ncols (int | list[str], default 15) –
int: the number of columns to visualize, selected from the front of your data
list[str]: a list of specific columns from your data to create a heatmap of
aggregation_func (np.<function>, defaults to calculating mean) –
method by which to aggregate values within a node for coloring, e.g. color a node by the median or by the sum of its values. Supports all numpy aggregation functions such as np.mean, np.median, np.sum, etc.
topZscoreCols (bool, default False) –
visualize the ncols columns with the highest z-scores for any group; in other words, the columns in which one or more groups differs MOST from the dataset norm

Overrides an int ncols specification
nodeDescriptorCols (bool, default True) –
Automatically select columns to view in your heatmap, based on a density representing the composition of a group by its nodes’ descriptions.

Overrides an int ncols specification

Returns:

None (displays the heatmap inline using matplotlib.pyplot)
TODO: Dynamically select ncols based on the columns with the highest variance between groups for the default visualization.

Example

>>> import numpy as np
>>> tel = Telescope(star_filePath)
>>> tel.makeHeatmap(ncols=['Pollution', 'Health Impact'], aggregation_func=np.mean)
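One plausible reading of topZscoreCols is sketched below; the exact statistic Telescope uses is not documented on this page, so treat it as an assumption. For each column the sketch computes every group's mean, converts it to a z-score against the whole dataset's mean and standard deviation, keeps the largest absolute z-score across groups, and returns the ncols highest-scoring columns. The function name top_zscore_columns and the dict-of-lists data layout are hypothetical.

```python
from statistics import mean, pstdev


def top_zscore_columns(data, groups, ncols):
    """Rank columns by how far any group's mean departs from the dataset mean.

    data: hypothetical mapping of column name -> list of row values
    groups: group (connected component) label for each row
    ncols: number of columns to return
    """
    scores = {}
    for col, values in data.items():
        overall_mean = mean(values)
        overall_std = pstdev(values)
        if overall_std == 0:
            scores[col] = 0.0  # a constant column cannot distinguish groups
            continue
        best = 0.0
        for g in set(groups):
            group_vals = [v for v, lab in zip(values, groups) if lab == g]
            z = (mean(group_vals) - overall_mean) / overall_std
            best = max(best, abs(z))
        scores[col] = best
    # Highest absolute z-score first; ties keep their original column order
    return sorted(scores, key=scores.get, reverse=True)[:ncols]


data = {
    "Pollution": [1.0, 1.0, 9.0, 9.0],      # strongly separates the two groups
    "Health Impact": [2.0, 3.0, 2.0, 3.0],  # no group-level difference
    "Capacity": [5.0, 5.0, 5.0, 5.0],       # constant
}
groups = [0, 0, 1, 1]
print(top_zscore_columns(data, groups, ncols=1))
```

Taking the maximum over groups means a column ranks highly if even one group stands out, which matches the description "one or more groups is the MOST different than the dataset norm".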

makePathGraph(col: str, group_number: int, aggregation_func=None, top: bool = True, percentage: float = 0.1, path_labels: bool = False, node_labels: bool = False, k: float = None, seed: int = None, node_size_multiple: int = 10)

Make a shortest-path graph by identifying sink (target) nodes and visualizing distance to them

Parameters:

col (str) – Column to color nodes by - from the raw data
group_number (int) – graph connected component number to subset the visualization to For example, just show component 1 and not the entire graph
aggregation_func (np.<function>, defaults to calculating mean) –
method by which to aggregate values within a node for coloring, e.g. color a node by the median or by the sum of its values. Supports all numpy aggregation functions such as np.mean, np.median, np.sum, etc.
top (bool, default True) – whether to select the top or the bottom percentage of nodes as target/sink nodes. NOTE: works together with the percentage param
percentage (float, default 0.1) – the percentage of nodes to select as sinks/targets. NOTE: works together with the top param
k (float, default 0.12) – value from 0 to 1 that sets the optimal distance between nodes when computing nx.spring_layout positions
seed (int, default 12) – random state for nx.spring_layout; fixing it makes node layouts deterministic and graph representations reproducible
node_size_multiple (int, default 10) – multiplier controlling node size
path_labels (bool, default False) – add labels to the nodes indicating target nodes and the target that each node is closest to
node_labels (bool, default False) – add labels to the nodes showing their node IDs, for retrieving node-level data
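The target selection and nearest-sink assignment described above can be sketched with a multi-source breadth-first search. Assumptions not taken from Telescope's code: the component is an unweighted graph given as an adjacency dict, percentage is a fraction of the component's nodes (0.1 meaning 10%) rounded to at least one target, and ties are broken deterministically by node ID. select_targets and nearest_target are hypothetical names.

```python
from collections import deque


def select_targets(node_values, percentage=0.1, top=True):
    """Pick the top (or bottom) fraction of nodes by aggregated value as sinks."""
    n_targets = max(1, round(len(node_values) * percentage))
    ranked = sorted(node_values, key=lambda n: (node_values[n], n), reverse=top)
    return set(ranked[:n_targets])


def nearest_target(adjacency, targets):
    """Multi-source BFS: map each reachable node to (closest target, hop distance)."""
    closest = {t: (t, 0) for t in targets}
    queue = deque(sorted(targets))  # sorted so equidistant ties resolve predictably
    while queue:
        node = queue.popleft()
        source, dist = closest[node]
        for nbr in adjacency[node]:
            if nbr not in closest:
                closest[nbr] = (source, dist + 1)
                queue.append(nbr)
    return closest


# Toy component: a path a - b - c - d - e with aggregated node values
values = {"a": 9, "b": 2, "c": 3, "d": 4, "e": 10}
adjacency = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "e"], "e": ["d"]}

targets = select_targets(values, percentage=0.4, top=True)
paths = nearest_target(adjacency, targets)
for node in sorted(paths):
    target, hops = paths[node]
    print(f"{node}: nearest target {target} ({hops} hops)")
```

Seeding the queue with every target at once visits nodes in order of distance to the nearest sink, so one pass assigns each non-target its closest target instead of running a separate shortest-path search per target.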

makeSankey(score_function, dropUnclustered: bool = True, title_text: str = None)

Create a Sankey diagram based on the score function.

Parameters: score_function (function, pd.DataFrame -> list) – must take a DataFrame and return a categorical classification of its elements (rows).

Example

Assuming the data has “height” and “age” columns, one could define a score function as follows:

```
def my_score_function(df):
    scores = 0.5 * df['height'] + 2 * df['age']
    labels = ['high' if score > 20 else 'low' for score in scores]
    return labels
```
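Conceptually, a Sankey diagram of this kind needs the number of rows flowing from each group into each label returned by score_function. The sketch below counts those flows with the standard library. It assumes, hypothetically, that unclustered rows carry the group label -1 (the noise convention used by HDBSCAN) and that dropUnclustered simply removes them; sankey_flows is an illustrative name, not part of the Telescope API.

```python
from collections import Counter


def sankey_flows(groups, labels, drop_unclustered=True):
    """Count (group, label) pairs; each count is the width of one Sankey link.

    groups: group (connected component) label per row; -1 assumed to mean unclustered
    labels: score_function output per row, e.g. "high" / "low"
    """
    pairs = zip(groups, labels)
    if drop_unclustered:
        pairs = ((g, lab) for g, lab in pairs if g != -1)
    return Counter(pairs)


groups = [0, 0, 1, 1, 1, -1]
labels = ["high", "low", "high", "high", "low", "low"]
for (group, label), count in sorted(sankey_flows(groups, labels).items()):
    print(f"group {group} -> {label}: {count}")
```

Each (group, label) count maps directly onto a link's source, target, and value in a typical Sankey plotting API.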

property pos

Get the node positions used to lay out the telescope's graphs.

Returns: The node positions.
Return type: numpy.ndarray

Notes

This member variable ensures that graph layouts are held constant when viewing graphs, groups/components, and path graphs. It is recomputed when seed or k is changed in the makeGraph() and makePathGraph() functions.
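The caching behavior in these notes can be sketched as a layout that is recomputed only when k or seed changes. LayoutCache and compute_layout are hypothetical names, and the sketch uses a method rather than a property so it can take k and seed. A real implementation would call nx.spring_layout(graph, k=k, seed=seed); here that call is replaced by a seeded random placement so the sketch needs only the standard library.

```python
import random


def compute_layout(nodes, k, seed):
    """Stand-in for nx.spring_layout: deterministic 2-D positions for a given seed.

    k is accepted only to mirror the real call; this placeholder ignores it.
    """
    rng = random.Random(seed)
    return {n: (rng.uniform(-1, 1), rng.uniform(-1, 1)) for n in nodes}


class LayoutCache:
    """Hold node positions constant across views until k or seed changes."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self._params = None
        self._pos = None
        self.recomputations = 0

    def pos(self, k=None, seed=None):
        """Return cached positions, recomputing only if k or seed changed."""
        if self._pos is None or (k, seed) != self._params:
            self._pos = compute_layout(self.nodes, k, seed)
            self._params = (k, seed)
            self.recomputations += 1
        return self._pos


cache = LayoutCache(["a", "b", "c"])
first = cache.pos(k=0.12, seed=12)
again = cache.pos(k=0.12, seed=12)  # same parameters: cached layout reused
moved = cache.pos(k=0.12, seed=7)   # new seed: layout recomputed
print(first is again, cache.recomputations)
```

Reusing one position mapping is what keeps a component drawn by makeGraph() and the same component drawn by makePathGraph() visually aligned.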