Telescope

The Telescope class provides visualization capabilities for democratically selected star instances. It is designed to meet various visualization needs for analyzing and understanding data relationships within the Thema framework.

Telescope Class

class thema.probe.telescope.Telescope(star_file)[source]

Bases: object

Telescope Class - to view star objects

a suite to meet all your visualization needs

Members

pos: n-dimensional array

node positioning data for graphs and components

Functions

makeGraph()

Visualize a graph!

makeHeatmap()

Visualize a breakdown of your connected components as a heatmap!

makeSankey()

Create a sankey diagram from a custom score_function

makePathGraph()

Creates a shortest-path graph for a single component, based on a custom definition of target nodes

Example

>>> star_fp = '<PATH TO FILE>/jmap_clustererHDBSCANmin_cluster_size10_minIntersection-1_nCubes10_percOverlap0.6_id3_3.pkl'
>>> telescope_instance = Telescope(star_fp)

makeGraph(group_number: int = None, k: float = None, seed: int = None, col: str = None, aggregation_func=None, hideLegend: bool = False, node_size_multiple: int = 10)[source]

Visualize a graph!

Parameters:
  • group_number (int) – graph connected component number to subset the visualization to. For example, show only component 1 rather than the entire graph

  • k (float, default None) – value from 0 to 1 that determines the optimal distance between nodes when setting nx.spring_layout positions

  • seed (int, default None) – random state for deterministic node layouts, defaulted so graph representations are reproducible; used when setting nx.spring_layout positions

  • col (str) – Column to color nodes by - from the raw data

  • aggregation_func (np.<function>, defaults to calculating mean) –

    method by which to aggregate values within a node for coloring
    • color node by the median value or by the sum of values for example

    supports all numpy aggregation functions such as np.mean, np.median, np.sum, etc

  • hideLegend (bool, default False) – toggle the graph/component’s legend on or off

  • node_size_multiple (int, default 10) – change the node sizing

Note: node sizing options are a work in progress (WIP).

Example

Visualize connected component #3 with nodes colored by the sum of total pollution of coal plants in the node (example using a dataset on coal plant impacts):

>>> tel = Telescope(star_filePath)
>>> tel.makeGraph(group_number=3, col="Total Pollution", aggregation_func=np.sum)
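To make the role of aggregation_func concrete, here is a minimal sketch of how a per-node color value could be derived from the raw values in each node. It is not Thema's implementation: the node_values dictionary and the color_values helper are hypothetical and exist only for illustration.

```python
import numpy as np

# Hypothetical data: raw values of one column for the rows grouped into each node.
node_values = {
    "node_a": [1.0, 2.0, 6.0],
    "node_b": [4.0, 4.0],
}

def color_values(node_values, aggregation_func=np.mean):
    """Reduce each node's values to the single number used to color it."""
    return {
        node: float(aggregation_func(vals))
        for node, vals in node_values.items()
    }

mean_colors = color_values(node_values)                        # default: mean
sum_colors = color_values(node_values, aggregation_func=np.sum)
median_colors = color_values(node_values, aggregation_func=np.median)
```

Any function that reduces a sequence of numbers to one number fits this pattern, which is why np.mean, np.median, and np.sum are interchangeable here.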

makeHeatmap(nodeDescriptorCols: bool = True, ncols: int | list[str] = None, aggregation_func=None, topZscoreCols: bool = False)[source]

Visualize a breakdown of your connected components!

Parameters:
  • ncols (int | list[str], default 15) –

    int: the number of columns to visualize, selected from the front of your data

    list[str]: a list of specific columns from your data to create a heatmap of

  • aggregation_func (np.<function>, defaults to calculating mean) –

    method by which to aggregate values within a node for coloring
    • color node by the median value or by the sum of values for example

    supports all numpy aggregation functions such as np.mean, np.median, np.sum, etc

  • topZscoreCols (bool, default False) –

    visualize the ncols with the highest z-scores within a group; in other words, the columns in which one or more groups is the most different from the dataset norm

    Overrides a ncols int specification

  • nodeDescriptorCols (bool, default True) –

    Smart select columns to view in your heatmap, based on a density representing the composition of a group by its nodes’ descriptions.

    Overrides a ncols int specification

Returns:

  • n/a (displays an inline matplotlib plot)

TODO: dynamically select ncols based on the columns with the highest variance between groups for the default visualization.

Example

>>> tel = Telescope(star_filePath)
>>> tel.makeHeatmap(ncols=['Pollution', 'Health Impact'], aggregation_func=np.mean)
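The idea behind topZscoreCols can be sketched as follows. This is one plausible reading of "columns in which one or more groups is the most different from the dataset norm", not Thema's actual selection logic: the rows data, the top_zscore_columns helper, and the choice of scoring each column by its largest absolute group-mean z-score are all assumptions.

```python
from statistics import mean, pstdev

# Hypothetical rows: (group label, {column: value}).
rows = [
    ("g0", {"Pollution": 1.0, "Health Impact": 5.0, "Age": 10.0}),
    ("g0", {"Pollution": 2.0, "Health Impact": 5.0, "Age": 30.0}),
    ("g1", {"Pollution": 9.0, "Health Impact": 6.0, "Age": 20.0}),
    ("g1", {"Pollution": 10.0, "Health Impact": 4.0, "Age": 20.0}),
]

def top_zscore_columns(rows, n):
    """Rank columns by the largest |z-score| of any group mean; keep the top n."""
    columns = list(rows[0][1])
    groups = sorted({group for group, _ in rows})
    scores = {}
    for col in columns:
        all_vals = [record[col] for _, record in rows]
        mu, sigma = mean(all_vals), pstdev(all_vals)
        if sigma == 0:
            scores[col] = 0.0  # constant column: no group can stand out
            continue
        scores[col] = max(
            abs(mean(rec[col] for g, rec in rows if g == group) - mu) / sigma
            for group in groups
        )
    return sorted(columns, key=lambda c: scores[c], reverse=True)[:n]

selected = top_zscore_columns(rows, n=1)
```

In this toy data only "Pollution" has group means that differ from the overall mean, so it is the column selected.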
makePathGraph(col: str, group_number: int, aggregation_func=None, top: bool = True, percentage: float = 0.1, path_labels: bool = False, node_labels: bool = False, k: float = None, seed: int = None, node_size_multiple: int = 10)[source]

Make a shortest-path graph by identifying sink (target) nodes and visualizing distance to them

Parameters:
  • col (str) – Column to color nodes by - from the raw data

  • group_number (int) – graph connected component number to subset the visualization to. For example, show only component 1 rather than the entire graph

  • aggregation_func (np.<function>, defaults to calculating mean) –

    method by which to aggregate values within a node for coloring
    • color node by the median value or by the sum of values, for example

    supports all numpy aggregation functions such as np.mean, np.median, np.sum, etc

  • top (bool, default True) – whether to select the top n percentage or the bottom n percentage of nodes as target/sink nodes. NOTE: corresponds to the percentage param

  • percentage (float, default 0.1) – the n-th percentage of nodes to select as sinks/targets. NOTE: corresponds to the top param

  • labels (bool, default False) – add text labeling target nodes and which sink is closest to non-targets

  • k (float, default 0.12) – value from 0 to 1 that determines the optimal distance between nodes when setting nx.spring_layout positions

  • seed (int, default 12) – random state for deterministic node layouts, defaulted so graph representations are reproducible; used when setting nx.spring_layout positions

  • node_size_multiple (int, default 10) – change the node sizing

  • path_labels (bool, default False) – add labels to the nodes indicating target nodes, and the target that each node is closest to.

  • node_labels (bool, default False) – add labels to the nodes, showing their node IDs for getting node-level data.
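The interplay of top, percentage, and the shortest-path labeling can be illustrated with a small standalone sketch. It assumes unweighted edges and hop-count distance, which may differ from how Thema measures paths; the edges and values data and the select_targets and nearest_target helpers are hypothetical.

```python
from collections import deque

# Hypothetical component: adjacency list plus one aggregated value per node.
edges = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c", "e"],
    "e": ["d"],
}
values = {"a": 9.0, "b": 2.0, "c": 1.0, "d": 3.0, "e": 8.0}

def select_targets(values, top=True, percentage=0.1):
    """Pick the top (or bottom) share of nodes by value; always at least one."""
    n = max(1, int(len(values) * percentage))
    ranked = sorted(values, key=values.get, reverse=top)
    return set(ranked[:n])

def nearest_target(edges, targets):
    """Multi-source BFS: map each reachable node to (closest target, hops)."""
    result = {t: (t, 0) for t in targets}
    queue = deque(sorted(targets))
    while queue:
        node = queue.popleft()
        target, dist = result[node]
        for neighbor in edges[node]:
            if neighbor not in result:
                result[neighbor] = (target, dist + 1)
                queue.append(neighbor)
    return result

targets = select_targets(values, top=True, percentage=0.4)
paths = nearest_target(edges, targets)
```

With top=True and percentage=0.4, the two highest-valued of the five nodes become targets, and every other node is labeled with the target it reaches in the fewest hops.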

makeSankey(score_function, dropUnclustered: bool = True, title_text: str = None)[source]

Creates a Sankey Diagram based on the score function.

Parameters:

score_function (function, pd.DataFrame -> List) – score_function must take in a dataframe and return a categorical classification of its elements.

Example

Assuming the data has "height" and "age" columns, one could define a score function as follows:

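```
def my_score_function(df):
    # Weighted combination of the two columns, one score per row
    scores = 0.5 * df['height'] + 2 * df['age']
    # Map each score to a categorical label
    labels = ['high' if score > 20 else 'low' for score in scores]
    return labels
```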
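Before passing a score function to makeSankey, it can help to run it on a small dataframe and confirm that it returns one label per row. The sketch below does that with made-up data; the df contents are hypothetical and the check itself is a suggestion, not something makeSankey is documented to require.

```python
import pandas as pd

def my_score_function(df):
    # Weighted combination of the two columns, one score per row
    scores = 0.5 * df["height"] + 2 * df["age"]
    # Map each score to a categorical label
    return ["high" if score > 20 else "low" for score in scores]

# Hypothetical data, used only to exercise the score function.
df = pd.DataFrame({"height": [10, 30, 4], "age": [2, 9, 8]})
labels = my_score_function(df)

# One categorical label per row of the input dataframe
assert len(labels) == len(df)
```

Following the signature above, the function would then be passed as tel.makeSankey(my_score_function).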

property pos

Get the position of the telescope.

Returns:

The position of the telescope.

Return type:

numpy.ndarray

Notes

This member variable ensures that graph layouts are held constant when viewing graphs, groups/components, and path graphs. It is updated when updating seed and k in the makeGraph() and makePathGraph() functions.