Observatories

The jmapObservatory class is a custom observatory designed specifically for viewing JMAP Stars. It extends the base Observatory class with additional methods and attributes tailored to the graph models produced by JMAP Star, so that items, nodes, and groups (connected components) in those graphs can be looked up, summarized, and compared.

jmapObservatory

thema.probe.observatories.jmapObservatory.initialize()[source]
class thema.probe.observatories.jmapObservatory.jmapObservatory(star_file)[source]

Bases: Observatory

Custom observatory for viewing JMAP Stars.

This class extends the Observatory class and provides additional functionality specific to the graph models outputted by JMAP Star.

Parameters:

star_file: str

The file path to the star file.

Attributes:

_unclustered: list

A list of unclustered items.

_group_lookuptable: dict

A dictionary to aid in group decomposition and item lookup.

_node_lookuptable: dict

A dictionary to aid in group decomposition and item lookup.

_group_directory: dict

A dictionary containing cluster members for each group.

Methods:

get_items_groupID(item_id)

Lookup function for finding an item’s connected component (i.e., group).

get_items_nodeID(item_id)

Lookup function for finding an item’s member node.

get_nodes_members(node_id)

Lookup function to get members of a node.

get_groups_members(group_id)

Lookup function to get items within a group.

get_groups_member_nodes(group_id)

Lookup function to get nodes within a connected component.

get_nodes_groupID(node_id)

Returns the node’s group id.

get_global_stats()

Calculates global mean and standard deviation statistics.

get_nodes_raw_df(node_id)

Returns a subset of the raw dataframe only containing members of the specified node.

get_nodes_clean_df(node_id)

Returns a subset of the clean dataframe only containing members of the specified node.

get_nodes_projections(node_id)

Returns a subset of the projections array only containing members of the specified node.

get_groups_raw_df(group_id)

Returns a subset of the raw dataframe only containing members of the specified group.

get_groups_clean_df(group_id)

Returns a subset of the clean dataframe only containing members of the specified group.

get_groups_projections(group_id)

Returns a subset of the projections array only containing members of the specified group.

compute_node_description(node_id, description_fn=get_minimal_std)

Compute a simple description of each node in the graph.

compute_group_description(group_id, description_fn=get_minimal_std)

Compute a simple description of a policy group.

Example

>>> from thema.probe.observatories import jmapObservatory
>>> star_file = "path/to/star_file"
>>> obs = jmapObservatory(star_file)
>>> obs.get_items_groupID(1)

compute_group_description(group_id: int, description_fn=<function get_minimal_std>)[source]

Compute a simple description of a policy group.

This function creates a density description of the group based on the descriptions of its member nodes, as computed by compute_node_description().

Parameters:

group_id: int

A group’s identifier (-1 to get unclustered group)

description_fn: function

A function to be passed to compute_node_description()

Returns:

A density description of the group.
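As a rough illustration of what such a density description could look like, the sketch below weights each member node's representative column by the node's size and normalises the totals. The helper group_density, the input node_descriptions, and the keys "label" and "size" are hypothetical stand-ins for the output of compute_node_description(); the library's actual computation may differ.

```python
from collections import defaultdict


def group_density(node_descriptions):
    """Weight each node's representative column by node size, then normalise."""
    totals = defaultdict(int)
    for desc in node_descriptions:
        totals[desc["label"]] += desc["size"]
    n_items = sum(totals.values())
    if n_items == 0:
        return {}
    # Share of the group's node memberships described by each column.
    return {label: count / n_items for label, count in totals.items()}


# Hypothetical node descriptions for a single group.
nodes = [
    {"label": "age", "size": 6},
    {"label": "income", "size": 3},
    {"label": "age", "size": 3},
]
density = group_density(nodes)
```

With these made-up inputs, "age" accounts for three quarters of the weight and "income" for the remaining quarter.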

compute_group_identity(group_id: int, eval_fn=<function std_zscore_threshold_filter>, *args, **kwargs)[source]

Computes the most important identifiers of a group as specified by the evaluation function.

Parameters:
  • group_id – A group’s identifier.

  • eval_fn – The function used to score each column in the dataframe. The lowest-scoring columns are chosen to represent the group’s identity.

  • kwargs

    Any keyword arguments that need to be passed to the aliased evaluation function. For example, to pass a parameter std_threshold to the eval function std_zscore_threshold_filter, you could call

    compute_group_identity(id, eval_fn=std_zscore_threshold_filter, std_threshold=0.8)
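The keyword forwarding described above can be sketched in isolation. In the example below, toy_std_filter and compute_identity are hypothetical stand-ins (they are not the library's std_zscore_threshold_filter or compute_group_identity), and columns are plain lists rather than a dataframe; the point is only to show extra keyword arguments reaching the evaluation function and the lowest-scoring columns being kept.

```python
from statistics import pstdev


def toy_std_filter(values, std_threshold=1.0):
    """Hypothetical eval function: 0 if the column's spread is under the threshold, else 1."""
    return 0 if pstdev(values) < std_threshold else 1


def compute_identity(columns, eval_fn=toy_std_filter, *args, **kwargs):
    """Score every column with eval_fn and keep the lowest-scoring ones."""
    # Extra positional and keyword arguments are handed to eval_fn untouched.
    scores = {name: eval_fn(values, *args, **kwargs) for name, values in columns.items()}
    lowest = min(scores.values())
    return [name for name, score in scores.items() if score == lowest]


# Made-up columns for one group.
columns = {"age": [30, 30, 31], "income": [10, 90, 50]}
identity = compute_identity(columns, eval_fn=toy_std_filter, std_threshold=0.8)
```

Here only "age" falls under the threshold, so it alone is returned as the group's identity.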

compute_node_description(node_id: str, description_fn=<function get_minimal_std>)[source]

Compute a simple description of each node in the graph.

This function labels each node based on a description function. The description function is used to select a defining column from the original dataset, which will serve as a representative of the node’s identity. There are a number of ways to do this; by default, this computes the most homogeneous data column for each node.

Parameters:

node_id: str

A node identifier (-1 for unclustered items)

description_fn: function

A function that takes a data frame, mask, and density columns and returns a column.

Returns:

A dictionary containing the representing column label and the number of items in the node.

dataset_zscores_df(n_cols=10)[source]

STUB

define_nodeValueDict(group_number: int, col: str, aggregation_func=None) → dict[source]

Creates a dict in which each node is assigned a value based on an aggregation of the items in that node. Used to create a path graph’s target/sink nodes.

Parameters:
  • group_number (int) – Group/Connected component number

  • col (str) – Column from the clean dataframe

  • aggregation_func (np.<function>, defaults to calculating mean) – Method by which to aggregate values within a node for coloring, for example coloring a node by the median value or by the sum of values. Supports all numpy aggregation functions such as np.mean, np.median, np.sum, etc.

Returns:

results_dict – A dictionary with node IDs as keys and their corresponding numeric value as values

Return type:

dict
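The per-node aggregation can be sketched without the library. The helper node_value_dict and its inputs below are hypothetical; the sketch uses the standard library's statistics functions, but any callable that reduces a sequence to a number (np.mean, np.median, np.sum, and so on) could be passed the same way.

```python
from statistics import mean, median


def node_value_dict(node_members, column_values, aggregation_func=mean):
    """Map each node ID to an aggregate of one column over the node's items."""
    return {
        node_id: aggregation_func([column_values[i] for i in members])
        for node_id, members in node_members.items()
    }


# Made-up graph: two nodes that share item 1.
node_members = {"a": [0, 1], "b": [1, 2, 3]}
column_values = [1.0, 3.0, 5.0, 10.0]

means = node_value_dict(node_members, column_values)
medians = node_value_dict(node_members, column_values, aggregation_func=median)
```

Switching the aggregation function changes the value of node "b" from 6.0 (mean) to 5.0 (median), which is the kind of difference that matters when nodes are colored or ranked by value.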

get_aggregatedGroupDf(aggregation_func=None, clean: bool = True) → DataFrame[source]

Aggregate each group of the DataFrame using a custom aggregation function.

Parameters:

aggregation_func: function

The aggregation function to apply.

Returns:

A DataFrame with the aggregation function applied to each group.

get_global_stats()[source]

Calculates global mean and standard deviation statistics.

Return type:

A dictionary containing statistics on both raw and clean df subsets for each group.

get_group_descriptions(description_fn=<function get_minimal_std>)[source]

Returns a dictionary of group descriptions for each group as specified by the passed description function.

Parameters:

description_fn: function

A function that determines a representative column for each node in a group.

Returns:

A density representing the composition of a group by its nodes’ descriptions.

get_group_identities(eval_fn=<function std_zscore_threshold_filter>, *args, **kwargs)[source]

Returns a dictionary of group identities as specified by compute_group_identity.

Parameters:

eval_fn:

The function used to score each column in the dataframe. The lowest-scoring columns are chosen to represent the group’s identity.

kwargs:

Any keyword arguments that need to be passed to the aliased evaluation function. For example, to pass a parameter std_threshold to the eval function std_zscore_threshold_filter, you could do so with

get_group_identities(eval_fn=std_zscore_threshold_filter, std_threshold=0.8)

get_group_numbers() → list[source]

Returns a list of all group numbers in a jmapStar graph.

get_groups_clean_df(group_id: int)[source]

Returns a subset of the clean dataframe only containing members of the specified group.

Parameters:

group_id (int) – A group’s identifier

Return type:

A pandas data frame.

get_groups_member_nodes(group_id: int)[source]

Lookup function to get nodes within a connected component.

Parameters:

group_id: int

Group number of desired connected component

Returns:

A list of node members for the specified group

get_groups_members(group_id: int)[source]

Lookup function to get items within a group.

Parameters:

group_id: int

Group number of desired connected component

Returns:

A list of the item members for the specified group

get_groups_projections(group_id: int)[source]

Returns a subset of the projections array only containing members of the specified group.

Parameters:

group_id (int) – A group’s identifier

Return type:

An np.array of projections.

get_groups_raw_df(group_id: int)[source]

Returns a subset of the raw dataframe only containing members of the specified group.

Parameters:

group_id (int) – A group’s identifier

Return type:

A pandas data frame.

get_items_groupID(item_id: int)[source]

Lookup function for finding an item’s connected component (i.e., group).

Parameters:

item_id: int

Index of desired look up item from user’s raw data frame

Returns:

A list of group ids that the item is a member of (-1 if unclustered)

get_items_nodeID(item_id: int)[source]

Lookup function for finding an item’s member node.

Parameters:

item_id: int

Index of desired look up item from user’s raw data frame

Returns:

A list of node ids that the item is a member of
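The item lookups above rely on tables that relate items, nodes, and groups. The sketch below shows one way such tables could be derived from node membership; build_lookups and its inputs are hypothetical and are not taken from the library's implementation of _node_lookuptable or _group_lookuptable.

```python
from collections import defaultdict


def build_lookups(node_members, node_group, n_items):
    """Invert node membership into item -> nodes and item -> groups tables."""
    item_nodes = defaultdict(list)
    item_groups = defaultdict(list)
    for node_id, members in node_members.items():
        group = node_group[node_id]
        for item in members:
            item_nodes[item].append(node_id)
            # An item may sit in several nodes of the same group; list the group once.
            if group not in item_groups[item]:
                item_groups[item].append(group)
    # Items that appear in no node are treated as unclustered.
    unclustered = [i for i in range(n_items) if i not in item_nodes]
    return dict(item_nodes), dict(item_groups), unclustered


# Made-up graph: three nodes in two connected components, five items.
node_members = {"a": [0, 1], "b": [1, 2], "c": [4]}
node_group = {"a": 0, "b": 0, "c": 1}
item_nodes, item_groups, unclustered = build_lookups(node_members, node_group, n_items=5)

# Mirror the documented convention of -1 for unclustered items.
group_of_item_3 = item_groups.get(3, -1)
```

Item 1 belongs to two nodes but a single group, while item 3 belongs to no node and is reported as unclustered.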

get_nodes_clean_df(node_id: str)[source]

Returns a subset of the clean dataframe only containing members of the specified node.

Parameters:

node_id (str) – A node’s string identifier

Return type:

A pandas data frame.

get_nodes_groupID(node_id: str)[source]

Returns the node’s group id.

Parameters:

node_id: str

A character ID specifying the node

Returns:

A group ID number.

get_nodes_members(node_id: str)[source]

Lookup function to get members of a node.

Parameters:

node_id: str

String identifier of a node

Returns:

A list of member items

get_nodes_projections(node_id: str)[source]

Returns a subset of the projections array only containing members of the specified node.

Parameters:

node_id (str) – A node’s string identifier

Return type:

An np.array of projections.

get_nodes_raw_df(node_id: str)[source]

Returns a subset of the raw dataframe only containing members of the specified node.

Parameters:

node_id (str) – A node’s string identifier

Return type:

A pandas data frame.

target_matching(target: DataFrame, col_filter: list = None)[source]

Matches a target item to a generated group by calculating the minimum deviation from each group’s mean over the available numeric columns.

Parameters:
  • target (pd.DataFrame) – A data frame containing one row.

  • col_filter – A list of columns to perform the matching on.
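A minimal sketch of the matching idea follows, assuming that "deviation" means the summed absolute difference between the target's values and a group's column means. That choice of metric, the helper match_target, and the plain-dict inputs are assumptions made for illustration; the library operates on data frames and may measure deviation differently.

```python
def match_target(target, group_means, col_filter=None):
    """Return the group whose column means are closest to the target row."""
    cols = list(target) if col_filter is None else col_filter

    def deviation(means):
        # Only compare columns present in both the target and the group means.
        shared = [c for c in cols if c in means and c in target]
        return sum(abs(target[c] - means[c]) for c in shared)

    return min(group_means, key=lambda group: deviation(group_means[group]))


# Made-up per-group means and a single target row.
group_means = {
    0: {"age": 30.0, "income": 50.0},
    1: {"age": 60.0, "income": 45.0},
}
target = {"age": 58.0, "income": 52.0}

best_overall = match_target(target, group_means)
best_by_income = match_target(target, group_means, col_filter=["income"])
```

Restricting the comparison with col_filter can change the result: over both columns the target is closest to group 1, but on income alone it is closest to group 0.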