Observatories
The jmapObservatory class is a custom observatory designed specifically for viewing JMAP Stars. It extends the base Observatory class with additional methods and attributes tailored to the graph models output by JMAP Star. These let you look up items, nodes, and groups (connected components), extract the raw data, clean data, or projections belonging to a node or group, and compute descriptions and identities for nodes and groups.
jmapObservatory
- class thema.probe.observatories.jmapObservatory.jmapObservatory(star_file)
Bases: Observatory
Custom observatory for viewing JMAP Stars.
This class extends the Observatory class and provides additional functionality specific to the graph models output by JMAP Star.
Parameters:
- star_file: str
The file path to the star file.
Attributes:
- _unclustered: list
A list of unclustered items.
- _group_lookuptable: dict
A dictionary to aid in group decomposition and item lookup.
- _node_lookuptable: dict
A dictionary to aid in group decomposition and item lookup.
- _group_directory: dict
A dictionary containing cluster members for each group.
Methods:
- get_items_groupID(item_id)
Lookup function for finding an item’s connected component (i.e., group).
- get_items_nodeID(item_id)
Lookup function for finding an item’s member node.
- get_nodes_members(node_id)
Lookup function to get members of a node.
- get_groups_members(group_id)
Lookup function to get items within a group.
- get_groups_member_nodes(group_id)
Lookup function to get nodes within a connected component.
- get_nodes_groupID(node_id)
Returns the node’s group id.
- get_global_stats()
Calculates global mean and standard deviation statistics.
- get_nodes_raw_df(node_id)
Returns a subset of the raw dataframe only containing members of the specified node.
- get_nodes_clean_df(node_id)
Returns a subset of the clean dataframe only containing members of the specified node.
- get_nodes_projections(node_id)
Returns a subset of the projections array only containing members of the specified node.
- get_groups_raw_df(group_id)
Returns a subset of the raw dataframe only containing members of the specified group.
- get_groups_clean_df(group_id)
Returns a subset of the clean dataframe only containing members of the specified group.
- get_groups_projections(group_id)
Returns a subset of the projections array only containing members of the specified group.
- compute_node_description(node_id, description_fn=get_minimal_std)
Compute a simple description of each node in the graph.
- compute_group_description(group_id, description_fn=get_minimal_std)
Compute a simple description of a policy group.
Example
>>> from thema.probe.observatories import jmapObservatory
>>> star_file = "path/to/star_file"
>>> obs = jmapObservatory(star_file)
>>> obs.get_items_groupID(1)
- compute_group_description(group_id: int, description_fn=<function get_minimal_std>)
Compute a simple description of a policy group.
This function creates a density description based on the descriptions of its member nodes, as computed by compute_node_description().
Parameters:
- group_id: int
A group’s identifier (-1 to get unclustered group)
- description_fn: function
A function to be passed to compute_node_description()
Returns:
A density description of the group.
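One way to picture the group-level density is as the share of the group’s items that fall under each node label. The minimal sketch below assumes, hypothetically, that each node description is a dict with `label` and `size` keys (compute_node_description documents returning a label and an item count, but the exact keys are an assumption). It weights each label by node size. `density_description` is a hypothetical helper, not part of thema.

```python
from collections import Counter


def density_description(node_descriptions):
    """Combine per-node descriptions into a group-level density.

    Each element is assumed (hypothetically) to look like
    {"label": <column name>, "size": <number of items in the node>}.
    Returns {label: fraction of the summed node sizes carrying that label}.
    """
    totals = Counter()
    for desc in node_descriptions:
        totals[desc["label"]] += desc["size"]
    n = sum(totals.values())
    if n == 0:
        return {}  # empty group: nothing to describe
    return {label: count / n for label, count in totals.items()}


nodes = [
    {"label": "income", "size": 6},
    {"label": "age", "size": 2},
    {"label": "income", "size": 2},
]
print(density_description(nodes))  # {'income': 0.8, 'age': 0.2}
```

Because nodes can overlap, an item that belongs to two nodes is counted twice in this sketch. The documentation does not say whether the real method de-duplicates such items.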
- compute_group_identity(group_id: int, eval_fn=<function std_zscore_threshold_filter>, *args, **kwargs)
Computes the most important identifiers of a group, as specified by the evaluation function.
- Parameters:
group_id – A group’s identifier.
eval_fn – The function used to score each column in the dataframe. The lowest-scoring columns are chosen to represent the group’s identity.
kwargs –
Any keyword arguments that need to be passed to the aliased evaluation function. For example, to pass a parameter std_threshold to the evaluation function std_zscore_threshold_filter, call:
compute_group_identity(group_id, eval_fn=std_zscore_threshold_filter, std_threshold=0.8)
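The “lowest score wins” rule can be illustrated with a minimal sketch. It scores every column with an evaluation function and keeps the columns tied for the minimum score. `std_ratio` (within-group standard deviation divided by global standard deviation) is a hypothetical stand-in for std_zscore_threshold_filter, whose actual scoring is not described here. Rows are plain dicts rather than a DataFrame.

```python
import statistics


def std_ratio(group_values, global_values):
    """Hypothetical eval_fn: within-group std relative to the global std.

    Lower scores mean the column is more homogeneous inside the group.
    """
    global_std = statistics.pstdev(global_values)
    if global_std == 0:
        return float("inf")  # a constant column cannot distinguish any group
    return statistics.pstdev(group_values) / global_std


def group_identity(group_rows, all_rows, columns, eval_fn=std_ratio):
    """Score each column with eval_fn and return the lowest-scoring column(s)."""
    scores = {
        col: eval_fn([r[col] for r in group_rows], [r[col] for r in all_rows])
        for col in columns
    }
    best = min(scores.values())
    return [col for col in columns if scores[col] == best]


all_rows = [
    {"a": 1, "b": 10},
    {"a": 1, "b": 20},
    {"a": 5, "b": 30},
    {"a": 9, "b": 40},
]
print(group_identity(all_rows[:2], all_rows, ["a", "b"]))  # ['a']
```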
- compute_node_description(node_id: str, description_fn=<function get_minimal_std>)
Compute a simple description of each node in the graph.
This function labels each node based on a description function. The description function selects a defining column from the original dataset to serve as a representative of the node’s identity. There are many ways to do this; by default, this computes the most homogeneous data column for each node.
Parameters:
- node_id:
A node identifier (-1 for unclustered items)
- description_fn: function
A function that takes a data frame, mask, and density columns and returns a column.
Returns:
A dictionary containing the representing column label and the number of items in the node.
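The default “most homogeneous column” rule can be sketched as picking the column with the smallest standard deviation among a node’s members. The documented description_fn takes a data frame, mask, and density columns. This sketch simplifies that to a list of row dicts plus column names, and `minimal_std_column` is only a hypothetical stand-in for get_minimal_std. Comparing standard deviations across columns is only meaningful when the columns share a scale (for example, standardized clean data), which is assumed here.

```python
import statistics


def minimal_std_column(rows, columns):
    """Hypothetical stand-in for get_minimal_std: return the column whose
    values vary least (lowest population std) across the given rows."""
    return min(columns, key=lambda col: statistics.pstdev([r[col] for r in rows]))


def node_description(member_rows, columns, description_fn=minimal_std_column):
    """Describe a node by its representative column and its item count."""
    return {"label": description_fn(member_rows, columns), "size": len(member_rows)}


# Hypothetical standardized member rows for one node
members = [
    {"age": -0.1, "income": 1.5},
    {"age": 0.0, "income": -0.9},
    {"age": 0.1, "income": 0.2},
]
print(node_description(members, ["age", "income"]))  # {'label': 'age', 'size': 3}
```

Note that `statistics.pstdev` raises `StatisticsError` on an empty node, so a real implementation would need to handle empty members separately.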
- define_nodeValueDict(group_number: int, col: str, aggregation_func=None) → dict
Creates a dict in which each node is assigned a value based on an aggregation of the items in that node; used to create the target/sink nodes of a path graph.
- Parameters:
group_number (int) – Group/Connected component number
col (str) – Column from the clean dataframe
aggregation_func (np.<function>, defaults to calculating the mean) –
Method by which to aggregate values within a node for coloring; for example, color a node by the median or by the sum of its values. Supports all NumPy aggregation functions, such as np.mean, np.median, np.sum, etc.
- Returns:
results_dict – A dictionary with node IDs as keys and their corresponding numeric values as values.
- Return type:
dict
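The per-node aggregation can be sketched with the standard library. In this sketch, `statistics.mean`, `max`, and `sum` stand in for np.mean, np.max, and np.sum. The inputs (a list of node ids, a node-to-items mapping, and an item-to-value mapping for the chosen column) are a hypothetical layout rather than the class’s internal attributes, and `node_value_dict` is a hypothetical helper.

```python
import statistics


def node_value_dict(group_nodes, node_members, values, aggregation_func=None):
    """Assign each node in a group one number aggregated from its members.

    group_nodes:  node ids belonging to the chosen group
    node_members: {node_id: [item indices]}            (hypothetical layout)
    values:       {item index: value of the chosen column}
    """
    if aggregation_func is None:
        aggregation_func = statistics.mean  # mirrors the documented mean default
    return {
        node: aggregation_func([values[i] for i in node_members[node]])
        for node in group_nodes
    }


node_members = {"n0": [0, 1], "n1": [1, 2, 3]}
values = {0: 1.0, 1: 3.0, 2: 5.0, 3: 10.0}
print(node_value_dict(["n0", "n1"], node_members, values))       # {'n0': 2.0, 'n1': 6.0}
print(node_value_dict(["n0", "n1"], node_members, values, max))  # {'n0': 3.0, 'n1': 10.0}
```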
- get_aggregatedGroupDf(aggregation_func=None, clean: bool = True) → DataFrame
Aggregate each group of the DataFrame using a custom aggregation function.
Parameters:
- aggregation_func: function
The aggregation function to apply.
Returns:
A DataFrame with the aggregation function applied to each group.
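A group-wise aggregation can be sketched without pandas. In the sketch below, items are bucketed by group id and the aggregation function is applied per column. The default used by get_aggregatedGroupDf when aggregation_func is None is not documented, so the mean is only an assumption here. The -1 bucket for unclustered items borrows the convention documented for get_items_groupID. `aggregate_groups` is hypothetical.

```python
import statistics
from collections import defaultdict


def aggregate_groups(rows, item_groups, columns, aggregation_func=None):
    """Aggregate rows per group, one value per column.

    rows:        list of dicts, one per item (list index = item id)
    item_groups: {item id: [group ids]}; items missing here count as -1 (unclustered)
    """
    if aggregation_func is None:
        aggregation_func = statistics.mean  # assumed default for this sketch only
    buckets = defaultdict(list)
    for idx, row in enumerate(rows):
        for group in item_groups.get(idx, [-1]):
            buckets[group].append(row)
    return {
        group: {col: aggregation_func([r[col] for r in members]) for col in columns}
        for group, members in buckets.items()
    }


rows = [{"x": 1}, {"x": 3}, {"x": 10}, {"x": 7}]
item_groups = {0: [0], 1: [0], 2: [1]}
print(aggregate_groups(rows, item_groups, ["x"]))
# {0: {'x': 2}, 1: {'x': 10}, -1: {'x': 7}}
```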
- get_global_stats()
Calculates global mean and standard deviation statistics.
Returns:
A dictionary containing statistics on both raw and clean df subsets for each group.
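A minimal sketch of per-group mean and standard deviation statistics on both the raw and the clean data is shown below. The nested dict layout (`"raw"`/`"clean"`, then column, then `"mean"`/`"std"`) is a hypothetical choice, and so is the use of the population standard deviation (`statistics.pstdev`). pandas’ `DataFrame.std` defaults to the sample standard deviation (`ddof=1`), and which convention thema uses is not stated.

```python
import statistics


def column_stats(rows, columns):
    """Mean and population std of each column over the given rows."""
    return {
        col: {
            "mean": statistics.mean([r[col] for r in rows]),
            "std": statistics.pstdev([r[col] for r in rows]),
        }
        for col in columns
    }


def global_stats(raw_groups, clean_groups, columns):
    """raw_groups / clean_groups: {group id: list of row dicts} (hypothetical layout)."""
    return {
        group: {
            "raw": column_stats(raw_groups[group], columns),
            "clean": column_stats(clean_groups[group], columns),
        }
        for group in raw_groups
    }


raw = {0: [{"x": 2}, {"x": 4}]}
clean = {0: [{"x": -1.0}, {"x": 1.0}]}
stats = global_stats(raw, clean, ["x"])
print(stats[0]["raw"]["x"])
```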
- get_group_descriptions(description_fn=<function get_minimal_std>)
Returns a dictionary of group descriptions for each group, as specified by the passed description function.
Parameters:
- description_fn: function
A function that determines a representative column for each node in a group.
Returns:
A density representing the composition of a group by its nodes’ descriptions.
- get_group_identities(eval_fn=<function std_zscore_threshold_filter>, *args, **kwargs)
Returns a dictionary of group identities, as specified by compute_group_identity().
Parameters:
- eval_fn:
The function used to score each column in the dataframe. The lowest-scoring columns are chosen to represent the group’s identity.
- kwargs:
Any keyword arguments that need to be passed to the aliased evaluation function. For example, to pass a parameter std_threshold to the evaluation function std_zscore_threshold_filter, call:
get_group_identities(eval_fn=std_zscore_threshold_filter, std_threshold=0.8)
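The sketch below shows how a keyword argument such as std_threshold=0.8 can be forwarded untouched through a per-group loop to the evaluation function. `threshold_score` is a hypothetical evaluation function: it scores a column by its group-to-global standard deviation ratio and disqualifies columns above the threshold. The real std_zscore_threshold_filter may score columns differently. The std dictionaries are hypothetical inputs.

```python
def threshold_score(group_std, global_std, std_threshold=1.0):
    """Hypothetical eval_fn: group std relative to global std; columns whose
    ratio exceeds std_threshold get +inf so they can never be selected."""
    ratio = group_std / global_std if global_std else float("inf")
    return ratio if ratio <= std_threshold else float("inf")


def group_identity(stds, global_stds, eval_fn, **kwargs):
    """Return the lowest-scoring column(s) for one group, or [] if none qualify."""
    scores = {col: eval_fn(stds[col], global_stds[col], **kwargs) for col in stds}
    best = min(scores.values())
    if best == float("inf"):
        return []
    return [col for col, score in scores.items() if score == best]


def group_identities(group_stds, global_stds, eval_fn=threshold_score, **kwargs):
    """Apply group_identity to every group, forwarding kwargs to eval_fn."""
    return {
        group: group_identity(stds, global_stds, eval_fn, **kwargs)
        for group, stds in group_stds.items()
    }


global_stds = {"a": 2.0, "b": 4.0}
group_stds = {0: {"a": 1.0, "b": 3.6}, 1: {"a": 1.8, "b": 3.8}}
print(group_identities(group_stds, global_stds))                     # {0: ['a'], 1: ['a']}
print(group_identities(group_stds, global_stds, std_threshold=0.8))  # {0: ['a'], 1: []}
```

Tightening the threshold leaves group 1 with no qualifying column. This is one reason a threshold-style evaluation function can return an empty identity.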
- get_groups_clean_df(group_id: int)
Returns a subset of the clean dataframe only containing members of the specified group.
- Parameters:
group_id (int) – A group’s identifier
- Return type:
A pandas data frame.
- get_groups_member_nodes(group_id: int)
Lookup function to get nodes within a connected component.
Parameters:
- group_id: int
Group number of desired connected component
Returns:
A list of node members for the specified group
- get_groups_members(group_id: int)
Lookup function to get items within a group.
Parameters:
- group_id: int
Group number of desired connected component
Returns:
A list of the item members for the specified group
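The two group lookups can be sketched as a two-step walk from a group to its nodes and from those nodes to their items. This assumes, hypothetically, that the group directory maps a group id to node ids and that a separate mapping lists each node’s items. The documentation does not spell out the layout of _group_directory, and both helpers are hypothetical.

```python
def groups_member_nodes(group_directory, group_id):
    """Node ids that make up a connected component (empty list if unknown)."""
    return list(group_directory.get(group_id, []))


def groups_members(group_directory, node_members, group_id):
    """Items in a group: the union of its nodes' members. Nodes may overlap,
    so an item shared by two nodes is listed once."""
    members = set()
    for node in groups_member_nodes(group_directory, group_id):
        members.update(node_members[node])
    return sorted(members)


group_directory = {0: ["n0", "n1"], 1: ["n2"]}             # hypothetical layout
node_members = {"n0": [0, 1, 2], "n1": [2, 3], "n2": [7]}
print(groups_member_nodes(group_directory, 0))           # ['n0', 'n1']
print(groups_members(group_directory, node_members, 0))  # [0, 1, 2, 3]
```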
- get_groups_projections(group_id: int)
Returns a subset of the projections array only containing members of the specified group.
- Parameters:
group_id (int) – A group’s identifier
- Return type:
An np.array of projections.
- get_groups_raw_df(group_id: int)
Returns a subset of the raw dataframe only containing members of the specified group.
- Parameters:
group_id (int) – A group’s identifier
- Return type:
A pandas data frame.
- get_items_groupID(item_id: int)
Lookup function for finding an item’s connected component (i.e., group).
Parameters:
- item_id: int
Index of desired look up item from user’s raw data frame
Returns:
A list of group ids that the item is a member of (-1 if unclustered)
- get_items_nodeID(item_id: int)
Lookup function for finding an item’s member node.
Parameters:
- item_id: int
Index of desired look up item from user’s raw data frame
Returns:
A list of node ids that the item is a member of
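Both item lookups return lists because a JMAP graph can place one item in several overlapping nodes. A minimal sketch builds an inverted index from node membership and a node-to-group mapping, and returns -1 for items that are in no node. The input layout and helper names are hypothetical. The class’s _node_lookuptable and _group_lookuptable presumably serve a similar purpose, but their structure is not documented.

```python
from collections import defaultdict


def build_item_lookups(node_members, node_group):
    """Invert node -> items into item -> nodes and item -> groups.

    node_members: {node_id: [item indices]}
    node_group:   {node_id: group id}   (what get_nodes_groupID would return)
    """
    item_nodes = defaultdict(list)
    item_groups = defaultdict(set)
    for node, items in node_members.items():
        for item in items:
            item_nodes[item].append(node)
            item_groups[item].add(node_group[node])
    return item_nodes, item_groups


def items_node_ids(item_nodes, item_id):
    return list(item_nodes.get(item_id, []))


def items_group_ids(item_groups, item_id):
    groups = item_groups.get(item_id)
    return sorted(groups) if groups else [-1]  # -1 marks an unclustered item


node_members = {"n0": [0, 1], "n1": [1, 2], "n2": [5]}
node_group = {"n0": 0, "n1": 0, "n2": 1}
item_nodes, item_groups = build_item_lookups(node_members, node_group)
print(items_node_ids(item_nodes, 1), items_group_ids(item_groups, 1))  # ['n0', 'n1'] [0]
print(items_node_ids(item_nodes, 4), items_group_ids(item_groups, 4))  # [] [-1]
```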
- get_nodes_clean_df(node_id: str)
Returns a subset of the clean dataframe only containing members of the specified node.
- Parameters:
node_id (str) – A node’s string identifier
- Return type:
A pandas data frame.
- get_nodes_groupID(node_id: str)
Returns the node’s group id.
Parameters:
- node_id: str
A character ID specifying the node
Returns:
A group ID number.
- get_nodes_members(node_id: str)
Lookup function to get members of a node.
Parameters:
- node_id: str
String identifier of a node
Returns:
A list of member items
- get_nodes_projections(node_id: str)
Returns a subset of the projections array only containing members of the specified node.
- Parameters:
node_id (str) – A node’s string identifier
- Return type:
An np.array of projections.
- get_nodes_raw_df(node_id: str)
Returns a subset of the raw dataframe only containing members of the specified node.
- Parameters:
node_id (str) – A node’s string identifier
- Return type:
A pandas data frame.
- target_matching(target: DataFrame, col_filter: list = None)
Matches a target item to a generated group by calculating the minimum deviation from a group’s mean over the available numeric columns.
- Parameters:
target (pd.DataFrame) – A data frame containing one row.
col_filter – A list of columns to perform the matching on.
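The matching rule can be sketched as computing each group’s column means and picking the group whose means lie closest to the target. The documentation says only “minimum deviation”, so measuring it as the sum of absolute differences over numeric columns is an assumption. The target is a plain dict rather than a one-row DataFrame, and `match_target` is a hypothetical helper. Non-numeric target fields are skipped, and col_filter narrows the comparison further.

```python
import statistics


def match_target(target, groups, col_filter=None):
    """Return the id of the group whose column means are closest to target.

    target: {column: value} for a single item
    groups: {group id: list of row dicts}
    Deviation = sum of absolute differences over the numeric target columns
    (optionally restricted to col_filter); this metric is an assumption.
    """
    cols = [
        col for col, val in target.items()
        if isinstance(val, (int, float)) and not isinstance(val, bool)
        and (col_filter is None or col in col_filter)
    ]

    def deviation(rows):
        return sum(abs(target[c] - statistics.mean([r[c] for r in rows])) for c in cols)

    return min(groups, key=lambda g: deviation(groups[g]))


groups = {
    0: [{"x": 1, "y": 10}, {"x": 3, "y": 10}],
    1: [{"x": 9, "y": 0}, {"x": 11, "y": 2}],
}
target = {"x": 10, "y": 9, "name": "item-42"}
print(match_target(target, groups))                    # 1
print(match_target(target, groups, col_filter=["y"]))  # 0
```

Restricting the comparison to `y` flips the match, which shows why col_filter matters when columns are on different scales. Ties go to the first group in iteration order, because `min` returns the first minimal key.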