THEMA: Core
The Core class is a data container used for modeling in THEMA. It manages data in the background and provides access to three versions of the user’s data: raw data, cleaned data, and projected data. Almost every other class in THEMA inherits from Core, which makes it the central data-handling mechanism.
Core Class
- class thema.core.Core(data_path, clean_path, projection_path)
Bases:
object
A data container class for the versions of the data needed when modeling with the Mapper algorithm.
This class points to the locations of three local versions of the user’s data:
- data: raw data pulled directly from a database (e.g. Mongo), downloaded, or collected locally.
- clean: data that has been cleaned via dropping features, scaling, removing NaNs, etc.
- projection: data that has been collapsed using a dimensionality reduction technique (e.g. PCA, UMAP).
- Parameters:
data_path (str) – A path to the raw data file (.csv, .xlsx, or .pkl; relative from root).
clean_path (str) – A path to the clean data pickle file (relative from root).
projection_path (str) – A path to the projected data pickle file (relative from root).
- _data
The path to the raw data file (.csv, .xlsx, or .pkl).
- Type:
str
- _clean
The path to the clean data pickle file.
- Type:
str
- _projection
The path to the projected data pickle file.
- Type:
str
- data
Returns the raw data in your Core.
- clean
Returns the clean data in your Core.
- projection
Returns the projected data in your Core.
Examples
>>> core = Core("/path/to/raw/data.csv", "/path/to/clean.pkl", "/path/to/projection.pkl")
>>> core.data
pd.DataFrame([[1, 2, 3], [4, 5, 6]])
>>> core.clean
pd.DataFrame([[1, 2, 3], [4, 5, 6]])
>>> core.projection
np.array([[1, 2, 3], [4, 5, 6]])
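The class description says Core points to file locations instead of holding the data itself. The following is a minimal sketch of that pattern using only the standard library. `PathBackedContainer` is a hypothetical stand-in, not THEMA's implementation, and treating a missing file as the "not properly initialized" case is an assumption.

```python
import pickle
import tempfile
from pathlib import Path


class PathBackedContainer:
    """Hypothetical stand-in for Core: stores a path, loads on access."""

    def __init__(self, clean_path):
        self._clean = clean_path  # only the path is stored, not the data

    @property
    def clean(self):
        path = Path(self._clean)
        if not path.is_file():
            # Assumption: a missing file counts as "not properly initialized".
            raise ValueError(f"clean data file not found: {path}")
        with path.open("rb") as f:
            return pickle.load(f)


with tempfile.TemporaryDirectory() as tmp:
    clean_file = Path(tmp) / "clean.pkl"
    clean_file.write_bytes(pickle.dumps([[1, 2, 3], [4, 5, 6]]))

    container = PathBackedContainer(str(clean_file))
    loaded = container.clean  # the file is read here, at attribute access

    missing = PathBackedContainer(str(Path(tmp) / "absent.pkl"))
    try:
        missing.clean
        missing_error = None
    except ValueError as exc:
        missing_error = str(exc)
```

Because the object keeps only a path, constructing it is cheap, and any failure surfaces when the attribute is first read.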
- property clean
Returns the clean data from your Core.
This property reads the clean data .pkl file and returns a pandas DataFrame.
- Returns:
The clean data from your Core.
- Return type:
pd.DataFrame
- Raises:
ValueError – If the clean data file was not properly initialized.
Examples
>>> core = Core("/path/to/raw/data.csv", "/path/to/clean.pkl", "/path/to/projection.pkl")
>>> core.clean
pd.DataFrame([[1, 2, 3], [4, 5, 6]])
- property data
Returns the raw data in your Core.
Handles .csv, .xlsx, and .pkl file types, and returns a pandas DataFrame.
- Parameters:
None
- Returns:
The raw data in your Core.
- Return type:
pd.DataFrame
- Raises:
ValueError – If the raw data file was not properly initialized.
Examples
>>> core = Core("/path/to/raw/data.csv", "/path/to/clean.pkl", "/path/to/projection.pkl")
>>> core.data
pd.DataFrame([[1, 2, 3], [4, 5, 6]])
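Since the data property accepts three file types, some form of dispatch on the file extension is needed. One way to sketch it is a lookup table from suffix to the name of the matching pandas reader. `READERS` and `pick_reader` are hypothetical names, and rejecting unsupported extensions with ValueError is an assumption; the source does not say how THEMA selects a reader.

```python
from pathlib import Path

# Suffix -> name of the pandas reader function that handles it.
READERS = {
    ".csv": "read_csv",
    ".xlsx": "read_excel",
    ".pkl": "read_pickle",
}


def pick_reader(path):
    """Return the pandas reader name for path, based on its extension."""
    suffix = Path(path).suffix.lower()
    try:
        return READERS[suffix]
    except KeyError:
        # Assumption: unsupported extensions are rejected with ValueError.
        raise ValueError(f"unsupported file type: {suffix!r}") from None


csv_reader = pick_reader("/path/to/raw/data.csv")
xlsx_reader = pick_reader("/path/to/raw/data.XLSX")  # suffix match ignores case
pkl_reader = pick_reader("/path/to/raw/data.pkl")
```

With pandas imported as `pd`, `getattr(pd, pick_reader(path))(path)` would then load the file. Keeping the mapping in a dictionary means a new file type needs one new entry and no new branch.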
- get_clean_path()
Get path to the associated clean file.
- Returns:
Path to clean data.
- Return type:
str
Examples
>>> core = Core("/path/to/raw/data.csv", "/path/to/clean.pkl", "/path/to/projection.pkl")
>>> core.get_clean_path()
"/path/to/clean.pkl"
- get_data_path()
Get path to the user’s raw data file.
- Returns:
Path to raw data.
- Return type:
str
Examples
>>> core = Core("/path/to/raw/data.csv", "/path/to/clean.pkl", "/path/to/projection.pkl")
>>> core.get_data_path()
"/path/to/raw/data.csv"
- get_projection_path()
Get path to the associated projection file.
- Returns:
Path to projection data.
- Return type:
str
Examples
>>> core = Core("/path/to/raw/data.csv", "/path/to/clean.pkl", "/path/to/projection.pkl")
>>> core.get_projection_path()
"/path/to/projection.pkl"
- property projection
Returns the projected data from your Core.
This property reads the projection data .pkl file and returns a numpy array.
- Returns:
The projected data from your Core.
- Return type:
np.ndarray
- Raises:
ValueError – If the projection data file was not properly initialized.
Examples
>>> core = Core("/path/to/raw/data.csv", "/path/to/clean.pkl", "/path/to/projection.pkl")
>>> core.projection
np.array([[1, 2, 3], [4, 5, 6]])
- set_clean_path(path)
Sets the clean data path to a new file.
- Parameters:
path (str) – Path to new clean data file.
Examples
>>> core = Core("/path/to/raw/data.csv", "/path/to/clean.pkl", "/path/to/projection.pkl")
>>> core.clean
pd.DataFrame([[1, 2, 3], [4, 5, 6]])
>>> core.set_clean_path("/path/to/new/clean.pkl")
>>> core.get_clean_path()
"/path/to/new/clean.pkl"
>>> core.clean
pd.DataFrame([[7, 8, 9], [10, 11, 12]])
- set_data_path(path)
Sets the data path to a new raw file.
- Parameters:
path (str) – Path to new raw data file.
Examples
>>> core = Core("/path/to/raw/data.csv", "/path/to/clean.pkl", "/path/to/projection.pkl")
>>> core.data
pd.DataFrame([[1, 2, 3], [4, 5, 6]])
>>> core.set_data_path("/path/to/new/data.csv")
>>> core.get_data_path()
"/path/to/new/data.csv"
>>> core.data
pd.DataFrame([[7, 8, 9], [10, 11, 12]])
- set_projection_path(path)
Sets the projection data path to a new file.
- Parameters:
path (str) – Path to new projection data file.
Examples
>>> core = Core("/path/to/raw/data.csv", "/path/to/clean.pkl", "/path/to/projection.pkl")
>>> core.projection
np.array([[1, 2, 3], [4, 5, 6]])
>>> core.set_projection_path("/path/to/new/projection.pkl")
>>> core.get_projection_path()
"/path/to/new/projection.pkl"
>>> core.projection
np.array([[7, 8, 9], [10, 11, 12]])
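The setter examples show a property returning different data once its path has been changed. A small standard-library sketch of that behavior follows. `RepointableContainer` is a hypothetical stand-in for Core, plain lists stand in for numpy arrays, and re-reading the file on every access (no caching) is an assumption, since the source does not describe how THEMA handles this internally.

```python
import pickle
import tempfile
from pathlib import Path


class RepointableContainer:
    """Hypothetical stand-in for Core's projection path handling."""

    def __init__(self, projection_path):
        self._projection = projection_path

    def get_projection_path(self):
        return self._projection

    def set_projection_path(self, path):
        self._projection = path

    @property
    def projection(self):
        # Assumption: no caching, so each access re-reads the current path.
        with open(self._projection, "rb") as f:
            return pickle.load(f)


with tempfile.TemporaryDirectory() as tmp:
    old_file = Path(tmp) / "projection.pkl"
    new_file = Path(tmp) / "new_projection.pkl"
    old_file.write_bytes(pickle.dumps([[1, 2, 3], [4, 5, 6]]))
    new_file.write_bytes(pickle.dumps([[7, 8, 9], [10, 11, 12]]))

    container = RepointableContainer(str(old_file))
    before = container.projection
    container.set_projection_path(str(new_file))
    after = container.projection  # read from the new path
    current_path = container.get_projection_path()
    expected_path = str(new_file)
```

If the loaded data were cached instead, the setter would also need to invalidate the cache for the second read to reflect the new file.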