THEMA: Core

The Core class is the data container that THEMA uses for modeling. It manages data in the background so users don’t have to. It provides access to three versions of the user’s data: raw data, cleaned data, and projected data. Almost every class in THEMA inherits from Core, which makes it the library’s central data-handling mechanism.


Core Class

class thema.core.Core(data_path, clean_path, projection_path)

Bases: object

A data container class for the various versions of the data needed when modeling with the Mapper algorithm.

This class points to the locations of three local versions of the user’s data:

  1. data: raw data pulled directly from a database (e.g. Mongo), downloaded, or collected locally.

  2. clean: data that has been cleaned by dropping features, scaling, removing NaNs, etc.

  3. projection: data that has been collapsed using a dimensionality reduction technique (e.g. PCA, UMAP).
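To make the difference between the raw and clean versions concrete, here is a minimal sketch of one possible cleaning step with pandas: it drops a feature, removes rows containing NaNs, and standardizes the remaining columns before pickling the result. The column names, the z-score scaling, and the file name are assumptions for illustration, not THEMA’s own cleaning pipeline.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical raw data: an identifier column plus two numeric features,
# with one missing value.
raw = pd.DataFrame(
    {
        "id": ["a", "b", "c", "d"],
        "x": [1.0, 2.0, 3.0, 4.0],
        "y": [np.nan, 10.0, 20.0, 30.0],
    }
)

# Drop a feature that should not be modeled, then drop rows that contain NaNs.
clean = raw.drop(columns=["id"]).dropna()

# Scale each remaining column to zero mean and unit variance (z-scores).
clean = (clean - clean.mean()) / clean.std(ddof=0)

# Pickle the result; this file's path is what would be passed as clean_path.
out_dir = tempfile.mkdtemp()
clean_path = os.path.join(out_dir, "clean.pkl")
clean.to_pickle(clean_path)
```

Any other cleaning choices (imputation instead of dropping rows, min-max scaling, one-hot encoding) would produce a clean file in the same way.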

Parameters:
  • data_path (str) – A path to the raw data file (.csv, .xlsx, or .pkl), relative to root.

  • clean_path (str) – A path to the clean data pickle file, relative to root.

  • projection_path (str) – A path to the projected data pickle file, relative to root.

_data

The path to the raw data file (.csv, .xlsx, or .pkl).

Type:

str

_clean

The path to the clean data pickle file.

Type:

str

_projection

The path to the projected data pickle file.

Type:

str

property data

Returns the raw data in your Core.

property clean

Returns the clean data in your Core.

property projection

Returns the projected data in your Core.

get_data_path()

Returns the path to the raw data file.

get_clean_path()

Returns the path to the clean data file.

get_projection_path()

Returns the path to the projection data file.

set_data_path(path)

Sets the raw data path to a new data file.

set_clean_path(path)

Sets the clean data path to a new data file.

set_projection_path(path)

Sets the projection data path to a new data file.
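One plausible shape for a container like this: the object stores only three paths, the getters and setters read and replace them, and each data property loads its file when accessed. The sketch below shows that pattern for the clean path only. `MiniCore` is a hypothetical name, and re-reading the file on every access is an assumption inferred from the set_clean_path example further down, not a description of THEMA’s internals. The data and projection accessors would follow the same pattern.

```python
import os
import tempfile
from typing import Optional

import pandas as pd


class MiniCore:
    """Hypothetical stand-in illustrating a path-based container; not thema.core.Core."""

    def __init__(self, data_path: Optional[str], clean_path: Optional[str],
                 projection_path: Optional[str]) -> None:
        # Only the three paths are stored; no file is read at construction time.
        self._data = data_path
        self._clean = clean_path
        self._projection = projection_path

    def get_clean_path(self) -> Optional[str]:
        return self._clean

    def set_clean_path(self, path: str) -> None:
        self._clean = path

    @property
    def clean(self) -> pd.DataFrame:
        # Read the pickle on every access, so a path set via set_clean_path
        # takes effect the next time .clean is used.
        if not self._clean:
            raise ValueError("clean data path was not initialized")
        return pd.read_pickle(self._clean)


# Usage: two clean pickles, swapped through the setter.
tmp_dir = tempfile.mkdtemp()
first_path = os.path.join(tmp_dir, "clean.pkl")
second_path = os.path.join(tmp_dir, "new_clean.pkl")
pd.DataFrame([[1, 2, 3], [4, 5, 6]]).to_pickle(first_path)
pd.DataFrame([[7, 8, 9], [10, 11, 12]]).to_pickle(second_path)

core = MiniCore(None, first_path, None)
before = core.clean
core.set_clean_path(second_path)
after = core.clean
```

Storing paths rather than loaded data keeps the object cheap to create and pass around; the trade-off is that every property access pays the cost of reading the file.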

Examples

>>> core = Core("/path/to/raw/data.csv", "/path/to/clean.pkl", "/path/to/projection.pkl")
>>> core.data
   0  1  2
0  1  2  3
1  4  5  6
>>> core.clean
   0  1  2
0  1  2  3
1  4  5  6
>>> core.projection
array([[1, 2, 3],
       [4, 5, 6]])

property clean

Returns the clean data from your Core.

This property reads the clean data from a .pkl file and returns it as a pandas DataFrame.

Returns:

The clean data from your Core.

Return type:

pd.DataFrame

Raises:

ValueError – If the clean data file was not properly initialized.

Examples

>>> core = Core("/path/to/raw/data.csv", "/path/to/clean.pkl", "/path/to/projection.pkl")
>>> core.clean
   0  1  2
0  1  2  3
1  4  5  6

property data

Returns the raw data in your Core.

Handles .csv, .xlsx, and .pkl file types and returns a pandas DataFrame.

Parameters:

None

Returns:

The raw data in your Core.

Return type:

pd.DataFrame

Raises:

ValueError – If the raw data file was not properly initialized.
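Since this property accepts three file types, it has to pick a reader per file. A minimal sketch of extension-based dispatch with pandas follows. `load_raw`, the reader table, and the error messages are hypothetical and are not THEMA’s code; reading .xlsx additionally requires an Excel engine such as openpyxl.

```python
import os
import tempfile

import pandas as pd

# Hypothetical extension-to-reader table; all three readers exist in pandas.
_READERS = {
    ".csv": pd.read_csv,
    ".xlsx": pd.read_excel,  # needs an Excel engine such as openpyxl installed
    ".pkl": pd.read_pickle,
}


def load_raw(path):
    """Read a raw data file with the pandas reader matching its extension."""
    if not path:
        raise ValueError("raw data path was not initialized")
    ext = os.path.splitext(path)[1].lower()
    reader = _READERS.get(ext)
    if reader is None:
        raise ValueError(f"unsupported raw data file type: {ext!r}")
    return reader(path)


# Usage: write a small CSV and read it back.
tmp_dir = tempfile.mkdtemp()
csv_path = os.path.join(tmp_dir, "data.csv")
pd.DataFrame({"a": [1, 4], "b": [2, 5], "c": [3, 6]}).to_csv(csv_path, index=False)
df = load_raw(csv_path)
```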

Examples

>>> core = Core("/path/to/raw/data.csv", "/path/to/clean.pkl", "/path/to/projection.pkl")
>>> core.data
   0  1  2
0  1  2  3
1  4  5  6

get_clean_path()

Get path to the associated clean file.

Returns:

Path to clean data.

Return type:

str

Examples

>>> core = Core("/path/to/raw/data.csv", "/path/to/clean.pkl", "/path/to/projection.pkl")
>>> core.get_clean_path()
'/path/to/clean.pkl'

get_data_path()

Get path to the user’s raw data file.

Returns:

Path to raw data.

Return type:

str

Examples

>>> core = Core("/path/to/raw/data.csv", "/path/to/clean.pkl", "/path/to/projection.pkl")
>>> core.get_data_path()
'/path/to/raw/data.csv'

get_projection_path()

Get path to the associated projection file.

Returns:

Path to projection data.

Return type:

str

Examples

>>> core = Core("/path/to/raw/data.csv", "/path/to/clean.pkl", "/path/to/projection.pkl")
>>> core.get_projection_path()
'/path/to/projection.pkl'

property projection

Returns the projected data from your Core.

This property reads the projected data from a .pkl file and returns it as a numpy array.

Returns:

The projected data from your Core.

Return type:

np.ndarray

Raises:

ValueError – If the projection data file was not properly initialized.
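Because this property returns a NumPy array, the projection file is expected to hold one. As one way to produce such a file, the sketch below computes a plain PCA with NumPy’s SVD and pickles the result. The random input, the helper name `pca_project`, the two-component choice, and the file name are assumptions for illustration; PCA is only one of the reducers mentioned for this version of the data (UMAP is another).

```python
import os
import pickle
import tempfile

import numpy as np
import pandas as pd


def pca_project(clean: pd.DataFrame, n_components: int = 2) -> np.ndarray:
    """Project a clean table onto its top principal components (minimal PCA sketch)."""
    X = clean.to_numpy(dtype=float)
    X = X - X.mean(axis=0)  # center each feature
    # Rows of Vt are the principal axes, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return X @ Vt[:n_components].T


# Stand-in clean data: 20 rows, 5 features.
clean = pd.DataFrame(np.random.default_rng(0).normal(size=(20, 5)))
projection = pca_project(clean, n_components=2)

# Pickle the array; this file's path is what would be passed as projection_path.
out_dir = tempfile.mkdtemp()
projection_path = os.path.join(out_dir, "projection.pkl")
with open(projection_path, "wb") as f:
    pickle.dump(projection, f)
```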

Examples

>>> core = Core("/path/to/raw/data.csv", "/path/to/clean.pkl", "/path/to/projection.pkl")
>>> core.projection
array([[1, 2, 3],
       [4, 5, 6]])

set_clean_path(path)

Sets the clean data path to a new file.

Parameters:

path (str) – Path to new clean data file.

Examples

>>> core = Core("/path/to/raw/data.csv", "/path/to/clean.pkl", "/path/to/projection.pkl")
>>> core.clean
   0  1  2
0  1  2  3
1  4  5  6
>>> core.set_clean_path("/path/to/new/clean.pkl")
>>> core.get_clean_path()
'/path/to/new/clean.pkl'
>>> core.clean
    0   1   2
0   7   8   9
1  10  11  12

set_data_path(path)

Sets the raw data path to a new file.

Parameters:

path (str) – Path to new raw data file.

Examples

>>> core = Core("/path/to/raw/data.csv", "/path/to/clean.pkl", "/path/to/projection.pkl")
>>> core.data
   0  1  2
0  1  2  3
1  4  5  6
>>> core.set_data_path("/path/to/new/data.csv")
>>> core.get_data_path()
'/path/to/new/data.csv'
>>> core.data
    0   1   2
0   7   8   9
1  10  11  12

set_projection_path(path)

Sets the projection data path to a new file.

Parameters:

path (str) – Path to new projection data file.

Examples

>>> core = Core("/path/to/raw/data.csv", "/path/to/clean.pkl", "/path/to/projection.pkl")
>>> core.projection
array([[1, 2, 3],
       [4, 5, 6]])
>>> core.set_projection_path("/path/to/new/projection.pkl")
>>> core.get_projection_path()
'/path/to/new/projection.pkl'
>>> core.projection
array([[ 7,  8,  9],
       [10, 11, 12]])