Outer System
The thema.multiverse.system.outer module handles dimension reduction.
- Comet is the abstract base for projectors.
- Oort runs a grid over projectors and writes results to disk.
Comet Base Class
- class thema.multiverse.system.outer.comet.Comet(data_path: str, clean_path: str)
Bases: Core
Collapse or Modify Existing Tabulars
A COMET is a base class template for projection (dimensionality reduction) algorithms. As a parent class, Comet enforces structure on data management and projection, enabling a ‘universal’ procedure for generating these objects.
Members
- data : pd.DataFrame
A pandas dataframe of raw data.
- clean : pd.DataFrame
A pandas dataframe of complete, encoded, and scaled data.
Functions
- save()
Saves Comet to a .pkl serialized object file.
See also
docs – See for more information on implementing a realization of Comet.
Examples
>>> from thema.multiverse.system.outer import Comet
>>> class PCA(Comet):
...     def fit(self):
...         pass
>>> pca = PCA(data_path='data.csv', clean_path='clean.csv')
>>> pca.fit()
- abstract fit()
Abstract method to be implemented by Comet’s child.
Notes
Method must initialize the projectionArray member.
- Raises:
NotImplementedError – If the method is not implemented by the child class.
- save(file_path)
Save the current object instance to a file using pickle serialization.
- Parameters:
file_path (str) – The path to the file where the object will be saved.
- Raises:
Exception – If the file cannot be saved.
Examples
>>> from thema.multiverse.system.outer import Comet
>>> class PCA(Comet):
...     def fit(self):
...         pass
>>> pca = PCA(data_path='data.csv', clean_path='clean.csv')
>>> pca.fit()
>>> pca.save('pca.pkl')
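The contract described above (an abstract fit() that must populate projectionArray, plus a pickle-based save()) can be sketched without thema installed. CometSketch and FirstColumnProj below are hypothetical stand-ins, not thema classes; the real Comet also loads data and clean from the given paths, which this sketch replaces with an in-memory rows list.

```python
import abc
import os
import pickle
import tempfile


class CometSketch(abc.ABC):
    """Hypothetical stand-in for Comet: stores paths and defers fit() to children."""

    def __init__(self, data_path: str, clean_path: str):
        self.data_path = data_path
        self.clean_path = clean_path
        self.projectionArray = None  # children must populate this in fit()

    @abc.abstractmethod
    def fit(self):
        raise NotImplementedError("fit() must be implemented by the child class")

    def save(self, file_path: str) -> None:
        # Serialize the whole instance, including projectionArray, with pickle.
        with open(file_path, "wb") as f:
            pickle.dump(self, f)


class FirstColumnProj(CometSketch):
    """Hypothetical projector that keeps only the first column of each row."""

    def __init__(self, data_path: str, clean_path: str, rows):
        super().__init__(data_path, clean_path)
        self.rows = rows  # stands in for the clean DataFrame the real class loads

    def fit(self):
        self.projectionArray = [[row[0]] for row in self.rows]


proj = FirstColumnProj("data.csv", "clean.csv", rows=[[1.0, 2.0], [3.0, 4.0]])
proj.fit()
path = os.path.join(tempfile.mkdtemp(), "proj.pkl")
proj.save(path)
with open(path, "rb") as f:
    restored = pickle.load(f)
print(restored.projectionArray)  # [[1.0], [3.0]]
```

Because fit() is declared with @abc.abstractmethod here, a child that omits it cannot be instantiated at all (Python raises TypeError); the NotImplementedError in the base body only fires if a child explicitly calls super().fit(). Whether thema's Comet behaves identically is an assumption.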
Projectors
Currently supported projectors:
- t-SNE (tsneProj)
- PCA (pcaProj)
t-SNE
- thema.multiverse.system.outer.projectiles.tsneProj.initialize()
Returns the tsneProj class object from the module. This is a general method that allows us to initialize arbitrary projectile objects.
- Returns:
tsneProj – The t-SNE projectile object.
- Return type:
object
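The point of a module-level initialize() is that a caller can load any projectile by module name alone. A minimal sketch of that convention with importlib follows; the fake_* modules are hypothetical in-memory stand-ins registered in sys.modules so the example runs without thema. Against the installed package, the same loader would presumably be called with a name such as "thema.multiverse.system.outer.projectiles.tsneProj"; how Oort itself resolves projectile modules is not documented here.

```python
import importlib
import sys
import types


def register_fake_projectile(module_name: str, cls: type) -> None:
    """Hypothetical helper: install an in-memory module exposing initialize()."""
    module = types.ModuleType(module_name)
    module.initialize = lambda: cls  # returns the class object, like the real modules
    sys.modules[module_name] = module


def load_projectile_class(module_name: str) -> type:
    """Generic loader: works for any module that follows the initialize() convention."""
    module = importlib.import_module(module_name)  # served from sys.modules if present
    return module.initialize()


class FakeTsneProj:  # placeholder for tsneProj
    pass


class FakePcaProj:  # placeholder for pcaProj
    pass


register_fake_projectile("fake_tsneProj", FakeTsneProj)
register_fake_projectile("fake_pcaProj", FakePcaProj)

for name in ("fake_tsneProj", "fake_pcaProj"):
    print(name, "->", load_projectile_class(name).__name__)
```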
- class thema.multiverse.system.outer.projectiles.tsneProj.tsneProj(data_path, clean_path, perplexity, dimensions, seed)
Bases: Comet
t-SNE Projectile Class.
Inherits from Comet.
Projects data into a lower-dimensional space using t-distributed Stochastic Neighbor Embedding (t-SNE). See: https://scikit-learn.org/stable/modules/generated/sklearn.manifold.TSNE.html
Members
- data : pd.DataFrame
A pandas dataframe of raw data.
- clean : pd.DataFrame
A pandas dataframe of complete, encoded, and scaled data.
- projectionArray : np.array
A projection array.
- perplexity : int
The t-SNE perplexity, a configuration parameter related to the number of nearest neighbors considered.
- dimensions : int
Number of dimensions for the embedding.
- seed : int
Seed for randomization.
Functions
- fit()
Fits a t-SNE projection from the given parameters and saves it to projectionArray.
- save()
Saves tsneProj to a .pkl serialized object file.
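The tsneProj members map naturally onto the constructor arguments of sklearn.manifold.TSNE (n_components, perplexity, random_state). The helper below is a hypothetical sketch of that mapping, not tsneProj's actual code; it also checks that perplexity is below the number of samples, which recent scikit-learn versions enforce. It uses only the standard library; with scikit-learn installed, the result could be passed as TSNE(**kwargs).fit_transform(X).

```python
def tsne_kwargs(perplexity: float, dimensions: int, seed: int, n_samples: int) -> dict:
    """Hypothetical mapping from tsneProj members to sklearn.manifold.TSNE arguments."""
    if dimensions < 1:
        raise ValueError("dimensions must be a positive integer")
    # scikit-learn's TSNE rejects perplexity values >= the number of samples.
    if not 0 < perplexity < n_samples:
        raise ValueError(f"perplexity must be in (0, {n_samples}), got {perplexity}")
    return {
        "n_components": dimensions,  # size of the embedding
        "perplexity": perplexity,    # effective neighborhood size
        "random_state": seed,        # makes the embedding reproducible
    }


print(tsne_kwargs(perplexity=5, dimensions=2, seed=42, n_samples=100))
```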
PCA
- thema.multiverse.system.outer.projectiles.pcaProj.initialize()
Returns the pcaProj class object from the module. This is a general method that allows us to initialize arbitrary projectile objects.
- Returns:
pcaProj – The PCA projectile object.
- Return type:
object
- class thema.multiverse.system.outer.projectiles.pcaProj.pcaProj(data_path, clean_path, dimensions, seed)
Bases: Comet
PCA Projectile Class.
Inherits from Comet.
Projects data into a lower-dimensional space using scikit-learn’s PCA. See: https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html
- data
A pandas dataframe of raw data.
- Type:
pd.DataFrame
- clean
A pandas dataframe of complete, encoded, and scaled data.
- Type:
pd.DataFrame
- projectionArray
A projection array.
- Type:
np.array
- dimensions
Number of dimensions for the embedding.
- Type:
int
- seed
Seed for randomization.
- Type:
int
- __init__(self, data_path, clean_path, dimensions, seed)
Constructs a pcaProj instance.
- fit(self)
Performs a PCA projection based on the configuration parameters.
- save(self)
Saves pcaProj to .pkl serialized object file.
- fit()
Performs a PCA projection based on the configuration parameters.
- Returns:
Initializes projectionArray member.
- Return type:
None
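To make "projects data into a lower-dimensional space" concrete, here is a minimal pure-Python PCA: center the data, form the sample covariance matrix, extract the leading eigenvectors by power iteration with deflation, and project onto them. This is a swapped-in technique for illustration only; scikit-learn's PCA, which pcaProj wraps, uses optimized linear-algebra solvers (such as SVD) and is what should be used in practice. The function name pca_project and its signature are hypothetical.

```python
import math
import random


def pca_project(rows, dimensions, seed=0, iters=500):
    """Project rows onto their top principal components (pure-Python sketch)."""
    n, d = len(rows), len(rows[0])
    if n < 2 or not 1 <= dimensions <= d:
        raise ValueError("need at least 2 rows and 1 <= dimensions <= n_features")
    # Center each feature at zero mean.
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    rng = random.Random(seed)  # plays the role of pcaProj's seed
    components = []
    for _ in range(dimensions):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            # Power iteration: apply the covariance matrix and renormalize.
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break  # no variance left along this direction
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue for the unit eigenvector v.
        eigval = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
        components.append(v)
        # Deflate: remove this component so the next pass finds the next one.
        cov = [[cov[i][j] - eigval * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    # Coordinates of each centered row along the extracted components.
    return [[sum(c[j] * comp[j] for j in range(d)) for comp in components]
            for c in centered]


points = [[x, 2.0 * x] for x in range(1, 6)]  # all points lie on the line y = 2x
print(pca_project(points, dimensions=1))
```

Component signs are arbitrary in PCA, so projections may come out negated; power iteration also assumes the requested leading eigenvalues are distinct.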
Oort Class
classDiagram
Core <|-- Comet
Comet <|-- tsneProj
Comet <|-- pcaProj
Core <|-- Oort
class Comet {
<<Abstract>>
+projectionArray
+fit()
+save()
}
class tsneProj {
+perplexity
+dimensions
+seed
+fit()
}
class pcaProj {
+dimensions
+seed
+fit()
}
class Oort {
+params
+cleanDir
+outDir
+fit()
+writeParams_toYaml()
}
Oort o--> Comet : instantiate projectiles
- class thema.multiverse.system.outer.oort.Oort(params=None, data=None, cleanDir=None, outDir=None, YAML_PATH=None, verbose=False)
Bases: Core
The space of COMET objects.
The Oort cloud, sometimes called the Öpik–Oort cloud, is theorized to be a vast cloud of icy planetesimals surrounding the Sun at distances ranging from 2,000 to 200,000 AU.
Our Oort class generates a space of projected representations of an original, high-dimensional dataset. Although it can sometimes be difficult to see through the cloud of projections, our tools allow you to easily navigate this terrain and properly explore your data.
- Parameters:
data (pd.DataFrame) – A pandas DataFrame of raw data.
params (dict, optional) – A parameter dictionary. Default is None.
cleanDir (str, optional) – Path to the clean data directory. Default is None.
outDir (str, optional) – Path to the out data directory. Default is None.
YAML_PATH (str, optional) – Path to the YAML parameter file. Default is None.
- data
A pandas DataFrame of raw data.
- Type:
pd.DataFrame
- params
A parameter dictionary.
- Type:
dict
- cleanDir
Path to the clean data directory.
- Type:
str
- outDir
Path to the out data directory.
- Type:
str
- YAML_PATH
Path to the YAML parameter file.
- Type:
str
- get_data_path() → str
Returns the path to the raw data file.
- fit() → None
Fits projection space.
- save(file_path: str) → None
Saves object as a pickle file.
- getParams() → dict
Returns a dictionary of parameters.
- writeParams_toYaml(YAML_PATH: str) → None
Writes out the specified parameters to a YAML file.
Examples
>>> cleanDir = "<PATH TO MOON OBJECT FILES>"
>>> data = "<PATH TO RAW DATA FILE>"
>>> outDir = "<PATH TO OUT DIRECTORY OF PROJECTIONS>"
>>> params = {
...     "tsne" : {
...         "perplexity" : [2, 5, 10],
...         "dimensions" : [2],
...         "seed" : [42]
...     }
... }
>>> oort = Oort(
...     params=params,
...     data=data,
...     cleanDir=cleanDir,
...     outDir=outDir,
...     YAML_PATH=None
... )
>>> oort.fit()
Note
oort.fit() will produce 3 * len(os.listdir(cleanDir)) files in outDir in this example: three perplexity values × one dimension setting × one seed, for each clean file.
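The file count follows from expanding the parameter grid as a Cartesian product: the tsne entry in the example has 3 × 1 × 1 = 3 combinations, and each combination is applied to every clean file. A sketch of that expansion with itertools.product follows; it assumes Oort enumerates the full product, which matches the "grid over projectors" description at the top of this page but is not spelled out in the API reference. expand_grid and the clean file names are hypothetical.

```python
import itertools


def expand_grid(params):
    """Expand {projector: {param: [values, ...]}} into (projector, config) pairs."""
    configs = []
    for projector, grid in params.items():
        keys = list(grid)
        # One config per element of the Cartesian product of the value lists.
        for values in itertools.product(*(grid[k] for k in keys)):
            configs.append((projector, dict(zip(keys, values))))
    return configs


params = {"tsne": {"perplexity": [2, 5, 10], "dimensions": [2], "seed": [42]}}
configs = expand_grid(params)
clean_files = ["moon_0.pkl", "moon_1.pkl"]  # hypothetical contents of cleanDir
# Every configuration is fitted on every clean file.
jobs = [(clean, projector, cfg) for clean in clean_files for projector, cfg in configs]
print(len(configs), len(jobs))  # 3 6
```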
- fit()
Configure and run your projections.
Uses concurrent.futures.ProcessPoolExecutor to spawn multiple projectile instances in separate processes and fit them.
- Returns:
Saves projections to the specified outDir
- Return type:
None
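A minimal sketch of the fan-out that fit() describes, using concurrent.futures.ProcessPoolExecutor. fit_one and run_jobs are hypothetical; a real worker would construct the projectile, call its fit() and save() into outDir, whereas this one only returns a plausible output file name. The job layout and naming scheme are assumptions.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor, as_completed


def fit_one(job):
    """Hypothetical worker for one (clean_file, projector, config) job."""
    clean_file, projector, config = job
    # A real worker would build the projectile, call fit() and save() into outDir;
    # this stand-in only returns the file name such a worker might write.
    suffix = "_".join(f"{key}{value}" for key, value in sorted(config.items()))
    return f"{projector}_{clean_file}_{suffix}.pkl"


def run_jobs(jobs, executor_cls=ProcessPoolExecutor, max_workers=None):
    """Fan jobs out across workers and collect results as they complete.

    ThreadPoolExecutor exposes the same interface and can be passed as
    executor_cls for quick in-process tests.
    """
    results = []
    with executor_cls(max_workers=max_workers) as executor:
        futures = [executor.submit(fit_one, job) for job in jobs]
        for future in as_completed(futures):
            results.append(future.result())  # re-raises a worker's exception
    return sorted(results)
```

With the spawn or forkserver start methods, worker processes re-import the main module, so call run_jobs() from under an if __name__ == "__main__": guard; the worker must also be a module-level function so it can be pickled.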
Examples
>>> oort = Oort()
>>> oort.fit()
- getParams()
Get the parameters used to initialize the space of Comets in this Oort.
- Returns:
A dictionary containing the parameters used to initialize an Oort instance.
- Return type:
dict
Examples
>>> oort = Oort()
>>> params = oort.getParams()
>>> print(params)
{
    "params": {...},  # dictionary containing the parameters used to initialize the Oort instance
    "data": "/path/to/data",  # path to the data
    "cleanDir": "/path/to/clean",  # path to the clean data directory
    "outDir": "/path/to/output"  # path to the output directory
}
- save(file_path)
Save the current object instance to a file using pickle serialization.
- Parameters:
file_path (str) – The path to the file where the object will be saved.
- Raises:
IOError – If there is an error while saving the object to the file.
Examples
>>> oort = Oort()
>>> oort.save("oort.pkl")  # Save the object to a file named "oort.pkl"
- writeParams_toYaml(YAML_PATH=None)
Write out the specified parameters to a YAML file.
- Parameters:
YAML_PATH (str (filepath), optional) – The path to an existing .yaml type file. If not provided, the value of self.YAML_PATH will be used. If self.YAML_PATH is also None, a ValueError will be raised.
- Returns:
Saves a YAML file to the specified YAML_PATH.
- Return type:
None
- Raises:
ValueError – If YAML_PATH is None and self.YAML_PATH is also None.
TypeError – If the file path specified by YAML_PATH does not point to a YAML file.
Examples
Example usage of writeParams_toYaml:
>>> oort = Oort()
>>> oort.writeParams_toYaml('/path/to/params.yaml')
YAML file successfully updated
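The documented error conditions of writeParams_toYaml can be captured in a small path-resolution step that runs before anything is written. resolve_yaml_path is a hypothetical helper, and treating both .yaml and .yml as YAML extensions is an assumption; the actual serialization (for example with PyYAML's yaml.safe_dump) is left out so the sketch needs only the standard library.

```python
import os


def resolve_yaml_path(yaml_path=None, default_path=None):
    """Pick the target YAML path, mirroring writeParams_toYaml's documented errors."""
    # An explicit argument wins; otherwise fall back to the instance's YAML_PATH.
    path = yaml_path if yaml_path is not None else default_path
    if path is None:
        raise ValueError("YAML_PATH is None and self.YAML_PATH is also None")
    _, ext = os.path.splitext(path)
    if ext.lower() not in (".yaml", ".yml"):  # assumed set of YAML extensions
        raise TypeError(f"{path!r} does not point to a YAML file")
    return path


print(resolve_yaml_path("/path/to/params.yaml"))
```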