Thema
Topological Hyperparameter Evaluation and Mapping Algorithm
Thema explores model space through systematic preprocessing, dimensionality reduction, and graph construction, then selects representative models using topological distances.
Quick Links
- Run the full pipeline in 5 minutes with a minimal YAML configuration.
- Complete guides, from beginner tutorials to advanced customization.
- Full API documentation for the Planet, Oort, and Galaxy classes.
What is Thema?
Thema generates multiple data representations through systematic hyperparameter grids, then identifies representative models using curvature-based graph distances. Instead of guessing at preprocessing choices or embedding parameters, explore the space of possibilities and let topological methods guide selection.
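To see how quickly the space of candidate representations grows, here is a minimal sketch that expands two hyperparameter grids into every combination with `itertools.product`. The grid keys and values are hypothetical and chosen for illustration; they are not Thema's `params.yaml` schema.

```python
from itertools import product

# Hypothetical grids: keys and values are illustrative, not Thema's config schema.
preprocessing_grid = {
    "imputer": ["mean", "median"],
    "scaler": ["standard", "minmax"],
}
embedding_grid = {
    "perplexity": [5, 15, 30],
    "dimensions": [2],
}


def expand(grid: dict) -> list[dict]:
    """Return every combination of the grid's values as a list of dicts."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]


# Every preprocessing setting is paired with every embedding setting.
configs = [
    {**pre, **emb}
    for pre in expand(preprocessing_grid)
    for emb in expand(embedding_grid)
]
print(len(configs))  # 12 (= 2 * 2 * 3 * 1)
print(configs[0])
```

Because the count is a product of grid sizes, adding even one more value to a single parameter multiplies the number of models, which is why a principled selection step matters.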
Pipeline Stages
- Planet: Clean, encode, scale, and impute tabular data → Moon files
- Oort: Generate embeddings (t-SNE, PCA) across parameter grids → Comet files
- Galaxy: Build Mapper graphs, compute distances, cluster, and select representatives → Star files
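The Galaxy stage builds Mapper graphs. As a rough, self-contained illustration of the Mapper idea (not Thema's or Kepler Mapper's implementation), the sketch below covers a 1-D lens with overlapping intervals, clusters the points in each interval with a simple distance-threshold single-linkage rule, and connects clusters that share points. The function names, the `eps` clustering rule, and the default parameters are assumptions made for this example.

```python
import numpy as np


def eps_components(points: np.ndarray, eps: float) -> list[list[int]]:
    """Single-linkage clustering: connected components of the graph that
    joins any two rows whose Euclidean distance is at most eps."""
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    unvisited = set(range(len(points)))
    components = []
    while unvisited:
        stack = [unvisited.pop()]
        component = []
        while stack:
            i = stack.pop()
            component.append(i)
            neighbours = [j for j in unvisited if dists[i, j] <= eps]
            unvisited.difference_update(neighbours)
            stack.extend(neighbours)
        components.append(sorted(component))
    return components


def mapper_graph(X, lens, n_intervals=4, overlap=0.3, eps=0.5):
    """Return (nodes, edges): nodes are frozensets of row indices, edges are
    pairs of node indices whose row sets intersect."""
    lo, hi = float(lens.min()), float(lens.max())
    # Width chosen so that n_intervals overlapping intervals span [lo, hi].
    width = (hi - lo) / (1 + (n_intervals - 1) * (1 - overlap))
    step = width * (1 - overlap)
    nodes = []
    for k in range(n_intervals):
        start = lo + k * step
        end = hi if k == n_intervals - 1 else start + width
        idx = np.flatnonzero((lens >= start) & (lens <= end))
        if idx.size == 0:
            continue
        # Cluster only the rows whose lens value falls in this interval.
        for comp in eps_components(X[idx], eps):
            nodes.append(frozenset(int(idx[c]) for c in comp))
    edges = {
        (a, b)
        for a in range(len(nodes))
        for b in range(a + 1, len(nodes))
        if nodes[a] & nodes[b]
    }
    return nodes, edges


# Usage: 40 points on the unit circle, with the x-coordinate as the lens.
theta = np.linspace(0, 2 * np.pi, 40, endpoint=False)
circle = np.column_stack([np.cos(theta), np.sin(theta)])
nodes, edges = mapper_graph(circle, circle[:, 0])
print(len(nodes), len(edges))  # 6 6: a single loop, matching the circle's shape
```

Changing `n_intervals`, `overlap`, or `eps` can change the resulting graph, which is exactly the kind of sensitivity Thema explores by sweeping graph parameters.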
Typical Workflow
```mermaid
graph LR
    subgraph Input
        A[params.yaml + raw dataset]
    end
    subgraph "Stage 1: Preprocess"
        B["Planet: clean + impute"]
        M1["Moon 1"]
        M2["Moon 2"]
        M3["Moon N"]
    end
    subgraph "Stage 2: Project"
        C["Oort: dimensionality reduction"]
        P1["t-SNE Comet"]
        P2["PCA Comet"]
    end
    subgraph "Stage 3: Graph"
        D["Galaxy: mapper graphs"]
        S1["StarGraph 1"]
        S2["StarGraph 2"]
    end
    subgraph "Stage 4: Select"
        F["Filters"]
        R["Representatives"]
    end
    A --> B
    B --> M1
    B --> M2
    B --> M3
    M1 --> C
    M2 --> C
    M3 --> C
    C --> P1
    C --> P2
    P1 --> D
    P2 --> D
    D --> S1
    D --> S2
    S1 --> F
    S2 --> F
    F --> R
    style Input fill:#f9f9f9,stroke:#999
    style B fill:#D9EDF7,stroke:#31708F,stroke-width:2px
    style C fill:#D9EDF7,stroke:#31708F,stroke-width:2px
    style D fill:#D9EDF7,stroke:#31708F,stroke-width:2px
    style F fill:#D9EDF7,stroke:#31708F,stroke-width:2px
```
Option 1: YAML-Driven (Recommended for most users)
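```python
from thema.thema import Thema

# Load the pipeline configuration from params.yaml
T = Thema("params.yaml")
T.genesis()  # Runs Planet → Oort → Galaxy
# Files of the selected representative models
print(T.selected_model_files)
```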
Option 2: Programmatic
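```python
from thema.multiverse import Planet, Oort, Galaxy

# Stage 1: clean, encode, scale, and impute the raw data (Moon files)
planet = Planet(data="data.pkl", scaler="standard", ...)
planet.fit()

# Stage 2: generate embeddings from the cleaned data (Comet files)
oort = Oort(cleanDir="./clean", params={"tsne": {...}})
oort.fit()

# Stage 3: build Mapper graphs from the projections (Star files)
galaxy = Galaxy(projDir="./projections", params={"jmap": {...}})
galaxy.fit()

# Stage 4: select representative models
representatives = galaxy.collapse()
```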
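Galaxy clusters models by their pairwise distances and then picks representatives. One common way to choose a single representative per cluster is the medoid: the member with the smallest total distance to the rest of its cluster. The sketch below assumes a precomputed distance matrix and cluster labels; `select_medoids` is a hypothetical helper for illustration and is not necessarily what `galaxy.collapse()` does internally.

```python
import numpy as np


def select_medoids(D: np.ndarray, labels: np.ndarray) -> dict[int, int]:
    """For each cluster label, return the index of the member whose summed
    distance to the other members of its cluster is smallest (the medoid)."""
    representatives = {}
    for label in np.unique(labels):
        members = np.flatnonzero(labels == label)
        within = D[np.ix_(members, members)]
        # argmin breaks ties by taking the first member in index order.
        representatives[int(label)] = int(members[within.sum(axis=1).argmin()])
    return representatives


# Toy example: five "models" placed on a line, grouped into two clusters.
positions = np.array([0.0, 1.0, 2.0, 10.0, 11.0])
D = np.abs(positions[:, None] - positions[None, :])
labels = np.array([0, 0, 0, 1, 1])
print(select_medoids(D, labels))  # {0: 1, 1: 3}
```

A medoid is always an actual member of the cluster, so the selected representative is a real model you can load and inspect rather than an averaged artifact.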
Key Features
- Robust Preprocessing
Multiple imputation strategies, encoding options, and scaling methods with reproducible seeds.
- Grid Search Over Embeddings
Systematic exploration of t-SNE perplexities, PCA dimensions, and projection parameters.
- Topological Graph Construction
Kepler Mapper implementation with configurable covers, clustering algorithms, and overlap parameters.
- Curvature-Based Selection
Graph distance metrics using Forman-Ricci and Ollivier-Ricci curvatures for model comparison.
- Flexible Filtering
Built-in filters (coverage, component count, graph size) plus custom filter support.
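To make curvature-based comparison concrete, here is a simplified sketch. For an unweighted graph with no higher-order cells, the combinatorial Forman-Ricci curvature of an edge reduces to F(u, v) = 4 - deg(u) - deg(v). The sketch computes that per edge and then compares two graphs by the 1-D Wasserstein distance between their curvature distributions. The comparison step is a stand-in chosen for illustration; it is an assumption and not necessarily how Thema measures distance, and Ollivier-Ricci curvature is not shown.

```python
import numpy as np


def forman_curvature(edges: list[tuple[int, int]]) -> np.ndarray:
    """Forman-Ricci curvature of each edge in an unweighted, undirected graph
    with no higher-order cells: F(u, v) = 4 - deg(u) - deg(v)."""
    degree: dict[int, int] = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    return np.array([4 - degree[u] - degree[v] for u, v in edges], dtype=float)


def wasserstein_1d(a: np.ndarray, b: np.ndarray) -> float:
    """1-Wasserstein distance between two empirical 1-D distributions,
    computed as the area between their cumulative distribution functions."""
    a, b = np.sort(a), np.sort(b)
    grid = np.sort(np.concatenate([a, b]))
    widths = np.diff(grid)
    cdf_a = np.searchsorted(a, grid[:-1], side="right") / a.size
    cdf_b = np.searchsorted(b, grid[:-1], side="right") / b.size
    return float(np.sum(np.abs(cdf_a - cdf_b) * widths))


cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]  # every edge has curvature 0
path = [(0, 1), (1, 2), (2, 3)]           # curvatures 1, 0, 1
print(wasserstein_1d(forman_curvature(cycle), forman_curvature(path)))  # about 0.667 (2/3)
```

Comparing distributions rather than individual edges lets graphs of different sizes be compared directly, which is convenient when Mapper settings produce graphs with different node and edge counts.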
Installation
```bash
pip install thema
```
For development:
```bash
git clone https://github.com/Krv-Analytics/thema.git
cd thema
uv sync --extra dev --extra docs
```
Thema supports Python 3.10, 3.11, and 3.12.
Logging
Enable detailed logging for debugging:
```python
import thema

thema.enable_logging('DEBUG')  # or 'INFO'
```
Use 'DEBUG' for verbose output or 'INFO' for standard progress messages.
Next Steps
Quickstart - Run your first pipeline
Getting Started - Complete tutorial
Preprocessing - Data cleaning with Planet
Embeddings - Dimensionality reduction with Oort
Graphs & Selection - Mapper graphs with Galaxy
Best Practices - Recommended workflows and troubleshooting