Data Module
The data module provides functions for loading coal plant datasets and network graphs.
Data Loading Functions
Coal Plant Datasets
- retire.data.data.load_dataset()
Load the US coal plants dataset from the package resources.
- Returns:
Complete US coal plants dataset containing plant characteristics, retirement status, contextual vulnerabilities, and associated metadata. Includes columns for plant location, capacity, age, retirement planning, economic factors, and environmental considerations.
- Return type:
- Raises:
FileNotFoundError – If the dataset file does not exist at the specified path.
pd.errors.EmptyDataError – If the CSV file is empty.
pd.errors.ParserError – If the CSV file cannot be parsed.
Examples
>>> from retire.data import load_dataset
>>> df = load_dataset()
>>> print(df.shape)
(914, 45)
>>> print(df.columns[:5].tolist())
['Plant Name', 'ORISPL', 'State', 'County', 'LAT']
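The exceptions listed under Raises can be handled with a thin wrapper. The following is a minimal sketch, not part of `retire`: `try_load` is a hypothetical helper that takes the loader as an argument, so it can be tried without the package installed. It relies on `pd.errors.EmptyDataError` and `pd.errors.ParserError` both deriving from `ValueError` in pandas.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def try_load(loader: Callable[[], T]) -> Optional[T]:
    """Call a loader and return None instead of raising on a bad data file.

    Hypothetical helper for illustration; not part of retire.
    """
    try:
        return loader()
    except FileNotFoundError as exc:
        # The packaged data file is missing, e.g. after a broken install.
        print(f"Dataset file not found: {exc}")
    except ValueError as exc:
        # Covers pd.errors.EmptyDataError and pd.errors.ParserError,
        # which are both subclasses of ValueError.
        print(f"Dataset file could not be read: {exc}")
    return None
```

With the package installed, a call would look like `df = try_load(load_dataset)`, followed by a check for `None` before any further processing.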
- retire.data.data.load_clean_dataset()
Load the cleaned and scaled US coal plant dataset.
This dataset has undergone preprocessing including missing value imputation, feature scaling, and normalization for use in machine learning models and statistical analysis.
- Returns:
Cleaned and scaled coal plant dataset with standardized numerical features and processed categorical variables. All features are normalized to facilitate clustering and similarity analysis.
- Return type:
Examples
>>> from retire.data import load_clean_dataset
>>> clean_df = load_clean_dataset()
>>> print(clean_df.dtypes.value_counts())
float64    42
int64       3
dtype: int64
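The description above does not say which imputation or scaling method the package applies. As a rough illustration of what "imputation followed by standardization" means for a single column, the sketch below assumes median imputation and z-score scaling. Both choices, the helper name `impute_and_standardize`, and the sample values are assumptions made for this example.

```python
from statistics import mean, median, pstdev
from typing import List, Optional


def impute_and_standardize(values: List[Optional[float]]) -> List[float]:
    """Fill missing entries with the median, then z-score the column.

    Illustrative only; not the preprocessing code used by retire.
    """
    observed = [v for v in values if v is not None]
    fill = median(observed)
    filled = [fill if v is None else v for v in values]

    mu = mean(filled)
    sigma = pstdev(filled)  # population standard deviation
    if sigma == 0:
        # A constant column has no spread; map it to zeros.
        return [0.0 for _ in filled]
    return [(v - mu) / sigma for v in filled]


# Made-up capacities in MW, with one missing entry.
capacity_mw = [100.0, None, 300.0, 500.0]
scaled = impute_and_standardize(capacity_mw)
```

After this transformation the column has zero mean and unit variance, which keeps features measured in large units from dominating distance-based methods such as clustering.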
- retire.data.data.load_projection()
Load the projected US coal plant dataset with future scenario modeling.
This dataset contains projections and forecasts for coal plant operations under various policy and economic scenarios, including retirement timing predictions and capacity factor estimates.
- Returns:
Projected coal plant dataset with scenario-based forecasts for retirement timing, capacity utilization, and economic viability under different policy environments.
- Return type:
Examples
>>> from retire.data import load_projection
>>> proj_df = load_projection()
>>> scenario_cols = [col for col in proj_df.columns if 'scenario' in col.lower()]
>>> print(f"Available scenarios: {len(scenario_cols)}")
- retire.data.data.load_generator_level_dataset()
Load the generator-level US coal plants dataset.
Provides detailed information at the individual generator unit level, including technical specifications, operational history, and retirement planning for each coal-fired generating unit in the US fleet.
- Returns:
Generator-level dataset with detailed technical and operational information for individual coal-fired generating units. Includes capacity, age, efficiency metrics, emissions data, and retirement status for each generator.
- Return type:
- Raises:
FileNotFoundError – If the dataset file does not exist at the specified path.
pd.errors.EmptyDataError – If the CSV file is empty.
pd.errors.ParserError – If the CSV file cannot be parsed.
Examples
>>> from retire.data import load_generator_level_dataset
>>> gen_df = load_generator_level_dataset()
>>> print(f"Total generators: {len(gen_df)}")
>>> # Group by plant to see generator counts per plant
>>> gens_per_plant = gen_df.groupby('ORISPL').size()
>>> print(f"Average generators per plant: {gens_per_plant.mean():.1f}")
Graph and Network Data
- retire.data.data.load_graph()
Load the coal plant network graph from package resources.
Constructs a NetworkX graph representing relationships between coal plant clusters based on similarity metrics and contextual factors. Nodes represent plant clusters, and edges represent similarity relationships weighted by various plant characteristics.
- Returns:
Network graph with nodes representing coal plant clusters and edges representing similarity relationships.
Node attributes include:
- membership: list of plant indices belonging to the cluster
- cluster_id: unique identifier for the cluster
Edge attributes include:
- weight: similarity strength between clusters
- Return type:
- Raises:
FileNotFoundError – If the graph node or edge CSV files do not exist.
ValueError – If the membership field cannot be parsed as a list.
Examples
>>> from retire.data import load_graph
>>> G = load_graph()
>>> print(f"Graph has {G.number_of_nodes()} nodes and {G.number_of_edges()} edges")
Graph has 314 nodes and 1247 edges
>>> # Check node attributes
>>> node_attrs = list(G.nodes(data=True))[0]
>>> print(f"Node attributes: {list(node_attrs[1].keys())}")
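The `ValueError` listed under Raises concerns the membership field of each node. How the field is stored is not stated here. The sketch below assumes it is serialized as a Python-style list literal such as `"[3, 17, 42]"` and shows one way such a string could be parsed safely. `parse_membership` is a hypothetical name, not the package's own function.

```python
import ast
from typing import List


def parse_membership(raw: str) -> List[int]:
    """Parse a string such as "[3, 17, 42]" into a list of plant indices.

    Illustrative only; the storage format is an assumption.
    """
    try:
        # literal_eval only evaluates Python literals, never arbitrary code.
        value = ast.literal_eval(raw)
    except (SyntaxError, ValueError) as exc:
        raise ValueError(f"Cannot parse membership field: {raw!r}") from exc
    if not isinstance(value, list):
        raise ValueError(f"Membership field is not a list: {raw!r}")
    return [int(i) for i in value]
```

Using `ast.literal_eval` rather than `eval` keeps the parser from executing arbitrary code if a data file is malformed or has been tampered with.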
Data Utilities
These functions help with processing and managing the datasets: