Utilities¶
Helper functions and utilities.
Module Reference¶
ascicat/utils.py: utility functions for the ASCICat package. Helper functions for data manipulation, validation, and formatting.
Functions¶
format_catalyst_name(row) ¶
Format catalyst name from composition
| PARAMETER | DESCRIPTION |
|---|---|
| row | Data row containing composition info |

| RETURNS | DESCRIPTION |
|---|---|
| str | Formatted catalyst name |
format_surface(row) ¶
Format surface description
| PARAMETER | DESCRIPTION |
|---|---|
| row | Data row containing surface info |

| RETURNS | DESCRIPTION |
|---|---|
| str | Formatted surface description |
calculate_distance_from_optimal(delta_E, optimal_E) ¶
Calculate deviation from optimal binding energy
| PARAMETER | DESCRIPTION |
|---|---|
| delta_E | Adsorption energy (eV) |
| optimal_E | Optimal binding energy (eV) |

| RETURNS | DESCRIPTION |
|---|---|
| float or ndarray | Absolute deviation from optimum (eV) |
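
As a rough illustration of the quantity described above, the deviation can be written as |delta_E - optimal_E|. The sketch below assumes NumPy inputs; the function name `distance_from_optimal_sketch` and the energy values are hypothetical and are not taken from the package.

```python
import numpy as np

def distance_from_optimal_sketch(delta_E, optimal_E):
    """Absolute deviation |delta_E - optimal_E| in eV (illustrative only)."""
    # np.asarray lets the same expression handle scalars and arrays.
    return np.abs(np.asarray(delta_E, dtype=float) - optimal_E)

# Hypothetical adsorption energies and optimum, chosen only for the example.
energies = np.array([-0.5, -0.25, 0.0])
deviation = distance_from_optimal_sketch(energies, optimal_E=-0.25)
print(deviation.tolist())  # [0.25, 0.0, 0.25]
```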
normalize_scores(scores) ¶
Min-max normalize scores to [0, 1]
| PARAMETER | DESCRIPTION |
|---|---|
| scores | Raw scores |

| RETURNS | DESCRIPTION |
|---|---|
| Series | Normalized scores [0, 1] |
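
Min-max normalization maps the smallest score to 0 and the largest to 1. The following is a minimal sketch, not the package's code. The function name is hypothetical, and returning all zeros for a constant input is an assumption made here to avoid division by zero.

```python
import pandas as pd

def normalize_scores_sketch(scores: pd.Series) -> pd.Series:
    """Min-max scale a Series to [0, 1] (illustrative only)."""
    lo, hi = scores.min(), scores.max()
    if hi == lo:
        # Assumption: a constant input maps to zeros instead of dividing by zero.
        return pd.Series(0.0, index=scores.index)
    return (scores - lo) / (hi - lo)

raw = pd.Series([2.0, 4.0, 6.0], index=["a", "b", "c"])
normalized = normalize_scores_sketch(raw)
print(normalized.tolist())  # [0.0, 0.5, 1.0]
```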
rank_by_column(df, column, ascending=False) ¶
Rank DataFrame by specified column
| PARAMETER | DESCRIPTION |
|---|---|
| df | Data to rank |
| column | Column to rank by |
| ascending | Rank in ascending order (default: False for descending) |

| RETURNS | DESCRIPTION |
|---|---|
| DataFrame | Ranked DataFrame with 'rank' column |
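
One plausible reading of this behavior is "sort by the column, then number the rows from 1". The sketch below follows that reading. The function name is hypothetical, and the handling of ties (left to the sort order) is an assumption, since the documentation does not state it.

```python
import pandas as pd

def rank_by_column_sketch(df, column, ascending=False):
    """Sort by `column` and add a 1-based 'rank' column (illustrative only)."""
    ranked = df.sort_values(column, ascending=ascending).reset_index(drop=True)
    # Assumption: rank is simply the position after sorting.
    ranked["rank"] = range(1, len(ranked) + 1)
    return ranked

data = pd.DataFrame({"symbol": ["Cu", "Pt", "Ni"], "ASCI": [0.4, 0.9, 0.6]})
ranked = rank_by_column_sketch(data, "ASCI")
print(ranked["symbol"].tolist())  # ['Pt', 'Ni', 'Cu']
```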
filter_by_threshold(df, column, threshold, greater_than=True) ¶
Filter DataFrame by threshold value
| PARAMETER | DESCRIPTION |
|---|---|
| df | Data to filter |
| column | Column to filter on |
| threshold | Threshold value |
| greater_than | If True, keep values > threshold, else < threshold |

| RETURNS | DESCRIPTION |
|---|---|
| DataFrame | Filtered DataFrame |
get_pareto_front(df, objectives, maximize) ¶
Extract Pareto front from multi-objective data
A solution is Pareto optimal if no other solution is better in all objectives simultaneously.
| PARAMETER | DESCRIPTION |
|---|---|
| df | Data with objective columns |
| objectives | List of objective column names |
| maximize | Whether to maximize each objective (True) or minimize (False) |

| RETURNS | DESCRIPTION |
|---|---|
| DataFrame | Pareto optimal solutions |
Examples:
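
A minimal sketch of Pareto filtering, not the package's implementation. It assumes `maximize` is a list of booleans with one entry per objective, and it uses the common dominance test (another row is at least as good in every objective and strictly better in at least one). The package's exact rule is not documented here. The function name, the `cost` column, and the data are hypothetical.

```python
import numpy as np
import pandas as pd

def pareto_front_sketch(df, objectives, maximize):
    """Return the non-dominated rows of df (illustrative only)."""
    values = df[objectives].to_numpy(dtype=float)
    # Flip minimized objectives so that larger is better in every column.
    signs = np.where(np.asarray(maximize, dtype=bool), 1.0, -1.0)
    values = values * signs
    keep = np.ones(len(values), dtype=bool)
    for i, point in enumerate(values):
        at_least_as_good = np.all(values >= point, axis=1)
        strictly_better = np.any(values > point, axis=1)
        # Row i is dominated if some row satisfies both conditions.
        if np.any(at_least_as_good & strictly_better):
            keep[i] = False
    return df[keep]

candidates = pd.DataFrame({
    "symbol": ["A", "B", "C", "D"],
    "activity_score": [0.9, 0.5, 0.4, 0.7],
    "cost": [5.0, 1.0, 2.0, 6.0],
})
# Maximize activity, minimize cost.
front = pareto_front_sketch(candidates, ["activity_score", "cost"], [True, False])
print(front["symbol"].tolist())  # ['A', 'B']
```

Here C is dominated by B (more active and cheaper) and D is dominated by A, so only A and B remain. The pairwise loop is quadratic in the number of rows, which is fine for a sketch but may be slow for large tables.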
calculate_correlation_matrix(df, columns) ¶
Calculate correlation matrix for specified columns
| PARAMETER | DESCRIPTION |
|---|---|
| df | Data |
| columns | Columns to include in correlation |

| RETURNS | DESCRIPTION |
|---|---|
| DataFrame | Correlation matrix |
save_to_json(data, file_path) ¶
Save dictionary to JSON file
| PARAMETER | DESCRIPTION |
|---|---|
| data | Data to save |
| file_path | Output file path |
load_from_json(file_path) ¶
Load dictionary from JSON file
| PARAMETER | DESCRIPTION |
|---|---|
| file_path | Input file path |

| RETURNS | DESCRIPTION |
|---|---|
| dict | Loaded data |
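
The save and load helpers form a round trip. The sketch below shows that round trip with the standard `json` module. The function names, file name, and dictionary contents are hypothetical, and the package may use different options such as indentation or encoding.

```python
import json
import tempfile
from pathlib import Path

def save_to_json_sketch(data, file_path):
    """Write a dictionary to a JSON file (illustrative only)."""
    with open(file_path, "w", encoding="utf-8") as handle:
        json.dump(data, handle, indent=2)

def load_from_json_sketch(file_path):
    """Read a dictionary back from a JSON file (illustrative only)."""
    with open(file_path, "r", encoding="utf-8") as handle:
        return json.load(handle)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "metadata.json"
    original = {"reaction": "HER", "weights": [0.4, 0.3, 0.3]}
    save_to_json_sketch(original, path)
    restored = load_from_json_sketch(path)

print(restored == original)  # True
```

Note that JSON has no tuple type, so a tuple of weights would come back as a list after loading.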
create_metadata(results_df, config, weights) ¶
Create metadata for ASCI results
| PARAMETER | DESCRIPTION |
|---|---|
| results_df | ASCI results |
| config | Reaction configuration |
| weights | Weights (w_a, w_s, w_c) for activity, stability, and cost |

| RETURNS | DESCRIPTION |
|---|---|
| dict | Metadata dictionary |
format_number(value, decimals=3) ¶
Format number for display
| PARAMETER | DESCRIPTION |
|---|---|
| value | Number to format |
| decimals | Number of decimal places |

| RETURNS | DESCRIPTION |
|---|---|
| str | Formatted string |
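
Fixed-decimal formatting of this kind can be expressed with a Python format specification. This is a sketch under the assumption that `decimals` maps directly to the `f` presentation type. The function name is hypothetical, and the package may treat special values such as NaN differently.

```python
def format_number_sketch(value, decimals=3):
    """Format a number with a fixed count of decimal places (illustrative only)."""
    return f"{value:.{decimals}f}"

print(format_number_sketch(3.14159))           # 3.142
print(format_number_sketch(2.5, decimals=1))   # 2.5
```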
print_table(df, columns=None, max_rows=20) ¶
Print DataFrame as formatted table
| PARAMETER | DESCRIPTION |
|---|---|
| df | Data to print |
| columns | Columns to include (default: all) |
| max_rows | Maximum rows to print |
validate_file_path(file_path, must_exist=False) ¶
Validate and convert file path
| PARAMETER | DESCRIPTION |
|---|---|
| file_path | File path to validate |
| must_exist | If True, raise error if file doesn't exist |

| RETURNS | DESCRIPTION |
|---|---|
| Path | Validated Path object |

| RAISES | DESCRIPTION |
|---|---|
| FileNotFoundError | If file doesn't exist and must_exist=True |
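
The documented contract (convert to `Path`, raise `FileNotFoundError` only when `must_exist=True`) can be sketched with `pathlib`. The function name and file names below are hypothetical, and any further validation the package performs is not shown.

```python
import tempfile
from pathlib import Path

def validate_file_path_sketch(file_path, must_exist=False):
    """Convert to Path and optionally require the file to exist (illustrative only)."""
    path = Path(file_path)
    if must_exist and not path.exists():
        raise FileNotFoundError(f"File not found: {path}")
    return path

# A path that need not exist yet, such as an output file.
out_path = validate_file_path_sketch("results.csv")

# A required input that is missing raises FileNotFoundError.
with tempfile.TemporaryDirectory() as tmp:
    missing = Path(tmp) / "missing.csv"
    try:
        validate_file_path_sketch(missing, must_exist=True)
        raised = False
    except FileNotFoundError:
        raised = True
```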
create_output_directory(dir_path) ¶
Create output directory if it doesn't exist
| PARAMETER | DESCRIPTION |
|---|---|
| dir_path | Directory path |

| RETURNS | DESCRIPTION |
|---|---|
| Path | Created directory path |
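
"Create if it doesn't exist" is typically expressed with `Path.mkdir(parents=True, exist_ok=True)`. The sketch below assumes that approach. The function name and directory names are hypothetical, and whether the package also creates missing parent directories is an assumption.

```python
import tempfile
from pathlib import Path

def create_output_directory_sketch(dir_path):
    """Create the directory (and parents) if needed and return it (illustrative only)."""
    path = Path(dir_path)
    # exist_ok=True makes repeated calls harmless.
    path.mkdir(parents=True, exist_ok=True)
    return path

with tempfile.TemporaryDirectory() as tmp:
    out_dir = create_output_directory_sketch(Path(tmp) / "results" / "her")
    created = out_dir.is_dir()
    # Calling again on an existing directory does not raise.
    same = create_output_directory_sketch(out_dir) == out_dir
```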
get_timestamp() ¶
Get current timestamp as string
| RETURNS | DESCRIPTION |
|---|---|
| str | Timestamp in ISO format |
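
An ISO-format timestamp string can be produced with `datetime.isoformat`. This sketch assumes local time at one-second resolution. The function name is hypothetical, and the package may use UTC or include microseconds.

```python
from datetime import datetime

def get_timestamp_sketch():
    """Current local time as an ISO 8601 string (illustrative only)."""
    return datetime.now().isoformat(timespec="seconds")

stamp = get_timestamp_sketch()
# The string parses back into a datetime, which confirms the format.
parsed = datetime.fromisoformat(stamp)
print(stamp)
```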
load_catalyst_data(file_path) ¶
save_results(data, file_path) ¶
calculate_element_cost(element, database=None) ¶
Get cost for a single element.
| PARAMETER | DESCRIPTION |
|---|---|
| element | Element symbol (e.g., 'Pt', 'Cu', 'Ni') |
| database | Custom cost database. If None, uses default values. |

| RETURNS | DESCRIPTION |
|---|---|
| float | Cost in $/kg |
Notes
Default costs based on USGS Commodity data (2024). Values are approximate and should be updated periodically.
Examples:
get_periodic_table_data() ¶
Get periodic table data for common elements.
Returns comprehensive element information including:
- Atomic number
- Atomic mass
- Element name
- Common oxidation states
- Electronegativity

| RETURNS | DESCRIPTION |
|---|---|
| dict | Dictionary with element symbols as keys, properties as values |
Examples:
calculate_composition_cost(composition, cost_database=None) ¶
Calculate composition-weighted cost for alloys.
| PARAMETER | DESCRIPTION |
|---|---|
| composition | Dictionary mapping element symbols to atomic fractions, e.g. {'Cu': 0.7, 'Ni': 0.3} |
| cost_database | Custom cost database. If None, uses default values. |

| RETURNS | DESCRIPTION |
|---|---|
| float | Composition-weighted cost in $/kg |
Examples:
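
A minimal sketch of a composition-weighted cost, assuming the result is the fraction-weighted sum of per-element costs. The function name is hypothetical, and the prices in `PLACEHOLDER_COSTS` are made-up placeholders, not the package's default database. Weighting $/kg prices by atomic fractions ignores differences in molar mass, so treat the arithmetic as illustrative.

```python
# Hypothetical placeholder prices in $/kg, NOT the package's default values.
PLACEHOLDER_COSTS = {"Cu": 9.0, "Ni": 16.0}

def composition_cost_sketch(composition, cost_database=None):
    """Fraction-weighted average cost of an alloy (illustrative only)."""
    costs = PLACEHOLDER_COSTS if cost_database is None else cost_database
    total = sum(composition.values())
    if total <= 0:
        raise ValueError("composition fractions must sum to a positive value")
    # Assumption: renormalize in case the fractions do not sum exactly to 1.
    weighted = sum(fraction * costs[element] for element, fraction in composition.items())
    return weighted / total

alloy_cost = composition_cost_sketch({"Cu": 0.7, "Ni": 0.3})
print(f"{alloy_cost:.2f} $/kg")  # 11.10 $/kg
```

With these placeholder prices the result is 0.7 × 9.0 + 0.3 × 16.0 = 11.1.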
generate_unique_labels(df, label_col='display_label') ¶
Generate unique display labels for catalysts in ranking plots.
Creates unambiguous labels by combining chemical formula with surface facet. If duplicates still exist, adds a numerical suffix.
Format: "CuZn(211)" or "CuZn(211)#2" if still not unique
| PARAMETER | DESCRIPTION |
|---|---|
| df | DataFrame with catalyst data. Must contain 'symbol' column. Optionally contains 'slab_millers' for facet information. |
| label_col | Name of the new column for unique labels (default: 'display_label') |

| RETURNS | DESCRIPTION |
|---|---|
| DataFrame | DataFrame with added unique label column |
Examples:
>>> df = generate_unique_labels(results)
>>> print(df[['symbol', 'slab_millers', 'display_label']].head())
symbol slab_millers display_label
0 CuZn 211 CuZn(211)
1 CuZn 111 CuZn(111)
2 CuZn 100 CuZn(100)
3 Nb2Pt6 110 Nb2Pt6(110)
4 Nb2Pt6 110 Nb2Pt6(110)#2
Notes
This function is essential for ranking plots where the same chemical formula may appear multiple times with different surface facets or configurations.
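
The labeling scheme described above (formula plus facet, then a `#n` suffix for remaining duplicates) can be sketched with a grouped running count. This is an assumption about how such labels could be built, not the package's code. The function name is hypothetical, and numbering duplicates in order of appearance is inferred from the example output.

```python
import pandas as pd

def generate_unique_labels_sketch(df, label_col="display_label"):
    """Add a column of unique 'Formula(facet)' labels (illustrative only)."""
    out = df.copy()
    labels = out["symbol"].astype(str)
    if "slab_millers" in out.columns:
        labels = labels + "(" + out["slab_millers"].astype(str) + ")"
    # Running count per label: 0 for the first occurrence, 1 for the second, ...
    repeat_index = labels.groupby(labels).cumcount()
    suffix = repeat_index.map(lambda n: "" if n == 0 else f"#{n + 1}")
    out[label_col] = labels + suffix
    return out

catalysts = pd.DataFrame({
    "symbol": ["CuZn", "CuZn", "Nb2Pt6", "Nb2Pt6"],
    "slab_millers": [211, 111, 110, 110],
})
labeled = generate_unique_labels_sketch(catalysts)
print(labeled["display_label"].tolist())
# ['CuZn(211)', 'CuZn(111)', 'Nb2Pt6(110)', 'Nb2Pt6(110)#2']
```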
get_display_labels(df, n_top=10) ¶
Get unique display labels for top N catalysts.
Convenience function for visualization code.
| PARAMETER | DESCRIPTION |
|---|---|
| df | DataFrame with catalyst data (should be sorted by ranking) |
| n_top | Number of top catalysts to label |

| RETURNS | DESCRIPTION |
|---|---|
| List[str] | List of unique display labels |
Examples:
format_scientific(value, precision=3) ¶
Format number in scientific notation.
| PARAMETER | DESCRIPTION |
|---|---|
| value | Number to format |
| precision | Number of significant figures |

| RETURNS | DESCRIPTION |
|---|---|
| str | Formatted scientific notation string |
Examples:
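
A minimal sketch of scientific-notation formatting, assuming `precision` counts significant figures, so the mantissa shows `precision - 1` decimals. The function name is hypothetical, and the package's exact output style (for example the exponent format) is not documented here and may differ.

```python
def format_scientific_sketch(value, precision=3):
    """Format a number in scientific notation (illustrative only)."""
    # Assumption: precision >= 1; one digit sits before the decimal point.
    return f"{value:.{precision - 1}e}"

print(format_scientific_sketch(123456.0))              # 1.23e+05
print(format_scientific_sketch(0.00042, precision=2))  # 4.2e-04
```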
Functions¶
generate_unique_labels¶
Generate unique labels for catalysts with duplicate symbols.
from ascicat.utils import generate_unique_labels
df = generate_unique_labels(df, label_col='display_label')
sample_stratified¶
Stratified sampling by ASCI score.
from ascicat.utils import sample_stratified
sampled = sample_stratified(
    df,
    n_samples=2000,
    strata_col='ASCI',
    n_strata=4
)
validate_data¶
Validate input data format.
from ascicat.utils import validate_data
is_valid, errors = validate_data(df)
if not is_valid:
    print("Validation errors:", errors)
compute_pareto_mask¶
Identify Pareto-optimal points.
from ascicat.utils import compute_pareto_mask
import numpy as np
# Objectives to minimize (1 - score)
objectives = np.column_stack([
    1 - df['activity_score'],
    1 - df['stability_score'],
    1 - df['cost_score']
])
pareto_mask = compute_pareto_mask(objectives)
pareto_catalysts = df[pareto_mask]
format_results_table¶
Format results for display.
from ascicat.utils import format_results_table
table = format_results_table(
    results.head(10),
    columns=['symbol', 'ASCI', 'activity_score']
)
print(table)
Data Utilities¶
load_example_data¶
Load built-in example datasets.
from ascicat.utils import load_example_data
# Load HER data
her_data = load_example_data('HER')
# Load CO2RR data
co2rr_data = load_example_data('CO2RR', pathway='CO')
export_results¶
Export results in various formats.
from ascicat.utils import export_results
export_results(
    results,
    output_path='results.csv',
    format='csv'  # or 'xlsx', 'json'
)
Visualization Helpers¶
setup_figure_style¶
Configure matplotlib for high-quality output.
get_colorblind_palette¶
Get colorblind-safe color palette.