Utilities

Helper functions and utilities.

Module Reference

ascicat/utils.py

Utility functions for the ASCICat package: helper functions for data manipulation, validation, and formatting.

Functions

format_catalyst_name(row)

Format catalyst name from composition

PARAMETER DESCRIPTION
row

Data row containing composition info

TYPE: Series

RETURNS DESCRIPTION
str

Formatted catalyst name

format_surface(row)

Format surface description

PARAMETER DESCRIPTION
row

Data row containing surface info

TYPE: Series

RETURNS DESCRIPTION
str

Formatted surface description

calculate_distance_from_optimal(delta_E, optimal_E)

Calculate deviation from optimal binding energy

PARAMETER DESCRIPTION
delta_E

Adsorption energy (eV)

TYPE: float or array-like

optimal_E

Optimal binding energy (eV)

TYPE: float

RETURNS DESCRIPTION
float or ndarray

Absolute deviation from optimum (eV)
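
As a minimal sketch of the calculation described above, assuming the deviation is simply |delta_E - optimal_E|. The function name `distance_from_optimal_sketch` and the energy values are illustrative, not taken from the package.

```python
import numpy as np


def distance_from_optimal_sketch(delta_E, optimal_E):
    """Return |delta_E - optimal_E| in eV for a scalar or array-like input."""
    return np.abs(np.asarray(delta_E, dtype=float) - optimal_E)


# Made-up adsorption energies (eV) and a made-up optimum, for illustration only.
deviations = distance_from_optimal_sketch([-0.45, -0.27, 0.10], optimal_E=-0.27)
print(deviations)
```

A catalyst that binds exactly at the optimum gets a deviation of zero; binding too strongly or too weakly is penalized symmetrically.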

normalize_scores(scores)

Min-max normalize scores to [0, 1]

PARAMETER DESCRIPTION
scores

Raw scores

TYPE: Series

RETURNS DESCRIPTION
Series

Normalized scores [0, 1]
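
Min-max normalization maps the smallest score to 0 and the largest to 1. The sketch below shows the idea; `min_max_normalize_sketch` is a hypothetical name, and the handling of a constant series (all zeros here) is an assumption, since the documentation does not say what the package does in that case.

```python
import pandas as pd


def min_max_normalize_sketch(scores: pd.Series) -> pd.Series:
    """Rescale scores linearly so that min -> 0 and max -> 1."""
    lo, hi = scores.min(), scores.max()
    if hi == lo:
        # Assumption: a constant series carries no ranking information.
        return pd.Series(0.0, index=scores.index)
    return (scores - lo) / (hi - lo)


raw = pd.Series([2.0, 4.0, 6.0, 10.0])
normalized = min_max_normalize_sketch(raw)
print(normalized.tolist())  # [0.0, 0.25, 0.5, 1.0]
```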

rank_by_column(df, column, ascending=False)

Rank DataFrame by specified column

PARAMETER DESCRIPTION
df

Data to rank

TYPE: DataFrame

column

Column to rank by

TYPE: str

ascending

Rank in ascending order (default: False for descending)

TYPE: bool DEFAULT: False

RETURNS DESCRIPTION
DataFrame

Ranked DataFrame with 'rank' column
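
One plausible way to produce such a ranking is to sort and then number the rows. This is a sketch only: `rank_by_column_sketch` is a hypothetical name, the scores are made up, and resetting the index and giving tied rows consecutive ranks are assumptions rather than documented behavior.

```python
import pandas as pd


def rank_by_column_sketch(df, column, ascending=False):
    """Sort by `column` and add a 1-based 'rank' column."""
    ranked = df.sort_values(column, ascending=ascending).reset_index(drop=True)
    ranked["rank"] = range(1, len(ranked) + 1)
    return ranked


df = pd.DataFrame({"symbol": ["Cu", "Pt", "Ni"], "ASCI": [0.61, 0.83, 0.47]})
ranked = rank_by_column_sketch(df, "ASCI")
print(ranked)
```

With the default `ascending=False`, the highest score receives rank 1.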

filter_by_threshold(df, column, threshold, greater_than=True)

Filter DataFrame by threshold value

PARAMETER DESCRIPTION
df

Data to filter

TYPE: DataFrame

column

Column to filter on

TYPE: str

threshold

Threshold value

TYPE: float

greater_than

If True, keep values > threshold; otherwise keep values < threshold

TYPE: bool DEFAULT: True

RETURNS DESCRIPTION
DataFrame

Filtered DataFrame
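
The parameter description implies strict comparisons in both directions. A small sketch of that behavior, using the hypothetical name `filter_by_threshold_sketch` and made-up scores:

```python
import pandas as pd


def filter_by_threshold_sketch(df, column, threshold, greater_than=True):
    """Keep rows strictly above (or strictly below) the threshold."""
    if greater_than:
        mask = df[column] > threshold
    else:
        mask = df[column] < threshold
    return df[mask]


df = pd.DataFrame({"symbol": ["A", "B", "C"], "ASCI": [0.2, 0.5, 0.8]})
above = filter_by_threshold_sketch(df, "ASCI", 0.5)
below = filter_by_threshold_sketch(df, "ASCI", 0.5, greater_than=False)
print(above["symbol"].tolist(), below["symbol"].tolist())  # ['C'] ['A']
```

Under this reading, a row exactly equal to the threshold (row B here) is dropped in both modes.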

get_pareto_front(df, objectives, maximize)

Extract Pareto front from multi-objective data

A solution is Pareto optimal if no other solution is at least as good in every objective and strictly better in at least one.

PARAMETER DESCRIPTION
df

Data with objective columns

TYPE: DataFrame

objectives

List of objective column names

TYPE: List[str]

maximize

Whether to maximize each objective (True) or minimize (False)

TYPE: List[bool]

RETURNS DESCRIPTION
DataFrame

Pareto optimal solutions

Examples:

>>> pareto = get_pareto_front(
...     df, 
...     objectives=['activity_score', 'cost_score'],
...     maximize=[True, True]
... )
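
To make the dominance idea concrete, here is a brute-force sketch. It uses the usual dominance test (another row is at least as good in every objective and strictly better in at least one), which may differ in detail from the package's implementation. `pareto_front_sketch` and the data are illustrative.

```python
import numpy as np
import pandas as pd


def pareto_front_sketch(df, objectives, maximize):
    """Return the rows of df that no other row dominates (O(n^2) scan)."""
    # Flip the sign of minimized objectives so that larger is always better.
    signs = np.where(maximize, 1.0, -1.0)
    values = df[objectives].to_numpy(dtype=float) * signs
    keep = np.ones(len(df), dtype=bool)
    for i, row in enumerate(values):
        # Row i is dominated if some row is >= everywhere and > somewhere.
        dominated_by = np.all(values >= row, axis=1) & np.any(values > row, axis=1)
        keep[i] = not dominated_by.any()
    return df[keep]


df = pd.DataFrame({
    "symbol": ["A", "B", "C", "D"],
    "activity_score": [0.9, 0.6, 0.5, 0.2],
    "cost_score": [0.2, 0.6, 0.5, 0.9],
})
front = pareto_front_sketch(df, ["activity_score", "cost_score"], [True, True])
print(front["symbol"].tolist())  # ['A', 'B', 'D']
```

Row C is excluded because B beats it in both objectives; A, B, and D each win on at least one objective against every other row.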

calculate_correlation_matrix(df, columns)

Calculate correlation matrix for specified columns

PARAMETER DESCRIPTION
df

Data

TYPE: DataFrame

columns

Columns to include in correlation

TYPE: List[str]

RETURNS DESCRIPTION
DataFrame

Correlation matrix
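
With pandas, a correlation matrix over selected columns can be sketched as below. Using the Pearson coefficient (the `DataFrame.corr` default) is an assumption; the documentation does not name the method. The function name and data are illustrative.

```python
import pandas as pd


def correlation_matrix_sketch(df, columns):
    """Pairwise Pearson correlation between the selected columns."""
    return df[columns].corr()


df = pd.DataFrame({
    "activity_score": [0.1, 0.4, 0.6, 0.9],
    "cost_score": [0.9, 0.6, 0.4, 0.1],  # exactly 1 - activity_score
})
corr = correlation_matrix_sketch(df, ["activity_score", "cost_score"])
print(corr.round(3))
```

Because the two made-up columns are perfectly anti-correlated, the off-diagonal entries are -1 up to floating-point rounding.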

save_to_json(data, file_path)

Save dictionary to JSON file

PARAMETER DESCRIPTION
data

Data to save

TYPE: dict

file_path

Output file path

TYPE: str

load_from_json(file_path)

Load dictionary from JSON file

PARAMETER DESCRIPTION
file_path

Input file path

TYPE: str

RETURNS DESCRIPTION
dict

Loaded data
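
A save/load round trip can be sketched with the standard library alone. The helper names below are hypothetical stand-ins, not the package functions, and the dictionary contents are made up.

```python
import json
import tempfile
from pathlib import Path


def save_json_sketch(data, file_path):
    """Write a dictionary to a JSON file."""
    with open(file_path, "w", encoding="utf-8") as fh:
        json.dump(data, fh, indent=2)


def load_json_sketch(file_path):
    """Read a JSON file back into a dictionary."""
    with open(file_path, "r", encoding="utf-8") as fh:
        return json.load(fh)


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "metadata.json"
    original = {"reaction": "HER", "weights": (0.4, 0.3, 0.3)}
    save_json_sketch(original, path)
    restored = load_json_sketch(path)

print(restored["weights"])  # [0.4, 0.3, 0.3]
```

JSON has no tuple type, so a tuple written with the standard `json` module comes back as a list.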

create_metadata(results_df, config, weights)

Create metadata for ASCI results

PARAMETER DESCRIPTION
results_df

ASCI results

TYPE: DataFrame

config

Reaction configuration

TYPE: dict

weights

(w_a, w_s, w_c)

TYPE: tuple

RETURNS DESCRIPTION
dict

Metadata dictionary

format_number(value, decimals=3)

Format number for display

PARAMETER DESCRIPTION
value

Number to format

TYPE: float

decimals

Number of decimal places

TYPE: int DEFAULT: 3

RETURNS DESCRIPTION
str

Formatted string

print_table(df, columns=None, max_rows=20)

Print DataFrame as formatted table

PARAMETER DESCRIPTION
df

Data to print

TYPE: DataFrame

columns

Columns to include (default: all)

TYPE: List[str] DEFAULT: None

max_rows

Maximum rows to print

TYPE: int DEFAULT: 20

validate_file_path(file_path, must_exist=False)

Validate and convert file path

PARAMETER DESCRIPTION
file_path

File path to validate

TYPE: str

must_exist

If True, raise error if file doesn't exist

TYPE: bool DEFAULT: False

RETURNS DESCRIPTION
Path

Validated Path object

RAISES DESCRIPTION
FileNotFoundError

If file doesn't exist and must_exist=True
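
The documented contract (return a Path, raise only when `must_exist=True` and the file is missing) can be sketched with `pathlib`. `validate_file_path_sketch` is a hypothetical name, and any further checks the package may perform are not shown.

```python
from pathlib import Path


def validate_file_path_sketch(file_path, must_exist=False):
    """Convert to Path; optionally require that the file already exists."""
    path = Path(file_path)
    if must_exist and not path.exists():
        raise FileNotFoundError(f"File not found: {path}")
    return path


out_path = validate_file_path_sketch("output/HER_results.csv")
print(type(out_path).__name__, out_path.suffix)
```

Leaving `must_exist` at its default suits output paths that have not been written yet; setting it to True suits input files.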

create_output_directory(dir_path)

Create output directory if it doesn't exist

PARAMETER DESCRIPTION
dir_path

Directory path

TYPE: str

RETURNS DESCRIPTION
Path

Created directory path

get_timestamp()

Get current timestamp as string

RETURNS DESCRIPTION
str

Timestamp in ISO format

load_catalyst_data(file_path)

Load catalyst data from file.

Convenience wrapper for loading CSV data.

PARAMETER DESCRIPTION
file_path

Path to data file

TYPE: str

RETURNS DESCRIPTION
DataFrame

Loaded data

Examples:

>>> data = load_catalyst_data('data/HER_clean.csv')
>>> print(data.shape)
(200, 10)

save_results(data, file_path)

Save results to file.

PARAMETER DESCRIPTION
data

Results data

TYPE: DataFrame

file_path

Output file path

TYPE: str

Examples:

>>> save_results(results, 'output/HER_results.csv')

calculate_element_cost(element, database=None)

Get cost for a single element.

PARAMETER DESCRIPTION
element

Element symbol (e.g., 'Pt', 'Cu', 'Ni')

TYPE: str

database

Custom cost database. If None, uses default values.

TYPE: dict DEFAULT: None

RETURNS DESCRIPTION
float

Cost in $/kg

Notes

Default costs are based on USGS commodity data (2024). Values are approximate and should be updated periodically.

Examples:

>>> cost_pt = calculate_element_cost('Pt')
>>> print(f"Platinum: ${cost_pt:,.0f}/kg")
Platinum: $30,000/kg
>>> cost_cu = calculate_element_cost('Cu')
>>> print(f"Copper: ${cost_cu:.2f}/kg")
Copper: $8.50/kg

get_periodic_table_data()

Get periodic table data for common elements.

Returns comprehensive element information, including:

- Atomic number
- Atomic mass
- Element name
- Common oxidation states
- Electronegativity

RETURNS DESCRIPTION
dict

Dictionary with element symbols as keys, properties as values

Examples:

>>> pt_data = get_periodic_table_data()
>>> pt_info = pt_data['Pt']
>>> print(f"{pt_info['name']}: Z={pt_info['number']}, M={pt_info['mass']:.3f}")
Platinum: Z=78, M=195.084

calculate_composition_cost(composition, cost_database=None)

Calculate composition-weighted cost for alloys.

PARAMETER DESCRIPTION
composition

Dictionary mapping element symbols to atomic fractions. Example: {'Cu': 0.7, 'Ni': 0.3}

TYPE: dict

cost_database

Custom cost database. If None, uses default values.

TYPE: dict DEFAULT: None

RETURNS DESCRIPTION
float

Composition-weighted cost in $/kg

Examples:

>>> # CuNi alloy (70% Cu, 30% Ni)
>>> cost = calculate_composition_cost({'Cu': 0.7, 'Ni': 0.3})
>>> print(f"CuNi alloy: ${cost:.2f}/kg")
CuNi alloy: $11.35/kg
>>> # PtRu alloy (50% Pt, 50% Ru)
>>> cost = calculate_composition_cost({'Pt': 0.5, 'Ru': 0.5})
>>> print(f"PtRu alloy: ${cost:,.0f}/kg")
PtRu alloy: $21,000/kg
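
The outputs above are consistent with a plain fraction-weighted sum of element costs. The sketch below assumes that formula; `composition_cost_sketch` is a hypothetical name, and the cost table contains only the two prices quoted elsewhere on this page (Pt and Cu).

```python
def composition_cost_sketch(composition, cost_database):
    """Sum of fraction * element cost over all elements in the composition."""
    return sum(
        fraction * cost_database[element]
        for element, fraction in composition.items()
    )


# Prices in $/kg taken from the calculate_element_cost examples.
costs = {"Pt": 30000.0, "Cu": 8.50}
cost = composition_cost_sketch({"Pt": 0.5, "Cu": 0.5}, costs)
print(f"PtCu alloy: ${cost:,.2f}/kg")  # PtCu alloy: $15,004.25/kg
```

In this sketch an element missing from the table raises `KeyError`; how the package treats unknown elements is not documented here.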

generate_unique_labels(df, label_col='display_label')

Generate unique display labels for catalysts in ranking plots.

Creates unambiguous labels by combining chemical formula with surface facet. If duplicates still exist, adds a numerical suffix.

Format: "CuZn(211)" or "CuZn(211)#2" if still not unique

PARAMETER DESCRIPTION
df

DataFrame with catalyst data. Must contain 'symbol' column. Optionally contains 'slab_millers' for facet information.

TYPE: DataFrame

label_col

Name of the new column for unique labels (default: 'display_label')

TYPE: str DEFAULT: 'display_label'

RETURNS DESCRIPTION
DataFrame

DataFrame with added unique label column

Examples:

>>> df = generate_unique_labels(results)
>>> print(df[['symbol', 'slab_millers', 'display_label']].head())
    symbol  slab_millers  display_label
0     CuZn          211       CuZn(211)
1     CuZn          111       CuZn(111)
2     CuZn          100       CuZn(100)
3   Nb2Pt6          110     Nb2Pt6(110)
4   Nb2Pt6          110   Nb2Pt6(110)#2

Notes

This function is essential for ranking plots where the same chemical formula may appear multiple times with different surface facets or configurations.
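
The labeling scheme described above can be sketched with a grouped running count. This is an illustration under assumptions, not the package's code: `unique_labels_sketch` is a hypothetical name, and numbering repeats in order of appearance is inferred from the example output.

```python
import pandas as pd


def unique_labels_sketch(df, label_col="display_label"):
    """Build 'symbol(facet)' labels and append '#k' to repeated ones."""
    out = df.copy()
    base = out["symbol"].astype(str)
    if "slab_millers" in out.columns:
        base = base + "(" + out["slab_millers"].astype(str) + ")"
    # cumcount numbers identical labels 0, 1, 2, ... in order of appearance.
    repeat = base.groupby(base).cumcount()
    suffix = repeat.map(lambda n: "" if n == 0 else f"#{n + 1}")
    out[label_col] = base + suffix
    return out


results = pd.DataFrame({
    "symbol": ["CuZn", "Nb2Pt6", "Nb2Pt6"],
    "slab_millers": [211, 110, 110],
})
labeled = unique_labels_sketch(results)
print(labeled["display_label"].tolist())
# ['CuZn(211)', 'Nb2Pt6(110)', 'Nb2Pt6(110)#2']
```

The first occurrence keeps the plain label, so only genuine duplicates carry a suffix.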

get_display_labels(df, n_top=10)

Get unique display labels for top N catalysts.

Convenience function for visualization code.

PARAMETER DESCRIPTION
df

DataFrame with catalyst data (should be sorted by ranking)

TYPE: DataFrame

n_top

Number of top catalysts to label

TYPE: int DEFAULT: 10

RETURNS DESCRIPTION
List[str]

List of unique display labels

Examples:

>>> labels = get_display_labels(results.head(10))
>>> print(labels)
['CuZn(211)', 'AgAu(111)', 'PdZn(100)', ...]

format_scientific(value, precision=3)

Format number in scientific notation.

PARAMETER DESCRIPTION
value

Number to format

TYPE: float

precision

Number of significant figures

TYPE: int DEFAULT: 3

RETURNS DESCRIPTION
str

Formatted scientific notation string

Examples:

>>> format_scientific(0.000123)
'1.23×10⁻⁴'
>>> format_scientific(1234567)
'1.23×10⁶'
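
One way to obtain this style of output is to format with Python's `e` notation and then translate the exponent into Unicode superscripts. A sketch under that assumption, with the hypothetical name `format_scientific_sketch`:

```python
# Map ASCII digits and the minus sign to their superscript forms.
_SUPERSCRIPTS = str.maketrans("-0123456789", "⁻⁰¹²³⁴⁵⁶⁷⁸⁹")


def format_scientific_sketch(value, precision=3):
    """Format value as 'm×10ⁿ' with `precision` significant figures."""
    mantissa, exponent = f"{value:.{precision - 1}e}".split("e")
    # int() drops the '+' sign and leading zeros, e.g. '+06' -> 6.
    return f"{mantissa}×10{str(int(exponent)).translate(_SUPERSCRIPTS)}"


print(format_scientific_sketch(0.000123))  # 1.23×10⁻⁴
print(format_scientific_sketch(1234567))   # 1.23×10⁶
```

Since `precision` counts significant figures, the mantissa is formatted with `precision - 1` decimal places.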

Functions

generate_unique_labels

Generate unique labels for catalysts with duplicate symbols.

from ascicat.utils import generate_unique_labels

df = generate_unique_labels(df, label_col='display_label')

sample_stratified

Stratified sampling by ASCI score.

from ascicat.utils import sample_stratified

sampled = sample_stratified(
    df,
    n_samples=2000,
    strata_col='ASCI',
    n_strata=4
)
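
For intuition, stratified sampling can be sketched by binning the score into quantiles and drawing an equal number of rows from each bin. Everything here is an assumption: the quantile binning, the equal allocation, the `seed` argument, and the name `sample_stratified_sketch` are illustrative and may not match what `sample_stratified` does internally.

```python
import numpy as np
import pandas as pd


def sample_stratified_sketch(df, n_samples, strata_col, n_strata, seed=0):
    """Draw roughly n_samples rows, split evenly across score quantiles."""
    # Assign each row to a quantile bin numbered 0 .. n_strata - 1.
    strata = pd.qcut(df[strata_col], q=n_strata, labels=False, duplicates="drop")
    per_stratum = n_samples // n_strata
    parts = [
        group.sample(n=min(per_stratum, len(group)), random_state=seed)
        for _, group in df.groupby(strata)
    ]
    return pd.concat(parts)


df = pd.DataFrame({"ASCI": np.linspace(0.0, 1.0, 100)})
sampled = sample_stratified_sketch(df, n_samples=20, strata_col="ASCI", n_strata=4)
print(len(sampled))  # 20
```

Compared with uniform random sampling, this keeps low-, mid-, and high-scoring catalysts all represented in the subset.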

validate_data

Validate input data format.

from ascicat.utils import validate_data

is_valid, errors = validate_data(df)
if not is_valid:
    print("Validation errors:", errors)

compute_pareto_mask

Identify Pareto-optimal points.

from ascicat.utils import compute_pareto_mask
import numpy as np

# Objectives to minimize (1 - score)
objectives = np.column_stack([
    1 - df['activity_score'],
    1 - df['stability_score'],
    1 - df['cost_score']
])

pareto_mask = compute_pareto_mask(objectives)
pareto_catalysts = df[pareto_mask]

format_results_table

Format results for display.

from ascicat.utils import format_results_table

table = format_results_table(
    results.head(10),
    columns=['symbol', 'ASCI', 'activity_score']
)
print(table)

Data Utilities

load_example_data

Load built-in example datasets.

from ascicat.utils import load_example_data

# Load HER data
her_data = load_example_data('HER')

# Load CO2RR data
co2rr_data = load_example_data('CO2RR', pathway='CO')

export_results

Export results in various formats.

from ascicat.utils import export_results

export_results(
    results,
    output_path='results.csv',
    format='csv'  # or 'xlsx', 'json'
)

Visualization Helpers

setup_figure_style

Configure matplotlib for high-quality output.

from ascicat.utils import setup_figure_style

setup_figure_style(
    font_scale=1.2,
    dpi=600
)

get_colorblind_palette

Get colorblind-safe color palette.

from ascicat.utils import get_colorblind_palette

colors = get_colorblind_palette(n_colors=5)