Standard Analysis module

class standard_analysis.StandardExperiment

Bases: object

Class for analyzing experimental data, comparing filtered vs unfiltered approaches. Processes asset data, calculates statistics, and generates visualization heatmaps.

calculate_averages(asset_data: Dict[str, DataFrame]) -> DefaultDict[str, DefaultDict[str, float64]]

Calculate average values for each asset across all samples.

Args:

asset_data: Dictionary mapping task names to DataFrames with asset data

Returns:

Nested dictionary with task names, asset names, and their average values
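The averaging step can be sketched roughly as follows. This is a minimal illustration, not the module's actual implementation: it assumes each DataFrame holds one column per asset and one row per sample, and the name calculate_averages_sketch is hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, Dict

import numpy as np
import pandas as pd


def calculate_averages_sketch(
    asset_data: Dict[str, pd.DataFrame],
) -> DefaultDict[str, DefaultDict[str, np.float64]]:
    """Average every asset column per task (assumed layout: rows are samples)."""
    averages: DefaultDict[str, DefaultDict[str, np.float64]] = defaultdict(
        lambda: defaultdict(np.float64)
    )
    for task_name, df in asset_data.items():
        # Column-wise mean gives one value per asset for this task.
        for asset_name, mean_value in df.mean(axis=0).items():
            averages[task_name][asset_name] = np.float64(mean_value)
    return averages


# Made-up example data, for illustration only.
example_data = {
    "task_a": pd.DataFrame({"AssetX": [0.5, 1.0], "AssetX_no_htl": [0.25, 0.75]})
}
example_averages = calculate_averages_sketch(example_data)
```

The nested defaultdict mirrors the documented return type, so looking up a missing task or asset yields an empty mapping or 0.0 instead of raising KeyError.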

clean_up_asset_name(asset_name: str) -> str

Remove suffixes and map to cleaned names.

This function removes the _no_htl or _random suffix from the asset name. For example, AutoFilter_Chen_Like_no_htl becomes AutoFilter_Chen_Like.

Args:

asset_name: The name of the asset to be (potentially) changed

Returns:

The cleaned asset name
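The suffix removal described above might look like the sketch below. It covers only the suffix stripping; any further mapping to cleaned names that the real method performs is not shown in this reference and is therefore omitted. The function name is hypothetical.

```python
def clean_up_asset_name_sketch(asset_name: str) -> str:
    """Strip a trailing _no_htl or _random suffix, if present."""
    for suffix in ("_no_htl", "_random"):
        if asset_name.endswith(suffix):
            # Drop exactly the suffix and keep the rest of the name.
            return asset_name[: -len(suffix)]
    return asset_name


cleaned_no_htl = clean_up_asset_name_sketch("AutoFilter_Chen_Like_no_htl")
cleaned_random = clean_up_asset_name_sketch("AutoFilter_Chen_Like_random")
cleaned_plain = clean_up_asset_name_sketch("AutoFilter_Chen_Like")
```

All three calls yield AutoFilter_Chen_Like, so variants of the same asset can be grouped under one key.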

collect_asset_paths(asset_paths: Path | None = None) -> DefaultDict[str, List]

Collect asset paths organized by task name.

Args:

asset_paths: Path to the directory containing assets. Defaults to BASE_PATH/cache/assets/COMET_WORKSPACE.

Returns:

A dictionary mapping task names to lists of asset paths.
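One way such a collection step could work is sketched below. The directory layout is an assumption made purely for illustration (one sub-directory per task, each containing one sub-directory per asset); the actual layout under BASE_PATH/cache/assets/COMET_WORKSPACE is not documented here, and collect_asset_paths_sketch is a hypothetical name.

```python
import tempfile
from collections import defaultdict
from pathlib import Path
from typing import DefaultDict, List


def collect_asset_paths_sketch(asset_root: Path) -> DefaultDict[str, List[Path]]:
    """Group asset directories by task (assumed layout: root/<task>/<asset>/)."""
    paths_by_task: DefaultDict[str, List[Path]] = defaultdict(list)
    for task_dir in sorted(p for p in asset_root.iterdir() if p.is_dir()):
        for asset_dir in sorted(p for p in task_dir.iterdir() if p.is_dir()):
            paths_by_task[task_dir.name].append(asset_dir)
    return paths_by_task


# Build a throwaway directory tree to exercise the sketch.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "task_a" / "AssetX").mkdir(parents=True)
    (root / "task_a" / "AssetX_no_htl").mkdir(parents=True)
    (root / "task_b" / "AssetX").mkdir(parents=True)
    collected = collect_asset_paths_sketch(root)
    collected_names = {
        task: [path.name for path in paths] for task, paths in collected.items()
    }
```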

create_comparison_df(data: DefaultDict[str, DefaultDict[str, float64]], filter_condition: Callable[[str], bool]) -> DataFrame

Create a DataFrame comparing filtered vs unfiltered approaches.

Args:

data: Nested dictionary with task names, asset names, and their values

filter_condition: Function to determine which assets to include

Returns:

DataFrame with mean F1-Score differences between filtered and unfiltered approaches
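A rough sketch of how such a comparison could be assembled is given below. Several details are assumptions rather than documented behavior: the baseline asset is taken to be the variant's name without its suffix, the difference is computed as baseline minus variant, and the result has one row per baseline asset and one column per task. The names create_comparison_df_sketch and strip_suffix are hypothetical.

```python
from typing import Callable, Dict

import pandas as pd

SUFFIXES = ("_no_htl", "_random")


def strip_suffix(asset_name: str) -> str:
    """Return the asset name without a trailing variant suffix."""
    for suffix in SUFFIXES:
        if asset_name.endswith(suffix):
            return asset_name[: -len(suffix)]
    return asset_name


def create_comparison_df_sketch(
    data: Dict[str, Dict[str, float]],
    filter_condition: Callable[[str], bool],
) -> pd.DataFrame:
    """Difference between each baseline asset and its selected variant."""
    differences: Dict[str, Dict[str, float]] = {}
    for task_name, assets in data.items():
        task_diffs: Dict[str, float] = {}
        for asset_name, value in assets.items():
            if not filter_condition(asset_name):
                continue  # keep only the variant under comparison
            baseline_name = strip_suffix(asset_name)
            if baseline_name in assets:
                # Assumed sign convention: baseline minus variant.
                task_diffs[baseline_name] = assets[baseline_name] - value
        differences[task_name] = task_diffs
    # Outer keys become columns (tasks), inner keys become the index (assets).
    return pd.DataFrame(differences)


# Made-up values, for illustration only.
example_scores = {
    "task_a": {"AssetX": 0.75, "AssetX_no_htl": 0.5, "AssetX_random": 0.25},
    "task_b": {"AssetX": 0.5, "AssetX_no_htl": 0.5},
}
comparison = create_comparison_df_sketch(
    example_scores, lambda name: name.endswith("_no_htl")
)
```

Passing a different predicate, for example one matching the _random suffix, reuses the same function for the second comparison.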

create_heatmap(data: DataFrame, ax, title: str)

Create a heatmap visualization of comparison data.

Args:

data: DataFrame containing comparison data

ax: Matplotlib axis to plot on

title: Title for the heatmap
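For orientation, a heatmap with this signature could be drawn along the following lines. The sketch uses plain Matplotlib (imshow plus per-cell text) as a stand-in; the plotting library, colormap, and annotation format used by the real method are not stated in this reference, and create_heatmap_sketch is a hypothetical name.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display

import matplotlib.pyplot as plt
import pandas as pd


def create_heatmap_sketch(data: pd.DataFrame, ax, title: str):
    """Draw the DataFrame as an annotated heatmap on the given axis."""
    image = ax.imshow(data.to_numpy(dtype=float), cmap="RdBu", aspect="auto")
    ax.set_xticks(range(data.shape[1]))
    ax.set_xticklabels(data.columns, rotation=45, ha="right")
    ax.set_yticks(range(data.shape[0]))
    ax.set_yticklabels(data.index)
    # Write each cell's value on top of its colored tile.
    for row in range(data.shape[0]):
        for col in range(data.shape[1]):
            ax.text(col, row, f"{data.iat[row, col]:.2f}", ha="center", va="center")
    ax.set_title(title)
    ax.figure.colorbar(image, ax=ax)
    return ax


# Made-up comparison values, for illustration only.
example_df = pd.DataFrame(
    {"task_a": [0.25, -0.1], "task_b": [0.0, 0.3]},
    index=["AssetX", "AssetY"],
)
fig, heatmap_ax = plt.subplots()
create_heatmap_sketch(example_df, heatmap_ax, "No HTL vs. HTL")
```

Taking the axis as a parameter lets a caller place several heatmaps side by side in one figure.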

filter_no_htl(asset_name: str) -> bool

Check if an asset name has the ‘_no_htl’ suffix.

Used to compare No HTL with HTL.

Args:

asset_name: The asset name to check

Returns:

True if the asset ends with _no_htl, False otherwise

filter_random(asset_name: str) -> bool

Check if an asset name has the ‘_random’ suffix.

Used to compare Random (Filled Up) with HTL.

Args:

asset_name: The asset name to check

Returns:

True if the asset ends with _random, False otherwise
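Both suffix predicates reduce to a str.endswith check. The sketch below shows them as standalone functions with hypothetical names and applies them to a made-up list of asset names; predicates of this shape match the Callable[[str], bool] type expected by create_comparison_df's filter_condition parameter.

```python
def filter_no_htl_sketch(asset_name: str) -> bool:
    """True if the asset name ends with _no_htl."""
    return asset_name.endswith("_no_htl")


def filter_random_sketch(asset_name: str) -> bool:
    """True if the asset name ends with _random."""
    return asset_name.endswith("_random")


# Illustrative asset names only.
asset_names = [
    "AutoFilter_Chen_Like",
    "AutoFilter_Chen_Like_no_htl",
    "AutoFilter_Chen_Like_random",
]
no_htl_assets = [name for name in asset_names if filter_no_htl_sketch(name)]
random_assets = [name for name in asset_names if filter_random_sketch(name)]
```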

load_asset_data(workspace_data: DefaultDict[str, List[Path]]) -> Dict[str, DataFrame]

Load asset data from files into pandas DataFrames.

Args:

workspace_data: Dictionary mapping task names to lists of asset directory paths.

Returns:

Dictionary mapping task names to DataFrames containing loaded asset data.

prepare_data()

Load and prepare data for analysis.

Returns:

Summarized data with average values for each asset and task

run()

Main execution method. Prepares data, creates visualizations, and saves results.

save_visualization(filename: str, format: str = 'pdf', dpi: int = 300)

Save visualization to file.

Args:

filename: Base filename for the output

format: File format (pdf, png, etc.)

dpi: Resolution in dots per inch

transform_into_mean_difference(asset_df: DataFrame) -> DataFrame

Transform data to show percentage differences relative to HTL baseline.

Args:

asset_df: DataFrame containing asset values

Returns:

DataFrame with values transformed to percentage differences
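The percentage difference relative to a baseline is conventionally (value - baseline) / baseline * 100. The sketch below applies that formula under an assumed layout that this reference does not confirm: one row per task, a baseline column named "HTL", and one column per compared variant. The function name and the baseline_column parameter are hypothetical.

```python
import pandas as pd


def to_percentage_difference_sketch(
    asset_df: pd.DataFrame, baseline_column: str = "HTL"
) -> pd.DataFrame:
    """Express every non-baseline column as a percentage change from the baseline."""
    baseline = asset_df[baseline_column]
    others = asset_df.drop(columns=[baseline_column])
    # Row-wise: (value - baseline) / baseline * 100
    return others.sub(baseline, axis=0).div(baseline, axis=0) * 100.0


# Made-up scores, for illustration only.
example_scores_df = pd.DataFrame(
    {"HTL": [0.5, 0.8], "No HTL": [0.25, 0.4]},
    index=["task_a", "task_b"],
)
percent_diff = to_percentage_difference_sketch(example_scores_df)
```

In this example both rows come out at -50, meaning the variant scores half of the baseline on each task.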