Standard Analysis module

class standard_analysis.StandardExperiment

Bases: object

Class for analyzing experimental data, comparing filtered vs unfiltered approaches. Processes asset data, calculates statistics, and generates visualization heatmaps.

calculate_averages(asset_data: Dict[str, DataFrame]) → DefaultDict[str, DefaultDict[str, float64]]

Calculate average values for each asset across all samples.

Args:
    asset_data: Dictionary mapping task names to DataFrames with asset data
Returns:
    Nested dictionary with task names, asset names, and their average values
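The averaging step might look roughly like the sketch below. It assumes each DataFrame holds one column per asset and one row per sample, which this page does not state; calculate_averages_sketch and the example values are hypothetical and are not taken from the module.

```python
from collections import defaultdict
from typing import DefaultDict, Dict

import numpy as np
import pandas as pd


def calculate_averages_sketch(
    asset_data: Dict[str, pd.DataFrame],
) -> DefaultDict[str, DefaultDict[str, np.float64]]:
    """Hypothetical stand-in: average each asset column over all samples."""
    averages: DefaultDict[str, DefaultDict[str, np.float64]] = defaultdict(
        lambda: defaultdict(np.float64)
    )
    for task_name, frame in asset_data.items():
        # Assumption: columns are asset names, rows are samples.
        for asset_name, mean_value in frame.mean(axis=0).items():
            averages[task_name][asset_name] = np.float64(mean_value)
    return averages


# Made-up values, for illustration only.
example = {
    "task_a": pd.DataFrame(
        {"asset_x": [0.5, 0.7], "asset_x_no_htl": [0.4, 0.6]}
    )
}
result = calculate_averages_sketch(example)
```

The nested defaultdict mirrors the documented return type, so a lookup for an unseen task or asset yields an empty mapping or 0.0 rather than raising KeyError.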

clean_up_asset_name(asset_name: str) → str

Remove suffixes and map to cleaned names.

Removes the _no_htl and _random suffixes from the asset name. For example, AutoFilter_Chen_Like_no_htl becomes AutoFilter_Chen_Like.

Args:
    asset_name: The name of the asset to be (potentially) changed
Returns:
    The cleaned asset name
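A minimal sketch of the suffix stripping described above. Only the two suffixes named on this page are handled. The additional mapping to cleaned names is not documented here, so it is left out; clean_up_asset_name_sketch is a hypothetical stand-in, not the module's code.

```python
# Suffixes named in the description above.
SUFFIXES = ("_no_htl", "_random")


def clean_up_asset_name_sketch(asset_name: str) -> str:
    """Hypothetical stand-in: strip a trailing _no_htl or _random suffix."""
    for suffix in SUFFIXES:
        if asset_name.endswith(suffix):
            return asset_name[: -len(suffix)]
    # Names without a known suffix are returned unchanged.
    return asset_name


cleaned = clean_up_asset_name_sketch("AutoFilter_Chen_Like_no_htl")
```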

collect_asset_paths(asset_paths: Path | None = None) → DefaultDict[str, List]

Collect asset paths organized by task name.

Args:
    asset_paths: Path to the directory containing assets. Defaults to BASE_PATH/cache/assets/COMET_WORKSPACE.
Returns:
    A dictionary mapping task names to lists of asset paths.
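The grouping could be sketched as follows, under the assumption that the asset directory contains one subdirectory per task, each holding one subdirectory per asset. This page does not describe the on-disk layout, so that structure, the function name collect_asset_paths_sketch, and the directory names are all hypothetical.

```python
import tempfile
from collections import defaultdict
from pathlib import Path
from typing import DefaultDict, List


def collect_asset_paths_sketch(root: Path) -> DefaultDict[str, List[Path]]:
    """Hypothetical stand-in: group asset directories by their task directory."""
    grouped: DefaultDict[str, List[Path]] = defaultdict(list)
    # Assumed layout: <root>/<task_name>/<asset_name>/
    for task_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        for asset_dir in sorted(p for p in task_dir.iterdir() if p.is_dir()):
            grouped[task_dir.name].append(asset_dir)
    return grouped


# Build a throwaway directory tree to exercise the sketch.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "task_a" / "asset_x").mkdir(parents=True)
    (root / "task_a" / "asset_x_no_htl").mkdir(parents=True)
    (root / "task_b" / "asset_y").mkdir(parents=True)
    collected = collect_asset_paths_sketch(root)
    summary = {task: [p.name for p in paths] for task, paths in collected.items()}
```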

create_comparison_df(data: DefaultDict[str, DefaultDict[str, float64]], filter_condition: Callable[[str], bool]) → DataFrame

Create a DataFrame comparing filtered vs unfiltered approaches.

Args:
    data: Nested dictionary with task names, asset names, and their values
    filter_condition: Function to determine which assets to include
Returns:
    DataFrame with mean F1-Score differences between filtered and unfiltered approaches

create_heatmap(data: DataFrame, ax, title: str)

Create a heatmap visualization of comparison data.

Args:
    data: DataFrame containing comparison data
    ax: Matplotlib axis to plot on
    title: Title for the heatmap

filter_no_htl(asset_name: str) → bool

Check if an asset name has the ‘_no_htl’ suffix.

Used to compare No HTL with HTL.

Args:
    asset_name: The asset name to check
Returns:
    True if the asset ends with _no_htl, False otherwise

filter_random(asset_name: str) → bool

Check if an asset name has the ‘_random’ suffix.

Used to compare Random (Filled Up) with HTL.

Args:
    asset_name: The asset name to check
Returns:
    True if the asset ends with _random, False otherwise
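The two suffix predicates share the Callable[[str], bool] shape that create_comparison_df accepts as filter_condition. The sketch below shows how such predicates could select assets by suffix. The predicate bodies follow the descriptions on this page, while select_assets and the *_sketch names are hypothetical helpers added for illustration.

```python
from typing import Callable, List


def filter_no_htl_sketch(asset_name: str) -> bool:
    """True if the name ends with the _no_htl suffix."""
    return asset_name.endswith("_no_htl")


def filter_random_sketch(asset_name: str) -> bool:
    """True if the name ends with the _random suffix."""
    return asset_name.endswith("_random")


def select_assets(names: List[str], condition: Callable[[str], bool]) -> List[str]:
    """Hypothetical helper: keep only the names accepted by the predicate."""
    return [name for name in names if condition(name)]


names = [
    "AutoFilter_Chen_Like",
    "AutoFilter_Chen_Like_no_htl",
    "AutoFilter_Chen_Like_random",
]
no_htl = select_assets(names, filter_no_htl_sketch)
random_only = select_assets(names, filter_random_sketch)
```

Passing the predicate as an argument lets the same comparison logic serve both the No HTL and the Random (Filled Up) comparison.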

load_asset_data(workspace_data: DefaultDict[str, List[Path]]) → Dict[str, DataFrame]

Load asset data from files into pandas DataFrames.

Args:
    workspace_data: Dictionary mapping task names to lists of asset directory paths.
Returns:
    Dictionary mapping task names to DataFrames containing loaded asset data.

prepare_data()

Load and prepare data for analysis.

Returns:
    Summarized data with average values for each asset and task

run()

Main execution method. Prepares data, creates visualizations, and saves results.

save_visualization(filename: str, format: str = 'pdf', dpi: int = 300)

Save visualization to file.

Args:
    filename: Base filename for the output
    format: File format (pdf, png, etc.)
    dpi: Resolution in dots per inch

transform_into_mean_difference(asset_df: DataFrame) → DataFrame

Transform data to show percentage differences relative to HTL baseline.

Args:
    asset_df: DataFrame containing asset values
Returns:
    DataFrame with values transformed to percentage differences
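As a worked example of a percentage difference relative to a baseline, the sketch below uses plain floats rather than a DataFrame. The formula (value - baseline) / baseline * 100 and its sign convention are assumptions, since this page does not give the exact computation; percent_difference and the numbers are made up for illustration.

```python
def percent_difference(value: float, baseline: float) -> float:
    """Assumed formula: signed change of value relative to baseline, in percent."""
    return (value - baseline) / baseline * 100.0


# Made-up scores, for illustration only.
htl_baseline = 0.80
variants = {"no_htl": 0.76, "random": 0.84}

differences = {
    name: percent_difference(score, htl_baseline)
    for name, score in variants.items()
}
```

Under this convention a negative value means the variant scored below the HTL baseline and a positive value means it scored above.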
