Standard Analysis module
- class standard_analysis.StandardExperiment
Bases: object
Analyzes experimental data by comparing filtered and unfiltered approaches. The class processes asset data, calculates statistics, and generates heatmap visualizations.
- calculate_averages(asset_data: Dict[str, DataFrame]) → DefaultDict[str, DefaultDict[str, float64]]
Calculate average values for each asset across all samples.
- Args:
asset_data: Dictionary mapping task names to DataFrames with asset data
- Returns:
Nested dictionary with task names, asset names, and their average values
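A minimal sketch of the averaging step follows. It assumes, hypothetically, that each task's DataFrame has one column per asset and one row per sample. The real column layout is not documented here.

```python
from collections import defaultdict
from typing import DefaultDict, Dict

import numpy as np
import pandas as pd


def calculate_averages(
    asset_data: Dict[str, pd.DataFrame],
) -> DefaultDict[str, DefaultDict[str, np.float64]]:
    # Assumption: columns are asset names, rows are individual samples.
    averages: DefaultDict[str, DefaultDict[str, np.float64]] = defaultdict(
        lambda: defaultdict(np.float64)
    )
    for task_name, df in asset_data.items():
        for asset_name in df.columns:
            # Mean over all samples; pandas skips NaN values by default.
            averages[task_name][asset_name] = np.float64(df[asset_name].mean())
    return averages


data = {
    "task_a": pd.DataFrame(
        {
            "AutoFilter_Chen_Like": [0.75, 0.25],
            "AutoFilter_Chen_Like_no_htl": [0.5, 0.25],
        }
    )
}
result = calculate_averages(data)
print(result["task_a"]["AutoFilter_Chen_Like"])  # 0.5
```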
- clean_up_asset_name(asset_name: str) → str
Remove known suffixes from an asset name and map it to its cleaned name.
This function removes the _no_htl or _random suffix from the asset name. For example, AutoFilter_Chen_Like_no_htl is transformed into AutoFilter_Chen_Like.
- Args:
asset_name: The name of the asset to be (potentially) changed
- Returns:
The cleaned asset name
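The suffix removal described above can be sketched as follows. The NAME_MAP dictionary is a hypothetical stand-in for the "map to cleaned names" step, whose actual mapping is not documented here.

```python
from typing import Dict

SUFFIXES = ("_no_htl", "_random")

# Hypothetical mapping from stripped names to display names; left empty
# because the real mapping is not documented.
NAME_MAP: Dict[str, str] = {}


def clean_up_asset_name(asset_name: str) -> str:
    for suffix in SUFFIXES:
        if asset_name.endswith(suffix):
            # Strip only a trailing suffix, never an inner occurrence.
            asset_name = asset_name[: -len(suffix)]
            break
    # Fall back to the stripped name when no mapping entry exists.
    return NAME_MAP.get(asset_name, asset_name)


print(clean_up_asset_name("AutoFilter_Chen_Like_no_htl"))  # AutoFilter_Chen_Like
print(clean_up_asset_name("AutoFilter_Chen_Like_random"))  # AutoFilter_Chen_Like
```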
- collect_asset_paths(asset_paths: Path | None = None) → DefaultDict[str, List]
Collect asset paths organized by task name.
- Args:
asset_paths: Path to the directory containing assets. Defaults to BASE_PATH/cache/assets/COMET_WORKSPACE.
- Returns:
A dictionary mapping task names to lists of asset paths.
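A minimal sketch of grouping asset paths by task follows. It assumes a hypothetical directory layout of <asset_paths>/<task_name>/<asset_dir>/, which the documentation does not confirm. The sketch also takes the path explicitly and does not reproduce the BASE_PATH default.

```python
import tempfile
from collections import defaultdict
from pathlib import Path
from typing import DefaultDict, List


def collect_asset_paths(asset_paths: Path) -> DefaultDict[str, List[Path]]:
    # Assumed layout: <asset_paths>/<task_name>/<asset_dir>/
    grouped: DefaultDict[str, List[Path]] = defaultdict(list)
    for task_dir in sorted(p for p in asset_paths.iterdir() if p.is_dir()):
        for asset_dir in sorted(p for p in task_dir.iterdir() if p.is_dir()):
            grouped[task_dir.name].append(asset_dir)
    return grouped


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "task_a" / "AutoFilter_Chen_Like").mkdir(parents=True)
    (root / "task_a" / "AutoFilter_Chen_Like_no_htl").mkdir(parents=True)
    names = {task: [p.name for p in ps] for task, ps in collect_asset_paths(root).items()}

print(names)
```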
- create_comparison_df(data: DefaultDict[str, DefaultDict[str, float64]], filter_condition: Callable[[str], bool]) → DataFrame
Create a DataFrame comparing filtered vs unfiltered approaches.
- Args:
data: Nested dictionary with task names, asset names, and their values
filter_condition: Function that determines which assets to include
- Returns:
DataFrame with mean F1-Score differences between filtered and unfiltered approaches
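A sketch of one way to build the comparison follows. Each asset that passes filter_condition is paired with its suffix-free (HTL) counterpart within the same task, and the difference is stored. The strip_suffix helper is a hypothetical stand-in for clean_up_asset_name. The sign convention (HTL minus variant) is an assumption.

```python
from typing import Callable, Dict

import pandas as pd


def strip_suffix(name: str) -> str:
    # Hypothetical stand-in for clean_up_asset_name.
    for suffix in ("_no_htl", "_random"):
        if name.endswith(suffix):
            return name[: -len(suffix)]
    return name


def create_comparison_df(
    data: Dict[str, Dict[str, float]],
    filter_condition: Callable[[str], bool],
) -> pd.DataFrame:
    rows: Dict[str, Dict[str, float]] = {}
    for task_name, assets in data.items():
        for asset_name, value in assets.items():
            if not filter_condition(asset_name):
                continue
            baseline = strip_suffix(asset_name)
            if baseline not in assets:
                continue  # no HTL counterpart to compare against
            # Assumed sign: positive means the HTL variant scores higher.
            rows.setdefault(task_name, {})[baseline] = assets[baseline] - value
    # Rows are tasks, columns are cleaned asset names.
    return pd.DataFrame.from_dict(rows, orient="index")


data = {"task_a": {"AutoFilter_Chen_Like": 0.75, "AutoFilter_Chen_Like_no_htl": 0.5}}
df = create_comparison_df(data, lambda name: name.endswith("_no_htl"))
print(df)
```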
- create_heatmap(data: DataFrame, ax, title: str)
Create a heatmap visualization of comparison data.
- Args:
data: DataFrame containing comparison data
ax: Matplotlib axis to plot on
title: Title for the heatmap
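A minimal heatmap sketch using plain matplotlib calls follows. The diverging colormap, the zero-centred color limits, and the two-decimal cell annotations are assumptions. The documentation does not specify how the original renders the heatmap.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def create_heatmap(data: pd.DataFrame, ax, title: str):
    values = data.to_numpy(dtype=float)
    # Symmetric color limits so gains and losses share one scale around zero.
    limit = float(np.nanmax(np.abs(values)))
    if not limit > 0:  # also catches NaN
        limit = 1.0
    image = ax.imshow(values, cmap="RdBu", vmin=-limit, vmax=limit)
    ax.set_xticks(range(data.shape[1]))
    ax.set_xticklabels([str(c) for c in data.columns], rotation=45, ha="right")
    ax.set_yticks(range(data.shape[0]))
    ax.set_yticklabels([str(i) for i in data.index])
    # Annotate every cell with its value.
    for i in range(data.shape[0]):
        for j in range(data.shape[1]):
            ax.text(j, i, f"{values[i, j]:.2f}", ha="center", va="center")
    ax.set_title(title)
    return image


df = pd.DataFrame({"A": [0.1, -0.2], "B": [0.05, 0.0]}, index=["task_a", "task_b"])
fig, ax = plt.subplots()
create_heatmap(df, ax, "HTL vs No HTL")
```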
- filter_no_htl(asset_name: str) → bool
Check if an asset name has the ‘_no_htl’ suffix.
Used to compare No HTL with HTL.
- Args:
asset_name: The asset name to check
- Returns:
True if the asset name ends with _no_htl, False otherwise
- filter_random(asset_name: str) → bool
Check if an asset name has the ‘_random’ suffix.
Used to compare Random (Filled Up) with HTL.
- Args:
asset_name: The asset name to check
- Returns:
True if the asset name ends with _random, False otherwise
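As described, filter_no_htl and filter_random reduce to a trailing-suffix check. The following is a minimal sketch of both predicates under that reading. Either one can serve as the filter_condition argument of create_comparison_df.

```python
def filter_no_htl(asset_name: str) -> bool:
    # True only for a trailing "_no_htl", not for the substring elsewhere.
    return asset_name.endswith("_no_htl")


def filter_random(asset_name: str) -> bool:
    # True only for a trailing "_random".
    return asset_name.endswith("_random")


assets = [
    "AutoFilter_Chen_Like",
    "AutoFilter_Chen_Like_no_htl",
    "AutoFilter_Chen_Like_random",
]
no_htl = [a for a in assets if filter_no_htl(a)]
random_ = [a for a in assets if filter_random(a)]
print(no_htl)   # ['AutoFilter_Chen_Like_no_htl']
print(random_)  # ['AutoFilter_Chen_Like_random']
```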
- load_asset_data(workspace_data: DefaultDict[str, List[Path]]) → Dict[str, DataFrame]
Load asset data from files into pandas DataFrames.
- Args:
workspace_data: Dictionary mapping task names to lists of asset directory paths.
- Returns:
Dictionary mapping task names to DataFrames containing loaded asset data.
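A sketch of the loading step follows. The file name metrics.csv and the column name f1 are hypothetical, because the documentation does not say how asset files are stored. Each asset directory becomes one column of the task's DataFrame, which matches the layout assumed for averaging.

```python
import tempfile
from collections import defaultdict
from pathlib import Path
from typing import DefaultDict, Dict, List

import pandas as pd

METRICS_FILE = "metrics.csv"  # hypothetical file name
SCORE_COLUMN = "f1"  # hypothetical column name


def load_asset_data(workspace_data: DefaultDict[str, List[Path]]) -> Dict[str, pd.DataFrame]:
    loaded: Dict[str, pd.DataFrame] = {}
    for task_name, asset_dirs in workspace_data.items():
        columns = {}
        for asset_dir in asset_dirs:
            csv_path = asset_dir / METRICS_FILE
            if csv_path.is_file():
                # One column per asset, one row per sample.
                columns[asset_dir.name] = pd.read_csv(csv_path)[SCORE_COLUMN]
        loaded[task_name] = pd.DataFrame(columns)
    return loaded


with tempfile.TemporaryDirectory() as tmp:
    asset_dir = Path(tmp) / "AutoFilter_Chen_Like"
    asset_dir.mkdir()
    (asset_dir / METRICS_FILE).write_text("f1\n0.75\n0.25\n")
    frames = load_asset_data(defaultdict(list, {"task_a": [asset_dir]}))

print(frames["task_a"])
```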
- prepare_data()
Load and prepare data for analysis.
- Returns:
Summarized data with average values for each asset and task
- run()
Main execution method. Prepares data, creates visualizations, and saves results.
- save_visualization(filename: str, format: str = 'pdf', dpi: int = 300)
Save visualization to file.
- Args:
filename: Base filename for the output
format: File format (pdf, png, etc.)
dpi: Resolution in dots per inch
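A minimal sketch of saving a figure follows. It appends the format as the file extension and passes format and dpi through to matplotlib's Figure.savefig. Here the figure is passed in explicitly as a hypothetical parameter, because the documentation does not say where the method obtains its figure.

```python
import os
import tempfile

import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt


def save_visualization(fig, filename: str, format: str = "pdf", dpi: int = 300) -> str:
    # Hypothetical signature: `fig` is an explicit argument in this sketch.
    path = f"{filename}.{format}"
    fig.savefig(path, format=format, dpi=dpi, bbox_inches="tight")
    return path


fig, ax = plt.subplots()
ax.plot([0, 1], [0, 1])
with tempfile.TemporaryDirectory() as tmp:
    out = save_visualization(fig, os.path.join(tmp, "comparison"), format="png", dpi=100)
    exists = os.path.exists(out)
print(exists)  # True
```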
- transform_into_mean_difference(asset_df: DataFrame) → DataFrame
Transform data to show percentage differences relative to the HTL baseline.
- Args:
asset_df: DataFrame containing asset values
- Returns:
DataFrame with values transformed to percentage differences
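A sketch of the percentage-difference transform follows. It assumes a hypothetical layout with one row per task, one column per approach, and a baseline column named "HTL". Each value x becomes 100 × (x − x_HTL) / x_HTL. A zero baseline would yield inf.

```python
import pandas as pd

BASELINE = "HTL"  # hypothetical name of the baseline column


def transform_into_mean_difference(asset_df: pd.DataFrame) -> pd.DataFrame:
    baseline = asset_df[BASELINE]
    # Row-wise: 100 * (value - baseline) / baseline for every column.
    diff = asset_df.sub(baseline, axis=0).div(baseline, axis=0) * 100
    # The baseline column is 0 by construction, so drop it.
    return diff.drop(columns=BASELINE)


df = pd.DataFrame({"HTL": [0.5, 0.8], "No HTL": [0.25, 0.8]}, index=["task_a", "task_b"])
result = transform_into_mean_difference(df)
print(result)
```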