Analyzing Results¶

After running benchmarks with GeoBench, you'll want to analyze the results to draw meaningful conclusions. This guide explains how to interpret and analyze GeoBench results.

Result Directory Structure¶

When you run a benchmark, GeoBench creates an output directory with the following structure:

benchmark-name/
├── report.html           # HTML report with charts and tables
├── result.json           # Overall benchmark results
└── set_N/                # For each parameter set
    └── run_M/            # For each run repetition
        ├── result.json   # Individual run results
        ├── summary.json  # Run summary
        ├── [input files] # Copied input files (if applicable)
        └── [output files]# Generated output files

Understanding Result Files¶

result.json¶

The main result.json file contains:

Benchmark configuration
System information
Baseline and endline system metrics
Overall benchmark results

set_N/run_M/result.json¶

Each run-specific result.json file contains:

Run parameters
Execution time
System resource usage during execution
Start and end timestamps
Command executed
Return code and any output

set_N/run_M/summary.json¶

The summary.json files contain statistical summaries of the runs:

Minimum, maximum, and average execution times
Resource usage statistics
Standard deviations

HTML Report¶

The report.html file provides a visual representation of the benchmark results:

Overview Section: Shows benchmark configuration and system information
Performance Charts: Displays execution times across different parameter sets
Resource Usage Charts: Shows CPU, memory, and disk usage
Comparison Tables: Compares metrics across different parameter combinations
Statistical Summary: Provides statistical analysis of the benchmark results

Interpreting Benchmark Results¶

Performance Analysis¶

Execution Time: The primary metric is execution time. Look at both average times and their variability.
Scaling Behavior: Analyze how performance scales with different parameters (e.g., input sizes, core counts).
Resource Usage: Examine CPU, memory, and disk I/O to identify potential bottlenecks.
Outliers: Pay attention to runs with unusually high or low execution times.

Common Performance Patterns¶

Linear Scaling: Execution time increases linearly with input size
Diminishing Returns: Adding more cores provides less benefit after a certain point
Resource Saturation: Performance plateaus when a resource (CPU, memory, disk) is saturated
Cache Effects: Unusual performance jumps when data size exceeds cache sizes

Comparing Different Scenarios¶

To compare different benchmark scenarios:

Run multiple benchmarks with consistent configuration settings
Compare the generated HTML reports
Look for differences in execution time and resource usage
Consider creating custom comparison charts using the data from result.json files

Command-Line Analysis¶

You can use standard command-line tools to extract specific metrics from result files:

# Get execution times for all runs
jq '.duration' benchmark/*/run_*/result.json

# Get average execution time
jq '.duration' benchmark/*/run_*/result.json | awk '{ sum += $1; n++ } END { print sum / n }'

# Compare CPU usage across different parameter sets
jq '.cpu_percent' benchmark/*/run_*/result.json

Custom Analysis¶

For more advanced analysis:

Use the JSON files as data sources for custom scripts
Import the data into tools like Python with pandas and matplotlib
Create custom visualizations to highlight specific aspects of performance
Combine results from multiple benchmark runs

Example Analysis Script¶

Here's a simple Python script to analyze multiple benchmark results:

import json
import glob
import pandas as pd
import matplotlib.pyplot as plt
import os

# Load all result files
result_files = glob.glob('benchmark/*/run_*/result.json')
results = []

for file in result_files:
    parts = file.split('/')
    set_id = int(parts[-3].split('_')[1])
    run_id = int(parts[-2].split('_')[1])

    with open(file, 'r') as f:
        data = json.load(f)

    # Extract parameters from the path
    data['set_id'] = set_id
    data['run_id'] = run_id

    results.append(data)

# Convert to DataFrame
df = pd.DataFrame(results)

# Create charts
plt.figure(figsize=(10, 6))
df.groupby('set_id')['duration'].mean().plot(kind='bar')
plt.title('Average Execution Time by Parameter Set')
plt.xlabel('Parameter Set')
plt.ylabel('Time (s)')
plt.savefig('execution_time.png')

Best Practices for Result Analysis¶

Run multiple repetitions to get statistically significant results
Control for external factors (background processes, system load)
Compare similar configurations to isolate the impact of specific parameters
Look for anomalies that might indicate measurement errors or system issues
Validate findings by rerunning important benchmarks
Document the context of your benchmark (hardware, software versions)
Share all result files when comparing with others, not just summary statistics