Jupyter Notebook Benchmarking¶
GeoBench can benchmark Python code directly inside Jupyter notebooks. This is particularly useful for data science workflows, geospatial analysis, and exploratory work where you want to iterate on and compare different approaches.
Overview¶
Jupyter notebook benchmarking in GeoBench provides:
- Interactive benchmarking within Jupyter notebook cells
- Real-time monitoring of system resources during execution
- Automatic report generation with performance metrics
- Flexible API with multiple usage patterns
- Integration with existing notebook workflows
Usage Methods¶
There are three ways to use Jupyter benchmarking functionality in GeoBench:
1. Using the Geobench Class¶
The Geobench class provides full control over the benchmarking process:
from geobench import Geobench
# Create a benchmark instance
bench = Geobench(
    name="my-benchmark",
    outdir="results",
    run_monitor=2.0,  # Monitor every 2 seconds
    clean=True        # Clean output directory before running
)
# Start benchmarking
bench.start("my-function")
# Run your code to benchmark
result = my_function()
# Finish benchmarking
bench.stop(True) # Pass True for success, False for failure
# Generate HTML report
bench.generate_report()
Parameters¶
- name: Name of the benchmark (used for output directory and report)
- outdir: Output directory for results and reports
- run_monitor: Monitoring interval in seconds (optional)
- clean: Whether to clean the output directory before running
Methods¶
- start(operation_name): Start monitoring and timing for an operation
- stop(success): Stop monitoring and record the result
- generate_report(): Generate an HTML report with results
2. Using the Benchmark Decorator¶
The decorator provides a simpler way to benchmark individual functions:
from geobench import geobench
@geobench(name="my-function-benchmark", outdir="results", clean=True)
def my_function():
    # Your code here
    import time
    time.sleep(2)  # Simulate some work
    return "result"
# Call the decorated function
result = my_function()
The decorator automatically:
- Creates a benchmark instance
- Starts monitoring before function execution
- Stops monitoring after function completion
- Generates a report with the results
Decorator Parameters¶
- name: Name of the benchmark
- outdir: Output directory for results
- clean: Whether to clean output directory
- run_monitor: Monitoring interval in seconds
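To make the decorator's start, run, stop sequence concrete, here is a minimal sketch of how a benchmarking decorator of this kind can be built with the standard library. It is an illustration of the general pattern only, not GeoBench's implementation: the names `timed` and `records` are hypothetical, and it records wall-clock time in a list instead of monitoring resources or writing a report.

```python
import functools
import time


def timed(name, records):
    """Hypothetical stand-in for a benchmarking decorator (not GeoBench's code)."""
    def decorator(func):
        @functools.wraps(func)  # keep the wrapped function's name and docstring
        def wrapper(*args, **kwargs):
            started = time.perf_counter()  # corresponds to "start monitoring"
            success = False
            try:
                result = func(*args, **kwargs)
                success = True
                return result
            finally:
                # Corresponds to "stop monitoring": record the outcome even on failure
                records.append({
                    "name": name,
                    "seconds": time.perf_counter() - started,
                    "success": success,
                })
        return wrapper
    return decorator


records = []


@timed("square", records)
def square(x):
    return x * x


value = square(4)
```

After the call, `records` holds one entry with the benchmark name, the elapsed seconds, and a success flag, which mirrors the kind of information the decorator is described as collecting.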
3. Using a Context Manager (Alternative Pattern)¶
You can also use the Geobench class as a context manager:
from geobench import Geobench
with Geobench(name="context-benchmark", outdir="results") as bench:
    bench.start("processing")
    # Your code here
    result = process_data()
    bench.stop(True)
Example: Benchmarking Geospatial Operations¶
Here's a complete example of benchmarking a geospatial operation in a Jupyter notebook:
# Cell 1: Import libraries
import geopandas as gpd
import pandas as pd
from shapely.geometry import Point
from geobench import geobench
# Cell 2: Create sample data
def create_sample_data(n_points=10000):
    """Create sample point data for benchmarking"""
    import numpy as np
    # Generate random points; latitude stays within +/-85 degrees because
    # Web Mercator (EPSG:3857), used in the next cell, is undefined at the poles
    x = np.random.uniform(-180, 180, n_points)
    y = np.random.uniform(-85, 85, n_points)
    # Create GeoDataFrame in WGS 84
    geometry = [Point(xi, yi) for xi, yi in zip(x, y)]
    gdf = gpd.GeoDataFrame({'id': range(n_points)}, geometry=geometry, crs="EPSG:4326")
    return gdf
# Cell 3: Benchmark buffer operation
@geobench(name="buffer-operation", outdir="benchmark_results")
def benchmark_buffer_operation():
    # Create sample data
    gdf = create_sample_data(50000)
    # Perform buffer operation
    buffered = gdf.to_crs("EPSG:3857").buffer(1000)  # 1km buffer
    return len(buffered)
# Cell 4: Run the benchmark
result = benchmark_buffer_operation()
print(f"Processed {result} features")
Advanced Example: Parameter Sweeping¶
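To sweep over parameters in a notebook, loop over the combinations and create one Geobench instance per run. The example below reuses create_sample_data from the previous example: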
from geobench import Geobench
# Parameters to test
buffer_distances = [100, 500, 1000, 5000]
point_counts = [1000, 5000, 10000]
# Run benchmarks for different parameter combinations
for n_points in point_counts:
    for distance in buffer_distances:
        bench_name = f"buffer-{n_points}pts-{distance}m"
        bench = Geobench(
            name=bench_name,
            outdir="parameter_sweep_results",
            clean=True
        )
        bench.start(f"buffer_{distance}m")
        # Create data and run operation
        gdf = create_sample_data(n_points)
        buffered = gdf.to_crs("EPSG:3857").buffer(distance)
        bench.stop(True)
        bench.generate_report()
        print(f"Completed: {n_points} points, {distance}m buffer")
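As the number of parameters grows, the nested loops can be flattened with `itertools.product` from the standard library. The sketch below only builds the parameter combinations and benchmark names for the sweep above; it does not call GeoBench, so it can be checked on its own before wiring it into the benchmark loop.

```python
from itertools import product

buffer_distances = [100, 500, 1000, 5000]
point_counts = [1000, 5000, 10000]

# One (n_points, distance) pair per benchmark run,
# in the same order as the nested loops above
combinations = list(product(point_counts, buffer_distances))

# Names follow the same pattern as bench_name in the sweep
bench_names = [f"buffer-{n}pts-{d}m" for n, d in combinations]
```

Checking that `bench_names` contains no duplicates before starting a long sweep is a cheap way to confirm that no two runs will share a name.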
Understanding the Output¶
When running Jupyter benchmarks, GeoBench creates:
- Output directory: Named after your benchmark
- Performance data: JSON files with timing and resource usage metrics
- HTML reports: Visual reports showing performance characteristics
- System monitoring data: CPU, memory, and I/O usage during execution
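To see what a run actually wrote, you can walk the output directory with `pathlib`. The sketch below makes no assumption about GeoBench's file names or JSON layout: the helper `list_outputs` is hypothetical, and the demo files are invented placeholders created in a temporary directory.

```python
from pathlib import Path
import tempfile


def list_outputs(outdir):
    """Return (relative path, size in bytes) for every file under outdir."""
    root = Path(outdir)
    return sorted(
        (path.relative_to(root).as_posix(), path.stat().st_size)
        for path in root.rglob("*")
        if path.is_file()
    )


# Demo on a throwaway directory; these file names are made up for illustration
with tempfile.TemporaryDirectory() as tmp:
    run_dir = Path(tmp) / "my-benchmark"
    run_dir.mkdir()
    (run_dir / "example.json").write_text("{}")
    (run_dir / "example.html").write_text("<html></html>")
    outputs = list_outputs(tmp)
```

In a notebook, passing the `outdir` you gave to GeoBench to `list_outputs` should show the generated files and their sizes.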
Report Contents¶
The generated HTML reports include:
- Execution summary: Total time, success/failure status
- Resource usage graphs: CPU, memory, disk I/O over time
- Performance metrics: Min, max, average resource usage
- System information: Hardware and software environment details
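The min, max, and average figures are plain summary statistics over the samples collected during a run. As a rough illustration, assuming the samples are available as a list of numbers (the values below are invented, and how GeoBench stores or computes them is not specified here):

```python
from statistics import mean


def summarize(samples):
    """Summarize a series of resource samples (e.g. CPU percent readings)."""
    return {"min": min(samples), "max": max(samples), "avg": mean(samples)}


# Invented CPU readings, one per monitoring interval
cpu_percent = [12.0, 48.0, 90.0, 50.0]
summary = summarize(cpu_percent)
```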
Best Practices¶
1. Meaningful Benchmark Names¶
Use descriptive names that include key parameters:
bench_name = f"ndvi-calculation-{tile_size}x{tile_size}-{band_count}bands"
2. Clean Output Directories¶
Set clean=True to ensure fresh results:
@geobench(name="my-test", clean=True)
def my_function():
    ...  # Your code here
3. Handle Errors Gracefully¶
When using the class directly, handle potential errors:
bench = Geobench(name="error-handling-example")
bench.start("risky-operation")
try:
    result = risky_operation()
    bench.stop(True)
except Exception as e:
    print(f"Error: {e}")
    bench.stop(False)
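If the same try/except pattern appears in many cells, it can be wrapped once in a small helper. The sketch below is one possible approach, not part of GeoBench: `benchmarked` is a hypothetical helper that relies only on the `start(operation_name)` and `stop(success)` methods described above, and `RecordingBench` is a stand-in object so the example runs without GeoBench installed.

```python
from contextlib import contextmanager


@contextmanager
def benchmarked(bench, operation_name):
    """Call bench.start() on entry and bench.stop(True/False) on exit."""
    bench.start(operation_name)
    try:
        yield bench
    except Exception:
        bench.stop(False)  # record the failure, then let the error propagate
        raise
    else:
        bench.stop(True)


class RecordingBench:
    """Stand-in with the same start/stop shape (hypothetical, for the demo only)."""

    def __init__(self):
        self.calls = []

    def start(self, operation_name):
        self.calls.append(("start", operation_name))

    def stop(self, success):
        self.calls.append(("stop", success))


ok_bench = RecordingBench()
with benchmarked(ok_bench, "safe-operation"):
    total = sum(range(10))

failed_bench = RecordingBench()
try:
    with benchmarked(failed_bench, "risky-operation"):
        raise ValueError("simulated failure")
except ValueError:
    pass  # the failure was recorded before the error reached this handler
```

Unlike the inline version, this helper re-raises the exception after recording the failure, so the notebook cell still shows the traceback.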
4. Monitor Resource Usage¶
Use appropriate monitoring intervals based on operation duration:
# For long operations (>1 minute)
bench = Geobench(name="long-op", run_monitor=5.0)
# For short operations (<30 seconds)
bench = Geobench(name="short-op", run_monitor=0.5)
Integration with Existing Workflows¶
Jupyter benchmarking fits into existing data science and geospatial workflows, for example:
- Data exploration: Benchmark different data loading strategies
- Algorithm comparison: Compare performance of different algorithms
- Parameter optimization: Find optimal parameters for processing
- Scalability testing: Test performance with different data sizes
- Resource profiling: Understand resource requirements for operations
This makes GeoBench's Jupyter support well suited to research, development, and optimization of geospatial processing workflows.