In the Python data ecosystem, Pandas has long been the de facto library for data manipulation and analysis. However, a relatively new library called Polars is gaining attention for its speed, memory efficiency, and advanced features like lazy evaluation. Polars uses Apache Arrow under the hood and is written in Rust, providing a modern columnar engine that can outperform Pandas in many scenarios.

In this article, we’ll walk through:

What Polars is and the core idea behind it
Our performance test setup
Benchmark results comparing Pandas and Polars
Conclusions and when to consider each library

What Is Polars?

Polars is a fast, multi-threaded DataFrame library designed to leverage columnar data processing. Its core advantages include:

Columnar memory layout: This design can lead to faster queries and lower memory usage.
Lazy execution: Polars can optimize query plans before actually running them.
Rust-based: Polars is written in Rust, providing performance and safety.
Arrow integration: It uses Apache Arrow memory format, making it easy to interoperate with other Arrow-based tools.

Because of these features, Polars can handle large datasets more efficiently than Pandas in many cases.

Polars is also fairly similar to Pandas in structure and syntax, and you can convert Polars DataFrames to Pandas (and vice versa) without much hassle.

Test Setup

Hardware & Environment

Hardware: Typical laptop/desktop with multiple cores and sufficient RAM
Software: Python 3.9+ (Polars supports older versions, too), Pandas 1.x, Polars latest version
Operating System: Tested on Linux/Mac/Windows (depending on your environment)

Data Generation

We generated synthetic data for four dataset sizes:

10,000 rows
1,000,000 rows
10,000,000 rows
100,000,000 rows

Each dataset includes:

numeric1 and numeric2: floating-point columns (Gaussian random)
integer: an integer column with random values between 0 and 99
category: a categorical column with values A, B, or C
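To get a feel for what "sufficient RAM" means at the largest size, here is a rough back-of-the-envelope estimate. It counts only the three 8-byte numeric columns (float64 for numeric1 and numeric2, int64 for integer) and ignores the category column and any library overhead, so treat it as a lower bound rather than a measurement. The function name is hypothetical.

```python
BYTES_PER_VALUE = 8   # float64 and int64 both use 8 bytes per value
NUMERIC_COLUMNS = 3   # numeric1, numeric2, integer


def numeric_footprint_bytes(rows):
    """Lower-bound size of the numeric columns for a given row count."""
    return rows * NUMERIC_COLUMNS * BYTES_PER_VALUE


for rows in (10_000, 1_000_000, 10_000_000, 100_000_000):
    gib = numeric_footprint_bytes(rows) / 1024**3
    print(f"{rows:>11,} rows: at least {gib:.3f} GiB")
```

At 100,000,000 rows the numeric columns alone come to 2.4 GB (about 2.2 GiB), and the script keeps both a Pandas and a Polars version of each dataset, so actual usage may be considerably higher.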

Operations Tested

Filter: Select rows based on a condition (numeric1 > 0).
Select: Select two columns and force materialization by summing them (so we actually measure real work).
GroupBy: Group by the categorical column and compute mean, sum, and max on different numeric columns.
Join (only for data sizes below 50,000 to prevent memory issues): Perform an inner join on a generated key.
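The row limit on the join makes sense once you look at the key: the script at the end of this article derives it as integer % 10, so there are only 10 distinct keys, and each dataset is joined with itself. An inner join emits one row per matching pair, so the output grows roughly quadratically with the input. The sketch below counts the output rows without materializing the join. The helper name `inner_join_row_count` is hypothetical, and it uses Python's `random` module rather than the NumPy generator in the script, so the exact count will differ slightly from a real run.

```python
import random
from collections import Counter


def inner_join_row_count(left_keys, right_keys):
    """Number of rows an inner join on these keys would produce."""
    left_counts = Counter(left_keys)
    right_counts = Counter(right_keys)
    # Each left row with key k pairs with every right row with key k.
    return sum(count * right_counts[key] for key, count in left_counts.items())


random.seed(42)
integers = [random.randrange(100) for _ in range(10_000)]
keys = [value % 10 for value in integers]

# Self-join, as in the benchmark.
rows_out = inner_join_row_count(keys, keys)
print(f"Input rows: {len(keys):,}  ->  join output rows: {rows_out:,}")
```

With perfectly even keys, 10,000 input rows give 10 × 1,000 × 1,000 = 10,000,000 output rows. The same join at 1,000,000 rows would produce about 100 billion rows, which is why it is only run on the smallest dataset.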

Code Overview

The code is organized into these sections:

Data generation
Operation functions (Pandas vs Polars)
Benchmark execution with a timer decorator
Results analysis and printing a summary
Visualization (plots saved in benchmark_results/plots/)

The full code is included at the end of the article.
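One caveat about the timing: the script calls each operation once per dataset size, so the millisecond-scale results at 10,000 rows are sensitive to noise. A possible variation, which is not what produced the numbers in this article, is to repeat each call and report the minimum or median. The helper name `time_repeated` and its `repeats` parameter are hypothetical.

```python
import gc
import statistics
import time


def time_repeated(func, *args, repeats=5, **kwargs):
    """Call func several times and summarize the wall-clock timings."""
    timings = []
    for _ in range(repeats):
        gc.collect()  # same idea as the timer decorator: start from a clean state
        start = time.perf_counter()
        func(*args, **kwargs)
        timings.append(time.perf_counter() - start)
    return {
        "min": min(timings),
        "median": statistics.median(timings),
        "runs": timings,
    }


# Usage with a stand-in workload; in the benchmark this would be an
# undecorated Pandas or Polars operation.
stats = time_repeated(sum, range(1_000_000), repeats=5)
print(f"min={stats['min']:.6f}s  median={stats['median']:.6f}s")
```

The minimum is a common choice for micro-benchmarks because it is the run least disturbed by other activity on the machine, while the median is more representative of typical behavior.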

Results & Plots

Below is the final summary of the benchmark, followed by some plots illustrating the performance. Note that the join operation was skipped for data sizes larger than 50,000 rows, so it’s only reflected in the smallest dataset (10,000 rows).

Benchmark Summary

PANDAS VS POLARS BENCHMARK SUMMARY
==================================================

RESULTS FOR 10,000 ROWS
--------------------------------------------------
Operation  | Pandas (s)   | Polars (s)   | Speedup    | Winner
----------------------------------------------------------------------
Filter     | 0.001773     | 0.003143     | 0.56       | Pandas
GroupBy    | 0.005679     | 0.003496     | 1.62       | Polars
Join       | 1.969916     | 0.143235     | 13.75      | Polars
Select     | 0.003076     | 0.001214     | 2.53       | Polars
----------------------------------------------------------------------
Operations where Polars wins: 3
Operations where Pandas wins: 1

RESULTS FOR 1,000,000 ROWS
--------------------------------------------------
Operation  | Pandas (s)   | Polars (s)   | Speedup    | Winner
----------------------------------------------------------------------
Filter     | 0.027700     | 0.016381     | 1.69       | Polars
GroupBy    | 0.081032     | 0.051752     | 1.57       | Polars
Select     | 0.016906     | 0.001551     | 10.90      | Polars
----------------------------------------------------------------------
Operations where Polars wins: 3
Operations where Pandas wins: 0

RESULTS FOR 10,000,000 ROWS
--------------------------------------------------
Operation  | Pandas (s)   | Polars (s)   | Speedup    | Winner
----------------------------------------------------------------------
Filter     | 0.182068     | 0.049499     | 3.68       | Polars
GroupBy    | 0.562881     | 0.065699     | 8.57       | Polars
Select     | 0.098851     | 0.016809     | 5.88       | Polars
----------------------------------------------------------------------
Operations where Polars wins: 3
Operations where Pandas wins: 0

RESULTS FOR 100,000,000 ROWS
--------------------------------------------------
Operation  | Pandas (s)   | Polars (s)   | Speedup    | Winner
----------------------------------------------------------------------
Filter     | 2.050741     | 0.714646     | 2.87       | Polars
GroupBy    | 4.858952     | 1.035741     | 4.69       | Polars
Select     | 0.937131     | 0.097260     | 9.64       | Polars
----------------------------------------------------------------------
Operations where Polars wins: 3
Operations where Pandas wins: 0

=== OVERALL CONCLUSIONS ===
Total operations where Polars wins: 12
Total operations where Pandas wins: 1

From these numbers:

Polars consistently outperforms Pandas in GroupBy and Select operations.
Filter is faster in Pandas at 10,000 rows, but Polars overtakes it from 1,000,000 rows upward.
For Join (at 10,000 rows), Polars is significantly faster.
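The Speedup column in the tables above is the Pandas time divided by the Polars time, so values above 1 favor Polars and values below 1 favor Pandas. As a small worked check, the sketch below recomputes the 10,000-row figures from the reported timings. The function and variable names are only for illustration.

```python
def speedup(pandas_seconds, polars_seconds):
    """Pandas time divided by Polars time; > 1 means Polars was faster."""
    return pandas_seconds / polars_seconds


# (Pandas seconds, Polars seconds) taken from the 10,000-row table.
timings_10k = {
    "Filter": (0.001773, 0.003143),
    "GroupBy": (0.005679, 0.003496),
    "Join": (1.969916, 0.143235),
    "Select": (0.003076, 0.001214),
}

for operation, (pandas_s, polars_s) in timings_10k.items():
    ratio = speedup(pandas_s, polars_s)
    winner = "Polars" if ratio > 1 else "Pandas"
    print(f"{operation:<8} {ratio:6.2f}x  winner: {winner}")
```

Read this way, the 0.56 for Filter means Polars took nearly twice as long as Pandas on that operation.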

Visualization

Below are some sample plots that illustrate these results. Each figure was generated automatically by the benchmark script.

Bar Chart for 10,000 Rows

Filter is the one case where Pandas wins: a 0.56× ratio means Pandas is about 1.8× faster than Polars here.
GroupBy, Join, and Select are all faster in Polars, with Join showing a huge 13.75× advantage.

Bar Chart for 1,000,000 Rows

Filter and GroupBy are around 1.6–1.7× faster in Polars.
Select is 10.90× faster in Polars, reflecting the efficiency of columnar operations.

Bar Chart for 10,000,000 Rows

Filter is 3.68× faster in Polars.
GroupBy sees an 8.57× advantage.
Select is nearly 6× faster.

Bar Chart for 100,000,000 Rows

Filter is about 2.87× faster in Polars.
GroupBy is 4.69× faster.
Select sees a 9.64× speedup.

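As we can see, Polars wins every operation from 1,000,000 rows upward, although the speedup does not grow steadily with size: GroupBy peaks at 8.57× for 10,000,000 rows and drops to 4.69× at 100,000,000 rows.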

Heatmap of Speedup Factors

The heatmap shows the speedup factor for each operation at each data size. Join was only run on the 10,000-row dataset because of the memory restriction.

Scaling of the Select Operation

The bar charts show absolute execution times for Pandas vs Polars at each data size, while the heatmap visualizes the speedup factor (Pandas time / Polars time). The scaling chart illustrates how execution time grows with data size, often on a log scale for clarity.

Conclusion

Polars demonstrates clear performance gains over Pandas for medium and large datasets, especially for group-by and column selection. For smaller datasets (10,000 rows), Pandas can occasionally be faster in simpler operations like filtering, but Polars still shows advantages in more complex tasks like joins and large-scale aggregations.

When to Use Polars vs Pandas

Use Polars if:

You deal with large datasets (millions to hundreds of millions of rows)
You need multi-threading or lazy evaluation optimizations
You prefer or need a Rust-based, modern columnar engine

Use Pandas if:

You already have a stable Pandas-based workflow or rely on libraries deeply integrated with Pandas
Your data is relatively small, and performance is acceptable
You need Pandas-specific features or older code relies on Pandas APIs

Both libraries have their place. For Python data engineers and data scientists looking to optimize performance, Polars is an excellent addition to the toolbox, offering significant speedups and efficient memory usage at scale.

Here is the full code, in case you want to modify it or run it yourself:

import pandas as pd
import polars as pl
import numpy as np
import time
import os
import gc
import matplotlib.pyplot as plt
import seaborn as sns
from functools import wraps

# =========================================================================
# Pandas vs Polars Performance Benchmark Suite
# =========================================================================
# This script performs comprehensive benchmarking between Pandas and Polars,
# two popular DataFrame libraries for Python. It compares performance across
# several common operations (filtering, column selection, groupby aggregation,
# and joins) on datasets of various sizes.
#
# The benchmark:
# 1. Generates test data of configurable sizes
# 2. Times identical operations in both libraries
# 3. Calculates speedup ratios
# 4. Creates visualizations of the results
# 5. Outputs a detailed summary
# =========================================================================

# Create output directories if they don't exist
os.makedirs("benchmark_results", exist_ok=True)
os.makedirs("benchmark_results/plots", exist_ok=True)

# Set consistent visualization style for better readability
plt.style.use('ggplot')
sns.set_palette("Set2")  # Using a color-blind friendly palette

# Global join threshold
JOIN_THRESHOLD = 50_000

# ---------------------------------------------------------------------
# Timer Decorator for Benchmarking
# ---------------------------------------------------------------------
def timer(func):
    """
    Decorator that measures function execution time with proper memory management.
    
    Ensures accurate timing by:
    1. Forcing garbage collection before timing starts
    2. Capturing start and end times with high precision using perf_counter
    3. Forcing garbage collection after execution completes
    
    Returns:
        tuple: (function_result, execution_time_in_seconds)
    """
    @wraps(func)
    def wrapper(*args, **kwargs):
        gc.collect()
        start_time = time.perf_counter()
        result = func(*args, **kwargs)
        end_time = time.perf_counter()
        execution_time = end_time - start_time
        gc.collect()
        return result, execution_time
    return wrapper

# ---------------------------------------------------------------------
# Data Generation
# ---------------------------------------------------------------------
def generate_data(size):
    np.random.seed(42)
    numeric_data1 = np.random.randn(size)
    numeric_data2 = np.random.randn(size) * 10
    # Ensure the integer column is int64
    integers = np.random.randint(0, 100, size=size).astype('int64')
    categories = ['A', 'B', 'C']
    categorical_data = np.random.choice(categories, size=size)
    
    df_pandas = pd.DataFrame({
        'numeric1': numeric_data1,
        'numeric2': numeric_data2,
        'integer': integers,
        'category': categorical_data
    })
    df_polars = pl.from_pandas(df_pandas)
    
    return df_pandas, df_polars

# ---------------------------------------------------------------------
# Benchmark Operations
# ---------------------------------------------------------------------
@timer
def pandas_filter(df):
    """Filter rows in Pandas where numeric1 > 0."""
    return df[df['numeric1'] > 0]

@timer
def polars_filter(df):
    """Filter rows in Polars where numeric1 > 0."""
    return df.filter(pl.col('numeric1') > 0)

@timer
def pandas_select(df):
    """
    Select two columns in Pandas and force materialization 
    by computing a sum of those columns.
    """
    selected = df[['numeric1', 'numeric2']]
    # Force actual computation
    sums = selected.sum()
    return sums

@timer
def polars_select(df):
    """
    Select two columns in Polars and force materialization 
    by computing a sum of those columns.
    """
    selected = df.select(['numeric1', 'numeric2'])
    # Force actual computation
    sums = selected.sum()
    return sums

@timer
def pandas_groupby_agg(df):
    """Group by category and perform multiple aggregations in Pandas."""
    return df.groupby('category').agg({
        'numeric1': 'mean',
        'numeric2': 'sum',
        'integer': 'max'
    })

@timer
def polars_groupby_agg(df):
    """Group by category and perform multiple aggregations in Polars."""
    return df.group_by('category').agg([
        pl.col('numeric1').mean(),
        pl.col('numeric2').sum(),
        pl.col('integer').max()
    ])

@timer
def pandas_join(df1, df2, threshold=JOIN_THRESHOLD):
    """
    Perform a join operation in Pandas only if both DataFrames are smaller than threshold.
    
    For large datasets, the join is skipped to avoid memory issues.
    """
    if len(df1) >= threshold or len(df2) >= threshold:
        print("Skipping Pandas join: one or both DataFrames exceed the threshold size.")
        return pd.DataFrame()  # Return an empty DataFrame.
    
    df1_sample = df1.copy()
    df2_sample = df2.copy()
    df1_sample['key'] = df1_sample['integer'] % 10
    df2_sample['key'] = df2_sample['integer'] % 10
    return df1_sample.merge(df2_sample, on='key')

@timer
def polars_join(df1, df2, threshold=JOIN_THRESHOLD):
    """
    Perform a join operation in Polars only if both DataFrames are smaller than threshold.
    
    For large datasets, the join is skipped to prevent memory crashes.
    """
    if len(df1) >= threshold or len(df2) >= threshold:
        print("Skipping Polars join: one or both DataFrames exceed the threshold size.")
        return pl.DataFrame({col: [] for col in df1.columns})
    
    # Drop any pre-existing "key" column
    if "key" in df1.columns:
        df1 = df1.drop("key")
    if "key" in df2.columns:
        df2 = df2.drop("key")
    
    # Create key expression with explicit casts:
    key_expr = ((pl.col("integer").cast(pl.Int64) % pl.lit(10, dtype=pl.Int64))
                .cast(pl.Int64)
                .alias("key"))
    
    df1_sample = df1.with_columns([key_expr])
    df2_sample = df2.with_columns([key_expr])
    
    return df1_sample.join(df2_sample, on="key", how="inner")

# ---------------------------------------------------------------------
# Benchmark Execution
# ---------------------------------------------------------------------
def run_benchmarks():
    """
    Execute the full benchmark suite across multiple data sizes and operations.
    
    The benchmark compares Pandas and Polars on:
    - Filtering
    - Column selection (with forced materialization)
    - GroupBy aggregation
    - Join operations (skipped for sizes >= JOIN_THRESHOLD)
    
    Returns:
        list: Results containing timing and speedup data for all operations
    """
    data_sizes = [10_000, 1_000_000, 10_000_000, 100_000_000]  # Adjust as needed.
    all_results = []
    
    for size in data_sizes:
        print(f"\n=== Running benchmarks for {size:,} rows ===")
        
        try:
            pandas_df, polars_df = generate_data(size)
            # Define the operations; note that join operations are defined separately.
            operations = [
                ('Filter', pandas_filter, polars_filter, pandas_df, polars_df),
                ('Select', pandas_select, polars_select, pandas_df, polars_df),
                ('GroupBy', pandas_groupby_agg, polars_groupby_agg, pandas_df, polars_df)
            ]
            
            # Run non-join operations
            for op in operations:
                op_name, pandas_func, polars_func, pd_arg, pl_arg = op
                try:
                    _, pandas_time = pandas_func(pd_arg)
                    _, polars_time = polars_func(pl_arg)
                    
                    min_time = 1e-8
                    polars_time_safe = max(polars_time, min_time)
                    speedup = pandas_time / polars_time_safe if polars_time != 0 else float('inf')
                    display_speedup = min(speedup, 1000) if speedup != float('inf') else 1000
                    winner = "Polars" if speedup > 1 else "Pandas"
                    
                    print(f"\n{op_name} - {size:,} rows:")
                    print(f"  Pandas: {pandas_time:.6f} seconds")
                    print(f"  Polars: {polars_time:.6f} seconds")
                    print(f"  Ratio:  {display_speedup:.2f}x → {winner} is faster")
                    
                    all_results.append({
                        'Operation': op_name,
                        'Size': size,
                        'Pandas Time': pandas_time,
                        'Polars Time': polars_time,
                        'Original Speedup': speedup,
                        'Speedup': display_speedup
                    })
                    gc.collect()
                    
                except Exception as e:
                    print(f"Error running {op_name} benchmark: {str(e)}")
            
            # Run join operations only if size is below the threshold
            # (matches the `>= threshold` skip check inside the join functions).
            if size < JOIN_THRESHOLD:
                print(f"\nJoin - {size:,} rows:")
                _, pandas_time = pandas_join(pandas_df, pandas_df)
                _, polars_time = polars_join(polars_df, polars_df)
                
                min_time = 1e-8
                polars_time_safe = max(polars_time, min_time)
                speedup = pandas_time / polars_time_safe if polars_time != 0 else float('inf')
                display_speedup = min(speedup, 1000) if speedup != float('inf') else 1000
                winner = "Polars" if speedup > 1 else "Pandas"
                
                print(f"  Pandas: {pandas_time:.6f} seconds")
                print(f"  Polars: {polars_time:.6f} seconds")
                print(f"  Ratio:  {display_speedup:.2f}x → {winner} is faster")
                
                all_results.append({
                    'Operation': 'Join',
                    'Size': size,
                    'Pandas Time': pandas_time,
                    'Polars Time': polars_time,
                    'Original Speedup': speedup,
                    'Speedup': display_speedup
                })
            else:
                print(f"\nSkipping join benchmark for {size:,} rows as it exceeds the threshold.")
            
            pandas_df = None
            polars_df = None
            gc.collect()
            
        except Exception as e:
            # This handler also covers failures in the join step, not only data generation.
            print(f"Error running benchmarks for size {size:,}: {str(e)}")
            continue
    
    return all_results

# ---------------------------------------------------------------------
# Results Analysis and Reporting
# ---------------------------------------------------------------------
def print_summary(results):
    """
    Analyze benchmark results and generate a comprehensive summary.
    
    Creates both console output and a detailed text file report organizing
    results by data size and operation type.
    
    Args:
        results (list): Benchmark results from run_benchmarks()
        
    Returns:
        pandas.DataFrame: Processed results in DataFrame format for further analysis
    """
    if not results:
        print("No benchmark results to summarize")
        return None
    
    df_results = pd.DataFrame(results)
    
    print("\n======= BENCHMARK SUMMARY =======\n")
    
    for size in sorted(df_results['Size'].unique()):
        size_results = df_results[df_results['Size'] == size]
        
        print(f"\n--- Results for {size:,} rows ---")
        print(f"{'Operation':<10} | {'Pandas (s)':<12} | {'Polars (s)':<12} | {'Speedup':<10} | {'Winner'}")
        print("-" * 70)
        
        for _, result in size_results.sort_values('Operation').iterrows():
            pandas_time = result['Pandas Time']
            polars_time = result['Polars Time']
            speedup = result['Speedup']
            original_speedup = result['Original Speedup']
            
            winner = "Polars" if original_speedup > 1 else "Pandas"
            speedup_text = f"{speedup:.2f}" if original_speedup < 1000 else ">1000"
            
            print(f"{result['Operation']:<10} | {pandas_time:<12.6f} | {polars_time:<12.6f} | {speedup_text:<10} | {winner}")
        
        wins_polars = (size_results['Original Speedup'] > 1).sum()
        wins_pandas = len(size_results) - wins_polars
        
        print("-" * 70)
        print(f"Operations where Polars wins: {wins_polars}")
        print(f"Operations where Pandas wins: {wins_pandas}")
    
    with open('benchmark_results/summary.txt', 'w') as f:
        f.write("PANDAS VS POLARS BENCHMARK SUMMARY\n")
        f.write("=" * 50 + "\n\n")
        
        for size in sorted(df_results['Size'].unique()):
            size_results = df_results[df_results['Size'] == size]
            
            f.write(f"\nRESULTS FOR {size:,} ROWS\n")
            f.write("-" * 50 + "\n")
            f.write(f"{'Operation':<10} | {'Pandas (s)':<12} | {'Polars (s)':<12} | {'Speedup':<10} | {'Winner'}\n")
            f.write("-" * 70 + "\n")
            
            for _, result in size_results.sort_values('Operation').iterrows():
                pandas_time = result['Pandas Time']
                polars_time = result['Polars Time']
                speedup = result['Speedup']
                original_speedup = result['Original Speedup']
                
                winner = "Polars" if original_speedup > 1 else "Pandas"
                speedup_text = f"{speedup:.2f}" if original_speedup < 1000 else ">1000"
                
                f.write(f"{result['Operation']:<10} | {pandas_time:<12.6f} | {polars_time:<12.6f} | {speedup_text:<10} | {winner}\n")
            
            wins_polars = (size_results['Original Speedup'] > 1).sum()
            wins_pandas = len(size_results) - wins_polars
            
            f.write("-" * 70 + "\n")
            f.write(f"Operations where Polars wins: {wins_polars}\n")
            f.write(f"Operations where Pandas wins: {wins_pandas}\n")
    
    print("\n=== OVERALL CONCLUSIONS ===")
    wins_polars_overall = (df_results['Original Speedup'] > 1).sum()
    wins_pandas_overall = len(df_results) - wins_polars_overall
    
    print(f"Total operations where Polars wins: {wins_polars_overall}")
    print(f"Total operations where Pandas wins: {wins_pandas_overall}")
    
    with open('benchmark_results/summary.txt', 'a') as f:
        f.write("\n\n=== OVERALL CONCLUSIONS ===\n")
        f.write(f"Total operations where Polars wins: {wins_polars_overall}\n")
        f.write(f"Total operations where Pandas wins: {wins_pandas_overall}\n")
    
    return df_results

# ---------------------------------------------------------------------
# Visualization Generation
# ---------------------------------------------------------------------
def create_improved_visualizations(results_df):
    """
    Create a comprehensive set of visualizations from benchmark results.
    
    Generates:
    1. Bar charts comparing operation times for each data size.
    2. A heatmap of speedup factors.
    3. Summary charts of average speedup by data size (linear and log scale).
    4. Line charts showing performance scaling per operation.
    
    Args:
        results_df (pandas.DataFrame): Benchmark results in DataFrame format
    """
    if results_df.empty:
        print("No results to visualize")
        return
    
    min_time_threshold = 0.0001  # 0.1 millisecond minimum
    
    # 1. Bar charts for each data size
    for size in results_df['Size'].unique():
        size_df = results_df[results_df['Size'] == size]
        plt.figure(figsize=(12, 7))
        
        operations = size_df['Operation'].tolist()
        pandas_times = size_df['Pandas Time'].tolist()
        polars_times_raw = size_df['Polars Time'].tolist()
        polars_times_display = [max(t, min_time_threshold) for t in polars_times_raw]
        
        x = np.arange(len(operations))
        width = 0.35
        
        plt.bar(x - width/2, pandas_times, width, label='Pandas')
        plt.bar(x + width/2, polars_times_display, width, label='Polars')
        
        for i, (p_time, pl_time_display, pl_time_raw) in enumerate(zip(pandas_times, polars_times_display, polars_times_raw)):
            speedup_row = size_df.iloc[i]
            speedup = speedup_row['Speedup']
            speedup_text = f'{speedup:.1f}x' if speedup < 1000 else ">1000x"
            y_pos = max(p_time, pl_time_display) * 1.1
            plt.text(i, y_pos, speedup_text, ha='center', fontweight='bold', fontsize=11)
            
            # Optionally, annotate exact times on the bars (uncomment if desired):
            # plt.text(i - width/2, p_time / 2, f'{p_time:.4f}s', ha='center', va='center', 
            #          color='white', fontweight='bold', fontsize=9)
            # plt.text(i + width/2, pl_time_display / 2, f'{pl_time_raw:.4f}s', ha='center', va='center', 
            #          color='white', fontweight='bold', fontsize=9)
        
        plt.xlabel('Operation', fontsize=12)
        plt.ylabel('Time (seconds)', fontsize=12)
        plt.title(f'Pandas vs Polars - {size:,} rows', fontsize=14, fontweight='bold')
        plt.xticks(x, operations, fontsize=11)
        plt.legend(fontsize=11)
        plt.grid(axis='y', alpha=0.3)
        plt.tight_layout()
        plt.savefig(f'benchmark_results/plots/barchart_{size}.png', dpi=300)
        plt.close()
    
    # 2. Heatmap of speedup factors
    try:
        pivot_df = results_df.pivot(index='Operation', columns='Size', values='Speedup')
        plt.figure(figsize=(12, 7))
        hm = sns.heatmap(pivot_df, cmap=plt.cm.viridis, cbar_kws={'label': 'Speedup Factor (Pandas/Polars)'})
        
        for i in range(len(pivot_df.index)):
            for j in range(len(pivot_df.columns)):
                value = pivot_df.iloc[i, j]
                # Join is only measured below JOIN_THRESHOLD, so larger sizes are NaN.
                if pd.isna(value):
                    continue
                text = ">1000x" if value >= 1000 else f"{value:.1f}x"
                text_color = "white" if value > 5 else "black"
                hm.text(j + 0.5, i + 0.5, text, ha="center", va="center", color=text_color)
        
        plt.title('Speedup Factors: Pandas/Polars Ratio', fontsize=14, fontweight='bold')
        plt.xlabel('Data Size (rows)', fontsize=12)
        plt.ylabel('Operation', fontsize=12)
        plt.xticks(ticks=np.arange(len(pivot_df.columns)) + 0.5, 
                   labels=[f'{int(size):,}' for size in pivot_df.columns], fontsize=10)
        plt.yticks(fontsize=10)
        plt.tight_layout()
        plt.savefig('benchmark_results/plots/heatmap_speedup.png', dpi=300)
        plt.close()
    except Exception as e:
        print(f"Error creating heatmap: {str(e)}")
    
    # 3. Summary bar charts for average speedup by data size
    try:
        avg_by_size = results_df.groupby('Size')['Speedup'].mean().reset_index()
        plt.figure(figsize=(12, 6))
        sizes_formatted = [f'{int(size):,}' for size in avg_by_size['Size']]
        plt.bar(sizes_formatted, avg_by_size['Speedup'], color='#3498db', width=0.7)
        plt.axhline(y=1, color='black', linestyle='--', alpha=0.5)
        
        for i, speedup in enumerate(avg_by_size['Speedup']):
            text = ">1000x" if speedup >= 1000 else f"{speedup:.1f}x"
            plt.text(i, speedup * 1.05, text, ha='center', color='#2c3e50', fontweight='bold', fontsize=11)
        
        plt.xlabel('Data Size (rows)', fontsize=12)
        plt.ylabel('Average Speedup Factor (Pandas/Polars)', fontsize=12)
        plt.title('Average Performance Advantage of Polars by Data Size', fontsize=14, fontweight='bold')
        plt.grid(axis='y', alpha=0.3)
        plt.tight_layout()
        plt.savefig('benchmark_results/plots/average_speedup.png', dpi=300)
        plt.close()
        
        # Log-scale version
        plt.figure(figsize=(12, 6))
        plt.bar(sizes_formatted, avg_by_size['Speedup'], color='#3498db', width=0.7)
        plt.yscale('log')
        plt.axhline(y=1, color='black', linestyle='--', alpha=0.5)
        
        for i, speedup in enumerate(avg_by_size['Speedup']):
            text = ">1000x" if speedup >= 1000 else f"{speedup:.1f}x"
            y_pos = speedup * 1.2
            plt.text(i, y_pos, text, ha='center', color='#2c3e50', fontweight='bold', fontsize=11)
        
        plt.xlabel('Data Size (rows)', fontsize=12)
        plt.ylabel('Average Speedup Factor (log scale)', fontsize=12)
        plt.title('Average Performance Advantage of Polars by Data Size (Log Scale)', fontsize=14, fontweight='bold')
        plt.grid(axis='y', alpha=0.3)
        plt.tight_layout()
        plt.savefig('benchmark_results/plots/average_speedup_log.png', dpi=300)
        plt.close()
    except Exception as e:
        print(f"Error creating summary bar charts: {str(e)}")
    
    # 4. Line charts showing performance scaling per operation
    operations = results_df['Operation'].unique()
    for operation in operations:
        op_df = results_df[results_df['Operation'] == operation]
        if len(op_df) < 2:
            continue
        
        plt.figure(figsize=(12, 7))
        sizes = op_df['Size'].tolist()
        pandas_times = op_df['Pandas Time'].tolist()
        polars_times_raw = op_df['Polars Time'].tolist()
        polars_times_display = [max(t, min_time_threshold) for t in polars_times_raw]
        sizes_formatted = [f'{int(size):,}' for size in sizes]
        
        plt.plot(sizes_formatted, pandas_times, 'o-', label='Pandas', linewidth=2, markersize=8)
        plt.plot(sizes_formatted, polars_times_display, 'o-', label='Polars', linewidth=2, markersize=8)
        
        for i, speedup in enumerate(op_df['Speedup']):
            speedup_text = f'{speedup:.1f}x' if speedup < 1000 else ">1000x"
            y_pos = max(pandas_times[i], polars_times_display[i]) * 1.15
            plt.annotate(speedup_text, xy=(sizes_formatted[i], y_pos), ha='center', va='bottom',
                         fontweight='bold', bbox=dict(boxstyle="round,pad=0.3", fc="white", ec="gray", alpha=0.8))
        
        plt.xlabel('Data Size (rows)', fontsize=12)
        plt.ylabel('Time (seconds)', fontsize=12)
        plt.title(f'{operation} Performance Scaling', fontsize=14, fontweight='bold')
        plt.legend(fontsize=11)
        plt.grid(alpha=0.3)
        
        max_time = max(pandas_times + polars_times_display)
        min_time = min(t for t in pandas_times + polars_times_display if t > 0)
        if max_time / min_time > 5:
            plt.yscale('log')
            plt.ylabel('Time (seconds, log scale)', fontsize=12)
        
        plt.tight_layout()
        plt.savefig(f'benchmark_results/plots/scaling_{operation}.png', dpi=300)
        plt.close()

# ---------------------------------------------------------------------
# Main Execution
# ---------------------------------------------------------------------
if __name__ == "__main__":
    print("Starting Pandas vs Polars Benchmark with Visualizations")
    print("=" * 60)
    print("Comparing performance across DataFrame operations and data sizes")
    print("This may take several minutes to complete, especially for larger datasets")
    print("=" * 60)
    
    results = run_benchmarks()
    df_results = print_summary(results)
    
    if df_results is not None:
        create_improved_visualizations(df_results)
    else:
        print("Skipping visualizations due to no valid results")
    
    print("\nBenchmark completed!")
    print("Results summary saved to 'benchmark_results/summary.txt'")
    print("Visualizations saved to 'benchmark_results/plots/' directory (if generated)")
    print("\nTo analyze these results further or create custom visualizations,")
    print("the raw benchmark data is available in the returned DataFrame (if successful)")

Pandas vs Polars: A Comprehensive Performance Comparison