How to Split an Excel Master Sheet into Multiple Files Using Python

If you regularly work with Excel, you’ve likely faced the tedious task of splitting a master spreadsheet into multiple files—whether by department, region, project, or category. Manual filtering, copying, and saving separate files is not only time-consuming but also prone to errors. Today, I’ll show you how to automate this entire process with Python, transforming hours of work into seconds of execution.

Consider this scenario: Your company has a comprehensive purchase.xlsx file containing procurement data for all departments—R&D, Production, IT, Marketing, and more. Each month, you need to:

  • Create separate Excel files for each department
  • Include only relevant columns in each report
  • Maintain the original master sheet unchanged
  • Ensure consistent formatting and data integrity

Doing this manually isn’t just tedious—it’s error-prone and time-consuming. Fortunately, Python provides an elegant, automated solution that takes minutes to implement and saves hours of repetitive work.

Why Python for Excel Automation?

While Excel offers basic filtering and splitting capabilities, Python brings:

  • Complete automation – Run once, process thousands of rows
  • Customizable output – Control exactly which columns appear in each report
  • Scalability – Handle files too large for Excel
  • Reproducibility – Consistent results every time
  • Integration potential – Connect with databases, APIs, and other systems

The Complete Solution: A Python Script for Intelligent Excel Splitting

Let’s dive into the practical implementation using a procurement dataset example.

What You’ll Need to Get Started

1. Install Python (First-Time Users)

If you don’t have Python installed, here’s the quick setup:

Windows/Mac:

  • Visit python.org
  • Download the latest version
  • Important: During installation (Windows), check “Add Python to PATH
  • Verify installation by opening terminal/command prompt and typing:
python --version

2. Install Required Libraries

Python needs two key libraries for Excel automation:

pip install pandas openpyxl

What these do:

  • pandas: Data manipulation powerhouse (think “Excel on steroids”)
  • openpyxl: Reads and writes Excel files natively

The Splitting Script

Here’s the complete Python script that transforms your workflow:

import pandas as pd
import os
from pathlib import Path

def split_purchase_by_department():
    """
    Split purchase data by Requesting Department into separate Excel workbooks.
    Each department's data is saved as an independent workbook in 'sub_table' folder.

    """
    # ==============================================
    # CONFIGURATION SECTION - MODIFY HERE
    # ==============================================
    # Define ALL available columns in your purchase list
    ALL_COLUMNS = [
        'Purchaser',
        'Requesting Department',  # REQUIRED - This is used for splitting
        'PO Number',
        'PO Status',
        'Vendor',
        'Vendor ID',
        'Item Number',
        'Description',
        'Model',
        'Unit',
        'Quantity',
        'Unit Price',
        'Amount',
        'Currency (USD)',
        'Request Date',
        'PO Date',
        'Required Delivery Date',
        'Promised Delivery Date',
        'Actual Receipt Date',
        'Warehouse Receipt Date'
    ]
    # ==============================================
    # SELECT OUTPUT COLUMNS - UNCOMMENT AS NEEDED
    # ==============================================
    # Choose which columns to include in output files
    # By default, ALL columns are included
    # OPTION 1: Include all columns (default)
    OUTPUT_COLUMNS = ALL_COLUMNS.copy()

    # OPTION 2: Include specific columns only
    # OUTPUT_COLUMNS = [
    #     'PO Number',
    #     'Requesting Department',
    #     'Vendor',
    #     'Item Number',
    #     'Description',
    #     'Quantity',
    #     'Unit Price',
    #     'Amount',
    #     'PO Date'
    # ]

    # ==============================================
    # FILE CONFIGURATION
    # ==============================================

    INPUT_FILE = 'purchase.xlsx'  # Your main Excel file
    OUTPUT_FOLDER = 'sub_table'   # Folder for department files

    # ==============================================
    # MAIN PROCESSING LOGIC
    # ==============================================

    print("Starting purchase data split process...")
    # Step 1: Check if input file exists
    if not os.path.exists(INPUT_FILE):
        print(f"ERROR: Input file '{INPUT_FILE}' not found.")
        return
    # Step 2: Read the Excel file
    try:
        df = pd.read_excel(INPUT_FILE)
        print(f"✓ Successfully read: {INPUT_FILE}")
        print(f"✓ Total records: {len(df):,}")
    except Exception as e:
        print(f"ERROR: Failed to read Excel file - {str(e)}")
        return

    # Step 3: Verify required column exists
    if 'Requesting Department' not in df.columns:
        print("ERROR: 'Requesting Department' column not found.")
        return

    # Step 4: Validate output columns
    valid_columns = [col for col in OUTPUT_COLUMNS if col in df.columns]
    print(f"✓ Output will include {len(valid_columns)} columns")

    # Step 5: Create output folder
    output_path = Path(OUTPUT_FOLDER)
    output_path.mkdir(exist_ok=True)

    # Step 6: Get unique departments
    departments = df['Requesting Department'].dropna().unique()
    print(f"✓ Found {len(departments)} unique departments")

    # Step 7: Split and save data for each department

    files_created = 0
    for dept in sorted(departments):
        dept_data = df[df['Requesting Department'] == dept]
        if len(dept_data) > 0:
            dept_data_filtered = dept_data[valid_columns]
            # Create safe filename
            safe_dept_name = str(dept).strip()
            for char in ['/', '\\', ':', '*', '?', '"', '<', '>', '|']:
                safe_dept_name = safe_dept_name.replace(char, '_')
            output_filename = f"purchase_{safe_dept_name}.xlsx"
            output_filepath = output_path / output_filename
            dept_data_filtered.to_excel(output_filepath, index=False)
            files_created += 1
            print(f"✓ Created: {output_filename} ({len(dept_data)} rows)")
    # Summary

    print(f"\nPROCESS COMPLETE")
    print(f"Files created: {files_created}")
    print(f"Output location: {output_path.absolute()}")
    print("\nOriginal file remains unchanged.")

if __name__ == "__main__":
    split_purchase_by_department()

How to Use This Script: 4 Simple Steps

1. Prepare Your Excel File

  • Ensure your master file is named purchase.xlsx
  • Confirm it contains a column named Requesting Department
  • Place the file in the same folder as your Python script

2. Customize the Configuration

Modify the script based on your needs:

A. Change which columns to output:

# Uncomment and modify this section:
OUTPUT_COLUMNS = [
    'PO Number',
    'Requesting Department',
    'Vendor',
    'Description',
    'Quantity',
    'Unit Price',
    'Amount',
    'PO Date'
]

B. Change input/output file names:

INPUT_FILE = 'your_master_data.xlsx'
OUTPUT_FOLDER = 'department_reports'

3. Run the Script

python split_purchase.py

4. Check the Results

The script creates a sub_table folder containing all sub tables. See example below.

sub_table/
├── purchase_R&D.xlsx
├── purchase_Production.xlsx
├── purchase_IT.xlsx
├── purchase_Marketing.xlsx
└── ... (one file per department)

Real-World Application Examples

Example 1: Split Sales Data by Region

INPUT_FILE = 'sales_2024.xlsx'
SPLIT_COLUMN = 'Region'
OUTPUT_COLUMNS = ['Order_ID', 'Region', 'Product', 'Quantity', 'Revenue', 'Date']

Example 2: Separate Employee Records by Department

INPUT_FILE = 'employees.xlsx'
SPLIT_COLUMN = 'Department'
OUTPUT_COLUMNS = ['Employee_ID', 'Name', 'Department', 'Position', 'Salary', 'Start_Date']

Example 3: Organize Inventory by Category

INPUT_FILE = 'inventory.xlsx'
SPLIT_COLUMN = 'Category'
OUTPUT_COLUMNS = ['SKU', 'Category', 'Product_Name', 'Quantity', 'Price', 'Supplier']

Troubleshooting Common Issues

Problem 1: “ModuleNotFoundError: No module named ‘pandas’”

Solution:

pip install pandas openpyxl

Problem 2: “File not found” error

Solution:

  • Ensure Excel file is in the same directory as the script
  • Check for typos in INPUT_FILE variable
  • Use absolute path if needed: INPUT_FILE = ‘C:/Users/Name/Documents/data.xlsx’

Problem 3: Column not found

Solution:

  • Verify column name exactly matches Excel header (case-sensitive)
  • Print available columns: print(list(df.columns))
  • Ensure no leading/trailing spaces in column names

Why This Approach Beats Manual Excel Methods

CriteriaManual ExcelPython Automation
Time Required30+ minutes for 1000 rows10 seconds
AccuracyProne to copy-paste errors100% consistent
ScalabilitySlows with more dataHandles millions of rows
CustomizationManual column selectionProgrammable logic
ReproducibilityHard to replicate exactlyIdentical results every time

Getting Started with Minimal Setup

For those new to Python, here’s the quickest path to success:

  1. Install Python (5 minutes)
  2. Run one command:
pip install pandas openpyxl
  1. Save the script as split_purchase.py
  2. Place your Excel file in the same folder
  3. Run:
python split_purchase.py

Conclusion: Transform Your Excel Workflow

This Python solution represents a significant leap forward from manual Excel operations. By automating the tedious process of splitting master sheets into departmental reports, you free up valuable time for analysis and decision-making rather than data manipulation.

The script provided here is just the beginning. As you become comfortable with Python’s capabilities, you’ll discover endless possibilities for enhancing your Excel workflows.

Need Help or Have Questions?

We’re here to help! If you encounter any issues implementing this solution or want to adapt it to your specific needs:

  1. Leave a comment below with your question
  2. Share your use case – we might feature it in a future article
  3. Request specific tutorials on related Excel automation topics

Your Excel challenges are opportunities for automation. Let us know what repetitive tasks are consuming your time, and we’ll explore Python solutions together.

Leave a Reply