How to Split an Excel Master Sheet into Multiple Files Using Python

If you regularly work with Excel, you’ve likely faced the tedious task of splitting a master spreadsheet into multiple files—whether by department, region, project, or category. Manual filtering, copying, and saving separate files is not only time-consuming but also prone to errors. Today, I’ll show you how to automate this entire process with Python, transforming hours of work into seconds of execution.

Consider this scenario: Your company has a comprehensive purchase.xlsx file containing procurement data for all departments—R&D, Production, IT, Marketing, and more. Each month, you need to:

Create separate Excel files for each department
Include only relevant columns in each report
Maintain the original master sheet unchanged
Ensure consistent formatting and data integrity

Doing this manually isn’t just tedious—it’s error-prone and time-consuming. Fortunately, Python provides an elegant, automated solution that takes minutes to implement and saves hours of repetitive work.

Table of Contents

Why Python for Excel Automation?
- The Complete Solution: A Python Script for Intelligent Excel Splitting
Real-World Application Examples
Troubleshooting Common Issues
Why This Approach Beats Manual Excel Methods
Getting Started with Minimal Setup
Conclusion: Transform Your Excel Workflow

Why Python for Excel Automation?

While Excel offers basic filtering and splitting capabilities, Python brings:

Complete automation – Run once, process thousands of rows
Customizable output – Control exactly which columns appear in each report
Scalability – Handle files too large for Excel
Reproducibility – Consistent results every time
Integration potential – Connect with databases, APIs, and other systems

The Complete Solution: A Python Script for Intelligent Excel Splitting

Let’s dive into the practical implementation using a procurement dataset example.

What You’ll Need to Get Started

1. Install Python (First-Time Users)

If you don’t have Python installed, here’s the quick setup:

Windows/Mac:

Visit python.org
Download the latest version
Important: During installation (Windows), check “Add Python to PATH“
Verify installation by opening terminal/command prompt and typing:

python --version

2. Install Required Libraries

Python needs two key libraries for Excel automation:

pip install pandas openpyxl

What these do:

pandas: Data manipulation powerhouse (think “Excel on steroids”)
openpyxl: Reads and writes Excel files natively

The Splitting Script

Here’s the complete Python script that transforms your workflow:

import pandas as pd
import os
from pathlib import Path

def split_purchase_by_department():
    """
    Split purchase data by Requesting Department into separate Excel workbooks.
    Each department's data is saved as an independent workbook in 'sub_table' folder.

    """
    # ==============================================
    # CONFIGURATION SECTION - MODIFY HERE
    # ==============================================
    # Define ALL available columns in your purchase list
    ALL_COLUMNS = [
        'Purchaser',
        'Requesting Department',  # REQUIRED - This is used for splitting
        'PO Number',
        'PO Status',
        'Vendor',
        'Vendor ID',
        'Item Number',
        'Description',
        'Model',
        'Unit',
        'Quantity',
        'Unit Price',
        'Amount',
        'Currency (USD)',
        'Request Date',
        'PO Date',
        'Required Delivery Date',
        'Promised Delivery Date',
        'Actual Receipt Date',
        'Warehouse Receipt Date'
    ]
    # ==============================================
    # SELECT OUTPUT COLUMNS - UNCOMMENT AS NEEDED
    # ==============================================
    # Choose which columns to include in output files
    # By default, ALL columns are included
    # OPTION 1: Include all columns (default)
    OUTPUT_COLUMNS = ALL_COLUMNS.copy()

    # OPTION 2: Include specific columns only
    # OUTPUT_COLUMNS = [
    #     'PO Number',
    #     'Requesting Department',
    #     'Vendor',
    #     'Item Number',
    #     'Description',
    #     'Quantity',
    #     'Unit Price',
    #     'Amount',
    #     'PO Date'
    # ]

    # ==============================================
    # FILE CONFIGURATION
    # ==============================================

    INPUT_FILE = 'purchase.xlsx'  # Your main Excel file
    OUTPUT_FOLDER = 'sub_table'   # Folder for department files

    # ==============================================
    # MAIN PROCESSING LOGIC
    # ==============================================

    print("Starting purchase data split process...")
    # Step 1: Check if input file exists
    if not os.path.exists(INPUT_FILE):
        print(f"ERROR: Input file '{INPUT_FILE}' not found.")
        return
    # Step 2: Read the Excel file
    try:
        df = pd.read_excel(INPUT_FILE)
        print(f"✓ Successfully read: {INPUT_FILE}")
        print(f"✓ Total records: {len(df):,}")
    except Exception as e:
        print(f"ERROR: Failed to read Excel file - {str(e)}")
        return

    # Step 3: Verify required column exists
    if 'Requesting Department' not in df.columns:
        print("ERROR: 'Requesting Department' column not found.")
        return

    # Step 4: Validate output columns
    valid_columns = [col for col in OUTPUT_COLUMNS if col in df.columns]
    print(f"✓ Output will include {len(valid_columns)} columns")

    # Step 5: Create output folder
    output_path = Path(OUTPUT_FOLDER)
    output_path.mkdir(exist_ok=True)

    # Step 6: Get unique departments
    departments = df['Requesting Department'].dropna().unique()
    print(f"✓ Found {len(departments)} unique departments")

    # Step 7: Split and save data for each department

    files_created = 0
    for dept in sorted(departments):
        dept_data = df[df['Requesting Department'] == dept]
        if len(dept_data) > 0:
            dept_data_filtered = dept_data[valid_columns]
            # Create safe filename
            safe_dept_name = str(dept).strip()
            for char in ['/', '\\', ':', '*', '?', '"', '<', '>', '|']:
                safe_dept_name = safe_dept_name.replace(char, '_')
            output_filename = f"purchase_{safe_dept_name}.xlsx"
            output_filepath = output_path / output_filename
            dept_data_filtered.to_excel(output_filepath, index=False)
            files_created += 1
            print(f"✓ Created: {output_filename} ({len(dept_data)} rows)")
    # Summary

    print(f"\nPROCESS COMPLETE")
    print(f"Files created: {files_created}")
    print(f"Output location: {output_path.absolute()}")
    print("\nOriginal file remains unchanged.")

if __name__ == "__main__":
    split_purchase_by_department()

How to Use This Script: 4 Simple Steps

1. Prepare Your Excel File

Ensure your master file is named purchase.xlsx
Confirm it contains a column named Requesting Department
Place the file in the same folder as your Python script

2. Customize the Configuration

Modify the script based on your needs:

A. Change which columns to output:

# Uncomment and modify this section:
OUTPUT_COLUMNS = [
    'PO Number',
    'Requesting Department',
    'Vendor',
    'Description',
    'Quantity',
    'Unit Price',
    'Amount',
    'PO Date'
]

B. Change input/output file names:

INPUT_FILE = 'your_master_data.xlsx'
OUTPUT_FOLDER = 'department_reports'

3. Run the Script

python split_purchase.py

4. Check the Results

The script creates a sub_table folder containing all sub tables. See example below.

sub_table/
├── purchase_R&D.xlsx
├── purchase_Production.xlsx
├── purchase_IT.xlsx
├── purchase_Marketing.xlsx
└── ... (one file per department)

Real-World Application Examples

Example 1: Split Sales Data by Region

INPUT_FILE = 'sales_2024.xlsx'
SPLIT_COLUMN = 'Region'
OUTPUT_COLUMNS = ['Order_ID', 'Region', 'Product', 'Quantity', 'Revenue', 'Date']

Example 2: Separate Employee Records by Department

INPUT_FILE = 'employees.xlsx'
SPLIT_COLUMN = 'Department'
OUTPUT_COLUMNS = ['Employee_ID', 'Name', 'Department', 'Position', 'Salary', 'Start_Date']

Example 3: Organize Inventory by Category

INPUT_FILE = 'inventory.xlsx'
SPLIT_COLUMN = 'Category'
OUTPUT_COLUMNS = ['SKU', 'Category', 'Product_Name', 'Quantity', 'Price', 'Supplier']

Troubleshooting Common Issues

Problem 1: “ModuleNotFoundError: No module named ‘pandas’”

Solution:

pip install pandas openpyxl

Problem 2: “File not found” error

Solution:

Ensure Excel file is in the same directory as the script
Check for typos in INPUT_FILE variable
Use absolute path if needed: INPUT_FILE = ‘C:/Users/Name/Documents/data.xlsx’

Problem 3: Column not found

Solution:

Verify column name exactly matches Excel header (case-sensitive)
Print available columns: print(list(df.columns))
Ensure no leading/trailing spaces in column names

Why This Approach Beats Manual Excel Methods

Criteria	Manual Excel	Python Automation
Time Required	30+ minutes for 1000 rows	10 seconds
Accuracy	Prone to copy-paste errors	100% consistent
Scalability	Slows with more data	Handles millions of rows
Customization	Manual column selection	Programmable logic
Reproducibility	Hard to replicate exactly	Identical results every time

Getting Started with Minimal Setup

For those new to Python, here’s the quickest path to success:

Install Python (5 minutes)
Run one command:

pip install pandas openpyxl

Save the script as split_purchase.py
Place your Excel file in the same folder
Run:

python split_purchase.py

Conclusion: Transform Your Excel Workflow

This Python solution represents a significant leap forward from manual Excel operations. By automating the tedious process of splitting master sheets into departmental reports, you free up valuable time for analysis and decision-making rather than data manipulation.

The script provided here is just the beginning. As you become comfortable with Python’s capabilities, you’ll discover endless possibilities for enhancing your Excel workflows.

Need Help or Have Questions?

We’re here to help! If you encounter any issues implementing this solution or want to adapt it to your specific needs:

Leave a comment below with your question
Share your use case – we might feature it in a future article
Request specific tutorials on related Excel automation topics

Your Excel challenges are opportunities for automation. Let us know what repetitive tasks are consuming your time, and we’ll explore Python solutions together.

ExcelHow

How to Split an Excel Master Sheet into Multiple Files Using Python

Why Python for Excel Automation?

The Complete Solution: A Python Script for Intelligent Excel Splitting

What You’ll Need to Get Started

How to Use This Script: 4 Simple Steps

Real-World Application Examples

Example 1: Split Sales Data by Region

Example 2: Separate Employee Records by Department

Example 3: Organize Inventory by Category

Troubleshooting Common Issues

Problem 1: “ModuleNotFoundError: No module named ‘pandas’”

Problem 2: “File not found” error

Problem 3: Column not found

Why This Approach Beats Manual Excel Methods

Getting Started with Minimal Setup

Conclusion: Transform Your Excel Workflow

Recent Posts

Leave a Reply Cancel reply