If you regularly work with Excel, you’ve likely faced the tedious task of splitting a master spreadsheet into multiple files—whether by department, region, project, or category. Manual filtering, copying, and saving separate files is not only time-consuming but also prone to errors. Today, I’ll show you how to automate this entire process with Python, transforming hours of work into seconds of execution.
Consider this scenario: Your company has a comprehensive purchase.xlsx file containing procurement data for all departments—R&D, Production, IT, Marketing, and more. Each month, you need to:
- Create separate Excel files for each department
- Include only relevant columns in each report
- Maintain the original master sheet unchanged
- Ensure consistent formatting and data integrity
Doing this manually isn’t just tedious—it’s error-prone and time-consuming. Fortunately, Python provides an elegant, automated solution that takes minutes to implement and saves hours of repetitive work.
Table of Contents
Why Python for Excel Automation?
While Excel offers basic filtering and splitting capabilities, Python brings:
- Complete automation – Run once, process thousands of rows
- Customizable output – Control exactly which columns appear in each report
- Scalability – Handle files too large for Excel
- Reproducibility – Consistent results every time
- Integration potential – Connect with databases, APIs, and other systems
The Complete Solution: A Python Script for Intelligent Excel Splitting
Let’s dive into the practical implementation using a procurement dataset example.
What You’ll Need to Get Started
1. Install Python (First-Time Users)
If you don’t have Python installed, here’s the quick setup:
Windows/Mac:
- Visit python.org
- Download the latest version
- Important: During installation (Windows), check “Add Python to PATH“
- Verify installation by opening terminal/command prompt and typing:
python --version
2. Install Required Libraries
Python needs two key libraries for Excel automation:
pip install pandas openpyxl
What these do:
- pandas: Data manipulation powerhouse (think “Excel on steroids”)
- openpyxl: Reads and writes Excel files natively
The Splitting Script
Here’s the complete Python script that transforms your workflow:
import pandas as pd
import os
from pathlib import Path
def split_purchase_by_department():
"""
Split purchase data by Requesting Department into separate Excel workbooks.
Each department's data is saved as an independent workbook in 'sub_table' folder.
"""
# ==============================================
# CONFIGURATION SECTION - MODIFY HERE
# ==============================================
# Define ALL available columns in your purchase list
ALL_COLUMNS = [
'Purchaser',
'Requesting Department', # REQUIRED - This is used for splitting
'PO Number',
'PO Status',
'Vendor',
'Vendor ID',
'Item Number',
'Description',
'Model',
'Unit',
'Quantity',
'Unit Price',
'Amount',
'Currency (USD)',
'Request Date',
'PO Date',
'Required Delivery Date',
'Promised Delivery Date',
'Actual Receipt Date',
'Warehouse Receipt Date'
]
# ==============================================
# SELECT OUTPUT COLUMNS - UNCOMMENT AS NEEDED
# ==============================================
# Choose which columns to include in output files
# By default, ALL columns are included
# OPTION 1: Include all columns (default)
OUTPUT_COLUMNS = ALL_COLUMNS.copy()
# OPTION 2: Include specific columns only
# OUTPUT_COLUMNS = [
# 'PO Number',
# 'Requesting Department',
# 'Vendor',
# 'Item Number',
# 'Description',
# 'Quantity',
# 'Unit Price',
# 'Amount',
# 'PO Date'
# ]
# ==============================================
# FILE CONFIGURATION
# ==============================================
INPUT_FILE = 'purchase.xlsx' # Your main Excel file
OUTPUT_FOLDER = 'sub_table' # Folder for department files
# ==============================================
# MAIN PROCESSING LOGIC
# ==============================================
print("Starting purchase data split process...")
# Step 1: Check if input file exists
if not os.path.exists(INPUT_FILE):
print(f"ERROR: Input file '{INPUT_FILE}' not found.")
return
# Step 2: Read the Excel file
try:
df = pd.read_excel(INPUT_FILE)
print(f"✓ Successfully read: {INPUT_FILE}")
print(f"✓ Total records: {len(df):,}")
except Exception as e:
print(f"ERROR: Failed to read Excel file - {str(e)}")
return
# Step 3: Verify required column exists
if 'Requesting Department' not in df.columns:
print("ERROR: 'Requesting Department' column not found.")
return
# Step 4: Validate output columns
valid_columns = [col for col in OUTPUT_COLUMNS if col in df.columns]
print(f"✓ Output will include {len(valid_columns)} columns")
# Step 5: Create output folder
output_path = Path(OUTPUT_FOLDER)
output_path.mkdir(exist_ok=True)
# Step 6: Get unique departments
departments = df['Requesting Department'].dropna().unique()
print(f"✓ Found {len(departments)} unique departments")
# Step 7: Split and save data for each department
files_created = 0
for dept in sorted(departments):
dept_data = df[df['Requesting Department'] == dept]
if len(dept_data) > 0:
dept_data_filtered = dept_data[valid_columns]
# Create safe filename
safe_dept_name = str(dept).strip()
for char in ['/', '\\', ':', '*', '?', '"', '<', '>', '|']:
safe_dept_name = safe_dept_name.replace(char, '_')
output_filename = f"purchase_{safe_dept_name}.xlsx"
output_filepath = output_path / output_filename
dept_data_filtered.to_excel(output_filepath, index=False)
files_created += 1
print(f"✓ Created: {output_filename} ({len(dept_data)} rows)")
# Summary
print(f"\nPROCESS COMPLETE")
print(f"Files created: {files_created}")
print(f"Output location: {output_path.absolute()}")
print("\nOriginal file remains unchanged.")
if __name__ == "__main__":
split_purchase_by_department()
How to Use This Script: 4 Simple Steps
1. Prepare Your Excel File
- Ensure your master file is named purchase.xlsx
- Confirm it contains a column named Requesting Department
- Place the file in the same folder as your Python script
2. Customize the Configuration
Modify the script based on your needs:
A. Change which columns to output:
# Uncomment and modify this section:
OUTPUT_COLUMNS = [
'PO Number',
'Requesting Department',
'Vendor',
'Description',
'Quantity',
'Unit Price',
'Amount',
'PO Date'
]
B. Change input/output file names:
INPUT_FILE = 'your_master_data.xlsx'
OUTPUT_FOLDER = 'department_reports'
3. Run the Script
python split_purchase.py
4. Check the Results
The script creates a sub_table folder containing all sub tables. See example below.
sub_table/
├── purchase_R&D.xlsx
├── purchase_Production.xlsx
├── purchase_IT.xlsx
├── purchase_Marketing.xlsx
└── ... (one file per department)
Real-World Application Examples
Example 1: Split Sales Data by Region
INPUT_FILE = 'sales_2024.xlsx'
SPLIT_COLUMN = 'Region'
OUTPUT_COLUMNS = ['Order_ID', 'Region', 'Product', 'Quantity', 'Revenue', 'Date']
Example 2: Separate Employee Records by Department
INPUT_FILE = 'employees.xlsx'
SPLIT_COLUMN = 'Department'
OUTPUT_COLUMNS = ['Employee_ID', 'Name', 'Department', 'Position', 'Salary', 'Start_Date']
Example 3: Organize Inventory by Category
INPUT_FILE = 'inventory.xlsx'
SPLIT_COLUMN = 'Category'
OUTPUT_COLUMNS = ['SKU', 'Category', 'Product_Name', 'Quantity', 'Price', 'Supplier']
Troubleshooting Common Issues
Problem 1: “ModuleNotFoundError: No module named ‘pandas’”
Solution:
pip install pandas openpyxl
Problem 2: “File not found” error
Solution:
- Ensure Excel file is in the same directory as the script
- Check for typos in INPUT_FILE variable
- Use absolute path if needed: INPUT_FILE = ‘C:/Users/Name/Documents/data.xlsx’
Problem 3: Column not found
Solution:
- Verify column name exactly matches Excel header (case-sensitive)
- Print available columns: print(list(df.columns))
- Ensure no leading/trailing spaces in column names
Why This Approach Beats Manual Excel Methods
| Criteria | Manual Excel | Python Automation |
| Time Required | 30+ minutes for 1000 rows | 10 seconds |
| Accuracy | Prone to copy-paste errors | 100% consistent |
| Scalability | Slows with more data | Handles millions of rows |
| Customization | Manual column selection | Programmable logic |
| Reproducibility | Hard to replicate exactly | Identical results every time |
Getting Started with Minimal Setup
For those new to Python, here’s the quickest path to success:
- Install Python (5 minutes)
- Run one command:
pip install pandas openpyxl
- Save the script as split_purchase.py
- Place your Excel file in the same folder
- Run:
python split_purchase.py
Conclusion: Transform Your Excel Workflow
This Python solution represents a significant leap forward from manual Excel operations. By automating the tedious process of splitting master sheets into departmental reports, you free up valuable time for analysis and decision-making rather than data manipulation.
The script provided here is just the beginning. As you become comfortable with Python’s capabilities, you’ll discover endless possibilities for enhancing your Excel workflows.
Need Help or Have Questions?
We’re here to help! If you encounter any issues implementing this solution or want to adapt it to your specific needs:
- Leave a comment below with your question
- Share your use case – we might feature it in a future article
- Request specific tutorials on related Excel automation topics
Your Excel challenges are opportunities for automation. Let us know what repetitive tasks are consuming your time, and we’ll explore Python solutions together.
Leave a Reply
You must be logged in to post a comment.