Excel Cleaning & Compare App
App Overview & Commentary
This Streamlit app is designed to make working with Excel files faster and without manually having to restructure data. It focuses on two core workflows: cleaning invoices and tabular data that often come from PDF exports,and comparing two related datasets after they’ve been standardized.
The app supports multiple patterns you commonly see in real-world spreadsheets: multiple header rows, shifted or extra columns, extra notes at the top or bottom of the file, “Totals” rows mixed into the data, and inconsistent date formats. Instead of fixing each file manually in Excel, this tool guides you through a repeatable process and lets you download a clean version of the data for further analysis.
There are two main tools inside the app: the Aftermath App (tailored to a specific recurring workflow) and the General App (built to handle a wider range of layouts while still being guided and structured). Both are backed by the same cleaning logic, but expose different levels of flexibility depending on how messy or standardized your input files are.
Here’s a breakdown of the project:
- Cleaning Excel files
- The app helps you clean the header row, drop extra top rows, and trim off unneeded data at the bottom of the sheet. This is especially useful for files exported from PDF, where tables rarely start cleanly on row 1 and often include titles or notes above the headers.
- You can control where the data starts (by column) and how many columns you expect to see, which helps catch cases where columns shift left or right between files. The app reshapes the incoming data into a clean dataframe so downstream comparisons are less fragile.
- Standardizing columns and dates
- Many real-world spreadsheets have missing information that needs to be filled to have a clean dataframe. This cleaning process helps fill those gaps in several ways (forward fill, backwards fill, or custom fill) by column.
- Date handling is a major pain point, so the tool offers options for strict and flexible date parsing, filling missing values, and enforcing a consistent date format. This makes it much easier to filter, join, and aggregate data later on.
- There is also an option to add extra columns that will track if there was any information filled in a missing cell so there is no ambiguity on what was the original data.
- Comparing two related files
- Once two files have been cleaned into a clean structure, the app supports comparing them side-by-side as long as they have a shared unique identifier column.
- The comparison highlights mismatches, missing rows, or differences in values between the two datasets for the same unique ID. This is especially valuable for reconciliation work, where you need to quickly see where numbers diverge and why.
- Guided workflows and sample files
- The app is built to be approachable, even if you’re not comfortable writing code. The interface walks you through each step: uploading files, defining the header row, selecting cleaning options, and finally downloading the cleaned or compared output.
- Sample files (including Aftermath-style invoices and more general invoice layouts) are provided so you can follow along with suggested walkthroughs. This makes it easy to understand how the cleaning engine behaves before you plug in your own data.
- Why build it as a Streamlit app?
- Packaging this workflow in Streamlit makes it accessible through the browser, with no need to install Python locally. It also allows for easy access for the user with an interface that can be accessed anywhere as long as there is an internet connection. Also, future iteration for the cleaning options or comparison modes can be added without changing the way users access the tool.
- The app is deployed on Streamlit Community Cloud, which handles hosting and makes it simple to share a single link on this portfolio page or with the end users. By embedding the app directly here, visitors can explore the functionality in context alongside the written explanation.
← Back to Previous Page