I’ve always used Excel for quick calculations and simple tables. But aside from common formulas and basic data manipulation techniques, I never felt the need to learn extra Excel functionalities—until my projects started getting complex.
The Problem That Made Me Finally Pay Attention
Due to several market factors and import duties, buying PC components in my area is often more expensive than in the US. I wanted to find out how much more I was paying for the same components and whether it would be better to order directly from Amazon or Newegg instead of local retailers. So, I collected a few months of pricing data on key PC components (CPUs, GPUs, RAM) that local shops typically import. Simple tracking project, right? Wrong.
I quickly ended up with a complete mess of data. Each retailer exported their information using different formatting conventions which made combining files nearly impossible. Amazon provided dates in MM/DD/YYYY format, and Newegg went with YYYYMMDD, and Shopee (my local shop) used DD-MM-YYYY.
The inconsistencies didn’t stop there. Column names varied wildly. Newegg labeled prices as “retail_price”, while Amazon used “unit_price_usd”, and Shopee chose “price_php”. Price formatting was equally problematic, with some files showing “₱18,600” including currency symbols, while others displayed plain numbers such as “320”. Even brand names lacked consistency, appearing as “gigabyte,” “GIGABYTE INC.,” or “Gigabyte Tech” for the same manufacturer across different files.
Manually cleaning and combining this data was already taking me hours. I had to copy and paste between files, find and replace inconsistent values, and delete blank rows one by one. Converting PHP to USD for price comparisons meant constantly glancing at another monitor for exchange rates. Overall, it was tedious and error-prone work that nearly made me give up.
That’s when I finally considered using the one feature Excel enthusiasts always talk about—Power Query. There are several other power features that Excel provides, but I heard Power Query was the perfect tool for my specific problem. So, after watching a few YouTube tutorials, I immediately realized how much time I could save once I started using Power Query Editor to clean up all the messy data I had scraped from the internet.
How I Use Power Query to Clean Dirty Data
Before long, I settled on a simple step-by-step process in Power Query Editor. Here’s exactly how I cleaned up those messy CSV exports and turned them into a consistent, well-structured spreadsheet.
First, I imported my data into the Power Query Editor by opening a blank workbook, clicking Data on the ribbon, and selecting From Text/CSV. I then selected my CSV file and clicked Transform Data to open it with Power Query Editor.
I started by fixing the Date column. Since I was collecting data from two sources with timelines that were 12 hours apart, I needed to standardize the dates. It turned out to be incredibly simple. I selected the Date column, right-clicked to open the context menu, and chose Change Type > Using Locale. In the pop-up menu, I set the type to Date and selected English (United States) to ensure consistent formatting. Power Query then automatically recognized different formats, such as MM/DD/YYYY, YYYY/MM/DD, and variants using symbols like DD-MM-YY, then standardized them all to a single date format.
Now that the date format was fixed, I just needed to clean up the column. There are different ways to clean up an Excel spreadsheet, but since all the errors are bad entries caused by my scraping tool, I simply opted to use the Remove Errors filter to remove those entries. This step eliminated null values and any remaining problematic data that wasn’t recorded properly, leaving me with clean, consistent dates across all my files.
Next, I tackled the brand name chaos using the Replace Values function. Like earlier, I selected the target column, then right-clicked to open the context menu and selected Replace Values. In the pop-up, I entered the inconsistent value in the Value to Find field and my standard value in the Replace With field.
I did it about two more times and finally turned all those “gigabyte” and “GIGABTYE Inc.” into a single consistent “GIGABYTE” across all my files. I did the same with AMD and now the entire Brand column for GPUs uses standard brand names.
Lastly, I needed to standardize the Price column. This was a bit tricky because many of the entries were just numbers, without any indication of whether they were in US dollars or Philippine pesos, which made conversion difficult. Thankfully, the Store column was right beside it, allowing me to easily convert all prices from Shoppe PH into USD.
The way I handled this problem was to first use Replace Values to remove all $, ₱, USD, and PHP from the Price column. Then, I converted the values into decimals by using Change Type > Decimal. With a clean column containing only decimals, I created a new column by clicking Column (ribbon) > Custom Column and named it “Price USD”. I then entered the following formula:
if [Store] = "Shopee PH"
then Number.Round([Price] / 55, 2)
else Number.Round([Price], 2)
Then hit Okay. This converted all prices from Shopee PH into USD for easy comparison.
The formula is written in the Power Query M formula language. It’s a fairly straightforward scripting language with a simple core concept that’s easy to grasp. You don’t need to learn the entire language to use it effectively—I didn’t. Instead, I just use it as a reference whenever I need to build my own formulas.
Now that everything’s fixed, I simply closed the Power Query Editor and chose Keep to save all the changes in the created workbook.
Power Query Saved Me Hours of My Time
One reason I avoided Power Query was that I assumed it would be another complicated feature that would take too much time to learn. But it turned out to be much easier than I expected. Instead of running endless find-and-replace commands, I could use Power Query to quickly and automatically clean the data from my scraping tools.
What surprised me even more about Power Query was that every command I made was recorded and could be repeated over and over again. This essentially gives you an automated cleanup script that can transform messy CSV files into clean, structured spreadsheets—perfect if you’re creating custom datasets with web scraping, since those tools often output dirty data.
For anyone dealing with repetitive data cleanup, inconsistent formats, or multiple data sources, Power Query turns those burdens into a simple, automated process. Instead of carving out hours every week for manual fixes, you press Refresh and start analyzing. It’s the Excel feature I wish I’d embraced long ago. Once you experience the power of an automated, reproducible cleanup script, there’s no going back.