Excel Formulas for Data Cleaning
Data cleaning in Excel means using formulas and built-in tools to remove extra spaces, delete duplicate entries, fill in missing values, and standardize formatting.
The main aim is to detect and correct errors, inconsistencies, and presentation issues so that the data is accurate, reliable, and complete.
This article covers the top 10 Excel formulas and tools for data cleaning and manipulation.
- TRIM Function
- TEXTJOIN Function
- Remove Duplicates Tool
- Find and Replace Feature
- CLEAN Function
- Clear Formatting Feature
- Text to Columns Tool
- LOWER, UPPER, PROPER Functions
- Go to Special Tool
- Paste Special Tool
1. TRIM Function
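You can use the TRIM function to remove leading and trailing spaces and to reduce any run of spaces between words to a single space. TRIM removes only the ordinary space character (ASCII 32); it does not remove non-breaking spaces.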
Example: Let’s say we have a list of idioms with extra spaces between them, and we want to remove those spaces.
Solution:
Step 1: Select “Cell B2” and enter the Excel formula:
=TRIM(A2)
Step 2: Press “Enter”.
The formula returns the text of Cell A2 in Cell B2 with the extra spaces removed; Cell A2 itself is unchanged.
Step 3: Now, drag the formula down to the rest of the cells.
Result: The function removes the extra spaces and keeps a single space between words.
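If you also clean data outside Excel, the behavior of TRIM can be approximated in Python. The sketch below is only an illustration of the rule described above, not Excel's own implementation; the function name excel_trim and the sample idiom are hypothetical.

```python
import re


def excel_trim(text: str) -> str:
    """Approximate Excel's TRIM for a single text value."""
    # Strip ordinary spaces (ASCII 32) from both ends, then collapse
    # any run of two or more spaces into one. Tabs and non-breaking
    # spaces are deliberately left alone, as TRIM leaves them too.
    return re.sub(r" {2,}", " ", text.strip(" "))


# Hypothetical sample value with leading, trailing, and repeated spaces.
print(excel_trim("  a   piece  of cake  "))
```

Because only the ordinary space is handled, a value that still looks untrimmed afterwards probably contains some other whitespace character.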
2. TEXTJOIN Function
The TEXTJOIN function is useful when you want to combine text from several columns or rows into one cell.
Example: The data below is spread across six rows. We want to join the text from these six rows into a single cell.
Solution:
Step 1: Select “Cell B8” and enter the formula:
=TEXTJOIN(" ",TRUE,B1:B6)
Step 2: Press “Enter”.
Result: The formula combines the text from Cells B1 to B6 into one cell, separated by single spaces. The second argument, TRUE, tells TEXTJOIN to skip empty cells.
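The three arguments of TEXTJOIN (delimiter, ignore-empty flag, values) map naturally onto a small Python function. This is a rough sketch under that assumption; the function name textjoin and the sample values are hypothetical.

```python
def textjoin(delimiter, ignore_empty, values):
    """Join values with a delimiter, optionally skipping empty ones."""
    # Treat None like an empty cell and convert everything else to text.
    parts = ["" if value is None else str(value) for value in values]
    if ignore_empty:
        parts = [part for part in parts if part != ""]
    return delimiter.join(parts)


# Hypothetical cell values, including two empty cells.
cells = ["Data", "", "cleaning", None, "matters"]
print(textjoin(" ", True, cells))
print(textjoin("-", False, ["a", "", "b"]))
```

With the flag set to False, empty cells still contribute a delimiter, which is why the second call produces two hyphens in a row.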
3. Remove Duplicates
You can use the Remove Duplicates tool to delete duplicate rows from the selected range.
Example: Consider the below data of cities and their famous attractions for a travel itinerary. In this list, the data for New York appears twice. We want to eliminate this duplicate data for New York and retain only the unique city names.
Solution:
Step 1: Follow the given steps:
- Select the data
- Go to the “Data” tab
- Select “Remove Duplicates” under the “Data Tools” group.
Step 2: A Remove Duplicates dialog box will appear.
Click “Select All” so that every column is compared, then click “OK”.
Step 3: A message box will appear showing how many duplicate values were found and removed and how many unique values remain.
Click on “OK”.
Result: Excel removes the duplicate New York row and keeps only the unique entries.
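The same idea, keeping the first occurrence of each row and dropping later repeats, can be sketched in Python. The function name and the city data are hypothetical, and the sketch compares rows exactly, so it should be read as an approximation rather than a copy of Excel's comparison rules.

```python
def remove_duplicates(rows):
    """Return rows with later exact duplicates removed, order preserved."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)  # all columns take part, like "Select All"
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Hypothetical itinerary data with one repeated row.
itinerary = [
    ("New York", "Statue of Liberty"),
    ("Paris", "Eiffel Tower"),
    ("New York", "Statue of Liberty"),
]
cleaned = remove_duplicates(itinerary)
print(f"{len(itinerary) - len(cleaned)} duplicate removed; {len(cleaned)} unique rows remain")
```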
4. Find and Replace
You can use the Find and Replace feature to find a specific value or text and replace it with another value.
Example: The data below shows a list of products and their inventory status. Here, we want to replace “Out of Stock” with “In Stock”.
Solution:
Step 1:
- Select “Column C”
- Go to the “Home” tab
- Select “Find & Select” under the “Editing” group
- Click on “Replace”.
Step 2: A “Find and Replace” window will open.
- Enter “Out of Stock” in “Find What”
- Enter “In Stock” in “Replace with”
- Then click “Replace All”.
Step 3: A pop-up message shows the number of replacements made.
Result: Every “Out of Stock” entry in Column C now reads “In Stock”.
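For comparison, a replace-all over a column can be sketched in Python as follows. This is a simplified illustration: it matches text exactly and case-sensitively, and it counts changed cells, which may not correspond to how Excel matches or counts. The function name and the status values are hypothetical.

```python
def replace_all(cells, find, replace_with):
    """Replace text in every cell and report how many cells changed."""
    updated = []
    changed = 0
    for cell in cells:
        if find in cell:
            changed += 1
            cell = cell.replace(find, replace_with)
        updated.append(cell)
    return updated, changed


# Hypothetical inventory status column.
status = ["In Stock", "Out of Stock", "Out of Stock"]
new_status, changed = replace_all(status, "Out of Stock", "In Stock")
print(new_status, changed)
```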
5. CLEAN Function
You can use the CLEAN function to remove line breaks and other non-printable characters (the control characters with ASCII codes 0 to 31). CLEAN does not remove spaces; use TRIM for those.
Example: The below data consists of contact numbers from an e-commerce website. Here, we want to remove non-printable characters from the data.
Solution:
Step 1: Select “Cell C2” and enter the formula:
=CLEAN(B2)
Step 2: Press “Enter”.
The formula returns a cleaned copy of the data from Cell B2 in Cell C2.
Step 3: Drag the formula to the rest of the cells.
Result: The formula removes all non-printing characters and displays the clean data.
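A minimal Python sketch of the same filter, assuming that “non-printable” means the control characters with codes 0 to 31. The function name and the sample contact number are hypothetical.

```python
def excel_clean(text: str) -> str:
    """Drop control characters (codes 0-31) and keep everything else."""
    return "".join(ch for ch in text if ord(ch) >= 32)


# Hypothetical contact number containing a line break and a bell character.
raw = "555\n-0100\x07"
print(excel_clean(raw))
```

Spaces have code 32, so they pass through this filter unchanged; combine it with a trimming step if extra spaces are also a problem.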
6. Clear Formatting
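If you want to clear formatting (bold, colors, font styles, etc.) from your data, you can use this feature. Conditional formatting is cleared through “Clear Rules”, while manually applied formatting is cleared through “Clear” > “Clear Formats” in the “Editing” group of the “Home” tab.
Example: In the below data, conditional formatting highlights cells with total sales over $500 in green and cells below $100 in orange. We want to remove this formatting.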
Solution:
Step 1:
- Select the data
- Go to the “Home” tab
- Select “Conditional Formatting” under the “Styles” group
Step 2: Now, select “Clear Rules from Selected Cells” under the “Clear Rules” option.
Result: The feature removes the conditional formatting from the selected data.
7. Text to Columns
You can use Text to Columns to split the content of a single cell into multiple columns.
Example: Suppose we have a list of Instagram users in which each cell holds a username, follower count, and place separated by commas. We want usernames, followers, and places in different columns.
Solution:
Step 1: Insert three new columns.
Step 2:
- Select the data.
- Select the “Data” tab
- Click “Text to Columns” in the “Data Tools” group.
Step 3: The “Convert Text to Columns Wizard” dialog box will appear.
- Go to the “Original Data Type” section
- Select the “Delimited” option
- Click “Next”.
Step 4: Now, select “Comma” from the Delimiters.
Step 5:
- Click on “General” under “Column data format”
- Select the “Destination”
- Click on the “Finish” button.
Result: We have successfully separated username, followers, and place using the “Text to Columns” option.
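Splitting on a delimiter is easy to mirror in Python. The sketch below only splits the text; it does not attempt the type conversion that the “General” column format performs. The function name and the sample users are hypothetical.

```python
def text_to_columns(cells, delimiter=","):
    """Split each cell on the delimiter, returning one list per row."""
    return [cell.split(delimiter) for cell in cells]


# Hypothetical cells in "username,followers,place" form.
users = ["travel_anna,12500,Lisbon", "chef_marco,8300,Rome"]
for username, followers, place in text_to_columns(users):
    print(username, followers, place)
```

Every piece is still text after the split, so a follower count such as "12500" would need a separate conversion step before it can be used in arithmetic.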
8. LOWER, UPPER, PROPER Functions
You can change the capitalization of a text string using the following functions:
=LOWER(text): converts all letters to lowercase
=UPPER(text): converts all letters to uppercase
=PROPER(text): converts to proper case (capitalizes the first letter of each word and makes the remaining letters lowercase)
Example: We will convert the sentence “cLEaR aS CrYStaL” into lowercase, uppercase, and proper case using the LOWER, UPPER, and PROPER functions, respectively.
Solution:
Enter the corresponding formula in each cell and press “Enter”.
Result: LOWER returns “clear as crystal”, UPPER returns “CLEAR AS CRYSTAL”, and PROPER returns “Clear As Crystal”.
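Python's built-in string methods give the same three conversions for this example. Treat str.title() as a close analogue of PROPER rather than an exact match, since the two may differ on edge cases such as apostrophes.

```python
text = "cLEaR aS CrYStaL"

lower = text.lower()   # analogue of =LOWER(text)
upper = text.upper()   # analogue of =UPPER(text)
proper = text.title()  # analogue of =PROPER(text)

print(lower)
print(upper)
print(proper)
```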
9. Go to Special Tool
Go to Special is a tool that finds and selects cells of a particular type within the data, such as cells that are blank or that contain formulas, constants, or conditional formatting.
Example: Consider the following monthly expenditure ($) from January to May. We want to find and delete blank rows.
Solution:
Step 1:
- Select the data table
- Go to the “Home” tab
- Select the “Find & Select” option under the “Editing” group
- Click on the “Go to Special” option.
Step 2: A “Go To Special” dialog window will appear.
- Select the “Blanks” option
- Click “OK”.
This selects all the blank cells in the data.
Step 3: Now, to delete the rows that contain blank cells,
- Go to the “Home” tab
- Select “Delete” under the “Cells” group
- Click on “Delete Sheet Rows”.
Result: Excel deletes every row that contains a selected blank cell, so check first that no row you want to keep has a blank in it.
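The effect of selecting blanks and then deleting their sheet rows can be sketched in Python as a filter that drops any row containing an empty cell. The function name and the expenditure figures are hypothetical, and blank cells are modeled as None or an empty string.

```python
def delete_rows_with_blanks(rows):
    """Keep only rows in which every cell holds a value."""
    return [
        row for row in rows
        if all(cell not in ("", None) for cell in row)
    ]


# Hypothetical monthly expenditure table with one fully blank row
# and one row that is only partly filled in.
expenses = [
    ("January", 1200),
    ("", None),
    ("February", 950),
    ("March", None),
]
print(delete_rows_with_blanks(expenses))
```

Note that the partly filled March row is dropped along with the fully blank row, which is the same side effect to watch for in Excel.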
10. Paste Special
You can use the Paste Special method to convert numbers stored as text into numbers.
Example: In the below data, students’ scores are present in text format. We have to convert the text format of the score into the numeric format.
Solution:
Step 1: Select “Cell D2” and enter 1.
Step 2: Copy Cell D2 and select the B2:B6 cell range.
Step 3: Right-click the selected range and select the “Paste Special” option.
A “Paste Special” window will appear.
Step 4: Click on the “All” option under “Paste” and “Multiply” under the “Operation” section.
Step 5: Click “OK”.
Result: Multiplying each score by 1 converts the text values into numbers without changing their value.
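The multiply-by-1 trick works because arithmetic forces a text value to be read as a number. A loose Python analogue, assuming that values which cannot be read as numbers should be left untouched, might look like this; the function name and the scores are hypothetical.

```python
def multiply_by_one(cells):
    """Convert numeric-looking text to numbers; leave other values as-is."""
    converted = []
    for cell in cells:
        try:
            converted.append(float(cell) * 1)
        except (TypeError, ValueError):
            # Not readable as a number, so keep the original value.
            converted.append(cell)
    return converted


# Hypothetical scores stored as text, plus one non-numeric entry.
scores = ["85", "92.5", "n/a"]
print(multiply_by_one(scores))
```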
These were the top 10 tools, Excel formulas, and functions you can use for data cleaning and manipulation.
Frequently Asked Questions (FAQs)
Q1. Can you automate data cleaning in Excel?
Answer: Yes, you can automate data cleaning in Excel using Power Query. You perform the cleaning steps once, and the query records each step. Whenever you refresh the query, it reapplies all the steps in the same order.
Q2. How do I remove incomplete data in Excel?
Answer: You can use several tools to deal with incomplete or messy data, such as extra spaces, missing information, or incorrect information. Two of the most versatile are the Go to Special and Paste Special dialog boxes, whose options you can combine to perform several cleaning operations.
Q3. What are the benefits of data cleaning?
Answer: Data cleaning in Excel makes data organized, clear, and presentable. Here are some advantages of cleaning data in Excel:
- With organized data tables, you can quickly and easily find any piece of information.
- When the data is neat and presentable, it is easier to see which decisions can benefit your business.
- With clean and orderly data, there is a lower chance of repetitions, missing information, and other errors.