📌 Disclosure: This post contains affiliate links. If you purchase through these links, I may earn a small commission at no extra cost to you. I only recommend resources I genuinely believe will help you.
Duplicate rows in a database are one of the most common — and most frustrating — data quality problems every SQL user faces. Whether you're cleaning financial data, preparing a dataset for analysis, or maintaining a production database, knowing how to delete duplicate rows in SQL while keeping one is an essential skill. This guide covers every method, for every situation, with clear examples.
⚠️ Warning: Always back up your table (or test on a copy) before running a DELETE statement on real data. Deleted rows cannot be recovered without a backup.
- What Are Duplicate Rows and Why Do They Happen?
- How to Find Duplicate Rows First
- Method 1: Using ROW_NUMBER() with a CTE (Recommended)
- Method 2: Using a Temporary Table
- Method 3: Using GROUP BY and MIN()
- Method 4: Without a Primary Key
- Method 5: Using DISTINCT to Recreate the Table
- Which Method Should You Use?
- Best Books and Courses to Master SQL
- Frequently Asked Questions
1. What Are Duplicate Rows and Why Do They Happen?
A duplicate row occurs when two or more rows in a database table contain identical values across all columns, or across the key columns that matter. Here's a simple example of a customers table with duplicates:
| id | name | email | city |
|---|---|---|---|
| 1 | Sara Ahmed | sara@email.com | Istanbul |
| 2 | Sara Ahmed | sara@email.com | Istanbul |
| 3 | James Brown | james@email.com | London |
| 4 | Mia Chen | mia@email.com | Singapore |
| 5 | Mia Chen | mia@email.com | Singapore |
| 6 | Mia Chen | mia@email.com | Singapore |
■ Red rows = duplicates to delete | ■ Green rows = rows to keep
Duplicates typically happen due to:
- 🔁 Data imports — the same file uploaded twice
- 🔗 Failed JOIN operations — producing multiple matching rows
- 🖱️ User error — submitting a form twice
- 🔄 ETL pipeline bugs — data loaded multiple times during processing
- ❌ Missing constraints — no UNIQUE or PRIMARY KEY to prevent duplicates
2. How to Find Duplicate Rows First
Before deleting anything, always find and confirm your duplicates first. Here's how:
```sql
SELECT
    name, email, city,
    COUNT(*) AS duplicate_count
FROM customers
GROUP BY name, email, city
HAVING COUNT(*) > 1
ORDER BY duplicate_count DESC;
```
| name | email | city | duplicate_count |
|---|---|---|---|
| Mia Chen | mia@email.com | Singapore | 3 |
| Sara Ahmed | sara@email.com | Istanbul | 2 |
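If you'd like to try the duplicate-finder query in a safe sandbox before touching real data, the sketch below runs it against an in-memory SQLite database using Python's built-in sqlite3 module (the table and rows mirror the example above; using SQLite here is an assumption, and your production database's client will differ):

```python
import sqlite3

# Throwaway in-memory database that mirrors the example customers table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, email TEXT, city TEXT);
INSERT INTO customers (name, email, city) VALUES
  ('Sara Ahmed',  'sara@email.com',  'Istanbul'),
  ('Sara Ahmed',  'sara@email.com',  'Istanbul'),
  ('James Brown', 'james@email.com', 'London'),
  ('Mia Chen',    'mia@email.com',   'Singapore'),
  ('Mia Chen',    'mia@email.com',   'Singapore'),
  ('Mia Chen',    'mia@email.com',   'Singapore');
""")

# The duplicate-finder query from above, unchanged.
dupes = conn.execute("""
    SELECT name, email, city, COUNT(*) AS duplicate_count
    FROM customers
    GROUP BY name, email, city
    HAVING COUNT(*) > 1
    ORDER BY duplicate_count DESC
""").fetchall()

for row in dupes:
    print(row)
# ('Mia Chen', 'mia@email.com', 'Singapore', 3)
# ('Sara Ahmed', 'sara@email.com', 'Istanbul', 2)
```

Running the SELECT first, before any DELETE, is exactly the habit this section recommends.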
3. Method 1: Using ROW_NUMBER() with a CTE (Recommended)
```sql
-- Step 1: Preview the duplicates first
WITH duplicates_cte AS (
    SELECT
        id,
        ROW_NUMBER() OVER (
            PARTITION BY name, email, city
            ORDER BY id
        ) AS row_num
    FROM customers
)
SELECT * FROM duplicates_cte WHERE row_num > 1;
-- Shows you exactly which rows will be deleted
```
```sql
-- Step 2: Delete the duplicates
WITH duplicates_cte AS (
    SELECT
        id,
        ROW_NUMBER() OVER (
            PARTITION BY name, email, city
            ORDER BY id
        ) AS row_num
    FROM customers
)
DELETE FROM duplicates_cte WHERE row_num > 1;
```
💡 Note: deleting directly from a CTE like this works in SQL Server. PostgreSQL and MySQL do not allow a CTE as the DELETE target; there, keep the same window query but delete from customers with the duplicate ids in an IN (...) subquery instead.
PARTITION BY groups rows with the same name, email, and city together. ROW_NUMBER() assigns 1 to the first row in each group and 2, 3, 4... to the rest. We then delete everything with row_num > 1 — keeping exactly one row per group.
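Here's a runnable end-to-end version of the same ROW_NUMBER() logic, using an in-memory SQLite database via Python's sqlite3 (an assumed sandbox; it deletes by id because SQLite cannot use a CTE as the DELETE target):

```python
import sqlite3

# Throwaway in-memory sandbox (window functions need SQLite 3.25+).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, email TEXT, city TEXT);
INSERT INTO customers (name, email, city) VALUES
  ('Sara Ahmed',  'sara@email.com',  'Istanbul'),
  ('Sara Ahmed',  'sara@email.com',  'Istanbul'),
  ('James Brown', 'james@email.com', 'London'),
  ('Mia Chen',    'mia@email.com',   'Singapore'),
  ('Mia Chen',    'mia@email.com',   'Singapore'),
  ('Mia Chen',    'mia@email.com',   'Singapore');
""")

# Same ROW_NUMBER() partitioning as Method 1, but deleting by id,
# since SQLite does not support DELETE on a CTE.
conn.execute("""
    DELETE FROM customers WHERE id IN (
        SELECT id FROM (
            SELECT id,
                   ROW_NUMBER() OVER (
                       PARTITION BY name, email, city
                       ORDER BY id
                   ) AS row_num
            FROM customers
        )
        WHERE row_num > 1
    )
""")
conn.commit()

kept = [r[0] for r in conn.execute("SELECT id FROM customers ORDER BY id")]
print(kept)  # [1, 3, 4]: one survivor per group, lowest id kept
```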
4. Method 2: Using a Temporary Table
```sql
-- Step 1: Copy only the unique rows into a temp table
CREATE TABLE customers_clean AS
SELECT DISTINCT * FROM customers;

-- Step 2: Delete all rows from the original table
DELETE FROM customers;

-- Step 3: Re-insert the clean unique rows
INSERT INTO customers
SELECT * FROM customers_clean;

-- Step 4: Drop the temp table
DROP TABLE customers_clean;
```
This method relies on SELECT DISTINCT — which works well when ALL columns are identical duplicates. If your duplicates differ in some columns (like an auto-increment ID), use Method 1 instead.
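To see the four steps work together, here's a hedged sketch in an in-memory SQLite database via Python's sqlite3 (an assumed sandbox; the SQL inside is the same pattern as above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (name TEXT, email TEXT, city TEXT);
INSERT INTO customers VALUES
  ('Sara Ahmed',  'sara@email.com',  'Istanbul'),
  ('Sara Ahmed',  'sara@email.com',  'Istanbul'),
  ('James Brown', 'james@email.com', 'London');

-- Steps 1-4: copy unique rows out, empty the table, refill, clean up.
CREATE TABLE customers_clean AS SELECT DISTINCT * FROM customers;
DELETE FROM customers;
INSERT INTO customers SELECT * FROM customers_clean;
DROP TABLE customers_clean;
""")

count = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
print(count)  # 2: the duplicate Sara Ahmed row is gone
```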
5. Method 3: Using GROUP BY and MIN()
This method keeps the row with the lowest id in each duplicate group:
```sql
DELETE FROM customers
WHERE id NOT IN (
    SELECT MIN(id)
    FROM customers
    GROUP BY name, email, city
);
```
The inner subquery finds the minimum id for each unique combination of name, email, and city. The outer DELETE removes every row whose ID is NOT in that list — effectively keeping only the first occurrence of each duplicate. Note that MySQL rejects a DELETE whose subquery selects from the same table; wrapping the subquery in a derived table works around this.
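Here's the same pattern run end-to-end in an in-memory SQLite database via Python's sqlite3 (an assumed sandbox; SQLite, unlike MySQL, allows a DELETE subquery to reference the same table directly):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, email TEXT, city TEXT);
INSERT INTO customers (name, email, city) VALUES
  ('Sara Ahmed', 'sara@email.com', 'Istanbul'),
  ('Sara Ahmed', 'sara@email.com', 'Istanbul'),
  ('Mia Chen',   'mia@email.com',  'Singapore'),
  ('Mia Chen',   'mia@email.com',  'Singapore');
""")

# Keep only the lowest id in each duplicate group (Method 3 verbatim).
conn.execute("""
    DELETE FROM customers
    WHERE id NOT IN (
        SELECT MIN(id)
        FROM customers
        GROUP BY name, email, city
    )
""")
conn.commit()

kept_ids = [row[0] for row in conn.execute("SELECT id FROM customers ORDER BY id")]
print(kept_ids)  # [1, 3]: the first occurrence of each group survives
```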
6. Method 4: Delete Duplicate Rows Without a Primary Key
Consider this table with no ID column — every duplicate row is completely identical:
| name | email | city |
|---|---|---|
| Sara Ahmed | sara@email.com | Istanbul |
| Sara Ahmed | sara@email.com | Istanbul |
| James Brown | james@email.com | London |
| James Brown | james@email.com | London |
```sql
-- Step 1: Add a temporary auto-increment ID column (MySQL syntax)
ALTER TABLE customers ADD COLUMN temp_id INT AUTO_INCREMENT PRIMARY KEY;

-- Step 2: Delete duplicates using the new temp_id
DELETE FROM customers
WHERE temp_id NOT IN (
    SELECT MIN(temp_id)
    FROM customers
    GROUP BY name, email, city
);

-- Step 3: Remove the temporary column
ALTER TABLE customers DROP COLUMN temp_id;
```
If you're using PostgreSQL, there's an even simpler option, the built-in ctid column:
```sql
-- PostgreSQL only: ctid is the internal row identifier
-- (MIN(ctid) requires PostgreSQL 14+; on older versions, compare ctids
-- with a self-join, e.g. DELETE ... USING ... WHERE a.ctid > b.ctid)
DELETE FROM customers
WHERE ctid NOT IN (
    SELECT MIN(ctid)
    FROM customers
    GROUP BY name, email, city
);
```
The ctid method is the cleanest way to remove duplicates without a primary key — no table alteration needed. It uses PostgreSQL's internal row identifier to distinguish otherwise identical rows.
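PostgreSQL's ctid has a close cousin in SQLite: the hidden rowid column. The hedged sketch below uses rowid the same way on a table with no primary key (SQLite-specific; this is an analogy to ctid, not the PostgreSQL statement itself):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (name TEXT, email TEXT, city TEXT);  -- no primary key
INSERT INTO customers VALUES
  ('Sara Ahmed',  'sara@email.com',  'Istanbul'),
  ('Sara Ahmed',  'sara@email.com',  'Istanbul'),
  ('James Brown', 'james@email.com', 'London'),
  ('James Brown', 'james@email.com', 'London');
""")

# SQLite's hidden rowid plays the same role as PostgreSQL's ctid here:
# it distinguishes otherwise identical rows.
conn.execute("""
    DELETE FROM customers
    WHERE rowid NOT IN (
        SELECT MIN(rowid)
        FROM customers
        GROUP BY name, email, city
    )
""")
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
print(count)  # 2: one copy of each row remains
```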
7. Method 5: Using DISTINCT to Recreate the Table
Create a new deduplicated table with SELECT DISTINCT and rename it. This works well for large tables where performance matters, or when you want a completely fresh start.
```sql
-- Step 1: Create a new table containing only unique rows
CREATE TABLE customers_new AS
SELECT DISTINCT name, email, city
FROM customers;

-- Step 2: Drop the original table
DROP TABLE customers;

-- Step 3: Rename the new table to the original name
ALTER TABLE customers_new RENAME TO customers;
```
⚠️ Warning: this method permanently destroys the original table via DROP TABLE, and CREATE TABLE AS does not copy indexes, constraints, or defaults. Double-check row counts before dropping: `SELECT COUNT(*) FROM customers_new;`
8. Which Method Should You Use?
| Situation | Best Method |
|---|---|
| Table has a primary key (ID column) | Method 1 — ROW_NUMBER() CTE ✅ Best |
| No primary key, all columns identical | Method 4 — Add temp ID or use ctid |
| You want maximum simplicity | Method 3 — GROUP BY + MIN() |
| Large table, performance matters | Method 5 — DISTINCT into new table |
| You want a safe, reversible process | Method 2 — Temporary table |
| PostgreSQL specifically | Method 4 — ctid approach |
| MySQL 8+ | Method 1 — ROW_NUMBER() CTE |
| Older MySQL versions (<8) | Method 3 — GROUP BY + MIN() |
9. Best Books and Courses to Master SQL
If you want to go beyond removing duplicates and truly master SQL for data analysis, these resources will take you there:
Recommended Online Courses:
- 🎓 Introduction to SQL on DataCamp — Hands-on interactive SQL learning. Type real queries in the browser and get instant feedback. (Affiliate link — coming soon)
- 🎓 SQLZoo — Free, interactive SQL tutorials. Practice window functions and data cleaning queries directly in the browser at sqlzoo.net — completely free.
Key Takeaways
- ✅ Always find duplicates first with a `SELECT ... GROUP BY ... HAVING COUNT(*) > 1` query before deleting anything.
- ✅ The ROW_NUMBER() + CTE method is the most reliable approach and works in all modern databases.
- ✅ For tables without a primary key, add a temporary ID column or use PostgreSQL's built-in `ctid`.
- ✅ The GROUP BY + MIN() method is the simplest approach when you have an ID column and want to keep the lowest ID.
- ✅ Always back up your data or test on a copy before running any DELETE statement in production.
- ✅ Prevent future duplicates by adding a `UNIQUE` constraint: `ALTER TABLE customers ADD CONSTRAINT unique_email UNIQUE (email);`
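The prevention takeaway is easy to see in action. In this SQLite sketch, a UNIQUE constraint on email makes the database itself reject a second row with the same address (SQLite declares the constraint inline at table creation rather than via ALTER TABLE ... ADD CONSTRAINT):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite version of the UNIQUE constraint (declared inline).
conn.execute("CREATE TABLE customers (name TEXT, email TEXT UNIQUE, city TEXT)")
conn.execute("INSERT INTO customers VALUES ('Sara Ahmed', 'sara@email.com', 'Istanbul')")

try:
    # Second insert with the same email violates the constraint.
    conn.execute("INSERT INTO customers VALUES ('Sara A.', 'sara@email.com', 'Ankara')")
    blocked = False
except sqlite3.IntegrityError:
    blocked = True  # the database rejected the duplicate email

count = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
print(blocked, count)  # True 1: the duplicate never made it into the table
```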
Frequently Asked Questions
**How do I keep the most recent row instead of the oldest?**

Change the ORDER BY clause inside ROW_NUMBER(). Instead of ORDER BY id ASC (which keeps the lowest ID / oldest row), use ORDER BY created_date DESC or ORDER BY id DESC to keep the most recent row in each duplicate group.

**What's the difference between DELETE and TRUNCATE?**

DELETE removes specific rows based on a condition — perfect for removing duplicates while keeping others. TRUNCATE removes ALL rows from a table instantly with no conditions. Always use DELETE with a WHERE clause when dealing with duplicates — never TRUNCATE unless you want to empty the entire table.

**How do I prevent duplicates from being inserted in the future?**

Add a UNIQUE constraint: `ALTER TABLE customers ADD CONSTRAINT unique_email UNIQUE (email);` — this prevents any future INSERT from adding a row with a duplicate email address. The database will throw an error if someone tries.

**Does SELECT DISTINCT delete duplicates from my table?**

SELECT DISTINCT returns unique rows in your query results but does NOT delete duplicates from the actual table. To permanently remove duplicates, you need to use one of the DELETE methods covered in this guide.

**What does PARTITION BY do in ROW_NUMBER()?**

PARTITION BY tells SQL which columns to use when grouping rows together as "duplicates." For example, PARTITION BY name, email, city groups rows that have the same name, email, AND city. ROW_NUMBER() then assigns 1 to the first row in each group and increments from there. Change the PARTITION BY columns to match whichever columns define "duplicate" in your specific table.

Conclusion: Clean Data is Better Data
Duplicate rows are a fact of life in any real-world database — but now you have every tool you need to handle them confidently. Whether your table has a primary key or not, whether you're using MySQL, PostgreSQL, or SQL Server, there's a method in this guide that works for your exact situation.
Remember the golden rule: always preview before you delete. Run your SELECT version first, confirm the rows look right, then switch to DELETE. That one habit will save you from a lot of headaches.
Data cleaning is one of the most valuable skills a SQL analyst can have — and removing duplicates is where it starts. Master this, and you'll be handling real-world data problems like a professional. 🚀
📌 Keep Building Your SQL Skills!
Check out our Complete Beginner's Guide to SQL and How to Use SQL for Financial Analysis — perfect next reads to keep growing your skills.
