How to Delete Duplicate Rows in SQL and Keep One: Complete Guide with Examples

📌 Disclosure: This post contains affiliate links. If you purchase through these links, I may earn a small commission at no extra cost to you. I only recommend resources I genuinely believe will help you.

Duplicate rows in a database are one of the most common — and most frustrating — data quality problems every SQL user faces. Whether you're cleaning financial data, preparing a dataset for analysis, or maintaining a production database, knowing how to delete duplicate rows in SQL while keeping one is an essential skill. This guide covers every method, for every situation, with clear examples.

⚠️ Before you start: Always back up your table or test your queries on a copy before running any DELETE statement on real data. Deleted rows cannot be recovered without a backup.

1. What Are Duplicate Rows and Why Do They Happen?

Duplicate rows occur when two or more rows in a table contain identical values across all columns — or across the key columns that matter for your data. Here's a simple example of a customers table with duplicates:

| id | name        | email           | city      |
|----|-------------|-----------------|-----------|
| 1  | Sara Ahmed  | sara@email.com  | Istanbul  |
| 2  | Sara Ahmed  | sara@email.com  | Istanbul  |
| 3  | James Brown | james@email.com | London    |
| 4  | Mia Chen    | mia@email.com   | Singapore |
| 5  | Mia Chen    | mia@email.com   | Singapore |
| 6  | Mia Chen    | mia@email.com   | Singapore |

Rows 2, 5, and 6 are the duplicates to delete; rows 1, 3, and 4 are the ones to keep.

Duplicates typically happen due to:

  • 🔁 Data imports — the same file uploaded twice
  • 🔗 Failed JOIN operations — producing multiple matching rows
  • 🖱️ User error — submitting a form twice
  • 🔄 ETL pipeline bugs — data loaded multiple times during processing
  • Missing constraints — no UNIQUE or PRIMARY KEY to prevent duplicates

2. How to Find Duplicate Rows First

Before deleting anything, always find and confirm your duplicates first. Here's how:

Find all duplicate rows:
SELECT
  name, email, city,
  COUNT(*) AS duplicate_count
FROM customers
GROUP BY name, email, city
HAVING COUNT(*) > 1
ORDER BY duplicate_count DESC;
Output:
| name       | email          | city      | duplicate_count |
|------------|----------------|-----------|-----------------|
| Mia Chen   | mia@email.com  | Singapore | 3               |
| Sara Ahmed | sara@email.com | Istanbul  | 2               |
Good habit: Always run this SELECT query first and review the results before writing any DELETE statement. Know exactly what you're removing before you remove it.
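To see the duplicate-finding query in action end to end, here is a minimal sketch using Python's built-in sqlite3 module. The table name and sample rows mirror the article's example; the in-memory database exists only for the demo.

```python
import sqlite3

# In-memory demo database with the article's sample customers table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, email TEXT, city TEXT)"
)
conn.executemany(
    "INSERT INTO customers (name, email, city) VALUES (?, ?, ?)",
    [
        ("Sara Ahmed", "sara@email.com", "Istanbul"),
        ("Sara Ahmed", "sara@email.com", "Istanbul"),
        ("James Brown", "james@email.com", "London"),
        ("Mia Chen", "mia@email.com", "Singapore"),
        ("Mia Chen", "mia@email.com", "Singapore"),
        ("Mia Chen", "mia@email.com", "Singapore"),
    ],
)

# The GROUP BY / HAVING query from above: one output row per duplicate group.
rows = conn.execute("""
    SELECT name, email, city, COUNT(*) AS duplicate_count
    FROM customers
    GROUP BY name, email, city
    HAVING COUNT(*) > 1
    ORDER BY duplicate_count DESC
""").fetchall()
for row in rows:
    print(row)
# Mia Chen appears 3 times and Sara Ahmed twice; James Brown is not listed.
```

Note that the query returns one row per duplicate *group*, not one row per duplicate — that's exactly why it's safe to run as a preview.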

3. Method 1: Using ROW_NUMBER() with a CTE

ROW_NUMBER() + CTE — ✅ Recommended
This is the most reliable and widely used method for removing duplicates in SQL. It uses a CTE (Common Table Expression) to assign a row number to each duplicate group — then deletes all rows where the row number is greater than 1, keeping only the first occurrence of each duplicate.
Works in:
SQL Server PostgreSQL MySQL 8+ SQLite 3.25+
Step 1 — Preview which rows will be deleted (run first!):
WITH duplicates_cte AS (
  SELECT
    id,
    ROW_NUMBER() OVER (
      PARTITION BY name, email, city
      ORDER BY id
    ) AS row_num
  FROM customers
)
SELECT * FROM duplicates_cte WHERE row_num > 1;
-- Shows you exactly which rows will be deleted
Step 2 — Delete the duplicates, keep one:
WITH duplicates_cte AS (
  SELECT
    id,
    ROW_NUMBER() OVER (
      PARTITION BY name, email, city
      ORDER BY id
    ) AS row_num
  FROM customers
)
DELETE FROM customers
WHERE id IN (SELECT id FROM duplicates_cte WHERE row_num > 1);
-- SQL Server also accepts the shorter form, deleting through the CTE itself:
-- DELETE FROM duplicates_cte WHERE row_num > 1;
-- PostgreSQL, MySQL, and SQLite do NOT allow deleting from a CTE,
-- so use the id IN (...) version above for portability.
💡 How it works: PARTITION BY groups rows with the same name, email, and city together. ROW_NUMBER() assigns 1 to the first row in each group and 2, 3, 4... to the rest. We then delete everything with row_num > 1 — keeping exactly one row per group.
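SQLite 3.25+ supports window functions, so the whole method can be exercised with Python's built-in sqlite3 module. This is a minimal sketch, assuming the bundled SQLite is recent enough; the sample data mirrors the article's table.

```python
import sqlite3

# Assumes the bundled SQLite is 3.25+ (required for window functions).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, email TEXT, city TEXT)"
)
conn.executemany(
    "INSERT INTO customers (name, email, city) VALUES (?, ?, ?)",
    [
        ("Sara Ahmed", "sara@email.com", "Istanbul"),
        ("Sara Ahmed", "sara@email.com", "Istanbul"),
        ("James Brown", "james@email.com", "London"),
        ("Mia Chen", "mia@email.com", "Singapore"),
        ("Mia Chen", "mia@email.com", "Singapore"),
        ("Mia Chen", "mia@email.com", "Singapore"),
    ],
)

# Delete every row the CTE marks with row_num > 1, keeping the lowest id
# per (name, email, city) group. SQLite can't DELETE from a CTE directly,
# so the base table is filtered by the ids the CTE flags.
conn.execute("""
    WITH duplicates_cte AS (
        SELECT id,
               ROW_NUMBER() OVER (
                   PARTITION BY name, email, city ORDER BY id
               ) AS row_num
        FROM customers
    )
    DELETE FROM customers
    WHERE id IN (SELECT id FROM duplicates_cte WHERE row_num > 1)
""")

remaining = conn.execute("SELECT id, name FROM customers ORDER BY id").fetchall()
print(remaining)  # ids 1, 3, 4 survive — exactly one row per group
```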

4. Method 2: Using a Temporary Table

Temporary Table Method — 🔵 Simple & Safe
This method creates a temporary copy of your unique rows, clears the original table, and re-inserts the clean data. It's more verbose but very easy to understand — great for beginners or when you want maximum control over the process.
Works in:
MySQL SQL Server PostgreSQL SQLite
SQL Query:
-- Step 1: Create a temp table with only unique rows
-- (SQL Server doesn't support CREATE TABLE ... AS; use:
--  SELECT DISTINCT * INTO customers_clean FROM customers;)
CREATE TABLE customers_clean AS
SELECT DISTINCT * FROM customers;

-- Step 2: Delete all rows from the original table
DELETE FROM customers;

-- Step 3: Re-insert the clean unique rows
INSERT INTO customers
SELECT * FROM customers_clean;

-- Step 4: Drop the temp table
DROP TABLE customers_clean;
⚠️ Note: This method uses SELECT DISTINCT — which works well when ALL columns are identical duplicates. If your duplicates differ in some columns (like an auto-increment ID), use Method 1 instead.
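The four steps above can be run as one script; here is a minimal sketch using Python's built-in sqlite3 module. The table deliberately has no id column, so SELECT DISTINCT can collapse the fully identical rows — the sample data is invented for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# No id column: duplicate rows are identical in every column.
conn.execute("CREATE TABLE customers (name TEXT, email TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [
        ("Sara Ahmed", "sara@email.com", "Istanbul"),
        ("Sara Ahmed", "sara@email.com", "Istanbul"),
        ("James Brown", "james@email.com", "London"),
    ],
)

# The article's four steps: copy unique rows out, empty the original,
# copy the clean rows back, drop the helper table.
conn.executescript("""
    CREATE TABLE customers_clean AS SELECT DISTINCT * FROM customers;
    DELETE FROM customers;
    INSERT INTO customers SELECT * FROM customers_clean;
    DROP TABLE customers_clean;
""")

count = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
print(count)  # 2 — one Sara Ahmed, one James Brown
```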

5. Method 3: Using GROUP BY and MIN()

DELETE with GROUP BY + MIN() — 🔵 Classic Method
This method keeps the row with the smallest ID (or any column you choose) from each group of duplicates and deletes all others. It's a classic approach that works well when you have a primary key column like id.
Works in:
MySQL SQL Server PostgreSQL
SQL Query:
DELETE FROM customers
WHERE id NOT IN (
  SELECT MIN(id)
  FROM customers
  GROUP BY name, email, city
);
💡 How it works: The subquery finds the minimum (lowest) id for each unique combination of name, email, and city. The outer DELETE removes every row whose ID is NOT in that list — effectively keeping only the first occurrence of each duplicate.
⚠️ MySQL note: MySQL raises error 1093 if you delete from a table while selecting from it in the same statement. Wrap the subquery in a derived table to work around it: WHERE id NOT IN (SELECT min_id FROM (SELECT MIN(id) AS min_id FROM customers GROUP BY name, email, city) AS keep_rows);
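SQLite has no restriction on same-table subqueries, so the classic form runs as written; this minimal sketch uses Python's built-in sqlite3 module with the article's sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, email TEXT, city TEXT)"
)
conn.executemany(
    "INSERT INTO customers (name, email, city) VALUES (?, ?, ?)",
    [
        ("Sara Ahmed", "sara@email.com", "Istanbul"),
        ("Sara Ahmed", "sara@email.com", "Istanbul"),
        ("James Brown", "james@email.com", "London"),
        ("Mia Chen", "mia@email.com", "Singapore"),
        ("Mia Chen", "mia@email.com", "Singapore"),
        ("Mia Chen", "mia@email.com", "Singapore"),
    ],
)

# Keep the lowest id per (name, email, city) group; delete everything else.
conn.execute("""
    DELETE FROM customers
    WHERE id NOT IN (
        SELECT MIN(id) FROM customers GROUP BY name, email, city
    )
""")

ids = [r[0] for r in conn.execute("SELECT id FROM customers ORDER BY id")]
print(ids)  # [1, 3, 4]
```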

6. Method 4: Delete Duplicate Rows Without a Primary Key

No Primary Key Method — 🟡 No Primary Key
This is the scenario many people struggle with — your table has no primary key or unique ID column, and all duplicate rows are truly identical across every column. Here's the safest way to handle it.

Consider this table with no ID column — every duplicate row is completely identical:

| name        | email           | city     |
|-------------|-----------------|----------|
| Sara Ahmed  | sara@email.com  | Istanbul |
| Sara Ahmed  | sara@email.com  | Istanbul |
| James Brown | james@email.com | London   |
| James Brown | james@email.com | London   |
Works in:
MySQL SQL Server PostgreSQL
SQL Query — Add a temp row number, then delete duplicates:
-- Step 1: Add a temporary auto-increment column
-- (MySQL syntax shown; SQL Server: ADD temp_id INT IDENTITY(1,1);
--  PostgreSQL: ADD COLUMN temp_id SERIAL)
ALTER TABLE customers ADD COLUMN temp_id INT AUTO_INCREMENT PRIMARY KEY;

-- Step 2: Delete duplicates using the new temp_id
-- (MySQL: wrap the subquery in a derived table to avoid error 1093)
DELETE FROM customers
WHERE temp_id NOT IN (
  SELECT MIN(temp_id)
  FROM customers
  GROUP BY name, email, city
);

-- Step 3: Remove the temporary column
ALTER TABLE customers DROP COLUMN temp_id;
Alternative — PostgreSQL using ctid (no ALTER needed):
-- PostgreSQL only: ctid is a built-in hidden row identifier
DELETE FROM customers
WHERE ctid NOT IN (
  SELECT MIN(ctid)
  FROM customers
  GROUP BY name, email, city
);
💡 PostgreSQL users: The ctid method is the cleanest way to remove duplicates without a primary key — no table alteration needed. It uses PostgreSQL's internal row identifier to distinguish otherwise identical rows.
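SQLite has a built-in hidden rowid that plays the same role as PostgreSQL's ctid, so the no-primary-key technique can be demonstrated with Python's built-in sqlite3 module. This is a sketch of the analogous query, not the PostgreSQL one itself; the sample rows mirror the table above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# No declared key: every duplicate row is identical in every visible column.
conn.execute("CREATE TABLE customers (name TEXT, email TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [
        ("Sara Ahmed", "sara@email.com", "Istanbul"),
        ("Sara Ahmed", "sara@email.com", "Istanbul"),
        ("James Brown", "james@email.com", "London"),
        ("James Brown", "james@email.com", "London"),
    ],
)

# rowid (like ctid in PostgreSQL) distinguishes otherwise identical rows,
# so we can keep the one with the smallest internal identifier per group.
conn.execute("""
    DELETE FROM customers
    WHERE rowid NOT IN (
        SELECT MIN(rowid) FROM customers GROUP BY name, email, city
    )
""")

names = conn.execute("SELECT name FROM customers ORDER BY name").fetchall()
print(names)  # one James Brown, one Sara Ahmed
```

One caveat on the analogy: PostgreSQL's ctid can change after VACUUM FULL or row updates, so it should only be used transiently inside a single cleanup statement like this one.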

7. Method 5: Using DISTINCT to Recreate the Table

SELECT DISTINCT into New Table — 🟣 Clean Slate
Sometimes the simplest approach is to create a brand new clean table using SELECT DISTINCT and rename it. This works well for large tables where performance matters, or when you want a completely fresh start.
Works in:
MySQL SQL Server PostgreSQL SQLite
SQL Query:
-- Step 1: Create a new clean table with no duplicates
CREATE TABLE customers_new AS
SELECT DISTINCT name, email, city
FROM customers;

-- Step 2: Drop the original table
DROP TABLE customers;

-- Step 3: Rename the new table to the original name
ALTER TABLE customers_new RENAME TO customers;
🚨 Warning: This method drops your original table entirely. Make absolutely sure your new table contains all the data you need before running DROP TABLE. Double-check row counts: SELECT COUNT(*) FROM customers_new;
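The row-count double-check from the warning above fits naturally into a script: count the distinct rows first, then verify the rebuilt table matches before trusting it. A minimal sketch with Python's built-in sqlite3 module, using invented sample data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, email TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [
        ("Sara Ahmed", "sara@email.com", "Istanbul"),
        ("Sara Ahmed", "sara@email.com", "Istanbul"),
        ("Mia Chen", "mia@email.com", "Singapore"),
    ],
)

# Safety check BEFORE dropping anything: how many unique rows should survive?
expected = conn.execute(
    "SELECT COUNT(*) FROM (SELECT DISTINCT name, email, city FROM customers)"
).fetchone()[0]

# The article's three steps: build clean table, drop original, rename.
conn.executescript("""
    CREATE TABLE customers_new AS
    SELECT DISTINCT name, email, city FROM customers;
    DROP TABLE customers;
    ALTER TABLE customers_new RENAME TO customers;
""")

count = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
print(count == expected)  # True — the rebuilt table has exactly the unique rows
```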

8. Which Method Should You Use?

| Situation                              | Best Method                          |
|----------------------------------------|--------------------------------------|
| Table has a primary key (ID column)    | Method 1 — ROW_NUMBER() CTE ✅ Best  |
| No primary key, all columns identical  | Method 4 — Add temp ID or use ctid   |
| You want maximum simplicity            | Method 3 — GROUP BY + MIN()          |
| Large table, performance matters       | Method 5 — DISTINCT into new table   |
| You want a safe, reversible process    | Method 2 — Temporary table           |
| PostgreSQL specifically                | Method 4 — ctid approach             |
| MySQL 8+                               | Method 1 — ROW_NUMBER() CTE          |
| Older MySQL versions (<8)              | Method 3 — GROUP BY + MIN()          |
Our recommendation: If you're unsure which to use, start with Method 1 (ROW_NUMBER + CTE). It works in all modern databases, is the most precise, and lets you preview exactly what will be deleted before committing.

9. Best Books and Courses to Master SQL

If you want to go beyond removing duplicates and truly master SQL for data analysis, these resources will take you there:

📚 Practical SQL, 2nd Edition by Anthony DeBarros — The #1 recommended SQL book for beginners and intermediate users. Covers real-world data tasks including data cleaning, aggregations, and advanced queries. Exactly the kind of SQL skills you need in any analyst role.
📚 SQL for Data Analysis by Cathy Tanimura — Written specifically for analysts who use SQL to solve real problems. Covers window functions (like ROW_NUMBER used above), time-series queries, and data quality checks in depth. Highly practical.
📚 Learning SQL by Alan Beaulieu — A comprehensive, beginner-friendly SQL reference covering all major commands and concepts. Clear explanations with plenty of examples. Great if you want to build a solid foundation from scratch.

Recommended Online Courses:

  • 🎓 Introduction to SQL on DataCamp — Hands-on interactive SQL learning. Type real queries in the browser and get instant feedback. (Affiliate link — coming soon)
  • 🎓 SQLZoo — Free, interactive SQL tutorials. Practice window functions and data cleaning queries directly in the browser at sqlzoo.net — completely free.

Key Takeaways

  • ✅ Always find duplicates first with a SELECT ... GROUP BY ... HAVING COUNT(*) > 1 query before deleting anything.
  • ✅ The ROW_NUMBER() + CTE method is the most reliable approach and works in all modern databases.
  • ✅ For tables without a primary key, add a temporary ID column or use PostgreSQL's built-in ctid.
  • ✅ The GROUP BY + MIN() method is the simplest approach when you have an ID column and want to keep the lowest ID.
  • Always back up your data or test on a copy before running any DELETE statement in production.
  • ✅ Prevent future duplicates by adding a UNIQUE constraint: ALTER TABLE customers ADD CONSTRAINT unique_email UNIQUE (email);
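The last takeaway — preventing future duplicates with a UNIQUE constraint — can be seen in action with Python's built-in sqlite3 module. This sketch declares the constraint at table-creation time (SQLite's ALTER TABLE cannot add constraints, unlike the ALTER TABLE example above); the schema is invented for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# UNIQUE on email: the database itself now rejects duplicate emails.
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, email TEXT UNIQUE)"
)
conn.execute(
    "INSERT INTO customers (name, email) VALUES ('Sara Ahmed', 'sara@email.com')"
)

try:
    # Second insert with the same email violates the constraint.
    conn.execute(
        "INSERT INTO customers (name, email) VALUES ('Sara A.', 'sara@email.com')"
    )
    blocked = False
except sqlite3.IntegrityError:
    blocked = True  # the database refused the duplicate email

count = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
print(blocked, count)  # the duplicate was blocked; only one row exists
```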

Frequently Asked Questions

❓ How do I delete duplicate rows in SQL and keep the most recent one?
Use the ROW_NUMBER() method (Method 1) but change the ORDER BY inside the window function. Instead of ORDER BY id ASC (which keeps the lowest ID / oldest row), use ORDER BY created_date DESC or ORDER BY id DESC to keep the most recent row in each duplicate group.
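Here is that keep-the-newest variant as a minimal sketch with Python's built-in sqlite3 module (window functions need SQLite 3.25+). The three-row table is invented for the demo, using id order as a stand-in for insertion time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumes bundled SQLite is 3.25+
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [
        (1, "Sara Ahmed", "sara@email.com"),
        (2, "Sara Ahmed", "sara@email.com"),  # newer duplicate — the one to keep
        (3, "James Brown", "james@email.com"),
    ],
)

# ORDER BY id DESC makes the NEWEST row in each group get row_num = 1,
# so deleting row_num > 1 keeps the most recent duplicate.
conn.execute("""
    WITH ranked AS (
        SELECT id,
               ROW_NUMBER() OVER (PARTITION BY name, email ORDER BY id DESC) AS row_num
        FROM customers
    )
    DELETE FROM customers
    WHERE id IN (SELECT id FROM ranked WHERE row_num > 1)
""")

ids = [r[0] for r in conn.execute("SELECT id FROM customers ORDER BY id")]
print(ids)  # [2, 3] — Sara's older row (id 1) was deleted
```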
❓ How do I delete duplicates in MySQL specifically?
For MySQL 8.0 and above, Method 1 (ROW_NUMBER + CTE) works perfectly. For older MySQL versions (below 8.0) that don't support window functions, use Method 3 (GROUP BY + MIN()) or Method 2 (Temporary Table) instead.
❓ What is the difference between DELETE and TRUNCATE in SQL?
DELETE removes specific rows based on a condition — perfect for removing duplicates while keeping others. TRUNCATE removes ALL rows from a table instantly with no conditions. Always use DELETE with a WHERE clause when dealing with duplicates — never TRUNCATE unless you want to empty the entire table.
❓ How can I prevent duplicate rows from being inserted in the future?
Add a UNIQUE constraint on the column(s) that should be unique. For example: ALTER TABLE customers ADD CONSTRAINT unique_email UNIQUE (email); — this prevents any future INSERT from adding a row with a duplicate email address. The database will throw an error if someone tries.
❓ Can I use SELECT DISTINCT instead of DELETE to remove duplicates?
Yes — but only for reading data, not permanently cleaning it. SELECT DISTINCT returns unique rows in your query results but does NOT delete duplicates from the actual table. To permanently remove duplicates, you need to use one of the DELETE methods covered in this guide.
❓ What does PARTITION BY do in ROW_NUMBER()?
PARTITION BY tells SQL which columns to use when grouping rows together as "duplicates." For example, PARTITION BY name, email, city groups rows that have the same name, email, AND city. ROW_NUMBER() then assigns 1 to the first row in each group and increments from there. Change the PARTITION BY columns to match whichever columns define "duplicate" in your specific table.

Conclusion: Clean Data is Better Data

Duplicate rows are a fact of life in any real-world database — but now you have every tool you need to handle them confidently. Whether your table has a primary key or not, whether you're using MySQL, PostgreSQL, or SQL Server, there's a method in this guide that works for your exact situation.

Remember the golden rule: always preview before you delete. Run your SELECT version first, confirm the rows look right, then switch to DELETE. That one habit will save you from a lot of headaches.

Data cleaning is one of the most valuable skills a SQL analyst can have — and removing duplicates is where it starts. Master this, and you'll be handling real-world data problems like a professional. 🚀

📌 Keep Building Your SQL Skills!

Check out our Complete Beginner's Guide to SQL and How to Use SQL for Financial Analysis — perfect next reads to keep growing your skills.
