How to remove duplicate rows in SQL (works with Postgres)

This snippet was handy for to remove duplicate rows in SQL. These had crept into a Postgres database table as a result of a dodgy join. This is sort of a note to self, really.

There are probably more than a couple of ways of doing this but the best example I came up with after reading around is:

DELETE FROM table_name
WHERE ctid NOT IN (
   SELECT min(ctid)
   FROM   table_name
   GROUP  BY field1, field2);

The reason this works revolves around the ctid field. This is a default field you don’t have to specify that exists everywhere, so you can use it to remove duplicate rows (which are identified here by duplicates/groups created with the fields field1 and field2).

Useful references:

This approach https://stackoverflow.com/questions/26769454/how-to-delete-duplicate-rows-without-unique-identifier
Slightly alternative syntax at https://stackoverflow.com/questions/6583916/delete-duplicate-records-in-postgresql/45606037#45606037
Further details at https://www.postgresonline.com/journal/archives/22-Deleting-Duplicate-Records-in-a-Table.html for the ctidinfo.

The excellent PostgreSQL logo!

It might well be that if you’re using MySQL/MariaDB/MS SQL or any of those other SQL variants, that this doesn’t apply… I’ll update this if I ever need to use those versions to remove duplicate lines from a table. Meanwhile you might be interested in a list of SQL tips and tricks that I’m putting together — not so much to do with syntax as approaches in the wild.

Related Posts

Leave a Comment