This snippet was handy for to remove duplicate rows in SQL. These had crept into a Postgres database table as a result of a dodgy join. This is sort of a note to self, really.
There are probably more than a couple of ways of doing this but the best example I came up with after reading around is:
DELETE FROM table_name
WHERE ctid NOT IN (
SELECT min(ctid)
FROM table_name
GROUP BY field1, field2);
The reason this works revolves around the ctid
field. This is a default field you don’t have to specify that exists everywhere, so you can use it to remove duplicate rows (which are identified here by duplicates/groups created with the fields field1
and field2
).
Useful references:
- This approach https://stackoverflow.com/questions/26769454/how-to-delete-duplicate-rows-without-unique-identifier
- Slightly alternative syntax at https://stackoverflow.com/questions/6583916/delete-duplicate-records-in-postgresql/45606037#45606037
- Further details at https://www.postgresonline.com/journal/archives/22-Deleting-Duplicate-Records-in-a-Table.html for the
ctid
info.
It might well be that if you’re using MySQL/MariaDB/MS SQL or any of those other SQL variants, that this doesn’t apply… I’ll update this if I ever need to use those versions to remove duplicate lines from a table. Meanwhile you might be interested in a list of SQL tips and tricks that I’m putting together — not so much to do with syntax as approaches in the wild.