I discovered yesterday some amazing functions available in PostgreSQL when you’re using a GROUP BY
query. I’ve wanted these functions since at least 2003, when I was working mostly with Oracle, and now it turns out that Postgres has them! Imagine you want a report on duplicate item names in your database, so you write a query like this:
SELECT name, COUNT(*)
FROM items
GROUP BY name
HAVING COUNT(*) > 1;
That will show you the duplicate names, but you’d really like a list of their IDs also. You could do this with a sub-query, but what if there were a function that could take all the grouped items and concatenate them? This function would be sort of like SUM, but for strings (or things you could coerce to a string). Even better would be if it works like the join
method you find in Perl/Python/Ruby. Well, Postgres has this function:
SELECT string_agg(id::text, ', '), name, COUNT(*)
FROM items
GROUP BY name
HAVING COUNT(*) > 1;
The string_agg
function takes two strings: first the strings to join, and second the delimiter. (Here we have to cast id
to text because it is an integer type.) So now you can see the individual IDs of your duplicates, which makes it easy to operate on them if that’s what you want. Of course these functions are useful in more cases than just finding duplicates, but finding duplicates is a simple example.
There are similar methods called array_agg
and xmlagg
to get things in non-string form.