One of the things that I often had a difficult time with was finding a way to select the newest records from a table that had one or more columns to group by and then one column to order by. Let’s use the following table as an example:
Version | Name | Group |
---|---|---|
1 | Viejo | A |
2 | Joven | A |
1 | Oldest | B |
2 | Old | B |
3 | New | B |
1 | Jovencito | C |
4 | Newest | B |
Using the above table, I want to be able to pull all of the rows that have the newest version
based on the_group
. I also only want to have to query the table once. With this in mind, after a lot of research, I found two solutions. The first solution, which is the shorter one, involves using the DISTINCT ON
clause:
SELECT DISTINCT ON (the_group) *
FROM my_table
ORDER BY the_group DESC, version DESC;
SELECT DISTINCT ON (the_group) *
FROM (
SELECT 1 AS version, 'Viejo' AS name, 'A' AS the_group
UNION SELECT 2 AS version, 'Joven' AS name, 'A' AS the_group
UNION SELECT 1 AS version, 'Oldest' AS name, 'B' AS the_group
UNION SELECT 2 AS version, 'Old' AS name, 'B' AS the_group
UNION SELECT 3 AS version, 'New' AS name, 'B' AS the_group
UNION SELECT 1 AS version, 'Jovencito' AS name, 'C' AS the_group
UNION SELECT 4 AS version, 'Newest' AS name, 'B' AS the_group
) t
ORDER BY the_group DESC, version DESC;
The second solution that I was able to concoct is a bit less elegant and uses a WINDOW FRAME
:
SELECT version, name, the_group
FROM (
SELECT *,
first_value(version) OVER (PARTITION BY the_group ORDER BY version DESC) AS latest_version
FROM my_table
) t
WHERE version = latest_version;
SELECT version, name, the_group
FROM (
SELECT *,
first_value(version) OVER (PARTITION BY the_group ORDER BY version DESC) AS latest_version
FROM (
SELECT 1 AS version, 'Viejo' AS name, 'A' AS the_group
UNION SELECT 2 AS version, 'Joven' AS name, 'A' AS the_group
UNION SELECT 1 AS version, 'Oldest' AS name, 'B' AS the_group
UNION SELECT 2 AS version, 'Old' AS name, 'B' AS the_group
UNION SELECT 3 AS version, 'New' AS name, 'B' AS the_group
UNION SELECT 1 AS version, 'Jovencito' AS name, 'C' AS the_group
UNION SELECT 4 AS version, 'Newest' AS name, 'B' AS the_group
) t
) t
WHERE version = latest_version;
The downside of the above solution is the fact that you are using at least one sub-query. The other downside is the fact that you have to list each field that you want to bring back if you don’t want to bring back the extra field indicating the latest_version
. One last downside about the second solution is the fact that it does seem to take a little bit longer than the first solution. For these reason I feel that the solution using the DISTINCT ON
clause is the best one.