One of the things that I often had a difficult time with was finding a way to select the newest records from a table that had one or more columns to group by and then one column to order by. Let’s use the following table as an example:

Version Name Group
1 Viejo A
2 Joven A
1 Oldest B
2 Old B
3 New B
1 Jovencito C
4 Newest B

Using the above table, I want to be able to pull all of the rows that have the newest version based on the_group. I also only want to have to query the table once. With this in mind, after a lot of research, I found two solutions. The first solution, which is the shorter one, involves using the DISTINCT ON clause:


SELECT DISTINCT ON (the_group) *
FROM my_table
ORDER BY the_group DESC, version DESC;

SELECT DISTINCT ON (the_group) *
FROM (
  SELECT 1 AS version, 'Viejo' AS name, 'A' AS the_group
  UNION SELECT 2 AS version, 'Joven' AS name, 'A' AS the_group
  UNION SELECT 1 AS version, 'Oldest' AS name, 'B' AS the_group
  UNION SELECT 2 AS version, 'Old' AS name, 'B' AS the_group
  UNION SELECT 3 AS version, 'New' AS name, 'B' AS the_group
  UNION SELECT 1 AS version, 'Jovencito' AS name, 'C' AS the_group
  UNION SELECT 4 AS version, 'Newest' AS name, 'B' AS the_group
) t
ORDER BY the_group DESC, version DESC;

The second solution that I was able to concoct is a bit less elegant and uses a WINDOW FRAME:


SELECT version, name, the_group
FROM (
  SELECT *,
    first_value(version) OVER (PARTITION BY the_group ORDER BY version DESC) AS latest_version
  FROM my_table
) t
WHERE version = latest_version;

SELECT version, name, the_group
FROM (
  SELECT *,
    first_value(version) OVER (PARTITION BY the_group ORDER BY version DESC) AS latest_version
  FROM (
    SELECT 1 AS version, 'Viejo' AS name, 'A' AS the_group
    UNION SELECT 2 AS version, 'Joven' AS name, 'A' AS the_group
    UNION SELECT 1 AS version, 'Oldest' AS name, 'B' AS the_group
    UNION SELECT 2 AS version, 'Old' AS name, 'B' AS the_group
    UNION SELECT 3 AS version, 'New' AS name, 'B' AS the_group
    UNION SELECT 1 AS version, 'Jovencito' AS name, 'C' AS the_group
    UNION SELECT 4 AS version, 'Newest' AS name, 'B' AS the_group
  ) t
) t
WHERE version = latest_version;

The downside of the above solution is the fact that you are using at least one sub-query. The other downside is the fact that you have to list each field that you want to bring back if you don’t want to bring back the extra field indicating the latest_version. One last downside about the second solution is the fact that it does seem to take a little bit longer than the first solution. For these reason I feel that the solution using the DISTINCT ON clause is the best one.

Categories: BlogSQL

Leave a Reply

Your email address will not be published. Required fields are marked *