Wednesday, March 21, 2012

Query for "latest version" of a given row

I am maintaining historical versions of a row in a table. The idea is that a row is not really modified but a new row is created with a reference to the original row. (I guess a sort of generational view or row versioning...)

My problem is that I cannot come up with a query that will give me a list of the rows with only the latest version of the row included. My confusion is the self join -- the possibility (or lack of info before-hand) of how many parent/child rows there are from a given original row.)

Given the following table and data:

Code Snippet

CREATE TABLE Orders (
[OrderId] INT IDENTITY (1, 1),
[ClientName] VARCHAR(10),
[OriginalOrderId] INT
)
GO

INSERT INTO Orders ([ClientName], [OriginalOrderId]) VALUES ('Peter', NULL) -- should go in as 1
INSERT INTO Orders ([ClientName], [OriginalOrderId]) VALUES ('Joseph', NULL) -- should go in as 2
INSERT INTO Orders ([ClientName], [OriginalOrderId]) VALUES ('Mary', NULL) -- should go in as 3
INSERT INTO Orders ([ClientName], [OriginalOrderId]) VALUES ('Jane', 2) -- should go in as 4 ("replaces" 2)
INSERT INTO Orders ([ClientName], [OriginalOrderId]) VALUES ('Paul', 4) -- should go in as 5 ("replaces" 4)
GO

SELECT * FROM Orders
-- need a query that will return me rows 1, 3, and 5
GO

DROP TABLE Orders
GO

The query should return me

Peter

Paul

Mary

I think another approach would be to replace the IDENTITY column with just an INT while adding a [Generation] or [Version] column. The join would be a simple self-join that figures out the maximum version value for a given Id. I will use this if I have to but I kinda like the idea of not having to manage the Id values myself.

What you have is fine. You just need to add an ORDER BY clause to the SELECT:

SELECT * FROM Orders
ORDER BY OrderID

or if you want the top 3:

SELECT TOP 3 * FROM Orders

ORDER BY OrderID

Adamus

|||

Ok revisiting this...you want to replace records?

This is done through UPDATE not INSERT.

The solution to your problem is adding a LastModified smalldatetime in conjuntion with the IDENTITY field.

Adamus

|||

Thank you for responding Adamus but you are not reading my post fully.

I am fully aware that UPDATE is for modifying existing records but that is not what I am trying to accomplish. My requirement is that instead of overwriting the existing row with the new columns, I need to insert a new row with the new values with that new row maintaining a "reference" to the original row. What I am looking for is a query that will return that "newest" from the set of chained rows. (It is clearly layed out if you look at the comments in the code snippet.)

|||

To do it with this design would likely require cursors or some other iterative approach. Your second idea is the right way to go as that would be a more set based approach like below:


Code Snippet

CREATE TABLE Orders (
[OrderId] INT IDENTITY (1, 1),
[ClientName] VARCHAR(10),
[ClientID] INT
)

INSERT INTO Orders ([ClientName], [ClientID]) VALUES ('Peter', 1) -- should go in as 1
INSERT INTO Orders ([ClientName], [ClientID]) VALUES ('Joseph', 2) -- should go in as 2
INSERT INTO Orders ([ClientName], [ClientID]) VALUES ('Mary', 3) -- should go in as 3
INSERT INTO Orders ([ClientName], [ClientID]) VALUES ('Jane', 2) -- should go in as 4 ("replaces" 2)
INSERT INTO Orders ([ClientName], [ClientID]) VALUES ('Paul', 2) -- should go in as 5 ("replaces" 4)


SELECT ClientName
FROM
Orders
INNER JOIN
(
SELECT ClientID, MAX(OrderID) AS MaxorderID
FROM Orders
GROUP BY ClientID
) gr
ON Orders.ClientID = gr.ClientID
AND Orders.OrderID = gr.Maxorderid

go

|||

For clarity and organization, this should be accomplished with a dbo.History table.

You should not be trying to accomplish this in a single table. The documentation and query would be a maintenance nightmare.

You shouldn't be trying to re-invent the wheel on this. History tables are common practice.

Adamus

|||

Thanks Dave. I figured that was the way I would have to do it but I like to double check these things. In my case I need to maintain the history order (i.e. Joseph --> Jane --> Paul) so I will get rid of the IDENTITY column and all three of those rows will have the same OrderId value. I will also add a [Generation] column and the set them to 1, 2, 3 respectively (for J-->J-->P).

- Jason

|||

Adamus,

I greatly simplified the issue for purposes of posting. I cannot use a history file in this case because technically the "old" rows are still valid -- just happens that there is a "newer" version available. So in this case they all must be in the same table.

Just curious, how would your suggestion of a history table solve my query question? I would still be in the same boat trying to determine the newest version of a given set of rows. Like David mentioned, you would have to resort to cursors or recursive queries since maintaining the parent-child relationship does not automatically tell you how many levels you need to go down to find the last node.

|||

Jason Callas wrote:

Adamus,

I greatly simplified the issue for purposes of posting. I cannot use a history file in this case because technically the "old" rows are still valid -- just happens that there is a "newer" version available. So in this case they all must be in the same table.

Just curious, how would your suggestion of a history table solve my query question? I would still be in the same boat trying to determine the newest version of a given set of rows. Like David mentioned, you would have to resort to cursors or recursive queries since maintaining the parent-child relationship does not automatically tell you how many levels you need to go down to find the last node.

Jason,

As I have mentioned, it is very common that duplicate records are stored in a history table that houses the duplicate with a timestamp. An Instead Of INSERT trigger is used on the table to check for duplicates and insert duplicates into the history table.

A simply join to the history table will expose the chronology. No cursor is required.

Happy coding,

Adamus

|||

Jason Callas wrote:

Thanks Dave. I figured that was the way I would have to do it but I like to double check these things. In my case I need to maintain the history order (i.e. Joseph --> Jane --> Paul) so I will get rid of the IDENTITY column and all three of those rows will have the same OrderId value. I will also add a [Generation] column and the set them to 1, 2, 3 respectively (for J-->J-->P).

- Jason

(praying to God that I will never have to touch this code)

Adamus

No comments:

Post a Comment