I am maintaining historical versions of a row in a table. The idea is that a row is not really modified but a new row is created with a reference to the original row. (I guess a sort of generational view or row versioning...)
My problem is that I cannot come up with a query that will give me a list of the rows with only the latest version of the row included. My confusion is the self join -- the possibility (or lack of info before-hand) of how many parent/child rows there are from a given original row.)
Given the following table and data:
Code Snippet
CREATE TABLE Orders (
[OrderId] INT IDENTITY (1, 1),
[ClientName] VARCHAR(10),
[OriginalOrderId] INT
)
GO
INSERT INTO Orders ([ClientName], [OriginalOrderId]) VALUES ('Peter', NULL) -- should go in as 1
INSERT INTO Orders ([ClientName], [OriginalOrderId]) VALUES ('Joseph', NULL) -- should go in as 2
INSERT INTO Orders ([ClientName], [OriginalOrderId]) VALUES ('Mary', NULL) -- should go in as 3
INSERT INTO Orders ([ClientName], [OriginalOrderId]) VALUES ('Jane', 2) -- should go in as 4 ("replaces" 2)
INSERT INTO Orders ([ClientName], [OriginalOrderId]) VALUES ('Paul', 4) -- should go in as 5 ("replaces" 4)
GO
SELECT * FROM Orders
-- need a query that will return me rows 1, 3, and 5
GO
DROP TABLE Orders
GO
The query should return me
Peter
Paul
Mary
I think another approach would be to replace the IDENTITY column with just an INT while adding a [Generation] or [Version] column. The join would be a simple self-join that figures out the maximum version value for a given Id. I will use this if I have to but I kinda like the idea of not having to manage the Id values myself.
What you have is fine. You just need to add an ORDER BY clause to the SELECT:
SELECT * FROM Orders
ORDER BY OrderID
or if you want the top 3:
SELECT TOP 3 * FROM Orders
ORDER BY OrderID
Adamus
|||Ok revisiting this...you want to replace records?
This is done through UPDATE not INSERT.
The solution to your problem is adding a LastModified smalldatetime in conjuntion with the IDENTITY field.
Adamus
|||Thank you for responding Adamus but you are not reading my post fully.
I am fully aware that UPDATE is for modifying existing records but that is not what I am trying to accomplish. My requirement is that instead of overwriting the existing row with the new columns, I need to insert a new row with the new values with that new row maintaining a "reference" to the original row. What I am looking for is a query that will return that "newest" from the set of chained rows. (It is clearly layed out if you look at the comments in the code snippet.)
|||To do it with this design would likely require cursors or some other iterative approach. Your second idea is the right way to go as that would be a more set based approach like below:
Code Snippet
CREATE TABLE Orders (
[OrderId] INT IDENTITY (1, 1),
[ClientName] VARCHAR(10),
[ClientID] INT
)
INSERT INTO Orders ([ClientName], [ClientID]) VALUES ('Peter', 1) -- should go in as 1
INSERT INTO Orders ([ClientName], [ClientID]) VALUES ('Joseph', 2) -- should go in as 2
INSERT INTO Orders ([ClientName], [ClientID]) VALUES ('Mary', 3) -- should go in as 3
INSERT INTO Orders ([ClientName], [ClientID]) VALUES ('Jane', 2) -- should go in as 4 ("replaces" 2)
INSERT INTO Orders ([ClientName], [ClientID]) VALUES ('Paul', 2) -- should go in as 5 ("replaces" 4)
SELECT ClientName
FROM
Orders
INNER JOIN
(
SELECT ClientID, MAX(OrderID) AS MaxorderID
FROM Orders
GROUP BY ClientID
) gr
ON Orders.ClientID = gr.ClientID
AND Orders.OrderID = gr.Maxorderid
go
|||For clarity and organization, this should be accomplished with a dbo.History table.
You should not be trying to accomplish this in a single table. The documentation and query would be a maintenance nightmare.
You shouldn't be trying to re-invent the wheel on this. History tables are common practice.
Adamus
|||Thanks Dave. I figured that was the way I would have to do it but I like to double check these things. In my case I need to maintain the history order (i.e. Joseph --> Jane --> Paul) so I will get rid of the IDENTITY column and all three of those rows will have the same OrderId value. I will also add a [Generation] column and the set them to 1, 2, 3 respectively (for J-->J-->P).
- Jason
|||Adamus,
I greatly simplified the issue for purposes of posting. I cannot use a history file in this case because technically the "old" rows are still valid -- just happens that there is a "newer" version available. So in this case they all must be in the same table.
Just curious, how would your suggestion of a history table solve my query question? I would still be in the same boat trying to determine the newest version of a given set of rows. Like David mentioned, you would have to resort to cursors or recursive queries since maintaining the parent-child relationship does not automatically tell you how many levels you need to go down to find the last node.
|||
Jason Callas wrote:
Adamus,
I greatly simplified the issue for purposes of posting. I cannot use a history file in this case because technically the "old" rows are still valid -- just happens that there is a "newer" version available. So in this case they all must be in the same table.
Just curious, how would your suggestion of a history table solve my query question? I would still be in the same boat trying to determine the newest version of a given set of rows. Like David mentioned, you would have to resort to cursors or recursive queries since maintaining the parent-child relationship does not automatically tell you how many levels you need to go down to find the last node.
Jason,
As I have mentioned, it is very common that duplicate records are stored in a history table that houses the duplicate with a timestamp. An Instead Of INSERT trigger is used on the table to check for duplicates and insert duplicates into the history table.
A simply join to the history table will expose the chronology. No cursor is required.
Happy coding,
Adamus
|||
Jason Callas wrote:
Thanks Dave. I figured that was the way I would have to do it but I like to double check these things. In my case I need to maintain the history order (i.e. Joseph --> Jane --> Paul) so I will get rid of the IDENTITY column and all three of those rows will have the same OrderId value. I will also add a [Generation] column and the set them to 1, 2, 3 respectively (for J-->J-->P).
- Jason
(praying to God that I will never have to touch this code)
Adamus
No comments:
Post a Comment