Saturday, February 25, 2012

Possible help with query...

Ok, I have a parent table that has a child table, it is a 1-Many relationship. I want to select the single newest child from the child table with the parent table information for each parent value in the table (so if there is a parent record, i only want the newest child record). I have it working, but I am not satisfied with how I have written it. I wanted to see if there were any other opinions on the matter. The way I have it written there *could* be duplicates if somehow the date/time stamps are the same on the created column


here is an example:

DECLARE @.ParentTableTABLE ( ParentIdUNIQUEIDENTIFIER , TitleNVARCHAR(20) )
DECLARE @.ChildTableTABLE ( ChildIdUNIQUEIDENTIFIER , ParentIdUNIQUEIDENTIFIER , TitleNVARCHAR(20) , CreatedDATETIME )

INSERT INTO @.ParentTable ( ParentId , Title )VALUES ('00000000-0000-0000-0000-000000000001' ,'Option A' )
INSERT INTO @.ParentTable ( ParentId , Title )VALUES ('00000000-0000-0000-0000-000000000002' ,'Option B' )
INSERT INTO @.ParentTable ( ParentId , Title )VALUES ('00000000-0000-0000-0000-000000000003' ,'Option C' )

INSERT INTO @.ChildTable ( ChildId , ParentId , Title , Created )VALUES ('10000000-0000-0000-0000-000000000001' ,'00000000-0000-0000-0000-000000000001' ,'Child A - 1' ,'5/4/2007 4:00:00 PM' )
INSERT INTO @.ChildTable ( ChildId , ParentId , Title , Created )VALUES ('10000000-0000-0000-0000-000000000002' ,'00000000-0000-0000-0000-000000000001' ,'Child A - 2' ,'5/3/2007 4:00:00 PM' )
INSERT INTO @.ChildTable ( ChildId , ParentId , Title , Created )VALUES ('10000000-0000-0000-0000-000000000003' ,'00000000-0000-0000-0000-000000000001' ,'Child A - 3' ,'5/2/2007 4:00:00 PM' )
INSERT INTO @.ChildTable ( ChildId , ParentId , Title , Created )VALUES ('10000000-0000-0000-0000-000000000004' ,'00000000-0000-0000-0000-000000000001' ,'Child A - 4' ,'5/1/2007 4:00:00 PM' )

INSERT INTO @.ChildTable ( ChildId , ParentId , Title , Created )VALUES ('20000000-0000-0000-0000-000000000001' ,'00000000-0000-0000-0000-000000000002' ,'Child B - 1' ,'5/1/2007 4:00:00 PM' )
INSERT INTO @.ChildTable ( ChildId , ParentId , Title , Created )VALUES ('20000000-0000-0000-0000-000000000002' ,'00000000-0000-0000-0000-000000000002' ,'Child B - 2' ,'5/10/2007 4:00:00 PM' )
INSERT INTO @.ChildTable ( ChildId , ParentId , Title , Created )VALUES ('20000000-0000-0000-0000-000000000003' ,'00000000-0000-0000-0000-000000000002' ,'Child B - 3' ,'5/2/2007 4:00:00 PM' )

INSERT INTO @.ChildTable ( ChildId , ParentId , Title , Created )VALUES ('30000000-0000-0000-0000-000000000001' ,'00000000-0000-0000-0000-000000000003' ,'Child C - 1' ,'5/1/2007 4:00:00 PM' )
INSERT INTO @.ChildTable ( ChildId , ParentId , Title , Created )VALUES ('30000000-0000-0000-0000-000000000002' ,'00000000-0000-0000-0000-000000000003' ,'Child C - 2' ,'5/2/2007 4:00:00 PM' )
INSERT INTO @.ChildTable ( ChildId , ParentId , Title , Created )VALUES ('30000000-0000-0000-0000-000000000003' ,'00000000-0000-0000-0000-000000000003' ,'Child C - 3' ,'5/3/2007 4:00:00 PM' )
INSERT INTO @.ChildTable ( ChildId , ParentId , Title , Created )VALUES ('30000000-0000-0000-0000-000000000004' ,'00000000-0000-0000-0000-000000000003' ,'Child C - 4' ,'5/4/2007 4:00:00 PM' )
INSERT INTO @.ChildTable ( ChildId , ParentId , Title , Created )VALUES ('30000000-0000-0000-0000-000000000005' ,'00000000-0000-0000-0000-000000000003' ,'Child C - 5' ,'5/5/2007 4:00:00 PM' )
INSERT INTO @.ChildTable ( ChildId , ParentId , Title , Created )VALUES ('30000000-0000-0000-0000-000000000006' ,'00000000-0000-0000-0000-000000000003' ,'Child C - 6' ,'5/6/2007 4:00:00 PM' )

/**
I want to select the information from the child table and the parent table,
selecting ONLY the oldest record. There should only be 3 results.
**/
SELECT
c.ParentId ,
p.TitleAS ParentTitle ,
c.ChildId ,
c.TitleAS ChildTitle ,
c.Created
FROM
@.ChildTable c
INNERJOIN @.ParentTable pON c.ParentId = p.ParentId
INNERJOIN
(
SELECT
subc.ParentId ,
MAX(subc.Created)AS Created
FROM
@.ChildTable subc
INNERJOIN @.ParentTable subptON subc.ParentId = subpt.ParentId
GROUP BY
subc.ParentId
)AS SubPTON p.ParentId = SubPT.ParentIdAND SubPT.Created = c.Created

here are the expected results:

ParentId ParentTitle ChildId ChildTitle Created
----------- ------- ----------- ------- --------
00000000-0000-0000-0000-000000000001 Option A 10000000-0000-0000-0000-000000000001 Child A - 1 2007-05-04 16:00:00.000
00000000-0000-0000-0000-000000000002 Option B 20000000-0000-0000-0000-000000000002 Child B - 2 2007-05-10 16:00:00.000
00000000-0000-0000-0000-000000000003 Option C 30000000-0000-0000-0000-000000000006 Child C - 6 2007-05-06 16:00:00.000

Anyone have any ideas?

I suggest that you change ParentId and ChildId to be identity integer (Add a UNIQUEIDENTIFIER if need be). The child table can now be sorted in time sequence by sorting on Id.|||I added an "Alt" id to it using bigint, but i wanted to keep with Guid's for several design considerations. I could change it and might look into it. I know of "reverse grouping" using the primary key when it is an integer type (using the same premise of MAX(PrimaryKey) and grouping by the parent object.|||

double post ;-(

|||
SELECT *FROM (SELECT c.ParentId,p.TitleAS ParentTitle,c.ChildId,c.TitleAS ChildTitle,c.Created,row_number()over(PARTITIONBY c.ParentIDORDER BY CreatedDESC,ChildIDDESC)AS tFROM @.ChildTable cINNERJOIN @.ParentTable pON c.ParentId = p.ParentId) t1WHERE t=1
|||

So, I went to try and do this as a learning experiment... I am no SQL Guru or anything, but I thought I was ok. This excersixe showed me I have lots to learn :-)

This is what I came up with, it is probably worse than yours. It is actually more code, and seems a lot worse. But I think I had some creative ideas at least :-)
The following works using 2 extra temp tables ( This is what you are doing in your sub select inner joins, I tried to find a cleaner way, in the end I think your way beat mine)

-- The workItems table actually has the three rows we want, slected from the ordered child table. But I did not know how to join with the groups, and sub queris. So again... I think yours beat mine by a lot, but I am curious what other sorts of results people might come up with.

DECLARE @.orderdChildTableTABLE ( ChildIdUNIQUEIDENTIFIER , ParentIdUNIQUEIDENTIFIER , TitleNVARCHAR(20) , CreatedDATETIME )INSERT INTO @.orderdChildTableSELECT c.ChildId, c.ParentID, c.Title, c.CreatedFROM @.ChildTable cORDER BY c.CreatedDESC
DECLARE @.workItemsTABLE (ParentIdUNIQUEIDENTIFIER, CreatedDATETIME)INSERT INTO @.workItemsSelect oct.ParentID,max(oct.Created)From @.orderdChildTable octGROUP BY oct.ParentIDSelect oct.ParentId, p.titleas ParentTitle, oct.ChildId, oct.TitleAS ChildTitle, oct.CreatedFrom @.workItems wiINNERJOIN @.orderdChildTable octON (wi.ParentID = oct.ParentID)AND (wi.Created = oct.Created)LEFTJOIN @.ParentTable pon p.ParentID = wi.ParentIDORDER BY ParentIDASC

|||

Motley, your query certainly looks cleaner than either of ours. (But you do return an extra recordset... which is not really important nor does it bother me).

Can you perhaps elaborate a little, breaking down the query, as to what pieces is doing what. This looks like sql2005 specific stuff, that I am not familiar with. and perhaps others could use a small break down, which helps guide me as to what to read/study up on in my books and google.

I think the piece that I am confused/not familar with is how do I read this statment to understand what exactly it is doing:

row_number

()over(PARTITIONBY c.ParentIDORDERBY CreatedDESC,ChildIDDESC)AS t

I am also a little thrown by the double From...

SELECT

*
FROM( .... ()AS tFROM @.ChildTable c
INNERJOIN @.ParentTable pON c.ParentId= p.ParentId
) t1
Where t = 1

Im not sure what t1 is either, and what it is supposidly emitting.

|||

Motley:

SELECT *
FROM (
SELECT c.ParentId,p.TitleAS ParentTitle,c.ChildId,c.TitleAS ChildTitle,c.Created,row_number()over(PARTITIONBY c.ParentIDORDER BY CreatedDESC,ChildIDDESC)AS t
FROM @.ChildTable c
INNERJOIN @.ParentTable pON c.ParentId = p.ParentId
) t1
WHERE t=1

ah there it is. awesome. i ran into this problem before 2005 and didn't know how to get around it. thanks. i have used the row_number function before, i don't know why i didn't think of it here|||

The query doesn't actually return another recordset, it's a subquery.

Start with:

SELECT c.ParentId,p.TitleAS ParentTitle,c.ChildId,c.TitleAS ChildTitle,c.Created,row_number()over(PARTITIONBY c.ParentIDORDER BY CreatedDESC,ChildIDDESC)AS tFROM @.ChildTable cINNERJOIN @.ParentTable pON c.ParentId = p.ParentId

the row_number portion of that is telling sqlserver to create a column in the result named "t", and for each row within a "c.ParentID" that is ordered by Created and order by ChildID within that it should represent it's place/rank (It's similiar to rank, but not quite). So the highest Created value for each ParentID will have a value of 1, the second highest is 2, etc etc but when there is a tie with two records of the same ParentID, and Created then it will consider the one with the highest ChildID to be first, basically breaking the tie.

The outer query... SELECT * FROM ( {inner query here} ) t1 WHERE t=1 just says to only take the records where t=1.

t1 is what I am calling the subquery. You always have to give subqueries that are part of the FROM clause an alias.

|||

jminond:

Motley, your query certainly looks cleaner than either of ours. (But you do return an extra recordset... which is not really important nor does it bother me).

Can you perhaps elaborate a little, breaking down the query, as to what pieces is doing what. This looks like sql2005 specific stuff, that I am not familiar with. and perhaps others could use a small break down, which helps guide me as to what to read/study up on in my books and google.

I think the piece that I am confused/not familar with is how do I read this statment to understand what exactly it is doing:

row_number()over(PARTITIONBY c.ParentIDORDERBY CreatedDESC,ChildIDDESC)AS t

I am also a little thrown by the double From...

SELECT*
FROM( .... ()AS tFROM @.ChildTable c
INNERJOIN @.ParentTable pON c.ParentId= p.ParentId
) t1
Where t = 1

Im not sure what t1 is either, and what it is supposidly emitting.

The two froms come from the fact you are selecting from the inner result set.

the partition is basically a "grouping" in the rownumber function, it tells the result set to number the results based on the parentid field, thus autonumbering the children 1-X based on the parentid.

t1 refers to the "temporary result set" from the select statement in the ( ), you could also have done "AS t1" to be more direct about what you are doing

and the where t=1 tells the query you only want the results where your temporary identity is 1, thus the newest child row for each parent row

if any of that makes sense

No comments:

Post a Comment