query assistance: effects

Tuesday, March 20, 2012

Query Execution Plan

I'm studying the effects of indexes on query statements and trying to understand the underlying processes such as BTree's, etc. For the Northwind table 'Orders', I left the Primary Key index alone, but put an index on the field 'ShipVia'. My question is more focused on the theory and mechanical methods behind the scenes more than specific to tuning a particular SQL statement. I want to know why SQL Server would choose to use the Primary Key clustered index and not use my newly created index to execute the following statement. Wouldn't the leaf nodes of the index point to the records located in the data pages and therefore are best retrievable through the new index?

StmtText
-
SELECT *
FROM Orders
where ShipVia = 3

(1 row(s) affected)

StmtText

|--Clustered Index Scan(OBJECT:([Northwind].[dbo].[Orders].[PK_Orders]), WHERE:([Orders].[ShipVia]=Convert([@.1])))

It does use my index when I change the query as such:

SELECT ShipVia
FROM Orders
where ShipVia = 3

(2 row(s) affected)

StmtText
--
|--Index Seek(OBJECT:([Northwind].[dbo].[Orders].[IX_Orders]), SEEK:([Orders].[ShipVia]=Convert([@.1])) ORDERED FORWARD)

(1 row(s) affected)

By the way, I did take into account the size of the table. I do understand that if the table is small enough, the benefits of an index can be lost. I tried a test in which I duplicated the records to generate over a million each. The query execution plan did not change.

The problem is two-fold: that you're issuing a SELECT * & that your predicate is probably not restrictive enough. For SQL Server to use your index in this case, it would have to look up the actual record from the clustered index for each match it finds in the secondary index. Looking up the record in the clustered index involves reading about 2-3 pages (root node and leaf node, with a possible intermediate node). So if enough rows qualify (say more than 100 rows) then it would be much more expensive to use this index because it would involve reading about 300 pages from disk.

You can check these effects yourself by forcing the index (use the with (index='[index name]') hint, see the article on FROM clause in books on-line) and using "set statistics io on". To have SQL Server use the index, either make the filter restrictive enough (very few rows have ShipVia = 3) or modify the query as you have done.

|||

I believe what you just told be totally clicked! Thank you.

I remember reading now that only clustered indexes have the actual record as the leaf node and any other indexes would simply hold an ID of the record (true?). That being said, it makes sense to not use the secondary index in many cases.

Thank you Mostafa

|||My pleasure. And yes what you said is (almost) true: secondary indexes hold the clustering key of the record if it's unique (e.g. when it's the primary key). So in your example if the primary key (and clustering key) of Orders is OrderID, then each record in the ShipVia index would hold (ShipVia, OrderID).

Saturday, February 25, 2012

Query behaviour - (subquery results)

Hi there,
I'm experiencing some very strange effects when I trying to include a
subquery. I shall demonstrate with a simple example...
create table test
(ind int,
seq int,
message varchar(255))
insert into test (ind,seq, message) values
(1,1,'date=01/06/2006')
insert into test (ind,seq, message) values
(2,1,'date=1/12/2005')
insert into test (ind,seq, message) values
(2,2,'test')
insert into test (ind,seq, message) values
(2,3,'date=2/12/2005')
The column IND is theoretically a foreign key, the SEQ is a primary key. A
quick explanation is that this is a comment table from a main table (main
table being an 'order' table and this being a 'order comment' table.. the
relationship being (order) 1:m (comment) But for this example this doesn't
really matter.
So here are 2 queries.
select * from test t1
where t1.ind =1
and convert(datetime, (SUBSTRING(Message, CHARINDEX('=',Message,0)+1,
10)),103) > getdate()
This one simple extracts the date from the text string. It works OK. I've
had to include the IND =1 to avoid the date conversion error. It just shows
that the conversion works.
results
IND SEQ Message
1 1 date=01/06/2006
The second query...
select max(seq) from test t2
where t2.message like 'date=%'
group by ind
This is extrating the highest 'SEQ' for each 'IND'. ie the last comment
(that has got a date component) for each order
results
SEQ
1
3
So thats OK.
Now the fun starts when I try to combine the two...
select * from test t1
where t1.seq in (select max(seq) from test t2
where t2.ind = t1.ind
and t2.message like 'date=%'
group by t2.ind)
where convert(datetime, (SUBSTRING(Message, CHARINDEX('=',Message,0)+1,
10)),103) > getdate()
This causes a
Server: Msg 156, Level 15, State 1, Line 6
Incorrect syntax near the keyword 'where'.
So, its implying the date format is incorrect. If I remove the convert :-
select * from test t1
where t1.seq in (select max(seq) from test t2
where t2.ind = t1.ind
and t2.message like 'date=%'
group by t2.ind)
then the results are OK -
IND SEQ Message
1 1 date=01/06/2006
2 3 date=2/12/2005
Any help please?
thanks
Simon(...)
--Replace the where with and AND -->
AND convert(datetime, (SUBSTRING(Message, CHARINDEX('=',Message,0)+1,
10)),103) > getdate()
HTH, jens Suessmeyer.|||Sorry my bad typo... I meant AND... The query should have been
select * from test t1
where t1.seq in (select max(seq) from test t2
where t2.ind = t1.ind
and t2.message like 'date=%'
group by t2.ind)
and convert(datetime, (SUBSTRING(Message, CHARINDEX('=',Message,0)+1,
10)),103) > getdate()
this results in the error. This isn't just a syntax error.
Thanks though
"Jens" wrote:

> (...)
> --Replace the where with and AND -->
> AND convert(datetime, (SUBSTRING(Message, CHARINDEX('=',Message,0)+1,
> 10)),103) > getdate()
> HTH, jens Suessmeyer.
>|||A flaw in the substring function:
...(SUBSTRING(Message, CHARINDEX('=',Message) + 1, 10)),103) > getdate()
Maybe that's it.
ML
ML
http://milambda.blogspot.com/|||I don't think this is it...
If I do a
select ind , seq, SUBSTRING(message, CHARINDEX('=',message,0)+1, 10)
from test
I get the results..
IND SEQ converted date
1 1 01/06/2006
2 1 1/12/2005
2 2 test
2 3 2/12/2005
So this shows that 3 of the rows can be converted into datetime (103 style).
I think that the problem is that the convert is being done on the whole data
set before approriate rows are excluded.
If I turn this into an inline view then I get the same error.
select * from (
select * from test t1
where t1.seq in (select max(seq) from test t2
where t2.ind = t1.ind
and t2.message like 'date=%'
group by t2.ind)) test
where convert(datetime, (SUBSTRING(Message, CHARINDEX('=',Message,0)+1,
10)),103) > getdate()
<bte this is where my original typo came from ;-) >
help !
"ML" wrote:

> A flaw in the substring function:
> ...(SUBSTRING(Message, CHARINDEX('=',Message) + 1, 10)),103) > getdate()
> Maybe that's it.
>
> ML
>
> ML
> --
> http://milambda.blogspot.com/|||I've had a quick look at this and get the same odd result.
The Where clause is being applied to all the contents of the input table.
You can see this in the showplan and can confirm it by removing the row
without a date.
Regards,
Craig
"s_clarke" wrote:
> I don't think this is it...
> If I do a
> select ind , seq, SUBSTRING(message, CHARINDEX('=',message,0)+1, 10)
> from test
> I get the results..
> IND SEQ converted date
> 1 1 01/06/2006
> 2 1 1/12/2005
> 2 2 test
> 2 3 2/12/2005
>
> So this shows that 3 of the rows can be converted into datetime (103 style
).
> I think that the problem is that the convert is being done on the whole da
ta
> set before approriate rows are excluded.
> If I turn this into an inline view then I get the same error.
> select * from (
> select * from test t1
> where t1.seq in (select max(seq) from test t2
> where t2.ind = t1.ind
> and t2.message like 'date=%'
> group by t2.ind)) test
> where convert(datetime, (SUBSTRING(Message, CHARINDEX('=',Message,0)+1,
> 10)),103) > getdate()
> <bte this is where my original typo came from ;-) >
> help !
>
> "ML" wrote:
>

Tuesday, March 20, 2012

Query Execution Plan

Saturday, February 25, 2012

Query behaviour - (subquery results)

query assistance

Blog Archive

About Me