I'm studying the effects of indexes on query statements and trying to understand the underlying processes such as BTree's, etc. For the Northwind table 'Orders', I left the Primary Key index alone, but put an index on the field 'ShipVia'. My question is more focused on the theory and mechanical methods behind the scenes more than specific to tuning a particular SQL statement. I want to know why SQL Server would choose to use the Primary Key clustered index and not use my newly created index to execute the following statement. Wouldn't the leaf nodes of the index point to the records located in the data pages and therefore are best retrievable through the new index?
StmtText
-
SELECT *
FROM Orders
where ShipVia = 3
(1 row(s) affected)
StmtText
|--Clustered Index Scan(OBJECT:([Northwind].[dbo].[Orders].[PK_Orders]), WHERE:([Orders].[ShipVia]=Convert([@.1])))
It does use my index when I change the query as such:
SELECT ShipVia
FROM Orders
where ShipVia = 3
(2 row(s) affected)
StmtText
--
|--Index Seek(OBJECT:([Northwind].[dbo].[Orders].[IX_Orders]), SEEK:([Orders].[ShipVia]=Convert([@.1])) ORDERED FORWARD)
(1 row(s) affected)
By the way, I did take into account the size of the table. I do understand that if the table is small enough, the benefits of an index can be lost. I tried a test in which I duplicated the records to generate over a million each. The query execution plan did not change.
The problem is two-fold: that you're issuing a SELECT * & that your predicate is probably not restrictive enough. For SQL Server to use your index in this case, it would have to look up the actual record from the clustered index for each match it finds in the secondary index. Looking up the record in the clustered index involves reading about 2-3 pages (root node and leaf node, with a possible intermediate node). So if enough rows qualify (say more than 100 rows) then it would be much more expensive to use this index because it would involve reading about 300 pages from disk.
You can check these effects yourself by forcing the index (use the with (index='[index name]') hint, see the article on FROM clause in books on-line) and using "set statistics io on". To have SQL Server use the index, either make the filter restrictive enough (very few rows have ShipVia = 3) or modify the query as you have done.
|||
I believe what you just told be totally clicked! Thank you.
I remember reading now that only clustered indexes have the actual record as the leaf node and any other indexes would simply hold an ID of the record (true?). That being said, it makes sense to not use the secondary index in many cases.
Thank you Mostafa
|||My pleasure. And yes what you said is (almost) true: secondary indexes hold the clustering key of the record if it's unique (e.g. when it's the primary key). So in your example if the primary key (and clustering key) of Orders is OrderID, then each record in the ShipVia index would hold (ShipVia, OrderID).
No comments:
Post a Comment