I have written two queries that return the same results (using different
types of joins).
To choose the better one, I pasted both of them into one script in Query
Analyzer and executed them.
Here is a summary of the results in the execution plan:
First query:
- Query Cost: 55%
- I/O Cost: Scan count 7, logical reads 14
- Execution plan: looks simple (a few icons)
Second query:
- Query Cost: 45%
- I/O Cost: Scan count 9, logical reads 34
- Execution plan: looks more complicated (a lot of icons!)
Which query will have better performance? Should I consider the query
cost (relative to the batch), or is the I/O cost more important? I don't
understand why the second query has a lower cost while it has more I/O and
a more complicated plan!
Thanks in advance,
Amin
On Mon, 10 May 2004 21:03:39 +0430, Amin Sobati wrote:
Hi Amin,
With only this information to go on, I'd say use the first query. The
reason for the second query's lower cost might well be that some (or maybe
even all) of the required data was read from disk for the first query, but
was still available in the cache for the second query.
You get a better test if you issue DBCC DROPCLEANBUFFERS before each
query. This ensures that the cache is empty.
Best, Hugo
(Remove _NO_ and _SPAM_ to get my e-mail address)
|||Thanks Hugo!
Actually there was no change in the result (using DBCC DROPCLEANBUFFERS),
which is confusing to me too!
Now, should I prefer the second query to the first?
Amin
"Hugo Kornelis" <hugo@.pe_NO_rFact.in_SPAM_fo> wrote in message
news:7jpv9097l4u4bdnm5blhd4ql72oped6f70@.4ax.com...
|||On Tue, 11 May 2004 11:37:02 +0430, Amin Sobati wrote:
Hi Amin,
Hard to say. Some other things you might try:
1. Reverse the order in which you execute the queries in your test
scripts (should not make any difference with DBCC DROPCLEANBUFFERS,
but you never know).
2. Add "select getdate()" before and after each of the two queries, so
you can compare the actual execution time.
3. Add DBCC FREEPROCCACHE as well (sorry, I forgot to mention that when
I suggested DBCC DROPCLEANBUFFERS). This will force recompilation of
your queries, so you can be sure they are not executed with an old
execution plan (one that might be based on outdated statistics).
4. Post the queries here. Maybe I (or someone else) can make some
comments based on that. (Do add relevant table definitions, in the
form of CREATE TABLE statements as well!)
I assume you are already aware that you should run your performance
tests on the production system, or on a test system with about the same
amount and distribution of data and the same specifications; that you
should try to test when no other users or processes are using SQL Server
(or other apps on the same server); and that you should make sure your
table statistics are up to date before running your tests.
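Combining tips 1 through 3, a test script could look something like the
sketch below (the query placeholders are yours to fill in, and of course
DBCC DROPCLEANBUFFERS / DBCC FREEPROCCACHE should only ever be run on a
test server, never in production):

```sql
-- Flush all cached data pages and all compiled plans, so each
-- query starts from a cold cache with a freshly compiled plan.
DBCC DROPCLEANBUFFERS
DBCC FREEPROCCACHE

SELECT GETDATE()   -- start time of query 1
-- <first query here>
SELECT GETDATE()   -- end time of query 1

DBCC DROPCLEANBUFFERS
DBCC FREEPROCCACHE

SELECT GETDATE()   -- start time of query 2
-- <second query here>
SELECT GETDATE()   -- end time of query 2
```

Run the script twice, once with the queries in each order, and compare
the differences between the GETDATE() values.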
Best, Hugo
(Remove _NO_ and _SPAM_ to get my e-mail address)
|||I assume you posted the estimations below. I would not base conclusions
on these estimations. Whenever possible I would run the query with I/O
count and CPU count turned on. In the end, the query (plan) with the
lowest elapsed time is the winner...
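For example, something like this (a sketch) before running the candidate
queries:

```sql
SET STATISTICS IO ON    -- per-table scan counts and logical/physical reads
SET STATISTICS TIME ON  -- parse/compile and execution times (CPU and elapsed)

-- run each candidate query here, then compare the elapsed times
-- reported in the Messages pane
```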
Gert-Jan
Amin Sobati wrote:
(Please reply only to the newsgroup)
|||Hugo,
Thank you for the tips!
Here is my actual case:
I have created a "Students" table and a "StudentExam" table to track the
exams that each student has taken. I know that I should also have an
"Exams" table to hold exam IDs, but I'm ignoring it at this stage.
This is the script:
create table Students (
SID int,
SName varchar(10))
create table StudentExam (
SID int,
EName varchar(10))
-- inserting some records for each table
insert Students values(1,'David')
insert Students values(2,'Peter')
insert Students values(3,'Sarah')
insert StudentExam values(1,'sql')
insert StudentExam values(1,'vb')
insert StudentExam values(1,'asp')
insert StudentExam values(2,'vb')
insert StudentExam values(2,'asp')
insert StudentExam values(3,'xml')
insert StudentExam values(3,'access')
-- Execute this insert statement 12 times to end up with around 30,000 records:
insert StudentExam select * from StudentExam
--Now creating indexes:
create clustered index IX1 on StudentExam(sid)
create index IX2 on StudentExam(EName)
-- Now suppose that I need to delete the students who have NOT taken the 'sql' exam.
-- These are the two delete statements from which I want to choose the better one:
set statistics IO ON
begin tran
delete Students from StudentExam where Students.sid not in (select
StudentExam.sid from StudentExam where StudentExam.EName='sql')
rollback tran
begin tran
delete Students from StudentExam where 'sql' not in (select
StudentExam.EName from StudentExam where Students.sid=StudentExam.sid)
rollback tran
The first delete statement has 12 logical reads, but the second one has 38.
Yet the cost for the first one is 54%; I would expect it to have the lower
cost because of its lower I/O.
If I create the second index as clustered, and the first one as
non-clustered, the logical reads for the second delete statement drop to
14, but there is no change in the relative cost.
Thanks in advance,
Amin
"Hugo Kornelis" <hugo@.pe_NO_rFact.in_SPAM_fo> wrote in message
news:r141a0dp9l5ufrlnen38mgpclkih1l6b7d@.4ax.com...
|||Gert-Jan,
Thank you for your reply. Would you please read my reply to Hugo and
share your thoughts?
Amin
"Gert-Jan Strik" <sorry@.toomuchspamalready.nl> wrote in message
news:40A125B3.AB8FF522@.toomuchspamalready.nl...
|||"Amin Sobati" <amins@.morva.net> wrote in message
news:Ogbf4BDOEHA.640@.TK2MSFTNGP12.phx.gbl...
> Hugo,
> Thank you for the tips!
> Actually this is my case:
> I have created "Students" table and "StudentExam" to track the exams that
> each student has taken. I know that I must have an "Exams" table to keep
> exam IDs, but I ignore that for this stage.
You can't ignore that and expect to get any meaningful performance data.
Here's a more complete example with primary keys, referential integrity,
and a whole lot of students, 90% of whom have taken the SQL exam.
Both queries use exactly the same plan, and have exactly the same cost.
David
create table Students (
SID int not null identity primary key,
SName varchar(10))
create table Exams(EName varchar(10) not null primary key)
create table StudentExam (
SID int not null references Students on delete cascade,
EName varchar(10) not null references Exams on delete cascade,
constraint pk_StudentExam
primary key (SID,EName)
)
create index IX2 on StudentExam(EName)
-- inserting some records for each table
insert Students values('David')
insert Students values('Peter')
insert Students values('Sarah')
-- let the students start breeding
insert Students(SName) select SName from Students
insert Students(SName) select SName from Students
insert Students(SName) select SName from Students
insert Students(SName) select SName from Students
insert Students(SName) select SName from Students
insert Students(SName) select SName from Students
insert Students(SName) select SName from Students
insert Students(SName) select SName from Students
insert Students(SName) select SName from Students
insert Students(SName) select SName from Students
insert Students(SName) select SName from Students
insert Students(SName) select SName from Students
insert Exams(EName) values('sql')
insert Exams(EName) values('asp')
insert Exams(EName) values('vb')
insert Exams(EName) values('xml')
insert Exams(EName) values('access')
--add exams for each SID,EName pair
insert StudentExam(SID,EName)
select SID, EName
from Students cross join Exams
--erase the SQL exam for 10% of the students
delete studentExam
where EName = 'sql'
and sid%10=0
-- Now suppose that I need to delete the students who have NOT taken the 'sql' exam.
-- These are the two delete statements from which I want to choose the better one:
set statistics IO ON
print ' QUERY 1'
begin tran
delete Students where Students.sid not in (select
StudentExam.sid from StudentExam where StudentExam.EName='sql')
rollback tran
print ' QUERY 2'
begin tran
delete Students where 'sql' not in (select
StudentExam.EName from StudentExam where Students.sid=StudentExam.sid)
rollback tran
/*
QUERY 1
(1228 row(s) affected)
Table 'StudentExam'. Scan count 4914, logical reads 34734, physical reads 0,
read-ahead reads 0.
Table 'Worktable'. Scan count 6140, logical reads 8597, physical reads 0,
read-ahead reads 0.
Table 'Worktable'. Scan count 2, logical reads 1235, physical reads 0,
read-ahead reads 0.
Table 'Students'. Scan count 1, logical reads 4816, physical reads 0,
read-ahead reads 0.
QUERY 2
(1228 row(s) affected)
Table 'StudentExam'. Scan count 4914, logical reads 34734, physical reads 0,
read-ahead reads 0.
Table 'Worktable'. Scan count 6140, logical reads 8597, physical reads 0,
read-ahead reads 0.
Table 'Worktable'. Scan count 2, logical reads 1235, physical reads 0,
read-ahead reads 0.
Table 'Students'. Scan count 1, logical reads 4816, physical reads 0,
read-ahead reads 0.
*/
|||David,
Thank you for the great example.
I understand that I must set everything up properly to get meaningful
results, but I still can't see why a query might have more I/O yet a lower
cost (as in my example). Can I rely on the cost (relative to the batch) as
a criterion for choosing a query?
Thanks,
Amin
"David Browne" <davidbaxterbrowne no potted meat@.hotmail.com> wrote in
message news:#k8IHbDOEHA.1160@.TK2MSFTNGP09.phx.gbl...
|||The query optimizer's effort depends on the estimated cost of the query.
First, the query optimizer will search for any 'obvious' query plans. If
the estimated time of these solutions is no more than 1 or 2 seconds, it
will not look any further, and simply use that plan. These kinds of
query plans can depend on the way the query is written. However, if
there is any performance difference between different query plans, the
difference will be small.
If there is no obvious fast query plan, the full compilation phase is
started. This means (virtually) all possible query plans will be
evaluated. If this phase starts to take too long, it is cut short, and
the fastest plan found thus far will be used.
This is why you are seeing that, in David's example, SQL Server uses the
same query plan for both queries. I can add a third formulation (see
below), for which SQL Server will probably also use the same query plan.
print ' QUERY 3'
begin tran
delete Students where not exists (
select 1
from StudentExam
where Students.sid=StudentExam.sid
and StudentExam.EName='sql'
)
rollback tran
So what I am basically saying is that the differences you are seeing
are not significant.
Gert-Jan
(Please reply only to the newsgroup)