Hi Friends,
I need some info.
I have written a query which was working fine with a DB user id and password dedicated to the DB I am using.
Then we changed the login mode to Windows login with a trusted connection. After this change my query is failing.
My query, which was working in SQL login mode, is:
SELECT DATE_TIME,CARDNO, NATIONALID, DOB, STATUS FROM TBL_INFO where DATE_TIME between '29/12/2003 00:00:01' and '30/12/2003 23:59:59' Order by DATE_TIME ASC
where DATE_TIME is of the datetime data type.
I changed the query to test it with Windows login and a trusted connection:
SELECT DATE_TIME,CARDNO, NATIONALID, DOB, STATUS FROM TBL_INFO where DATE_TIME between convert(datetime, '29/12/2003', 103 ) and convert(datetime, '30/12/2003', 103 ) Order by DATE_TIME ASC
No error is reported for this query, but no records are fetched, even though the data is there in the DB.
Please help; I am new to SQL Server.
Expecting your replies ASAP.
Signing off,
Venkatesh.V|||Has the windows login been given access to the database/tables ? Where are you running these queries ? Do you have access to query analyzer ?|||Originally posted by rnealejr
Has the windows login been given access to the database/tables ? Where are you running these queries ? Do you have access to query analyzer ?
Yes, I have access to the tables; I have been given DB owner rights.
I have to run these queries from Visual Basic, and I have to populate the data into reports using Crystal Reports. The query is not working even in Query Analyzer.
I am trying it out with 2 different login ids in Query Analyzer.|||In query analyzer - what about select * from tbl_info/select count(*) from tbl_info. Also, what message is returned in query analyzer ?|||You connected to the server?
Then you're in...
And I bet you're trying to execute the code in master...|||But he would get invalid object name if that were the case ...|||Originally posted by rnealejr
In query analyzer - what about select * from tbl_info/select count(*) from tbl_info. Also, what message is returned in query analyzer ?
Dear Friend,
I am able to see data if I run a Select * statement, and I am able to get the count etc.
Only this query fails.|||It sounds like your where clause is hosed - leave off the where clause and try again. If that returns rows, what values are stored in the date_time field ?|||First:
You need to post the error message. We're just guessing right now as to what the problem is.
Second:
You need to post the Query and The DDL of the Tables
Third:
Sample Data would be a BIG help|||Originally posted by Brett Kaiser
First:
You need to post the error message. We're just guessing right now as to what the problem is.
Second:
You need to post the Query and The DDL of the Tables
Third:
Sample Data would be a BIG help
Dear Friend,
I have run the query without any condition; it works. Even with one condition, checking only the first date value, it works. When we put the between clause in the query it returns nothing.
I have attached the table structure, indexes, constraints and sample data. Please have a look into this and help me out.|||You seem to be focusing on the fact that you changed your login method, but your post indicates that you also changed your query, and this is likely the problem. For one thing,
between '29/12/2003 00:00:01' and '30/12/2003 23:59:59'
is NOT the same as
between convert(datetime, '29/12/2003', 103 ) and convert(datetime, '30/12/2003', 103 )
At the very least, the second statement covers 24 fewer hours than the first statement, so any data on 30/12/2003 would not show up.
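To illustrate the point, here is one possible sketch (an illustration only, not the poster's code): keeping the dd/mm/yyyy literals with style 103, but using an exclusive upper bound so the whole last day is still covered.

```sql
-- Sketch: keep the dd/mm/yyyy literals, but make the range cover
-- all of 30/12/2003 by using "< the next day" instead of BETWEEN.
SELECT DATE_TIME, CARDNO, NATIONALID, DOB, STATUS
FROM TBL_INFO
WHERE DATE_TIME >= CONVERT(datetime, '29/12/2003', 103)
  AND DATE_TIME <  CONVERT(datetime, '31/12/2003', 103)  -- day after the last wanted day
ORDER BY DATE_TIME ASC
```

The exclusive upper bound also picks up rows with times between 23:59:59 and midnight, which the BETWEEN versions above silently miss.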
Why did you switch to using a convert function?
blindman|||Originally posted by blindman
You seem to be focusing on the fact that you changed your login method, but your post indicates that you also changed your query, and this is likely the problem. For one thing,
between '29/12/2003 00:00:01' and '30/12/2003 23:59:59'
is NOT the same as
between convert(datetime, '29/12/2003', 103 ) and convert(datetime, '30/12/2003', 103 )
At the very least, the second statement covers 24 fewer hours than the first statement, so any data on 30/12/2003 would not show up.
Why did you switch to using a convert function?
blindman
As my previous query was not working under the trusted connection, it threw me an error stating "The conversion of a char data type to a datetime data type resulted in an out-of-range datetime value."
So to solve this I tried the convert function. Still, after that, no records are fetched, even though the data is there in the table.|||If you show what is returned in the date_time field, I think we can solve this. Also, what is the datatype for the date_time field - datetime or smalldatetime ?|||Originally posted by rnealejr
If you show what is returned in the date_time field, I think we can solve this. Also, what is the datatype for the date_time field - datetime or smalldatetime ?
The datatype for DATE_TIME is datetime.
When we select date_time alone, we get data like:
30/12/2003 11:01:10
30/12/2003 10:34:18
30/12/2003 10:34:01|||I looked at your attachment...and it does look like the column is defined as datetime...
Your data however doesn't...
Do this and tell us what you get
[grasping at straws now]
SELECT COUNT(*) FROM TBL_INFO WHERE ISDATE(DATE_TIME) = 0
If you get anything other than 0, you'll have non date data...
Which is what your error message is telling you...but not what your table def shows
[/grasping at straws now]
Also try DBCC CHECKTABLE ('TBL_INFO')
And let us know what that says...|||Maybe I missed it earlier in the thread, but it seems to me as though your date format is wrong.
Try changing your between clause to the following:
between '2003-12-29 00:00:01' and '2003-12-30 23:59:59'
That should work.
Dan|||Not sure if I smell a discrepancy in regional settings here or not, but...
If you run sp_helpdb on your database, what is the collation (under status)? (example: sp_helpdb pubs)
Can you re-write the query without the between, and have it work? As in
where date_time > '29/12/2003 00:00:01'
and date_time < '30/12/2003 23:59:59'
If it works this way, great, but if you get no error and no records, try lowering the bottom constraint ('29/12/2003 00:00:01'), until something gets in range.
If you get no errors, then the query parses and compiles correctly. Maybe you just have a condition like where 1 = 2...|||Oh, and what does the date format look like on the client computer?|||Originally posted by Brett Kaiser
I looked at your attachment...and it does look like the column is defined as datetime...
Your data however doesn't...
Do this and tell us what you get
[grasping at straws now]
SELECT COUNT(*) FROM TBL_INFO WHERE ISDATE(DATE_TIME) = 0
If you get anything other than 0, you'll have non date data...
Which is what your error message is telling you...but not what your table def shows
[/grasping at straws now]
Also try DBCC CHECKTABLE ('TBL_INFO')
And let us know what that says...
When I ran the query
SELECT COUNT(*) FROM TBL_INFO WHERE ISDATE(DATE_TIME) = 0
it returns 0.
I ran this query under two different logins, the DB login id and the Windows user;
the results are the same.
After this I ran DBCC CHECKTABLE ('TBL_INFO');
it returns
/*----------
DBCC CHECKTABLE ('TBL_INFO')
----------*/
DBCC results for 'TBL_INFO'.
There are 11 rows in 2 pages for object 'TBL_INFO'.
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
I ran these queries from Query Analyzer.
Can you please tell me what the
DBCC CHECKTABLE command does?|||Originally posted by danielacroft
Maybe I missed it earlier in the thread but it seems to me as though your date format is wrong.
Try changing your between clause to the following:
between '2003-12-29 00:00:01' and '2003-12-30 23:59:59'
That should work.
Dan
The date format is the same, dd-mm-yyyy.|||Originally posted by MCrowley
Not sure if I smell a discrepancy in regional settings here or not, but...
If you run sp_helpdb on your database, what is the collation (under status)? (example: sp_helpdb pubs)
Can you re-write the query without the between, and have it work? As in
where date_time > '29/12/2003 00:00:01'
and date_time < '30/12/2003 23:59:59'
If it works this way, great, but if you get no error and no records, try lowering the bottom constraint ('29/12/2003 00:00:01'), until something gets in range.
If you get no errors, then the query parses and compiles correctly. Maybe you just have a condition like where 1 = 2...
Status is
Status=ONLINE, Updateability=READ_WRITE, UserAccess=MULTI_USER, Recovery=FULL, Version=539, Collation=SQL_Latin1_General_CP1253_CI_AS, SQLSortOrder=114, IsTornPageDetectionEnabled, IsAutoCreateStatistics, IsAutoUpdateStatistics
I even tried that; it returns 0 rows, even though the data was there in the DB.|||What is the result if you do the between using (no convert/no time):
'mm/dd/yyyy' and 'mm/dd/yyyy'
OR this method
'dd/mm/yyyy' and 'dd/mm/yyyy'|||Originally posted by rnealejr
What is the result if you do the between using (no convert/no time):
'mm/dd/yyyy' and 'mm/dd/yyyy'
OR this method
'dd/mm/yyyy' and 'dd/mm/yyyy'
Result is "No rows are returned."|||What the heck is ""???
Those aren't even letter "i"s. They're a different ASCII character altogether. Could that be some weird regional time format for am/pm? Not that I can find.
Please run this query IN QUERY ANALYZER and post the results:
SELECT distinct top 20
getdate() SystemDate,
DATE_TIME as RawFormat,
convert(varchar(19), DATE_TIME, 120) as Format120,
cast(convert(varchar(19), DATE_TIME, 120) as datetime) as Reconverted
FROM TBL_INFO
blindman|||...and please run this query as well, through QUERY ANALYZER (Crystal frequently mucks things up, especially field formats).
SELECT DATE_TIME
FROM TBL_INFO
where DATE_TIME between '12/29/2003 00:00:01' and '12/30/2003 23:59:59'
blindman|||Originally posted by blindman
What the heck is ""???
Those aren't even letter "i"s. They're a different ASCII character altogether. Could that be some weird regional time format for am/pm? Not that I can find.
Please run this query IN QUERY ANALYZER and post the results:
SELECT distinct top 20
getdate() SystemDate,
DATE_TIME as RawFormat,
convert(varchar(19), DATE_TIME, 120) as Format120,
cast(convert(varchar(19), DATE_TIME, 120) as datetime) as Reconverted
FROM TBL_INFO
blindman
Sir,
I ran the query; it returns
2/1/2004 10:16:10 1/1/2004 7:42:52 2004-01-01 19:42:52 1/1/2004 7:42:52
2/1/2004 10:16:10 1/1/2004 7:46:22 2004-01-01 19:46:22 1/1/2004 7:46:22
2/1/2004 10:16:10 1/1/2004 7:51:06 2004-01-01 19:51:06 1/1/2004 7:51:06
2/1/2004 10:16:10 1/1/2004 7:51:33 2004-01-01 19:51:33 1/1/2004 7:51:33
2/1/2004 10:16:10 1/1/2004 7:55:14 2004-01-01 19:55:14 1/1/2004 7:55:14
2/1/2004 10:16:10 1/1/2004 7:56:40 2004-01-01 19:56:40 1/1/2004 7:56:40
2/1/2004 10:16:10 1/1/2004 7:59:07 2004-01-01 19:59:07 1/1/2004 7:59:07
2/1/2004 10:16:10 1/1/2004 8:19:16 2004-01-01 20:19:16 1/1/2004 8:19:16
2/1/2004 10:16:10 1/1/2004 8:28:54 2004-01-01 20:28:54 1/1/2004 8:28:54
2/1/2004 10:16:10 1/1/2004 8:31:33 2004-01-01 20:31:33 1/1/2004 8:31:33
2/1/2004 10:16:10 1/1/2004 8:41:35 2004-01-01 20:41:35 1/1/2004 8:41:35
2/1/2004 10:16:10 1/1/2004 10:16:58 2004-01-01 22:16:58 1/1/2004 10:16:58
2/1/2004 10:16:10 1/1/2004 10:27:03 2004-01-01 22:27:03 1/1/2004 10:27:03
2/1/2004 10:16:10 1/1/2004 11:14:09 2004-01-01 23:14:09 1/1/2004 11:14:09
2/1/2004 10:16:10 1/1/2004 11:14:14 2004-01-01 23:14:14 1/1/2004 11:14:14
2/1/2004 10:16:10 1/1/2004 11:22:42 2004-01-01 23:22:42 1/1/2004 11:22:42
2/1/2004 10:16:10 2/1/2004 3:11:32 2004-01-02 15:11:32 2/1/2004 3:11:32
2/1/2004 10:16:10 2/1/2004 3:11:47 2004-01-02 15:11:47 2/1/2004 3:11:47
2/1/2004 10:16:10 2/1/2004 3:14:30 2004-01-02 15:14:30 2/1/2004 3:14:30
2/1/2004 10:16:10 2/1/2004 3:22:05 2004-01-02 15:22:05 2/1/2004 3:22:05
"" means PM in Greek, and since the regional settings are set to Greece,
"" means AM in Greek.|||Originally posted by blindman
...and please run this query as well, through QUERY ANALYZER (Crystal frequently mucks things up, especially field formats).
SELECT DATE_TIME
FROM TBL_INFO
where DATE_TIME between '12/29/2003 00:00:01' and '12/30/2003 23:59:59'
blindman
I ran the query, and
I got the date_time field values.
Since the data has been purged (it was 30 days old), I have data from the 1st only.
The query is working under both logins, I mean with the DB login as well as the NT login.
Still, I am not clear how the same query was failing last week and is now working under both login ids without any change.
Last week it was throwing an error, with the NT login id, stating that it can't convert the char to datetime; when I used the convert function it was not returning any rows.
But today the query without any convert function is working with both login ids, the DB user id and the NT user login.
Do you get anything from this?|||There was a change.
The old code was:
between '29/12/2003 00:00:01' and '30/12/2003 23:59:59'
...and the new code is:
between '12/29/2003 00:00:01' and '12/30/2003 23:59:59'
Notice the change in dateformat from dd/mm/yyyy to mm/dd/yyyy.
There is some confusion going on, either within SQL Server or through Crystal reports, when exchanging time formats. Crystal is notoriously unreliable about such things.
If you can use the 'mm/dd/yyyy' format, then that should solve your problem. Better is to use format 120 "yyyy-mm-dd hh:mi:ss(24h)" which is supposedly interpreted correctly across all instances of SQL Server.
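As a sketch of why the login change can matter at all (a possible explanation worth checking, not a confirmed diagnosis): each login has a default language, and the session's DATEFORMAT derived from it decides how ambiguous literals like '29/12/2003' are parsed. Setting it explicitly, or using the format 120 literals recommended above, removes the ambiguity:

```sql
-- Illustrative only: make the session parse dates as day/month/year,
-- regardless of the login's default language.
SET DATEFORMAT dmy
SELECT DATE_TIME FROM TBL_INFO
WHERE DATE_TIME BETWEEN '29/12/2003 00:00:01' AND '30/12/2003 23:59:59'

-- Or use format 120 literals, as suggested above:
SELECT DATE_TIME FROM TBL_INFO
WHERE DATE_TIME BETWEEN '2003-12-29 00:00:01' AND '2003-12-30 23:59:59'
```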
blindman|||Originally posted by blindman
There was a change.
The old code was:
between '29/12/2003 00:00:01' and '30/12/2003 23:59:59'
...and the new code is:
between '12/29/2003 00:00:01' and '12/30/2003 23:59:59'
Notice the change in dateformat from dd/mm/yyyy to mm/dd/yyyy.
There is some confusion going on, either within SQL Server or through Crystal reports, when exchanging time formats. Crystal is notoriously unreliable about such things.
If you can use the 'mm/dd/yyyy' format, then that should solve your problem. Better is to use format 120 "yyyy-mm-dd hh:mi:ss(24h)" which is supposedly interpreted correctly across all instances of SQL Server.
blindman
So how should this query be changed to use format 120?
SELECT DATE_TIME, CARDNO, MFLAG, XFLAG, INFO FROM TBL_TRANS where Menuid = 3 and DATE_TIME between '02/01/2004 00:00:01' and '02/01/2004 23:59:59' Order by DATE_TIME ASC
Can you please help me out on this?|||SELECT convert(varchar(20),DATE_TIME, 120),CARDNO, MFLAG, XFLAG, INFO FROM TBL_TRANS where Menuid = 3 and DATE_TIME between '02/01/2004 00:00:01' and '02/01/2004 23:59:59' Order by DATE_TIME ASC|||Originally posted by rnealejr
SELECT convert(varchar(20),DATE_TIME, 120),CARDNO, MFLAG, XFLAG, INFO FROM TBL_TRANS where Menuid = 3 and DATE_TIME between '02/01/2004 00:00:01' and '02/01/2004 23:59:59' Order by DATE_TIME ASC
The query is not returning any rows.
But when I run this query with a date range from the 1st to the 2nd,
SELECT convert(varchar(20),DATE_TIME, 120)as DATE_TIME,CARDNO, MFLAG, XFLAG, INFO FROM TBL_TRANS where Menuid = 3 and
DATE_TIME between '01/01/2004 00:00:01' and '02/01/2004 23:59:59' Order by DATE_TIME ASC
it returns all rows for dates 1 and 2.
And also when I run it as
SELECT DATE_TIME, CARDNO, MFLAG, XFLAG, INFO FROM TBL_TRANS where Menuid = 3 and
DATE_TIME between '01/01/2004' and '03/01/2004' Order by DATE_TIME ASC
it returns rows which have the date 2/1/2004.
Any idea what could be the problem?
Thanking you|||Originally posted by rnealejr
SELECT convert(varchar(20),DATE_TIME, 120),CARDNO, MFLAG, XFLAG, INFO FROM TBL_TRANS where Menuid = 3 and DATE_TIME between '02/01/2004 00:00:01' and '02/01/2004 23:59:59' Order by DATE_TIME ASC
The above query returns all rows for the 1st alone with the
DB login, and does not return any rows with the NT trusted connection.
Also, this query
SELECT DATE_TIME, CARDNO, MFLAG, XFLAG, INFO FROM TBL_TRANS where Menuid = 3 and DATE_TIME between '02/01/2004 00:00:01' and '02/01/2004 23:59:59' Order by DATE_TIME ASC
is working with the DB login id
and not with the NT login id with a trusted connection.
This is my problem.
This query returns all rows for the 2nd alone.|||Originally posted by rnealejr
SELECT convert(varchar(20),DATE_TIME, 120),CARDNO, MFLAG, XFLAG, INFO FROM TBL_TRANS where Menuid = 3 and DATE_TIME between '02/01/2004 00:00:01' and '02/01/2004 23:59:59' Order by DATE_TIME ASC
I think I got it.
I changed the date format to yyyy-mm-dd for the above query;
it is working with the NT trusted connection, which is what I need.
Let me check all of my queries and I shall get back to you ASAP.
Thanking you all for your efforts.|||SELECT DATE_TIME, CARDNO, MFLAG, XFLAG, INFO
FROM TBL_TRANS
where Menuid = 3
and
DATE_TIME between '2004-02-01 00:00:01' and '2004-02-01 23:59:59'
Order by DATE_TIME ASC
...should be interpreted by the sql server unambiguously. I don't think you need to do an explicit convert for your situation (maybe you do), but the datestrings in the above example are already in format 120.
The question is, can you get Crystal to submit them in this format?
You haven't talked much about how your process works, but I hope that you put this code into a stored procedure, and then just have crystal call the stored procedure and submit the datetime values as datetime values (not as strings).
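A minimal sketch of such a procedure (the procedure and parameter names here are made up for illustration):

```sql
-- Hypothetical procedure: Crystal would pass the range as datetime
-- parameters, so no string-to-date conversion happens in the report
-- layer at all.
CREATE PROCEDURE dbo.usp_TransByDateRange
    @FromDate datetime,
    @ToDate   datetime   -- exclusive upper bound (pass the day after)
AS
SELECT DATE_TIME, CARDNO, MFLAG, XFLAG, INFO
FROM TBL_TRANS
WHERE Menuid = 3
  AND DATE_TIME >= @FromDate
  AND DATE_TIME <  @ToDate
ORDER BY DATE_TIME ASC
GO
```

Because the parameters are typed datetime, the dd/mm versus mm/dd string ambiguity never enters the picture.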
blindman|||Can't you call a sproc from Crystal?|||Yes, Crystal will reference stored procedures. They don't appear on the list of data resources by default. You have to go into options on the dialog box and specify that you want them listed. Just one more minor way that Crystal sucks.
Have I mentioned before that Crystal sucks?
It does.
blindman
Query cost and execution plan
Hi all,
I have written two queries that provide the same results (using different
types of joins).
For choosing the better one, I pasted both of them in one script in query
analyzer and executed them.
Here is the summary of the results in the execution plan:
First query:
- Query Cost: 55%
- I/O Cost: Scan count 7, logical reads 14
- Execution plan: Looks simple(a few icons..)
Second query:
- Query Cost: 45%
- I/O Cost: Scan count 9, logical reads 34
- Execution plan: Looks more complicated(a lot of icons!)
Which query will have better performance? Should I consider the query
cost (relative to the batch), or is the I/O cost more important? I don't know
why the second query has the lower cost while it has more I/O and a more
complicated plan!
Thanks in advance,
Amin
On Mon, 10 May 2004 21:03:39 +0430, Amin Sobati wrote:
>Hi all,
>I have written two queries that provide the same results(using different
>types of joins.).
>For choosing the better one, I pasted both of them in one script in query
>analyzer and executed them.
>Here is the summary of the results in the execution plan:
>First query:
>- Query Cost: 55%
>- I/O Cost: Scan count 7, logical reads 14
>- Execution plan: Looks simple(a few icons..)
>Second query:
>- Query Cost: 45%
>- I/O Cost: Scan count 9, logical reads 34
>- Execution plan: Looks more complicated(a lot of icons!)
>Which query will have better performance? Should I consider the query
>cost(relative to the batch) or the I/O cost is more important? I don't know
>why the second query has lower cost while it has more I/O and a more
>complicated plan!
>Thanks in advance,
>Amin
Hi Amin,
With only this information to go on, I'd say use the first query. The
reason for the lower query cost might well be that some (or maybe even
all) required data is read from disk for the first query, but still
available in the cache for the second query.
You get a better test if you issue DBCC DROPCLEANBUFFERS before each
query. This will ensure that the cache is empty.
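For example, a rough comparison script along these lines (a sketch only, and for a test server: DBCC DROPCLEANBUFFERS flushes the cache for every connection):

```sql
-- Rough comparison harness: cold cache before each candidate query,
-- with I/O statistics on so the reads can be compared directly.
SET STATISTICS IO ON
DBCC DROPCLEANBUFFERS   -- empty the data cache (test servers only!)
-- ... first query here ...
DBCC DROPCLEANBUFFERS
-- ... second query here ...
```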
Best, Hugo
(Remove _NO_ and _SPAM_ to get my e-mail address)
|||Thanks Hugo!
Actually there was no change in the results (even when using DBCC DROPCLEANBUFFERS),
which is confusing for me too!
Now, should I prefer the second query to the first?
Amin
"Hugo Kornelis" <hugo@.pe_NO_rFact.in_SPAM_fo> wrote in message
news:7jpv9097l4u4bdnm5blhd4ql72oped6f70@.4ax.com...
> On Mon, 10 May 2004 21:03:39 +0430, Amin Sobati wrote:
> Hi Amin,
> With only this information to go on, I'd say use the first query. The
> reason for the lower query cost might well be that some (or maybe even
> all) required data is read from disk for the first query, but still
> available in the cache for the second query.
> You get a better test if you issue DBCC DROPCLEANBUFFERS before each
> query. This will ensure that the cache is empty.
> Best, Hugo
> --
> (Remove _NO_ and _SPAM_ to get my e-mail address)
|||On Tue, 11 May 2004 11:37:02 +0430, Amin Sobati wrote:
>Thanks Hugo!
>Actually there was no change in result(by using DBCC DROPCLEANBUFFERS),
>which is confusing for me too!
>Now, should I prefer the second query to the first?
>Amin
Hi Amin,
Hard to say. Some other things you might try:
1. Reverse the order in which you execute the queries in your test
scripts (should not make any difference with DBCC DROPCLEANBUFFERS,
but you never know).
2. Add "select getdate()" before and after each of the two queries, so
you can compare the actual execution time.
3. Add DBCC FREEPROCCACHE as well (sorry, forgot to mention that when
I suggested DBCC DROPCLEANBUFFERS). This will force recompilation of
your queries, so you can be sure it's not executed with an old
execution plan (that might be based on outdated statistics).
4. Post the queries here. Maybe I (or someone else) can make some
comments based on that. (Do add relevant table definitions, in the
form of CREATE TABLE statements as well!)
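Tip 2 might look something like this (a sketch; the variable names are arbitrary):

```sql
-- Bracket each candidate query with getdate() to measure wall-clock time.
DECLARE @t0 datetime, @t1 datetime
SET @t0 = GETDATE()
-- ... query under test here ...
SET @t1 = GETDATE()
SELECT DATEDIFF(ms, @t0, @t1) AS elapsed_ms
```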
I assume you are already aware that you should run your performance
tests on the production system or on a test system with about the same
amount and distribution of data and the same specifications, that you
should try to test when no other users or processes use SQL Server (or
other apps on the same server) and that you should make sure your
table statistics are updated before running your tests.
Best, Hugo
(Remove _NO_ and _SPAM_ to get my e-mail address)
|||I assume you posted the estimations below. I would not base conclusions
on these estimations. Whenever possible I would run the query with I/O
count and CPU count turned on. In the end, the query (plan) with the
lowest elapsed time is the winner...
Gert-Jan
Amin Sobati wrote:
> Hi all,
> I have written two queries that provide the same results(using different
> types of joins.).
> For choosing the better one, I pasted both of them in one script in query
> analyzer and executed them.
> Here is the summary of the results in the execution plan:
> First query:
> - Query Cost: 55%
> - I/O Cost: Scan count 7, logical reads 14
> - Execution plan: Looks simple(a few icons..)
> Second query:
> - Query Cost: 45%
> - I/O Cost: Scan count 9, logical reads 34
> - Execution plan: Looks more complicated(a lot of icons!)
> Which query will have better performance? Should I consider the query
> cost(relative to the batch) or the I/O cost is more important? I don't know
> why the second query has lower cost while it has more I/O and a more
> complicated plan!
> Thanks in advance,
> Amin
(Please reply only to the newsgroup)
|||Hugo,
Thank you for the tips!
Actually this is my case:
I have created a "Students" table and a "StudentExam" table to track the exams that
each student has taken. I know that I should have an "Exams" table to keep
exam IDs, but I am ignoring that at this stage.
This is the script:
create table Students (
SID int,
SName varchar(10))
create table StudentExam (
SID int,
EName varchar(10))
-- inserting some records for each table
insert Students values(1,'David')
insert Students values(2,'Peter')
insert Students values(3,'Sarah')
insert StudentExam values(1,'sql')
insert StudentExam values(1,'vb')
insert StudentExam values(1,'asp')
insert StudentExam values(2,'vb')
insert StudentExam values(2,'asp')
insert StudentExam values(3,'xml')
insert StudentExam values(3,'access')
--Execute this insert statement 12 times to get around 30,000 records:
insert StudentExam select * from StudentExam
--Now creating indexes:
create clustered index IX1 on StudentExam(sid)
create index IX2 on StudentExam(EName)
-- Now suppose that I need to delete the students who have NOT taken the 'sql' exam.
--These are the two delete statements; I want to choose the better one:
set statistics IO ON
begin tran
delete Students from StudentExam where Students.sid not in (select
StudentExam.sid from StudentExam where StudentExam.EName='sql')
rollback tran
begin tran
delete Students from StudentExam where 'sql' not in (select
StudentExam.EName from StudentExam where Students.sid=StudentExam.sid)
rollback tran
The first delete statement has 12 logical reads, but the second one has 38.
The cost of the first one is 54%, though I think it should have the lower cost because
of the I/O.
If I create the second index as clustered and the first one as
non-clustered, the logical reads for the second delete statement drop to 14, but
there is no change in the main cost.
Thanks in advance,
Amin
"Hugo Kornelis" <hugo@.pe_NO_rFact.in_SPAM_fo> wrote in message
news:r141a0dp9l5ufrlnen38mgpclkih1l6b7d@.4ax.com...
> On Tue, 11 May 2004 11:37:02 +0430, Amin Sobati wrote:
>
> Hi Amin,
> Hard to say. Some other things you might try:
> 1. Reverse the order in which you execute the queries in your test
> scripts (should not make any difference with DBCC DROPCLEANBUFFERS,
> but you never know).
> 2. Add "select getdate()" before and after each of the two queries, so
> you can compare the actual execution time.
> 3. Add DBCC FREEPROCCACHE as well (sorry, forgot to mention that when
> I suggested DBCC DROPCLEANBUFFERS). This will force recompilation of
> your queries, so you can be sure it's not executed with an old
> execution plan (that might be based on outdated statistics).
> 4. Post the queries here. Maybe I (or someone else) can make some
> comments based on that. (Do add relevant table definitions, in the
> form of CREATE TABLE statements as well!)
> I assume you are already aware that you should run your performance
> tests on the production system or on a test system with about the same
> amount and distribution of data and the same specifications, that you
> should try to test when no other users or processes use SQL Server (or
> other apps on the same server) and that you should make sure your
> table statistics are updated before running your tests.
> Best, Hugo
> --
> (Remove _NO_ and _SPAM_ to get my e-mail address)
|||Gert-Jan,
Thank you for your reply. Would you please read my reply to Hugo and write
me your thoughts?
Amin
"Gert-Jan Strik" <sorry@.toomuchspamalready.nl> wrote in message
news:40A125B3.AB8FF522@.toomuchspamalready.nl...
> I assume you posted the estimations below. I would not base conclusions
> on these estimations. Whenever possible I would run the query with I/O
> count and CPU count turned on. In the end, the query (plan) with the
> lowest elapsed time is the winner...
> Gert-Jan
>
> Amin Sobati wrote:
> --
> (Please reply only to the newsgroup)
|||"Amin Sobati" <amins@.morva.net> wrote in message
news:Ogbf4BDOEHA.640@.TK2MSFTNGP12.phx.gbl...
> Hugo,
> Thank you for the tips!
> Actually this is my case:
> I have created "Students" table and "StudentExam" to track the exams that
> each student has taken. I know that I must have an "Exams" table to keep
> exam IDs, but I ignore that for this stage.
You can't ignore that and expect to get any meaningful performance data.
Here's a more complete example with primary keys, referential integrity, and a
whole lot of students, 90% of whom have taken the SQL exam.
Both queries use exactly the same plan, and have exactly the same cost.
David
create table Students (
SID int not null identity primary key,
SName varchar(10))
create table Exams(EName varchar(10) not null primary key)
create table StudentExam (
SID int not null references Students on delete cascade,
EName varchar(10) not null references Exams on delete cascade,
constraint pk_StudentExam
primary key (SID,EName)
)
create index IX2 on StudentExam(EName)
-- inserting some records for each table
insert Students values('David')
insert Students values('Peter')
insert Students values('Sarah')
-- let the students start breeding
insert Students(SName) select SName from Students
insert Students(SName) select SName from Students
insert Students(SName) select SName from Students
insert Students(SName) select SName from Students
insert Students(SName) select SName from Students
insert Students(SName) select SName from Students
insert Students(SName) select SName from Students
insert Students(SName) select SName from Students
insert Students(SName) select SName from Students
insert Students(SName) select SName from Students
insert Students(SName) select SName from Students
insert Students(SName) select SName from Students
insert Exams(EName) values('sql')
insert Exams(EName) values('asp')
insert Exams(EName) values('vb')
insert Exams(EName) values('xml')
insert Exams(EName) values('access')
--add exams for each SID,EName pair
insert StudentExam(SID,EName)
select SID, EName
from Students cross join Exams
--erase the SQL exam for 10% of the students
delete studentExam
where EName = 'sql'
and sid%10=0
-- Now suppose that I need to delete the students who have NOT taken the 'sql' exam.
--These are the two delete statements; I want to choose the better one:
set statistics IO ON
print ' QUERY 1'
begin tran
delete Students where Students.sid not in (select
StudentExam.sid from StudentExam where StudentExam.EName='sql')
rollback tran
print ' QUERY 2'
begin tran
delete Students where 'sql' not in (select
StudentExam.EName from StudentExam where Students.sid=StudentExam.sid)
rollback tran
/*
QUERY 1
(1228 row(s) affected)
Table 'StudentExam'. Scan count 4914, logical reads 34734, physical reads 0,
read-ahead reads 0.
Table 'Worktable'. Scan count 6140, logical reads 8597, physical reads 0,
read-ahead reads 0.
Table 'Worktable'. Scan count 2, logical reads 1235, physical reads 0,
read-ahead reads 0.
Table 'Students'. Scan count 1, logical reads 4816, physical reads 0,
read-ahead reads 0.
QUERY 2
(1228 row(s) affected)
Table 'StudentExam'. Scan count 4914, logical reads 34734, physical reads 0,
read-ahead reads 0.
Table 'Worktable'. Scan count 6140, logical reads 8597, physical reads 0,
read-ahead reads 0.
Table 'Worktable'. Scan count 2, logical reads 1235, physical reads 0,
read-ahead reads 0.
Table 'Students'. Scan count 1, logical reads 4816, physical reads 0,
read-ahead reads 0.
*/
|||David,
Thank you for the great example.
I understand that I must set everything up correctly to get meaningful results, but I cannot
see why a query might have more I/O but a lower cost (as in my example).
Can I count on the cost (relative to the batch) as a criterion for choosing a
query?
Thanks,
Amin
"David Browne" <davidbaxterbrowne no potted meat@.hotmail.com> wrote in
message news:#k8IHbDOEHA.1160@.TK2MSFTNGP09.phx.gbl...
> "Amin Sobati" <amins@.morva.net> wrote in message
> news:Ogbf4BDOEHA.640@.TK2MSFTNGP12.phx.gbl...
> You can't ignore that and expect to get any meaningful performance data.
> Here's a more complete example with primary keys, referential integrity, and a
> whole lot of students, 90% of whom have taken the SQL exam.
> Both queries use exactly the same plan, and have exactly the same cost.
> David
>
> create table Students (
> SID int not null identity primary key,
> SName varchar(10))
> create table Exams(EName varchar(10) not null primary key)
>
> create table StudentExam (
> SID int not null references Students on delete cascade,
> EName varchar(10) not null references Exams on delete cascade,
> constraint pk_StudentExam
> primary key (SID,EName)
> )
> create index IX2 on StudentExam(EName)
> -- inserting some records for each table
> insert Students values('David')
> insert Students values('Peter')
> insert Students values('Sarah')
> -- let the students start breeding
> insert Students(SName) select SName from Students
> insert Students(SName) select SName from Students
> insert Students(SName) select SName from Students
> insert Students(SName) select SName from Students
> insert Students(SName) select SName from Students
> insert Students(SName) select SName from Students
> insert Students(SName) select SName from Students
> insert Students(SName) select SName from Students
> insert Students(SName) select SName from Students
> insert Students(SName) select SName from Students
> insert Students(SName) select SName from Students
> insert Students(SName) select SName from Students
>
> insert Exams(EName) values('sql')
> insert Exams(EName) values('asp')
> insert Exams(EName) values('vb')
> insert Exams(EName) values('xml')
> insert Exams(EName) values('access')
> --add exams for each SID,EName pair
> insert StudentExam(SID,EName)
> select SID, EName
> from Students cross join Exams
> --erase the SQL exam for 10% of the students
> delete studentExam
> where EName = 'sql'
> and sid%10=0
> -- Now suppose that I need to delete the students who have NOT taken
> 'sql'exam.
> --These are two delete statements that i want to choose the perfect one:
> set statistics IO ON
>
> print ' QUERY 1'
> begin tran
> delete Students where Students.sid not in (select
> StudentExam.sid from StudentExam where StudentExam.EName='sql')
> rollback tran
> print ' QUERY 2'
> begin tran
> delete Students where 'sql' not in (select
> StudentExam.EName from StudentExam where Students.sid=StudentExam.sid)
> rollback tran
> /*
> QUERY 1
> (1228 row(s) affected)
> Table 'StudentExam'. Scan count 4914, logical reads 34734, physical reads
0,
> read-ahead reads 0.
> Table 'Worktable'. Scan count 6140, logical reads 8597, physical reads 0,
> read-ahead reads 0.
> Table 'Worktable'. Scan count 2, logical reads 1235, physical reads 0,
> read-ahead reads 0.
> Table 'Students'. Scan count 1, logical reads 4816, physical reads 0,
> read-ahead reads 0.
> QUERY 2
> (1228 row(s) affected)
> Table 'StudentExam'. Scan count 4914, logical reads 34734, physical reads
0,
> read-ahead reads 0.
> Table 'Worktable'. Scan count 6140, logical reads 8597, physical reads 0,
> read-ahead reads 0.
> Table 'Worktable'. Scan count 2, logical reads 1235, physical reads 0,
> read-ahead reads 0.
> Table 'Students'. Scan count 1, logical reads 4816, physical reads 0,
> read-ahead reads 0.
> */
>
|||The query optimizer's effort depends on the estimated cost of the query.
First, the query optimizer will search for any 'obvious' query plans. If
the estimated time of these solutions is no more than 1 or 2 seconds, it
will not look any further, and simply use that plan. These kinds of
query plans can depend on the way the query is written. However, if
there is any performance difference between different query plans, the
difference will be small.
If there is no obvious fast query plan, the full compilation phase is
started. This means (virtually) all possible query plans will be
evaluated. If this phase starts to take too long, then it is cut
short, and the fastest plan that was found thus far will be used.
This is why you are seeing that, in David's example, SQL-Server will use
the same query plan for both queries. I can add another syntax (see
below), for which SQL-Server will probably use the same query plan as well.
print ' QUERY 3'
begin tran
delete Students where not exists (
select 1
from StudentExam
where Students.sid=StudentExam.sid
and StudentExam.EName='sql'
)
rollback tran
So what I am basically saying, is that the differences you are seeing
are not significant.
Gert-Jan
Amin Sobati wrote:
> David,
> Thank you for the great example.
> I admit that I must do everything right to get the best result, but I cannot
> recognize that why a query might have more IO but lower cost(in my example).
> Can I count on the cost(relative to the batch) as a criteria for choosing a
> query?
> Thanks,
> Amin
> "David Browne" <davidbaxterbrowne no potted meat@.hotmail.com> wrote in
> message news:#k8IHbDOEHA.1160@.TK2MSFTNGP09.phx.gbl...
(Please reply only to the newsgroup)
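(Editorial aside, not part of the thread:) one way to check Gert-Jan's claim that these syntaxes normalize to the same plan is to capture the estimated plan text for each form in a single batch. This is a sketch against the Students/StudentExam tables from David's script, not output from the original posters:

```sql
-- Print the estimated plan for each DELETE without executing it.
SET SHOWPLAN_TEXT ON
GO
DELETE Students
WHERE Students.SID NOT IN
      (SELECT SE.SID FROM StudentExam SE WHERE SE.EName = 'sql')

DELETE Students
WHERE NOT EXISTS
      (SELECT 1 FROM StudentExam SE
       WHERE SE.SID = Students.SID AND SE.EName = 'sql')
GO
SET SHOWPLAN_TEXT OFF
GO
```

If the optimizer treats the two forms as equivalent, the two plan trees printed should match operator for operator.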
I have written two queries that provide the same results (using different
types of joins).
To choose the better one, I pasted both of them into one script in Query
Analyzer and executed them.
Here is the summary of the result in the execution plan:
First query:
- Query Cost: 55%
- I/O Cost: Scan count 7, logical reads 14
- Execution plan: Looks simple(a few icons..)
Second query:
- Query Cost: 45%
- I/O Cost: Scan count 9, logical reads 34
- Execution plan: Looks more complicated(a lot of icons!)
Which query will have better performance? Should I consider the query
cost (relative to the batch), or is the I/O cost more important? I don't know
why the second query has a lower cost while it has more I/O and a more
complicated plan!
Thanks in advance,
Amin
On Mon, 10 May 2004 21:03:39 +0430, Amin Sobati wrote:
>Hi all,
>I have written two queries that provide the same results(using different
>types of joins.).
>For choosing the better one, I pasted both of them in one script in query
>analyzer and executed them.
>Here is the summery of the result in execution plan:
>First query:
>- Query Cost: 55%
>- I/O Cost: Scan count 7, logical reads 14
>- Execution plan: Looks simple(a few icons..)
>Second query:
>- Query Cost: 45%
>- I/O Cost: Scan count 9, logical reads 34
>- Execution plan: Looks more complicated(a lot of icons!)
>Which query will have better performance? Should I consider the query
>cost(relative to the batch) or the I/O cost is more important? I don't know
>that why the second query has lower cost while it has more I/O and more
>complicated plan!
>Thanks in advance,
>Amin
Hi Amin,
With only this information to go on, I'd say use the first query. The
reason for the lower query cost might well be that some (or maybe even
all) required data is read from disk for the first query, but still
available in the cache for the second query.
You get a better test if you issue DBCC DROPCLEANBUFFERS before each
query. This will ensure that the cache is empty.
Best, Hugo
(Remove _NO_ and _SPAM_ to get my e-mail address)
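(Editorial aside, not part of Hugo's post:) the cold-cache test Hugo describes can be sketched like this. DBCC DROPCLEANBUFFERS only removes clean pages, so a CHECKPOINT first writes dirty pages out so they can be dropped too; run this on a test server only.

```sql
-- Cold-cache comparison: both queries start with an empty buffer cache.
SET STATISTICS IO ON
CHECKPOINT               -- write dirty pages so they become clean
DBCC DROPCLEANBUFFERS    -- drop clean pages from the buffer cache
-- ... run query 1 here ...
CHECKPOINT
DBCC DROPCLEANBUFFERS
-- ... run query 2 here ...
SET STATISTICS IO OFF
```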
|||Thanks Hugo!
Actually there was no change in the result (by using DBCC DROPCLEANBUFFERS),
which is confusing for me too!
Now, should I prefer the second query to the first?
Amin
"Hugo Kornelis" <hugo@.pe_NO_rFact.in_SPAM_fo> wrote in message
news:7jpv9097l4u4bdnm5blhd4ql72oped6f70@.4ax.com...
> On Mon, 10 May 2004 21:03:39 +0430, Amin Sobati wrote:
> Hi Amin,
> With only this information to go on, I'd say use the first query. The
> reason for the lower query cost might well be that some (or maybe even
> all) required data is read from disk for the first query, but still
> available in the cache for the second query.
> You get a beter test if you issue DBCC DROPCLEANBUFFERS before each
> query. This will ensure that the cache is empty.
> Best, Hugo
> --
> (Remove _NO_ and _SPAM_ to get my e-mail address)
|||On Tue, 11 May 2004 11:37:02 +0430, Amin Sobati wrote:
>Thanks Hugo!
>Actually there was no change in result(by using DBCC DROPCLEANBUFFERS),
>which is confusing for me too!
>Now, should I prefer the second query to the first?
>Amin
Hi Amin,
Hard to say. Some other things you might try:
1. Reverse the order in which you execute the queries in your test
scripts (should not make any difference with DBCC DROPCLEANBUFFERS,
but you never know).
2. Add "select getdate()" before and after each of the two queries, so
you can compare the actual execution time.
3. Add DBCC FREEPROCCACHE as well (sorry, forgot to mention that when
I suggested DBCC DROPCLEANBUFFERS). This will force recompilation of
your queries, so you can be sure it's not executed with an old
execution plan (that might be based on outdated statistics).
4. Post the queries here. Maybe I (or someone else) can make some
comments based on that. (Do add relevant table definitions, in the
form of CREATE TABLE statements as well!)
I assume you are already aware that you should run your performance
tests on the production system or on a test system with about the same
amount and distribution of data and the same specifications, that you
should try to test when no other users or processes use SQL Server (or
other apps on the same server) and that you should make sure your
table statistics are updated before running your tests.
Best, Hugo
(Remove _NO_ and _SPAM_ to get my e-mail address)
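(Editorial aside:) points 2 and 3 above can be combined with the earlier DBCC DROPCLEANBUFFERS advice into one repeatable harness. This is a sketch, not from the thread; the query placeholder is hypothetical, and GETDATE()-based timing only resolves to a few milliseconds, which was the practical option on SQL Server 2000.

```sql
-- Repeatable timing harness for one candidate query.
DBCC FREEPROCCACHE       -- clear cached plans: forces recompilation
CHECKPOINT               -- write dirty pages so they can be dropped
DBCC DROPCLEANBUFFERS    -- clear the buffer cache (test server only!)

DECLARE @t0 datetime, @t1 datetime
SET @t0 = GETDATE()
-- ... candidate query goes here ...
SET @t1 = GETDATE()
PRINT 'elapsed: ' + CONVERT(varchar(12), DATEDIFF(ms, @t0, @t1)) + ' ms'
```

Run the whole batch once per candidate, and swap the order between runs as point 1 suggests.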
|||I assume you posted the estimations below. I would not base conclusions
on these estimations. Whenever possible I would run the query with I/O
count and CPU count turned on. In the end, the query (plan) with the
lowest elapsed time is the winner...
Gert-Jan
Amin Sobati wrote:
> Hi all,
> I have written two queries that provide the same results(using different
> types of joins.).
> For choosing the better one, I pasted both of them in one script in query
> analyzer and executed them.
> Here is the summery of the result in execution plan:
> First query:
> - Query Cost: 55%
> - I/O Cost: Scan count 7, logical reads 14
> - Execution plan: Looks simple(a few icons..)
> Second query:
> - Query Cost: 45%
> - I/O Cost: Scan count 9, logical reads 34
> - Execution plan: Looks more complicated(a lot of icons!)
> Which query will have better performance? Should I consider the query
> cost(relative to the batch) or the I/O cost is more important? I don't know
> that why the second query has lower cost while it has more I/O and more
> complicated plan!
> Thanks in advance,
> Amin
(Please reply only to the newsgroup)
|||Hugo,
Thank you for the tips!
Actually, this is my case:
I have created a "Students" table and a "StudentExam" table to track the exams
that each student has taken. I know that I should have an "Exams" table to keep
the exam IDs, but I am ignoring that for this stage.
This is the script:
create table Students (
SID int,
SName varchar(10))
create table StudentExam (
SID int,
EName varchar(10))
-- inserting some records for each table
insert Students values(1,'David')
insert Students values(2,'Peter')
insert Students values(3,'Sarah')
insert StudentExam values(1,'sql')
insert StudentExam values(1,'vb')
insert StudentExam values(1,'asp')
insert StudentExam values(2,'vb')
insert StudentExam values(2,'asp')
insert StudentExam values(3,'xml')
insert StudentExam values(3,'access')
--Execute this insert statement 12 times to get around 30,000 records:
insert StudentExam select * from StudentExam
--Now creating indexes:
create clustered index IX1 on StudentExam(sid)
create index IX2 on StudentExam(EName)
-- Now suppose that I need to delete the students who have NOT taken the 'sql' exam.
--These are two delete statements that i want to choose the perfect one:
set statistics IO ON
begin tran
delete Students from StudentExam where Students.sid not in (select
StudentExam.sid from StudentExam where StudentExam.EName='sql')
rollback tran
begin tran
delete Students from StudentExam where 'sql' not in (select
StudentExam.EName from StudentExam where Students.sid=StudentExam.sid)
rollback tran
The first delete statement has 12 logical reads, but the second one has 38.
The cost for the first one is 54%, though I think it should have the lower cost
because of the lower I/O.
If I create the second index as clustered, and the first one as
non-clustered, the logical reads for the second delete statement drop to 14, but
there is no change in the overall cost.
Thanks in advance,
Amin
"Hugo Kornelis" <hugo@.pe_NO_rFact.in_SPAM_fo> wrote in message
news:r141a0dp9l5ufrlnen38mgpclkih1l6b7d@.4ax.com...
> On Tue, 11 May 2004 11:37:02 +0430, Amin Sobati wrote:
>
> Hi Amin,
> Hard to say. Some other things you might try:
> 1. Reverse the order in which you execute the queries in your test
> scripts (should not make any difference with DBCC DROPCLEANBUFFERS,
> but you never know).
> 2. Add "select getdate()" before and after each of the two queries, so
> you can compare the actual execution time.
> 3. Add DBCC FREEPROCCACHE as well (sorry, forgot to mention that when
> I suggested DBCC DROPCLEANBUFFERS). This will force recompilation of
> your queries, so you can be sure it's not executed with an old
> execution plan (that might be based on outdated statistics).
> 4. Post the queries here. Maybe I (or someone else) can make some
> comments based on that. (Do add relevant table definitions, in the
> form of CREATE TABLE statements as well!)
> I assume you are already aware that you should run your performance
> tests on the production system or on a test system with about the same
> amount and distribution of data and the same specifications, that you
> should try to test when no other users or processes use SQL Server (or
> other apps on the same server) and that you should make sure your
> table statistics are updated before running your tests.
> Best, Hugo
> --
> (Remove _NO_ and _SPAM_ to get my e-mail address)
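(Editorial aside on the script above, not part of the thread:) because SID and EName are declared without NOT NULL there, a correctness trap is worth knowing. If the subquery in a NOT IN ever returns a NULL, the predicate evaluates to UNKNOWN for every row and nothing is deleted; NOT EXISTS does not have this problem. A minimal sketch:

```sql
-- NOT IN vs NOT EXISTS when the subquery can return NULL.
CREATE TABLE #se (SID int)          -- nullable on purpose
INSERT #se VALUES (1)
INSERT #se VALUES (NULL)

-- Returns nothing: 2 NOT IN (1, NULL) evaluates to UNKNOWN.
SELECT 'not in' WHERE 2 NOT IN (SELECT SID FROM #se)

-- Returns one row: NOT EXISTS simply finds no match.
SELECT 'not exists' WHERE NOT EXISTS
    (SELECT 1 FROM #se WHERE SID = 2)

DROP TABLE #se
```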
|||Gert-Jan,
Thank you for your reply. Would you please read my reply to Hugo and write
me your thoughts :)
Amin
"Gert-Jan Strik" <sorry@.toomuchspamalready.nl> wrote in message
news:40A125B3.AB8FF522@.toomuchspamalready.nl...
> I assume you posted the estimations below. I would not base conclusions
> on these estimations. Whenever possible I would run the query with I/O
> count and CPU count turned on. In the end, the query (plan) with the
> lowest elapsed time is the winner...
> Gert-Jan
>
> Amin Sobati wrote:
> --
> (Please reply only to the newsgroup)
|||"Amin Sobati" <amins@.morva.net> wrote in message
news:Ogbf4BDOEHA.640@.TK2MSFTNGP12.phx.gbl...
> Hugo,
> Thank you for the tips!
> Actually this is my case:
> I have created "Students" table and "StudentExam" to track the exams that
> each student has taken. I know that I must have an "Exams" table to keep
> exam IDs, but I ignore that for this stage.
You can't ignore that and expect to get any meaningful performance data.
Here's a more complete example with primary keys, referential integrity, and a
whole lot of students, 90% of whom have taken the SQL exam.
Both queries use exactly the same plan, and have exactly the same cost.
David
create table Students (
SID int not null identity primary key,
SName varchar(10))
create table Exams(EName varchar(10) not null primary key)
create table StudentExam (
SID int not null references Students on delete cascade,
EName varchar(10) not null references Exams on delete cascade,
constraint pk_StudentExam
primary key (SID,EName)
)
create index IX2 on StudentExam(EName)
-- inserting some records for each table
insert Students values('David')
insert Students values('Peter')
insert Students values('Sarah')
-- let the students start breeding
insert Students(SName) select SName from Students
insert Students(SName) select SName from Students
insert Students(SName) select SName from Students
insert Students(SName) select SName from Students
insert Students(SName) select SName from Students
insert Students(SName) select SName from Students
insert Students(SName) select SName from Students
insert Students(SName) select SName from Students
insert Students(SName) select SName from Students
insert Students(SName) select SName from Students
insert Students(SName) select SName from Students
insert Students(SName) select SName from Students
insert Exams(EName) values('sql')
insert Exams(EName) values('asp')
insert Exams(EName) values('vb')
insert Exams(EName) values('xml')
insert Exams(EName) values('access')
--add exams for each SID,EName pair
insert StudentExam(SID,EName)
select SID, EName
from Students cross join Exams
--erase the SQL exam for 10% of the students
delete studentExam
where EName = 'sql'
and sid%10=0
-- Now suppose that I need to delete the students who have NOT taken the 'sql' exam.
--These are two delete statements that i want to choose the perfect one:
set statistics IO ON
print ' QUERY 1'
begin tran
delete Students where Students.sid not in (select
StudentExam.sid from StudentExam where StudentExam.EName='sql')
rollback tran
print ' QUERY 2'
begin tran
delete Students where 'sql' not in (select
StudentExam.EName from StudentExam where Students.sid=StudentExam.sid)
rollback tran
/*
QUERY 1
(1228 row(s) affected)
Table 'StudentExam'. Scan count 4914, logical reads 34734, physical reads 0,
read-ahead reads 0.
Table 'Worktable'. Scan count 6140, logical reads 8597, physical reads 0,
read-ahead reads 0.
Table 'Worktable'. Scan count 2, logical reads 1235, physical reads 0,
read-ahead reads 0.
Table 'Students'. Scan count 1, logical reads 4816, physical reads 0,
read-ahead reads 0.
QUERY 2
(1228 row(s) affected)
Table 'StudentExam'. Scan count 4914, logical reads 34734, physical reads 0,
read-ahead reads 0.
Table 'Worktable'. Scan count 6140, logical reads 8597, physical reads 0,
read-ahead reads 0.
Table 'Worktable'. Scan count 2, logical reads 1235, physical reads 0,
read-ahead reads 0.
Table 'Students'. Scan count 1, logical reads 4816, physical reads 0,
read-ahead reads 0.
*/
Query cost and execution plan
Hi all,
I have written two queries that provide the same results(using different
types of joins.).
For choosing the better one, I pasted both of them in one script in query
analyzer and executed them.
Here is the summery of the result in execution plan:
First query:
- Query Cost: 55%
- I/O Cost: Scan count 7, logical reads 14
- Execution plan: Looks simple(a few icons..)
Second query:
- Query Cost: 45%
- I/O Cost: Scan count 9, logical reads 34
- Execution plan: Looks more complicated(a lot of icons!)
Which query will have better performance? Should I consider the query
cost(relative to the batch) or the I/O cost is more important? I don't know
that why the second query has lower cost while it has more I/O and more
complicated plan!
Thanks in advance,
AminOn Mon, 10 May 2004 21:03:39 +0430, Amin Sobati wrote:
>Hi all,
>I have written two queries that provide the same results(using different
>types of joins.).
>For choosing the better one, I pasted both of them in one script in query
>analyzer and executed them.
>Here is the summery of the result in execution plan:
>First query:
>- Query Cost: 55%
>- I/O Cost: Scan count 7, logical reads 14
>- Execution plan: Looks simple(a few icons..)
>Second query:
>- Query Cost: 45%
>- I/O Cost: Scan count 9, logical reads 34
>- Execution plan: Looks more complicated(a lot of icons!)
>Which query will have better performance? Should I consider the query
>cost(relative to the batch) or the I/O cost is more important? I don't know
>that why the second query has lower cost while it has more I/O and more
>complicated plan!
>Thanks in advance,
>Amin
Hi Amin,
With only this information to go on, I'd say use the first query. The
reason for the lower query cost might well be that some (or maybe even
all) required data is read from disk for the first query, but still
available in the cache for the second query.
You get a beter test if you issue DBCC DROPCLEANBUFFERS before each
query. This will ensure that the cache is empty.
Best, Hugo
--
(Remove _NO_ and _SPAM_ to get my e-mail address)|||Thanks Hugo!
Actually there was no change in result(by using DBCC DROPCLEANBUFFERS),
which is confusing for me too!
Now, should I prefer the second query to the first?
Amin
"Hugo Kornelis" <hugo@.pe_NO_rFact.in_SPAM_fo> wrote in message
news:7jpv9097l4u4bdnm5blhd4ql72oped6f70@.4ax.com...
> On Mon, 10 May 2004 21:03:39 +0430, Amin Sobati wrote:
> >Hi all,
> >I have written two queries that provide the same results(using different
> >types of joins.).
> >For choosing the better one, I pasted both of them in one script in query
> >analyzer and executed them.
> >Here is the summery of the result in execution plan:
> >
> >First query:
> >- Query Cost: 55%
> >- I/O Cost: Scan count 7, logical reads 14
> >- Execution plan: Looks simple(a few icons..)
> >
> >Second query:
> >- Query Cost: 45%
> >- I/O Cost: Scan count 9, logical reads 34
> >- Execution plan: Looks more complicated(a lot of icons!)
> >
> >Which query will have better performance? Should I consider the query
> >cost(relative to the batch) or the I/O cost is more important? I don't
know
> >that why the second query has lower cost while it has more I/O and more
> >complicated plan!
> >
> >Thanks in advance,
> >
> >Amin
> Hi Amin,
> With only this information to go on, I'd say use the first query. The
> reason for the lower query cost might well be that some (or maybe even
> all) required data is read from disk for the first query, but still
> available in the cache for the second query.
> You get a beter test if you issue DBCC DROPCLEANBUFFERS before each
> query. This will ensure that the cache is empty.
> Best, Hugo
> --
> (Remove _NO_ and _SPAM_ to get my e-mail address)|||On Tue, 11 May 2004 11:37:02 +0430, Amin Sobati wrote:
>Thanks Hugo!
>Actually there was no change in result(by using DBCC DROPCLEANBUFFERS),
>which is confusing for me too!
>Now, should I prefer the second query to the first?
>Amin
Hi Amin,
Hard to say. Some other things you might try:
1. Reverse the order in which you execute the queries in your test
scripts (should not make any difference with DBCC DROPCLEANBUFFERS,
but you never know).
2. Add "select getdate()" before and after each of the two queries, so
you can compare the actual execution time.
3. Add DBCC FREEPROCCACHE as well (sorry, forgot to mention that when
I suggested DBCC DROPCLEANBUFFERS). This will force recompilation of
your queries, so you can be sure it's not executed with an old
execution plan (that might be based on outdated statistics).
4. Post the queries here. Maybe I (or someone else) can make some
comments based on that. (Do add relevant table definitions, in the
form of CREATE TABLE statements as well!)
I assume you are already aware that you should run your performance
tests on the production system or on a test system with about the same
amount and distribution of data and the same specifications, that you
should try to test when no other users or processes use SQL Server (or
other apps on the same server) and that you should make sure your
table statistics are updated before running your tests.
Best, Hugo
--
(Remove _NO_ and _SPAM_ to get my e-mail address)|||I assume you posted the estimations below. I would not base conclusions
on these estimations. Whenever possible I would run the query with I/O
count and CPU count turned on. In the end, the query (plan) with the
lowest elapsed time is the winner...
Gert-Jan
Amin Sobati wrote:
> Hi all,
> I have written two queries that provide the same results(using different
> types of joins.).
> For choosing the better one, I pasted both of them in one script in query
> analyzer and executed them.
> Here is the summery of the result in execution plan:
> First query:
> - Query Cost: 55%
> - I/O Cost: Scan count 7, logical reads 14
> - Execution plan: Looks simple(a few icons..)
> Second query:
> - Query Cost: 45%
> - I/O Cost: Scan count 9, logical reads 34
> - Execution plan: Looks more complicated(a lot of icons!)
> Which query will have better performance? Should I consider the query
> cost(relative to the batch) or the I/O cost is more important? I don't know
> that why the second query has lower cost while it has more I/O and more
> complicated plan!
> Thanks in advance,
> Amin
--
(Please reply only to the newsgroup)|||Gert-Jan,
Thank you for your reply. Would you please read my reply to Hugo and write
me your thoughts:)
Amin
"Gert-Jan Strik" <sorry@.toomuchspamalready.nl> wrote in message
news:40A125B3.AB8FF522@.toomuchspamalready.nl...
> I assume you posted the estimations below. I would not base conclusions
> on these estimations. Whenever possible I would run the query with I/O
> count and CPU count turned on. In the end, the query (plan) with the
> lowest elapsed time is the winner...
> Gert-Jan
>
> Amin Sobati wrote:
> >
> > Hi all,
> > I have written two queries that provide the same results(using different
> > types of joins.).
> > For choosing the better one, I pasted both of them in one script in
query
> > analyzer and executed them.
> > Here is the summery of the result in execution plan:
> >
> > First query:
> > - Query Cost: 55%
> > - I/O Cost: Scan count 7, logical reads 14
> > - Execution plan: Looks simple(a few icons..)
> >
> > Second query:
> > - Query Cost: 45%
> > - I/O Cost: Scan count 9, logical reads 34
> > - Execution plan: Looks more complicated(a lot of icons!)
> >
> > Which query will have better performance? Should I consider the query
> > cost(relative to the batch) or the I/O cost is more important? I don't
know
> > that why the second query has lower cost while it has more I/O and more
> > complicated plan!
> >
> > Thanks in advance,
> >
> > Amin
> --
> (Please reply only to the newsgroup)|||Hugo,
Thank you for the tips!
Actually this is my case:
I have created "Students" table and "StudentExam" to track the exams that
each student has taken. I know that I must have an "Exams" table to keep
exam IDs, but I ignore that for this stage.
This is the script:
----
create table Students (
SID int,
SName varchar(10))
create table StudentExam (
SID int,
EName varchar(10))
-- inserting some records for each table
insert Students values(1,'David')
insert Students values(2,'Peter')
insert Students values(3,'Sarah')
insert StudentExam values(1,'sql')
insert StudentExam values(1,'vb')
insert StudentExam values(1,'asp')
insert StudentExam values(2,'vb')
insert StudentExam values(2,'asp')
insert StudentExam values(3,'xml')
insert StudentExam values(3,'access')
--Execute this insert statement for 12 times to have around 30,000 records:
insert StudentExam select * from StudentExam
--Now creating indexes:
create clustered index IX1 on StudentExam(sid)
create index IX2 on StudentExam(EName)
-- Now suppose that I need to delete the students who have NOT taken 'sql'
exam.
--These are two delete statements that i want to choose the perfect one:
set statistics IO ON
begin tran
delete Students from StudentExam where Students.sid not in (select
StudentExam.sid from StudentExam where StudentExam.EName='sql')
rollback tran
begin tran
delete Students from StudentExam where 'sql' not in (select
StudentExam.EName from StudentExam where Students.sid=StudentExam.sid)
rollback tran
----
The first delete statement has 12 logical reads, but the second one has 38.
The cost for first one is 54%, that I think it must have lower cost because
of the IO.
If I create the second index as clustered, and the first one as
non-clustered, the logical read for second delete statement will be 14, but
no change in the main cost.
Thanks in advance,
Amin
"Hugo Kornelis" <hugo@.pe_NO_rFact.in_SPAM_fo> wrote in message
news:r141a0dp9l5ufrlnen38mgpclkih1l6b7d@.4ax.com...
> On Tue, 11 May 2004 11:37:02 +0430, Amin Sobati wrote:
> >Thanks Hugo!
> >Actually there was no change in result(by using DBCC DROPCLEANBUFFERS),
> >which is confusing for me too!
> >Now, should I prefer the second query to the first?
> >Amin
> Hi Amin,
> Hard to say. Some other things you might try:
> 1. Reverse the order in which you execute the queries in your test
> scripts (should not make any difference with DBCC DROPCLEANBUFFERS,
> but you never know).
> 2. Add "select getdate()" before and after each of the two queries, so
> you can compare the actual execution time.
> 3. Add DBCC FREEPROCCACHE as well (sorry, forgot to mention that when
> I suggested DBCC DROPCLEANBUFFERS). This will force recompilation of
> your queries, so you can be sure it's not executed with an old
> execution plan (that might be based on outdated statistics).
> 4. Post the queries here. Maybe I (or someone else) can make some
> comments based on that. (Do add relevant table definitions, in the
> form of CREATE TABLE statements as well!)
> I assume you are already aware that you should run your performance
> tests on the production system or on a test system with about the same
> amount and distribution of data and the same specifications, that you
> should try to test when no other users or processes use SQL Server (or
> other apps on the same server) and that you should make sure your
> table statistics are updated before running your tests.
> Best, Hugo
> --
> (Remove _NO_ and _SPAM_ to get my e-mail address)|||"Amin Sobati" <amins@.morva.net> wrote in message
news:Ogbf4BDOEHA.640@.TK2MSFTNGP12.phx.gbl...
> Hugo,
> Thank you for the tips!
> Actually this is my case:
> I have created "Students" table and "StudentExam" to track the exams that
> each student has taken. I know that I must have an "Exams" table to keep
> exam IDs, but I ignore that for this stage.
You can't ignore that and expect to get any meaningful performance data.
Here's a more complete example with primary keys, referential integrity and
a whole lot of students, 90% of whom have taken the SQL exam.
Both queries use exactly the same plan, and have exactly the same cost.
David
create table Students (
SID int not null identity primary key,
SName varchar(10))
create table Exams(EName varchar(10) not null primary key)
create table StudentExam (
SID int not null references Students on delete cascade,
EName varchar(10) not null references Exams on delete cascade,
constraint pk_StudentExam
primary key (SID,EName)
)
create index IX2 on StudentExam(EName)
-- inserting some records for each table
insert Students values('David')
insert Students values('Peter')
insert Students values('Sarah')
-- let the students start breeding
insert Students(SName) select SName from Students
insert Students(SName) select SName from Students
insert Students(SName) select SName from Students
insert Students(SName) select SName from Students
insert Students(SName) select SName from Students
insert Students(SName) select SName from Students
insert Students(SName) select SName from Students
insert Students(SName) select SName from Students
insert Students(SName) select SName from Students
insert Students(SName) select SName from Students
insert Students(SName) select SName from Students
insert Students(SName) select SName from Students
insert Exams(EName) values('sql')
insert Exams(EName) values('asp')
insert Exams(EName) values('vb')
insert Exams(EName) values('xml')
insert Exams(EName) values('access')
--add exams for each SID,EName pair
insert StudentExam(SID,EName)
select SID, EName
from Students cross join Exams
--erase the SQL exam for 10% of the students
delete studentExam
where EName = 'sql'
and sid%10=0
-- Now suppose that I need to delete the students who have NOT taken the
-- 'sql' exam.
--These are two delete statements that i want to choose the perfect one:
set statistics IO ON
print ' QUERY 1'
begin tran
delete Students where Students.sid not in (select
StudentExam.sid from StudentExam where StudentExam.EName='sql')
rollback tran
print ' QUERY 2'
begin tran
delete Students where 'sql' not in (select
StudentExam.EName from StudentExam where Students.sid=StudentExam.sid)
rollback tran
/*
QUERY 1
(1228 row(s) affected)
Table 'StudentExam'. Scan count 4914, logical reads 34734, physical reads 0,
read-ahead reads 0.
Table 'Worktable'. Scan count 6140, logical reads 8597, physical reads 0,
read-ahead reads 0.
Table 'Worktable'. Scan count 2, logical reads 1235, physical reads 0,
read-ahead reads 0.
Table 'Students'. Scan count 1, logical reads 4816, physical reads 0,
read-ahead reads 0.
QUERY 2
(1228 row(s) affected)
Table 'StudentExam'. Scan count 4914, logical reads 34734, physical reads 0,
read-ahead reads 0.
Table 'Worktable'. Scan count 6140, logical reads 8597, physical reads 0,
read-ahead reads 0.
Table 'Worktable'. Scan count 2, logical reads 1235, physical reads 0,
read-ahead reads 0.
Table 'Students'. Scan count 1, logical reads 4816, physical reads 0,
read-ahead reads 0.
*/|||David,
Thank you for the great example.
I admit that I must do everything right to get the best result, but I cannot
see why a query might have more IO but a lower cost (as in my example).
Can I count on the cost (relative to the batch) as a criterion for choosing
a query?
Thanks,
Amin
"David Browne" <davidbaxterbrowne no potted meat@.hotmail.com> wrote in
message news:#k8IHbDOEHA.1160@.TK2MSFTNGP09.phx.gbl...
> "Amin Sobati" <amins@.morva.net> wrote in message
> news:Ogbf4BDOEHA.640@.TK2MSFTNGP12.phx.gbl...
> > Hugo,
> > Thank you for the tips!
> > Actually this is my case:
> > I have created "Students" table and "StudentExam" to track the exams
that
> > each student has taken. I know that I must have an "Exams" table to keep
> > exam IDs, but I ignore that for this stage.
> You can't ignore that and expect to get any meaningful performance data.
> Here's a more complete example with primary keys, referential integrity a
> whole lot of students, 90% of whom have taken the SQL exam.
> Both queries use exactly the same plan, and have exactly the same cost.
> David
>
> create table Students (
> SID int not null identity primary key,
> SName varchar(10))
> create table Exams(EName varchar(10) not null primary key)
>
> create table StudentExam (
> SID int not null references Students on delete cascade,
> EName varchar(10) not null references Exams on delete cascade,
> constraint pk_StudentExam
> primary key (SID,EName)
> )
> create index IX2 on StudentExam(EName)
> -- inserting some records for each table
> insert Students values('David')
> insert Students values('Peter')
> insert Students values('Sarah')
> -- let the students start breeding
> insert Students(SName) select SName from Students
> insert Students(SName) select SName from Students
> insert Students(SName) select SName from Students
> insert Students(SName) select SName from Students
> insert Students(SName) select SName from Students
> insert Students(SName) select SName from Students
> insert Students(SName) select SName from Students
> insert Students(SName) select SName from Students
> insert Students(SName) select SName from Students
> insert Students(SName) select SName from Students
> insert Students(SName) select SName from Students
> insert Students(SName) select SName from Students
>
> insert Exams(EName) values('sql')
> insert Exams(EName) values('asp')
> insert Exams(EName) values('vb')
> insert Exams(EName) values('xml')
> insert Exams(EName) values('access')
> --add exams for each SID,EName pair
> insert StudentExam(SID,EName)
> select SID, EName
> from Students cross join Exams
> --erase the SQL exam for 10% of the students
> delete studentExam
> where EName = 'sql'
> and sid%10=0
> -- Now suppose that I need to delete the students who have NOT taken
> 'sql'exam.
> --These are two delete statements that i want to choose the perfect one:
> set statistics IO ON
>
> print ' QUERY 1'
> begin tran
> delete Students where Students.sid not in (select
> StudentExam.sid from StudentExam where StudentExam.EName='sql')
> rollback tran
> print ' QUERY 2'
> begin tran
> delete Students where 'sql' not in (select
> StudentExam.EName from StudentExam where Students.sid=StudentExam.sid)
> rollback tran
> /*
> QUERY 1
> (1228 row(s) affected)
> Table 'StudentExam'. Scan count 4914, logical reads 34734, physical reads
0,
> read-ahead reads 0.
> Table 'Worktable'. Scan count 6140, logical reads 8597, physical reads 0,
> read-ahead reads 0.
> Table 'Worktable'. Scan count 2, logical reads 1235, physical reads 0,
> read-ahead reads 0.
> Table 'Students'. Scan count 1, logical reads 4816, physical reads 0,
> read-ahead reads 0.
> QUERY 2
> (1228 row(s) affected)
> Table 'StudentExam'. Scan count 4914, logical reads 34734, physical reads
0,
> read-ahead reads 0.
> Table 'Worktable'. Scan count 6140, logical reads 8597, physical reads 0,
> read-ahead reads 0.
> Table 'Worktable'. Scan count 2, logical reads 1235, physical reads 0,
> read-ahead reads 0.
> Table 'Students'. Scan count 1, logical reads 4816, physical reads 0,
> read-ahead reads 0.
> */
>|||The query optimizer's effort depends on the estimated cost of the query.
First, the query optimizer will search for any 'obvious' query plans. If
the estimated time of these solutions is no more than 1 or 2 seconds, it
will not look any further, and simply use that plan. These kinds of
query plans can depend on the way the query is written. However, if
there is any performance difference between different query plans, the
difference will be small.
If there is no obvious fast query plan, the full compilation phase is
started. This means (virtually) all possible query plans will be
evaluated. If this phase starts to take too long, then it is cut
short, and the fastest plan that was found thus far will be used.
This is why you are seeing that in David's example, SQL Server will use
the same query plan for both queries. I can add another formulation (see
below), for which SQL Server will probably use the same query plan as well.
print ' QUERY 3'
begin tran
delete Students where not exists (
select 1
from StudentExam
where Students.sid=StudentExam.sid
and StudentExam.EName='sql'
)
rollback tran
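For completeness, the same delete can also be written as a LEFT JOIN anti-join (a sketch added for illustration, not from the original post; the optimizer will most likely pick the same plan for it as well):

```sql
print ' QUERY 4'
begin tran
delete Students
from Students
left join StudentExam
    on StudentExam.sid = Students.sid
   and StudentExam.EName = 'sql'
where StudentExam.sid is null   -- keep only students with no matching 'sql' row
rollback tran
```

Note that these formulations are only interchangeable as long as StudentExam.sid cannot be NULL: a single NULL sid returned by the subquery makes the NOT IN version delete no rows at all, whereas NOT EXISTS and the anti-join are unaffected.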
So what I am basically saying is that the differences you are seeing
are not significant.
Gert-Jan
Amin Sobati wrote:
> David,
> Thank you for the great example.
> I admit that I must do everything right to get the best result, but I cannot
> recognize that why a query might have more IO but lower cost(in my example).
> Can I count on the cost(relative to the batch) as a criteria for choosing a
> query?
> Thanks,
> Amin
> [quoted example snipped]
--
(Please reply only to the newsgroup)|||Thanks Gert-Jan!
Regardless of this particular scenario, I'd like to find the best way to
compare the performance of two or more queries.
What I've been doing so far is executing, for example, two queries and
comparing their cost (relative to the batch). Can this cost be the factor
that I must count on?
What made me start posting this question was that I encountered a query
that had a higher IO cost than the other, but a lower overall cost. I need
to know whether I should choose the first query (less IO, more cost) or the
second query (more IO, less cost).
Thanks,
Amin
"Gert-Jan Strik" <sorry@.toomuchspamalready.nl> wrote in message
news:40A2A554.20A1EDEE@.toomuchspamalready.nl...
> [explanation of the optimizer's plan search snipped]
> So what I am basically saying, is that the differences you are seeing
> are not significant.
> Gert-Jan
> Amin Sobati wrote:
> > [earlier discussion snipped]
> --
> (Please reply only to the newsgroup)|||Well, the query plan shows estimates of the execution times. The
estimated subtree cost of the leftmost element is the estimated time in
seconds for the query. So whichever query plan has the lowest estimated
cost is likely to be the fastest. However, things like locking strategy
can also play a big part on a stressed server. This is not really
expressed or taken into consideration in the query plan.
The I/O count translates into the estimated I/O cost, which is (just
like the estimated CPU cost) just one component of the total estimated
cost.
If you can choose between two query plans with the same estimated cost,
I would choose the plan that uses the least amount of resources.
Gert-Jan
Amin Sobati wrote:
> Thanks Gert-Jan!
> Regardless of this particular scenario, I'd like to find the best way for
> comparing the performance of two or more queries.
> What I've been doing so far, I executed for example two queries and compared
> their cost(relative to the batch). Can this cost be the factor that I must
> count on?
> What made me to start posting this question was that I encountered with a
> query that had more IO cost than the other but less Cost. I need to know
> that I must choose the first query(with less IO, more cost) or the second
> query(with more IO, less cost).
> Thanks,
> Amin|||Hi all,
I have written two queries that provide the same results (using different
types of joins).
For choosing the better one, I pasted both of them into one script in Query
Analyzer and executed them.
Here is the summary of the result in the execution plan:
First query:
- Query Cost: 55%
- I/O Cost: Scan count 7, logical reads 14
- Execution plan: Looks simple(a few icons..)
Second query:
- Query Cost: 45%
- I/O Cost: Scan count 9, logical reads 34
- Execution plan: Looks more complicated(a lot of icons!)
Which query will have better performance? Should I consider the query
cost (relative to the batch), or is the I/O cost more important? I don't
know why the second query has a lower cost while it has more I/O and a more
complicated plan!
Thanks in advance,
Amin|||On Mon, 10 May 2004 21:03:39 +0430, Amin Sobati wrote:
>Hi all,
>I have written two queries that provide the same results(using different
>types of joins.).
>For choosing the better one, I pasted both of them in one script in query
>analyzer and executed them.
>Here is the summery of the result in execution plan:
>First query:
>- Query Cost: 55%
>- I/O Cost: Scan count 7, logical reads 14
>- Execution plan: Looks simple(a few icons..)
>Second query:
>- Query Cost: 45%
>- I/O Cost: Scan count 9, logical reads 34
>- Execution plan: Looks more complicated(a lot of icons!)
>Which query will have better performance? Should I consider the query
>cost(relative to the batch) or the I/O cost is more important? I don't know
>that why the second query has lower cost while it has more I/O and more
>complicated plan!
>Thanks in advance,
>Amin
Hi Amin,
With only this information to go on, I'd say use the first query. The
reason for the lower query cost might well be that some (or maybe even
all) required data is read from disk for the first query, but is still
available in the cache for the second query.
You get a better test if you issue DBCC DROPCLEANBUFFERS before each
query. This will ensure that the cache is empty.
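On a test server, that could look like the following sketch; the CHECKPOINT line is an addition of mine, since DBCC DROPCLEANBUFFERS only removes *clean* pages from the cache, so dirty pages should be flushed first:

```sql
checkpoint              -- write dirty pages to disk, making them clean
dbcc dropcleanbuffers   -- then remove all clean pages from the buffer cache
-- ... run one candidate query here, then repeat both steps for the next ...
```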
Best, Hugo
--
(Remove _NO_ and _SPAM_ to get my e-mail address)|||Thanks Hugo!
Actually there was no change in the result (by using DBCC DROPCLEANBUFFERS),
which is confusing for me too!
Now, should I prefer the second query to the first?
Amin
"Hugo Kornelis" <hugo@.pe_NO_rFact.in_SPAM_fo> wrote in message
news:7jpv9097l4u4bdnm5blhd4ql72oped6f70@.4ax.com...
> On Mon, 10 May 2004 21:03:39 +0430, Amin Sobati wrote:
> > [original question snipped]
> Hi Amin,
> With only this information to go on, I'd say use the first query. The
> reason for the lower query cost might well be that some (or maybe even
> all) required data is read from disk for the first query, but still
> available in the cache for the second query.
> You get a beter test if you issue DBCC DROPCLEANBUFFERS before each
> query. This will ensure that the cache is empty.
> Best, Hugo
> --
> (Remove _NO_ and _SPAM_ to get my e-mail address)|||On Tue, 11 May 2004 11:37:02 +0430, Amin Sobati wrote:
>Thanks Hugo!
>Actually there was no change in result(by using DBCC DROPCLEANBUFFERS),
>which is confusing for me too!
>Now, should I prefer the second query to the first?
>Amin
Hi Amin,
Hard to say. Some other things you might try:
1. Reverse the order in which you execute the queries in your test
scripts (should not make any difference with DBCC DROPCLEANBUFFERS,
but you never know).
2. Add "select getdate()" before and after each of the two queries, so
you can compare the actual execution time.
3. Add DBCC FREEPROCCACHE as well (sorry, forgot to mention that when
I suggested DBCC DROPCLEANBUFFERS). This will force recompilation of
your queries, so you can be sure it's not executed with an old
execution plan (that might be based on outdated statistics).
4. Post the queries here. Maybe I (or someone else) can make some
comments based on that. (Do add relevant table definitions, in the
form of CREATE TABLE statements as well!)
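Putting tips 1-3 together, a test script could look something like this (a sketch; the comment placeholders stand for the actual queries):

```sql
dbcc dropcleanbuffers   -- start with a cold data cache
dbcc freeproccache      -- force recompilation of the execution plans
select getdate()        -- timestamp before query 1
-- ... query 1 here ...
select getdate()        -- timestamp between the queries
-- ... query 2 here ...
select getdate()        -- timestamp after query 2
-- then run the whole script again with the two queries swapped (tip 1)
```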
I assume you are already aware that you should run your performance
tests on the production system or on a test system with about the same
amount and distribution of data and the same specifications, that you
should try to test when no other users or processes use SQL Server (or
other apps on the same server) and that you should make sure your
table statistics are updated before running your tests.
Best, Hugo
--
(Remove _NO_ and _SPAM_ to get my e-mail address)|||I assume you posted the estimations below. I would not base conclusions
on these estimations. Whenever possible I would run the query with I/O
count and CPU count turned on. In the end, the query (plan) with the
lowest elapsed time is the winner...
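A minimal way to get those counts in Query Analyzer (a sketch):

```sql
set statistics io on    -- per-table scan counts and logical/physical reads
set statistics time on  -- parse/compile and execution CPU and elapsed times
-- ... run each candidate query here ...
set statistics io off
set statistics time off
```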
Gert-Jan
Amin Sobati wrote:
> [original question snipped]
--
(Please reply only to the newsgroup)|||Gert-Jan,
Thank you for your reply. Would you please read my reply to Hugo and write
me your thoughts:)
Amin
"Gert-Jan Strik" <sorry@.toomuchspamalready.nl> wrote in message
news:40A125B3.AB8FF522@.toomuchspamalready.nl...
> I assume you posted the estimations below. I would not base conclusions
> on these estimations. Whenever possible I would run the query with I/O
> count and CPU count turned on. In the end, the query (plan) with the
> lowest elapsed time is the winner...
> Gert-Jan
>
> Amin Sobati wrote:
> > [original question snipped]
> --
> (Please reply only to the newsgroup)|||Hugo,
Thank you for the tips!
Actually this is my case:
I have created "Students" table and "StudentExam" to track the exams that
each student has taken. I know that I must have an "Exams" table to keep
exam IDs, but I ignore that for this stage.
This is the script:
----
create table Students (
SID int,
SName varchar(10))
create table StudentExam (
SID int,
EName varchar(10))
-- inserting some records for each table
insert Students values(1,'David')
insert Students values(2,'Peter')
insert Students values(3,'Sarah')
insert StudentExam values(1,'sql')
insert StudentExam values(1,'vb')
insert StudentExam values(1,'asp')
insert StudentExam values(2,'vb')
insert StudentExam values(2,'asp')
insert StudentExam values(3,'xml')
insert StudentExam values(3,'access')
--Execute this insert statement for 12 times to have around 30,000 records:
insert StudentExam select * from StudentExam
--Now creating indexes:
create clustered index IX1 on StudentExam(sid)
create index IX2 on StudentExam(EName)
-- Now suppose that I need to delete the students who have NOT taken 'sql'
exam.
--These are two delete statements that i want to choose the perfect one:
set statistics IO ON
begin tran
delete Students from StudentExam where Students.sid not in (select
StudentExam.sid from StudentExam where StudentExam.EName='sql')
rollback tran
begin tran
delete Students from StudentExam where 'sql' not in (select
StudentExam.EName from StudentExam where Students.sid=StudentExam.sid)
rollback tran
----
The first delete statement has 12 logical reads, but the second one has 38.
The cost for the first one is 54%, though I think it should have the lower
cost because of the I/O. If I make the second index clustered and the first
one non-clustered, the logical reads for the second delete statement drop to
14, but the relative cost does not change.
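Putting Hugo's earlier tips together, a comparison harness might look like the sketch below: clear the buffer and procedure caches so neither statement benefits from warm pages or an old plan, and bracket each statement with getdate() to capture the actual elapsed time. (Repeat the same pattern for the second delete statement.)

```sql
-- Cold cache and fresh compilation before each timed run.
DBCC DROPCLEANBUFFERS
DBCC FREEPROCCACHE
SET STATISTICS IO ON

select getdate()            -- start time
begin tran
delete Students from StudentExam where Students.sid not in
    (select StudentExam.sid from StudentExam where StudentExam.EName = 'sql')
rollback tran
select getdate()            -- end time
```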
Thanks in advance,
Amin
"Hugo Kornelis" <hugo@.pe_NO_rFact.in_SPAM_fo> wrote in message
news:r141a0dp9l5ufrlnen38mgpclkih1l6b7d@.4ax.com...
> On Tue, 11 May 2004 11:37:02 +0430, Amin Sobati wrote:
> >Thanks Hugo!
> >Actually there was no change in result(by using DBCC DROPCLEANBUFFERS),
> >which is confusing for me too!
> >Now, should I prefer the second query to the first?
> >Amin
> Hi Amin,
> Hard to say. Some other things you might try:
> 1. Reverse the order in which you execute the queries in your test
> scripts (should not make any difference with DBCC DROPCLEANBUFFERS,
> but you never know).
> 2. Add "select getdate()" before and after each of the two queries, so
> you can compare the actual execution time.
> 3. Add DBCC FREEPROCCACHE as well (sorry, forgot to mention that when
> I suggested DBCC DROPCLEANBUFFERS). This will force recompilation of
> your queries, so you can be sure it's not executed with an old
> execution plan (that might be based on outdated statistics).
> 4. Post the queries here. Maybe I (or someone else) can make some
> comments based on that. (Do add relevant table definitions, in the
> form of CREATE TABLE statements as well!)
> I assume you are already aware that you should run your performance
> tests on the production system or on a test system with about the same
> amount and distribution of data and the same specifications, that you
> should try to test when no other users or processes use SQL Server (or
> other apps on the same server) and that you should make sure your
> table statistics are updated before running your tests.
> Best, Hugo
> --
> (Remove _NO_ and _SPAM_ to get my e-mail address)|||"Amin Sobati" <amins@.morva.net> wrote in message
news:Ogbf4BDOEHA.640@.TK2MSFTNGP12.phx.gbl...
> Hugo,
> Thank you for the tips!
> Actually this is my case:
> I have created "Students" table and "StudentExam" to track the exams that
> each student has taken. I know that I must have an "Exams" table to keep
> exam IDs, but I ignore that for this stage.
You can't ignore that and expect to get any meaningful performance data.
Here's a more complete example with primary keys, referential integrity, and a
whole lot of students, 90% of whom have taken the SQL exam.
Both queries use exactly the same plan, and have exactly the same cost.
David
create table Students (
SID int not null identity primary key,
SName varchar(10))
create table Exams(EName varchar(10) not null primary key)
create table StudentExam (
SID int not null references Students on delete cascade,
EName varchar(10) not null references Exams on delete cascade,
constraint pk_StudentExam
primary key (SID,EName)
)
create index IX2 on StudentExam(EName)
-- inserting some records for each table
insert Students values('David')
insert Students values('Peter')
insert Students values('Sarah')
-- let the students start breeding
insert Students(SName) select SName from Students
insert Students(SName) select SName from Students
insert Students(SName) select SName from Students
insert Students(SName) select SName from Students
insert Students(SName) select SName from Students
insert Students(SName) select SName from Students
insert Students(SName) select SName from Students
insert Students(SName) select SName from Students
insert Students(SName) select SName from Students
insert Students(SName) select SName from Students
insert Students(SName) select SName from Students
insert Students(SName) select SName from Students
insert Exams(EName) values('sql')
insert Exams(EName) values('asp')
insert Exams(EName) values('vb')
insert Exams(EName) values('xml')
insert Exams(EName) values('access')
--add exams for each SID,EName pair
insert StudentExam(SID,EName)
select SID, EName
from Students cross join Exams
--erase the SQL exam for 10% of the students
delete studentExam
where EName = 'sql'
and sid%10=0
-- Now suppose that I need to delete the students who have NOT taken the 'sql' exam.
-- These are the two delete statements from which I want to choose the better one:
set statistics IO ON
print ' QUERY 1'
begin tran
delete Students where Students.sid not in
    (select StudentExam.sid from StudentExam where StudentExam.EName = 'sql')
rollback tran
print ' QUERY 2'
begin tran
delete Students where 'sql' not in
    (select StudentExam.EName from StudentExam where Students.sid = StudentExam.sid)
rollback tran
/*
QUERY 1
(1228 row(s) affected)
Table 'StudentExam'. Scan count 4914, logical reads 34734, physical reads 0,
read-ahead reads 0.
Table 'Worktable'. Scan count 6140, logical reads 8597, physical reads 0,
read-ahead reads 0.
Table 'Worktable'. Scan count 2, logical reads 1235, physical reads 0,
read-ahead reads 0.
Table 'Students'. Scan count 1, logical reads 4816, physical reads 0,
read-ahead reads 0.
QUERY 2
(1228 row(s) affected)
Table 'StudentExam'. Scan count 4914, logical reads 34734, physical reads 0,
read-ahead reads 0.
Table 'Worktable'. Scan count 6140, logical reads 8597, physical reads 0,
read-ahead reads 0.
Table 'Worktable'. Scan count 2, logical reads 1235, physical reads 0,
read-ahead reads 0.
Table 'Students'. Scan count 1, logical reads 4816, physical reads 0,
read-ahead reads 0.
*/|||David,
Thank you for the great example.
I admit that I must do everything right to get the best result, but I cannot
understand why a query might have more I/O but a lower cost (as in my example).
Can I count on the cost (relative to the batch) as a criterion for choosing a
query?
Thanks,
Amin
"David Browne" <davidbaxterbrowne no potted meat@.hotmail.com> wrote in
message news:#k8IHbDOEHA.1160@.TK2MSFTNGP09.phx.gbl...
> "Amin Sobati" <amins@.morva.net> wrote in message
> news:Ogbf4BDOEHA.640@.TK2MSFTNGP12.phx.gbl...
> > Hugo,
> > Thank you for the tips!
> > Actually this is my case:
> > I have created "Students" table and "StudentExam" to track the exams
that
> > each student has taken. I know that I must have an "Exams" table to keep
> > exam IDs, but I ignore that for this stage.
> You can't ignore that and expect to get any meaningful performance data.
> Here's a more complete example with primary keys, referential integrity a
> whole lot of students, 90% of whom have taken the SQL exam.
> Both queries use exactly the same plan, and have exactly the same cost.
> David
> [quoted example script and STATISTICS IO output snipped]
>|||The query optimizer's effort depends on the estimated cost of the query.
First, the query optimizer will search for any 'obvious' query plans. If
the estimated time of these solutions is no more than 1 or 2 seconds, it
will not look any further, and simply use that plan. These kinds of
query plans can depend on the way the query is written. However, if
there is any performance difference between different query plans, the
difference will be small.
If there is no obvious fast query plan, the full compilation phase is
started. This means (virtually) all possible query plans will be
evaluated. If this phase starts to take too long, it is cut short, and
the fastest plan found thus far will be used.
This is why you are seeing that, in David's example, SQL Server uses the
same query plan for both queries. I can add another formulation (see
below) for which SQL Server will probably use the same query plan.
print ' QUERY 3'
begin tran
delete Students where not exists (
select 1
from StudentExam
where Students.sid=StudentExam.sid
and StudentExam.EName='sql'
)
rollback tran
So what I am basically saying is that the differences you are seeing
are not significant.
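One way to verify that the different formulations really do get the same plan is to ask Query Analyzer for the plan as text instead of executing the statements (a sketch):

```sql
-- Print the estimated plan instead of executing the statements.
SET SHOWPLAN_TEXT ON
GO
delete Students where Students.sid not in
    (select StudentExam.sid from StudentExam where StudentExam.EName = 'sql')
delete Students where not exists (
    select 1 from StudentExam
    where Students.sid = StudentExam.sid
      and StudentExam.EName = 'sql')
GO
SET SHOWPLAN_TEXT OFF
GO
```

If the plans print identically, the choice between the formulations is a matter of style, not performance.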
Gert-Jan
Amin Sobati wrote:
> David,
> Thank you for the great example.
> I admit that I must do everything right to get the best result, but I cannot
> recognize that why a query might have more IO but lower cost(in my example).
> Can I count on the cost(relative to the batch) as a criteria for choosing a
> query?
> Thanks,
> Amin
> "David Browne" <davidbaxterbrowne no potted meat@.hotmail.com> wrote in
> message news:#k8IHbDOEHA.1160@.TK2MSFTNGP09.phx.gbl...
> >
> > "Amin Sobati" <amins@.morva.net> wrote in message
> > news:Ogbf4BDOEHA.640@.TK2MSFTNGP12.phx.gbl...
> > > Hugo,
> > > Thank you for the tips!
> > > Actually this is my case:
> > > I have created "Students" table and "StudentExam" to track the exams
> that
> > > each student has taken. I know that I must have an "Exams" table to keep
> > > exam IDs, but I ignore that for this stage.
> >
> > You can't ignore that and expect to get any meaningful performance data.
> >
> > Here's a more complete example with primary keys, referential integrity a
> > whole lot of students, 90% of whom have taken the SQL exam.
> >
> > Both queries use exactly the same plan, and have exactly the same cost.
> >
> > David
> >
> >
> >
> > create table Students (
> > SID int not null identity primary key,
> > SName varchar(10))
> >
> > create table Exams(EName varchar(10) not null primary key)
> >
> >
> > create table StudentExam (
> > SID int not null references Students on delete cascade,
> > EName varchar(10) not null references Exams on delete cascade,
> > constraint pk_StudentExam
> > primary key (SID,EName)
> > )
> > create index IX2 on StudentExam(EName)
> >
> > -- inserting some records for each table
> > insert Students values('David')
> > insert Students values('Peter')
> > insert Students values('Sarah')
> >
> > -- let the students start breeding
> > insert Students(SName) select SName from Students
> > insert Students(SName) select SName from Students
> > insert Students(SName) select SName from Students
> > insert Students(SName) select SName from Students
> > insert Students(SName) select SName from Students
> > insert Students(SName) select SName from Students
> > insert Students(SName) select SName from Students
> > insert Students(SName) select SName from Students
> > insert Students(SName) select SName from Students
> > insert Students(SName) select SName from Students
> > insert Students(SName) select SName from Students
> > insert Students(SName) select SName from Students
> >
> >
> > insert Exams(EName) values('sql')
> > insert Exams(EName) values('asp')
> > insert Exams(EName) values('vb')
> > insert Exams(EName) values('xml')
> > insert Exams(EName) values('access')
> >
> > --add exams for each SID,EName pair
> > insert StudentExam(SID,EName)
> > select SID, EName
> > from Students cross join Exams
> >
> > --erase the SQL exam for 10% of the students
> > delete studentExam
> > where EName = 'sql'
> > and sid%10=0
> >
> > -- Now suppose that I need to delete the students who have NOT taken
> > 'sql'exam.
> > --These are two delete statements that i want to choose the perfect one:
> > set statistics IO ON
> >
> >
> > print ' QUERY 1'
> > begin tran
> > delete Students where Students.sid not in (select
> > StudentExam.sid from StudentExam where StudentExam.EName='sql')
> > rollback tran
> >
> > print ' QUERY 2'
> > begin tran
> > delete Students where 'sql' not in (select
> > StudentExam.EName from StudentExam where Students.sid=StudentExam.sid)
> > rollback tran
> >
> > /*
> > QUERY 1
> >
> > (1228 row(s) affected)
> >
> > Table 'StudentExam'. Scan count 4914, logical reads 34734, physical reads
> 0,
> > read-ahead reads 0.
> > Table 'Worktable'. Scan count 6140, logical reads 8597, physical reads 0,
> > read-ahead reads 0.
> > Table 'Worktable'. Scan count 2, logical reads 1235, physical reads 0,
> > read-ahead reads 0.
> > Table 'Students'. Scan count 1, logical reads 4816, physical reads 0,
> > read-ahead reads 0.
> > QUERY 2
> >
> > (1228 row(s) affected)
> >
> > Table 'StudentExam'. Scan count 4914, logical reads 34734, physical reads
> 0,
> > read-ahead reads 0.
> > Table 'Worktable'. Scan count 6140, logical reads 8597, physical reads 0,
> > read-ahead reads 0.
> > Table 'Worktable'. Scan count 2, logical reads 1235, physical reads 0,
> > read-ahead reads 0.
> > Table 'Students'. Scan count 1, logical reads 4816, physical reads 0,
> > read-ahead reads 0.
> > */
> >
> >
--
(Please reply only to the newsgroup)|||Thanks Gert-Jan!
Regardless of this particular scenario, I'd like to find the best way of
comparing the performance of two or more queries.
What I've been doing so far is executing, for example, two queries and
comparing their cost (relative to the batch). Can this cost be the factor
that I count on?
What made me start posting this question was that I encountered a query
that had a higher I/O cost than the other but a lower overall cost. I need
to know whether I should choose the first query (less I/O, higher cost) or
the second query (more I/O, lower cost).
Thanks,
Amin
"Gert-Jan Strik" <sorry@.toomuchspamalready.nl> wrote in message
news:40A2A554.20A1EDEE@.toomuchspamalready.nl...
> The query optimizer's effort depends on the estimated cost of the query.
> First, the query optimizer will search for any 'obvious' query plans. If
> the estimated time of these solutions is no more than 1 or 2 seconds, it
> will not look any further, and simply use that plan. These kinds of
> query plans can depend on the way the query is written. However, if
> there is any performance difference between different query plans, the
> difference will be small.
> If there is no obvious fast query plan, the full compilation phase is
> started. This means (virtually) all possible query plans will be
> evaluated. If this phase starts to take too long, then it is cut
> short, and the fastest plan that was found thus far will be used.
> This is why you are seeing that in David's example, SQL-Server will use
> the same query plan for both queries. I can add another syntax (see
> below), which SQL-Server will probably use the same query plan for.
> print ' QUERY 3'
> begin tran
> delete Students where not exists (
> select 1
> from StudentExam
> where Students.sid=StudentExam.sid
> and StudentExam.EName='sql'
> )
> rollback tran
> So what I am basically saying, is that the differences you are seeing
> are not significant.
> Gert-Jan
>
> Amin Sobati wrote:
> >
> > David,
> > Thank you for the great example.
> > I admit that I must do everything right to get the best result, but I
cannot
> > recognize that why a query might have more IO but lower cost(in my
example).
> > Can I count on the cost(relative to the batch) as a criteria for
choosing a
> > query?
> >
> > Thanks,
> > Amin
> >
> > "David Browne" <davidbaxterbrowne no potted meat@.hotmail.com> wrote in
> > message news:#k8IHbDOEHA.1160@.TK2MSFTNGP09.phx.gbl...
> > > [quoted example script and STATISTICS IO output snipped]
> > >
> > >
> --
> (Please reply only to the newsgroup)|||Well, the query plan shows estimates of the execution times. The
estimated subtree cost of the leftmost element is the estimated time in
seconds for the query. So whichever query plan has the lowest estimated
cost is likely to be the fastest. However, things like locking strategy
can also play a big part on a stressed server, and that is not really
expressed or taken into consideration in the query plan.
The I/O count translates into the estimated I/O cost, which is (just
like the estimated CPU cost) just one component in the total estimated
cost.
If you can choose between two query plans with the same estimated cost,
I would choose the plan that uses the least amount of resources.
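For reference, the estimate discussed above can also be read without the graphical plan: SET SHOWPLAN_ALL returns the plan as rows, and the TotalSubtreeCost column of the top row is the optimizer's cost estimate for the whole query (a sketch):

```sql
-- Return the estimated plan as rows instead of executing;
-- TotalSubtreeCost on the first row is the total estimated cost.
SET SHOWPLAN_ALL ON
GO
delete Students where not exists (
    select 1 from StudentExam
    where Students.sid = StudentExam.sid
      and StudentExam.EName = 'sql')
GO
SET SHOWPLAN_ALL OFF
GO
```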
Gert-Jan
Amin Sobati wrote:
> Thanks Gert-Jan!
> Regardless of this particular scenario, I'd like to find the best way for
> comparing the performance of two or more queries.
> What I've been doing so far, I executed for example two queries and compared
> their cost(relative to the batch). Can this cost be the factor that I must
> count on?
> What made me to start posting this question was that I encountered with a
> query that had more IO cost than the other but less Cost. I need to know
> that I must choose the first query(with less IO, more cost) or the second
> query(with more IO, less cost).
> Thanks,
> Amin