A debate is raging in the office over which DB query is the most expensive. Since it's the weekend, time to fire up Fedora in VMWare and tinker with my Sybase installation.
First, you have to set up the DB to take advantage of the statistics. Do this with:
[sybase@vmwareFedoraII ~]$ isql -Usa -SVMWAREFEDORAII
Password:
1> exec sp_configure "allow resource limits", 1
2> go
You need to reboot, so:
1> shutdown
2> go
Server SHUTDOWN by request.
and restart:
[root@vmwareFedoraII henryp]# /etc/rc.d/init.d/sybase start
Login:
[sybase@vmwareFedoraII ~]$ isql -Usa -SVMWAREFEDORAII
Password:
1> use henryp
2> go
1> set statistics io on
2> go
Total actual I/O cost for this command: 0.
Total writes for this command: 0
1> set showplan on
2> select * from tranche t , ioi i where i.fk_tranche = t.id
.
.
Table: tranche (t) scan count 1, logical reads: (regular=1 apf=0 total=1),
physical reads: (regular=1 apf=0 total=1), apf IOs used=0
Table: ioi (i) scan count 2, logical reads: (regular=18 apf=0 total=18),
physical reads: (regular=8 apf=1 total=9), apf IOs used=1
Total actual I/O cost for this command: 288.
Total writes for this command: 0
Execute the same select again and you get:
Table: tranche (t) scan count 1, logical reads: (regular=1 apf=0 total=1),
physical reads: (regular=0 apf=0 total=0), apf IOs used=0
Table: ioi (i) scan count 2, logical reads: (regular=18 apf=0 total=18),
physical reads: (regular=0 apf=0 total=0), apf IOs used=0
Total actual I/O cost for this command: 38.
Total writes for this command: 0
Notice how the physical reads are reduced second time around and indeed the actual I/O cost is an order of magnitude less.
Presumably, this is because the DB is caching this data. Rebooting clears this cache:
[root@vmwareFedoraII henryp]# /etc/rc.d/init.d/sybase restart
Stopping sybase service: [ OK ]
Starting sybase service: [ OK ]
And running the query again with io statistics on gives the same actual I/O as when we first ran the query.
Let's see the showplan again without actually executing it:
1> set noexec on
2> select * from tranche t , ioi i where i.fk_tranche = t.id
3> go
.
.
Total estimated I/O cost for statement 2 (at line 2): 731.
The I/O costs are a unit less number (see Administrator's Guide to Sybase ASE 15).
Why the difference?
"The estimated cost is higher than the actual cost if some pages are already in the cache or if the statistics are incorrect. The estimated cost may be lower than the actual cost if the optimizer chooses 16K I/O, and some of the pages are in 2K cache pools, which requires many 2K I/Os. Also, if a big join forces the cache to flush its pages back to disk, repeated access may require repeated physical I/Os."
(http://infocenter.sybase.com/help/index.jsp?topic=/com.sybase.help.ase_15.0.sag2/html/sag2/sag229.htm)
So, by a circuitous route, why were some people in the team seeing different estimated I/O costs for the same query? Actual costs can change for a variety of reasons - but estimated? Well, I don't know for sure but I was also using an IDE to run some of these queries and just happen to limit the number of rows since I didn't want to print out thousands of lines. Showplan knows this and changes the estimated accordingly. Gotcha.
Of course, if the data changes then the estimate will change too.
Saturday, November 13, 2010
Subscribe to:
Post Comments (Atom)
No comments:
Post a Comment