Resources
SELECT req.session_id, req.status, req.start_time, req.cpu_time AS 'cpu_time_ms',
req.logical_reads,req.dop,s.login_name,s.host_name,s.program_name,
object_name(st.objectid,st.dbid) 'ObjectName',
REPLACE (REPLACE (SUBSTRING (st.text,(req.statement_start_offset/2) + 1,
((CASE req.statement_end_offset WHEN -1 THEN DATALENGTH(st.text)
ELSE req.statement_end_offset END - req.statement_start_offset)/2) + 1),
CHAR(10), ' '), CHAR(13), ' ') AS statement_text,
qp.query_plan,qsx.query_plan as query_plan_with_in_flight_statistics
FROM sys.dm_exec_requests as req
JOIN sys.dm_exec_sessions as s on req.session_id=s.session_id
CROSS APPLY sys.dm_exec_sql_text(req.sql_handle) as st
OUTER APPLY sys.dm_exec_query_plan(req.plan_handle) as qp
OUTER APPLY sys.dm_exec_query_statistics_xml(req.session_id) as qsx
ORDER BY req.cpu_time desc;
(This was generated by Azure). The output looks something like:
Note the dop column. This indicates the degree of parallelization. One core seems a bit measly so I forced it to use more by adding this at the end of my SELECT statement:
OPTION (USE HINT ('ENABLE_PARALLEL_PLAN_PREFERENCE'));
[See the comment below Paul White's comment in this SO post].
Consequently, my query was 4x faster (at the possible expense of bothering other users in this multi-tenanted machine).
Indexes
When creating an index, you might see a trailing "WITH (...) ON [PRIMARY];" Here, "you can specify options for data storage, such as a partition scheme, a specific filegroup, or the default filegroup. If the ON option isn't specified, the index uses the settings partition or filegroup settings of the existing table." [Microsoft]
The option here is to have the index on a seperate disk to your data. With two disks you can speed things up as each disk will not block the other's access, although the efficacy of this is questioned [SO].
Padding and Fill Factor pertain to how the leaf data is stored and is only relevant if there are lots of random changes [SO].
Casting
Autoboxing can have a negative effect on Java apps but I didn't casting would have such a large effect in SQL. When we removed unnecessary CASTing from a query running on a 12 core SQL Server, performance time dropped from 15 minutes to 3.
It's true that the query was horribly inefficient. One of the CASTs was to turn a date to a VARCHAR where a regex was used to check the date was a September... Another was due to some data being imported incorrectly and its PK being a float when it should have been a VARCHAR. This lead to a CAST when it was JOINed to another table.
Using the SQL at the top of the post revealed the reason for this improvement was that SQL Server decided to parallelize the query so there was no need for ENABLE_PARALLEL_PLAN_PREFERENCE.