Now that I have been blogging away here and on SSC for a good five years, the editors recently thanked us for our work. They also gave us valuable feedback: write about real-world situations that DBAs actually encounter. So the following post targets performance optimisation, based on an actual task that has recurred several times since I first wrote on the subject, in various production environments: an instance bogged down by that one massive query inside a stored procedure that has to run all the time, yet is so huge, important and/or complex that everyone is afraid of it, or unsure how to fix it.
In this post I hope to clearly explain how combining data definition language (DDL) for your temporary tables with non-clustered indexes can improve the performance of stored procedures that join data from one or many large tables by up to seventeen times (at least that was the gain the last time I optimised this type of query), as I have seen on stored procedures working with tables in the tens of millions of rows.
Temporary tables, when used frequently or inside stored procedures, end up consuming significant disk I/O in tempdb. One thing to be aware of is that they are created as a heap by default. Experience has shown that if you are cutting up a very large table via the temporary database, it is best to do your DDL (data definition language) first, then run the rest of your operation against the temporary data, rather than using SELECT * INTO #temp. Avoid SELECT * INTO #temp as much as possible, unless the number of rows is insignificant, because as a single statement it creates and loads the heap all at once and causes heavy disk contention in tempdb.
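For contrast, this is the single-statement pattern being warned against; a minimal sketch with illustrative table names:

SELECT *
INTO #MyLot          -- creates the heap and loads it in one statement
FROM MyBigLotTable   -- every column of every row lands in tempdb at once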
(N.B. the assumed pre-requisite is that you have already identified the worst query from your plan cache, or have seen the code under Recent Expensive Queries in Activity Monitor, sorted by the worst-performing resource.)
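If you want to pull that worst offender from the plan cache yourself, a minimal sketch (not part of the original procedure) using sys.dm_exec_query_stats looks something like this:

SELECT TOP (10)
    qs.total_logical_reads,
    qs.execution_count,
    SUBSTRING(st.text, (qs.statement_start_offset / 2) + 1,
        ((CASE qs.statement_end_offset
              WHEN -1 THEN DATALENGTH(st.text)
              ELSE qs.statement_end_offset
          END - qs.statement_start_offset) / 2) + 1) AS statement_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.total_logical_reads DESC

With the offending statement identified, the DDL-first preparation of the temporary table looks like this: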
CREATE TABLE #MyLot -- you'll see that we only need a few columns for the join in the end
(
    [LotId] [int] NOT NULL, -- not an IDENTITY here: the values are copied from the source table
    [LotCode] [nvarchar](10) NOT NULL
)

INSERT INTO #MyLot ( LotId, LotCode )
-- e.g. you can also avoid NULLs by using ISNULL(col, 0)
SELECT LotId, LotCode
FROM MyBigLotTable
WHERE ... -- your matching predicates / variables go here
-- this is where you work out what joins this massive table to the others, and slice it up
-- horizontally and vertically before (!) making that big join;
-- that is where we obtain the significant performance gains

CREATE NONCLUSTERED INDEX #idx_Lot ON #MyLot ( [LotCode] ASC )
-- create the index on the matching column used in the 'big' join (in this case it was a 5-table join),
-- i.e. the join key, not just the glaring ID field

-- now integrate all this preparation of #MyLot into the main super slow query
INSERT INTO @result
    ( [Number], [LocId], [BinId], [LotCode], [LotId], [PCode], [PId], [Stock], [StatusCode], [UnitCode] )
SELECT
    [BIResult].[Number],
    [Loc].[LocId],
    [BLoc].[BILocId],
    [BIResult].[LotCode],
    #MyLot.[LotId],
    [BIResult].[PCode],
    [P].[PId],
    [BIResult].[Stock],
    ISNULL([BIResult].[StatusCode], ''), -- default for a NULL StatusCode assumed; the original value was cut off
    [BIResult].[UnitCode]
FROM OPENXML ( @handle, N'...' ) -- row pattern unchanged from the original proc
WITH
(
    [Number]     SMALLINT      N'@Number',
    [LocID]      NVARCHAR(10)  N'@LocID',
    [PCode]      NVARCHAR(18)  N'@PCode',
    [LotCode]    NVARCHAR(10)  N'@LotCode',
    [LotId]      NVARCHAR(10)  N'@LotId',
    [Stock]      NVARCHAR(MAX) N'@Stock',
    [StatusCode] NVARCHAR(3)   N'@StatusCode',
    [UnitCode]   NVARCHAR(1)   N'@UnitCode'
) AS [BIResult]
JOIN [Pt]
    ON [Pt].[Number] = [BIResult].[Number]
LEFT JOIN #MyLot -- [Lot] was here before, the huge table
    ON #MyLot.[LotCode] = [BIResult].[LotCode]
JOIN [P]
    ON [P].[PtId] = [Pt].[PtId]
    AND [P].[PCode] = [BIResult].[PCode]
JOIN [SLoc]
    ON [SLoc].[PtId] = [Pt].[PtId]
    AND [SLoc].[SLocCode] = [BIResult].[SLocCode]
JOIN [BLoc]
    ON [BLoc].[LocId] = [Loc].[LocId]
    AND [BLoc].[BLocCode] = [BIResult].[BLocCode]
WHERE CAST([BIResult].[Stock] AS DECIMAL(13)) > 0 -- the comparison was truncated in the original; > 0 is a placeholder
-- always explicitly/clearly drop the temp table at the end of the stored proc.
DROP TABLE #MyLot -- and the respective index is dropped along with it
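To see the gain for yourself, one quick way (a hypothetical wrapper, not from the original procedure) is to surround the call with statistics output and compare logical reads and CPU time before and after the rewrite:

SET STATISTICS IO ON
SET STATISTICS TIME ON

EXEC dbo.MySlowProc -- placeholder name; run your proc before and after the temp-table rewrite

SET STATISTICS IO OFF
SET STATISTICS TIME OFF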
Happy optimising!
Shed the heavy weight of that extra slow query bogging your instance down.