Tuesday, May 13, 2014

How to Fix that Never-Ending Join Dragging Down the Whole DB Server Instance – Pre-Populate the Largest Bugger in Temp with Indexes


Now that it has been a good five years I have been blogging away here and on SSC, the editors recently thanked us for our work. They also provided valuable feedback that we should give real-world situations that DBAs encounter.  The following has a target of optimising performance, from an actual task that has re-occurred several times since I first wrote on the subject, in various production environments, on an instance that is bogged down by that one massive query within a stored procedure that has to run all the time, yet is so huge, important and/or complex everyone is afraid or unsure how to resolve.


In this post I hope to clearly explain how the combination of the use of data definition language for your temporary tables and non-clustered indexes can improve the performance of stored procedures that join data from one or many large tables by up to seventeen times (at least that was the case previous time I saw this type of query to optimise) - as I have seen on stored proc.s that work with tables in the tens of millions.


Temporary tables, if used frequently or in stored procedures, in turn, end up with significant input/output disk consumption. To start, one thing we should be aware of is that they are also created as a heap by default.  As experience has shown, if you are cutting up a very large table and using the temporary database, it is best to first do your DDL (data definition language) before running through the rest of your operation with the temporary data - as opposed Select * INTO #temp. Thus, we should avoid Select * into #temp as much as possible, unless the number of rows is insignificant, because being in a single statement, it will create great disk contention within the temp database:

(N.B. the assumed pre-requisite is that you've identified the worst query from your plan cache or have seen the code from Recent Expensive queries listed in Activity Monitor, sorted by worst performing resource)

CREATE TABLE #MyLot  -- you’ll see that we only need a few columns join in the end
       (
       [LotId] [int] IDENTITY(1,1) NOT NULL,
       [LotCode] [nvarchar](10) NOT NULL
       )

INSERT into #MyLot ( LotId, LotCode )
 -- e.g. you can also avoid NULLs by using ISNULL(col,0)
Select LotId, LotCode
from MyBigLotTable
Where clause / matching variables
 -- this is where you found out what joins this massive table with the others and slice it up
 -- horizontally and vertically
 b
efore (!) making that big join,
 -- and that is where we obtain the significant performance gains

Create NonClustered Index #idx_Lot on #MyLot ([LotCode] ASC )
-- create index on matching column used in the 'big' join (this case it was a 5 table join)
-- the glaring ID field
-- integrate all this preparation of #MyLot into the main super slow query
INSERT INTO @result
([Number],[LocId],[BinId],[LotCode],[LotId],[PCode],[PId],[Stock],[StatusCode],[UnitCode])

SELECT
[BIResult].[Number], [Loc].[LocId], [BLoc].[BILocId],[BIResult].[LotCode], #MyLot.[LotId],[BIResult].[PCode],[P].[PId],[BIResult].[Stock],ISNULL([BIResult].[StatusCode],[BIResult].[UnitCode]
FROM OPENXML (@handle"WITH"
                        (
                              [Number] SMALLINT N'@Number'
                              [LocID] NVARCHAR(10) N'@LocID'
                              [PCode] NVARCHAR(18) N'@PCode'
                              [LocCode] NVARCHAR(4) N'@LotCode'
                              [PCode] NVARCHAR(10) N'@LotId'
                              [Stock] NVARCHAR(MAX) N'@Stock'
                              [StatusCode] NVARCHAR(3) N'@StatusCode'
                              [UnitCode] NVARCHAR(1) N'@UnitCode'
                        ) AS [BIResult]
                JOIN [Pt] ON [Pt].[Number] = [BIResult].[Number]
                LEFT JOIN #MyLot --[Lot] was here before, the huge table
                            ON #MyLot.[LotCode] = [BIResult].[LotCode]
                JOIN [P] ON [P].[PtId] = [Pt].[PtId]
                 AND [P].[PCode] = [BIResult].[PCode]
                JOIN [SLoc] ON [SLoc].[PtId] = [Pt].[PId]
                 AND [SLoc].[SLocCode] = [BIResult].[SLocCode]
                JOIN [BLoc] ON [BLoc].[LocId] = [Loc].[LocId]
                 AND [BLoc].[BLocCode] = [BIResult].[BLocCode]
               WHERE CAST([BIResult].[Stock] AS DECIMAL(13)
              )

-- always explicitly/clearly drop the temp at the end of the stored proc.
drop table #MyLot -- and the respective index is dropped too with it


Happy optimising!

Shed the heavy weight of that extra slow query bogging your instance down.