Saturday, June 27, 2009

Hunting Down Useless Indexes

Hopefully, by the time we take our annual family vacation in May, the cherry blossoms will be out; until then I'll just dig into some very, very heavily updated indexes that are causing Quest's Spotlight, at my latest contract with Transcontinental, to flash warnings for excessive disk activity. Of course, if we were in Utopia there would be loads of extra disks to isolate the indexes onto; however, Sir Thomas More's head was lopped off by Henry VIII and that Utopian dream died rather quickly. Now back to reality…



Prerequisite References - Index-Related Dynamic Management Views (SQL 2005/8): sys.dm_db_missing_index_details, sys.dm_db_index_usage_stats



Nobody wants indexes that are updated for no reason. When you see elevated numbers for user_updates, say in the millions, while user_lookups (for the clustered index), user_scans and user_seeks sit at zero AND the table in question is huge, then you might want to drop those indexes with the goal of winning back system resources. As a bonus, you might even have happier users receiving their data quicker too!
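Here is a minimal sketch of the kind of check I mean, against sys.dm_db_index_usage_stats (the update threshold and the exclusion of unique/primary-key indexes are my own illustrative assumptions, and remember these counters reset when the SQL Server service restarts):

-- Illustrative only: write-heavy, never-read indexes in the current database
SELECT  OBJECT_NAME(s.[object_id]) AS table_name,
        i.[name]                   AS index_name,
        s.user_updates, s.user_seeks, s.user_scans, s.user_lookups
FROM    sys.dm_db_index_usage_stats AS s
JOIN    sys.indexes AS i
          ON i.[object_id] = s.[object_id] AND i.index_id = s.index_id
WHERE   s.database_id = DB_ID()
  AND   s.user_updates > 1000000      -- 'in the millions'; adjust to taste
  AND   s.user_seeks = 0 AND s.user_scans = 0 AND s.user_lookups = 0
  AND   i.is_primary_key = 0 AND i.is_unique = 0   -- never drop constraints blindly
ORDER BY s.user_updates DESC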




Example: I have a table with its respective clustered index, and five non-clustered indexes built on top of it, since they depend on the clustering key. Two of the non-clustered indexes (most likely the product of a code generator) have no user_scans and no user_seeks, or only a negligible amount, so they are candidates to drop. Two others show a fair amount of user_scans, so I have decided either to rebuild them as covering indexes with INCLUDE columns, if they also have a significant number of user_seeks, or to drop them completely. Testing is the best way to decide, so using Profiler I shall see what the results are once realistic workloads are applied; one thing is for sure, there will not be so much updating going on for no reason. There are queries that let you identify the useless indexes very quickly, such as MVP Paul Nielsen's Duplicate Indexes post, and the useless index list can be huge, especially if the application generated its objects from an ORM, so focus on the largest tables first as a guide through the over-indexation mess.
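To make the two outcomes concrete, here is a hedged sketch (the index, table and column names are hypothetical, not from the real system; test against a realistic workload before touching production):

-- Drop a non-clustered index that is only ever written to
DROP INDEX IX_Orders_GeneratedByORM ON dbo.Orders
GO
-- Rebuild a scanned/seeked index as a covering index with INCLUDE columns
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID
    ON dbo.Orders (CustomerID)
    INCLUDE (OrderDate, TotalDue)
    WITH (DROP_EXISTING = ON)
GO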



Thankfully, with the help of the great details within the DMVs, we have no excuse not to analyse problematic indexes.



Further reading from some great sources:

http://www.sqlservercentral.com/articles/Indexing/64134/ Missing Indexes SQL 2005/8


http://msdn.microsoft.com/en-us/magazine/cc135978.aspx Ian Stirk’s Optimising Application Performance by Index analysis



http://www.codersrevolution.com/index.cfm/2009/2/14/SQL-Server-Performance-How-Are-My-Indexes-Being-Using

Digging into Transaction Log Files

(One-line foreword) I originally wrote this in French while working at LaCaisse.com - http://dbhive.blogspot.com/2008/07/fichiers-journaux-de-transactions.html

Why do you care, as a DBA, or a developer for that matter, about what is happening in the transaction log? Let's roll back a little and see... The raison d'être of a transaction log file is to record all the information necessary to recover any and all activity happening against the database. Every SQL Server database must have at least one log file. Here, have a brief look at what it looks like inside the log file itself:

SELECT * FROM ::fn_dblog(DEFAULT, DEFAULT) AS l -- more examples, and details, below

During the spring of 2008, for an auditing project, our requirement was to be able to read the active log files, as well as those that were archived. The archived logs gave us the option, with the help of third-party tools (or thanks in part to the query above), to effectively interrogate the log file as if it were one mega table. The way Lumigent Log Explorer's documentation describes it, 'to assist you in solving or recovering from problems that may occur in a typical database system', gets to the heart of what could be a potential point-in-time restore (please use the Full recovery model if this is your requirement).
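For context, a point-in-time restore under the Full recovery model looks roughly like this (the file paths, database name and STOPAT time are made up for illustration):

-- Restore the last full backup without recovering, then roll the log forward to just before the incident
RESTORE DATABASE [DBname]
    FROM DISK = N'D:\Backups\DBname_full.bak'
    WITH NORECOVERY, REPLACE
GO
RESTORE LOG [DBname]
    FROM DISK = N'D:\Backups\DBname_log.trn'
    WITH STOPAT = '2009-06-27 14:30:00', RECOVERY
GO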


My view is that you may end up, at some point, with a mystery transaction that no logic can explain, and thus your database integrity is in question - not a place any DBA wants to be, naturally. Empowering yourself to dig into the log file and resolve these kinds of mysteries is the main point of this post: you will be able to find out what the values were before and after a change to the database, and who or what application committed the change, whereas before one would typically disregard such anomalies (at least I did, most of the time). This also gives us motivation to ensure that log files are archived, since we're following Erasmus' proverb to 'leave no stone unturned' with respect to resolving such mysteries.

Given that you can query the log file (you will see transactions as BEGIN_XACT, COMMIT_XACT; please see the full reference list below), it is therefore possible to raise alerts for undesirable activity, e.g. someone executing data definition language in production. Combined with Database Mail and SQL Server Agent this can be automated too, or, in the case of Lumigent Log Explorer, one can configure an alert for each DDL/DML command, which is perhaps most useful for filtering out problems in development. The approach of monitoring objects during manipulation or creation will allow you to take control of an environment progressively and proactively. Here is how to query the log file for some typical unwanted incidents:


USE [master]
GO
ALTER DATABASE [DBname] SET RECOVERY SIMPLE WITH NO_WAIT
GO  -- if you leave a database in Simple, the rows after a checkpoint
    -- will be recycled, therefore I suggest FULL or at least BULK_LOGGED
USE [master]
GO
ALTER DATABASE [DBname] SET RECOVERY FULL WITH NO_WAIT
GO
-- if you need to clean up the space quickly for testing
USE [DBname]
GO
DBCC SHRINKFILE (N'DBname_log', 0, TRUNCATEONLY)  -- logical name of the log file
GO
-- truncate a table or perform undesirable activity, etc.
SELECT Operation, Context, [Transaction ID], [Begin Time], [End Time], AllocUnitName, [Description],
       [UID], [Server UID], SPID, [Transaction Name], [Number of Locks], [Lock Information],
       *   -- shows the rest of the columns; I put the most interesting ones first
FROM ::fn_dblog(DEFAULT, DEFAULT) AS l
WHERE Operation LIKE '%MARK_DDL%'   -- rows where data definition language was executed
                                    -- (fn_dblog prefixes the codes with LOP_, e.g. LOP_MARK_DDL)
-- other filters: Operation = 'LOP_MODIFY_ROW' or Operation = 'LOP_INSERT_ROWS' or Operation = 'LOP_DELETE_ROWS'
-- Operation = 'LOP_BEGIN_XACT'  -- means the beginning of a transaction
-- Operation = 'LOP_COMMIT_XACT' -- means the end (commit) of a transaction
ORDER BY [Current LSN] ASC


-- for the above MARK_DDL filter you can create a job step that checks your critical
-- databases for undesirable activity and, if a DDL change exists (use IF EXISTS with the above,
-- sketched after this script), sends a notification:

declare @myfromname varchar(150)
declare @alladdresses varchar(max)
declare @myrecipients varchar(150)
declare @mycurrentaddress varchar(max)
declare @SubjectLocal varchar(200)
declare @BodyLocal varchar(max)   -- sp_send_dbmail parameters cannot be expressions, so build the body first
declare @databasename varchar(100)

set @databasename = DB_NAME()     -- name of the current database

BEGIN -- Name of current sender
    SET @myfromname = N'Message regarding Log file activity on ' + @@servername

    -- Get e-mail addresses of operators
    BEGIN
        SET @alladdresses = N''

        DECLARE MAILResults_CURSOR CURSOR FORWARD_ONLY READ_ONLY FOR
            SELECT email_address FROM msdb.dbo.sysoperators
            WHERE email_address IS NOT NULL

        OPEN MAILResults_CURSOR
        FETCH NEXT FROM MAILResults_CURSOR INTO @myrecipients

        WHILE @@FETCH_STATUS = 0
        BEGIN
            SET @mycurrentaddress = @myrecipients + CHAR(59)   -- CHAR(59) = ';' separator
            SET @alladdresses = @alladdresses + @mycurrentaddress
            FETCH NEXT FROM MAILResults_CURSOR INTO @myrecipients
        END

        CLOSE MAILResults_CURSOR
        DEALLOCATE MAILResults_CURSOR

        IF @alladdresses <> N''
        BEGIN
            SET @SubjectLocal = 'Log file undesirable activity in the ' + @databasename + ' DB on ' + @@servername
            SET @BodyLocal = 'Please verify the log to find out what happened in ' + @databasename
                           + ' using SELECT * FROM ::fn_dblog(DEFAULT, DEFAULT) AS l'

            EXEC msdb.dbo.sp_send_dbmail
                 @profile_name = NULL
                ,@recipients = @alladdresses
                ,@copy_recipients = NULL
                ,@blind_copy_recipients = NULL
                ,@subject = @SubjectLocal
                ,@body = @BodyLocal
                ,@body_format = 'TEXT'
                ,@importance = 'High'
                ,@sensitivity = 'Normal'
        END
    END
END
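And here is a minimal sketch of the IF EXISTS wrapper mentioned above, suitable for a job step (I am assuming the DDL operation shows up in fn_dblog with the LOP_ prefix, hence the LIKE):

IF EXISTS (SELECT 1
           FROM ::fn_dblog(DEFAULT, DEFAULT) AS l
           WHERE [Operation] LIKE '%MARK_DDL%')   -- any schema change still in the active log
BEGIN
    -- place the Database Mail block above inside this BEGIN...END
    PRINT 'DDL detected in the active portion of the log'
END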

Post addendum: for a great similar story on this (added in June 2009, after my talk in Vancouver where I performed a similar demo), Paul S. Randal takes a similar approach here.

Internally, the SQL Server log file captures an operation code with each log record. Here are the most common ones, taken from a combination of Microsoft's and Lumigent Log Explorer's help files.


• ABORT_XACT - Indicates that a transaction was aborted and rolled back.

• BEGIN_CKPT - A checkpoint has begun.

• BEGIN_XACT - Indicates the start of a transaction.

• BUF_WRITE - Writing to buffer.

• COMMIT_XACT - Indicates that a transaction has committed.

• CREATE_INDEX - Creating an index.

• DELETE_ROWS - Rows were deleted from a table.

• DELETE_SPLIT - A page split has occurred; rows have moved physically.

• DELTA_SYSIND - The SYSINDEXES table has been modified.

• DROP_INDEX - Dropping an index.

• END_CKPT - Checkpoint has finished.

• EXPUNGE_ROWS - Row physically expunged from a page, now free for new rows.

• FILE_HDR_MODIF - SQL Server has grown a database file.

• FORGET_XACT - Shows that a two-phase commit transaction was rolled back.

• FORMAT_PAGE - Write a header of a newly allocated database page.

• INSERT_ROWS - Insert a row into a user or system table.

• MARK_DDL - Data Definition Language change; the table schema was modified.

• MARK_SAVEPOINT - Designates that an application has issued a 'SAVE TRANSACTION' command.

• MODIFY_COLUMNS - Designates that a row was modified as the result of an UPDATE command.

• MODIFY_HEADER - A new data page was created and its header initialized.

• MODIFY_ROW - Row modification as a result of an UPDATE command.

• PREP_XACT - Transaction is in a two-phase commit protocol.

• SET_BITS - Designates that the DBMS modified space allocation bits as the result of allocating a new extent.

• SET_FREE_SPACE - Designates that a previously allocated extent has been returned to the free pool.

• SET_MASK_BITS

• SORT_BEGIN - A sort begins with index creation.

• SORT_END - End of the sorting while creating an index.

• SORT_EXTENT - Sorting extents as part of building an index.

• UNDO_DELETE_SPLIT - The page split process has been dumped.

• XACT_CKPT - During the checkpoint, open transactions were detected.
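Once you know these codes, a quick way to see which operations dominate the active portion of your log is to aggregate fn_dblog by operation (a minimal sketch; note that fn_dblog reports the codes with an LOP_ prefix):

SELECT Operation, COUNT(*) AS log_records
FROM ::fn_dblog(DEFAULT, DEFAULT) AS l
GROUP BY Operation
ORDER BY COUNT(*) DESC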


For more info:

http://www.sqlservercentral.com/articles/Design+and+Theory/63350/ - SSC's intro to Log Files

http://msdn.microsoft.com/en-us/library/ms189085.aspx - Log behaviour under different recovery models

http://www.youtube.com/watch?v=ZM44LUuA6hc - video overview of how transaction logs work (simplified)

http://www.youtube.com/watch?v=nyz0AYCwhtM&feature=related - transactions



http://www.sans.org/reading_room/whitepapers/application/forensic_analysis_of_a_sql_server_2005_database_server_1906?show=1906.php&cat=application - forensic analysis of a SQL Server 2005 database server, page 22


P.S. The above is meant to give a thorough understanding of log files; it is not intended as a replacement for SQL Server Management Studio's Schema Changes History report, nor for Paul Nielsen's AutoAudit.