Tuesday, December 02, 2008

SQL 2008 Row and Page compression – or SQL 2005 post SP2 vardecimal conversion

One of the great new features in SQL 2008 is Row and/or Page compression. There is also good news for those of you on SQL 2005 in production who might be there for a long while: there is a decent feature you can take advantage of too, because the 2008 feature comes at a price. Why? Compression in SQL 2008 requires the Developer or Enterprise edition, so if you want to benefit from this mega space saver (like 1.3TB down to 650GB as I have seen, plus queries running in as little as 40% of the time they took before) but cannot justify the licence, you will be happy with VARDECIMAL conversion in SQL 2005 (SP2 required, though). Both have stored procedures that you can run to estimate how much space you will save: in 2008 it's sp_estimate_data_compression_savings, which samples the table's data and gives you quite an accurate estimate of the space saved/gained.
Set STATISTICS IO and STATISTICS TIME on for details of the page reads while doing the before-and-after comparison. See Brad McGehee's compression examples.
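If you want to see what the 2008 estimation procedure looks like in practice, here is a minimal sketch; dbo.SalesHistory is just a placeholder for one of your own large tables:

EXEC sp_estimate_data_compression_savings
     @schema_name      = 'dbo',
     @object_name      = 'SalesHistory',   -- placeholder table name
     @index_id         = NULL,             -- NULL = all indexes on the table
     @partition_number = NULL,             -- NULL = all partitions
     @data_compression = 'PAGE';           -- try 'ROW' as well and compare

SET STATISTICS IO ON;    -- page reads per query
SET STATISTICS TIME ON;  -- CPU and elapsed time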


In SQL 2005, from SP2 onwards, you can do this:
exec sys.sp_db_vardecimal_storage_format 'DatabaseName', 'ON'
exec sp_tableoption 'dbo.BigAssTableLoadedWithDecimals', 'vardecimal storage format', 1
DBCC SHRINKDATABASE(DatabaseName, 0)
-- 0 can be replaced with the percentage of space you want to leave free for expansion; if you have huge decimal types in tables with millions of rows, you'll see a big difference in the size of the database after the shrink.
References: http://msdn.microsoft.com/en-us/library/bb508963.aspx
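For the 2005 side of the estimate, SP2 also ships an estimation procedure (sp_estimated_rowsize_reduction_for_vardecimal, if memory serves); a quick sketch, reusing the table name from the example above:

-- Estimates the per-row size reduction if the table's decimal columns
-- were stored in vardecimal format (SQL 2005 SP2 and later)
EXEC sys.sp_estimated_rowsize_reduction_for_vardecimal 'dbo.BigAssTableLoadedWithDecimals'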

Please take care while you are doing your big table conversions: make sure you have enough space for the whole table to be added onto the MDF [data] file, and expect a relatively heavy load on the LDF [log] file. What you can do is switch to the BULK_LOGGED recovery model while you are doing the compression, then switch back to FULL (or whatever recovery model you were using before).
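A rough sketch of the recovery model dance, with placeholder names; take a log backup once you are back in FULL so regular point-in-time log backups resume:

ALTER DATABASE DatabaseName SET RECOVERY BULK_LOGGED;
-- ... run the compression / vardecimal conversion here ...
ALTER DATABASE DatabaseName SET RECOVERY FULL;
BACKUP LOG DatabaseName TO DISK = N'X:\Backups\DatabaseName_log.trn';  -- path is a placeholder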
Basically what you are doing is clamping down on the wasted space per row. Row-level compression in SQL 2008 stores fixed-length types in a variable-length format (Page compression goes further and adds prefix and dictionary compression for repeated values), which is roughly what SQL 2005 SP2 calls vardecimal, limited to the decimal/numeric column type only, however. Obviously, the 2008 higher-end editions have upped the cost for you to have this functionality, but it goes with the territory: a large organisation usually has major space to gain, and don't forget that it's not only the space in the database you are gaining, it's all the cumulative space in backups over time you'll save, plus the improved query performance. One could argue simply that by taking a query's CPU time before and after, multiplied by its frequency of execution, you get the CPU hours saved. Hey boss, where's my bonus, I just saved the company n,000 hours of CPU execution time :) Remember that clients will get their data back in less than half the time!
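If you want to put a rough number on that claim, the cached query stats can give you a back-of-the-envelope figure; the 50% CPU reduction below is an assumption, not a measurement, so substitute whatever your own before-and-after tests show:

-- total_worker_time is in microseconds; divide by 3,600,000,000 for hours
SELECT TOP (10)
       qs.execution_count,
       qs.total_worker_time / 1000000.0            AS total_cpu_seconds,
       (qs.total_worker_time * 0.5) / 3600000000.0 AS guessed_cpu_hours_saved  -- 0.5 = assumed saving
FROM sys.dm_exec_query_stats AS qs
ORDER BY qs.total_worker_time DESC;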

You could lean on the temp database as well: rebuild the object, sorting in tempdb, while adding the compression, but make sure tempdb is as big as the object you are rebuilding (index plus table data size), with a good fifty percent on top of that to be sure you're not going to run out of room for other applications.
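Something like this, for instance, assuming a clustered index named PK_SalesHistory on the placeholder table from earlier (SQL 2008 syntax):

ALTER INDEX PK_SalesHistory ON dbo.SalesHistory
REBUILD WITH (DATA_COMPRESSION = PAGE, SORT_IN_TEMPDB = ON);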
The Data Compression Wizard in SQL Server Management Studio demystifies much of this, if you are already lost. Mind you, it's not very efficient if you have to do this on a whole bunch of objects; in that case you would be better off using T-SQL. Sort your tables by the largest first (data pages should be in the thousands) and evaluate the space savings. If you are in the millions of data pages, use this compression; as mentioned before, even on a SQL 2005 post-Service Pack 2 build you can take advantage of row-style compression by means of VARDECIMAL (but not page compression, sadly).
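To get that "largest first" list without clicking through the wizard, a query along these lines against the partition stats DMV does the job (SQL 2005 and up):

SELECT TOP (20)
       OBJECT_SCHEMA_NAME(ps.[object_id]) AS schema_name,
       OBJECT_NAME(ps.[object_id])        AS table_name,
       SUM(ps.used_page_count)            AS used_pages,
       SUM(ps.row_count)                  AS total_rows
FROM sys.dm_db_partition_stats AS ps
WHERE ps.index_id IN (0, 1)               -- heap or clustered index only
GROUP BY ps.[object_id]
ORDER BY used_pages DESC;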
Another condition you should be aware of: if you are input/output bound on your system, meaning that you are waiting on your disks, then you can benefit from compression. Brad McGehee has stated that he prefers Row compression for OLTP environments (today at SQLTeach.com), and if the e-DBA guru is mentioning it, then it's really worth looking into.
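Row compression on an OLTP table is a one-liner in 2008; dbo.Orders here is a stand-in for one of your busy tables:

ALTER TABLE dbo.Orders REBUILD WITH (DATA_COMPRESSION = ROW);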
If you are using tables with multiple non-clustered indexes, only compress those indexes that are used occasionally; heavily used indexes are to be avoided. So, as always, to be sure: TEST, TEST, TEST... on several servers, your test environment, your disaster recovery environment, your dev, and even your laptop (still enjoying disk performance for databases on Dell's XPS Samsung SSD; holding back on a 32GB SSD express card for logs/swap/temp in the New Year). Do it on your development environment just to save space on large tables, because dev environments are usually starved for space, and then just watch what happens over the next while... check and see if queries are taking forever to run, and let a few of the developers know; maybe they'll see something you missed too. SQL DBAs, get out there and compress!
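To spot which non-clustered indexes are only used occasionally (and are therefore the better compression candidates), the index usage DMV is your friend; keep in mind the counters reset when the instance restarts:

SELECT OBJECT_NAME(us.[object_id])                      AS table_name,
       i.[name]                                         AS index_name,
       us.user_seeks + us.user_scans + us.user_lookups  AS reads,
       us.user_updates                                  AS writes
FROM sys.dm_db_index_usage_stats AS us
JOIN sys.indexes AS i
  ON i.[object_id] = us.[object_id] AND i.index_id = us.index_id
WHERE us.database_id = DB_ID()
  AND i.index_id > 1               -- non-clustered indexes only
ORDER BY reads ASC;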
