Tuesday, December 02, 2008
Thanks again to Jean-René Roy for organising a great week-long event - never had a chance to meet so many MVPs, such as Brad McGehee and Paul Nielsen, in the photo to the right with local DBAs Paolo de Rosa and Pollus Brodeur. Also had lunch/dinner/beers with Adam Machanic, Scott Stauffer, Roman Rehak, Itzik Ben-Gan all in one week! Actually, to be honest, it felt like a DBA therapy session at times :)
Set STATISTICS IO and STATISTICS TIME on for details of the page reads while doing the before-and-after comparison. See Brad McGehee's compression examples here.
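As a minimal sketch, the before/after measurement could look like this (the table name is taken from the vardecimal example below; swap in your own):

```sql
-- Capture logical reads and CPU time before and after enabling vardecimal
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

SELECT COUNT(*) FROM dbo.BigAssTableLoadedWithDecimals; -- note the page reads in Messages

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

Run it once before the conversion and once after, and compare the logical-read counts in the Messages tab.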
In SQL 2005, from SP2 onwards, you can do this:
EXEC sys.sp_db_vardecimal_storage_format 'DatabaseName', 'ON'
EXEC sp_tableoption 'dbo.BigAssTableLoadedWithDecimals', 'vardecimal storage format', 1
-- In the shrink that follows, 0 can be replaced with the amount of space you want to leave free for expansion. Normally, if you have huge decimal types in tables with millions of rows, you'll see a huge difference in the size of the database after the shrink.
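A minimal sketch of the shrink step the comment above refers to (database name from the example above; 0 means leave no extra free space):

```sql
-- Reclaim the space freed up by the vardecimal conversion
DBCC SHRINKDATABASE ('DatabaseName', 0);
```

Remember that shrinking fragments indexes, so consider an index rebuild afterwards.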
Please take care while you are doing your big table conversions: make sure that you have enough space for the whole table to be added onto the MDF file, and expect relatively heavy load on the LDF (log) file. What you can do is switch to bulk-logged recovery while you are doing the compression, then re-enable Full (or whatever recovery model you were using) afterwards.
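The recovery-model switch described above could be sketched like this (database name and backup path are hypothetical):

```sql
-- Reduce log growth during the conversion
ALTER DATABASE DatabaseName SET RECOVERY BULK_LOGGED;

-- ... run the vardecimal conversion / compression here ...

ALTER DATABASE DatabaseName SET RECOVERY FULL;
-- Back up the log afterwards so the log chain stays usable
BACKUP LOG DatabaseName TO DISK = 'D:\Backups\DatabaseName.trn'; -- path hypothetical
```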
Basically, what you are doing is clamping down on the wasted space per row. Row-level compression in SQL 2008 goes further (character columns even get dictionary and prefix compression), whereas what SQL 2005 SP2 calls vardecimal is limited to decimal columns only. Obviously, the 2008 higher-end editions have upped the cost for you to have this functionality, but it goes with the territory - a large organisation usually has major space to gain. And don't forget that it's not only the space in the database you are gaining: it's all the cumulative space in backups over time, plus all the improved query execution times. One could argue that simply taking a query's CPU cost multiplied by its frequency of execution gives you the CPU hours saved. Hey boss, where's my bonus? I just saved the company n,000 hours of CPU execution time :) Remember that clients will get their data in less than half the time!
You could do this in the tempdb database also - rebuild using tempdb while adding the compression to the specific object during creation - but make sure tempdb is as big as the object you are rebuilding, such as the index plus table data size, with a good fifty percent on top of that to be sure you're not going to run out of space for other applications.
The Data Compression Wizard in SQL Server Management Studio demystifies much of this, if you are already lost. Mind you, it's not very efficient if you have to do this on a whole bunch of objects; in that case you would be better off using T-SQL. Sort your tables largest first (data pages should be in the thousands), and evaluate the space savings. If you are in the millions of data pages, use this compression - and as mentioned before, even in SQL 2005 post-Service Pack 2 builds you can take advantage of row compression by means of VARDECIMAL (but not page compression, sadly).
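In SQL 2008 you can evaluate those savings in T-SQL before committing to anything - a sketch, using the table name from the earlier example:

```sql
-- Estimate what ROW (or PAGE) compression would save on a given object
EXEC sp_estimate_data_compression_savings
    @schema_name      = 'dbo',
    @object_name      = 'BigAssTableLoadedWithDecimals',
    @index_id         = NULL,   -- NULL = all indexes on the table
    @partition_number = NULL,   -- NULL = all partitions
    @data_compression = 'ROW';  -- or 'PAGE'
```

The result set shows current size versus estimated compressed size, so you can rank your largest tables by payoff before rebuilding anything.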
Another condition you should be aware of: if you are input/output bound on your system, meaning that you are waiting on your disks, then you can benefit from compression. Brad McGehee has stated that he prefers row compression for OLTP environments (today at SQLTeach.com) - and if the e-DBA guru is mentioning it, then it's really worth looking into.
If you are using tables with multiple non-clustered indexes, only compress those indexes that are used occasionally; heavily used indexes are to be avoided. So, as always, to be sure: TEST, TEST, TEST... on several servers - your test environment, your disaster recovery environment, your dev environment, and even your laptop (still enjoying disk performance for databases on Dell's XPS Samsung SSD - holding back on a 32GB SSD express card for logs/swap/temp until the New Year). Do it on your development environment just to save space on large tables - dev environments are usually starved for disk space - and then just watch what happens over the next while: check whether queries are taking forever to run, and let a few of the developers know - maybe they'll see something you missed too. SQL DBAs - get out there and compress!
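For reference, applying the compression itself is a one-liner per object in SQL 2008 (Enterprise/Developer editions) - table and index names here are hypothetical:

```sql
-- Row-compress the whole table (rebuilds the heap or clustered index)
ALTER TABLE dbo.BigAssTableLoadedWithDecimals
    REBUILD WITH (DATA_COMPRESSION = ROW);

-- Page-compress only a rarely-used non-clustered index
ALTER INDEX IX_Rarely_Used
    ON dbo.BigAssTableLoadedWithDecimals
    REBUILD WITH (DATA_COMPRESSION = PAGE);
```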
Monday, November 03, 2008
Unfortunately, just having a university degree is not enough these days to be competitive in the job market; you have to get loads of experience as well as keep up post-graduate studies in what my father likes to describe as 'waves'. Reading up on new methods to keep pace with change, especially in the database world, is great, but putting yourself under the pressure of passing an exam takes the integration of that knowledge to the next step. You'd be surprised how much one can accomplish. This year, with some fatigue of course, I have taken the equivalent of five certification exams (due to a retake of the MCDBA upgrade exam), and wow - there are so many things I've picked up by studying for them. It will take me months to go through and apply what I've learned while preparing for the tests. I'm the kind of DBA who keeps perhaps too many notes, as a consequence of this studying.
I'd like to digress to draw a centuries-old parallel here: one of the main aims of a school founded in the middle of the nineteenth century, the Working Men's College (of London), was to encourage education for life. In fact, the college has been serving local people and employers for over 150 years. For the enrichment of the community with respect to professional development, certification is important, since it serves individuals who are seeking skills and qualifications that enhance their career prospects. Many of those who study and obtain certification are consultants who pay for classes out of their own companies' money, since reinvesting in your resources shouldn't be ignored (the Quebec government here actually encourages 1% of the global budget to go to education). Others who take courses receive direct support from a sponsoring employer who sees that one of the best ways to improve a firm's productivity and customer satisfaction is by having better skilled, qualified and motivated employees. Continuing education in this way is therefore a win-win for both sides. Many of the reasons for certification are the same as those of the Working Men's College, since they are:
-Relevant to the skills needed by local employers
-Easy to access, so that learning can take place at the employer’s premises or at the college’s classroom.
-Flexible, so that learning can fit around the needs and schedules of business. An individual can simply take the exams or courses when things are quiet - often on a different schedule from the semester system.
-Effective in producing, in the shortest possible time, skilled and qualified staff.
P.S. Keep up those practice tests (if you are on your way to certification), since once you're cool with the material, test prep. is your best way to feel relaxed on the big day of your exam.
Tuesday, October 28, 2008
If you have arrived in a new environment and have been asked to check the password strength of the existing database infrastructure, Idera's Password Checker (within their admin toolset) does the job quite well. Using several tools (up to three, to be sure) to verify security, such as Microsoft Baseline Security Analyser, will allow you to reassure the 'powers that be' of how secure your databases are.
Let's say you are configuring SQL Server to handle web server sessions in a load-balancing environment: you will probably have to create a sysadmin-level account for the setup to work. In this case (since you cannot get around it, as far as I know), an obscurely named account with a password of more than 10 alphanumeric characters is your safest bet.
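A sketch of creating such an account - the login name and password here are purely hypothetical placeholders:

```sql
-- Obscurely named sysadmin login with a long password and policy checking on
CREATE LOGIN [svc_x9q_lb] WITH PASSWORD = 'A9$kz72QpL41xM',
    CHECK_POLICY = ON, CHECK_EXPIRATION = OFF;

-- Grant the sysadmin role the setup requires
EXEC sp_addsrvrolemember 'svc_x9q_lb', 'sysadmin';
```

Keep the credentials in whatever secured store your organisation uses; an account like this should never appear in application config files in clear text.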
Wednesday, September 10, 2008
September 08
I diverge a little from the typical coding-oriented best practice to one that is focused on governance within public or government organisations, with respect to the security of the data in databases used for annual reports.
Internal controls fall under the general COSO framework and are more explicitly defined in the U.S. SOX Act. One of the most important points of SOX compliance is the segregation of duties (SOD), which basically means developers do not have access in production and that specific roles are followed when developing code before it is placed in production. Territorial as it may seem, it is necessary for physical DBAs to control production databases and, without fail, to follow change management practices such as those defined by the COBIT framework, to avoid the risk of data being inadvertently, mistakenly, or maliciously changed. We govern ourselves in a controlled environment, with respect to production systems, by following procedure and documentation (e.g. COBIT or ITIL standards). Each task has to be validated by a third person (code walk-through/fresh eyes) who is not writing the actual code. The database developers themselves should not be able to execute anything in production directly without an independent review of the documentation/code for the work being performed. Typically, the role of the developer is to pass their code on to a database administrator who, in this economic climate, might not be within easy reach (at least have a peer review if a DBA is not involved). This division ensures that the roles of creator and executor are clearly segregated.
If your organisation is in denial or lacks understanding of internal controls, please remind management that databases which fall under internal controls (that is, the data is used for public reporting, annual reports, etc.) are subject to the segregation of duties. Furthermore, to track all changes and have recoverability, the bare minimum recovery model for a SQL database is Bulk-Logged - so if your 'friendly' DBA states he can recover all those transactions that developers ran in production as a quick 'fix' while under the Simple recovery model, it's simply not possible. This constitutes a risk of not being able to recover a transaction once it has been executed - so the problem is a technological risk at the security level. Make sure to back up all your transaction logs and keep them indefinitely; the auditors will be very happy that way too.
Another point of internal control is adherence to the principle of least privilege, especially in production. To allow developers to get their work done, it is much safer to use impersonation for exceptions that require elevated privileges (see EXECUTE AS to do that temporarily), instead of being the typical lazy DBA and giving DBO to everyone :) Giving full access means you are not managing security, needless to say. Developers may dismiss this as 'overhead' on their path to coding glory, or to shorten the treadmill - I do not want to judge - but please be aware that DBAs are going to do all that is considered responsible, since we are the data stewards of the organisation (not to mention that we must comply with regulations and the law).
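The EXECUTE AS approach mentioned above can be sketched like this - the procedure, table, and role names are hypothetical:

```sql
-- Wrap the privileged action in a procedure that runs as its owner,
-- so the caller never needs direct UPDATE rights on the table
CREATE PROCEDURE dbo.usp_FixOrderStatus
WITH EXECUTE AS OWNER
AS
BEGIN
    UPDATE dbo.Orders
    SET Status = 'Closed'
    WHERE Status = 'Stuck';
END;
GO

-- The developer only gets EXECUTE on the proc - least privilege preserved
GRANT EXECUTE ON dbo.usp_FixOrderStatus TO DevRole;
```

The design point: the elevated right lives in one auditable object instead of being handed to a person.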
I shall be updating this posting very shortly with more references thanks to the help of my ex-colleague from Dell, independent Senior SQL Server DBA Pollus Brodeur, since he’s the one who originally taught me all about auditing/SOD/Internal controls a couple of years back.
http://www.mssqltips.com/tip.asp?tip=1300 SOX for SQL Server DBAs
http://www.microsoft.com/downloads/details.aspx?FamilyId=6E1021DD-65B9-41C2-8385-438028F5ACC2&displaylang=en Reaching Compliance (PCI/SOX)
Wednesday, August 13, 2008
Auto-retry can resolve some pretty complicated bottlenecks. Once, I had to keep a dedicated SQL Agent job server in check with respect to tempdb use on SQL 2000. For the occasional scheduling conflict where jobs hit tempdb too hard at the same time, auto-retry came in very handy. To put this in perspective: before, that dedicated SQL Agent job server had no disk space, and the tempdb data file was always full. Once I implemented retries, I timed the auto-retry so that a tempdb shrink had a chance to run during the wait before the retry, depending on which job was running, of course. Often the retry is simply due to a deadlock (which does not mean you should instantly use NOLOCK as a solution, by the way), so a retry is very appropriate for this. This way, the tempdb data files came under control, with a shrink using SET DEADLOCK_PRIORITY LOW. I have already mentioned this to several clients, and they are still profiting from its use on production systems running SQL 2000.
From a management point of view, if you can program the job to restart itself automatically, why would you wait until someone gets around to it, especially on a server dedicated to SQL Server jobs? Sometimes we have to watch out for a long wait on retry - some steps take forever on a specific job (hence exceptionally long-running steps should be split up), and if a step takes that long we can turn the retry option off for it. However, before disabling auto-retry for specific steps, we must check what the errors were in the past, because if they are all deadlocks, for example, then auto-retry is a very good idea. If there are validation errors on a regular job, whether it is on auto-retry or not, the job should be disabled until fixed - and not simply rerun with the hope that it will be a nicer error next time around :)
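Configuring the retry itself is a per-step setting in msdb - a sketch, with a hypothetical job name:

```sql
-- Give step 1 of the job three retries, five minutes apart
EXEC msdb.dbo.sp_update_jobstep
    @job_name       = 'Nightly tempdb-heavy load',
    @step_id        = 1,
    @retry_attempts = 3,
    @retry_interval = 5; -- minutes to wait between retries
```

Pick the interval so that whatever cleanup you rely on (a tempdb shrink, a blocking job finishing) has time to happen before the step fires again.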
For jobs that run every hour and have to be timely, a retry certainly fixes the job much faster than a human can, right? Human intervention is also far more costly. Anyway, most application owners should be monitoring the jobs with SQL Server and will see that the retry usually succeeds (and hopefully read the error log). I'm sure many application owners in job-intensive environments have fewer job failures to deal with since the retry option has been used; with their work-stress load lighter, they can get on with more challenging work instead of restarting jobs. This being the information [deluge] age, and since we're overloaded as it is, we shouldn't be doing restart-the-job work that can be automated. An example: to bring a problematic SQL 2000 database down to size with DBCC, it was thanks to an automatic rerun of DBCC SHRINKFILE with EMPTYFILE, several times, that we regained huge space inside a filegroup. Auto-retry is very important when we are trying to get control of our disk space. Shrinking tempdb is yet another example where the low deadlock-priority setting makes sure the job doesn't take priority over critical processes, and thus waits to rerun later - while still keeping critical database operations going. All potentially long maintenance jobs on a dedicated job server should have a low priority set when possible - or better yet, combine the job step's logic with TRY/CATCH (see previous blog post).
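The polite-shrink pattern described above could look like this as a job step (the logical file name is hypothetical - check sys.database_files for yours):

```sql
-- Lose any deadlock fight rather than block critical work
SET DEADLOCK_PRIORITY LOW;

-- Shrink the tempdb data file down to a 1 GB target
USE tempdb;
DBCC SHRINKFILE ('tempdev', 1024); -- target size in MB
```

If the step is killed as a deadlock victim, the job's retry setting simply runs it again later - which is exactly the behaviour we want here.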
So, to recap: if you are worried about a job that fails frequently, analyse it to see whether auto-retry is effective or not - because, of course, there are exceptions.
There are certainly times when NOLOCK can be used in production (closely monitored!) as long as the data you are querying is not at risk of change, because the database engine will spurt out dirty reads and can even give you duplicate rows or skip rows. In a data warehouse, for example, you'll see a performance enhancement (thanks to Itzik Ben-Gan for the tip), since the data should be set to read-only after a daily early-morning load, assuming one loads the data during non-business hours. The disaster I wish you all to avoid is placing a view in production, deciding to use the query hint, and then finding data-movement errors (page splits occurring due to inserts/updates while you are reading the table). The most important reason to avoid NOLOCK is its potential to flip cluster nodes (error 7105, excessive CPU usage) when used on tables that have NTEXT, TEXT or IMAGE data types - at least from what I saw first-hand on a SQL 2000 cluster.
Another caveat with respect to NOLOCK use: the Database Engine might have to acquire locks when reading metadata, even when processing a SELECT with a locking hint that prevents requests for shared locks on the data. For example, a SELECT using the NOLOCK hint does not acquire shared locks when reading data, but might sometimes request locks when reading a system catalog view. This means it is possible for a SELECT statement using NOLOCK to be blocked! So, if you are feeling iffy about a process or a specific stored procedure, run the following:
-- Get information about the locks held by the current session's transaction
SELECT resource_type, resource_database_id, resource_associated_entity_id,
       request_mode, request_status
FROM sys.dm_tran_locks
WHERE request_session_id = @@SPID;
It's okay to use NOLOCK when you know the table is being pounded and you just want to check some data without affecting production. Microsoft's literature basically says that because the SQL Server query optimizer typically selects the best execution plan for a query, they recommend that hints such as NOLOCK be used only as a last resort by experienced developers and database administrators.
Beware the trend in some companies where developers have decided that it's good to use NOLOCK all the time, which is, for the reasons mentioned above, a very bad idea in production. In a data warehouse, on the other hand, it can be set for the entire area, since the data is meant to stay static until the next day's or month's load anyway.
This month, here are two simple examples to inspire you during the creation of your objects/queries.
The first example shows how you can use a UNION instead of CASE (which I see over-used in many environments).
Transform the CASE...
SELECT CHB_NUMERO,
       CASE CHB_ETAGE
           WHEN 'RDC' THEN 0
           WHEN '1er' THEN 1
           WHEN '2e'  THEN 2
       END AS ETAGE,
       CHB_COUCHAGE
FROM dbo.Chambre -- table name assumed; it was not shown in the original post
ORDER BY ETAGE, CHB_COUCHAGE
...into a UNION
SELECT CHB_NUMERO, 0 AS ETAGE, CHB_COUCHAGE
FROM dbo.Chambre -- table name assumed; it was not shown in the original post
WHERE CHB_ETAGE = 'RDC'
UNION ALL -- the branches are disjoint, so UNION ALL avoids a needless de-duplication sort
SELECT CHB_NUMERO, 1 AS ETAGE, CHB_COUCHAGE
FROM dbo.Chambre
WHERE CHB_ETAGE = '1er'
UNION ALL
SELECT CHB_NUMERO, 2 AS ETAGE, CHB_COUCHAGE
FROM dbo.Chambre
WHERE CHB_ETAGE = '2e'
ORDER BY ETAGE, CHB_COUCHAGE
Of course, you want to make sure there is an index on the CHB_ETAGE column to obtain the best performance (unless your table is insignificant in size).
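Assuming the rooms table is called dbo.Chambre (the table name is not shown in the post), the supporting index could be sketched as:

```sql
-- Index on the filtered column; INCLUDE makes it covering (SQL 2005+)
CREATE NONCLUSTERED INDEX IX_Chambre_Etage
    ON dbo.Chambre (CHB_ETAGE)
    INCLUDE (CHB_NUMERO, CHB_COUCHAGE);
```

With this in place, each UNION branch becomes a cheap index seek instead of a table scan.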
The second example involves populating a temporary table (avoiding SELECT INTO #temp as much as possible):
CREATE TABLE #tempZ (
    col1 INT,          -- column list assumed; only col2 appeared in the original
    col2 INT NOT NULL  -- avoid NULLs with ISNULL(col, 0)
)
INSERT INTO #tempZ
EXECUTE NomBD.dbo.cp_MonProc @parm1, @param2
To avoid stored procedures that exceed 500 lines, I recommend putting the SELECT that matches the insert into its own stored procedure, and populating the temporary table via INSERT ... EXECUTE, as above.
Tuesday, July 29, 2008
The purpose of the transaction log file is to record all the information necessary to recover the database's transactions. Each database must have at least one log file. Recently, for a project, we needed to read these log files and archive them.
Archiving the log files gives you the option, with software like Lumigent Log Explorer, to search inside any transaction log file in order to recover data (point-in-time restore). This gives us an exact picture of the before/after values of a change and of who or which application made it, and it also allows the recovery of specific values or modifications.
Why do you care, as a developer, about what is happening in the transaction log? One asset of the information in the log file, increasingly exploited by third-party tools, is the audit side - it becomes possible to debug what happened on a given day in the past (often just disregarded as a mystery), because we query the log file directly, even if it has been archived. This is the same principle Erasmus explained in his collection of Proverbs: 'Leave no stone unturned' - literally, we can read everything that has happened. But beware: don't all run at once to ask your friendly DBA to read the log files, because the database first has to be in the right recovery model. Often databases are in the simple recovery model. It is even possible, thanks to Log Explorer, to load all the transactions into a table in order to query them.
The transaction log also helps us see when and what changes. In addition, Lumigent Log Explorer lets us send an e-mail each time a DDL command is run, so, as DBAs, we review these e-mails as they come in to filter out anything that doesn't conform to coding standards - or, better yet, we send back a relevant example to help you during development. This approach will improve your environment progressively, one object at a time.
More info:
http://msdn.microsoft.com/en-us/library/ms189085.aspx - Log behavior under different recovery models
http://www.youtube.com/watch?v=ZM44LUuA6hc - video how transaction logs work overview (simplifié)
http://www.youtube.com/watch?v=nyz0AYCwhtM&feature=related - transactions
Thank you, and don't hesitate to come back to me with your questions - there are no dumb questions :)
Thursday, July 24, 2008
SQL Server best practices - TRY/CATCH - the best way to handle exceptions in SQL 2005/8
Hi SQL error handlers,
The objective of this posting is to explain how to handle errors in the Transact-SQL language. This is similar to exception handling in the Visual Studio environment.
If you are used to using @@ERROR and RAISERROR in previous versions of SQL Server, then the best thing for you to use now is TRY/CATCH. It is not to be used everywhere - just when you have to run multiple critical inserts/updates for which you anticipate errors. For a single insert/update within a proc, just use BEGIN/END and don't go crazy with it everywhere.
So we recommend wrapping your T-SQL code in a BEGIN/END TRY block, followed by a BEGIN/END CATCH block:
BEGIN TRY
    -- T-SQL code block
END TRY
-- If an error occurs in the BEGIN/END TRY block,
-- the system enters the BEGIN/END CATCH block (as soon as there is an error):
BEGIN CATCH
    -- inside this part of the code, the system functions
    -- ERROR_NUMBER/SEVERITY/STATE/PROCEDURE/LINE/MESSAGE() are useful
END CATCH
The typical control method is BEGIN TRANSACTION/COMMIT, but we also recommend the TRY/CATCH block method (which you already know from Visual Studio). An exception will not automatically cause the transaction to roll back - this is a common myth; things will keep running afterwards, and a commit will happen for the part of the code that did run. Use XACT_ABORT whenever you are not using TRY/CATCH and are dealing with transactions (thanks to Adam Machanic, MVP, for clarifying this). Use only one of the two: SET XACT_ABORT ON, or TRY/CATCH.
N.B. Compilation and syntax errors are not caught by TRY/CATCH. You must therefore fix errors of that type before wrapping code in TRY/CATCH blocks - TRY/CATCH is there to handle transactions that are prone to run-time exceptions. If you have logical errors, TRY/CATCH is not your saviour.
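Putting the pieces together, a minimal sketch of a critical multi-statement update guarded by TRY/CATCH (table and column names are hypothetical):

```sql
BEGIN TRY
    BEGIN TRANSACTION;

    UPDATE dbo.Accounts SET Balance = Balance - 100 WHERE AccountID = 1;
    UPDATE dbo.Accounts SET Balance = Balance + 100 WHERE AccountID = 2;

    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    -- Nothing rolls back automatically: we must do it ourselves
    IF XACT_STATE() <> 0
        ROLLBACK TRANSACTION;

    SELECT ERROR_NUMBER()  AS ErrNum,
           ERROR_MESSAGE() AS ErrMsg;
END CATCH;
```

Note how the explicit ROLLBACK in the CATCH block addresses the myth above: without it, the half-done work could still be committed later in the batch.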
Références / references :
http://www.sql-server-performance.com/articles/per/deadlock_sql_2005_p1.aspx TRY/CATCH Helps to Resolve Deadlocks
SQL Server 2005/8 best practices - query hints: using WITH (NOLOCK)
Continuing in the same vein as my previous posts, you will find below some information about WITH (NOLOCK):
We all need our queries to run faster - certainly the departments that put Rapid Development above all else do. That is why my recommendation this month concerns the WITH (NOLOCK) query hint.
My colleagues and I agree that there are reasons to have WITH (NOLOCK) in production, provided the data is not at risk of being changed (read-only tables). It is important that you see the data for what it is - uncommitted. If there is any possibility that the data changes while your queries are running, do not use the results in production, because you will get invalid results (dirty reads - your data risks being inconsistent, and the query might even skip rows or show duplicates in the results).
More info:
http://www.sqlservercentral.com/Forums/Topic330889-192-1.aspx - you will find a lot of information on this site.
SELECT o.OrderID, p.ProductName -- column names assumed; not shown in the original
FROM db.schema.Orders o WITH (NOLOCK)
-- on a large table, the rows come back faster
INNER JOIN db.schema.Products p WITH (NOLOCK)
    ON o.ProductID = p.ProductID -- join columns assumed
Warning - never use WITH (NOLOCK) on tables containing NTEXT, TEXT, or IMAGE column data types - you risk operating-system error 7105 and excessive CPU usage (which I have seen flip cluster nodes).
To continue with the best practices on last month's theme:
You have probably noticed the change since SQL Server 2000 regarding Db.schema.object - formerly Db.ObjectOwner.Object. The object owner posed a problem when that user left the company. Now, with the schema, which is a grouping of objects, certain advantages become available.
GRANT SELECT, INSERT, UPDATE, EXECUTE ON SCHEMA::SchemaName TO joblow
When you create an object, you can associate it with a schema, and security within the schema can be applied easily.
· This gives more detail about the object (easy readability). If this practice is used correctly, you know which group or application the object belongs to.
· It is then easier to manage security on these objects. Indeed, granting DBO is not always the right solution (the dbo schema still exists) - if you want to grant more rights as an exception, see EXECUTE AS.
· Finer-grained security is sweeter. The article shows that the world is not perfect even after adopting schemas, but it is one step in the right direction for db security administration and readability.
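The schema-per-application grouping described above could be sketched like this - the schema and table names are hypothetical:

```sql
-- Group an application's objects in their own schema...
CREATE SCHEMA Reservation AUTHORIZATION dbo;
GO
CREATE TABLE Reservation.Chambre (
    CHB_NUMERO INT,
    CHB_ETAGE  VARCHAR(3)
);
GO
-- ...then secure the whole schema in one statement, instead of granting DBO
GRANT SELECT, INSERT, UPDATE ON SCHEMA::Reservation TO joblow;
```

Any object created later in the Reservation schema automatically falls under the same grant - that is the manageability win over per-object permissions.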
Hello, dear readers of the Database Hive,
For the well-being of your SQL Server RDBMS systems, please use fully qualified object names in your Transact-SQL code.
SELECT DISTINCT c.CustomerID, s.Name
FROM Sales.dbo.Customer c
JOIN Sales.dbo.Store s ON ( c.CustomerID = s.CustomerID)
WHERE c.TerritoryID = 1
Here are a few reasons to motivate you to follow this practice:
· Code clarity (you can clearly see where the object comes from)
· Performance - ref: http://www.sqlservercentral.com/articles/T-SQL/62061/
· Not doing it is one of the worst practices ( http://www.sqlservercentral.com/articles/Miscellaneous/worstpracticenotqualifyingobjectswiththeowner/1309/ )
· The RDBMS vendor also strongly recommends it: “In SQL Server Books Online, many Transact-SQL examples are simplified by not using qualified names. Although these elements are left out of the examples to help readability, we recommend that you use qualified names in Transact-SQL statements in production systems”