This article provides how-to information and code for two monitoring systems you can set up centrally to keep track of SQL Server 2000 maintenance and disk utilization. The first is a scheme I have used in my role as a DBA, which has helped simplify and speed up one of the more mundane tasks we as DBAs are likely to face: checking overnight backups and DBCC checks. The second is a simple system which centrally collects disk drive utilization data from target SQL Servers. This data can be checked regularly to ensure none of your servers is about to run out of disk space, and it is retained so you can analyze it for trends and predict your future disk requirements.
Central Maintenance Monitoring
The objective of this system is to provide an automated, central view of your servers' maintenance (backup and DBCC) results for you to review. The information is collected on a central server using a scheduled job and you can interrogate it using a stored procedure. I embed this in a web page which calls the SP on demand whenever the page is browsed, but this is a personal preference.
How Does It Work?
The system relies on two very useful system stored procedures: xp_readerrorlog, which reads the SQL Server error log on a server, and sp_cycle_errorlog, which recycles it. The latter keeps each server's error log containing only recent, relevant information; the former captures the contents of that log to a central location. Between them, logged information about the backups and DBCC checks that have run on each server can be stored centrally for reporting purposes.
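For example (a minimal sketch; SERVER1 is a placeholder linked server name), the read happens from the central server while the recycle runs on each target:
-- On the central server: pull a target's current error log via a linked server
EXEC SERVER1.master.dbo.xp_readerrorlog
-- On the target server itself: close the current error log and start a fresh one
EXEC master.dbo.sp_cycle_errorlog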
Throughout this article I will assume you have a central server with an admin-type DBA database, in which you can put the tables needed for the two systems described here. In my case I call this database centraldb; you can call yours whatever you want (or use an existing one you may already have).
First, create the tables in centraldb that will be used to hold the data.
CREATE TABLE [dbo].[ServerErrorLog] (
[ErrorLogText] [varchar] (256) NULL ,
[ContinuationRow] [bit] NULL ,
[Servername] [sysname] NULL ,
[InsertDate] [datetime] NOT NULL
) ON [PRIMARY]
GO
CREATE TABLE [dbo].[ServerHistory] (
[SHID] [int] IDENTITY (1, 1) NOT NULL ,
[ErrorLogText] [varchar] (256) NULL ,
[Servername] [sysname] NULL ,
[EventDate] [datetime] NOT NULL ,
[InsertDate] [datetime] NOT NULL
) ON [PRIMARY]
GO
CREATE CLUSTERED INDEX [Event_Date_cnu] ON [dbo].[ServerHistory]([EventDate]) ON [PRIMARY]
GO
ALTER TABLE [dbo].[ServerErrorLog] ADD
CONSTRAINT [DF_Serv__Inser__182C9B23] DEFAULT (getdate()) FOR [InsertDate]
GO
ALTER TABLE [dbo].[ServerHistory] ADD
CONSTRAINT [DF__Serv__Inser__24927208] DEFAULT (getdate()) FOR [InsertDate],
PRIMARY KEY NONCLUSTERED
(
[SHID]
) ON [PRIMARY]
GO
These are a worktable, ServerErrorLog, and a history table, ServerHistory. The target servers' error logs are collected by a stored procedure, prGetErrorLog.
CREATE procedure prGetErrorLog
@servername sysname
as
delete from ServerErrorLog
declare @sql varchar(255)
set @sql = @servername + '.master.dbo.xp_readerrorlog '
insert into ServerErrorLog
(ErrorLogText, ContinuationRow)
exec @sql
update ServerErrorLog
set Servername = @servername
where Servername IS NULL
insert into ServerHistory
(ErrorLogText, Servername, EventDate)
select
CASE WHEN ErrorLogText LIKE '%Database backed up%' THEN SUBSTRING(ErrorLogText,34,20) + SUBSTRING(ErrorLogText,64,CHARINDEX(',',ErrorLogText,64)-64)
WHEN ErrorLogText LIKE '%DBCC CHECKDB%' THEN RTRIM(SUBSTRING(ErrorLogText,34,221))
WHEN ErrorLogText like '%BACKUP FAILED%' THEN SUBSTRING(ErrorLogText,34,256)
END,
ServerName,
SUBSTRING(ErrorLogText,1,22)
from ServerErrorLog
where ErrorLogText like '%Database backed up%'
or ErrorLogText like '%DBCC CHECKDB%'
or ErrorLogText like '%BACKUP FAILED%'
GO
When called, this procedure takes a parameter, servername. First, it clears out the worktable ServerErrorLog. Then, using dynamic SQL (a four-part procedure name held in a variable), it inserts into this worktable the results of running xp_readerrorlog on the target server. Finally, it copies the relevant rows from the worktable into the ServerHistory table for permanent storage (and later querying), keeping only the error log rows which mention backups and DBCC checks, as these are the only ones we are interested in.
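A quick ad-hoc test (SERVER1 is a placeholder for one of your linked target servers) to confirm the collection worked:
EXEC prGetErrorLog 'SERVER1'
-- Inspect the most recent events captured for that server
SELECT TOP 10 EventDate, Servername, ErrorLogText
FROM ServerHistory
WHERE Servername = 'SERVER1'
ORDER BY EventDate DESC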
Set Up Scheduled Tasks
Now, you need to set up the scheduled tasks required for the collection of the error log data into your centraldb. First, define a list of target servers. In my case I chose all live servers, as it is only on these that I consider backups and DBCC checks to be of vital importance. Clearly, if you have differing requirements, you could extend the system to any servers you wish to monitor, including test and development ones.
On each target server you need to set up a local SQL Server Agent job to run the stored procedure sp_cycle_errorlog. Schedule these to run at the same time on every server; for example, 0730, assuming all your overnight maintenance will be complete by then. The error logs must be collected centrally before this time, so that the information is captured before each log is recycled. So in this example we would schedule on our central server, at 0700 each day, a job with a single step run in centraldb, containing one execution of prGetErrorLog for each target server. Your job step will look like this:
EXEC prGetErrorLog 'SERVER1'
EXEC prGetErrorLog 'SERVER2'
And so on, one for each of the target servers in your list.
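The matching job step on each target server (scheduled for 0730 in this example) is simply the recycle call:
EXEC master.dbo.sp_cycle_errorlog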
Set Up Security
The central job step is going to be executing distributed queries from the central server against all the others, so you need to add each target server as a linked server on your central server. You can use whichever security context you prefer for this, but my recommendations for simplicity are these:
1. Run SQL Server and SQL Server Agent on ALL servers using the same single domain account
2. Set up the linked server from the central server to each target server as follows:
- On the General tab, for Server Type, choose "SQL Server"
- On the Security tab, choose "use existing security context"
- Leave all other options at their defaults
If you have differing security requirements you will need to adjust these as required. Essentially, it is simplest if the account running SQL Server and SQL Server Agent on the central server is an administrator of each target server.
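If you prefer to script the linked servers rather than click through Enterprise Manager, a minimal sketch (SERVER1 is a placeholder) looks like this:
-- Register the target as a linked server of type "SQL Server"
EXEC sp_addlinkedserver @server = 'SERVER1', @srvproduct = 'SQL Server'
-- Map logins using the caller's own credentials ("use existing security context")
EXEC sp_addlinkedsrvlogin @rmtsrvname = 'SERVER1', @useself = 'true'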
Reporting on the Central Data for Checking Backups and DBCC Checks
For this I use three stored procedures: one for checking all maintenance, both backups and DBCC checks; one for checking just backups; and one for checking just DBCCs. These are fairly straightforward procedures, and you could easily adapt them to your individual requirements.
CREATE procedure prssr
as
SET NOCOUNT ON
SELECT EventDate, Servername, ErrorLogText AS 'Events for all servers'
FROM ServerHistory
WHERE (DATEDIFF(HOUR, EventDate, GETDATE()) < 24)
ORDER BY Servername, EventDate
SET NOCOUNT OFF
GO
CREATE procedure prssr_backup
as
SET NOCOUNT ON
SELECT EventDate, Servername, ErrorLogText AS 'Backup Events for all servers'
FROM ServerHistory
WHERE (DATEDIFF(HOUR, EventDate, GETDATE()) < 24)
AND ErrorLogText LIKE '%back%'
ORDER BY Servername, EventDate
SET NOCOUNT OFF
GO
CREATE procedure prssr_DBCC
as
SET NOCOUNT ON
SELECT EventDate, Servername, ErrorLogText AS 'DBCC Events for all servers'
FROM ServerHistory
WHERE (DATEDIFF(HOUR, EventDate, GETDATE()) < 24)
AND ErrorLogText LIKE '%DBCC%'
ORDER BY Servername, EventDate
SET NOCOUNT OFF
GO
So these stored procedures will give you everything that's happened on your servers in terms of backups and DBCC checks in the past 24 hours.
The first, prssr (short for Procedure Show Server Results), shows all events, both backups and DBCC checks. The second, prssr_backup, shows only backups, and the third, prssr_DBCC, shows only DBCC checks. I set up three web pages, one for executing each procedure, so that I can choose whether I want to see everything, just backups, or just DBCC checks.
The value 24 (hours) can be set to whatever you require, or you could rewrite the stored procedure to take it as a parameter you set each time you run it. In fact, at work I use 28 hours, not 24, because my data warehouse backs up during the morning and I need to include it: if I run the stored procedure in the afternoon, I don't want the warehouse's entries to drop off the end of the results because by then they are more than 24 hours old (and remember, the job that collects the data only runs once a day, so the procedure won't pick up that morning's output yet, only the output from the day before).
I'd recommend running this with a threshold of 24 hours normally. But on a Monday morning, say, you might want to see what's happened over the entire weekend: just adapt it and change the time criterion to 72 hours, for example, or parameterise it as sketched below.
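If you do take the parameter route, a hypothetical variant (prssr_hours is my name for it, not one of the procedures above) might look like this:
CREATE procedure prssr_hours
@hours int = 24 -- how far back to report, in hours
as
SET NOCOUNT ON
SELECT EventDate, Servername, ErrorLogText AS 'Events for all servers'
FROM ServerHistory
WHERE (DATEDIFF(HOUR, EventDate, GETDATE()) < @hours)
ORDER BY Servername, EventDate
SET NOCOUNT OFF
GO
EXEC prssr_hours 72 on a Monday morning would then cover the whole weekend.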
How you execute this procedure is totally up to you. You can run it in Query Analyzer if you like. I have embedded mine in an Active Server Page, so I can just pull up this page in my browser and see the results there. At this point, non-programmer DBA people mustn’t panic! It isn’t that hard at all. In fact I used the wizard in Microsoft FrontPage to do mine. Here is a screenshot with sample output.
To use an ASP to run this, which is quite a good way of impressing the boss ("Hey, look at my automated maintenance checker embedded in a web page", etc.), you'll need a server running IIS, a folder for your pages, and a Data Source Name connection to point your pages at (if you're using the FrontPage wizard). More experienced ASP people can easily do this in code. You also have to create an "application" in IIS to enable it to run the active content in the pages.
I am not including detailed or comprehensive information here about how to set up IIS and how to construct ASP pages to query the data you've collected, because it's beyond the scope of what I am trying to show you. But it's worth investing some time in. It's particularly handy if someone other than yourself (e.g. a shift operator) checks your maintenance. You can simply send them a link to the page that runs the stored procedure and get them to check it for errors. So you can save someone else (or yourself, if applicable) the time-consuming task of connecting to each server in turn and looking at job outputs or text files.
Central Disk Space Utilization Monitoring
This is a similar system to the one outlined in the first part of this article, but for a different purpose. Earlier on in my current job, I'd occasionally be embarrassed by finding out too late in the day that a critical system was running low on drive space. People really lose confidence in you if you aren't continually on top of your disk space management. It also helps greatly if you can warn people that servers are likely to run out of disk space and take preventative measures in advance of critical problems occurring.
If you are going to make predictions as to when drives will fill up, while also being able to assure yourself and your colleagues that your servers aren’t running out of disk, it can help to implement an automated system to collect drive space utilization information from your servers and locate it in a central place for alert generation and trend analysis.
Now, I expect people will write to me after this article is published and say “you can do this in Perfmon, you can do this using 3rd party software”, etc. Well that may be true. But I don’t like spending money on things I can do myself, and I believe my system has advantages over Perfmon’s logging and alerting mechanisms.
Enough of That, How Does It Work?
The system relies on the system extended stored procedure xp_fixeddrives.
This SP exists in the master database and simply reports the free space, in MB, on each drive. My system runs from a central server: it executes xp_fixeddrives remotely on all the predefined target servers and stores the output in a table. This data can then be queried against a lookup table containing each server's drive capacities. You need to maintain the capacity information manually, as xp_fixeddrives does not report capacity, just free space. So, for example, if you add disks to an array or change a drive's capacity in any way, you must remember to update the capacity information in your lookup table. This is crucial because the key measure used in reporting is percentage free space, which obviously requires the total capacity of a drive in order to calculate the value.
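Run locally, for example, it returns one row per fixed drive (the figures here are illustrative):
EXEC master.dbo.xp_fixeddrives
-- drive  MB free
-- C      1708
-- D      45904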
Once again, I recommend that you locate all your tables and procedures in a central, dba/admin-type database. I use my centraldb database, the same one I used before.
The Tables
The master table is "Server" and is defined as:
CREATE TABLE [dbo].[Server] (
[ServerID] [int] IDENTITY (1, 1) NOT NULL ,
[Servername] [sysname] NOT NULL
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[Server] WITH NOCHECK ADD
PRIMARY KEY CLUSTERED
(
[ServerID]
) ON [PRIMARY]
GO
This has a one-to-many relationship to the lookup table "ServerDriveCapacity."
CREATE TABLE [dbo].[ServerDriveCapacity] (
[ServerID] [int] NOT NULL ,
[Drive] [char] (1) NOT NULL ,
[Capacity] [int] NOT NULL
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[ServerDriveCapacity] WITH NOCHECK ADD
CONSTRAINT [pk_01] PRIMARY KEY CLUSTERED
(
[ServerID],
[Drive]
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[ServerDriveCapacity] ADD
CONSTRAINT [fk_01] FOREIGN KEY
(
[ServerID]
) REFERENCES [dbo].[Server] (
[ServerID]
)
GO
And this table contains the static lookup data of drive letters and their capacities (all units are MB).
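Populating these two tables is a one-off task (plus manual maintenance whenever capacities change); a quick sketch with made-up names and sizes:
-- Register a server, then record each of its drives' total capacity in MB
INSERT INTO Server (Servername) VALUES ('SERVER1')
INSERT INTO ServerDriveCapacity (ServerID, Drive, Capacity)
SELECT ServerID, 'D', 69459
FROM Server
WHERE Servername = 'SERVER1'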
The table for holding the drive usage collected from xp_fixeddrives is called "ServerFreeSpaceHistory."
CREATE TABLE [dbo].[ServerFreeSpaceHistory] (
[ServerID] [int] NOT NULL ,
[Drive] [char] (1) NOT NULL ,
[FreeSpace] [int] NOT NULL ,
[CheckDate] [datetime] NOT NULL
) ON [PRIMARY]
GO
CREATE CLUSTERED INDEX [SFSH_01] ON [dbo].[ServerFreeSpaceHistory]([CheckDate]) ON [PRIMARY]
GO
ALTER TABLE [dbo].[ServerFreeSpaceHistory] ADD
CONSTRAINT [DF__Serv__Check__59063A47] DEFAULT (getdate()) FOR [CheckDate]
GO
CREATE NONCLUSTERED INDEX [SFSH_02] ON [dbo].[ServerFreeSpaceHistory]([ServerID], [Drive]) ON [PRIMARY]
GO
Collecting Free Space Data Stored Procedure
The all-important procedure for going and getting the data using xp_fixeddrives is called GetFreeSpace.
CREATE procedure GetFreeSpace @servername sysname
as
set nocount on
declare @sql varchar(1000),
@serverid int
select @serverid = ServerID
from Server
where Servername = @servername
select @sql = 'EXEC ' + @servername + '.master.dbo.xp_fixeddrives'
create table #t1
(drive char(1), freespace int)
insert into #t1 exec(@sql)
insert into ServerFreeSpaceHistory
(ServerID, Drive, Freespace)
select @serverid, Drive, freespace
from #t1
drop table #t1
GO
You now need to create a job on your central server to execute this procedure in turn for each of your target servers, e.g.:
EXEC GetFreeSpace 'SERVER1'
EXEC GetFreeSpace 'SERVER2'
and so on. I run this once a day. As in part 1, you need to ensure your inter-server security is set up and works -- and don't assume that because something works when you run it in Query Analyzer, it'll work from SQL Server Agent. Test it first!
Creating a View
To check your most recent free space data and report on it, you create a view and select from it, so you can see where your problems are.
The view is called "FreeSpaceSummary."
CREATE VIEW FreeSpaceSummary
AS
SELECT os.Servername,
fs.Drive,
convert(decimal,100 * fs.FreeSpace / dc.Capacity) as PercentFree,
fs.FreeSpace,
dc.Capacity
FROM ServerFreeSpaceHistory fs
JOIN Server os
ON fs.ServerID=os.ServerID
JOIN ServerDriveCapacity dc
ON dc.Drive=fs.Drive
AND dc.ServerId = fs.ServerID
WHERE Checkdate>Dateadd(day,-1,getdate())
GO
If you select from this view like this:
SELECT ServerName, Drive, PercentFree, FreeSpace, Capacity
FROM dbo.FreeSpaceSummary
ORDER BY PercentFree asc
You will see the server drives with the lowest percentage of free space right at the top of the list. I could have embedded the ORDER BY in the view, but I left it out as you may wish to order some other way, e.g., by server name.
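One way to turn this into a quick alert check is to filter on a threshold; the 10% here is just an example figure:
-- Any drive below 10% free space deserves immediate attention
SELECT Servername, Drive, PercentFree, FreeSpace
FROM FreeSpaceSummary
WHERE PercentFree < 10
ORDER BY PercentFree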
As before, I like to embed this SELECT in an Active Server Page for ease of use and sharing. If you are not a confident ASP developer, this can be done easily with the FrontPage wizard, in the same way as in the first part of this article. Sample output is shown here.
For doing trend analysis, you can query the data directly or export to Excel for charting, etc. I will leave this up to you as it’s often a matter of personal taste or specific work requirements how you analyze it or predict what’s going to happen in the future by extrapolating historical trends.
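As one possible starting point, here is a sketch (using the tables above) that averages free space per drive per month, which you could chart to spot downward trends:
SELECT os.Servername,
fs.Drive,
CONVERT(char(7), fs.CheckDate, 120) AS YearMonth, -- 'yyyy-mm'
AVG(fs.FreeSpace) AS AvgFreeMB
FROM ServerFreeSpaceHistory fs
JOIN Server os
ON os.ServerID = fs.ServerID
GROUP BY os.Servername, fs.Drive, CONVERT(char(7), fs.CheckDate, 120)
ORDER BY os.Servername, fs.Drive, YearMonth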
Summary Table Diagram
I have included this picture of a database diagram to illustrate how the tables used in these 2 schemes fit together.
Conclusion
The purpose of this article is to help you reduce time spent doing mundane, but highly important monitoring work as a DBA or system administrator. With a little work and initiative, you really can help yourself and your organization save time on routine tasks, and make your checking routines sleeker and more efficient.
Tom Pullen has been a DBA for 6 years. He currently works for Oxfam, a large UK charity. He has experience in SQL Server 6.5, 7.0 and 2000. He is married with two children. Contact him at: tpullen@oxfam.org.uk
Copyright 2005 by the author.