This month’s T-SQL Tuesday, “Aggregate Functions” is hosted by Jes Schultz Borland (Blog/Twitter) aka grrlgeek.
The Task
One of my duties as a development DBA is to verify our databases are set up the same in each of our environments. One of the tasks of this duty is to verify our static data/look-up tables are all the same in each environment. I’ve got a good method for writing the insert/update statements – see this post – but I still need to verify the values are the same in our static data tables. My tool of choice is the CHECKSUM_AGG function.
How do I do this?
The easiest way I’ve found to do this is to take the BINARY_CHECKSUM of each row in a table that has static data in it (i.e. no datetimes, timestamps, etc). Then I take the CHECKSUM_AGG of the each row’s binary_checksum – thus giving me a Checksum for the entire table. I do this on the same table in two different environments and I compare the value. They will be equal if the values in the tables are the equal. For a more in depth explanation of this, see this post.
How this grew…
This method described above is easy to do for one table manually, but I’m responsible for approximately 10 to 50 static data tables in each of our 15 databases – in each of our five environments per project, so I needed to automate it. The good news is that we’ve got good naming standards for our tables, so all the static data tables end in “Type”. So I created a script that would calculate a CHECKSUM_AGG of all the BINARY_CHECKSUMs of each of the tables ending in “type” in a database, I could export that to excel, run the same script on a different server and compare the results. This worked for a bit, but I was doing the same thing on each database, so I added a bit of logic to include specified databases to my original script. Now, I run this script on each server and I compare the results.
The script:
A couple of highlights, the vSQLCall is included in the final output, so if there is a difference in values, you can easily run the single table script and compare the values of each row. (I had to do this one time as an update statement that was missing a period and the values were different.) Also, forgive the combination of temp tables and table value parameters – I like a temp table for the final results, so I can work with it and not drop it.
Create table #TestTable
(ID int NOT NULL IDENTITY (1, 1),
ServerName varchar(255),
DBName varchar(255),
TableName varchar(255),
vSQLCall nvarchar(2550),
ChecksumVal varchar(255))
--Add default for blank column
ALTER TABLE #TestTable ADD CONSTRAINT [DF_testtable] DEFAULT (' ') FOR [vSQLCall]
-- A work table to track the databases and the server name
Declare @TestTableDBs Table
(ID int NOT NULL IDENTITY (1, 1),
ServerName varchar(255),
DBName varchar(255))
Declare @icount int,
@TopNum int,
@sql1 nvarchar(2000)
--Initialize variables
Set @icount = 1
--Fill up the databases to do...Change DB list here if scope changes.
Insert @TestTableDBs
Select @@ServerName, Name from master.sys.databases
where name in (<<DBNAME1 goes here>>, <<DBNAME2 goes here>>, <<DBNAME3 goes here>>, ....)
Select @TopNum = Max(ID) from @TestTableDBs
--Fill in main table with all the tables needed to check
--Note any table name with '%type%' in the name
While @icount <= @TopNum
Begin
Set @sql1 = 'Insert #TestTable (ServerName, DBName, TableName) '
Select @sql1 = @sql1 + 'Select @@ServerName, Table_catalog, Table_Name from '+ DBName
+ '.information_schema.tables where table_type = ''BASE TABLE'' and table_name like ''%Type%'''
from @TestTableDBs where id = @icount
Exec (@sql1)
Set @icount = @icount + 1
End
---Now figure out the colums to of the tables and the SQL call to find the checksum
Declare @TableID int,
@CurDBName varchar(255),
@CurTableName varchar(255),
@Start int,
@loopStart int,
@end int,
@maxEnd int,
@vColumnName varchar(2000),
@sql nvarchar(2000)
Create table #TestTableColumns
(ID int NOT NULL IDENTITY (1, 1),
ServerName varchar(255),
DBName varchar(255),
TableName varchar(255),
columnNames varchar (2000))
--Initialize the variables
Select @Start = 1
Set @TableID = 1
Set @loopStart = 1
Select @maxEnd = Max(ID) from #TestTable
--Loop for each table to get the columns
While @Start <= @maxEnd
Begin
Select @CurDBName = DBName,
@CurTableName = TableName
from #TestTable where ID = @TableID
Set @vColumnName = ''
Select @sql =
'Insert #TestTableColumns
(ServerName,
DBName,
TableName,
columnNames)
Select @@serverName,
Table_Catalog,
Table_Name,
Column_Name
from ' + @CurDBName + '.information_schema.columns
Join #TestTable on DBName = Table_Catalog and TableName = Table_Name
where data_type <> ''timestamp''
and #TestTable.ID = ' + STR(@TableID)
Exec (@sql)
Select @end = Max(ID) from #TestTableColumns
--Loop through to get the column name for each table
While @loopStart <= @end
Begin
Select @vColumnName = @vColumnName + columnNames + ', ' from #TestTableColumns where id = @loopStart
Set @loopStart = @loopStart + 1
End --End of loop for all columns for each table
--Update #TestTable with the SQL call to get the checksum
update #TestTable
Set vSQLCall = 'Select CHECKSUM_AGG(BINARY_CHECKSUM(' + Left(@vColumnName, Len(@vColumnName) - 1)
+ ')) from ' + @CurDBName + '.dbo.' + @CurTableName
where ID = @TableID
Set @TableID = @TableID + 1
Set @Start = @Start + 1
End --End of loop for all tables
--No longer needed
DROP TABLE #TestTableColumns
---Finally execute the SQL in the column
Declare @Finalcount int,
@TopNum1 int,
@SQL2 nvarchar(2550)
Set @Finalcount = 1
Select @TopNum1 = Max(ID) from #TestTable
While @Finalcount <= @TopNum1
begin
Select @SQL2 = 'Update #TestTable Set CheckSumVal = (' + vSQLCall + ') where ID = ' + STR(@Finalcount) from #TestTable where ID = @Finalcount
EXEC (@SQL2)
Set @Finalcount = @Finalcount + 1
End
--Output the results
Select * from #TestTable
--Clean up
Drop table #TestTable
The Output

The Future
I’ve got plans to have Powershell run this on each server I specify, then combine and compare the results in a file and save all the supporting data. Thus when I need the documentation, it’ll take about 3 minutes (2.5 of which will be getting coffee), instead of the 1 hour it’s currently listed in the schedule to take.