Twitter

Follow SQLQuill on Twitter

Subscribe

MN Pass

BINARY_CHECKSUM and CHECKSUM_AGG – Combo platter

Recently I got an email from a tester claiming that the IDs of values in static data tables are different in one database than they are in another database.  To check this out, I used a combo of the HandsBINARY_CHECKSUM and CHECKSUM_AGG.  Here’s a breakdown of each:
  • The BINARY_CHECKSUM function returns a checksum value for a row of a table or over a list of expressions.
    • A value is computed for each row, this makes it easy to find rows that are different.
    • This function returns a group of values.
    • Will return a different value for most, but not all, changes to the row, and can be used to detect most row modifications.
  • The CHECKSUM_AGG function returns a checksum of the values in a group.
    • If one of the values in the expression list changes, the checksum of the list also generally changes. However, there is a small chance that the checksum will not change.

So, I figured that I could first find the BINARY_CHECKSUM of each row and then find the CHECKSUM_AGG of all the BINARY_CHECKSUMs. If the tables were different, most of the time (as noted in bold above) these values would be different.

There are two situations that I find this very useful for (I’ll post about these later):

  1. Static data  – data that needs to be the same from one environment to another
  2. Data migration – verifying all the columns get migrated correctly.

Note – there are a few data types that BINARY_CHECKSUM does not support.  They are text, ntext, image, cursor, xml, and noncomparable CLR user-defined types.  In my experience, some of the time update dates and insert dates are not important as well as timestamp data types, so I usually ignore these columns.

So to investigate the tester’s claims, I ran the following on each database:

Select CHECKSUM_AGG(BINARY_CHECKSUM(Col1, Col2, Col3)) from Table1

The values were different, so I removed Col1 from the criteria and ran the following on each database and the values were the same.

Select CHECKSUM_AGG(BINARY_CHECKSUM(Col2, Col3)) from Table1

Therefore, Col2 and Col3 had the same values for both Col2 and Col3 **most likely** but Col1 definitely had different values.  The tester was right.

I then removed the CHECKSUM_AGG to find and then compare the values of each row to find the ones that didn’t match.

Select Col1,
BINARY_CHECKSUM(Col1, Col2, Col3)
from Table1
Order by Col1

Here’s an example that shows that this method doesn’t always work – note that the same distinct values are in each table, but each row is not the same.

CREATE TABLE [dbo].[table1](
[col1] [int],
[col2] [varchar](200))

CREATE TABLE [dbo].[table2](
[col1] [int],
[col2] [varchar](200))

Insert table1 Values
(1, 'Row #1 in Table1, Row #2 in Table2'),
(2, 'Row #2 in Table1, Row #1 in Table2')

Insert table2 Values
(2, 'Row #1 in Table1, Row #2 in Table2'),
(1, 'Row #2 in Table1, Row #1 in Table2')

---These are equal
Select CHECKSUM_AGG(BINARY_CHECKSUM(col1, col2)) from table1
Select CHECKSUM_AGG(BINARY_CHECKSUM(col1, col2)) from table2

---These are NOT equal
Select BINARY_CHECKSUM(col1, col2) from table1
Select BINARY_CHECKSUM(col1, col2) from table2

Despite the small chance of a checksum being equal, this method makes it very easy to quickly identify data inconsistencies in different databases or environments.  If you’re ever need to quickly find inconsistent data, give it a try.