- The BINARY_CHECKSUM function returns a checksum value for a row of a table or over a list of expressions.
- A value is computed for each row, this makes it easy to find rows that are different.
- This function returns a group of values.
- Will return a different value for most, but not all, changes to the row, and can be used to detect most row modifications.
- The CHECKSUM_AGG function returns a checksum of the values in a group.
- If one of the values in the expression list changes, the checksum of the list also generally changes. However, there is a small chance that the checksum will not change.
So, I figured that I could first find the BINARY_CHECKSUM of each row and then find the CHECKSUM_AGG of all the BINARY_CHECKSUMs. If the tables were different, most of the time (as noted in bold above) these values would be different.
There are two situations that I find this very useful for (I’ll post about these later):
- Static data – data that needs to be the same from one environment to another
- Data migration – verifying all the columns get migrated correctly.
Note – there are a few data types that BINARY_CHECKSUM does not support. They are text, ntext, image, cursor, xml, and noncomparable CLR user-defined types. In my experience, some of the time update dates and insert dates are not important as well as timestamp data types, so I usually ignore these columns.
So to investigate the tester’s claims, I ran the following on each database:
Select CHECKSUM_AGG(BINARY_CHECKSUM(Col1, Col2, Col3)) from Table1
The values were different, so I removed Col1 from the criteria and ran the following on each database and the values were the same.
Select CHECKSUM_AGG(BINARY_CHECKSUM(Col2, Col3)) from Table1
Therefore, Col2 and Col3 had the same values for both Col2 and Col3 **most likely** but Col1 definitely had different values. The tester was right.
I then removed the CHECKSUM_AGG to find and then compare the values of each row to find the ones that didn’t match.
Select Col1, BINARY_CHECKSUM(Col1, Col2, Col3) from Table1 Order by Col1
Here’s an example that shows that this method doesn’t always work – note that the same distinct values are in each table, but each row is not the same.
CREATE TABLE [dbo].[table1]( [col1] [int], [col2] [varchar](200)) CREATE TABLE [dbo].[table2]( [col1] [int], [col2] [varchar](200)) Insert table1 Values (1, 'Row #1 in Table1, Row #2 in Table2'), (2, 'Row #2 in Table1, Row #1 in Table2') Insert table2 Values (2, 'Row #1 in Table1, Row #2 in Table2'), (1, 'Row #2 in Table1, Row #1 in Table2') ---These are equal Select CHECKSUM_AGG(BINARY_CHECKSUM(col1, col2)) from table1 Select CHECKSUM_AGG(BINARY_CHECKSUM(col1, col2)) from table2 ---These are NOT equal Select BINARY_CHECKSUM(col1, col2) from table1 Select BINARY_CHECKSUM(col1, col2) from table2
Despite the small chance of a checksum being equal, this method makes it very easy to quickly identify data inconsistencies in different databases or environments. If you’re ever need to quickly find inconsistent data, give it a try.