TableDifference: a solution to SCD handling

In SSIS the SCD component does a lot of work for you but it works so slowly that it is quite always convenient to create an “ad hoc” solution to handle the slowly changing dimensions.

As the main problem is that of determining the difference between the last snapshot of the SCD and the current data we decided to write an SSIS component that has two inputs, one for the “old” and one for the “new” flow of data. It compares all the rows and sends each one to a different output for new, deleted and user configurable updated data. You can then decide how to handle the variations executing the correct UPDATE sequences to the dimension table or whatever you need to handle variations.

The following picture illustrates the component functionality better than thousand words.


Using TableDifference two tables with 5.5 millions of records and 25 columns each were compared, using an AMD dual core processor with 4Gb of RAM and standard SATA disks running both SSIS and SQL Server in 7 minutes. The complexity of the algorithm is linea r, so you can expect to take more or less 15 minutes for a 10 millions record tables with the same hardware.

You can find the full article that describes TableDifference here. Full source code is available at