The clustered index, the bulk insert and the sort operation

A few months ago I wrote about a SSIS setting you have to use just to get better performance when you (fast) load data into a table with a clustered key. I and Alberto met Stefano Stefani at SQL Server Conference 2007 (an Italian conference where we were all speakers) and we got the most direct Microsoft support service you can imagine, provided in our own native language, by one of the guy who writes the query optimizer.

The story is: when you have a clustered index, a bulk insert operation is really fast only when you insert data sorted with the same order as the clustered index expression. The same is true for fast load operations with SSIS (and DTS too). We noted that with an identity column used as clustered index, you don’t get really fast insert. In this case, SQL Server doesn’t realize that data are already sorted and it sorts them one more time.

Alberto explains very well this behavior in his new post. The bad news is that today you have to rely on some trick and/or workaround, as the dummy ORDER BY operation showed by Alberto. The good news is that a future CTP of Katmai will work without requiring these tricks.